Time |
Nickname |
Message |
00:02
🔗
|
|
etudier has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…) |
00:41
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
01:22
🔗
|
|
Soni has joined #archiveteam-bs |
02:40
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
02:41
🔗
|
|
dashcloud has joined #archiveteam-bs |
02:45
🔗
|
astrid |
if IA doesn't have a copy of libgen then i'll eat my hat |
02:46
🔗
|
astrid |
if IA has failed to snag a copy of libgen then we should really reconsider our life choices |
02:51
🔗
|
jrwr |
libgen? |
03:02
🔗
|
astrid |
there was some discussion few hours ago |
03:04
🔗
|
hook54321 |
jrwr: libgen = Library Genesis |
03:04
🔗
|
SketchCow |
Woooo |
03:06
🔗
|
hook54321 |
Pretty sure this is the official site: http://gen.lib.rus.ec/ |
03:06
🔗
|
hook54321 |
However this is the site listed on wikipedia: https://libgen.pw/ |
03:06
🔗
|
hook54321 |
oh wait, that's one of them. DuckDuckGo was only showing one. |
03:08
🔗
|
hook54321 |
On a side note, whenever we email site owners asking them to cooperate with us, I recommend that we send it to another email address first to see if it ends up in spam. That happened with the owner of imgh.us. |
03:12
🔗
|
second |
Did you guys archive the-eye.eu? |
03:12
🔗
|
second |
It has a lot of data though... |
03:21
🔗
|
hook54321 |
We did not |
03:23
🔗
|
second |
Are there plans to do so? |
03:24
🔗
|
hook54321 |
About how much space does it take up? |
03:25
🔗
|
second |
Well just the MSDN dump is 2.7TB |
03:25
🔗
|
hook54321 |
eh |
03:25
🔗
|
second |
Another dump of comics from whenever (pretty much all the major studios) is about 3TB |
03:25
🔗
|
second |
Rom collection, not sure, pretty big I assume |
03:25
🔗
|
second |
Then there is the reddit rips they have |
03:25
🔗
|
hook54321 |
There is much stopping someone from uploading it to archive.org, maybe a mirror. |
03:25
🔗
|
hook54321 |
*isn't |
03:26
🔗
|
second |
What happens if I upload stuff, would the archive just delete it? |
03:26
🔗
|
second |
I think it should be archived but you'll need to wait til the copyright expires :/ |
03:27
🔗
|
mundus |
second, I have a copy of it |
03:27
🔗
|
mundus |
It's only like 8TB |
03:27
🔗
|
mundus |
But most is not legal content |
03:27
🔗
|
second |
yes |
03:27
🔗
|
mundus |
and if it was going to be mirrored, archivist would do it |
03:27
🔗
|
second |
Only |
03:27
🔗
|
second |
archivist is the one who owns it |
03:28
🔗
|
hook54321 |
Disclaimer: Most of us are not employed by archive.org. |
03:28
🔗
|
hook54321 |
From what I've heard however, they wait until a copyright holder sends them a notice. |
03:28
🔗
|
mundus |
Yeah, if he wanted it on IA it would be on IA |
03:28
🔗
|
astrid |
we don't talk about copyright in here, folks |
03:28
🔗
|
astrid |
take it to #scared-shitless |
03:28
🔗
|
hook54321 |
we don't? |
03:28
🔗
|
Frogging |
haha |
03:28
🔗
|
astrid |
or maybe /r/legaladvice |
03:28
🔗
|
second |
#scared-shitless: Nick/channel is temporarily unavailable |
03:29
🔗
|
astrid |
okay what's that tell you |
03:29
🔗
|
hook54321 |
That there was a netsplit recently |
03:31
🔗
|
hook54321 |
I searched for the word "copyright" in the logs: 475 matches in 213 files |
03:34
🔗
|
hook54321 |
Lots of the stuff in the-eye appears to be porn. Still doesn't stop someone from attempting to upload it though. |
03:39
🔗
|
second |
Didn't know that |
03:39
🔗
|
second |
How is it "only" 8TB? |
03:44
🔗
|
|
arkhive has joined #archiveteam-bs |
04:25
🔗
|
|
pizzaiolo has quit IRC (Quit: pizzaiolo) |
04:42
🔗
|
|
kim__ has quit IRC (Ping timeout: 246 seconds) |
04:46
🔗
|
|
Sk1d has quit IRC (Ping timeout: 250 seconds) |
04:53
🔗
|
|
Sk1d has joined #archiveteam-bs |
04:54
🔗
|
Fletcher| |
Worth noting that IA standard procedure seems to be to dark an item instead of deleting it when a copyright claim is received |
05:03
🔗
|
jrwr |
Correct |
05:09
🔗
|
Somebody2 |
darking an item *may* mean that it's entirely gone, however. Or it may not. What it definitively means is that IA has ceased to *distribute* the item. |
05:36
🔗
|
|
Asparagir has quit IRC (Asparagir) |
06:15
🔗
|
|
Mateon1 has quit IRC (Remote host closed the connection) |
06:15
🔗
|
|
Mateon1 has joined #archiveteam-bs |
06:28
🔗
|
|
schbirid has joined #archiveteam-bs |
06:46
🔗
|
|
robink has quit IRC (Ping timeout: 246 seconds) |
06:46
🔗
|
|
robink has joined #archiveteam-bs |
06:49
🔗
|
schbirid |
i threw medium.com into wpull and it OOMd :D |
07:32
🔗
|
|
Honno has joined #archiveteam-bs |
08:06
🔗
|
|
Jonison has joined #archiveteam-bs |
08:24
🔗
|
|
BartoCH has joined #archiveteam-bs |
08:42
🔗
|
|
Mateon1 has quit IRC (Ping timeout: 260 seconds) |
08:42
🔗
|
|
Mateon1 has joined #archiveteam-bs |
09:02
🔗
|
|
Jonison has quit IRC (Read error: Connection reset by peer) |
09:31
🔗
|
|
icedice has joined #archiveteam-bs |
09:31
🔗
|
|
icedice has quit IRC (Remote host closed the connection) |
09:31
🔗
|
|
etudier has joined #archiveteam-bs |
09:39
🔗
|
|
Jonison has joined #archiveteam-bs |
10:18
🔗
|
|
etudier has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…) |
10:34
🔗
|
|
etudier has joined #archiveteam-bs |
10:53
🔗
|
|
etudier has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…) |
11:21
🔗
|
|
pizzaiolo has joined #archiveteam-bs |
11:32
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
11:39
🔗
|
|
dashcloud has joined #archiveteam-bs |
11:51
🔗
|
|
mls has quit IRC (Ping timeout: 250 seconds) |
12:03
🔗
|
|
mls has joined #archiveteam-bs |
12:38
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
12:46
🔗
|
|
etudier has joined #archiveteam-bs |
12:52
🔗
|
|
plue has quit IRC (Quit: WeeChat 1.5) |
13:01
🔗
|
|
Jonison has quit IRC (Ping timeout: 260 seconds) |
13:01
🔗
|
|
etudier has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…) |
13:04
🔗
|
|
etudier has joined #archiveteam-bs |
13:09
🔗
|
|
mls has quit IRC (Ping timeout: 250 seconds) |
13:29
🔗
|
|
etudier has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…) |
13:30
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
13:40
🔗
|
SketchCow |
It is not gone if dark'd. |
13:44
🔗
|
|
mls has joined #archiveteam-bs |
13:45
🔗
|
|
etudier has joined #archiveteam-bs |
13:50
🔗
|
|
dashcloud has joined #archiveteam-bs |
13:59
🔗
|
|
etudier has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…) |
14:10
🔗
|
|
etudier has joined #archiveteam-bs |
14:17
🔗
|
|
dd0a13f37 has joined #archiveteam-bs |
14:19
🔗
|
dd0a13f37 |
hook54321: The official site is libgen.io (or the IP, 94.something), gen.lib.rus.ec is an official mirror which only has the metadata |
14:20
🔗
|
dd0a13f37 |
second: In technological trouble, yes, but the operators will be fine. They have good opsec, have been doing this for 20 years, and the only one who isn't anonymous is a literal fugitive. They all live in the former soviet union too, so copyright is not a big problem there. |
14:20
🔗
|
dd0a13f37 |
libgen.pw, b-ok, and bookza are unofficial mirrors. sci-hub is a sister project run by the aforementioned fugitive using libgen as a storage backend and likely has backups too |
14:21
🔗
|
dd0a13f37 |
The torrents are also decently seeded from various residential russian IPs, and there are probably more who aren't seeding the torrents since it's storage-bound |
14:22
🔗
|
dd0a13f37 |
astrid: from what I can see, you're lacking a copy of sci-hub (aka sci-mag), which is really much more important than libgen |
14:25
🔗
|
JAA |
TIL someone thought it'd be a good idea to name a parasitic wasp after Elbakyan. |
14:25
🔗
|
dd0a13f37 |
bit rude |
14:26
🔗
|
JAA |
Yeah, that's what she said as well. |
14:26
🔗
|
JAA |
But regarding the urgency of backing up Sci-Hub: I thought it's just a frontend to libgen? What additional data is there on SciHub? |
14:27
🔗
|
dd0a13f37 |
well, I doubt they'll all be arrested at the same time since they're different projects |
14:27
🔗
|
dd0a13f37 |
It's not that simple |
14:27
🔗
|
dd0a13f37 |
sci-hub uses libgen as a backend |
14:27
🔗
|
dd0a13f37 |
they have tons of "donated" accounts, and they cycle through them |
14:27
🔗
|
dd0a13f37 |
and download articles |
14:27
🔗
|
dd0a13f37 |
Scihub's articles are not in the main libgen collection |
14:28
🔗
|
dd0a13f37 |
libgen is separated into sci-tech (libgen), comics, paintings, russian fiction, foreign fiction, scimag |
14:28
🔗
|
JAA |
Oh |
14:28
🔗
|
dd0a13f37 |
only sci-tech (libgen) is backed up afaik |
14:28
🔗
|
dd0a13f37 |
maybe foreignfiction/rus fict too |
14:28
🔗
|
JAA |
Hm, I see. |
14:29
🔗
|
dd0a13f37 |
look at the library genesis forum if you're curious about how it works |
14:30
🔗
|
dd0a13f37 |
might be a good idea to use tor depending on where you live |
14:33
🔗
|
dd0a13f37 |
and the libgen collection on IA is not complete from what I can see, https://archive.org/details/gen-lib&tab=about was last updated in 2016 |
14:33
🔗
|
|
drumstick has quit IRC (Read error: Operation timed out) |
14:35
🔗
|
JAA |
Mar 2017 according to the graph, but keep in mind that this might not be the correct collection. |
14:36
🔗
|
dd0a13f37 |
That's the only one with any amount of activity |
14:36
🔗
|
dd0a13f37 |
Unless they store it under some other name |
14:36
🔗
|
dd0a13f37 |
or don't make it public at all |
14:37
🔗
|
|
Soni has quit IRC (Ping timeout: 250 seconds) |
14:45
🔗
|
DFJustin |
that's not how you see if things have been uploaded to a collection |
14:45
🔗
|
DFJustin |
there are items in that collection from 2 days ago |
14:45
🔗
|
dd0a13f37 |
How do you? |
14:46
🔗
|
DFJustin |
I don't know if there is a public way |
14:46
🔗
|
dd0a13f37 |
Can you see what the name is? Is it something like 2092000? |
14:46
🔗
|
DFJustin |
r_1727000 |
14:47
🔗
|
dd0a13f37 |
sci-hub can probably afford backups, they currently have 67 btc (USD $270k) in their bitcoin wallet, and their expenses are around "a few thousand" a month |
14:47
🔗
|
dd0a13f37 |
that's an official torrent from 17-Aug-2017 |
14:47
🔗
|
dd0a13f37 |
http://libgen.io/libgen/repository_torrent/ |
14:47
🔗
|
SketchCow |
Hey, remember the good times when I'd be able to answer Internet Archive questions helpfully |
14:47
🔗
|
SketchCow |
Before edsu implied that the Internet Archive banned him? |
14:47
🔗
|
SketchCow |
Those were good times. |
14:48
🔗
|
SketchCow |
How's that #internetarchive channel doing, anyway, now that I can't go in there |
14:48
🔗
|
dd0a13f37 |
Would it be possible for you to add the later ones? It's as easy as downloading http://libgen.io/libgen/repository_torrent/r0-2092.ZIP and the last few ones, then deriving if I understand correctly |
14:48
🔗
|
SketchCow |
Oh, and why does edsu have op on #archiveteam again |
14:49
🔗
|
SketchCow |
When he wrote a whole essay with half-baked info about what the Internet Archive was going to do with robots.txt and got a wave of hatred? |
14:49
🔗
|
SketchCow |
Not that I'm going to jeopardize my job and ban him, or anything |
14:49
🔗
|
dd0a13f37 |
r_2093000-r_2105000 from the site I linked |
14:49
🔗
|
dd0a13f37 |
Quick summary? |
14:49
🔗
|
SketchCow |
Good times, good times |
14:50
🔗
|
SketchCow |
Heyyyyyy the Ted Nelson scans are going beautifully, and the CD-ROM scanning has a faster workflow |
14:51
🔗
|
SketchCow |
I paid $40 for a program that does nothing but crop |
14:51
🔗
|
SketchCow |
But it crops well! |
14:51
🔗
|
Frogging |
do one thing and do it well |
14:51
🔗
|
Frogging |
:) |
14:52
🔗
|
SketchCow |
This does the one thing very well. |
14:53
🔗
|
SketchCow |
It's called "Batchcrop" |
14:54
🔗
|
SketchCow |
I can say "OK, for the big pile of TIFFs I just scanned... crop away all the white part of the scan, with a X amount of pixels in all directions around the "content", and save it." |
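The cropping step described here (trim the white border, keep X pixels of padding around the content) can be sketched as follows. This is only an illustration of the idea, not how Batchcrop works internally: the function names, the plain list-of-rows grayscale image, and the `white` threshold are all assumptions made for the example.

```python
def content_bbox(pixels, margin=0, white=255):
    """Return (left, top, right, bottom) around all non-white pixels.

    `pixels` is a list of rows of grayscale values (hypothetical format).
    right/bottom are exclusive. Returns None if the image is entirely white.
    """
    height, width = len(pixels), len(pixels[0])
    # Rows and columns that contain at least one pixel darker than `white`.
    rows = [y for y, row in enumerate(pixels) if any(v < white for v in row)]
    if not rows:
        return None
    cols = [x for x in range(width) if any(row[x] < white for row in pixels)]
    # Grow the box by `margin` pixels in every direction, clamped to the image.
    left = max(cols[0] - margin, 0)
    top = max(rows[0] - margin, 0)
    right = min(cols[-1] + 1 + margin, width)
    bottom = min(rows[-1] + 1 + margin, height)
    return left, top, right, bottom


def crop(pixels, margin=0, white=255):
    """Crop away the white border, leaving `margin` pixels around the content."""
    box = content_bbox(pixels, margin, white)
    if box is None:
        return []
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]
```

Real scans are rarely pure white at the edges, so a lower `white` threshold (an assumption to tune per scanner) would likely be needed in practice.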
14:54
🔗
|
|
pizzaiolo has quit IRC (Quit: pizzaiolo) |
14:54
🔗
|
SketchCow |
So basically, I can just keep shoving CDs into my scanner, one at a time, and just scan them each into a directory. |
14:55
🔗
|
|
pizzaiolo has joined #archiveteam-bs |
14:55
🔗
|
SketchCow |
The longest part now is typing in names for the scans so they either match CD-ROMs I put up on archive with no scan, or match to rips I just did of same. |
14:55
🔗
|
dd0a13f37 |
Can't most image processing programs do that? Or does it have a sophisticated white space detection algo? |
14:55
🔗
|
SketchCow |
I lent a guy some CDs to do this... 2 years ago |
14:55
🔗
|
SketchCow |
He sheepishly brought the bin back to me last week. |
14:56
🔗
|
SketchCow |
I scanned and cropped all 86 in about 1.5 hours |
14:56
🔗
|
SketchCow |
I'm sure all image processing programs do something. |
14:56
🔗
|
SketchCow |
They are many like it but this one is mine |
14:59
🔗
|
dd0a13f37 |
For the really hostile sites, what about using commercial proxy providers? I read about LJ on the wiki and they were apparently blacklisting your IPs |
15:01
🔗
|
dd0a13f37 |
>For this project, set it to 1, because LiveJournal tends to ban scrapers! |
15:10
🔗
|
dd0a13f37 |
>Since 2015, Sci-Hub has operated its own repository , distinct from LibGen |
15:11
🔗
|
dd0a13f37 |
If this is true (which I'm not sure of), then that might be why LG sci-mag torrents are unavailable |
15:16
🔗
|
DFJustin |
<dd0a13f37> Would it be possible for you to add the later ones? |
15:16
🔗
|
DFJustin |
obviously somebody is actively working on it so there's no point in recruiting somebody else |
15:18
🔗
|
dd0a13f37 |
All you have to do is upload torrent and derive |
15:18
🔗
|
DFJustin |
put that energy into archiving something that isn't famous |
15:18
🔗
|
dd0a13f37 |
and it's not actively maintained as far as I understand |
15:18
🔗
|
dd0a13f37 |
I am, I'm waiting on some email responses currently |
15:42
🔗
|
|
Mateon1 has quit IRC (Quit: Mateon1) |
15:43
🔗
|
|
Mateon1 has joined #archiveteam-bs |
16:21
🔗
|
schbirid |
http://www.bbc.com/news/uk-england-wiltshire-41267378 |
16:44
🔗
|
|
odemg is now known as xbinwank |
16:46
🔗
|
|
Honno has quit IRC (Read error: Operation timed out) |
16:47
🔗
|
|
xbinwank is now known as odemg |
16:48
🔗
|
dd0a13f37 |
Anyone here speak/understand korean? |
17:31
🔗
|
|
Asparagir has joined #archiveteam-bs |
17:31
🔗
|
|
svchfoo3 sets mode: +o Asparagir |
17:31
🔗
|
|
svchfoo1 sets mode: +o Asparagir |
17:39
🔗
|
|
kristian_ has joined #archiveteam-bs |
17:50
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
17:50
🔗
|
|
dashcloud has joined #archiveteam-bs |
17:54
🔗
|
dd0a13f37 |
www.korean-books.com.kp/en/packages/xnps/download.pg.php?419 change "en" to "ko en fr sp de ru ch ja ar" to taste and 430 to any number <= 430 |
17:54
🔗
|
dd0a13f37 |
What's the proper way to archive something like this? Do you need WARC for what's just a GET request returning a file? |
17:55
🔗
|
astrid |
the proper way is to gin up a list of urls and submit to archivebot with !ao < http://url/yourlist.txt |
17:56
🔗
|
dd0a13f37 |
Thanks! |
17:57
🔗
|
astrid |
then you can download the warc when the job is done and extract everything from it, if you're so inclined :) |
17:58
🔗
|
dd0a13f37 |
Okay, so what's the proper way when there's also metadata in XML and thumbnails? Parse separately or make script to rename them to their "real" names? |
17:59
🔗
|
astrid |
hm? |
18:00
🔗
|
dd0a13f37 |
They're named like 00000412.pdf |
18:00
🔗
|
dd0a13f37 |
But they have names |
18:00
🔗
|
dd0a13f37 |
one second, site takes a bit to load |
18:01
🔗
|
dd0a13f37 |
They have names like "UNDERSTANDING KOREA (9) (HUMAN RIGHTS)" |
18:01
🔗
|
dd0a13f37 |
Also metadata |
18:01
🔗
|
dd0a13f37 |
"- Book on Common Sense -" |
18:01
🔗
|
dd0a13f37 |
"Foreign Languages Publishing House" |
18:01
🔗
|
dd0a13f37 |
"87 pp" |
18:02
🔗
|
dd0a13f37 |
and an image |
18:02
🔗
|
dd0a13f37 |
This won't be saved if you just have them archive a link list |
18:02
🔗
|
DFJustin |
if you're motivated / have skills, the best way would probably be to upload each pdf as a separate IA book item with metadata |
18:02
🔗
|
astrid |
well i'd add the xml files to the link list then |
18:02
🔗
|
|
fie has quit IRC (Read error: Operation timed out) |
18:03
🔗
|
dd0a13f37 |
It's not XML, you issue a POST request and get an entire page as HTML |
18:07
🔗
|
dd0a13f37 |
So you'd have to parse it |
18:07
🔗
|
dd0a13f37 |
I have neither the skills and there are a few thousand |
18:12
🔗
|
astrid |
ohh |
18:12
🔗
|
dd0a13f37 |
https://pastebin.com/parEbjPK this is what it looks like |
18:13
🔗
|
dd0a13f37 |
after parsing |
18:13
🔗
|
dd0a13f37 |
you send a base64 encoded json dict |
18:13
🔗
|
dd0a13f37 |
and get back a json dict |
18:13
🔗
|
dd0a13f37 |
containing the page html |
18:13
🔗
|
dd0a13f37 |
and it's escaped with backslashes two or three times |
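The exchange described in the last few messages (a base64-encoded JSON dict goes out, a JSON dict containing repeatedly backslash-escaped HTML comes back) could be handled roughly like this. It is a sketch under assumptions: the payload keys in the usage note are invented for illustration, and the actual field names used by the site are not given in the chat.

```python
import base64
import json
import re


def encode_request(payload: dict) -> str:
    """Serialize a dict to JSON and base64-encode it, as the site reportedly expects."""
    return base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")


def decode_request(blob: str) -> dict:
    """Inverse of encode_request, handy for inspecting captured requests."""
    return json.loads(base64.b64decode(blob).decode("utf-8"))


# Matches a backslash followed by a quote, another backslash, or a slash.
_ESCAPE = re.compile(r'\\(["\\/])')


def strip_escape_layers(text: str, max_rounds: int = 3) -> str:
    """Remove up to `max_rounds` layers of backslash escaping from `text`."""
    for _ in range(max_rounds):
        unescaped = _ESCAPE.sub(r"\1", text)
        if unescaped == text:
            break  # nothing left to unescape
        text = unescaped
    return text
```

For example, `encode_request({"page": "book", "id": 412})` (hypothetical keys) yields the string to send, and `strip_escape_layers` would be applied to the HTML string pulled out of the response dict after `json.loads`. Note that it would also strip backslashes that legitimately belong in the HTML, which is assumed to be rare here.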
18:14
🔗
|
astrid |
that sounds like a delight |
18:14
🔗
|
dd0a13f37 |
check out their homemade CMS |
18:14
🔗
|
dd0a13f37 |
It's stateful, you set which language you want, it saves it server-side |
18:15
🔗
|
|
ReimuHaku has quit IRC (Ping timeout: 250 seconds) |
18:18
🔗
|
dd0a13f37 |
!ao < https://my.mixtape.moe/nsrkrj.txt |
18:18
🔗
|
dd0a13f37 |
like this? |
18:19
🔗
|
astrid |
you need http:// at the front of your urls |
18:19
🔗
|
astrid |
or https:// |
18:19
🔗
|
astrid |
or ftp:// |
18:19
🔗
|
astrid |
depending |
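A URL list with explicit schemes, following the pattern dd0a13f37 gave earlier (language code in the path, numeric ID after the `?`), might be generated like this. The `http://` scheme, the lower bound of 1, and the function name are assumptions; the language codes and the upper bound of 430 are the ones quoted in the chat.

```python
# Language codes as listed in the chat message.
LANGS = ["ko", "en", "fr", "sp", "de", "ru", "ch", "ja", "ar"]

# URL pattern from the chat, with an explicit scheme added (assumed to be http).
BASE = "http://www.korean-books.com.kp/{lang}/packages/xnps/download.pg.php?{n}"


def build_url_list(langs=LANGS, max_id=430, min_id=1):
    """Return one URL per (language, id) pair, each with an explicit scheme."""
    return [
        BASE.format(lang=lang, n=n)
        for lang in langs
        for n in range(min_id, max_id + 1)
    ]
```

Writing `"\n".join(build_url_list())` to a text file and hosting it somewhere reachable gives the kind of list that `!ao <` takes.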
18:20
🔗
|
dd0a13f37 |
thanks |
18:21
🔗
|
dd0a13f37 |
!ao < https://my.mixtape.moe/tktryb.txt |
18:21
🔗
|
astrid |
these uh |
18:21
🔗
|
astrid |
aren't exactly pdfs |
18:22
🔗
|
dd0a13f37 |
They are |
18:22
🔗
|
dd0a13f37 |
Or does it not handle content disposition? |
18:22
🔗
|
astrid |
they're pdfs with a sql statement at the front ??? |
18:22
🔗
|
dd0a13f37 |
I can open them just fine |
18:22
🔗
|
astrid |
hm |
18:23
🔗
|
astrid |
maybe pdf doesn't mind about that |
18:23
🔗
|
|
ReimuHaku has joined #archiveteam-bs |
18:24
🔗
|
astrid |
they seem to all start with |
18:24
🔗
|
dd0a13f37 |
oh yeah I see |
18:24
🔗
|
astrid |
Update PublicationList_ko Set pVisitCount="2" Where pId=127%PDF-1.4 |
18:24
🔗
|
astrid |
it's uh |
18:24
🔗
|
dd0a13f37 |
sqli |
18:24
🔗
|
astrid |
nice job folks |
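For anyone who wants a clean reading copy of these files, the leaked SQL text in front of the `%PDF-` header can be trimmed off after download. This is a minimal sketch with a hypothetical function name; it assumes the junk only ever appears before the header, and the archived WARC should of course keep the bytes exactly as served.

```python
PDF_MAGIC = b"%PDF-"


def strip_pdf_prefix(data: bytes) -> bytes:
    """Drop any bytes that precede the first %PDF- marker.

    Raises ValueError if no marker is found, so non-PDF responses
    are not silently passed through.
    """
    start = data.find(PDF_MAGIC)
    if start == -1:
        raise ValueError("no %PDF- header found")
    return data[start:]
```

Many PDF readers appear to tolerate some leading junk, which would explain why the files open fine as served.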
18:24
🔗
|
dd0a13f37 |
There are numerous other vulnerabilities too |
18:25
🔗
|
astrid |
figures |
18:25
🔗
|
dd0a13f37 |
There's an undocumented way to register an account on KCNA |
18:25
🔗
|
astrid |
okay, well, go ahead and submit that job in #archivebot |
18:25
🔗
|
dd0a13f37 |
which appears to do nothing |
18:25
🔗
|
dd0a13f37 |
but it actually registers you |
18:25
🔗
|
astrid |
lol |
18:25
🔗
|
dd0a13f37 |
and you can log in |
18:25
🔗
|
dd0a13f37 |
and the only thing it does |
18:25
🔗
|
dd0a13f37 |
is add some tracking code |
18:25
🔗
|
dd0a13f37 |
you don't even show up as logged in |
18:25
🔗
|
dd0a13f37 |
there is also a random zip file serving malware |
18:44
🔗
|
dd0a13f37 |
Well, I can't get it to work. Any pointers? It needs a timeout of maybe 5 minutes for the first request, then some IP whitelisting or something happens |
18:44
🔗
|
dd0a13f37 |
So just forcing IA to do a request would be fine |
18:44
🔗
|
astrid |
IA doesn't run archivebot :) |
18:45
🔗
|
dd0a13f37 |
Does it use IA IPs? |
18:45
🔗
|
astrid |
no |
18:45
🔗
|
astrid |
we run archivebot |
18:45
🔗
|
dd0a13f37 |
Does it share an IP with anything else? |
18:45
🔗
|
astrid |
it's a bunch of machines, run by several people in this channel |
18:45
🔗
|
astrid |
generally they have dedicated IPs, but multiple grabbers run per host |
18:46
🔗
|
dd0a13f37 |
Do you run one of them? Can you force it to use a certain machine? |
18:46
🔗
|
astrid |
yes and yes |
18:46
🔗
|
dd0a13f37 |
Do you have SSH access/similar? |
18:46
🔗
|
astrid |
i wasn't getting that whitelisting effect, btw |
18:46
🔗
|
astrid |
it may be that you've got browser keepalives going on |
18:47
🔗
|
dd0a13f37 |
Nope, they do connection:close |
18:48
🔗
|
dd0a13f37 |
I might be mistaking it for something else, but wget takes a long time (minutes) if it even does it |
18:48
🔗
|
dd0a13f37 |
and ff is instant |
18:49
🔗
|
astrid |
archivebot is more similar to wget than to firefox |
18:49
🔗
|
dd0a13f37 |
yeah |
18:49
🔗
|
dd0a13f37 |
oh, apparently I have a phpsessid |
18:49
🔗
|
dd0a13f37 |
well that explains it |
18:50
🔗
|
dd0a13f37 |
"Apache/2.2.15 (RedStar 3.0)", how does it even work |
18:50
🔗
|
dd0a13f37 |
Does it just randomly time out requests? |
18:51
🔗
|
dd0a13f37 |
I managed to get one with wget now, connecting took 20 seconds and downloading 2m:20s (at 15 kbit) |
19:01
🔗
|
|
zyphlar has joined #archiveteam-bs |
19:09
🔗
|
dd0a13f37 |
Well, I can't wrap my head around north korean web magic |
19:10
🔗
|
|
dd0a13f37 has left |
19:48
🔗
|
superkuh |
https://www.eff.org/deeplinks/2017/09/open-letter-w3c-director-ceo-team-and-membership "Effective today, EFF is resigning from the W3C." |
19:49
🔗
|
astrid |
o_O |
19:50
🔗
|
JAA |
Wow |
19:51
🔗
|
JAA |
Ah, the DRM bullshit, right. |
20:11
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
20:20
🔗
|
hook54321 |
holy crap |
20:22
🔗
|
hook54321 |
I imagine this event will be a bit different now. https://twitter.com/internetarchive/status/909868291249684480 |
20:27
🔗
|
|
kim_ has joined #archiveteam-bs |
20:48
🔗
|
|
Dark_Star has quit IRC (Remote host closed the connection) |
21:11
🔗
|
|
zyphlar has quit IRC (Quit: Connection closed for inactivity) |
21:29
🔗
|
|
Darkstar has joined #archiveteam-bs |
21:43
🔗
|
|
noirscape has quit IRC (Read error: Operation timed out) |
21:43
🔗
|
|
zino has quit IRC (Quit: Leaving) |
21:46
🔗
|
hook54321 |
https://www.youtube.com/watch?v=h94ZKGVg-B8 |
21:46
🔗
|
hook54321 |
I think we should post something about this on the ArchiveTeam twitter account. |
21:56
🔗
|
godane |
who wants to start building rpi librarybox boxies? |
21:59
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
22:00
🔗
|
|
balrog has quit IRC (Read error: Operation timed out) |
22:00
🔗
|
|
JAA has quit IRC (Read error: Operation timed out) |
22:00
🔗
|
|
C4K3 has quit IRC (Read error: Operation timed out) |
22:00
🔗
|
|
ruunyan has quit IRC (Read error: Operation timed out) |
22:00
🔗
|
|
squires has quit IRC (Read error: Operation timed out) |
22:00
🔗
|
|
ZexaronS has quit IRC (Read error: Operation timed out) |
22:01
🔗
|
|
rocode has quit IRC (Read error: Operation timed out) |
22:01
🔗
|
|
ZexaronS has joined #archiveteam-bs |
22:02
🔗
|
|
JAA has joined #archiveteam-bs |
22:02
🔗
|
|
swebb sets mode: +o JAA |
22:02
🔗
|
|
wp494 has quit IRC (Read error: Operation timed out) |
22:02
🔗
|
|
squires has joined #archiveteam-bs |
22:02
🔗
|
|
balrog has joined #archiveteam-bs |
22:02
🔗
|
|
swebb sets mode: +o balrog |
22:03
🔗
|
|
REiN^ has quit IRC (Write error: Broken pipe) |
22:03
🔗
|
|
wp494 has joined #archiveteam-bs |
22:03
🔗
|
|
PotcFdk has quit IRC (Write error: Broken pipe) |
22:04
🔗
|
|
ruunyan has joined #archiveteam-bs |
22:05
🔗
|
|
REiN^ has joined #archiveteam-bs |
22:05
🔗
|
|
tfgbd_znc has quit IRC (Ping timeout: 600 seconds) |
22:06
🔗
|
|
tfgbd_znc has joined #archiveteam-bs |
22:06
🔗
|
|
rocode has joined #archiveteam-bs |
22:07
🔗
|
|
drumstick has joined #archiveteam-bs |
22:07
🔗
|
|
C4K3 has joined #archiveteam-bs |
22:11
🔗
|
|
PotcFdk has joined #archiveteam-bs |
22:15
🔗
|
|
wabu has quit IRC (Ping timeout: 246 seconds) |
22:20
🔗
|
|
ola_norsk has joined #archiveteam-bs |
22:21
🔗
|
hook54321 |
godane: What are those? |
22:21
🔗
|
ola_norsk |
is posting links possible? |
22:21
🔗
|
astrid |
yes definitely |
22:21
🔗
|
ola_norsk |
ok, one sec |
22:22
🔗
|
ola_norsk |
https://pbs.twimg.com/media/DKCW8SnWkAIgpqn.jpg:large |
22:22
🔗
|
ola_norsk |
that is the result of the attempt |
22:22
🔗
|
ola_norsk |
but, let me get the url to the tweet status, so you dont need to retype it from image |
22:23
🔗
|
ola_norsk |
https://twitter.com/JeffHollandaise/status/897970096429084672 |
22:24
🔗
|
astrid |
hm, for some reason twitter has decided that you're coming from germany |
22:24
🔗
|
ola_norsk |
I can view this url..but can't archive it. When i try, i get german twitter |
22:24
🔗
|
astrid |
i'm not sure how it decides that |
22:24
🔗
|
astrid |
probably the source IP that archive.org is using looks like a german IP |
22:25
🔗
|
ola_norsk |
yes, it's not me |
22:25
🔗
|
astrid |
you are more than welcome to join #archivebot and do |
22:25
🔗
|
astrid |
!ao https://twitter.com/JeffHollandaise/status/897970096429084672 --ignore-sets=twitter |
22:25
🔗
|
astrid |
er, also the --phantomjs option |
22:25
🔗
|
ola_norsk |
ill check it out. ty |
22:26
🔗
|
|
wabu has joined #archiveteam-bs |
22:26
🔗
|
ola_norsk |
but, i have to ask..what difference would it really make? |
22:26
🔗
|
astrid |
archivebot is run by us, and i haven't seen any german redirects affecting it |
22:26
🔗
|
astrid |
(archiveteam is not the same as archive.org, we have completely different infrastructure) |
22:27
🔗
|
ola_norsk |
i mean looking like a german IP, would there be any difference in it working or not? |
22:27
🔗
|
astrid |
oh |
22:27
🔗
|
astrid |
uhhh, it shouldn't redirect |
22:27
🔗
|
astrid |
what do you want to happen exactly? |
22:27
🔗
|
astrid |
the --phantomjs option will pull in the css and images and javascript so it'll look and work correctly |
22:28
🔗
|
astrid |
doing it with archivebot will make sure it gets run from a jurisdiction where twitter won't screen out nazi imagery |
22:28
🔗
|
ola_norsk |
i would expect waybackmachine to archive like regular |
22:28
🔗
|
ola_norsk |
ok |
22:28
🔗
|
astrid |
wayback machine's liveweb feature usually works well but sometimes has some issues |
22:28
🔗
|
astrid |
twitter is a difficult website to archive |
22:29
🔗
|
ola_norsk |
i've had no problem so far i think |
22:29
🔗
|
astrid |
hm okay |
22:29
🔗
|
astrid |
maybe it's because the tweet has nazi imagery in it, i know they filter that sort of thing out in some places |
22:30
🔗
|
|
atluxity has quit IRC (Ping timeout: 506 seconds) |
22:30
🔗
|
ola_norsk |
so in german twitter, nazi imagery (i havent looked close to see if there was any), is screened? |
22:30
🔗
|
astrid |
sometimes? |
22:30
🔗
|
astrid |
it's not clear |
22:30
🔗
|
astrid |
i mean there is some nazi/kkk shit in that tweet |
22:31
🔗
|
ola_norsk |
ok |
22:32
🔗
|
ola_norsk |
anyway, thanks for the help. I can't stand nazism myself, but this was really frustrating |
22:33
🔗
|
astrid |
i'm not a fan either ... |
22:33
🔗
|
astrid |
yeah |
22:35
🔗
|
astrid |
but yeah. #archivebot is a channel on this network where we operate an irc bot that lets you submit links for archival |
22:38
🔗
|
ola_norsk |
btw, i also tried previously to archive my own https://pbs.twimg.com/media/DKCYoW-W4AAsH_T.jpg:large |
22:39
🔗
|
ola_norsk |
and i can't see how i'm pegged as a nazi |
22:40
🔗
|
Lagittaja |
well, looks like my home "server" build completes faster than I expected. scored a nice (imho) motherboard from the same seller I got the i3-2120 from. intel's dq67ow |
22:40
🔗
|
astrid |
ola_norsk: maybe hm maybe actually, that looks like archive.org's ip space has been blocked from using twitter without logging in |
22:41
🔗
|
|
bluesoul has quit IRC (Read error: Operation timed out) |
22:41
🔗
|
astrid |
:( |
22:41
🔗
|
Lagittaja |
haven't had much experience with Intel's boards in the past other than the DH77EB in my mother's HTPC which actually has been rock solid for the past 4+ years. and this thing was 32€, including shipping. not too shabby |
22:41
🔗
|
|
svchfoo1 has quit IRC (Remote host closed the connection) |
22:41
🔗
|
|
bluesoul has joined #archiveteam-bs |
22:41
🔗
|
astrid |
Lagittaja: i think that's completely offtopic for this channel |
22:42
🔗
|
|
svchfoo1 has joined #archiveteam-bs |
22:42
🔗
|
Lagittaja |
well sorry astrid, I have been having a conversation about this build with another person on this channel and I intend to use it to put more horse power for archiving. so sorry I'll see myself out |
22:43
🔗
|
|
Lagittaja has quit IRC (Quit: Leaving) |
22:43
🔗
|
|
svchfoo3 sets mode: +o svchfoo1 |
22:48
🔗
|
astrid |
ah, sorry, i didn't know |
22:54
🔗
|
|
kristian_ has quit IRC (Quit: Leaving) |
22:54
🔗
|
|
ola_norsk has left |
23:17
🔗
|
|
BartoCH has quit IRC (Quit: WeeChat 1.9) |
23:18
🔗
|
godane |
hook54321: i'm working on a project to add kiwix to slackwarearm 14.2 |
23:19
🔗
|
godane |
https://archive.org/details/slackwarearm-14.2-20170906-kiwix |
23:22
🔗
|
|
drumstick has quit IRC (Read error: Operation timed out) |
23:23
🔗
|
joepie91_ |
https://twitter.com/xor/status/909888462584795136 |
23:24
🔗
|
godane |
i now just need to write a script to mount /dev/sda2 and look for something like /mnt/data/kiwix for all the kiwix files |
23:24
🔗
|
godane |
i have another script to build the library.xml file in /mnt/data/kiwix folder |
23:25
🔗
|
godane |
then its kiwix --library $path/library.xml --port 8000 --daemon something |
23:33
🔗
|
|
Soni has joined #archiveteam-bs |
23:48
🔗
|
|
fie has joined #archiveteam-bs |