Time |
Nickname |
Message |
03:57
🔗
|
dashcloud |
hi folks, travis goodspeed would like to know if anyone can send him paper copies or high-res scans of the Japanese, Brazilian, German, or Arabic (Jordan) variants of Byte Magazine |
04:40
🔗
|
SketchCow |
Good luck on THAT, Travis |
05:48
🔗
|
yipdw |
y |
05:49
🔗
|
yipdw |
oops |
06:24
🔗
|
SketchCow |
http://devslovebacon.com/conferences/bacon-2014/talks/from-colo-to-yolo-confessions-of-the-angriest-archivist |
07:31
🔗
|
yipdw |
SketchCow: that reminds me -- is audio or video of your Build talk available anywhere? |
07:31
🔗
|
yipdw |
seems like the official videos are not up yet |
08:06
🔗
|
SketchCow |
No. |
08:06
🔗
|
SketchCow |
Not up yet. |
08:06
🔗
|
SketchCow |
No idea why they're taking so long, EXCEPT. |
08:06
🔗
|
SketchCow |
This is the last year, so he might be working on a deluxe version of the talk, with the new year as the chaser. |
08:07
🔗
|
asdfsadf |
is there some place I can go to read historical b threads? |
08:07
🔗
|
namespace |
asdfsadf: /b/? |
08:08
🔗
|
asdfsadf |
good one |
08:08
🔗
|
namespace |
asdfsadf: It was a serious question. |
08:08
🔗
|
namespace |
What's a b thread? |
08:08
🔗
|
asdfsadf |
i mean like previous years |
08:08
🔗
|
asdfsadf |
oh yea |
08:08
🔗
|
namespace |
Oh okay you do mean /b/. |
08:09
🔗
|
namespace |
Uh. |
08:09
🔗
|
asdfsadf |
never mind |
08:09
🔗
|
asdfsadf |
i thought you meant the joke was all threads are historical haha |
08:09
🔗
|
namespace |
asdfsadf: :P |
08:09
🔗
|
namespace |
asdfsadf: Nah. I think there is but I don't remember where it is. |
08:44
🔗
|
Atluxity |
myopera.com is about to die and I have the list of all non-banned users. |
08:44
🔗
|
Atluxity |
16 457 047 of them |
08:45
🔗
|
Atluxity |
https://docs.google.com/uc?id=0B8aRlPij6kNrTTdRNHdOdDcxWDA&export=download |
08:45
🔗
|
Atluxity |
I don't know how to take this information from the userlist to a project |
08:46
🔗
|
namespace |
Atluxity: You would need a way for us to use that list to grab the content with wget. |
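A minimal sketch of what "use that list with wget" could look like: turning the username list into one seed URL per account. The `http://my.opera.com/<name>/` pattern is an assumption based on the profile link posted later in this log, and `seed_urls` is a hypothetical helper, not part of any Archive Team tooling.

```python
from urllib.parse import quote

# Assumed profile URL pattern; not confirmed by My Opera itself.
BASE_URL = "http://my.opera.com/"


def seed_urls(usernames):
    """Turn account names into one profile URL each, skipping blank lines."""
    urls = []
    for name in usernames:
        name = name.strip()
        if name:
            # Percent-encode anything that is not safe inside a URL path segment.
            urls.append(BASE_URL + quote(name, safe="") + "/")
    return urls


if __name__ == "__main__":
    for url in seed_urls(["chooseopera", "4bass8"]):
        print(url)
```

The resulting lines could be written to a file and handed to wget with `-i`, together with `--warc-file`, though the actual grab in this log used a different script.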
08:46
🔗
|
namespace |
I don't know much about the process beyond that. |
08:46
🔗
|
namespace |
I think there was an engineering channel for archive team. |
08:54
🔗
|
midas |
I think there was a grab already up |
08:54
🔗
|
midas |
https://archive.org/details/files.myopera.com-initialgrab |
09:12
🔗
|
namespace |
midas: So then we just have to use the method we used last time to grab new data? |
09:17
🔗
|
midas |
probably yeah |
10:38
🔗
|
arkiver |
Atluxity: I will start grabbing some users now and see how fast I can get my speed |
10:44
🔗
|
Atluxity |
cool |
10:56
🔗
|
arkiver |
looks like there are no limits!!!! :D |
10:56
🔗
|
arkiver |
going very fast |
10:56
🔗
|
arkiver |
100 links per second or something like that |
10:57
🔗
|
arkiver |
Atluxity: does my opera also host videos? |
10:59
🔗
|
Atluxity |
I was told there were no limits, the guy just said "please be gentle". I don't know how gentle we can be if we are to get it all by March 1st |
10:59
🔗
|
Atluxity |
I don't know if myopera hosts videos |
11:00
🔗
|
Atluxity |
it would surprise me |
11:00
🔗
|
Atluxity |
did the list of usernames help? |
11:00
🔗
|
arkiver |
yep |
11:00
🔗
|
arkiver |
running a test on chooseopera |
11:00
🔗
|
arkiver |
it looks like it's quite big |
11:00
🔗
|
arkiver |
so a good test |
11:05
🔗
|
joepie91 |
arkiver; what's the status on wallbase? |
11:05
🔗
|
arkiver |
joepie91: well they have limits |
11:06
🔗
|
joepie91 |
:(? |
11:06
🔗
|
arkiver |
so the crawl is limited but it's going fine |
11:06
🔗
|
arkiver |
around 40000 items per 12 hours now |
11:06
🔗
|
arkiver |
know* |
11:06
🔗
|
joepie91 |
how long do you expect it to take to grab everything including metadata? |
11:06
🔗
|
arkiver |
and they say they don't have plans to shut it |
11:06
🔗
|
joepie91 |
mhmm |
11:06
🔗
|
arkiver |
I'm first doing all the wallpaper pages with the wallpapers and stuff |
11:07
🔗
|
arkiver |
and then I'll start doing the other things like search terms and stuff |
11:07
🔗
|
arkiver |
(zoom works!!!!) |
11:08
🔗
|
arkiver |
doing around 63 pages per minute |
11:09
🔗
|
arkiver |
when is my opera going to shut down? |
11:09
🔗
|
arkiver |
^ Atluxity |
11:11
🔗
|
arkiver |
will upload the first warc of chooseopera when finished |
11:11
🔗
|
arkiver |
it is a lot bigger than the average blog |
11:11
🔗
|
arkiver |
but a good test to start with |
11:12
🔗
|
Atluxity |
arkiver: March 1st |
11:12
🔗
|
arkiver |
hmm march 1st |
11:12
🔗
|
arkiver |
thank you |
11:14
🔗
|
arkiver |
and do you know when my opera was created? |
11:15
🔗
|
Atluxity |
I don't have that information easily available, do you want me to research it? |
11:17
🔗
|
arkiver |
got it |
11:17
🔗
|
arkiver |
http://en.wikipedia.org/wiki/My_Opera |
11:17
🔗
|
arkiver |
2001 |
11:17
🔗
|
arkiver |
well I need that since there are urls for a calendar |
11:17
🔗
|
arkiver |
that are going forever |
11:17
🔗
|
arkiver |
so even to the Middle Ages |
11:17
🔗
|
arkiver |
going to limit it to a time |
11:17
🔗
|
arkiver |
when it was created |
11:19
🔗
|
Atluxity |
a wise move |
11:20
🔗
|
Atluxity |
is this just something you have put on a server you have access to or can it be made into a warrior-project? |
11:23
🔗
|
arkiver |
I don't know how to create a warrior project |
11:23
🔗
|
arkiver |
you should ask chfoo about that |
11:24
🔗
|
arkiver |
I'm just excluding everything with /archive/monthly/ now |
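The two ideas above (skip `/archive/monthly/`, and keep calendar pages from running back past the site's creation) can be sketched as a URL filter. This is only an illustration: `should_crawl` is a hypothetical name, the year-as-path-segment layout is an assumption about how the calendar URLs look, and 2014 as the upper bound is assumed from the date of this log.

```python
import re
from urllib.parse import urlparse

EXCLUDED_FRAGMENT = "/archive/monthly/"
FIRST_YEAR = 2001  # My Opera creation year, per the Wikipedia link above
LAST_YEAR = 2014   # assumption: the year this log was written

# Assumption: a calendar URL carries its year as a four-digit path segment.
YEAR_SEGMENT = re.compile(r"/(\d{4})(?=/|$)")


def should_crawl(url):
    """Return False for monthly-archive URLs and for out-of-range calendar years."""
    path = urlparse(url).path
    if EXCLUDED_FRAGMENT in path:
        return False
    match = YEAR_SEGMENT.search(path)
    if match and not FIRST_YEAR <= int(match.group(1)) <= LAST_YEAR:
        # Caveat: an account whose name is exactly four digits would also
        # be caught by this check.
        return False
    return True
```

Rejecting the whole `/archive/monthly/` prefix, as arkiver ends up doing, is the blunter of the two rules but does not depend on guessing the URL layout.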
11:27
🔗
|
joepie91 |
PANIC |
11:27
🔗
|
joepie91 |
http://www.heise.de/open/meldung/BerliOS-Entwicklerplattform-macht-zu-2104211.html?wt_mc=rss.open.beitrag.atom |
11:27
🔗
|
joepie91 |
BerliOS shutting down April 30 |
11:36
🔗
|
joepie91 |
http://developer.berlios.de/forum/forum.php?forum_id=39220 |
11:40
🔗
|
ersi |
Haven't we grabbed berlios before? :o |
11:41
🔗
|
joepie91 |
ersi: the response I got in another channel was "oh, shutting down again?" so quite possibly |
11:42
🔗
|
joepie91 |
but figure a grab would be important regardless |
11:56
🔗
|
arkiver |
chfoo: can we create a warrior project from my opera? |
12:11
🔗
|
ersi |
joepie91: Indeed |
12:33
🔗
|
midas |
for some sites i almost start to think "shouldn't you be dead yet?" |
12:37
🔗
|
Nemo_bis |
Yes, that's what the Yahoo! CEO thinks of every website loaded in their browser. |
12:38
🔗
|
midas |
true story |
12:38
🔗
|
joepie91 |
midas: haha |
12:38
🔗
|
joepie91 |
"euhm.. why is this still around?" |
12:39
🔗
|
midas |
yeah, the last time berliOS said they would shutdown was like 2 years ago? |
12:40
🔗
|
midas |
2011 |
13:49
🔗
|
arkiver |
hmm |
13:49
🔗
|
arkiver |
I can try to get a script running here to download most of the channels |
13:49
🔗
|
arkiver |
chooseopera is downloaded |
13:49
🔗
|
arkiver |
1,4 GB |
13:50
🔗
|
arkiver |
took a few minutes only |
13:50
🔗
|
arkiver |
and i think that's one of the biggest accounts... |
13:52
🔗
|
arkiver |
and 45639 urls |
14:01
🔗
|
arkiver |
working on batch script for WarcMiddleware |
14:12
🔗
|
arkiver |
working for me... :D |
14:12
🔗
|
arkiver |
testing with the first 45 accounts from the list |
14:18
🔗
|
arkiver |
45 accounts done |
14:18
🔗
|
arkiver |
going to do a crawl of the first 1 million accounts now |
14:21
🔗
|
arkiver |
100.000 accounts* |
14:26
🔗
|
ersi |
okay godane2 |
14:34
🔗
|
arkiver |
doing a test on the first 100.000 accounts |
14:34
🔗
|
arkiver |
if that works, I'll put all of the millions of accounts in it |
14:34
🔗
|
arkiver |
and it should be going then |
14:42
🔗
|
arkiver |
but I'm not sure if I can make it before the deadline |
14:42
🔗
|
arkiver |
even if it's going this fast |
14:42
🔗
|
arkiver |
so |
14:42
🔗
|
arkiver |
the best thing is to split it up between people I think |
14:49
🔗
|
arkiver |
I can try to make it go even faster... |
14:51
🔗
|
arkiver |
will try that tonight or tomorrow |
14:51
🔗
|
arkiver |
shall I upload some warc.gz examples to show that the warc's work? |
14:53
🔗
|
arkiver |
according to what I see I should be able to run multiple sessions |
14:53
🔗
|
arkiver |
need 17 sessions then |
14:53
🔗
|
arkiver |
will try that |
15:14
🔗
|
arkiver |
going to start 30 sessions tomorrow |
15:14
🔗
|
arkiver |
to even have some days left before the deadline to be sure everything is there |
15:14
🔗
|
arkiver |
will keep you guys informed every day about the progress |
15:14
🔗
|
arkiver |
I'm also still doing wallbase.cc BTW |
15:18
🔗
|
midas |
nice arkiver |
15:18
🔗
|
midas |
if you need any help, let me know |
15:18
🔗
|
arkiver |
thank you midas |
15:18
🔗
|
arkiver |
I'll keep that in mind!! :D |
15:18
🔗
|
midas |
:p |
15:18
🔗
|
arkiver |
tonight I will upload some warc's for people to see |
15:18
🔗
|
arkiver |
buuuuuut |
15:19
🔗
|
arkiver |
someone does need to do the forums |
15:19
🔗
|
arkiver |
and the things beside the accounts |
15:20
🔗
|
midas |
my FTP grab just passed the 9TB... |
15:22
🔗
|
arkiver |
midas, wow that's a lot |
15:22
🔗
|
arkiver |
which ftp server? |
15:23
🔗
|
midas |
5 ftp's |
15:23
🔗
|
midas |
ftp.tu-chemnitz.de ftp.uni-muenster.de gatekeeper.dec.com |
15:23
🔗
|
midas |
ftp.uni-erlangen.de ftp.warwick.ac.uk |
15:23
🔗
|
arkiver |
haha good job man! |
15:23
🔗
|
midas |
she's still going ;-) |
15:23
🔗
|
midas |
it's going to take me weeks to upload this |
15:27
🔗
|
arkiver |
lol |
15:27
🔗
|
arkiver |
know that |
15:27
🔗
|
arkiver |
download faster than upload... |
15:27
🔗
|
arkiver |
:/ |
15:27
🔗
|
arkiver |
horrible... |
15:30
🔗
|
midas |
yeah |
15:30
🔗
|
midas |
180Mbit down, 100mbit up |
15:31
🔗
|
midas |
still, 32TB/mnd should be doable at max speed, im guessing ill hit 20TB/mnd |
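The 32TB/month figure follows from the 100 Mbit/s uplink. A quick sketch of the arithmetic, assuming a 30-day month, decimal terabytes, and a fully saturated link (`tb_per_month` is a hypothetical helper name):

```python
def tb_per_month(mbit_per_second, days=30):
    """Convert a sustained link speed in Mbit/s to decimal terabytes per month."""
    bytes_per_second = mbit_per_second * 1_000_000 / 8
    return bytes_per_second * 86_400 * days / 1e12


if __name__ == "__main__":
    print(f"{tb_per_month(100):.1f} TB per 30 days at 100 Mbit/s")
```

At 100 Mbit/s this comes to about 32.4 TB, so the 20TB estimate corresponds to keeping the link a bit over 60% busy.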
15:39
🔗
|
arkiver |
FYI: I need to exclude all accounts that have |, &, <, >, (, ), ^ and @ |
15:39
🔗
|
arkiver |
so if someone can search for the channels with those things in it and download them |
15:39
🔗
|
arkiver |
would be great |
15:39
🔗
|
arkiver |
since I can't do them here with this script |
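Splitting the list so someone else can pick up the excluded accounts could look roughly like this. It is a sketch only: `split_accounts` is a hypothetical helper, and the character set is copied from the message above.

```python
# Characters arkiver's batch script cannot handle, as listed above.
UNSAFE_CHARS = set("|&<>()^@")


def split_accounts(names):
    """Split account names into (safe, unsafe) lists, ignoring blank lines."""
    safe, unsafe = [], []
    for name in names:
        name = name.strip()
        if not name:
            continue
        if UNSAFE_CHARS & set(name):
            unsafe.append(name)
        else:
            safe.append(name)
    return safe, unsafe


if __name__ == "__main__":
    safe, unsafe = split_accounts(["chooseopera", "4bass8", "a&b", "x@y"])
    print(len(safe), "safe,", len(unsafe), "need a different script")
```

The `unsafe` list is the part another volunteer would need to download with a tool that does not pass names through a shell.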
15:50
🔗
|
Nemo_bis |
/mnd? you german, midas? :) |
15:50
🔗
|
midas |
dutch :p i mean month, /mo? |
15:55
🔗
|
arkiver |
midas: also dutch?? :D |
15:58
🔗
|
midas |
jup :p |
15:58
🔗
|
arkiver |
haha me too! :) |
16:05
🔗
|
midas |
lots of dutchies here |
16:06
🔗
|
ersi |
spoorwaggen |
16:06
🔗
|
ersi |
and shit yo |
16:13
🔗
|
arkiver |
spoortwaggen? |
16:13
🔗
|
arkiver |
spoorwaggen* |
16:13
🔗
|
arkiver |
you mean spoorwagen? :) |
16:14
🔗
|
arkiver |
downloading 1680 accounts per second |
16:14
🔗
|
arkiver |
per hour* |
16:14
🔗
|
ersi |
yeah, I did |
16:14
🔗
|
arkiver |
going to get that up tomorrow to around 50000-60000 |
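Whether these rates are enough for the March 1st deadline is simple division. A sketch using the 16 457 047 accounts from the user list and the rates quoted above, assuming the rate stays constant and every account costs about the same (`days_needed` is a hypothetical helper):

```python
TOTAL_ACCOUNTS = 16_457_047  # size of the user list quoted earlier in the log


def days_needed(accounts_per_hour, total=TOTAL_ACCOUNTS):
    """Days of continuous crawling needed to cover every account at a fixed rate."""
    return total / accounts_per_hour / 24


if __name__ == "__main__":
    for rate in (1_680, 50_000, 60_000):
        print(f"{rate:>6} accounts/hour: {days_needed(rate):.1f} days")
```

At 1680 accounts per hour the full list would take roughly 408 days; at 50000-60000 per hour it drops to somewhere between 11 and 14 days, which is why the higher rate, or splitting the work between people, matters.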
17:35
🔗
|
chfoo |
anyone can create a project. i'm only writing the recent grab scripts, but someone else adds it to the master list of warrior projects. |
18:02
🔗
|
yipdw |
if there is no BerliOS channel, join #honeynutberlios |
18:08
🔗
|
SketchCow |
https://archive.org/details/businesscase now in 1.0 |
18:37
🔗
|
DFJustin |
switch back to browser, ctrl-t and start typing url, realize inputs are going into atari 800 visicalc #archiveteamproblems |
19:12
🔗
|
Tony_ |
hello |
19:15
🔗
|
joepie91 |
DFJustin: hahaha |
19:54
🔗
|
SketchCow |
Pretty much. |
20:21
🔗
|
arkiver |
chfoo: I'm already quite good going with that website. :D |
20:21
🔗
|
arkiver |
today just testing |
20:22
🔗
|
arkiver |
tomorrow I'll try to do up to 50000-60000 accounts per hour. |
20:30
🔗
|
chfoo |
sounds good |
20:33
🔗
|
arkiver |
:) |
20:33
🔗
|
arkiver |
well |
20:33
🔗
|
arkiver |
chfoo, this one has no limits, so it isn't too hard... :) |
20:33
🔗
|
arkiver |
hehe, most accounts are just empty |
20:33
🔗
|
arkiver |
created, and nothing ever done with them |
20:33
🔗
|
arkiver |
but |
20:33
🔗
|
arkiver |
chfoo |
20:34
🔗
|
arkiver |
I download many thousands of accounts now as a test |
20:34
🔗
|
arkiver |
and here it looks like they are working |
20:34
🔗
|
arkiver |
would you mind if I upload some warc's so that you can also test them? |
20:34
🔗
|
arkiver |
(just in case) |
20:35
🔗
|
chfoo |
arkiver: sure, maybe a few others can take a look at them too |
20:35
🔗
|
arkiver |
yes that sounds good |
20:36
🔗
|
arkiver |
just to be 100% sure they work |
20:36
🔗
|
arkiver |
imagine: downloaded millions of accounts and they don't work... |
20:36
🔗
|
arkiver |
O.o |
20:36
🔗
|
arkiver |
going to pack up and upload some |
20:40
🔗
|
arkiver |
chfoo hang on uploading... |
20:47
🔗
|
arkiver |
chfoo: https://www.filepicker.io/api/file/ZizgWffKT1e9PGaCpiLP |
20:47
🔗
|
arkiver |
some warc's |
20:47
🔗
|
arkiver |
I added two big warc's |
20:47
🔗
|
arkiver |
and a lot of small warc's |
20:47
🔗
|
arkiver |
you'll see most of them are just empty accounts |
20:56
🔗
|
Smiley |
D: |
20:57
🔗
|
Smiley |
oh right |
20:57
🔗
|
Smiley |
most are empty because there's nothing to download D: |
20:59
🔗
|
arkiver |
no no |
20:59
🔗
|
arkiver |
they are not empty |
20:59
🔗
|
arkiver |
just the account has been created |
20:59
🔗
|
arkiver |
and the creator has done nothing |
21:00
🔗
|
arkiver |
Smiley: like this one: |
21:00
🔗
|
arkiver |
http://my.opera.com/4bass8/ |
21:00
🔗
|
arkiver |
going to stop the test now |
21:01
🔗
|
arkiver |
and start testing with more multiple crawlers tomorrow |
21:07
🔗
|
arkiver |
chfoo: and? do they work for you? |
21:08
🔗
|
chfoo |
arkiver: i'm a bit concerned that you are requesting gzip compression "Accept-Encoding: x-gzip,gzip,deflate" |
21:09
🔗
|
arkiver |
hmm |
21:09
🔗
|
arkiver |
I could view them well with warc proxy |
21:09
🔗
|
arkiver |
and I uploaded some to the IA |
21:09
🔗
|
arkiver |
https://archive.org/details/arkiver20140131-1 |
21:10
🔗
|
arkiver |
my new packs |
21:10
🔗
|
arkiver |
they are indexed good |
21:10
🔗
|
arkiver |
https://archive.org/download/arkiver20140131-1/arkiver20140131-1.cdx.gz |
21:10
🔗
|
arkiver |
but because my item is not in the web collection I can't view them in the wayback machine |
21:10
🔗
|
arkiver |
would it be possible for you to quickly upload those items to the wayback machine and see if they work there? |
21:11
🔗
|
chfoo |
arkiver: i can't. i'm not affiliated in any way. |
21:11
🔗
|
arkiver |
:/ |
21:11
🔗
|
arkiver |
ah |
21:11
🔗
|
arkiver |
but they do index |
21:12
🔗
|
arkiver |
so they should work right? |
21:12
🔗
|
arkiver |
and they work in the warc proxy |
21:13
🔗
|
chfoo |
arkiver: it depends on how the wayback machine handles it. i have no idea actually. |
21:14
🔗
|
arkiver |
hmm |
21:17
🔗
|
chfoo |
and i noticed that "WARC-Payload-Digest" is missing as well (but that's optional) |
21:18
🔗
|
arkiver |
yep |
21:18
🔗
|
arkiver |
maybe SketchCow wants to move the items (temporarily) to the web section of the IA just to see if they work?? |
21:22
🔗
|
Dud1 |
So using wget/warc is the best way to archive a site? |
21:22
🔗
|
chfoo |
arkiver: hold on, the warc file isn't valid. there's duplicate WARC-Record-ID |
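A duplicate WARC-Record-ID is the kind of problem a small script can catch before uploading. The following is a rough sketch, not the check chfoo actually ran: it assumes well-formed record headers with a Content-Length field, and `iter_record_ids`, `find_duplicates` and `duplicate_record_ids` are hypothetical names.

```python
import gzip
from collections import Counter


def iter_record_ids(stream):
    """Yield the WARC-Record-ID of every record in a binary WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.startswith(b"WARC/"):
            continue  # blank separator lines between records
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # end of the header block
            name, _, value = line.partition(b":")
            headers[name.strip().lower()] = value.strip()
        # Skip the record body so its content is never parsed as headers.
        stream.read(int(headers.get(b"content-length", b"0")))
        yield headers.get(b"warc-record-id", b"").decode("ascii", "replace")


def find_duplicates(stream):
    """Return the sorted record IDs that occur more than once."""
    counts = Counter(iter_record_ids(stream))
    return sorted(rid for rid, n in counts.items() if n > 1)


def duplicate_record_ids(path):
    """Open a .warc or .warc.gz file and report duplicated record IDs."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as stream:
        return find_duplicates(stream)
```

An empty list from `duplicate_record_ids("example.warc.gz")` (a placeholder file name) only says the IDs are unique; it does not validate anything else about the file.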
21:23
🔗
|
arkiver |
hmm |
21:27
🔗
|
arkiver |
testing.... |
21:27
🔗
|
arkiver |
inserting in older uploaded pack |
21:27
🔗
|
arkiver |
gosh |
21:28
🔗
|
arkiver |
I hope it works |
21:30
🔗
|
SketchCow |
Nobody can shove them into wayback but employees. |
21:30
🔗
|
SketchCow |
Why would I move them in temporarily? |
21:31
🔗
|
ivan` |
Dud1: or wpull. or use archivebot. |
21:43
🔗
|
arkiver |
SketchCow: to test if they work |
21:43
🔗
|
arkiver |
the warc's from my opera |
21:45
🔗
|
SketchCow |
Yeah, but if they work, they're in. |
21:45
🔗
|
SketchCow |
Why not have them in. |
21:46
🔗
|
Jonimus |
no reason to pull them out if they work |
21:53
🔗
|
arkiver |
SketchCow: ah, yes, of course the warc's I'm producing now seem to be working in warc proxy |
21:54
🔗
|
arkiver |
so if you want to do that |
21:54
🔗
|
arkiver |
would be nice!! |
21:58
🔗
|
SketchCow |
Give me your internet archive account e-mail. |