Time |
Nickname |
Message |
00:21
🔗
|
SketchCow |
I miss my long hair. |
00:28
🔗
|
Mister_2 |
TEXTFILES BACKUP DOWNLOAD PROGRESS: 4gb/11gb. |
00:28
🔗
|
Mister_2 |
TIME LEFT AS ESTIMATED BY TIXATI: "?:??" |
04:06
🔗
|
omf_ |
comics grab still going. 204mb and counting |
07:13
🔗
|
SketchCow |
Grabbing a mirror of the ftp.netscape.com mirror from bu.edu |
07:31
🔗
|
SketchCow |
http://www.archiveteam.org/index.php?title=DEFCON_19_Talk_Transcript |
07:31
🔗
|
SketchCow |
That was nice of someone. |
07:36
🔗
|
godane |
SketchCow: https://archive.org/details/Backup_of_The_Pirate_Bay_32xxxxx-to-79xxxxx |
07:36
🔗
|
godane |
can you move it to archiveteam |
07:46
🔗
|
SketchCow |
Done |
07:52
🔗
|
omf_ |
SketchCow, when you have a moment could you flip these from text to software collections. I goofed up this first group |
07:52
🔗
|
omf_ |
http://archive.org/details/opensuse-12.3_release http://archive.org/details/opensuse-12.2_release http://archive.org/details/opensuse-12.1_release |
07:52
🔗
|
omf_ |
http://archive.org/details/fedora-15_release |
07:52
🔗
|
omf_ |
http://archive.org/details/opensuse-11.4_release http://archive.org/details/centos-5.9_release http://archive.org/details/centos-6.4_release |
08:08
🔗
|
SketchCow |
Switched over. |
08:08
🔗
|
SketchCow |
I should start thinking about getting you a collection, huh. |
08:11
🔗
|
SketchCow |
I should actually think about getting you a top-level collection in Software, frankly. |
08:12
🔗
|
SketchCow |
If you really push up 3tb of these things, as this set, this really should not be "shareware CDs" |
11:25
🔗
|
SketchCow |
Streetfiles says they're closing later today |
11:25
🔗
|
SketchCow |
So we really should turn up the heat on that. |
11:56
🔗
|
Baljem |
I think we're having to be careful not to bring their servers to their knees :/ |
11:59
🔗
|
SketchCow |
they're going to bring their own servers to their knees |
12:00
🔗
|
Baljem |
well, quite, and all over a petty bloody squabble over who owns the idea, AIUI |
12:01
🔗
|
Baljem |
question is, how hard do we push it to maximise what we get out? I think we've been oscillating around a few rate-limiting setpoints to try and balance it out |
12:01
🔗
|
SketchCow |
You know, as long as active effort along those lines is going on, I'm fine with it. |
12:01
🔗
|
Ymgve |
should I manually switch over? when doing archiveteam's choice it decides to do formspring, which I guess isn't that important yet since they're not shutting down today |
12:01
🔗
|
SketchCow |
Sometimes, I check that tracker and it seems supremely unmoving. |
12:02
🔗
|
SketchCow |
I mean, getting 10,000+ users into the wayback machine with 48 hours warning is a big win. |
12:03
🔗
|
SketchCow |
From: AOL Employee |
12:03
🔗
|
SketchCow |
Apr 30, 2013 4:12 AM |
12:03
🔗
|
SketchCow |
Date : |
12:03
🔗
|
SketchCow |
Subject : |
12:03
🔗
|
SketchCow |
comics alliance |
12:03
🔗
|
SketchCow |
Message : |
12:03
🔗
|
SketchCow |
hey, i work at aol, and i asked the content people about comics alliance.. there's no plans to take any of the sites down in the immediate future, as far as i know, so you have some time, don't quote me on that, obv. |
12:03
🔗
|
Baljem |
yes. problem is, after a while the site starts taking a minute or more to generate a page - and we're crawling a lot more pages than we are images (to get the most metadata) - and some jobs are massive compared to others, so you see nothing happen for ages then boom, a pile of 100MB+ uploads |
12:04
🔗
|
SketchCow |
Yeah, I see that now. |
12:06
🔗
|
chazchaz |
What's the current high priority project? I've been out of the loop for a bit. |
13:03
🔗
|
sep332 |
chazchaz: i'd say formspring. it's at -14 days and less than 1/4 finished |
13:03
🔗
|
sep332 |
it's also set as "archiveteam's choice" on the warrior now |
13:07
🔗
|
chazchaz |
Ok, I'll get to work. |
13:08
🔗
|
SmileyG |
also you can hit it rather hard without it seemingly exploding. |
13:09
🔗
|
SmileyG |
we had enough people hitting streetfiles earlier that it slowed down to almost unusable. |
13:29
🔗
|
balrog |
sep332: I don't expect formspring to go away anytime soon though... we'll see I guess |
13:31
🔗
|
chazchaz |
I wish formspring wasn't so CPU intensive. |
13:32
🔗
|
SketchCow |
So, apparently, did formspring |
13:36
🔗
|
SmileyG |
boom tish! |
13:41
🔗
|
ZoeB |
Hmm... Does anyone know how many files you can put in a directory before things start breaking? |
13:43
🔗
|
ZoeB |
I'm trying out this message grabber I'm writing... last night, it got about 33,000 messages, one file each. I'm splitting them up fairly evenly across 256 directories, but I'm starting to wonder whether each of those should have 256 subdirectories, or whether that's unnecessarily complicating things. |
13:43
🔗
|
GLaDOS |
Depends on the filesystem |
13:43
🔗
|
SketchCow |
I would definitely go for subdirectory |
13:43
🔗
|
Baljem |
depends on the context of 'directory' and what sort of brokenness you're expecting - in my experience, most filesystems just start getting slow on directory access once you get to that sort of size |
13:43
🔗
|
Baljem |
yeah, what they said ;) |
13:44
🔗
|
ZoeB |
OK, sub it is, thanks Jason :) |
13:44
🔗
|
SketchCow |
filesystems are a lot of promises, broken ones |
13:45
🔗
|
ZoeB |
I've occasionally got to the stage where bash can't use wildcards, a directory's so big, but I can't for the life of me remember how big it actually was to do that. |
13:45
🔗
|
Baljem |
SketchCow: don't forget pain and misery. |
13:54
🔗
|
sep332 |
I think bash wildcards fail when the expansion has more characters than the maximum command length |
14:04
🔗
|
ZoeB |
Ah, that'd make sense |
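
The wildcard failure discussed above is usually the kernel rejecting the new process with "Argument list too long": the shell expands the glob fine, but the combined size of the arguments and environment exceeds a system limit. A minimal sketch, assuming a Unix-like system, that reads that limit and walks a large directory without building an argument list at all (`arg_max` and `iter_files` are hypothetical helper names, not from the log):

```python
import os
from typing import Iterator


def arg_max() -> int:
    """Return the system limit, in bytes, on arguments plus environment
    passed to a newly executed process (Unix-only: uses os.sysconf)."""
    return os.sysconf("SC_ARG_MAX")


def iter_files(directory: str) -> Iterator[str]:
    """Yield regular file names one at a time, so the directory size
    never has to fit on a command line."""
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                yield entry.name
```

Because `iter_files` streams entries, it keeps working for directories far larger than anything a wildcard could pass to an external command.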
14:31
🔗
|
SketchCow |
Whoops, I think streetfiles just ate it |
14:31
🔗
|
SketchCow |
Wait, no, there it is |
14:32
🔗
|
SilSte |
ate what? |
14:32
🔗
|
SketchCow |
dirt |
14:38
🔗
|
omf_ |
comicsalliance is at 474mb and counting |
14:43
🔗
|
andy0 |
anyone build and run the scripts on CentOS? |
14:45
🔗
|
ZoeB |
Implemented. Arcmesg now uses 256^2 directories. |
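
One way to get a 256^2 layout like the one described above is to take two bytes of a hash of the message identifier as the two directory levels. This is only a sketch under assumptions: the log does not say how Arcmesg picks directories, and `shard_path`, `store_message`, and the use of SHA-1 are all hypothetical. It also assumes the identifier is safe to use as a file name.

```python
import hashlib
from pathlib import Path


def shard_path(root: Path, message_id: str) -> Path:
    """Return root/aa/bb/<message_id>, where aa and bb are the first two
    bytes (as hex) of a hash of the identifier."""
    digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
    # Two hex pairs give 256 * 256 = 65,536 possible leaf directories.
    return root / digest[0:2] / digest[2:4] / message_id


def store_message(root: Path, message_id: str, body: bytes) -> Path:
    """Write one message to its sharded location, creating directories
    on demand, and return the path written."""
    path = shard_path(root, message_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(body)
    return path
```

Hashing spreads files roughly evenly regardless of how the identifiers are distributed, and the path can be recomputed from the identifier alone, so no index is needed to find a message again.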
14:49
🔗
|
omf_ |
andy0, I have centos experience |
14:49
🔗
|
andy0 |
I'm attempting to build ./get-wget-lua.sh, I've added just about everything to fend off build errors |
14:50
🔗
|
andy0 |
For overkill, I'm currently running -> yum groupinstall "Development Libraries" "Development Tools" |
14:50
🔗
|
andy0 |
I'll build again to see if I continue to get errors |
14:51
🔗
|
andy0 |
The end: checking pcre.h usability... no |
14:51
🔗
|
andy0 |
checking for pcre.h... no |
14:51
🔗
|
andy0 |
checking lua.h presence... no |
14:51
🔗
|
andy0 |
checking lua.h usability... no |
14:51
🔗
|
andy0 |
checking pcre.h presence... no |
14:51
🔗
|
andy0 |
checking for lua.h... no |
14:51
🔗
|
andy0 |
checking lua5.1/lua.h usability... no |
14:51
🔗
|
andy0 |
checking lua5.1/lua.h presence... no |
14:51
🔗
|
andy0 |
checking for lua5.1/lua.h... no |
14:52
🔗
|
andy0 |
configure: error: lua not found |
14:52
🔗
|
omf_ |
you just need: make, gcc, gnutls-dev |
14:52
🔗
|
omf_ |
you need lua-dev installed and it must be lua 5.0+ |
14:53
🔗
|
omf_ |
That error means the lua development files are missing |
14:57
🔗
|
andy0 |
a step forward, lua-devel is now installed |
14:57
🔗
|
andy0 |
gnutls.c: In function 'ssl_connect_wget': |
14:57
🔗
|
andy0 |
gnutls.c:427: error: 'GNUTLS_TLS1_2' undeclared (first use in this function) |
14:57
🔗
|
andy0 |
gnutls.c:427: error: (Each undeclared identifier is reported only once |
14:57
🔗
|
andy0 |
gnutls.c:427: error: for each function it appears in.) |
14:57
🔗
|
andy0 |
make[3]: *** [gnutls.o] Error 1 |
14:57
🔗
|
andy0 |
make[3]: Leaving directory `/root/formspring-grab/get-wget-lua.tmp/src' |
14:57
🔗
|
andy0 |
make[2]: *** [all] Error 2 |
14:57
🔗
|
andy0 |
make[2]: Leaving directory `/root/formspring-grab/get-wget-lua.tmp/src' |
14:57
🔗
|
andy0 |
make[1]: *** [all-recursive] Error 1 |
14:57
🔗
|
andy0 |
make[1]: Leaving directory `/root/formspring-grab/get-wget-lua.tmp' |
14:57
🔗
|
andy0 |
make: *** [all] Error 2 |
15:00
🔗
|
andy0 |
Package gnutls-devel-1.4.1-10.el5_9.1.i386 already installed and latest version |
15:01
🔗
|
andy0 |
Package lua-devel-5.1.4-4.el5.i386 already installed and latest version |
15:01
🔗
|
omf_ |
andy0, we have a paste site http://paste.archivingyoursh.it/ use it instead of filling the channel with copy paste |
15:07
🔗
|
andy0 |
omf_ -> http://paste.archivingyoursh.it/derumoquni.coffee :) |
15:28
🔗
|
alard |
Isn't that a fairly old version of gnutls? |
15:29
🔗
|
andy0 |
yes, newest that I know of for CentOS 5 |
15:30
🔗
|
alard |
You could try to build Wget with OpenSSL. |
15:37
🔗
|
SmileyG |
andy0: what repos are you using? |
15:38
🔗
|
andy0 |
building with OpenSSL worked! (Context, I'm generally debian but VPS happens to be CentOS) |
15:46
🔗
|
SmileyG |
lol |
15:46
🔗
|
SmileyG |
centos at work, the old packages cause me no end of headaches |
15:51
🔗
|
andy0 |
in my small debian world, things just work for me |
15:52
🔗
|
andy0 |
in CentOS, I've installed python-pip, but cannot for the life of me figure out how to run: pip install seesaw |
16:04
🔗
|
SmileyG |
whereis pip shows what? |
16:14
🔗
|
andy0 |
(note, I have multiple python versions mucking about, also log from attempting seesaw install -> http://paste.archivingyoursh.it/temediyajo.vhdl) |
16:14
🔗
|
andy0 |
pip: /usr/bin/pip |
16:17
🔗
|
lNTERNET |
http://www.dolekemp96.org/main.htm |
16:17
🔗
|
lNTERNET |
http://www.heavensgate.com/ |
16:17
🔗
|
andy0 |
Attempting Python 2.6 & 2.4 -> http://paste.archivingyoursh.it/lejoqupude.rb |
16:17
🔗
|
lNTERNET |
http://www2.warnerbros.com/spacejam/movie/jam.htm |
16:17
🔗
|
lNTERNET |
http://www.cnn.com/EVENTS/1996/anniversary/flashback.machine/index.html |
16:18
🔗
|
omf_ |
Already grabbed the space jam site - http://archive.org/details/WarnerHomeVideo-SpaceJam |
20:11
🔗
|
Nathan__ |
Three hours left for Posterous! |
20:15
🔗
|
Nathan__ |
Yeah! |
20:15
🔗
|
Nathan__ |
Well. This channel sure has a lot of activity |
20:16
🔗
|
SketchCow |
Sorry we failed your 4 minute test |
20:16
🔗
|
SketchCow |
http://www.youtube.com/watch?feature=player_embedded&v=uIjO0sKBDDw#! |
20:17
🔗
|
SketchCow |
(Posterous) |
20:18
🔗
|
Nathan__ |
Yup! |
20:19
🔗
|
SmileyG |
o_O |
20:19
🔗
|
SmileyG |
was that yahoo?! |
20:19
🔗
|
Nathan__ |
It's a shame they're not accepting signups. |
20:20
🔗
|
Nathan__ |
I'd be the biggest star on there. In 3 hours |
20:20
🔗
|
SketchCow |
It was a posterous plane en route to yahoo! |
20:20
🔗
|
Nathan__ |
It stopped by Formspring, but no one got on |
20:21
🔗
|
SketchCow |
yeah, the remaining guy was saved |
20:21
🔗
|
SmileyG |
we shouldn't joke :< |
20:21
🔗
|
SketchCow |
Library of Congress sent something demanding it come in "Before Close of Business" |
20:21
🔗
|
SketchCow |
I have the window open to press at 4:59 |
20:21
🔗
|
Nathan__ |
I was really confused back there when my Warrior showed -50 hours. |
20:24
🔗
|
Baljem |
SketchCow: good luck with that. my uncle has just been the proud recipient of a late filing fine showing a timestamp of 2013-02-01 00:00:00 -- it seems the deadline was set at 23:59:59 on the 31st... |
20:24
🔗
|
SketchCow |
NICE |
20:24
🔗
|
Baljem |
there's last minute and there's taking the mick ;) |
20:24
🔗
|
SketchCow |
But those are unyielding, tight-assed bureaucrats. This is for the Library of Congress! |
20:25
🔗
|
Baljem |
ah, I see you've dealt with the fine folk at Her Majesty's Revenue and Customs before then. although I suppose the taxman is the taxman wherever in the world you find him |
20:25
🔗
|
Nathan__ |
Visit the Library of Congress, where our twitter archive takes 20 hours to search. |
20:51
🔗
|
omf_ |
if you are lucky |
20:52
🔗
|
sep332 |
layabout hippie librarians |
21:16
🔗
|
SketchCow |
http://hh.textfiles.com/ |
21:16
🔗
|
SketchCow |
yay |
21:18
🔗
|
omf_ |
oooh a little bit of book porn |
21:23
🔗
|
flaushy |
http://www.onb.ac.at/ev/austrianbooksonline.htm might be of interest as well |
21:35
🔗
|
SketchCow |
Rebuilding statusboard |
22:14
🔗
|
SketchCow |
http://statusboard.archive.org/ is back! |
22:14
🔗
|
BlueMax |
now that's cool. |
22:18
🔗
|
flaushy |
cool |
22:18
🔗
|
flaushy |
scanning in china, that is awesome |
22:23
🔗
|
frame_at |
haha, posterous has unbanned my ip - for the finale! |
22:26
🔗
|
dashcloud |
so, I left the warrior running, and it sucked up all the space on the partition- how can I get the data off there and safely shrink the disk image back down to a reasonable size? |
22:32
🔗
|
flaushy |
i deleted my whole vm (after stopping the warrior) and imported it freshly .. |
22:52
🔗
|
Nathan_ |
An hour left for Posterous! |
22:54
🔗
|
Nathan_ |
It's a shame, really. |
23:02
🔗
|
omf_ |
Here is a tool I have been using to dirdiff the current working directory against an Internet Archive Item - https://github.com/kimmel/ia-dirdiff |
23:03
🔗
|
omf_ |
I run it after an upload to make sure everything matches up |
23:03
🔗
|
Nathan_ |
Cool |
23:03
🔗
|
omf_ |
I plan to add sha sum checking as well |
23:04
🔗
|
omf_ |
That information comes back automatically in the json feed |
23:04
🔗
|
Nathan_ |
Does it compare md5s or something? |
23:04
🔗
|
omf_ |
right now it is straight file size |
23:05
🔗
|
omf_ |
I get md5, crc32 and sha1 |
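
The size-only comparison described above can be sketched as follows. This is not the ia-dirdiff code. It assumes the remote side is a list of dicts with `name` and `size` keys, similar in shape to the file entries in the Internet Archive's JSON metadata; `local_sizes` and `size_diff` are hypothetical names.

```python
import os
from typing import Dict, List


def local_sizes(directory: str) -> Dict[str, int]:
    """Map each file under directory (relative, '/'-separated) to its size."""
    sizes = {}
    for dirpath, _dirs, files in os.walk(directory):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, directory).replace(os.sep, "/")
            sizes[rel] = os.path.getsize(full)
    return sizes


def size_diff(directory: str, remote_files: List[dict]) -> Dict[str, List[str]]:
    """Compare local file sizes against remote entries (assumed shape:
    {'name': ..., 'size': ...}) and report the differences."""
    local = local_sizes(directory)
    # Sizes may arrive as strings in JSON, so normalise them to int.
    remote = {f["name"]: int(f["size"]) for f in remote_files if "size" in f}
    shared = set(local) & set(remote)
    return {
        "missing_remote": sorted(set(local) - set(remote)),
        "missing_local": sorted(set(remote) - set(local)),
        "size_mismatch": sorted(n for n in shared if local[n] != remote[n]),
    }
```

A size match cannot catch corruption that leaves the length unchanged, which is why comparing the md5 or sha1 values mentioned above would be the stronger check.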
23:08
🔗
|
Nathan_ |
What do you use it for? |
23:08
🔗
|
Nathan_ |
Are you the guy who my Warrior sends stuff to? |
23:08
🔗
|
omf_ |
no |
23:09
🔗
|
Nathan_ |
what do you do, then? |
23:13
🔗
|
omf_ |
I upload stuff to the Internet Archive and occasionally build tools to make it easier |
23:22
🔗
|
Nathan_ |
by the way, Posterous is still up, but Warrior says it has 0 hours left |
23:29
🔗
|
flaushy |
it will stay up |
23:30
🔗
|
Nathan_ |
Wha? |
23:30
🔗
|
Nathan_ |
But it says it'll shut down and doesn't accept new accounts |
23:31
🔗
|
flaushy |
right, and as long as we use the right user agent we can download. until posterous decides otherwise |
23:31
🔗
|
Nathan_ |
nice. |
23:31
🔗
|
flaushy |
so keep the warrior running, we get em :) |
23:31
🔗
|
Nathan_ |
Only a million to go! :) |
23:32
🔗
|
Nathan_ |
how, though? |
23:33
🔗
|
Nathan_ |
Does googlebot get special privileges? |
23:33
🔗
|
flaushy |
in #preposterus are all the gory details :) |
23:33
🔗
|
Nathan_ |
can you give me the cleaned-up summary? |
23:33
🔗
|
flaushy |
they do cooperate with us (within limits) |
23:34
🔗
|
Nathan_ |
By the way, the co-founder is starting another site that will "never go down". Yeah, right. |