#archiveteam 2013-04-30,Tue


Time Nickname Message
00:21 🔗 SketchCow I miss my long hair.
00:28 🔗 Mister_2 TEXTFILES BACKUP DOWNLOAD PROGRESS: 4gb/11gb.
00:28 🔗 Mister_2 TIME LEFT AS ESTIMATED BY TIXATI: "?:??"
04:06 🔗 omf_ comics grab still going. 204mb and counting
07:13 🔗 SketchCow Grabbing a mirror of the ftp.netscape.com mirror from bu.edu
07:31 🔗 SketchCow http://www.archiveteam.org/index.php?title=DEFCON_19_Talk_Transcript
07:31 🔗 SketchCow That was nice of someone.
07:36 🔗 godane SketchCow: https://archive.org/details/Backup_of_The_Pirate_Bay_32xxxxx-to-79xxxxx
07:36 🔗 godane can you move it to archiveteam
07:46 🔗 SketchCow Done
07:52 🔗 omf_ SketchCow, when you have a moment could you flip these from text to software collections. I goofed up this first group
07:52 🔗 omf_ http://archive.org/details/opensuse-12.3_release http://archive.org/details/opensuse-12.2_release http://archive.org/details/opensuse-12.1_release
07:52 🔗 omf_ http://archive.org/details/fedora-15_release
07:52 🔗 omf_ http://archive.org/details/opensuse-11.4_release http://archive.org/details/centos-5.9_release http://archive.org/details/centos-6.4_release
08:08 🔗 SketchCow Switched over.
08:08 🔗 SketchCow I should start thinking about getting you a collection, huh.
08:11 🔗 SketchCow I should actually think about getting you a top-level collection in Software, frankly.
08:12 🔗 SketchCow If you really push up 3tb of these things, as this set, this really should not be "shareware CDs"
11:25 🔗 SketchCow Streetfiles says they're closing later today
11:25 🔗 SketchCow So we really should turn up the heat on that.
11:56 🔗 Baljem I think we're having to be careful not to bring their servers to their knees :/
11:59 🔗 SketchCow they're going to bring their own servers to their knees
12:00 🔗 Baljem well, quite, and all over a petty bloody squabble over who owns the idea, AIUI
12:01 🔗 Baljem question is, how hard do we push it to maximise what we get out? I think we've been oscillating around a few rate-limiting setpoints to try and balance it out
12:01 🔗 SketchCow You know, as long as active effort along those lines is going on, I'm fine with it.
12:01 🔗 Ymgve should I manually switch over? when doing archiveteam's choice it decides to do formspring, which I guess isn't that important yet since they're not shutting down today
12:01 🔗 SketchCow Sometimes, I check that tracker and it seems supremely unmoving.
12:02 🔗 SketchCow I mean, getting 10,000+ users into the wayback machine with 48 hours warning is a big win.
12:03 🔗 SketchCow From: AOL Employee
12:03 🔗 SketchCow Date : Apr 30, 2013 4:12 AM
12:03 🔗 SketchCow Subject : comics alliance
12:03 🔗 SketchCow Message :
12:03 🔗 SketchCow hey, i work at aol, and i asked the content people about comics alliance.. there's no plans to take any of the sites down in the immediate future, as far as i know, so you have some time, don't quote me on that, obv.
12:03 🔗 Baljem yes. problem is, after a while the site starts taking a minute or more to generate a page - and we're crawling a lot more pages than we are images (to get the most metadata) - and some jobs are massive compared to others, so you see nothing happen for ages then boom, a pile of 100MB+ uploads
12:04 🔗 SketchCow Yeah, I see that now.
12:06 🔗 chazchaz What's the current high priority project? I've been out of the loop for a bit.
13:03 🔗 sep332 chazchaz: i'd say formspring. it's at -14 days and less than 1/4 finished
13:03 🔗 sep332 it's also set as "archiveteam's choice" on the warrior now
13:07 🔗 chazchaz Ok, I'll get to work.
13:08 🔗 SmileyG also you can hit it rather hard without it seemingly exploding.
13:09 🔗 SmileyG we had enough people hitting streetfiles earlier that it slowed down to almost unusable.
13:29 🔗 balrog sep332: I don't expect formspring to go away anytime soon though... we'll see I guess
13:31 🔗 chazchaz I wish formspring wasn't so CPU intensive.
13:32 🔗 SketchCow So, apparently, did formspring
13:36 🔗 SmileyG boom tish!
13:41 🔗 ZoeB Hmm... Does anyone know how many files you can put in a directory before things start breaking?
13:43 🔗 ZoeB I'm trying out this message grabber I'm writing... last night, it got about 33,000 messages, one file each. I'm splitting them up fairly evenly across 256 directories, but I'm starting to wonder whether each of those should have 256 subdirectories, or whether that's unnecessarily complicating things.
13:43 🔗 GLaDOS Depends on the filesystem
13:43 🔗 SketchCow I would definitely go for subdirectory
13:43 🔗 Baljem depends on the context of 'directory' and what sort of brokenness you're expecting - in my experience, most filesystems just start getting slow on directory access once you get to that sort of size
13:43 🔗 Baljem yeah, what they said ;)
13:44 🔗 ZoeB OK, sub it is, thanks Jason :)
13:44 🔗 SketchCow filesystems are a lot of promises, broken ones
13:45 🔗 ZoeB I've occasionally got to the stage where bash can't use wildcards, a directory's so big, but I can't for the life of me remember how big it actually was to do that.
13:45 🔗 Baljem SketchCow: don't forget pain and misery.
13:54 🔗 sep332 I think bash wildcards fail when the expansion has more characters than the maximum command length
14:04 🔗 ZoeB Ah, that'd make sense
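
The limit sep332 describes can be checked before a wildcard is ever run. The sketch below is illustrative only: it assumes a Unix-like system where `os.sysconf("SC_ARG_MAX")` is available, and the function names `arg_max` and `glob_would_overflow` are made up for this example. It adds up the bytes that expanding `*` in a directory would pass to an external command and compares the total against the limit. It is a rough estimate, since the real limit also has to cover the environment and per-argument overhead.

```python
import os


def arg_max():
    """Return the system limit (in bytes) on arguments plus environment for exec."""
    return os.sysconf("SC_ARG_MAX")


def glob_would_overflow(directory, limit=None):
    """Estimate whether expanding `*` in `directory` would exceed `limit` bytes.

    `limit` defaults to the system ARG_MAX. This only counts file-name bytes,
    so treat the answer as an approximation, not a guarantee.
    """
    limit = arg_max() if limit is None else limit
    total = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.name.startswith("."):
                continue  # a plain `*` does not match dotfiles by default
            total += len(os.fsencode(entry.name)) + 1  # +1 for the NUL terminator
            if total > limit:
                return True
    return False
```

When the estimate says the expansion is too large, iterating with `os.scandir` (or piping `find` into `xargs` in a shell) avoids building one huge argument list.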
14:31 🔗 SketchCow Whoops, I think streetfiles just ate it
14:31 🔗 SketchCow Wait, no, there it is
14:32 🔗 SilSte ate what?
14:32 🔗 SketchCow dirt
14:38 🔗 omf_ comicsalliance is at 474mb and counting
14:43 🔗 andy0 anyone build and run the scripts on CentOS?
14:45 🔗 ZoeB Implemented. Arcmesg now uses 256^2 directories.
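
A minimal sketch of a 256^2 layout like the one ZoeB describes. How Arcmesg actually picks its directories is not stated in the log, so the SHA-1 based scheme here is an assumption, as are the names `shard_path` and `store_message` and the `.txt` suffix. It also assumes message IDs are safe to use as file names. The first two hex pairs of the hash select the two directory levels, which spreads files roughly evenly.

```python
import hashlib
import os


def shard_path(root, message_id):
    """Map a message ID to root/xx/yy/<message_id>.txt using a SHA-1 prefix.

    Hypothetical scheme: xx and yy are the first two bytes of the hash in hex,
    giving 256 * 256 possible leaf directories.
    """
    digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
    return os.path.join(root, digest[:2], digest[2:4], message_id + ".txt")


def store_message(root, message_id, body):
    """Write one message to its sharded location, creating directories as needed."""
    path = shard_path(root, message_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(body)
    return path
```

For scale: 33,000 messages across 256 directories is about 129 files per directory, while 256^2 = 65,536 directories leaves most of them holding zero or one file until the archive grows much larger.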
14:49 🔗 omf_ andy0, I have centos experience
14:49 🔗 andy0 I'm attempting to build ./get-wget-lua.sh, I've added just about everything to fend off build errors
14:50 🔗 andy0 To over kill, I'm currently -> yum groupinstall "Development Libraries" "Development Tools"
14:50 🔗 andy0 I'll build again to see if I continue to get errors
14:51 🔗 andy0 The end: checking pcre.h usability... no
14:51 🔗 andy0 checking for pcre.h... no
14:51 🔗 andy0 checking lua.h presence... no
14:51 🔗 andy0 checking lua.h usability... no
14:51 🔗 andy0 checking pcre.h presence... no
14:51 🔗 andy0 checking for lua.h... no
14:51 🔗 andy0 checking lua5.1/lua.h usability... no
14:51 🔗 andy0 checking lua5.1/lua.h presence... no
14:51 🔗 andy0 checking for lua5.1/lua.h... no
14:52 🔗 andy0 configure: error: lua not found
14:52 🔗 omf_ you just need: make, gcc, gnutls-dev
14:52 🔗 omf_ you need lua-dev installed and it must be lua 5.0+
14:53 🔗 omf_ That error is missing lua development files
14:57 🔗 andy0 steps toward, now lua-devel installed
14:57 🔗 andy0 gnutls.c: In function 'ssl_connect_wget':
14:57 🔗 andy0 gnutls.c:427: error: 'GNUTLS_TLS1_2' undeclared (first use in this function)
14:57 🔗 andy0 gnutls.c:427: error: (Each undeclared identifier is reported only once
14:57 🔗 andy0 gnutls.c:427: error: for each function it appears in.)
14:57 🔗 andy0 make[3]: *** [gnutls.o] Error 1
14:57 🔗 andy0 make[3]: Leaving directory `/root/formspring-grab/get-wget-lua.tmp/src'
14:57 🔗 andy0 make[2]: *** [all] Error 2
14:57 🔗 andy0 make[2]: Leaving directory `/root/formspring-grab/get-wget-lua.tmp/src'
14:57 🔗 andy0 make[1]: *** [all-recursive] Error 1
14:57 🔗 andy0 make[1]: Leaving directory `/root/formspring-grab/get-wget-lua.tmp'
14:57 🔗 andy0 make: *** [all] Error 2
15:00 🔗 andy0 Package gnutls-devel-1.4.1-10.el5_9.1.i386 already installed and latest version
15:01 🔗 andy0 Package lua-devel-5.1.4-4.el5.i386 already installed and latest version
15:01 🔗 omf_ andy0, we have a paste site http://paste.archivingyoursh.it/ use it instead of filling the channel with copy paste
15:07 🔗 andy0 omf_ -> http://paste.archivingyoursh.it/derumoquni.coffee :)
15:28 🔗 alard Isn't that a fairly old version of gnutls?
15:29 🔗 andy0 yes, newest that I know of for CentOS 5
15:30 🔗 alard You could try to build Wget with OpenSSL.
15:37 🔗 SmileyG andy0: what repo's you using?
15:38 🔗 andy0 building with OpenSSL worked! (Context, I'm generally debian but VPS happens to be CentOS)
15:46 🔗 SmileyG lol
15:46 🔗 SmileyG centos at work, the old packages cause me no end of headaches
15:51 🔗 andy0 in my small debian world, things just work for me
15:52 🔗 andy0 in CentOS, I've installed python-pip, but can not for the life of me figure how to command: pip install seesaw
16:04 🔗 SmileyG whereis pip shows what?
16:14 🔗 andy0 (note, I have multiple python versions mucking about, also log from attempting seesaw install -> http://paste.archivingyoursh.it/temediyajo.vhdl)
16:14 🔗 andy0 pip: /usr/bin/pip
16:17 🔗 lNTERNET http://www.dolekemp96.org/main.htm
16:17 🔗 lNTERNET http://www.heavensgate.com/
16:17 🔗 andy0 Attempting Python 2.6 & 2.4 -> http://paste.archivingyoursh.it/lejoqupude.rb
16:17 🔗 lNTERNET http://www2.warnerbros.com/spacejam/movie/jam.htm
16:17 🔗 lNTERNET http://www.cnn.com/EVENTS/1996/anniversary/flashback.machine/index.html
16:18 🔗 omf_ Already grabbed the space jam site - http://archive.org/details/WarnerHomeVideo-SpaceJam
20:11 🔗 Nathan__ Three hours left for Posterous!
20:15 🔗 Nathan__ Yeah!
20:15 🔗 Nathan__ Well. This channel sure has a lot of activity
20:16 🔗 SketchCow Sorry we failed your 4 minute test
20:16 🔗 SketchCow http://www.youtube.com/watch?feature=player_embedded&v=uIjO0sKBDDw#!
20:17 🔗 SketchCow (Posterous)
20:18 🔗 Nathan__ Yup!
20:19 🔗 SmileyG o_O
20:19 🔗 SmileyG was that yahoo?!
20:19 🔗 Nathan__ It's a shame they're not accepting signups.
20:20 🔗 Nathan__ I'd be the biggest star on there. In 3 hours
20:20 🔗 SketchCow It was a posterous plane en route to yahoo!
20:20 🔗 Nathan__ It stopped by Formspring, but no one got on
20:21 🔗 SketchCow yeah, the remaining guy was saved
20:21 🔗 SmileyG we shouldn't joke :<
20:21 🔗 SketchCow Library of Congress sent something demanding it come in "Before Close of Business"
20:21 🔗 SketchCow I have the window open to press at 4:59
20:21 🔗 Nathan__ I was really confused back there when my Warrior showed -50 hours.
20:24 🔗 Baljem SketchCow: good luck with that. my uncle has just been the proud recipient of a late filing fine showing a timestamp of 2013-02-01 00:00:00 -- it seems the deadline was set at 23:59:59 on the 31st...
20:24 🔗 SketchCow NICE
20:24 🔗 Baljem there's last minute and there's taking the mick ;)
20:24 🔗 SketchCow But those are unyielding, tight-assed bureaucrats. This is for the Library of Congress!
20:25 🔗 Baljem ah, I see you've dealt with the fine folk at Her Majesty's Revenue and Customs before then. although I suppose the taxman is the taxman wherever in the world you find him
20:25 🔗 Nathan__ Visit the Library of Congress, where our twitter archive takes 20 hours to search.
20:51 🔗 omf_ if you are lucky
20:52 🔗 sep332 layabout hippie librarians
21:16 🔗 SketchCow http://hh.textfiles.com/
21:16 🔗 SketchCow yay
21:18 🔗 omf_ oooh a little bit of book porn
21:23 🔗 flaushy http://www.onb.ac.at/ev/austrianbooksonline.htm might be of interest as well
21:35 🔗 SketchCow Rebuilding statusboard
22:14 🔗 SketchCow http://statusboard.archive.org/ is back!
22:14 🔗 BlueMax now that's cool.
22:18 🔗 flaushy cool
22:18 🔗 flaushy scanning in china, that is awesome
22:23 🔗 frame_at haha, posterous has unbanned my ip - for the finale!
22:26 🔗 dashcloud so, I left the warrior running, and it sucked up all the space on the partition- how can I get the data off there and safely shrink the disk image back down to a reasonable size?
22:32 🔗 flaushy i deleted my whole vm (after stopping the warrior) and imported it freshly ..
22:52 🔗 Nathan_ An hour left for Posterous!
22:54 🔗 Nathan_ It's a shame, really.
23:02 🔗 omf_ Here is a tool I have been using to dirdiff the current working directory against an Internet Archive Item - https://github.com/kimmel/ia-dirdiff
23:03 🔗 omf_ I run it after an upload to make sure everything matches up
23:03 🔗 Nathan_ Cool
23:03 🔗 omf_ I plan to add sha sum checking as well
23:04 🔗 omf_ That information comes back automatically in the json feed
23:04 🔗 Nathan_ Does it compare md5s or something?
23:04 🔗 omf_ right now it is straight file size
23:05 🔗 omf_ I get md5, crc32 and sha1
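
The kind of check omf_ describes can be sketched roughly as follows. This is not the code from ia-dirdiff; it is an illustration under stated assumptions. It assumes the item's file list has already been fetched and parsed into a list of dicts with `name`, `size` and optionally `md5` keys, and that `size` may arrive as a string. The names `dirdiff` and `md5_of` are made up for the example. Size is compared first, with an optional MD5 pass on top.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dirdiff(directory, remote_files, check_md5=False):
    """Compare a local directory against a remote file list.

    Returns three sorted lists of relative names:
    (only local, only remote, present in both but different).
    """
    remote = {entry["name"]: entry for entry in remote_files}
    local = {}
    for base, _dirs, names in os.walk(directory):
        for name in names:
            full = os.path.join(base, name)
            # Remote names are assumed to use forward slashes.
            rel = os.path.relpath(full, directory).replace(os.sep, "/")
            local[rel] = full
    missing_remotely = sorted(set(local) - set(remote))
    missing_locally = sorted(set(remote) - set(local))
    mismatched = []
    for name in sorted(set(local) & set(remote)):
        entry = remote[name]
        if "size" in entry and int(entry["size"]) != os.path.getsize(local[name]):
            mismatched.append(name)
        elif check_md5 and "md5" in entry and entry["md5"] != md5_of(local[name]):
            mismatched.append(name)
    return missing_remotely, missing_locally, mismatched
```

Checking size first is cheap and catches truncated uploads; the hash pass costs a full read of every file, so it is left optional here.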
23:08 🔗 Nathan_ What do you use it for?
23:08 🔗 Nathan_ Are you the guy who my Warrior sends stuff to?
23:08 🔗 omf_ no
23:09 🔗 Nathan_ what do you do, then?
23:13 🔗 omf_ I upload stuff to the Internet Archive and occasionally build tools to make it easier
23:22 🔗 Nathan_ by the way, Posterous is still up, but Warrior says it has 0 hours left
23:29 🔗 flaushy it will stay up
23:30 🔗 Nathan_ Wha?
23:30 🔗 Nathan_ But it says it'll shut down and doesn't accept new accounts
23:31 🔗 flaushy right, and as long as we use the right user agent we can download. until posterous decides otherwise
23:31 🔗 Nathan_ nice.
23:31 🔗 flaushy so keep the warrior running, we get em :)
23:31 🔗 Nathan_ Only a million to go! :)
23:32 🔗 Nathan_ how, though?
23:33 🔗 Nathan_ Does googlebot get special privileges?
23:33 🔗 flaushy in #preposterus are all the gory details :)
23:33 🔗 Nathan_ can you give me the cleaned-up summary?
23:33 🔗 flaushy they do cooperate with us (within limits)
23:34 🔗 Nathan_ By the way, the co-founder is starting another site that will "never go down". Yeah, right.
