#archiveteam 2013-08-23,Fri


Time Nickname Message
00:03 🔗 xmc derive, damn you
04:09 🔗 wp494 some of the younger ones in here may remember a MMO called toontown
04:09 🔗 wp494 but guess what
04:09 🔗 wp494 http://toontown.go.com/closing
04:09 🔗 wp494 it's shutting down
04:11 🔗 wp494 not much UGC on the site though
04:12 🔗 wp494 except for a few pieces of fan art here and there
11:12 🔗 frame_at_ posterous is at -115 days ^^. At which point can we call them undead?
11:13 🔗 Smiley :D
11:13 🔗 Smiley <3 vincent
13:52 🔗 yipdw http://varnull.adityamukerjee.net/post/59021412512/dont-fly-during-ramadan <-- I wonder what the highest impact way of archiving this is
13:52 🔗 yipdw WARCing the page of course is the start (I've done that)
13:52 🔗 yipdw but with blogs you want to capture reactions, etc.
13:53 🔗 yipdw I guess I could also just scrape all the posts and retweets
14:10 🔗 DFJustin for single posts like that I just retrieve it through liveeb
14:10 🔗 DFJustin *liveweb
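WARCing a single page, as discussed above, just means wrapping the raw HTTP exchange in WARC (ISO 28500) records — in practice with `wget --warc-file` or the liveweb proxy. A minimal sketch of what one response record looks like; `build_warc_record` is a hypothetical helper, not a tool anyone in the channel used:

```python
import uuid
from datetime import datetime, timezone

def build_warc_record(target_uri, http_response_bytes):
    """Wrap a raw HTTP response in a minimal WARC/1.0 response record."""
    headers = (
        "WARC/1.0\r\n"
        "WARC-Type: response\r\n"
        f"WARC-Target-URI: {target_uri}\r\n"
        f"WARC-Date: {datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')}\r\n"
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>\r\n"
        "Content-Type: application/http; msgtype=response\r\n"
        f"Content-Length: {len(http_response_bytes)}\r\n"
        "\r\n"
    )
    # Records are terminated by two CRLFs per the WARC spec.
    return headers.encode("utf-8") + http_response_bytes + b"\r\n\r\n"

record = build_warc_record(
    "http://varnull.adityamukerjee.net/post/59021412512/",
    b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi",
)
```

Capturing reactions (retweets, comments) would mean one such record per fetched URL, concatenated into a single `.warc.gz`.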
15:03 🔗 SketchCow When we get all the posterous we can call them dead
15:03 🔗 SketchCow Wait, posterous, is dead. We should shut off that project.
15:24 🔗 DFJustin is it dead dead or just publicly dead but with a backdoor like a while ago
15:25 🔗 yipdw it's only mostly dead
15:26 🔗 yipdw once it's all dead we can go through its pockets and look for loose change
15:30 🔗 SketchCow Well, the point is we pulled all its data off.
15:34 🔗 omf_ Time for organ harvesting :)
15:34 🔗 omf_ Recycle all the things
15:35 🔗 yipdw inconceivable
16:08 🔗 balrog SketchCow: it's dead dead.
16:10 🔗 SketchCow Yeah.
16:10 🔗 SketchCow We should take it out of the tracker.
16:22 🔗 WiK #gitdigger downloaded 3,059 repositories yesterday and github users created 11,342
16:22 🔗 WiK this is quite the uphill battle :)
16:22 🔗 SketchCow Ha ha
16:22 🔗 SketchCow Maybe we need to start coordinating.
16:23 🔗 WiK well i have a fileserver that can hold up to 40TB more (uncompressed repos)
16:23 🔗 WiK trick is how do i get all the data from other downloaders :)
16:24 🔗 SketchCow This is exactly what the tracker is for.
16:25 🔗 WiK well, i could always stop the downloading and just collect the usernames/repos for now
16:25 🔗 WiK worry about downloading them later
16:26 🔗 * WiK looks at modifying his code
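The "collect names now, download later" approach sketched above maps onto GitHub's public `/repositories` endpoint, which pages through every public repo via a `since` cursor. A stdlib-only sketch, assuming no authentication and deferring the actual cloning; `parse_repo_page` and `collect` are hypothetical names, not gitdigger's code:

```python
import json
from urllib.request import urlopen

def parse_repo_page(payload):
    """Extract (full_name, clone_url) pairs and the cursor for the next page."""
    repos = json.loads(payload)
    pairs = [(r["full_name"], r["clone_url"]) for r in repos]
    next_since = repos[-1]["id"] if repos else None
    return pairs, next_since

def collect(since=0, pages=1):
    """Walk the public-repositories listing, recording metadata only."""
    seen = []
    for _ in range(pages):
        with urlopen(f"https://api.github.com/repositories?since={since}") as resp:
            pairs, since = parse_repo_page(resp.read())
        seen.extend(pairs)
        if since is None:
            break
    return seen
```

The recorded `clone_url` list can then be split into chunks and handed out through the tracker, as SketchCow suggests.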
16:31 🔗 Jonimus part of the trick is github can internally store forks as branches of the same repo for data dedup, not sure if you can.
16:32 🔗 WiK Jonimus: not worried about that, i dont want to data dedup
16:32 🔗 Jonimus I just mean they can store the same data in less space and less downloading for backups
16:40 🔗 omf_ WiK, my github mapping is chugging along
16:40 🔗 omf_ it can be used to break off chunks for people to download. I already
16:41 🔗 WiK omf_: i just added a jsononly mode to update my database
16:42 🔗 WiK it doesnt download anything, just adds the info i want into my database to be downloaded later
16:42 🔗 omf_ looks like I am 2.5 million in so far
16:43 🔗 omf_ I am keeping the data in 2 forms. The first is text files containing 100 json repo records per file
16:44 🔗 omf_ The other that I use is loading as JSON into postgresql 9.2 and using the json features they built in
16:44 🔗 omf_ postgresql forces json format compliance before insertion so queries work across the data type
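The flat-file half of the scheme above — text files holding 100 JSON repo records each — can be sketched as a simple batching step. `chunk_records` is a hypothetical helper; serializing each record with `json.dumps` plays the same gatekeeping role the Postgres `json` type does, rejecting malformed values before they are stored:

```python
import json

def chunk_records(records, per_file=100):
    """Group repo records into batches of per_file, one batch per
    output text file, one JSON document per line."""
    batches = []
    for i in range(0, len(records), per_file):
        batch = records[i:i + per_file]
        # json.dumps raises on non-serializable input, mirroring how
        # Postgres's json type rejects non-compliant values at insert time.
        batches.append("\n".join(json.dumps(r) for r in batch))
    return batches
```

Each returned string would be written to its own file; the same records could be inserted into a Postgres 9.2 `json` column for querying.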
16:46 🔗 WiK boom, im now rockin and rolling
16:49 🔗 WiK omf_: im still gonna want to compare my data with yours, i think in the early days of my project i might have missed some repos due to coding errors
16:58 🔗 omf_ yeah no problem
17:04 🔗 omf_ I have already started building a script for updating of the dataset I am building.
17:05 🔗 omf_ I am going to use githubarchive for part of that since they already maintain the data in a public and searchable way
17:05 🔗 SketchCow So, I uploaded 1.3 terabytes of data into Internet Archive yesterday.
17:09 🔗 omf_ So jelly ;)
17:13 🔗 SketchCow Well, it's CHD a riffic.
17:13 🔗 SketchCow I got a lot off of the FOS machine, but so much more to go!
17:15 🔗 SketchCow Anyone here familiar with Javascript? I'd like to improve the javascript for the JSMESS page.
17:15 🔗 SketchCow I'm sure it's easy stuff, too - it's just not my language at all.
17:23 🔗 WiK i know a bit, depends what the problem is and what you're trying to do
17:23 🔗 WiK ive got no experience with the more advanced ajax stuff
17:24 🔗 SketchCow Come to #jsmess, please.
20:53 🔗 Nemo_bis http://lists.wikimedia.org/pipermail/wikitech-l/2013-August/071442.html
22:11 🔗 arkhive SketchCow: Your User page wiki on Archiveteam.org has a dead link
22:12 🔗 arkhive under recording audio video. this link
22:12 🔗 arkhive http://recordkeepingroundtable.org/2011/06/25/where-do-old-websites-go-to-die-with-jason-scott-of-archive-team-podcast/
22:13 🔗 DFJustin irony
22:25 🔗 godane arkhive: i found the link to the mp3
22:25 🔗 godane its still working
22:25 🔗 godane i'm grabbing it now
22:29 🔗 arkhive godane: link please.
22:29 🔗 godane http://recordkeepingroundtable.files.wordpress.com/2011/06/recordkeeping-roundtable-220611.mp3
22:31 🔗 arkhive thanks
22:40 🔗 arkhive SketchCow: Also this link gives 404 http://www.archiveteam.org/archives/media/The%20Spendiferous%20Story%20of%20Archive%20Team%20-%20Jason%20Scott%20-%20PDA2011.mp3
22:57 🔗 godane http://web.archive.org/web/20110526034217/http://archiveteam.org/archives/media/The%20Spendiferous%20Story%20of%20Archive%20Team%20-%20Jason%20Scott%20-%20PDA2011.mp3
22:58 🔗 godane also video is here i think: https://archive.org/details/PDA2011-jasonscott
23:19 🔗 arkhive Just wondering.. What was the maximum upload video size/length of Google Video hosted Google videos?
23:31 🔗 dashcloud arkhive: it was at least 2 hours I think- I remember watching a video at least 1.5 hours, probably close to 2
