[00:03] derive, damn you
[04:09] some of the younger ones in here may remember an MMO called Toontown
[04:09] but guess what
[04:09] http://toontown.go.com/closing
[04:09] it's shutting down
[04:11] not much UGC on the site though
[04:12] except for a few pieces of fan art here and there
[11:12] posterous is at -115 days ^^. At which point can we call them undead?
[11:13] :D
[11:13] <3 vincent
[13:52] http://varnull.adityamukerjee.net/post/59021412512/dont-fly-during-ramadan <-- I wonder what the highest-impact way of archiving this is
[13:52] WARCing the page of course is the start (I've done that)
[13:52] but with blogs you want to capture reactions, etc.
[13:53] I guess I could also just scrape all the posts and retweets
[14:10] for single posts like that I just retrieve it through liveweb
[15:03] When we get all of Posterous we can call them dead
[15:03] Wait, Posterous is dead. We should shut off that project.
[15:24] is it dead dead or just publicly dead but with a backdoor like a while ago
[15:25] it's only mostly dead
[15:26] once it's all dead we can go through its pockets and look for loose change
[15:30] Well, the point is we pulled all its data off.
[15:34] Time for organ harvesting :)
[15:34] Recycle all the things
[15:35] inconceivable
[16:08] SketchCow: it's dead dead.
[16:10] Yeah.
[16:10] We should take it out of the tracker.
[16:22] #gitdigger downloaded 3,059 repositories yesterday and GitHub users created 11,342
[16:22] this is quite the uphill battle :)
[16:22] Ha ha
[16:22] Maybe we need to start coordinating.
[16:23] well I have a fileserver that can hold up to 40TB more (uncompressed repos)
[16:23] trick is how do I get all the data from other downloaders :)
[16:24] This is exactly what the tracker is for.
[16:25] well, I could always stop the downloading and just collect the usernames/repos for now
[16:25] worry about downloading them later
[16:26] * WiK looks at modifying his code
[16:31] part of the trick is GitHub can internally store forks as branches of the same repo for data dedup, not sure if you can.
[16:32] Jonimus: not worried about that, I don't want to data dedup
[16:32] I just mean they can store the same data in less space and with less downloading for backups
[16:40] WiK, my GitHub mapping is chugging along
[16:40] it can be used to break off chunks for people to download. I already
[16:41] omf_: I just added a JSON-only mode to update my database
[16:42] it doesn't download anything, just adds the info I want into my database to be downloaded later
[16:42] looks like I am 2.5 million in so far
[16:43] I am keeping the data in 2 forms. The first is text files containing 100 JSON repo records per file
[16:44] The other that I use is loading the JSON into PostgreSQL 9.2 and using the JSON features they built in
[16:44] PostgreSQL forces JSON format compliance before insertion so queries work across the data type
[16:46] boom, I'm now rocking and rolling
[16:49] omf_: I'm still gonna want to compare my data with yours, I think in the early days of my project I might have missed some repos due to coding errors
[16:58] yeah no problem
[17:04] I have already started building a script for updating the dataset I am building.
[17:05] I am going to use githubarchive for part of that since they already maintain the data in a public and searchable way
[17:05] So, I uploaded 1.3 terabytes of data into the Internet Archive yesterday.
[17:09] So jelly ;)
[17:13] Well, it's CHD-a-riffic.
[17:13] I got a lot off of the FOS machine, but so much more to go!
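A minimal sketch of the single-page WARC grab mentioned around 13:52. This is not the exact command that was used; it simply shells out to wget, which can write WARC output natively, and the output filename is illustrative.

```python
# Sketch: grab one blog post plus its page requisites into a WARC via wget.
# Assumes wget is installed; --warc-file and --page-requisites are standard wget options.
import subprocess

url = "http://varnull.adityamukerjee.net/post/59021412512/dont-fly-during-ramadan"

subprocess.run([
    "wget",
    "--warc-file=dont-fly-during-ramadan",  # writes dont-fly-during-ramadan.warc.gz
    "--page-requisites",                    # also fetch images/CSS/JS the page needs
    "--no-verbose",
    url,
], check=True)
```

Capturing reactions (retweets, comments) would still need a separate scrape, as noted in the conversation.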
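And a hedged sketch of the PostgreSQL side described around 16:43-16:44: repo records are kept as raw JSON, and the json column type added in PostgreSQL 9.2 rejects malformed documents at insert time, so everything that lands in the table is queryable later. The table name, column names, filename, and the one-record-per-line file layout are assumptions for the example, not the actual schema.

```python
# Sketch: load JSON repo records into a PostgreSQL 9.2 json column.
# Table/column/file names are invented; the json type itself validates each record on insert.
import json
import psycopg2

conn = psycopg2.connect("dbname=github")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS repos (
        id   serial PRIMARY KEY,
        data json NOT NULL   -- PostgreSQL enforces JSON validity before insertion
    )
""")

# Assumed layout: each text file holds 100 JSON repo records, one per line.
with open("repos-00001.txt") as fh:
    for line in fh:
        record = line.strip()
        if not record:
            continue
        json.loads(record)  # optional client-side check; the json column enforces it anyway
        cur.execute("INSERT INTO repos (data) VALUES (%s)", (record,))

conn.commit()
```

Note that 9.2 only validates and stores JSON; the richer JSON operators and functions arrived in later releases, so at this stage the win is guaranteed well-formedness across the dataset.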
[17:15] Anyone here familiar with JavaScript? I'd like to improve the JavaScript for the JSMESS page.
[17:15] I'm sure it's easy stuff, too - it's just not my language at all.
[17:23] I know a bit, depends what the problem is and what you're trying to do
[17:23] I've got no experience with the more advanced AJAX stuff
[17:24] Come to #jsmess, please.
[20:53] http://lists.wikimedia.org/pipermail/wikitech-l/2013-August/071442.html
[22:11] SketchCow: Your user page on the Archiveteam.org wiki has a dead link
[22:12] under "recording audio video". this link
[22:12] http://recordkeepingroundtable.org/2011/06/25/where-do-old-websites-go-to-die-with-jason-scott-of-archive-team-podcast/
[22:13] irony
[22:25] arkhive: I found the link to the mp3
[22:25] it's still working
[22:25] I'm grabbing it now
[22:29] godane: link please.
[22:29] http://recordkeepingroundtable.files.wordpress.com/2011/06/recordkeeping-roundtable-220611.mp3
[22:31] thanks
[22:40] SketchCow: Also, this link gives a 404: http://www.archiveteam.org/archives/media/The%20Spendiferous%20Story%20of%20Archive%20Team%20-%20Jason%20Scott%20-%20PDA2011.mp3
[22:57] http://web.archive.org/web/20110526034217/http://archiveteam.org/archives/media/The%20Spendiferous%20Story%20of%20Archive%20Team%20-%20Jason%20Scott%20-%20PDA2011.mp3
[22:58] also the video is here, I think: https://archive.org/details/PDA2011-jasonscott
[23:19] Just wondering: what was the maximum upload video size/length for videos hosted on Google Video?
[23:31] arkhive: it was at least 2 hours I think - I remember watching a video at least 1.5 hours long, probably close to 2