[00:13] I am still cleaning up the data but it looks like I could have up to 152,510 yahoo groups
[00:14] nice
[00:15] do we already have infrastructure/any other projects where we do a historical grab and then try to keep it up to date?
[00:15] fanfiction.net
[00:15] (hmm, URLteam I guess needs to do something like that, so a solved problem I guess)
[00:15] puush is doing that as well
[00:15] cool
[00:49] omf_: did you get that downloader thing working yet?
[00:49] I'd like to test it at some point
[00:52] omf_: you're also indexing group names, right?
[00:52] now that I have more than a handful of groups I can start working on it
[00:52] ~152,510 yahoo groups
[00:52] wow
[00:52] and all of them require joining to access the "interesting" stuff.
[00:52] even so, archiving just messages isn't useless
[02:21] everyone and balrog, this is what it feels like working on that yahoo groups code --> http://i.imgur.com/OVvEKxh.gif
[02:24] omf_: are you suggesting that it should be rewritten in (say) python ?
[02:26] No
[02:27] I have to fully understand how this thing works before a determination like that can be made
[02:28] I see :/
[02:28] and yes this thing might as well be obfuscated
[02:31] maybe to the untrained eye. To me it just reeks of beginner coder
[02:31] Too verbose, no language idioms, poor comments
[02:31] lack of understanding of the language ecosystem
[02:36] ah :/
[02:37] and yeah... I'm not a perl programmer
[14:24] omf_: ya there? just got an email from the maintainer of that perl script
[15:00] Anyone grabbing the new Wikileaks insurance files?
https://www.facebook.com/wikileaks/posts/561645433870573
http://wlstorage.net/torrent/wlinsurance-20130815-A.aes256.torrent (3.6GB)
http://wlstorage.net/torrent/wlinsurance-20130815-B.aes256.torrent (49GB)
http://wlstorage.net/torrent/wlinsurance-20130815-C.aes256.torrent (349GB)
[15:33] here now balrog
[15:45] ersi: will check if I have disk space left
[16:03] ersi: first torrent going down, ~20 min left, will get the others after it
[16:04] yeah I don't have the space at present for 349gb on my server :(
[16:08] found space in one of my externals (need to get more archiving media) so i'll grab it
[16:08] wonder if my isp will notice :p
[16:20] first down 2 to go
[18:44] omf_: http://sourceforge.net/p/grabyahoogroup/code/128/tree//trunk/yahoo_group/grabyahoogroup.pl?diff=50ee4b9834309d7d9055735b:127
[18:54] did that fix the login issue with new tokens
[18:55] http://tech.slashdot.org/story/13/08/17/1747228/yahoo-deletes-journalists-pre-paid-legacy-site-after-suicide?utm_source=rss1.0mainlinkanon&utm_medium=feed
[18:57] jesus christ
[19:00] Not surprising
[19:03] I ran out of outrage :(
[19:05] so i'm mirroring the sportsinreview.com
[19:05] i was having trouble doing a normal mirror
[19:06] so i have to make an index.txt file with pages that should exist
[19:06] like up to ?p=8679 urls
[19:07] and cat is up to 60
[19:08] i'm glad that i could help mirror that blog
[19:10] godane: i mirrored it
[19:10] oh
[19:11] link please?
[19:11] godane: looking for it now, hope that I got the whole site
[19:12] https://archive.org/details/Sportsinreview20130816
[19:14] i'm going to download it to see the commands
[19:14] used to mirror it
[19:14] used http://archiveteam.org/index.php?title=User:Djsmiley2k
[19:16] so yahoo used yanked it off....
[19:21] *just
[19:47] godane: does it look good?
[20:01] i think it's good
[20:02] omf_: nope.
[20:03] to get "tokens" what you do is programmatically log into yahoo then save the cookies
[20:14] godane: had trouble starting the mirroring at the index so had to start it one step deep and only then wget would grab the whole site
[20:22] ok
[21:14] Starting up automated wget's and uploads of various horticulture and gardening forums, particularly GardenWeb.com which has very little coverage of its actual posts (and member photos of their plants) in the WayBack Machine.
[21:15] save the plants
[21:15] https://archive.org/details/project_flowerpot_-_forums.gardenweb.com_-_california_gardening_-_2013-08-17/
[21:16] Er, that was me, with a messed up IRC nick. :-/
[21:19] There's an amazing amount of information about gardening online, freely shared on many long-running message boards, and it would be a shame not to preserve it.
[21:19] indeed.
[21:28] I called it (suicide leads to delete)
[21:32] SketchCow: oh hi
[21:35] SketchCow: we have his website that he put up prior to suicide http://archive.org/details/martinmanleylifeanddeath.com-20130816
[21:36] SketchCow: his sports blog https://archive.org/details/Sportsinreview20130816
[21:36] Good.
[21:36] And his suicide page? Right?
[21:37] yes
[21:37] ha ha
[21:37] Archive Team
[21:37] both the zeroshare and the original
[21:37] zeroshare mirror
[21:37] https://www.youtube.com/watch?v=dsboBHlzwcs
[21:37] and i think antomatic got the blog that he had prior to sportsinreview
[21:40] have looked for a twitter account but couldn't find one but seems we got his digital legacy archived!
[21:42] I won't crow about this one in case the family's behind the takedown.
[21:43] That seems like a good idea
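
Editor's note on the index.txt approach described at [19:06] and [19:07]: when a normal recursive mirror fails, one option is to enumerate the URLs that should exist and hand that list to the downloader. The sketch below is only an illustration of that idea, not the commands actually used in the log. The function names, the `?cat=` parameter name (the log only says "cat is up to 60") and the example base URL are assumptions.

```python
def build_url_list(base_url, max_post=8679, max_cat=60):
    """Enumerate candidate post and category URLs for a blog.

    The limits default to the numbers quoted in the log; the
    '?cat=' parameter name is an assumption, not confirmed there.
    """
    base = base_url.rstrip("/")
    urls = [f"{base}/?p={n}" for n in range(1, max_post + 1)]
    urls += [f"{base}/?cat={n}" for n in range(1, max_cat + 1)]
    return urls


def write_index(path, urls):
    """Write one URL per line, the format a URL-list file usually takes."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(urls) + "\n")


if __name__ == "__main__":
    # Hypothetical base URL, used here only for illustration.
    write_index("index.txt", build_url_list("http://example.com"))
```

The resulting file can then be passed to wget with its `-i index.txt` option, which reads URLs from a file. Many of the enumerated IDs will not exist, so 404 responses are expected.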