[00:27] [18:55:59.859] how goes it wp494 ?
[00:27] good, you?
[00:29] im ok, just chillin
[01:12] hm
[01:12] I have some fortunecity data taking up space that I kind of need right now
[01:12] well, only a gig
[01:12] not that then
[01:24] looks like I've got a few hundred gigs of orphaned warc files
[01:40] pending upload xmc ?
[01:47] I don't know what to do with them
[01:47] from fortunecity, splinder, mobileme, and picplz
[01:47] obviously I've missed the boat on the megawarcs :P
[01:49] I just upload loose warcs as new items anyway
[01:50] they will get sucked in at some point
[01:55] aye
[02:47] WiK, I am mentioning gitdigger in my docs talk coming up in september
[02:48] http://ascii.textfiles.com/archives/3974 Inside Brewster's Magnificent Contraption
[02:50] sweet
[02:53] I am watching your bsidesLV talk
[02:54] ya, ill add the defcon/derbycon vids there as well when i get them
[03:25] SketchCow: what's the link to your latest talk?
[03:28] Which one?
[03:36] http://www.archiveteam.org/index.php?title=Talks
[03:38] weee
[03:42] i want to upload the video with the audio from archive.org..i might just do that
[03:50] I thought you'd only done one recently. I'll check the list omf_ provided. Thanks!
[04:09] I did the one at NDSA
[04:09] I did a speech at DEFCON about the documentary, that's not out yet
[04:09] http://archive.org/details/20130724JasonScottNDSADigitalPreservation2013ArchiveTeam
[04:14] Looks like the draft archive team talk from DC17 isn't on there.
[04:15] makes sense though, it was a draft
[04:16] The skytalk?
[04:16] yeah
[04:16] The skytalks are not recorded, on purpose, for this reason.
[04:16] ah, that makes sense
[04:16] I gave a talk on engineering fame that was a real rough sketch
[04:16] It may never see the light of recorded day.
[04:16] But I could give it at skytalks
[04:17] Yeah, the whole point of skytalks is to give people a chance to fly and for people to see betas
[04:17] I really enjoyed the archive team skytalk, totally didn't realize that was the purpose of them
[04:17] If you were in the room, you were lucky
[04:18] that i was
[04:18] also had a cool hallway convo with you afterward. it was fun.
[04:23] I'd have to see a picture to remember you, but I'm sure I would.
[04:23] I meet a lot of people in the course of a DEFCON.
[04:24] People walking with me find it ridiculous
[04:25] http://digital-archiving.blogspot.com/2013/08/a-short-detective-story-involving-5.html?utm_source=twitterfeed&utm_medium=twitter
[04:29] SketchCow: one on the right. unsure if my hair was as long. it's a picture from around that time though. https://fbcdn-sphotos-f-a.akamaihd.net/hphotos-ak-prn2/167756_10150115961726320_712837_n.jpg
[04:30] I linked to the defcon doc on fb yesterday and you responded to one of my friends who called it self-aggrandizing, haha.
[05:56] https://archive.org/details/20130801DEFCONDocumentaryPremiereAudienceReaction
[08:01] windows is not able to recognise files without extension, srsly?
[08:05] yeah, hehe
[10:29] Nemo_bis, it's a fair thing for Windows not to recognize extensionless files
[14:26] ambience: Yes, now I recall.
[14:44] So is anyone archiving the groups.yahoo.com messages?
[15:40] hmm, that's a good question - also whether it's possible to archive all of them (ISTR when I flagged a group I used to run as moribund, it prevented public access to the group archives - WTF)
[15:41] Looking at one group I'm a member of as an example, it spans back to 2001, which would be quite a lot of information to lose...
[15:42] It seems to be publicly accessible, this example, though scraping the plaintexts of all the messages would require a fairly cunning script
[15:45] Even having only the still-active groups would be much better than nothing
[16:06] I have to go now... just an idea to think about, anyway.
[17:33] I am not sure we're archiving them. We should be.
[18:03] on that topic: anyone here good with perl?
[18:03] those are ANNOYING to archive, since most groups require you to be a member to see any of the good stuff
[18:03] someone wrote a perl archiver that kinda works but yahoo broke it with a recent auth change ;(
[18:04] link Baljem
[18:04] balrog,
[18:04] I meant
[18:04] http://sourceforge.net/p/grabyahoogroup/code/127/tree/trunk/ is upstream; https://github.com/balr0g/grabyahoogroup/commits/master is a couple of patches I added to make it more reliable.
[18:04] but I don't really get perl that well
[18:05] usage is as follows: perl5.10 /path/to/grabyahoogroup.pl --username "user" --password "pass" --group groupname --verbose --verbose --verbose --verbose --verbose
[18:05] however yahoo broke auth recently
[18:06] the changes I made cause it to archive a lot slower but make it so you don't get 999s at all :)
[18:08] Yeah I can fix up the pile of shit you inherited balrog. Looks like Perl from the 90s
[18:08] I guess there is no point in making it be able to grab multiple pages at once because of Yahoo throttling
[18:09] To me this is a two step problem. 1. get all the groups. 2. get each group's content
[18:10] balrog, was that delay time the first one you tried?
[18:10] Also do you happen to remember how long it took to get banned
[18:13] omf_: no it was like the second or third
[18:13] but it was absolutely solid.
[18:13] sweet
[18:14] the upstream dude has been extremely busy
[18:14] yeah the perl looked like a pile of shit which is why I got lost looking at it ;(
[18:14] ewww tons of html parsing with regex for no reason
[18:14] usually I'm not all that bad with new languages
[18:14] yep, that
[18:14] this is total crap
[18:14] still, if I use an old auth cookie, it just works (in most cases)
[18:15] are most of the groups in need of auth or are they just public
[18:15] can you salvage any part of this?
[18:15] if you care about files, database, photos, or attachments, then yes in need of auth
[18:15] It tells me the structure of what to look for and follow which is very useful
[18:15] if you only care about messages, then it's 50/50
[18:15] do you need authentication to verify a group exists?
[18:15] no
[18:16] well maybe for "private" groups
[18:16] http://launch.groups.yahoo.com/group/yamahadx/ is a typical group which has messages and attachments "public-available"
[18:17] the perl script will detect which sections you have access to and only download those
[18:17] so if you use it without auth on that group it should grab messages and attachments
[18:17] yes examples are good. I am looking at the groups list to see how easy that will be to build up
[18:17] if you use it with auth it will grab the entire thing
[18:17] you mean to download all groups?
[18:17] yes
[18:17] right now I'd just make it work for single groups
[18:18] and again I care a lot about files/photos/database since there's tons of good stuff usually
[18:18] the "gross hack" is to get around yahoo sometimes returning an empty message
[18:18] it's a horrible GOTO :p
[18:19] is it still a response 200 even if the post is empty?
[18:21] The other reason this is so hard to read is it is 8 separate libraries worth of code in one file
[18:23] GrabYahoo, GrabYahoo::Client, GrabYahoo::Logger, GrabYahoo::Messages, GrabYahoo::Files, GrabYahoo::Attachments, GrabYahoo::Members, GrabYahoo::Photos
[18:23] are all the library namespaces defined in that file. Standard practice is to only define 1 per file
[18:25] How did you get the auth token from your account to try this program?
[18:28] by using it before yahoo changed auth
[18:29] poop
[18:42] omf_: let me know how this goes. I wouldn't mind helping, but this perl is just unreadable ;(
[18:53] no, this perl is unreadable :) https://www.cs.cmu.edu/~dst/DeCSS/Gallery/qrpff.pl
[18:56] No, *this* perl is unreadable http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html
[18:57] lol
[19:13] http://www.neogaf.com/forum/showthread.php?t=652647
[19:29] A bright moment in life
[19:29] Meanwhile, the archivists have their meetings
[19:30] We need to meet to come up with the perfect solution
[19:30] Guys, if we did something that turned out to not be perfect then that would be bad
[19:30] So let's just wait for perfection
[19:31] *allows literally billions of digital records to be destroyed*
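
For reference, the slow-but-steady approach from the Yahoo Groups discussion above (pause between requests, retry when a "999" or an empty message comes back) might look roughly like the sketch below. This is not the grabyahoogroup code: `polite_crawl`, `fetch_with_retry`, the `fetch` callable, and the delay values are all hypothetical names and numbers chosen for illustration, and treating status 999 as the throttling signal is an assumption based only on the "999s" mentioned in the log.

```python
import time

THROTTLED = 999  # assumption: the status the log calls "999s"


def fetch_with_retry(url, fetch, delay, max_retries, sleep):
    """Fetch one URL, retrying on a throttled status or an empty body.

    `fetch` is a hypothetical callable returning (status, body).
    Returns the body, or None if every attempt failed.
    """
    for attempt in range(max_retries + 1):
        status, body = fetch(url)
        if status != THROTTLED and body:
            return body
        if attempt < max_retries:
            # Back off longer after each failed attempt.
            sleep(delay * 2 ** (attempt + 1))
    return None


def polite_crawl(urls, fetch, delay=2.0, max_retries=3, sleep=time.sleep):
    """Fetch URLs one at a time, pausing `delay` seconds after each one."""
    results = {}
    for url in urls:
        results[url] = fetch_with_retry(url, fetch, delay, max_retries, sleep)
        sleep(delay)  # fixed pause so requests never arrive in bursts
    return results
```

Passing `fetch` and `sleep` in as arguments keeps the sketch free of any real Yahoo endpoint or auth handling, and lets the retry logic be exercised with a fake fetcher. Requests are deliberately sequential, in line with the point in the log that grabbing multiple pages at once is pointless under throttling.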