#archiveteam 2013-08-15,Thu


Time Nickname Message
00:27 🔗 wp494 [18:55:59.859] <WiK> how goes it wp494 ?
00:27 🔗 wp494 good, you?
00:29 🔗 WiK im ok, just chillin
01:12 🔗 xmc hm
01:12 🔗 xmc I have some fortunecity data taking up space that I kind of need right now
01:12 🔗 xmc well, only a gig
01:12 🔗 xmc not that then
01:24 🔗 xmc looks like I've got a few hundred gigs of orphaned warc files
01:40 🔗 omf_ pending upload xmc ?
01:47 🔗 xmc I don't know what to do with them
01:47 🔗 xmc from fortunecity, splinder, mobileme, and picplz
01:47 🔗 xmc obviously I've missed the boat on the megawarcs :P
01:49 🔗 omf_ I just upload loose warcs as new items anyway
01:50 🔗 omf_ they will get sucked in at some point
01:55 🔗 xmc aye
02:47 🔗 omf_ WiK, I am mentioning gitdigger in my docs talk coming up in september
02:48 🔗 SketchCow http://ascii.textfiles.com/archives/3974 Inside Brewster's Magnificent Contraption
02:50 🔗 WiK sweet
02:53 🔗 omf_ I am watching your bsidesLV talk
02:54 🔗 WiK ya, ill add the defcon/derbycon vids there as well when i get them
03:25 🔗 dashcloud SketchCow: what's the link to your latest talk?
03:28 🔗 SketchCow Which one?
03:36 🔗 omf_ http://www.archiveteam.org/index.php?title=Talks
03:38 🔗 S[h]O[r]T weee
03:42 🔗 S[h]O[r]T i want to upload the video with the audio from archive.org..i might just do that
03:50 🔗 dashcloud I thought you'd only done one recently. I'll check the list omf_ provided. Thanks!
04:09 🔗 SketchCow I did the one at NDSA
04:09 🔗 SketchCow I did a speech at DEFCON about the documentary, that's not out yet
04:09 🔗 SketchCow http://archive.org/details/20130724JasonScottNDSADigitalPreservation2013ArchiveTeam
04:14 🔗 ambience Looks like the draft archive team talk from DC17 isn't on there.
04:15 🔗 ambience makes sense though, it was a draft
04:16 🔗 SketchCow The skytalk?
04:16 🔗 ambience yeah
04:16 🔗 SketchCow The skytalks are not recorded, on purpose, for this reason.
04:16 🔗 ambience ah, that makes sense
04:16 🔗 SketchCow I gave a talk on engineering fame that was a real rough sketch
04:16 🔗 SketchCow It may never see the light of recorded day.
04:16 🔗 SketchCow But I could give it at skytalks
04:17 🔗 SketchCow Yeah, the whole point of skytalks is to give people a chance to fly and for people to see betas
04:17 🔗 ambience I really enjoyed the archive team skytalk, totally didn't realize that was the purpose of them
04:17 🔗 SketchCow If you were in the room, you were lucky
04:18 🔗 ambience that i was
04:18 🔗 ambience also had a cool hallway convo with you afterward. it was fun.
04:23 🔗 SketchCow I'd have to see a picture to remember you, but I'm sure I would.
04:23 🔗 SketchCow I meet a lot of people in the course of a DEFCON.
04:24 🔗 SketchCow People walking with me find it ridiculous
04:25 🔗 SketchCow http://digital-archiving.blogspot.com/2013/08/a-short-detective-story-involving-5.html?utm_source=twitterfeed&utm_medium=twitter
04:29 🔗 ambience SketchCow: one on the right. unsure if my hair was as long. it's a picture from around that time though. https://fbcdn-sphotos-f-a.akamaihd.net/hphotos-ak-prn2/167756_10150115961726320_712837_n.jpg
04:30 🔗 ambience I linked to the defcon doc on fb yesterday and you responded to one of my friends who called it self-aggrandizing, haha.
05:56 🔗 SketchCow https://archive.org/details/20130801DEFCONDocumentaryPremiereAudienceReaction
08:01 🔗 Nemo_bis windows is not able to recognise files without extension, srsly?
08:05 🔗 ersi yeah, hehe
10:29 🔗 BlueMax Nemo_bis, it's a fair thing for Windows not to recognize extensionless files
14:26 🔗 SketchCow ambience: Yes, now I recall.
14:44 🔗 ZoeB So is anyone archiving the groups.yahoo.com messages?
15:40 🔗 Baljem hmm, that's a good question - also whether it's possible to archive all of them (ISTR when I flagged a group I used to run as moribund, it prevented public access to the group archives - WTF)
15:41 🔗 ZoeB Looking at one group I'm a member of as an example, it spans back to 2001, which would be quite a lot of information to lose...
15:42 🔗 ZoeB It seems to be publicly accessible, this example, though scraping the plaintexts of all the messages would require a fairly cunning script
15:45 🔗 ZoeB Even having only the still-active groups would be much better than nothing
16:06 🔗 ZoeB I have to go now... just an idea to think about, anyway.
17:33 🔗 SketchCow I am not sure we're archiving them. We should be.
18:03 🔗 balrog on that topic: anyone here good with perl?
18:03 🔗 balrog those are ANNOYING to archive, since most groups require you to be a member to see any of the good stuff
18:03 🔗 balrog someone wrote a perl archiver that kinda works but yahoo broke it with a recent auth change ;(
18:04 🔗 omf_ link Baljem
18:04 🔗 omf_ balrog,
18:04 🔗 omf_ I meant
18:04 🔗 balrog http://sourceforge.net/p/grabyahoogroup/code/127/tree/trunk/ is upstream; https://github.com/balr0g/grabyahoogroup/commits/master is a couple of patches I added to make it more reliable.
18:04 🔗 balrog but I don't really get perl that well
18:05 🔗 balrog usage is as follows: perl5.10 /path/to/grabyahoogroup.pl --username "user" --password "pass" --group groupname --verbose --verbose --verbose --verbose --verbose
18:05 🔗 balrog however yahoo broke auth recently
18:06 🔗 balrog the changes I made cause it to archive a lot slower but make it so you don't get 999s at all :)
18:08 🔗 omf_ Yeah I can fix up the pile of shit you inherited balrog. Looks like Perl from the 90s
18:08 🔗 omf_ I guess there is no point in making it be able to grab multiple pages at once because of Yahoo throttling
18:09 🔗 omf_ To me this is a two step problem. 1. get all the groups. 2. get each group's content
18:10 🔗 omf_ balrog, was that delay time the first one you tried?
18:10 🔗 omf_ Also do you happen to remember how long it took to get banned
18:13 🔗 balrog omf_: no it was like the second or third
18:13 🔗 balrog but it was absolutely solid.
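[Editor's sketch] The throttling discussion above (one page at a time, a fixed delay between requests, no 999 responses) can be illustrated roughly as follows. This is not code from grabyahoogroup. The function name, the delay values, and the doubling backoff are assumptions made for illustration, and `fetch` is a hypothetical caller-supplied function that returns a `(status, body)` pair.

```python
import time

# Status code balrog reports getting from Yahoo when requests come too fast.
THROTTLED = 999


def fetch_politely(fetch, urls, delay=2.0, max_retries=3, sleep=time.sleep):
    """Fetch URLs strictly one at a time with a pause after each one.

    fetch(url) is a hypothetical callable returning (status, body).
    If a request comes back throttled, wait longer and try again.
    """
    results = {}
    for url in urls:
        wait = delay
        for _ in range(max_retries + 1):
            status, body = fetch(url)
            if status != THROTTLED:
                results[url] = (status, body)
                break
            # Throttled: double the wait before retrying this URL.
            wait *= 2
            sleep(wait)
        else:
            # Every attempt was throttled; record the failure and move on.
            results[url] = (THROTTLED, None)
        # Fixed pause between consecutive URLs.
        sleep(delay)
    return results
```

Passing `sleep` as a parameter keeps the pacing logic testable without real waiting. A real run would plug in an HTTP client for `fetch` and tune `delay` by experiment, as balrog did.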
18:13 🔗 omf_ sweet
18:14 🔗 balrog the upstream dude has been extremely busy
18:14 🔗 balrog yeah the perl looked like a pile of shit which is why I got lost looking at it ;(
18:14 🔗 omf_ ewww tons of html parsing with regex for no reason
18:14 🔗 balrog usually I'm not all that bad with new languages
18:14 🔗 balrog yep, that
18:14 🔗 omf_ this is total crap
18:14 🔗 balrog still, if I use an old auth cookie, it just works (in most cases)
18:15 🔗 omf_ are most of the groups in need of auth or are they just public
18:15 🔗 balrog can you salvage any part of this?
18:15 🔗 balrog if you care about files, database, photos, or attachments, then yes in need of auth
18:15 🔗 omf_ It tells me the structure of what to look for and follow which is very useful
18:15 🔗 balrog if you only care about messages, then it's 50/50
18:15 🔗 omf_ do you need authentication to verify a group exists?
18:15 🔗 balrog no
18:16 🔗 balrog well maybe for "private" groups
18:16 🔗 balrog http://launch.groups.yahoo.com/group/yamahadx/ is a typical group which has messages and attachments "public-available"
18:17 🔗 balrog the perl script will detect which sections you have access to and only download those
18:17 🔗 balrog so if you use it without auth on that group it should grab messages and attachments
18:17 🔗 omf_ yes examples are good. I am looking at the groups list to see how easy that will be to build up
18:17 🔗 balrog if you use it with auth it will grab the entire thing
18:17 🔗 balrog you mean to download all groups?
18:17 🔗 omf_ yes
18:17 🔗 balrog right now I'd just make it work for single groups
18:18 🔗 balrog and again I care a lot about files/photos/database since there's tons of good stuff usually
18:18 🔗 balrog the "gross hack" is to get around yahoo sometimes returning an empty message
18:18 🔗 balrog it's a horrible GOTO :p
18:19 🔗 omf_ is it still a response 200 even if the post is empty?
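[Editor's sketch] The empty-message workaround described above could be written as a bounded retry loop rather than a GOTO. This is only an illustration, not the script's actual logic. It assumes an empty body is the sole signal of the problem (whether the status is still 200 is left unanswered in the log), and `fetch`, `max_attempts`, and `pause` are hypothetical names.

```python
import time


def fetch_message(fetch, url, max_attempts=5, pause=1.0, sleep=time.sleep):
    """Return the message body, retrying when the server sends nothing.

    fetch(url) is a hypothetical callable returning the body as a string.
    Returns None if every attempt comes back empty.
    """
    for attempt in range(1, max_attempts + 1):
        body = fetch(url)
        # Treat whitespace-only responses as empty too.
        if body and body.strip():
            return body
        if attempt < max_attempts:
            sleep(pause)
    return None
```

Capping the number of attempts means a message that is genuinely empty cannot trap the archiver in an endless loop.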
18:21 🔗 omf_ The other reason this is so hard to read is it is 8 separate libraries worth of code in one file
18:23 🔗 omf_ GrabYahoo, GrabYahoo::Client, GrabYahoo::Logger, GrabYahoo::Messages, GrabYahoo::Files, GrabYahoo::Attachments, GrabYahoo::Members, GrabYahoo::Photos
18:23 🔗 omf_ are all the library namespaces defined in that file. Standard practice is to only define 1 per file
18:25 🔗 omf_ How did you get the auth token from your account to try this program?
18:28 🔗 balrog by using it before yahoo changed auth
18:29 🔗 omf_ poop
18:42 🔗 balrog omf_: let me know how this goes. I wouldn't mind helping, but this perl is just unreadable ;(
18:53 🔗 DFJustin no, this perl is unreadable :) https://www.cs.cmu.edu/~dst/DeCSS/Gallery/qrpff.pl
18:56 🔗 mistym No, *this* perl is unreadable http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html
18:57 🔗 DFJustin lol
19:13 🔗 DFJustin http://www.neogaf.com/forum/showthread.php?t=652647
19:29 🔗 SketchCow A bright moment in life
19:29 🔗 SketchCow Meanwhile, the archivists have their meetings
19:30 🔗 mistym We need to meet to come up with the perfect solution
19:30 🔗 mistym Guys, if we did something that turned out to not be perfect then that would be bad
19:30 🔗 mistym So let's just wait for perfection
19:31 🔗 mistym *allows literally billions of digital records to be destroyed*
