Time |
Nickname |
Message |
00:27
🔗
|
wp494 |
[18:55:59.859] <WiK> how goes it wp494 ? |
00:27
🔗
|
wp494 |
good, you? |
00:29
🔗
|
WiK |
im ok, just chillin |
01:12
🔗
|
xmc |
hm |
01:12
🔗
|
xmc |
I have some fortunecity data taking up space that I kind of need right now |
01:12
🔗
|
xmc |
well, only a gig |
01:12
🔗
|
xmc |
not that then |
01:24
🔗
|
xmc |
looks like I've got a few hundred gigs of orphaned warc files |
01:40
🔗
|
omf_ |
pending upload xmc ? |
01:47
🔗
|
xmc |
I don't know what to do with them |
01:47
🔗
|
xmc |
from fortunecity, splinder, mobileme, and picplz |
01:47
🔗
|
xmc |
obviously I've missed the boat on the megawarcs :P |
01:49
🔗
|
omf_ |
I just upload loose warcs as new items anyway |
01:50
🔗
|
omf_ |
they will get sucked in at some point |
01:55
🔗
|
xmc |
aye |
02:47
🔗
|
omf_ |
WiK, I am mentioning gitdigger in my docs talk coming up in september |
02:48
🔗
|
SketchCow |
http://ascii.textfiles.com/archives/3974 Inside Brewster's Magnificent Contraption |
02:50
🔗
|
WiK |
sweet |
02:53
🔗
|
omf_ |
I am watching your bsidesLV talk |
02:54
🔗
|
WiK |
ya, ill add the defcon/derbycon vids there as well when i get them |
03:25
🔗
|
dashcloud |
SketchCow: what's the link to your latest talk? |
03:28
🔗
|
SketchCow |
Which one? |
03:36
🔗
|
omf_ |
http://www.archiveteam.org/index.php?title=Talks |
03:38
🔗
|
S[h]O[r]T |
weee |
03:42
🔗
|
S[h]O[r]T |
i want to upload the video with the audio from archive.org..i might just do that |
03:50
🔗
|
dashcloud |
I thought you'd only done one recently. I'll check the list omf_ provided. Thanks! |
04:09
🔗
|
SketchCow |
I did the one at NDSA |
04:09
🔗
|
SketchCow |
I did a speech at DEFCON about the documentary, that's not out yet |
04:09
🔗
|
SketchCow |
http://archive.org/details/20130724JasonScottNDSADigitalPreservation2013ArchiveTeam |
04:14
🔗
|
ambience |
Looks like the draft archive team talk from DC17 isn't on there. |
04:15
🔗
|
ambience |
makes sense though, it was a draft |
04:16
🔗
|
SketchCow |
The skytalk? |
04:16
🔗
|
ambience |
yeah |
04:16
🔗
|
SketchCow |
The skytalks are not recorded, on purpose, for this reason. |
04:16
🔗
|
ambience |
ah, that makes sense |
04:16
🔗
|
SketchCow |
I gave a talk on engineering fame that was a real rough sketch |
04:16
🔗
|
SketchCow |
It may never see the light of recorded day. |
04:16
🔗
|
SketchCow |
But I could give it at skytalks |
04:17
🔗
|
SketchCow |
Yeah, the whole point of skytalks is to give people a chance to fly and for people to see betas |
04:17
🔗
|
ambience |
I really enjoyed the archive team skytalk, totally didn't realize that was the purpose of them |
04:17
🔗
|
SketchCow |
If you were in the room, you were lucky |
04:18
🔗
|
ambience |
that i was |
04:18
🔗
|
ambience |
also had a cool hallway convo with you afterward. it was fun. |
04:23
🔗
|
SketchCow |
I'd have to see a picture to remember you, but I'm sure I would. |
04:23
🔗
|
SketchCow |
I meet a lot of people in the course of a DEFCON. |
04:24
🔗
|
SketchCow |
People walking with me find it ridiculous |
04:25
🔗
|
SketchCow |
http://digital-archiving.blogspot.com/2013/08/a-short-detective-story-involving-5.html?utm_source=twitterfeed&utm_medium=twitter |
04:29
🔗
|
ambience |
SketchCow: one on the right. unsure if my hair was as long. it's a picture from around that time though. https://fbcdn-sphotos-f-a.akamaihd.net/hphotos-ak-prn2/167756_10150115961726320_712837_n.jpg |
04:30
🔗
|
ambience |
I linked to the defcon doc on fb yesterday and you responded to one of my friends who called it self-aggrandizing, haha. |
05:56
🔗
|
SketchCow |
https://archive.org/details/20130801DEFCONDocumentaryPremiereAudienceReaction |
08:01
🔗
|
Nemo_bis |
windows is not able to recognise files without extension, srsly? |
08:05
🔗
|
ersi |
yeah, hehe |
10:29
🔗
|
BlueMax |
Nemo_bis, it's a fair thing for Windows not to recognize extensionless files |
14:26
🔗
|
SketchCow |
ambience: Yes, now I recall. |
14:44
🔗
|
ZoeB |
So is anyone archiving the groups.yahoo.com messages? |
15:40
🔗
|
Baljem |
hmm, that's a good question - also whether it's possible to archive all of them (ISTR when I flagged a group I used to run as moribund, it prevented public access to the group archives - WTF) |
15:41
🔗
|
ZoeB |
Looking at one group I'm a member of as an example, it spans back to 2001, which would be quite a lot of information to lose... |
15:42
🔗
|
ZoeB |
It seems to be publicly accessible, this example, though scraping the plaintexts of all the messages would require a fairly cunning script |
15:45
🔗
|
ZoeB |
Even having only the still-active groups would be much better than nothing |
16:06
🔗
|
ZoeB |
I have to go now... just an idea to think about, anyway. |
17:33
🔗
|
SketchCow |
I am not sure we're archiving them. We should be. |
18:03
🔗
|
balrog |
on that topic: anyone here good with perl? |
18:03
🔗
|
balrog |
those are ANNOYING to archive, since most groups require you to be a member to see any of the good stuff |
18:03
🔗
|
balrog |
someone wrote a perl archiver that kinda works but yahoo broke it with a recent auth change ;( |
18:04
🔗
|
omf_ |
link Baljem |
18:04
🔗
|
omf_ |
balrog, |
18:04
🔗
|
omf_ |
I meant |
18:04
🔗
|
balrog |
http://sourceforge.net/p/grabyahoogroup/code/127/tree/trunk/ is upstream; https://github.com/balr0g/grabyahoogroup/commits/master is a couple of patches I added to make it more reliable. |
18:04
🔗
|
balrog |
but I don't really get perl that well |
18:05
🔗
|
balrog |
usage is as follows: perl5.10 /path/to/grabyahoogroup.pl --username "user" --password "pass" --group groupname --verbose --verbose --verbose --verbose --verbose |
18:05
🔗
|
balrog |
however yahoo broke auth recently |
18:06
🔗
|
balrog |
the changes I made cause it to archive a lot slower but make it so you don't get 999s at all :) |
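The trade-off described here (slower archiving in exchange for no 999s) can be sketched roughly as below. This is not balrog's patch: it is a minimal Python illustration that assumes "999" is a status code signalling throttling, and `fetch_slowly`, `fetch`, `delay`, and `max_retries` are hypothetical names invented for the example.

```python
import time

THROTTLED = 999  # assumption: the "999s" in the log are a throttling status code


def fetch_slowly(urls, fetch, delay=5.0, max_retries=3, sleep=time.sleep):
    """Fetch URLs one at a time, pausing between requests.

    `fetch(url)` is a hypothetical callable returning (status, body).
    On a throttled response the same URL is retried after a doubling wait.
    """
    results = {}
    for url in urls:
        backoff = delay
        retries = 0
        status, body = fetch(url)
        while status == THROTTLED and retries < max_retries:
            backoff *= 2          # wait longer after each throttled response
            sleep(backoff)
            status, body = fetch(url)
            retries += 1
        results[url] = (status, body)
        sleep(delay)              # fixed pause before the next URL
    return results
```

The `sleep` parameter only exists so the pauses can be swapped out when trying the sketch without real waiting; a suitable `delay` would have to be found by experiment, as it was in the log.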
18:08
🔗
|
omf_ |
Yeah I can fix up the pile of shit you inherited balrog. Looks like Perl from the 90s |
18:08
🔗
|
omf_ |
I guess there is no point in making it be able to grab multiple pages at once because of Yahoo throttling |
18:09
🔗
|
omf_ |
To me this is a two step problem. 1. get all the groups. 2. get each group's content |
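The two steps could be wired together roughly as follows. This is only a sketch under assumptions: `list_groups` and `grab_group` are hypothetical callables standing in for whatever ends up enumerating groups and downloading one group, and neither corresponds to anything in grabyahoogroup.

```python
def archive_all(list_groups, grab_group, already_done=()):
    """Step 1: enumerate group names. Step 2: grab each group's content.

    Both callables are hypothetical placeholders. Groups listed in
    `already_done` are skipped so an interrupted run can be resumed.
    """
    seen = set(already_done)
    done, failed = [], []
    for name in list_groups():            # step 1: get all the groups
        if name in seen:
            continue                      # duplicate or previously archived
        seen.add(name)
        try:
            grab_group(name)              # step 2: get this group's content
            done.append(name)
        except Exception as exc:          # record the failure and keep going
            failed.append((name, str(exc)))
    return done, failed
```

Keeping the steps separate means the group list can be built once and saved, while the slow per-group downloads are retried or resumed independently.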
18:10
🔗
|
omf_ |
balrog, was that delay time the first one you tried? |
18:10
🔗
|
omf_ |
Also do you happen to remember how long it took to get banned |
18:13
🔗
|
balrog |
omf_: no it was like the second or third |
18:13
🔗
|
balrog |
but it was absolutely solid. |
18:13
🔗
|
omf_ |
sweet |
18:14
🔗
|
balrog |
the upstream dude has been extremely busy |
18:14
🔗
|
balrog |
yeah the perl looked like a pile of shit which is why I got lost looking at it ;( |
18:14
🔗
|
omf_ |
ewww tons of html parsing with regex for no reason |
18:14
🔗
|
balrog |
usually I'm not all that bad with new languages |
18:14
🔗
|
balrog |
yep, that |
18:14
🔗
|
omf_ |
this is total crap |
18:14
🔗
|
balrog |
still, if I use an old auth cookie, it just works (in most cases) |
18:15
🔗
|
omf_ |
are most of the groups in need of auth or are they just public |
18:15
🔗
|
balrog |
can you salvage any part of this? |
18:15
🔗
|
balrog |
if you care about files, database, photos, or attachments, then yes in need of auth |
18:15
🔗
|
omf_ |
It tells me the structure of what to look for and follow which is very useful |
18:15
🔗
|
balrog |
if you only care about messages, then it's 50/50 |
18:15
🔗
|
omf_ |
do you need authentication to verify a group exists? |
18:15
🔗
|
balrog |
no |
18:16
🔗
|
balrog |
well maybe for "private" groups |
18:16
🔗
|
balrog |
http://launch.groups.yahoo.com/group/yamahadx/ is a typical group which has messages and attachments "public-available" |
18:17
🔗
|
balrog |
the perl script will detect which sections you have access to and only download those |
18:17
🔗
|
balrog |
so if you use it without auth on that group it should grab messages and attachments |
18:17
🔗
|
omf_ |
yes examples are good. I am looking at the groups list to see how easy that will be to build up |
18:17
🔗
|
balrog |
if you use it with auth it will grab the entire thing |
18:17
🔗
|
balrog |
you mean to download all groups? |
18:17
🔗
|
omf_ |
yes |
18:17
🔗
|
balrog |
right now I'd just make it work for single groups |
18:18
🔗
|
balrog |
and again I care a lot about files/photos/database since there's tons of good stuff usually |
18:18
🔗
|
balrog |
the "gross hack" is to get around yahoo sometimes returning an empty message |
18:18
🔗
|
balrog |
it's a horrible GOTO :p |
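The same workaround can be written as a bounded retry loop rather than a GOTO. The sketch below is in Python, not the script's Perl, and is not the actual hack: `fetch_message`, `fetch`, `max_attempts`, and `delay` are hypothetical names, and it assumes an empty message shows up as an empty or whitespace-only body.

```python
import time


def fetch_message(fetch, msg_id, max_attempts=5, delay=2.0, sleep=time.sleep):
    """Re-request a message until a non-empty body comes back.

    `fetch(msg_id)` is a hypothetical callable returning the message body
    as a string. Returns None if every attempt comes back empty.
    """
    for attempt in range(1, max_attempts + 1):
        body = fetch(msg_id)
        if body and body.strip():
            return body                   # got real content
        if attempt < max_attempts:
            sleep(delay)                  # pause before asking again
    return None
```

Capping the attempts avoids looping forever on a message that is genuinely empty, which an unconditional jump back would not.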
18:19
🔗
|
omf_ |
is it still a response 200 even if the post is empty? |
18:21
🔗
|
omf_ |
The other reason this is so hard to read is it is 8 separate libraries worth of code in one file |
18:23
🔗
|
omf_ |
GrabYahoo, GrabYahoo::Client, GrabYahoo::Logger, GrabYahoo::Messages, GrabYahoo::Files, GrabYahoo::Attachments, GrabYahoo::Members, GrabYahoo::Photos |
18:23
🔗
|
omf_ |
are all the library namespaces defined in that file. Standard practice is to only define 1 per file |
18:25
🔗
|
omf_ |
How did you get the auth token from your account to try this program? |
18:28
🔗
|
balrog |
by using it before yahoo changed auth |
18:29
🔗
|
omf_ |
poop |
18:42
🔗
|
balrog |
omf_: let me know how this goes. I wouldn't mind helping, but this perl is just unreadable ;( |
18:53
🔗
|
DFJustin |
no, this perl is unreadable :) https://www.cs.cmu.edu/~dst/DeCSS/Gallery/qrpff.pl |
18:56
🔗
|
mistym |
No, *this* perl is unreadable http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html |
18:57
🔗
|
DFJustin |
lol |
19:13
🔗
|
DFJustin |
http://www.neogaf.com/forum/showthread.php?t=652647 |
19:29
🔗
|
SketchCow |
A bright moment in life |
19:29
🔗
|
SketchCow |
Meanwhile, the archivists have their meetings |
19:30
🔗
|
mistym |
We need to meet to come up with the perfect solution |
19:30
🔗
|
mistym |
Guys, if we did something that turned out to not be perfect then that would be bad |
19:30
🔗
|
mistym |
So let's just wait for perfection |
19:31
🔗
|
mistym |
*allows literally billions of digital records to be destroyed* |