#archiveteam 2014-02-06,Thu

Time Nickname Message
01:25 πŸ”— dashcloud SketchCow: I'm archiving the Atari section from ftp.inf.tu-dresden.de , and some of the folders have a nice ASCII art greeting when you enter the folder. How should that be preserved for presentation on IA? (right now, I just copied it to a text file, but no idea how it should be handled on IA)
02:02 πŸ”— chfoo i think the #rawdogster grab scripts are ready for grabbing profiles
02:08 πŸ”— SketchCow chfoo: Good
02:08 πŸ”— SketchCow want to do some tests?
02:11 πŸ”— Coderjoe dashcloud: on many servers, that information is in the .message file, which the server loads and sends when you enter that directory. check to see if that's the case and make sure to grab it if it is.
02:14 πŸ”— Coderjoe that looks like the case for that site
02:15 πŸ”— Coderjoe and it might be that the client requests it upon changing directories. I don't remember anymore.
02:16 πŸ”— Coderjoe example: ftp://ftp.inf.tu-dresden.de/software/atari/Checkpoint/.message
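A minimal sketch of what Coderjoe describes, assuming the greeting lives in a plain `.message` file inside each directory. The helper name `fetch_message` is hypothetical; the calls are from Python's standard `ftplib`. The file is fetched in binary mode so the ASCII art is kept byte-for-byte.

```python
from ftplib import error_perm
import posixpath

def fetch_message(ftp, directory, name=".message"):
    """Return the directory's greeting file as bytes, or None if the server refuses it."""
    path = posixpath.join(directory, name)
    chunks = []
    try:
        # retrbinary hands each received block of bytes to the callback.
        ftp.retrbinary("RETR " + path, chunks.append)
    except error_perm:
        # Raised for permanent errors such as "550 No such file".
        return None
    return b"".join(chunks)

# Usage against the host named in the log (needs network access):
#   from ftplib import FTP
#   with FTP("ftp.inf.tu-dresden.de") as ftp:
#       ftp.login()  # anonymous login
#       data = fetch_message(ftp, "/software/atari/Checkpoint")
```

Saving the returned bytes next to the mirrored files keeps the greeting with the directory it belongs to.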
02:32 πŸ”— chfoo SketchCow: sorry, i'm a bit too tired right now. but i plan to do more testing tomorrow
02:32 πŸ”— chfoo unless someone here wants to test the scripts out, more than welcome to
02:49 πŸ”— SketchCow No problem
03:04 πŸ”— dashcloud thanks Coderjoe !
04:18 πŸ”— dashcloud hi folks, this is a pretty amazing FTP site: ftp.cs.tu-berlin.de tons of content, many mirrors of older content, and a pre-made index file to browse through all the stuff that's there - download INDEX
04:21 πŸ”— SketchCow Want me to grab the copy, or are you
04:22 πŸ”— dashcloud can you?
04:23 πŸ”— godane now this is funny
04:23 πŸ”— dashcloud the index alone is 78 MB (that's pure text)
04:23 πŸ”— godane i found a digg dialogg with timothy geithner in my wall street journal video dumps
06:35 πŸ”— RedType holy shit they have a pirated copy of matlab dated 1992 with an instruction manual from 1981 on that ftp
06:35 πŸ”— RedType Awesome.
06:36 πŸ”— RedType http://www.martinreddy.net/ukvrsig/
07:42 πŸ”— h2u WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
07:43 πŸ”— h2u Yes I take things literally
07:52 πŸ”— GLaDOS yahoosucks
07:53 πŸ”— h2u haha thanks
07:54 πŸ”— nyu apparently efnet hates me
07:55 πŸ”— nyu Either that or my ident is banned
07:56 πŸ”— nyu_ so guess the french hate me, i was trying to use irc.efnet.fr
07:57 πŸ”— nyu_ Doubt ArKiver is around ?
08:28 πŸ”— SketchCow It's a little late for him.
08:28 πŸ”— SketchCow He's probably tucked into bed with his teddy bear and pudding cup.
08:29 πŸ”— midas SketchCow: would it be an idea to put who is grabbing which FTP in the wiki?
08:30 πŸ”— SketchCow I was just going to go along grabbing things from lists that people were going to give me.
08:30 πŸ”— SketchCow But who knows where we are now.
08:30 πŸ”— SketchCow If you want to start a wiki page, sure.
08:31 πŸ”— SketchCow But right now I'm just slamming down lists.
08:31 πŸ”— midas im grabbing 5 FTP's right now but they are +12TB in size...
08:31 πŸ”— SketchCow Sweet.
08:32 πŸ”— * SketchCow is currently, this second, ingesting the talks of a January 2014 hacker conference.
08:32 πŸ”— midas how much space do you got in fos anyway? seems that that box cant be filled :p
08:33 πŸ”— SketchCow Well, it has 17tb
08:33 πŸ”— SketchCow Which is enough for me to get things off it pretty easily.
08:39 πŸ”— SketchCow https://archive.org/details/ShmooCon2014_Attacker_Ghost_Stories (Typical talk)
08:42 πŸ”— Nemo_bis midas: even if there isn't any list, or there is one but you can't find it, noting your 5 sites on [[FTP]] (or [[Talk:FTP]]) can't harm
08:43 πŸ”— midas true, will start that in a minute, first have to get some users off my back ;-)
08:48 πŸ”— godane very funny: http://abcnews.go.com/WNT/video?id=1831172
08:48 πŸ”— godane video will not play at all
08:48 πŸ”— GLaDOS oh god slashdot
08:48 πŸ”— * joepie91 waves at midas
08:49 πŸ”— SketchCow Hooray for grabbing slashdot.
08:49 πŸ”— GLaDOS SCHOOL DONE FOR THE DAY check notifications OOH NEW EMAIL whats this SLASHDOT
08:49 πŸ”— * midas waves at joepie91
08:51 πŸ”— Konata_ hey joepie91
08:52 πŸ”— joepie91 ohai Konata
08:52 πŸ”— joepie91 ohai Konata_
08:52 πŸ”— joepie91 and hai midas :P
08:52 πŸ”— yipdw oh, yeah
08:52 πŸ”— yipdw I forgot that those are online
08:52 πŸ”— yipdw anyway, https://github.com/ArchiveTeam/slashdot-grab
08:53 πŸ”— yipdw I have outlined my preferred strategy in the STRATEGY file
08:53 πŸ”— yipdw please PR, etc.
08:53 πŸ”— yipdw we can grab user accounts and stuff later, but the real value is the discussion and stories
08:53 πŸ”— yipdw IMO
08:54 πŸ”— midas grabbing slashdot again?
08:54 πŸ”— godane so it turns out i may be able to get some webcasts from april 2006
08:54 πŸ”— yipdw midas: if there was another grab, I don't know about it
08:55 πŸ”— midas oh, somehow i thought it was done already. no worries
08:55 πŸ”— yipdw I need to zzz for now
08:55 πŸ”— midas need to keep those archive up 2 date :p
08:55 πŸ”— yipdw that said, part (1) of STRATEGY is really "URL discovery"
08:56 πŸ”— yipdw and parts (2) and (3) are "content fetch"
08:56 πŸ”— yipdw so if that helps to think about it :P
08:59 πŸ”— midas so added a small list, doing a du -sh now to see how big the folders are again, will take about 4 hours tho :p
09:07 πŸ”— godane good news everyone
09:07 πŸ”— godane i found the real server where abcnews hosts things
09:08 πŸ”— godane real path with variables to get the idea: http://cdn.ctnhd.com/storage/naeast1/abcnews.origin.cdn.level3.net/published/${year:2:4}/$month/$filename
09:09 πŸ”— midas nice
09:29 πŸ”— godane so it looks like april of 2006 for the most part will get saved
09:29 πŸ”— godane most of the world news webcast still exist and the full episodes of world news tonight still exist
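A rough sketch of how godane's URL pattern expands, written in Python rather than shell. The function name `build_url`, the zero-padded month, and the sample filename are assumptions for illustration; only the base path and the `${year:2:4}` substring come from the log.

```python
BASE = "http://cdn.ctnhd.com/storage/naeast1/abcnews.origin.cdn.level3.net/published"

def build_url(year, month, filename):
    """Fill in the path template quoted in the log."""
    # Bash's ${year:2:4} takes up to 4 characters starting at offset 2,
    # so a four-digit year like "2006" is reduced to "06".
    short_year = year[2:6]
    return f"{BASE}/{short_year}/{month}/{filename}"

# Hypothetical filename; the real names are not given in the log.
print(build_url("2006", "04", "example.flv"))
```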
10:09 πŸ”— GLaDOS Also, I propose #slashdocs for slashdo
10:09 πŸ”— GLaDOS t
10:22 πŸ”— joepie91 so, uh
10:22 πŸ”— joepie91 <incog>Hi Viddlers,
10:22 πŸ”— joepie91 <incog>In 2006, Viddler’s founding business model was based on the creation of a community site for video enthusiasts and personal sharing. At the time, our business revenue model was driven through advertising. As a Viddler community user, you were a part of this model. As time has passed Viddler is no longer able to support this offering and business model.
10:22 πŸ”— joepie91 <incog>Therefore we’ve made the decision to close our free site and community effective March 11th, 2014.
10:22 πŸ”— joepie91 yesterday
10:22 πŸ”— joepie91 it looks like nobody on the interwebs has caught wind of this yet?
10:23 πŸ”— joepie91 (also, GLaDOS, have you renewed archivingyoursh.it yet :P)
10:24 πŸ”— SketchCow Talked about it here
10:45 πŸ”— GLaDOS ah, right, archivingyoursh.it
10:49 πŸ”— GLaDOS Kenshin: ot
10:49 πŸ”— GLaDOS Kenshin: whoops, sorry.
10:50 πŸ”— GLaDOS joepie91: it should be up in a few hours, just waiting on ns changes
10:51 πŸ”— joepie91 GLaDOS: :D
16:22 πŸ”— arkiver <SketchCow>It's a little late for him.
16:22 πŸ”— arkiver <SketchCow>He's probably tucked into bed with his teddy bear and pudding cup.
16:22 πŸ”— arkiver SketchCow why did you say that?
16:37 πŸ”— Jonimus arkiver: because http://radio.notacon.org/2011/shows/Fuck%20Jason%20Scott.mp3 ;)
16:53 πŸ”— joepie91 lol
17:54 πŸ”— midas right
18:50 πŸ”— Jonimus arkiver: that was a joke if you didn't catch it. ;)
20:59 πŸ”— SketchCow Viddler is now talking to me.
20:59 πŸ”— SketchCow I'm being the strong face
22:23 πŸ”— danneh_ Hey guys, writing a bit of an archive tool for a site I like
22:23 πŸ”— danneh_ Just wondering, is it good practice to grab a warc file for everything I download (images, html pages, epub files, etc)?
22:24 πŸ”— danneh_ I think it is, shouldn't add any data to the request
22:24 πŸ”— danneh_ and if I'm worried about space, I can just buy another hard drive or something :P
22:26 πŸ”— ivan` yes, WARC everything
22:27 πŸ”— danneh_ yep, figured it was the best thing to do
22:27 πŸ”— danneh_ easy importing into wayback later, if we want to
22:31 πŸ”— yipdw danneh_: if the site isn't huge, we also have a bot you can use to grab the site
22:32 πŸ”— Dud1 Are sites like ebay and amazon archived often?
22:33 πŸ”— danneh_ yipdw: it's a bit big, about 170k user stories. It uses a whole bunch of JS for comments and AJAX and all that, so figured writing a custom Py script was probably best
22:33 πŸ”— danneh_ and possibly later on, look into warrior scripts and all that jazz
22:37 πŸ”— yipdw try #archivebot
22:37 πŸ”— yipdw danneh_: we've shoved bigger things (are shoving bigger things) at it
22:37 πŸ”— yipdw unless the whole site is utterly unusable without Javascript, it's easier than coding something custom and verifying the result
22:39 πŸ”— danneh_ yipdw: fair enough. I'll hop in there after work and have a bit of a look
22:40 πŸ”— danneh_ thanks for the info!
23:02 πŸ”— SketchCow https://twitter.com/textfiles/status/430974554219888640
23:02 πŸ”— SketchCow bwah ha ha
23:02 πŸ”— SketchCow WARC is good for a lot of things, especially static data (even if generated on the server)
23:03 πŸ”— SketchCow But if it's heavy javascript or weird constructions, there's sometimes going to be problems.
23:08 πŸ”— danneh_ aha, makes sense. most of it's static files (including static js, it uses ?random to force redownload every time, but I just strip it to the bare url), so I'm not too worried for now
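A small sketch of the URL clean-up danneh_ mentions, assuming the query string on these static files carries nothing except the cache-busting value. The helper name `strip_cache_buster` is hypothetical, and it would be wrong for URLs whose query parameters select real content.

```python
from urllib.parse import urlsplit, urlunsplit

def strip_cache_buster(url):
    """Drop the query string and fragment, keeping scheme, host, and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

# Example with a made-up URL:
print(strip_cache_buster("http://example.com/static/app.js?0.8231"))
```

Normalising the URL before fetching means the same file is requested once instead of once per random value.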
