#archiveteam 2013-07-27,Sat


Time Nickname Message
02:49 kyan Hello! I was just looking about at Poetry.com, and it looks like the new management has put most of the old (14million) poems back up, in some sort of "old poems" thing. Is there any project underway to scrape them all?
03:01 GLaDOS At the moment, no, there's no current project to scrape them all.
03:01 GLaDOS Although, that might be a good idea to do..
03:01 omf_ I agree
03:02 omf_ we shall do some research ^_^
18:03 chivist hi there- new member here
18:03 joepie91 why hello there :P
18:04 joepie91 right, to continue on our conversation - in the future, consider using wget + warc for archiving a site... it also saves error responses, headers, etc.
18:04 chivist saved a bunch of websites with httrack, how can I share them?
18:04 winr4r chivist: WHY HELLO THERE
18:04 joepie91 so it gives you a full snapshot of a site
18:04 joepie91 as for your question, my suggestion would be "upload them to IA" but perhaps winr4r has a different insight
18:04 winr4r nope, that's what i was gonna say too
18:05 joepie91 right :P
18:05 winr4r it was gonna be "chivist: upload them to "community texts" on archive.org and tell SketchCow to put them into the archive team collection"
18:05 joepie91 well, that's -somewhat- more specific...
18:05 joepie91 (than what I said)
18:06 winr4r or just talk to SketchCow to get some FTP/rsync space and he'll take care of it
18:06 winr4r what did you grab, chivist?
18:06 chivist tbd.com, a ny-based news website
18:06 chivist they put a redirect on the main page, so I archived individual sections
18:06 winr4r oh, it closed?
18:07 chivist yea the journos there complained it was down, so the company put it -somewhat- back online for a couple of days
18:07 joepie91 aside from that, chivist, I'm not sure if you've used wget before, but in the newer versions there's support for directly saving to WARC - you'll want to use the --mirror switch to put it into mirror mode, and specify --warc-file=something.warc.gz to indicate the filename to save it to... you can then upload the warc.gz to the Internet Archive and it'll have all the important data, and will be importable into the Wayback Machine
18:07 winr4r good goddamn
18:07 joepie91 (well, that took a while to type...)
18:08 chivist yea I'm relatively new to archiving
18:08 winr4r they could have just dumped a static version of it and put it online for like $5/month
18:08 joepie91 chivist: we've all been there :)
18:08 joepie91 well, except for SketchCow maybe
18:08 joepie91 I think he was born an archivist
18:09 chivist anyway the archive team is pretty amazing
18:09 winr4r SAVES COPY OF DNA ON THE WAY OUT
18:09 winr4r chivist: yeah we are fucking awesome
18:09 winr4r stick around
18:09 chivist I'm Canadian
18:09 chivist and we lost a couple great websites in the last year
18:09 winr4r chivist: i forgive you!
18:09 chivist (ah!)
18:09 winr4r aw :/
18:10 chivist I managed to grab a Quebec news website before it went down
18:11 chivist fucking Quebecor, shut down like 8 different papers in Canada
18:13 winr4r fuck man
18:13 winr4r although
18:13 winr4r news sites, in general, don't seem to have an idea of their importance
18:13 winr4r because for the most part
18:14 winr4r once you get outside of the big ones, they fucking SUCK at switching CMSes
18:14 chivist ahah
18:14 chivist ever heard about the CBC?
18:14 chivist they're the national broadcasting corporation
18:14 chivist and their CMS dates from 2003
18:14 winr4r it's like every six months, okay switching CMS and they don't fucking CARE that every link is broken
18:15 winr4r hey, did you know SketchCow (our jason) was on CBC? :)
18:15 winr4r OBVIOUSLY I SAVED A LOCAL COPY
18:15 chivist no! do you have a link?
18:16 winr4r ONE SEC
18:16 winr4r http://j5.video2.blip.tv/9030006914560/Dmisener-JasonScottInterview888.mp3?ir=13096&sr=131
18:17 joepie91 >video2.blip.tv
18:17 joepie91 >mp3
18:17 joepie91 ok.png
18:17 winr4r joepie91: you're adorable
18:17 joepie91 D:
18:17 winr4r this was the full uncut interview
18:17 chivist how much data have you saved so far>
18:17 chivist ?
18:18 winr4r chivist: at a guess, over half a petabyte
18:18 winr4r i remember jason saying that he, personally, had put about 200 terabytes into archive.org
18:18 chivist gimme a second, I'm picking up my jaw on the floor
18:18 chivist HALF A PETABYTE
18:18 winr4r or some crazy-ass figure like that
18:19 winr4r mobileme alone, was 272 terabytes
18:20 winr4r and jason has just been pumping stuff almost 24/7
18:20 winr4r ("almost" because he has to sleep once a month)
18:20 joepie91 lol
18:21 chivist ahah
18:21 chivist does he have a dedicated server grabbing websites?
18:21 chivist I can't believe how much our internet sucks over here
18:21 winr4r chivist: probably
18:23 winr4r chivist: so how did you find us? :)
18:23 chivist I was looking at the Wayback machine
18:23 chivist and found you on archive.org
18:23 winr4r ah, gotcha
18:24 winr4r what are your talents? :)
18:24 chivist well
18:25 chivist besides eating a lot of cheese
18:25 chivist I don't really have any talents
18:25 winr4r it's okay, that was not some test to see if you are worthy
18:26 winr4r we're self-described as "rogue archivists, programmers, writers and loudmouths", we need loudmouths too :)
18:26 joepie91 cheese!
18:27 chivist well I'm more like a quiet guy
18:27 chivist very sneaky
18:28 winr4r chivist: are you a blogger, or a twitterer or anything?
18:28 winr4r you can do a lot for us just by spreading the word
18:29 chivist yes a journalist but I don't want to mix both- the bullshit concept of "objectiveness" prevents me from being involved publicly in a lot of things
18:30 winr4r fuck yeahhhh
18:30 winr4r go write a story about us
18:30 winr4r because we're awesome
18:30 chivist it's on my to-do list
18:30 winr4r you can do a whole lot by just spreading the word
18:30 winr4r PUT IT ON YOUR "DO NOW" LIST MOTHERFUCKER
18:30 chivist We don't have WIRED in canada
18:31 chivist so most technology stories are… lame
18:31 chivist yea yea
18:31 * joepie91 takes winr4r, puts him down in the corner, and gives him a cookie
18:31 joepie91 :P
18:31 * winr4r snuggles joepie91.
18:31 chivist well there are tons of things happening, nobody reporting on it
18:31 winr4r chivist: jason is so hilarious that you *need* him in whatever publication you are doing
18:32 chivist SketchCow?
18:32 winr4r yup, that's our jason!
18:33 chivist so I'm looking at Wget
18:33 chivist the benefit of WARC is that it can be used in the Waybackmachine?
18:33 SmileyG yes, or anyone else can use it
18:33 winr4r chivist: yes
18:33 joepie91 that, and it holds more data
18:33 SmileyG theres various tools that will load them
18:33 joepie91 it has all the headers, for example, iirc
18:33 winr4r it saves the headers and other metabollocks
18:33 joepie91 and can store error pages
18:34 joepie91 so even the error pages are archived!
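A minimal sketch of what "it has all the headers" and "can store error pages" look like in practice, assuming a single uncompressed WARC/1.0 response record built by hand. The URL, the server name and the helper `parse_warc_response` are hypothetical, the record omits fields a real WARC carries (such as WARC-Record-ID and WARC-Date), and this is not how wget or the Wayback Machine process WARCs:

```python
def parse_warc_response(raw):
    """Split one uncompressed WARC response record into its parts."""
    # The WARC headers end at the first blank line; the record block follows.
    warc_head, _, rest = raw.partition(b"\r\n\r\n")
    warc_lines = warc_head.decode("utf-8").split("\r\n")
    version = warc_lines[0]
    warc_headers = dict(line.split(": ", 1) for line in warc_lines[1:])

    # Content-Length says how many bytes of the block belong to this record.
    block = rest[: int(warc_headers["Content-Length"])]

    # For a "response" record the block is the raw HTTP response:
    # status line, HTTP headers, blank line, body.
    http_head, _, body = block.partition(b"\r\n\r\n")
    http_lines = http_head.decode("iso-8859-1").split("\r\n")
    status_line = http_lines[0]
    http_headers = dict(line.split(": ", 1) for line in http_lines[1:])
    return version, warc_headers, status_line, http_headers, body


# A hand-written 404 response (hypothetical server and page).
HTTP_RESPONSE = b"".join([
    b"HTTP/1.1 404 Not Found\r\n",
    b"Content-Type: text/html\r\n",
    b"Server: ExampleServer\r\n",
    b"\r\n",
    b"<h1>Gone</h1>",
])

# The same response wrapped in a simplified WARC record.
RECORD = b"".join([
    b"WARC/1.0\r\n",
    b"WARC-Type: response\r\n",
    b"WARC-Target-URI: http://example.com/missing\r\n",
    b"Content-Type: application/http; msgtype=response\r\n",
    b"Content-Length: " + str(len(HTTP_RESPONSE)).encode("ascii") + b"\r\n",
    b"\r\n",
    HTTP_RESPONSE,
    b"\r\n\r\n",
])

version, warc_headers, status_line, http_headers, body = parse_warc_response(RECORD)
print(warc_headers["WARC-Target-URI"], "->", status_line)
print(http_headers)
```

The point of the sketch is that the error status and the server's headers sit in the record next to the page body, which a plain mirror of files on disk does not keep.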
18:34 chivist so do I run Wget first
18:34 chivist then WARC
18:34 winr4r chivist: psst go listen to that interview, because jason can just TALK FOREVER
18:34 chivist or both
18:34 SmileyG chivist: you run a wget which has warc support and outputs a warc
18:35 joepie91 WARC is a format, not an application :)
18:35 chivist oook
18:35 winr4r chivist: WARC is an output format, wget can save WARCs
18:35 joepie91 <joepie91>aside from that, chivist, I'm not sure if you've used wget before, but in the newer versions there's support for directly saving to WARC - you'll want to use the --mirror switch to put it into mirror mode, and specify --warc-file=something.warc.gz to indicate the filename to save it to... you can then upload the warc.gz to the Internet Archive and it'll have all the important data, and will be importable into the Wayback Machine
18:35 joepie91 also refering back to my earlier
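The wget invocation described above can be sketched as follows. This only assembles and prints the command line; the helper `build_wget_warc_command`, the URL and the base name `example-site` are hypothetical. One caveat that goes beyond the chat: to my knowledge wget treats the `--warc-file` value as a base name and appends `.warc.gz` itself, so the sketch passes the name without the extension.

```python
import shlex
from typing import List


def build_wget_warc_command(url: str, warc_base: str) -> List[str]:
    """Assemble a wget command that mirrors `url` and records a WARC.

    `warc_base` is the WARC file name without an extension (assumption:
    wget adds ".warc.gz" to it on its own).
    """
    return [
        "wget",
        "--mirror",                    # mirror mode, as suggested in the chat
        f"--warc-file={warc_base}",    # also write everything fetched to a WARC
        url,
    ]


# Hypothetical site and file name, for illustration only.
command = build_wget_warc_command("http://example.com/", "example-site")
print(shlex.join(command))
# prints: wget --mirror --warc-file=example-site http://example.com/

# To actually run it (needs a wget build with WARC support on PATH):
# import subprocess
# subprocess.run(command, check=True)
```

Keeping the command as a list rather than one string means it can go straight to `subprocess.run` without shell quoting problems.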
18:36 * winr4r pets joepie91
18:36 chivist I'm looking at the archive team page about it
18:36 chivist random question: where do you store everything?
18:36 winr4r chivist: archive.org
18:36 chivist I mean, do you rent servers/use your own
18:36 winr4r if you like giving to charity, archive.org is the best value-for-money that there is
18:37 winr4r $1.5 million a year for storing petabytes of shit
18:37 SmileyG chivist: we have some members who have vps's which they allow us to use.
18:37 winr4r SmileyG: we use that intermediately
18:37 chivist I assume you don't have an office right?
18:37 chivist beside IRC
18:37 winr4r lol no
18:38 chivist with a giant poster "We're going to rescue your shit"
18:38 winr4r we're a bunch of folks from all over the world
18:38 winr4r #archiveteam IS our office, chivist
18:38 SmileyG We have Jason's home? :D
18:39 joepie91 I guess Jasons information cube is kind of an office
18:39 joepie91 except only the CEO works there
18:39 joepie91 :p
18:39 winr4r :D
18:39 winr4r yes
18:39 SmileyG GRAND HIGH POOHBAR.
18:40 winr4r chivist: we're not an organisation in the normal sense of folks who get together in person and then do things
18:40 winr4r we're more like a global lynch mob
18:40 chivist nice
18:42 winr4r we are to the library of congress or any other archive what a court is to 122 guys in rigger boots
18:44 winr4r anyway YOU ARE A JOURNALIST, go interview jason, he's fucking awesome
18:47 chivist ever had issues with copyright trolls?
18:48 winr4r chivist: nope
18:48 SmileyG Yes, No, we don't give a fuck?
18:48 Famicoman I find this exchange funny
18:48 SmileyG chivist: is this an offical interview?
18:48 winr4r chivist: nobody does that because they don't want to die
18:49 winr4r if they want their stuff removed, we're happy to do it
18:49 joepie91 SmileyG: I don't think it is? at least, that's not how it started :P
18:49 chivist definitely not
18:49 SmileyG IA will blackout anything with a valid request to do so.
18:49 SmileyG chivist: then fine, we can continue to chat :D
18:49 winr4r (and by "die", i mean "suicide by email")
18:50 chivist I'm pretty transparent about interviewing/sources/etc
18:50 winr4r chivist: :)
18:50 SmileyG Ok
18:50 SmileyG just don't think you can quote me without asking :D
18:50 winr4r you can quote me on anything because i am awesome
18:50 * winr4r pets SmileyG
18:51 chivist also, most news outlets want names
18:51 chivist which can be REALLY annoying
18:51 SmileyG you can find my name quite easily.
18:51 winr4r same here
18:51 winr4r Lewis Collard if you want a name
18:51 SmileyG shush lewis
18:51 winr4r SORRY SMILEY
18:51 SmileyG Oh, and we all have no offical sanction to speak on behalf of Archive Team either.
18:52 SmileyG Now i've said that, I think I can say wtf I want?
18:52 * joepie91 points at his doxedness
18:52 winr4r yeah, who gets to speak for archive team is ill-defined
18:52 winr4r it's mostly jason
18:52 chivist https://si0.twimg.com/profile_images/1855468868/DSC_0192-square.JPG
18:52 chivist nice hat btw
18:53 joepie91 winr4r: benevolent dictator kind of thing I guess? :P
18:53 winr4r i started wearing hats again because jason makes hats cool again http://i.imgur.com/2wre4pN.jpg
18:53 winr4r joepie91: yes
18:54 SmileyG http://i.huffpost.com/gen/1058928/thumbs/r-JASON-SCOTT-ARCHIVE-TEAM-large570.jpg?9
18:54 SmileyG thats a HAT>
18:54 winr4r chivist: hey you figured out my twitter <3
18:55 winr4r SmileyG: jason can pull that off
18:55 SmileyG and as we veer wildly off topic, can we take it to #archiveteam-bs please ;)
18:56 winr4r yes, we can
19:30 SketchCow You people never stop talkin' about me.
19:30 SketchCow I'm interviewing Apple II nerds!
19:32 SmileyG :D
19:33 andy0 http://www.archiveteam.org/index.php?title=ArchiveBox -> Became Archive Warrior?
19:34 SketchCow No.
19:34 SketchCow But the philosophy was the same.
19:34 SketchCow Easier access to contribute downloading abilities that would produce the best data.
19:35 andy0 was an 'ArchiveBox' for debian made?
19:35 andy0 I like asking questions
19:35 andy0 ( ∙_∙)>⌐■-■ (⌐■_■)
19:38 SketchCow I don't know, you'd have to chase it down
19:45 joepie91 SketchCow: you're misunderstanding. we're just always talking about you when you're not here :)
21:59 yipdw hmm
21:59 yipdw hey guys, can we remove this from the AT Github account? https://github.com/ArchiveTeam/heroku-buildpack-archiveteam
21:59 yipdw I ask because we also have https://github.com/ArchiveTeam/heroku-buildpack-archiveteam-warrior
21:59 yipdw which is much more recent and uses wget-lua vs. wget-warc
22:00 yipdw or, if not remove, I guess I can throw in a note that says "don't use this, use that instead"
22:02 * yipdw does so
