[17:19] morning.
[17:19] slow day!
[17:19] I am at the digital preservation 2013 conference.
[17:20] the one where we're getting an award?
[17:20] hey SketchCow
[17:21] yes. award blowjob visit.
[17:21] keynote is bit.ly lead scientist.
[17:22] "Get your popcorn, I'm speaking after the lead engineer of bit.ly."
[17:22] haaahaha
[17:22] i hope this is being streamed somewhere
[17:22] going to put urlte.am up
[17:22] next to me is aaron cope who co designed flickr
[17:24] SketchCow: Haha, scientist!
[17:25] Also fucking fuck, I should bring up urlteam's tracker >_>
[17:26] you have an hour.
[17:30] I'll see what I can do. (At least http://urlte.am/ is up)
[17:37] SketchCow: are you recording? should probably go here when it's done: http://archiveteam.org/index.php?title=Talks
[17:37] also that page isn't linked from anywhere
[18:03] we should fix that.
[18:04] You mean the "isn't linked from anywhere" issue?
[18:07] * winr4r fixed it already
[18:07] * ersi pats winr4r on the back
[18:16] I am dedicating august to catchup.
[18:17] I will need help with book scanning.
[19:12] what should I put in the "Choose an identifier" field when uploading a wgetted site?
[19:15] it doesn't really matter, I usually do something like example.com_20130723
[19:16] and I can use ftp?
[19:16] it does say "You may upload your audio, movie, or text file using an FTP Client"
[19:17] it is possible, godane uses it, s3 is preferred though
[19:17] no info about the s3 uploader on the site
[19:18] http://archive.org/help/abouts3.txt
[19:19] well I'm already using ftp and it is fast
[19:20] DFJustin: now I have to choose a collection
[19:21] opensource, aka community texts
[19:21] that's what's used for web archives?
[19:21] I did think there was an AT collection
[19:21] they should go in archiveteam but you don't have permission to access that
[19:21] so an admin needs to move it later
[19:22] I use cyberduck as a nice s3 gui but it's mac & windows only
[19:23] this probably belongs in the afk collection in any case.
[19:25] anyway, http://archive.org/details/w4rnl.net46.net_20130723
[19:27] alright, that's incomplete :/
[19:28] you can edit it from the web interface and add metadata
[19:28] or maybe not...
[19:29] yep, there are some fails
[19:29] DFJustin: can i add updates later?
[19:30] yep
[19:30] you can add or replace files using s3 or the web interface
[19:31] maybe ftp, I've never used it
[19:32] re-using the same item for different things is discouraged though, like if you re-crawl later just make a new item
[20:24] ha ha
[20:24] blew the roof off.
[20:24] BLEW THE ROOF OFF
[20:24] I warned of the journalist-brogrammer complex
[20:24] hey SketchCow
[20:25] hey godane.
[20:25] i got most of torrentbytes.com forums archived
[20:25] I cannot wait to see/hear that talk.
[20:25] fantastic. godane.
[20:26] luckily it looks like the same format as underground gamer
[20:26] detroir, we should expand on.
[20:26] detroit.
[20:26] there are eight detroit students here who want to help us save detroit
[20:27] I am a dope, we should have called the project Doctor Detroit
[20:27] i gave you a list of detroit sites
[20:28] on #OCP
[20:28] Also, I punched the bit.ly in the face
[20:28] hahahahaha
[20:28] to the lead engineer of bit.ly in front row.
[20:28] you're welcome
[20:28] I uploaded something but it's incomplete
[20:28] going to have to fix that tomorrow
[20:28] fucking free hosting with daily usage limits
[20:31] SketchCow: Totally worth it. Bit.ly's a bunch of tards.
[20:31] you know what worries me more?
[20:31] I've seen an explosion of on.fb.me
[20:31] and similar shorteners
[20:32] that bothers me.
a lot
[20:32] more than bit.ly
[20:33] http://www.popuparchive.org/ what is this I don't even, can you make a less usable website?
[20:34] hahah, I thought it was an archive of popup ads
[20:34] that was my first guess.
[20:35] my first question is, so you guys are throwing this all on archive.org too right? ;)
[20:35] or somewhere else that isn't gonna just up and disappear
[20:41] ha ha YOU FOUND OUR LAIR
[20:41] We partner with Internet Archive and Pirate Bay
[20:48] SketchCow: I meant the popuparchive people
[20:48] I of course know you guys partner with IA and TPB.
[21:01] http://blogs.loc.gov/digitalpreservation/2013/02/video-game-preservation-at-scale-an-interview-with-henry-lowood/
[22:28] HEY ARCHIVE TEAM.
[22:28] I HAVE SOMEONE HERE WHO WONDERS WHAT TEAM MEMBERS DO
[22:28] HEY HEY CAPT
[22:28] WHAT DO YOU DO
[22:28] GO
[22:29] I've added support for a few URL shorteners into our distributed shortcode unroller - as well as tried to keep on adding URL shorteners to our ArchiveTeam Wiki page.
[22:30] I got a few improvements to some Internet Archive software cooking. I run or have run most of our distributed tasks
[22:31] Manually bruteforce grabbing/crawling sites with wget (into WARCs)
[22:31] so slow
[22:32] you have disappointed our new recruit
[22:32] Helping people get up to speed with our projects and trying to help answer questions
[22:32] etc
[22:32] what do we personally do or what does archiveteam do
[22:33] Oy! New recruit! Come join us for a few hours at your leisure - I'll assure you, you'll get ideas!
[22:33] archiveteam or you if it must
[22:33] what archiveteam does is easy
[22:33] We make web servers burn and invade and tear our datas
[22:33] saves stuff that's about to hit the bit bucket because someone (typically in management, typically after a merger) decides that old data is no good and doesn't generate revenue as effectively as new data
[22:34] run the warrior and download sites, keep a look out for sites announcing shutdown or that look creaky in general
[22:34] Sometimes we calmly sneak data out and try to not burn servers and data centers into the ground
[22:34] Save data, regardless. Whether it's worth saving or not is not up to the one doing the archiving. Grab as much as possible
[22:35] under the wider umbrella, go grab big collections of data that it would suck if they disappeared and back them up to archive.org or elsewhere
[22:35] Someone, sometime - might have or find uses for it. Time will tell and things disappear EVERY DAY
[22:35] so much data, so much culture - so many references - are dead. "What you put up on the net, stays on the net" is patently false.
[22:38] We make some code that in general helps archivists (pro or hobby doesn't matter). Things like getting WARC support into wget were done by Archive Teamers.
(Mainly the ATer alard)
[22:38] and while it's so easy to delete data now (nothing physical to do) it's also so easy and cheap to store things
[22:43] basically when people go "oh shit I hope someone got a copy of that", our job is to be that someone
[22:55] FUCK YEAH ARCHIVE TEAM
[22:55] OO RAH
[22:56] Hey guys, I'm looking into doing a bit of archiving myself, and wondering about the best place to launch it (Amazon instance, a VPN somewhere… not sure where to do it and not get my IP banned)
[22:57] ip bans happen regardless of where you do it from
[22:57] you want an IP that you aren't sentimental about
[22:57] amazon is good for disposable IPs
[22:57] Well, if you certainly are going to get banned - live somewhere where the IP ranges are large and the accessibility is good. Amazon would be a good bet.
[22:58] It can be quite expensive if it's going to run for a while though
[22:58] Hmm, alrighty then
[22:59] another possibility might be to actually make your archiving target an AT project and do a pipeline for seesaw and we could distribute it to all ATers and/or any other interested person
[22:59] then they could scale out as much as they'd want
[23:00] ^ best
[23:02] * ersi tucks himself into bed and hits 'Hibernate!'
[23:03] I'll look into it, it's for a story archive so I'm not sure how much interest there'll be :P
[23:03] 'night ersi, thanks for the help
[23:04] the warrior is fully automated, interest is not a big factor :)
[23:11] Ah, shiny :)
[23:15] SketchCow: I mirror pirate sites forums that are at risk of disappearing