[00:04] ah. just random stuff
[00:14] They're uploading 320gb of new stuff soon
[00:17] "just random stuff" referring to the DDR and traffic crash slides
[00:18] I think, this weekend, I will locate and prepare my stage6 data for upload
[00:33] haha I just noticed underscor's t-shirt and that he is standing in front of what appears to be the IA building
[00:38] doh
[00:38] http://games.slashdot.org/story/12/02/25/0011254/inventor-of-the-modern-pinball-machine-dies-at-100?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Slashdot%2Fslashdot+%28Slashdot%29
[00:41] GAH
[00:41] missed out on the floppy diskette shirt.woot
[00:42] tried ordering it last night only to run into trouble. made a deposit today that would allow me to use a different card, but now it is sold out
[01:14] Oh no! What did the shirt look like?
[01:16] http://sale.images.woot.com/1971-Floppy_Diskette2wwDetail.png
[01:16] from http://shirt.woot.com/Forums/ViewPost.aspx?PostID=4880141&PageIndex=1&ReplyCount=179
[01:18] Schbirid: Aww, thanks <3
[01:18] hahahahahaha
[01:18] Coderjoe_: They'll have it again tomorrow for $15
[01:19] SketchCow: Relevant to your interests? http://techcrunch.com/2012/02/23/and-now-theres-a-kickstarter-for-porn/
[01:19] :D
[01:19] undersco2: after 2girlsetc i'm not sure the internet needs to have much of a voice in exploring new areas for porn
[01:20] hahahah
[01:23] hey now. the internet didn't make 2girls or swap.avi
[01:23] they just happened to stumble upon them
[01:27] hello.jpg
[01:29] the internet funded swap.avi
[01:29] hahaha
[01:36] OK, heading out again. But I'll see stuff in here.
[01:37] pda12 talk went well!!!!!!
[01:47] So I'm an unemployed archivist, and while I don't have the heavy-duty bandwidth/storage most of your projects seem to need, I'd like to contribute
[02:03] HOLY
[02:03] Bibliotik has shutdown all operations. We are no longer able to assume the risks involved. The staff would like to apologize for the sudden (but necessary) decision and thank everyone that participated and made Bibliotik such a great place for so long. We love you guys!
[02:03] http://bibliotik.org/
[02:03] wohamagod
[02:04] what was it?
[02:04] ebook/elearning private torrent tracker site
[02:04] best besides library.nu i think
[02:05] brace for "library of alexandria" comments
[02:06] :(
[02:12] shaqfu: the archive team motto is "We are going to rescue your shit!" http://archiveteam.org/index.php?title=File:Archiveteam.jpg so start looking out for user-content sites shutting down, and back them up
[02:15] here's some smaller projects that all sorts of people who saw something happening or about to happen decided to archive them: http://www.archive.org/search.php?query=collection%3Aarchiveteam-fire&sort=-publicdate&page=1
[02:17] dashcloud: Sounds good; thanks
[02:17] and if you have lots of time, but not much storage or bandwidth, there's vast amounts of items that could use metadata- magazines and shareware CDs
[02:19] Gotcha; is there a project list of those, or should I go through and start marking up anything that needs it?
[02:21] email SketchCow at metadata@textfiles.com and ask for one- he's good at getting back to you
[02:22] Got it
[02:22] actually- http://archiveteam.org/index.php?title=Metadata_warriors
[02:22] that gives you a better idea of the whole thing, and some suggestions
[02:23] you'll want to do the online version- it works a lot better than the PDF in my experience
[02:28] Seems straightforward enough
[02:31] I gotta head out in a bit, but I'll send off an email tonight
[02:31] if you have any questions, just ask
[02:34] Are we preferring human or machine readability?
[02:35] take a look at one of the examples in the wiki page- it should have some structure, and be consistent thru all the issues you do
[02:35] Gotcha
[02:38] it's really fascinating though because you get to see the evolution of a magazine the way not many have because you can look at them one after another and immediately notice how it's changing
[02:39] Yeah
[02:39] I've done a lot of manuscript processing/metadata before; that's the best part about it
[02:42] Zzap64 sounds fun - ten years of C64 talk
[03:36] mmm
[03:39] hey at&t: while I am thankful that you're not contributing to buffer bloat hell, I would also appreciate it if you would expand capacity on the first three hops from my house so you're not giving 90% packet loss. thanks.
[06:03] arrith: http://gen.lib.rus.ec
[06:03] I'm almost halfway done downloading the entire contents
[06:03] undersco2: nice
[06:03] Then I'll probably make it available for a short time to people who want a mirror
[06:04] i've heard people linking that. people are trying to port stuff
[06:04] It's the best 'tik replacement I've found so far
[06:04] the hydra needs to kick in, as in a bunch of bibliotik clones and library.nu clones need to spring up
[06:04] it'd be nice if one of those sites at least released their db/index
[06:04] Yeah
[06:05] Well, I'm friends with some of the 'tik admins, trying to see what they'll share
[06:05] But something big must have happened
[06:05] They're very shaken
[06:11] dang
[06:11] i figured if tpb and demonoid are still up why wouldn't these ebook sites do fine
[06:11] i mean the RIAA has been the big threatening body and what.cd is still doing *fine*
[06:12] undersco2: yeah at least a list of the books/formats would be nice. so there's at least something for a new site to build on
[06:25] what.cd takes a LOT of precautions though
[06:26] i've not used what.cd, but don't they have some fake wiki-looking front or something?
[06:27] getting tired of at&t's routers losing 90-100% of packets
[06:29] Coderjoe: That's waffles
[06:30] what.cd just has a poem
[06:31] ah
[06:37] good thing no one's blowing their cover in a publicly logged irc channel
[06:38] that would be terrible
[06:40] never used waffles or what
[06:40] what cover
[06:40] 8D
[07:25] woot. floppy shirt ordered
[07:25] http://shirt.woot.com/friends.aspx?k=24671
[07:31] ggfdggesfgdfds
[07:31] I need to go find $10
[07:31] er, $15
[07:31] That is so awesome
[07:31] Perfect for my collection
[07:31] (I've started to get the reputation that I'm the awesome-shirt-guy)
[07:32] Random people I don't know will seek me out to see what I'm wearing
[07:32] (in school)
[07:32] It's awesome
[07:34] undersco2: haha, i used to do that
[07:34] undersco2: far too many dollars spent at thinkgeek
[07:34] Yeah, there, and woot
[07:35] I honestly could probably go 2 or 3 months without doing laundry, shirt-wise
[07:35] One of my friends wrote this https://github.com/habnabit/shurtapp
[07:35] Been meaning to set it up for myself
[08:25] i just go with "a shirt i didn't wear yesterday"
[08:25] assuming i went somewhere and saw somebody
[08:31] http://escism.net/omfgdogs/
[08:32] wtf happened to hampsterdance.com
[08:32] best site ever
[08:32] they sold out
[08:32] I remember discovering that song in 5th grade
[08:33] haha
[08:33] i think it was like 4th or 5th for me
[08:33] word of mouth meme
[08:34] it had to have been one of the first
[08:34] yeah
[08:34] :D
[08:34] damn. make me feel old
[08:36] btw, loving that hershey's censor
[08:39] my boss was messing around with the camera, threatening to take a picture if i took my hand away from my face
[08:39] had the bar on my desk, decided to see if i could get it to stick
[08:39] haha
[08:40] So, I got a full backup of Convore's data
[08:40] how do I package this up for an acceptable archive?
[08:41] i'm planning to make it a fully browsable website
[08:42] undersco2: omfgdogs on a 30" screen is awesome
[08:42] :D
[08:42] and max sound
[08:42] ;)
[08:42] and f11
[08:43] oh man
[08:43] this is so trippy
[08:43] :D
[08:43] I wonder what it's like stoned
[08:43] hahahaha
[08:43] it froze my browser
[08:44] It's probably a good pixel massager
[08:44] kennethre: Need better browser? :D
[08:44] I'm gonna wrap it in a Quartz Composer setup
[08:44] haha
[08:44] and use it as my screen saver on my work OS X machine
[08:44] it will be genius
[08:45] undersco2: unfortunately they don't get much better :)
[08:45] aw uses flash for sound
[08:45] :(
[08:45] If only FF played MP3
[08:45] yipdw: omg
[08:45] That's awesome
[08:45] :D
[08:46] well
[08:46] I think that's possible
[08:46] if not I'm sure there's a way to reconstruct that from omfgdogs.{gif,mp3}
[08:47] man i forgot about screensavers in general
[08:47] * kennethre downloads electric sheep
[08:47] don't forget after dark and more after dark
[08:48] FLYING TOASTERS
[08:49] http://29.media.tumblr.com/tumblr_lzoh9mh6Bd1qhhhaco1_500.gif Did I already link this?
[08:49] Oh man
[08:49] After Dark
[08:49] That brings back memories
[08:51] Oh maaaaan
[08:51] I love holiday lights
[08:51] Any of you use that?
[08:51] I had that running on the G3AIO in my room in like 4th grade
[08:51] Thought I was the coolest shit
[08:51] haha
[08:51] http://downloads.yahoo.com/software/macintosh-knick-knacks-holiday-lights-s10668
[08:54] http://www.archive.org/details/tucows_204921_Holiday_Lights
[08:55] http://www.youtube.com/watch?v=StA81MNuqB8
[08:55] http://support.tigertech.net/old-software
[08:55] <3
[15:45] SketchCow mentioned a way to browse iso files on archive.org by adding a slash or so. how would that work eg for http://www.archive.org/details/CdZoneIssue48march1997 ? if it is already public
[15:45] oh i am so smart
[15:45] http://www.archive.org/download/CdZoneIssue48march1997/cdzone48_march1997.iso/
[15:45] tried it on the direct http link before
[15:46] fantastic
[15:55] http://www.reddit.com/r/compsci/comments/q4e57/help_save_the_worlds_first_webserver_we_need_to/
[15:55] updates: found copies of versions 0.0, 0.1, and 0.2 on an MIT mirror of the cern ftp server.
[15:57] someone should possibly contact MIT about getting a mirror into IA. (or something. yes, we're assholes, but I would rather not have MIT see tons of traffic and pull the files because of many redditors plus others attempting to mirror it)
[15:59] heh
[15:59] http://www.reddit.com/r/compsci/comments/q4e57/help_save_the_worlds_first_webserver_we_need_to/c3uq8w5
[16:02] O_O
[16:02] http://www.openafs.org/
[16:09] damn, that iso viewer serves files with no proper timestamp on download
[16:09] cool idea, but i'd rather keep something at arms-length from kernelspace
[16:10] er, something like that
[16:43] SketchCow: I'm currently doing a test run with the direct-to-the-archive mobileme script, but I think it may take too long to fill 10GB.
[17:46] alard: Do you want me to try it? I've got gbit.
[17:48] undersco2: Well, to be honest, I tried one on batcave. :) It's just the slowness of mobileme.
[17:48] Oh, alrighty, hehe
[17:49] That's odd, though
[17:49] I can fill 10GB in around an hour
[17:49] Well, if you start wget --mirror it takes a long time.
[17:49] Yeah, some of those do
[17:49] It's just one at a time, not multiple.
[17:49] Ohhhh
[17:49] Nevermind then ^^;
[17:50] The idea is that kennethre will run heroku instances that run a seesaw script, and each instance would download 10GB, create a tar file and upload it to the archive.
[17:51] But these instances are killed every 24 hours, so if it takes too long to finish 10GB you'll lose a lot of data.
[17:53] The upload was successful, by the way: http://www.archive.org/details/archiveteam-mobileme-hero-1
[17:54] (They wouldn't let me add it to the archiveteam-mobileme collection.)
[17:55] Cute!
[17:55] The permissions can be fixed I hope.
[17:57] Permissions?
[18:07] SketchCow: I think I've got a solution for the 10GB-is-a-bit-much problem. So if you like the upload (and perhaps find a way to fix the collection) I think we're ready.
[18:19] Do the instances have 10GB of local disk?
[18:19] (Oh, I suppose that's a better question for kennethre, heh)
[18:44] undersco2, he said it's basically unlimited
[18:44] oh okay, cool
[20:16] well you could say upload 10x 1gb files to each item
[20:21] huh
[20:21] heroku is down
[20:21] kennethre killed it
[20:21] :P
[20:22] well, at least i can't get to www.heroku.com
[20:30] lol
[20:39] for more lol, see https://twitter.com/#!/search/heroku
[21:06] looks like more shareware is on the way :) http://www.archive.org/details/classicpcgames http://demu.org/news-detailed/11
[21:07] Ooh
[21:07] Make sure to tell SketchCow so he can connect them
[21:15] * Nemo_bis preparing to send 100 kg of stuff to him
[21:37] i wish http://www.archive.org/details/F-15StrikeEagleIiiDemo would include version numbers
[21:37] err
[21:37] http://www.archive.org/details/classicpcgames i mean
[21:39] Schbirid, what do you mean?
[21:39] He's doing a good job with metadata it seems.
[21:39] sometimes there are multiple versions/demos of games
[21:40] hm
[21:40] Some items on http://www.archive.org/details/open_source_software look slightly offtopic.
[21:41] heh
[21:52] well, you know, there's a lot of software in the world.
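(A note on the archive.org ISO viewer from the [15:45]-[16:09] exchange above: appending a trailing slash to a file's /download/ URL returns an HTML listing of the image's contents, and adding a path after that slash serves a single file out of the image. Below is a minimal sketch using the item from the log; the inner path is a hypothetical example, so substitute one from the listing.)

  # List the contents of the ISO without downloading the whole image
  curl -sL "http://www.archive.org/download/CdZoneIssue48march1997/cdzone48_march1997.iso/"

  # Pull one file out of the image; "SOMEDIR/README.TXT" is a made-up path --
  # use a path shown in the listing above. As noted at [16:09], the file comes
  # back without its original timestamp.
  curl -sL -o README.TXT "http://www.archive.org/download/CdZoneIssue48march1997/cdzone48_march1997.iso/SOMEDIR/README.TXT"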
[22:14] yipdw: yikes
[22:15] seems to be alive again
[22:15] yeah we're back in action
[22:15] i slept through it
[22:15] heh
[22:18] Wjaaaaa
[22:21] just found out how to get forks in github i think
[22:22] curl -s http://github.com/api/v2/yaml/repos/show/cjdelisle/cjdns/network
[22:22] I just mailed swizzle.
[22:23] godane: github v3
[22:24] godane: curl https://api.github.com/repos/kennethreitz/requests/forks
[22:25] SketchCow/kennethre: The script is ready, I think. http://www.archive.org/details/archiveteam-mobileme-hero-1
[22:25] fantastic
[22:25] does anything need to happen before I start using it?
[22:26] A quick check by SketchCow, perhaps a fix to get the items in the archiveteam-mobileme collection (I wasn't allowed to do that).
[22:26] alard: where's the actual script?
[22:27] The general idea is: instance downloads at least 10GB of data, makes a tar file, uploads to archive.org, dies. Heroku starts again.
[22:27] perfect
[22:27] exactly what i need
[22:27] So that in theory the instance will never be running for more than 24 hours, and will not be recycled by heroku.
[22:27] any dyno can die at any time
[22:28] but typically it's once every 24 hours
[22:28] yeah they get 10GB within an hour or two
[22:28] so it won't be a problem
[22:28] Yeah, so the probability of dying increases over time, I thought.
[22:29] The script is still with me, perhaps there'll be a few changes. I used your Heroku repository and am currently running one instance.
[22:30] What are the scrape, scrape2, scrape3, scrape4 for?
[22:30] alard: i can only scale each process to 100
[22:30] Does that mean that four scripts are running in the same dyno?
[22:30] Ah, good.
[22:30] nah, all separate dynos
[22:30] every process is isolated
[22:31] alard: how long did it take to run the one instance?
[22:31] Since you can run multiple seesaw scripts in the same directory, but the new script assumes it's on its own.
[22:31] I did a test run on batcave, that took five to six hours.
[22:32] There was one user with a very large web.me.com site, so that took a while.
[22:32] The heroku instance is up for 1 hour, says heroku ps, and is at 4% of 10GB.
[22:33] hmm
[22:33] i feel like making it smaller (e.g. 5GB) would make things easier
[22:33] but perhaps not
[22:33] It really depends on the selection of users. My first run had 8 to get 10GB, but the current run already has more than that for 4%
[22:36] We should ask SketchCow about the file size. I think 5GB might be better, but 10GB will probably work too.
[22:37] the smaller, the better the chance that everything goes smoothly
[22:37] but whatever works best for the archive
[22:37] that's a lot of data
[22:38] alard: does this still interact with the tracker dashboard? :)
[22:39] Of course. And you'll get credit as soon as you've downloaded something, even if you don't upload. (That's because that's easiest. We should do a clean-up step later.)
[22:40] oh i thought the tracker only did uploads
[22:40] awesome
[22:40] not that it matters, obviously, but it makes things more fun
[22:41] Evening everyone
[22:41] hello
[22:42] I run Yotsuba Society
[22:42] HEre's deal.
[22:42] Oof
[22:42] ...
[22:42] Here's deal?
[22:42] Here's the deal. My concern is not that the individual filesizes are anything, be they 5gb or 10gb or whatever. It's item-size I care about.
[22:43] So all I care about is making them, say, 100-200gb apiece.
[22:43] So we're only dealing with a few thousand items or less, instead of tens of thousands of items.
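(To make the flow at [22:27] concrete: each dyno claims MobileMe users, mirrors them until roughly 10GB is on disk, packs the chunk, uploads it, and exits so Heroku starts a fresh instance. The sketch below is not alard's actual seesaw script; the tracker endpoint, credentials and paths are placeholders, and the real script also reports finished users back to the tracker.)

  #!/bin/sh
  # Hypothetical stand-in for the per-dyno flow described at [22:27]
  TRACKER="http://tracker.example.com/mobileme"   # placeholder tracker URL
  mkdir -p data

  # Claim users and mirror them until ~10GB has accumulated on disk
  while [ "$(du -s --block-size=1G data | cut -f1)" -lt 10 ]; do
      user=$(curl -s -d "downloader=$(hostname)" "$TRACKER/request")
      [ -z "$user" ] && break
      wget --mirror --page-requisites --directory-prefix=data \
           "http://web.me.com/$user/" || true
  done

  # Pack the chunk; the upload itself is a curl PUT to archive.org's
  # S3-compatible API (a fuller curl example appears further down the log),
  # after which the process exits and Heroku respawns it
  tar -cf "mobileme-$(date +%s).tar" data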
[22:43] 200gb an item, that's going to be, like, 1200 items.
[22:44] So that's mostly it. Injecting files into the same items until it's known the items are at something near 200gb, then moving onto the next item.
[22:45] sounds simple enough
[22:47] Yes.
[22:47] anyone got tips for simplemachines forum mirroring?
[22:48] The needs of the archive's backend are a little different than the needs of our running scripts, I just mostly try and deal with those personally in my ingestion scripts. So since we have to load kenneth with the ingestion, that's how.
[22:48] I'd REALLY like us to get on forum mirroring. Does the wiki have something on that?
[22:48] Jkid: hi there, you've walked into the middle of a conversation but what's up
[22:49] forum mirroring is sad since it's like mirroring a wiki: if you could get a db dump it's so much cleaner. otherwise it's all this grabbing pages.
[22:50] Well I'm Ndee "Jkid" Okeh
[22:50] The Co-Founder and head admin of Yotsuba Society.
[22:50] ideally there'd be something to scan a bunch of static pages from a dump of a site and try to insert them into a db that can be used with mediawiki, but i don't know of anything
[22:50] :o
[22:50] He's not going to like that.
[22:50] Me however Me, I like that very, very, very much.
[22:51] Anyway, back on it.
[22:51] I agree, I'd prefer the db method, but we're not able to, so getting some sort of functional ripper would be good.
[22:51] We have all these dying PHPbb things.
[22:51] hm, for SMF the print function seems nice. you get one html page per thread, no pagination
[22:52] if you are just after the posts' contents
[22:52] avatars, signatures etc would be missing
[22:52] in-post links are written with the url behind the text
[22:53] a big thing with forums is getting all the off-site images from imageshack etc
[22:53] hm, images are not embedded but only shown as url
[22:53] yeah
[22:54] since getting 14 pages of "LOL LOOK AT THAT CAT" is not too useful :)
[22:54] So, I got a full backup of convore
[22:54] it's an api dump essentially
[22:54] how do we make this a legit archive
[22:56] How big
[22:57] small
[22:57] less than 100MB for the text
[22:57] 200MB of avatars
[22:58] i'm working on reconstructing the whole thing into a browsable website
[22:58] it's pretty piecemeal, as is, but it's every single piece of public information on the site
[22:58] a forum-software-specific ripper would probably be best. since phpbb, smf, etc each put stuff in special places. would make maintaining the crawler/ripper easier.
[22:59] I bet someone's written such a thing
[22:59] then a separate thing could be done later to reform it into a usable db, but i guess that's quite lower in priority to just getting the sites mirrored
[22:59] DFJustin: that would be neat
[23:00] So I am for putting a dump on archiveteam, and then a separate project to make archive.
[23:00] Fast action first, backed up and there, followed by fix-up
[23:01] SketchCow: awesome, I have a tarball
[23:02] recommended wget settings for a "brute force" forum mirror? how about: -m -nv -nH --append-output=my.log --no-parent --no-clobber --continue --timeout=10 --adjust-extension --convert-links --page-requisites --keep-session-cookies
[23:02] kinda old but http://www.phpbbhacks.com/forums/phpbb-mirror-grabbs-boards-get-the-code-here-now-vt64073.html
[23:02] last time i tried --span-hosts to also get embedded external images iirc i ended up with much more than that
[23:05] kennethre: Just upload it to archive.org, I'll put it in the right place after you do. archiveteam-convore-panic-grab or whatever you want, then tell me.
[23:07] SketchCow: So items with 40 files of 5GB would be OK?
[23:07] Yes.
[23:08] And is there a way to add the items to the right collection?
[23:08] alard: i'd love to start running this today
[23:08] kennethre: Nearly there, I think.
[23:08] alard: let me know if there's anything i can do to help
[23:09] This 'community texts' looks a bit weird: http://www.archive.org/details/archiveteam-mobileme-hero-1
[23:11] i added --header="accept-encoding: gzip"
[23:13] now it stops after fetching index
[23:13] wicked
[23:14] http://lists.debian.org/debian-user/2003/12/msg02235.html
[23:15] well, this is a shame
[23:16] SketchCow: I saw you moved the item to the right collection, but is there a way to do that in the s3 upload?
[23:16] Yes, I need to know the account doing the uploads (e-mail it to me) and an admin will change it in the next day or two.
[23:17] Schbirid: this page has a guy mentioning hearing about a patch to wget to support gzip: http://unix.stackexchange.com/questions/15176/using-wget-what-is-the-right-command-to-get-gzipped-version-instead-of-the-actu
[23:17] Okay. So we'll start downloading and correct/upgrade later, is that OK?
[23:18] Schbirid: googling for wget gzip patch gives: http://lists.gnu.org/archive/html/bug-wget/2010-01/msg00032.html
[23:20] cheers
[23:22] definitely not pretty. could do a big convoluted wrapper script around wget but it makes so much more sense to just have wget support that. --decompress-gzip or something
[23:22] kennethre: I'll send you the script.
[23:22] alard: i'll be waiting :)
[23:22] i wonder if that patch is going to be merged and if not, why not
[23:24] A MUCH better idea than a patch to gzip wget, to be honest, is if wget can export the downloads to stdout
[23:26] Isn't that just wget -O - ?
[23:28] Is it? if so, then why spend time with .gz when you can just export to - and do gz, bz, 7z, ferretz or mozarella-z
[23:28] SketchCow: how's the availability of the s3 api?
[23:28] How is it? Mostly good.
[23:28] Sometimes it shits itself.
[23:28] so it shouldn't be a concern?
[23:28] I would always have error checking in place.
[23:28] No, I use it constantly.
[23:28] As do others.
[23:28] awesome
[23:28] i am getting quite a lot of "WARNING: Upload failed: /myfile ([Errno 32] Broken pipe)" but s3cmd retries automatically and then it usually works
[23:29] "wget -O -" combines all output, but in the case of this gzip thing it would just feed out that one gzipped page, which is the same place you'd be in if you just saved it to disk.
[23:29] The curl command in the script is configured to retry the uploads until it succeeds (or at least stops returning errors).
[23:29] still have to go into that file and grab the resources and dl other pages
[23:29] curl might work better with gzipped stuff, dunno about warc support and curl though
[23:29] I think it's underfunded and I'm about to make it less underfunded
[23:30] But curl doesn't do recursive downloads.
[23:30] I knew someone here would want that floppy shirtwoot
[23:30] ah, didn't know curl didn't recurse
[23:31] http://www.youtube.com/watch?v=l9n_TkFIttk AWT Cybertag VR Demo from 1994
[23:31] awesome video
[23:37] back in action!
[23:37] \o/
[23:37] thanks, alard / SketchCow
[23:38] Let's hope the script works.
[23:38] alard: We should get that script running.
[23:38] So you want to edit it, or me edit it?
[23:39] Well, there's not much to edit save a few filesystem references and the name of the redis queue.
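(The upload step that the script retries ([23:29]) is, in essence, an HTTP PUT against archive.org's S3-compatible API at s3.us.archive.org. A hedged sketch follows: the keys, item name and file name are placeholders, and per [23:16] the collection header only takes effect for accounts with the right privileges, otherwise an admin has to move the item afterwards.)

  # Hypothetical example of a retrying curl upload like the one mentioned at [23:29]
  curl --location \
       --retry 10 --retry-delay 30 \
       --header "authorization: LOW ACCESSKEY:SECRET" \
       --header "x-amz-auto-make-bucket:1" \
       --header "x-archive-meta-mediatype:web" \
       --header "x-archive-meta-collection:archiveteam-mobileme" \
       --upload-file mobileme-chunk-001.tar \
       "http://s3.us.archive.org/archiveteam-mobileme-hero-2/mobileme-chunk-001.tar"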
[23:39] I can edit it if you give me the new locations, or you can follow the instructions and do it yourself.
[23:39] Up to you.
[23:40] let me write a thing that lets you see this real quick
[23:40] it's pretty amazing
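(Closing note on the "brute force" forum mirror question at [23:02]: the proposed flags are workable with one adjustment. -m (--mirror) switches on timestamping (-N), which wget refuses to combine with --no-clobber, so that flag is dropped here; --span-hosts is left out as well since, per the log, it drags in far more than the off-site images. The forum URL and -e robots=off are assumptions, not part of the original proposal.)

  # Sketch of the [23:02] proposal against a hypothetical SMF forum
  wget -m -nv -nH --append-output=my.log \
       --no-parent --continue --timeout=10 \
       --adjust-extension --convert-links \
       --page-requisites --keep-session-cookies \
       -e robots=off \
       "http://forum.example.com/index.php"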