[01:50] SketchCow: So yeah, got an upload slot for a friendster tarball?
[01:54] PepsiMax: you should put it on archive.org.
[01:54] it's really easy, just requires that you sit down for a few minutes and sort out the metadata
[02:24] Oh wow, I just realised...I should get an eSATA connector for this box. I'm limited by the USB interface.
[02:30] esata apparently grows hair on your balls
[02:30] good for you
[02:30] makes you a man
[02:30] etc
[02:31] man, hairy woman, whatever
[02:45] ....right. Well anyway, I can reasonably get away with...oh wait, I just realised: I'm limited by my network interface too. This thing only has a 10/100 card. :/
[02:46] So yeah, we're under pretty light load on the weekends, so _in theory_ I can get away with 200-250mbit.
[02:53] How much is a gigabit PCI card these days?
[02:55] dozen bucks?
[02:57] Oh, well that's not too bad. And I think this thing already has a spare SATA header...I need to find more stuff to upload.
[03:01] hi guys, is there supposed to be a difference between dd and ddrescue floppy images? 7zip seems to be able to open the dd images, but not the ddrescue images- anyone else seeing this?
[04:35] This week, we start getting upload slots again.
[04:43] Ahh, none right now? All right, I'll bide my time. Thanks.
[04:44] Huh, limited to nine characters?
[04:44] yep, welcome to efnet
[04:45] This is the first I've needed that many. ~_~;;
[04:45] Well, "needed" in a very loose sense.
[04:45] wanted?
[04:46] Close enough.
[04:46] :P
[04:51] :D
[04:51] t SketchCow Give me root and I'll set up rsync modules!
[05:12] heh
[05:43] I'm trying to get him to give in by badgering him with all sorts of package installation requests
[05:46] took it to vegas and back
[05:48] lol
[06:07] hey all. anyone know W4r3zh4ck?
[06:08] it appears he ripped archiveteam.org or at least referenced the rip, created or referenced a rip of ED, and started a blog called THE ARCHiVERS
[06:09] not sure if spam
[06:10] hahahahahaha
[06:10] link?
[06:10] http://archiveteam.org/index.php?title=Special:Contributions/W4r3zh4ck
[06:12] i come in occasionally to prune/admin the wiki, he's the only one i'm wondering about
[06:13] hmmm.
[06:13] let it stand. but don't let the ED page say "online": http://archiveteam.org/index.php?title=Encyclopedia_Dramatica&diff=prev&oldid=6263
[06:14] indeed
[06:14] was it an official project? is the archiving still in progress?
[06:15] * chronomex shrugs
[06:15] ED is dead, man
[06:15] indeed
[06:15] wondering how legit the .ch mirror is
[06:17] I don't think he understands the difference between "mirror" and "backup". http://archiveteam.org/index.php?title=Frequently_Asked_Questions&diff=prev&oldid=6240
[06:21] good catch
[06:22] I say leave him alone unless he does something monumentally stupid
[06:22] cool yeah his links are weird but don't appear malicious
[06:24] A while back, Jumpline bought Christian Web Host. Most of their sites are for churches, but sometimes you get a jewel like this: apostolic-sceptre.org
[06:26] marquee, hit counter, tables \o/
[06:26] splash page
[06:26]
[06:26] it's almost as good as north korea's website
[06:26] (BTW, if you notice something broken there, please let me know; I'm attempting to migrate that site to a new server.)
[06:27] wait, apostolic-sceptre? insert easter eggs.
[06:27] I think I saw a whole directory about hexes while I was tarring it up...
[06:27] I rather like my job, thanks.
[06:30] Though Plesk can die in more fires than the sun would know what to do with.
[06:31] CMSes and control panels: so people who don't know how to manage websites can screw it up and then pay someone to do it for them anyway
[06:36] It's interesting as a software artifact.
[06:36] And the places they use perl, it's a lot cleaner than cPanel ever could hope to be.
[06:37] I wish people would write management systems according to the way the thing that's being managed wants to be managed
[06:37] instead of making creative new folder/config/scripting schemes
[06:37] I'm with you. Sphera users are generally pretty happy and that's just a chroot with a mess of hardlinks.
[06:37] if your config files say ## DO NOT EDIT BY HAND ## you've failed a bit
[06:39] I think I'd rather see that than .
[06:40] ooh
[06:40] ## DO NOT EDIT BY HAND ## THIS IS NOT ACTUALLY A CONFIG FILE ## CHANGES WILL BE IGNORED
[06:40] ## I DON'T ACTUALLY KNOW WHAT THIS FILE DOES BUT DON'T TOUCH IT JUST IN CASE ##
[06:41] hahahhaa
[06:41] my coding is a bit like that
[06:54] I saw something on stackexchange, I think it was, recently asking about the "best comments you've seen". There was one in there too, to the effect of //This code has been automatically generated. Changes will be ignored.
[15:56] http://i.imgur.com/XZ6GY.jpg OFFICIAL ARCHIVETEAM SUSTENANCE
[15:57] You should have seen SketchCow when we went by there
[15:57] He had to have like 50
[15:59] i hope you made photos!
[16:10] hmm, boo -- my me.com username scraper stopped at 525 usernames
[16:11] today, I also learned that Google's search interface will give you a maximum of 100 pages of results
[16:32] yipdw: does this apply when using an API key? Furthermore, those things still work right?
[16:32] Jofo: haven't tried with the API key; my scraper just screen-scraped google.com/search?q= HTML
[16:32] I'll give it another shot after work
[16:34] I may have to use alard's method of grabbing an IPv6 /112 and hitting Google round-robin style
[16:42] Back
[16:42] deep-fried cheesecake does sound pretty good
[16:42] W4r3zh4ck is real.
[16:42] By the way.
[16:43] Slavishly copy-bot, but still real.
[16:51] SketchCow: Tell them about your deep fried cheesecake addiction
[16:51] I'm working through it
[16:51] Down to 6 a day
[16:51] Only when I can't wait
[16:51] hahaha
[16:52] The archive's sarcasticness level has fallen sharply since you left
[16:52] I'm going to have to compensate
[16:55] deep-fried cheesecake sounds like an excellent way to die painfully
[16:58] Or deliciously
[16:59] SketchCow: hey, I was told to ping you about this: any interest in me scanning ~2 years worth of new scientist mags from the early 2000's? Their content is available behind a paywall online, so I figured it was a no
[16:59] but I figured I'd ask before I recycled it
[16:59] I'd like to see those issues contributed somewhere.
[17:00] I can give you a mailing address.
[17:00] But scanning, no.
[17:00] contributed works, I suppose
[17:00] assuming the USPS doesn't raep
[17:00] Media Mail
[17:01] k cool, great preso at defcon btw
[17:01] almost had me in tears :3
[17:01] I had you so rapt you wanted to scan New Scientist
[17:02] ALMOST BUT NOT FOR REAL *flexes muscles*
[17:02] haha
[17:02] An activity slightly more interesting than watching someone watching paint dry
[17:02] well, I've got a sheet fed scansnap thang, so it would be easy enough to do
[17:02] 1. Those don't really work
[17:02] slice binding, toss in scanner, press go
[17:02] 2. That means you'd destroy the new scientists
[17:02] I fucking HATE destroying original material to scan
[17:02] yeah, well, so would recycling :)
[17:02] which was my original plan
[17:03] Yes, that's good.
[17:03] It's good you decided to rape the girl, not just shoot her in the head
[17:03] gettin' some.
[17:03] Mail address at the ready when you look up media mail costs, which will be trivial.
[17:03] sure, pm?
[17:04] E-mail me.
[17:04] jason@textfiles.com
[17:04] sure
[17:04] done
[17:15] SketchCow: oh, yeah. It'll be like three bones for media mail
[17:22] http://www.youtube.com/watch?v=yzC4hFK5P3g
[17:27] Out of curiosity, is it worth the effort to non-destructively scan issues of Boardwatch? I keep hoping that someone else has already done it, so I don't have to.
[17:28] And by "scan", I mean "set up a tripod with a camera on it, and take pictures, page by page.".
[17:28] Here's the deal.
[17:28] I have someone working on low-cost scanners, that will do a great job.
[17:28] And so at this point, I'm willing to just take the items in.
[17:31] I remember seeing a google video about a low-cost, non-destructive scanner that was intended for scanning textbooks
[17:31] it was something like a couple pieces of glass (or plastic) as the "bed", and then it used two inexpensive digital cameras, each pointing at a different page
[17:34] SketchCow: Cool. Thanks for the update.
[17:44] Jofo: yeah, the open-source bookscanner. guy named Dan made it.
[17:44] now he works for archive.org.
[17:45] Where he's working on the next one.
[17:45] oic!
[17:46] I figured trying to drop some archive knowledge in here was going to get me one-upped :)
[17:48] heh
[17:50] I should go to more movie themed restaurants, then I at least get a meal even if the movie sucks
[17:56] are there that many?
[17:56] I'd never heard of a movie-themed restaurant before
[17:57] me neither
[18:03] Someone wants archiveteam to produce a backup of his livejournal he lost from 2008.
[18:03] I am skeptical.
[18:08] I'm still sad I can't find my old geocities page. I wonder if it wasn't archived or if it was purged for disuse
[18:13] Checked Reocities?
[18:16] SketchCow: lost, eh?
[18:20] SketchCow: yeah. Tried a google site: search on the area51/vault url for my name, which I'm pretty sure was on the page. At one point I even think I went through 'em manually. No go :(
[18:27] chronomex: He ran some backup thing and he lost it all
[18:34] The backup thing actually deleted the information? (That would be nice.)
[18:35] more likely he lost the backup
[18:35] Heh, yeah. (Less interesting, though.)
[18:36] true :)
[18:37] No, it apparently didn't back up something it did
[18:38] http://www.facebook.com/permalink.php?story_fbid=119726521459178&id=100002654938831
[18:41] SketchCow: I have a Heritrix/WARC backup of Akoha.com (+/- 1GB), but it's probably easier to wait for the new upload slots?
[18:41] yea, let me know when you've got slots handy
[18:41] I've got a warc of the Google Friends Newsletter
[18:42] Ah, you managed to get it? Good.
[18:43] yea
[18:43] it was dumb
[18:43] I was forgetting to set a user agent
[18:45] By the way: I tried to turn wget into a WARC extractor like you suggested.
[18:46] I got part of the way, that is: it successfully extracted files and rewrote some of the urls.
[18:46] ooh, cool
[18:46] The difficulty is in the different alternative urls: with and without index.html, trailing slashes, 302 redirects.
[18:49] Plus it's not really what wget is supposed to do, and the implementation is quite hackish. I gave it up, for the moment, but I can upload the code if you're interested in having a go at it.
[18:49] make it a git branch
[18:50] I want to change the way it handles --timestamp when warcs are enabled
[18:51] I want it to download all the files and put them in the warc, but to only overwrite the mirror on disk if the file is newer
[18:51] instead of ignoring the --timestamp option entirely
[18:51] my filesystem has copy-on-write semantics
[18:52] so if wget overwrites the file, the filesystem stores that as a change from the last snapshot
[18:53] should be a good project for this weekend
[18:53] I heard wget.
[18:54] emijrp: have you seen alard's wget-warc project?
[18:54] no
[18:54] http://archiveteam.org/index.php?title=Wget_with_WARC_output
[18:56] example of headers?
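
Editor's note on the "alternative urls" problem raised at 18:46 (with and without index.html, trailing slashes, 302 redirects): the Python sketch below is one possible way to look up an extracted page under its different spellings. It is not alard's code and not part of wget; the function names `url_variants` and `lookup`, the `index_name` default, and the `stored`/`redirects` structures are all hypothetical, chosen only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def url_variants(url, index_name="index.html"):
    """Return the URL spellings that might name the same page (hypothetical helper)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Treat ".../index.html" as the directory URL ".../".
    if path.endswith("/" + index_name):
        path = path[: -len(index_name)]
    candidates = {path}
    last_segment = path.rsplit("/", 1)[-1]
    if path.endswith("/"):
        candidates.add(path + index_name)
        if path != "/":
            candidates.add(path.rstrip("/"))
    elif "." not in last_segment:
        # No file extension: assume it may be a directory without its slash.
        candidates.add(path + "/")
        candidates.add(path + "/" + index_name)
    return {
        urlunsplit((parts.scheme, parts.netloc, p, parts.query, ""))
        for p in candidates
    }


def lookup(url, stored, redirects, max_hops=10):
    """Find a stored URL for `url`, trying variants and following recorded redirects."""
    for _ in range(max_hops):
        for candidate in sorted(url_variants(url)):
            if candidate in stored:
                return candidate
        if url not in redirects:
            return None
        url = redirects[url]  # follow one recorded 302 and try again
    return None
```

Here `stored` would be the set of URLs actually present in the archive and `redirects` a mapping built from recorded 302 responses. The extension check is a guess about what counts as a file, so a real extractor would likely need something more careful.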
[18:56] sure, one moment
[18:58] lol, compare this http://en.wikipedia.org/wiki/List_of_bulletin_board_systems with this http://bbslist.textfiles.com/usbbs.html
[18:59] ALL HUMAN KNOWLEDGE.
[18:59] but only notable one.
[19:01] db48x: https://github.com/alard/wget-warc/tree/warcextract
[19:01] cool
[19:02] emijrp: well, they don't collect trivia or "in popular culture"
[19:03] oh really?
[19:03] http://pastebin.com/vv6mK7f1
[19:03] emijrp: there's a war on "In popular culture" sections.
[19:04] im inclusionist
[19:05] I'm hateionist
[19:05] motivated by hate, awww yeah
[19:05] did you mean sysop?
[19:05] oh, that reminds me
[19:06] alard: I don't see my custom warc header in the warc file
[19:07] alard: I mentioned the WARC/WGET thing to the archive.org meeting to great love and admiration
[19:07] (other than in the resource record where it echoes the wget command line)
[19:07] I just need to get the new slots going today.
[19:07] Actually, alard
[19:07] you already have a slot on the old one.
[19:08] Just pump it there, while I get my ass together over here
[19:08] Another few gb aren't going to hurt.
[19:09] 104G .
[19:09] cache# du -sh .
[19:09] FAN TASTIC
[19:12] SketchCow: WARC/WGET in meeting: cool. Upload to old one: will do.
[19:12] db48x: Your header is there, it's the last line in the warcinfo record.
[19:13] (If you look between these awkward hex codes.)
[19:13] http://www.nytimes.com/2011/08/16/arts/music/springsteen-and-others-soon-eligible-to-recover-song-rights.html?_r=2&pagewanted=all <- yeehaw
[19:13] It's not a header that appears in every record, just in the warcinfo. It's more of a warc-field, really.
[19:13] oooh, in the content of the warcinfo record
[19:14] tricky
[19:16] OK, will be back, mailing out a ton of GET LAMPS I owe
[19:29] Schbirid: did you hear about money, lawyers, money, lawyers, money, lawyers and money?
[19:29] Extending that 35 years is easy.
[19:30] By the way. Fuck that. I use Jamendo.
[19:31] yeah
[19:31] but fuck jamendo
[19:31] so much
[19:31] gah
[19:31] vorbis is broken for weeks
[19:32] they do not post why things were deleted
[19:32] etc etc etc
[19:32] i love them
[19:32] but it is hard
[19:32] ogg broken? torrent you mean?
[19:32] no, streaming is broken now too
[19:33] and the full albums are not created anymore either afaik
[19:33] ah streaming, you can download ogg albums using a trick
[19:33] i know
[19:33] i have most of them locally ;)
[19:33] http://www.jamendo.com/en/user/The%20Chilling%20Spirit <-
[19:33] me
[19:33] ah spirit
[19:33] yeah
[19:34] people never get the schbirid :)
[19:48] hrm
[19:48] I can't remember my archiveteam.org password
[19:48] Hmm, "The material sent must be educational media. It can't contain advertising, video games, computer drives, or digital drives of any kind." That's kind of unfortunate.
[19:49] for media mail? yea
[19:49] They don't have the chart for weights over 20lbs up anymore, either.
[19:49] so much for digital media
[19:49] I didn't know that ...
[19:49] And archiving games or non-educational magazines(I think?)...
[19:50] Is oregon trail a video game?
[19:50] but on the other hand, it's not like they actually open the mail to see what's in it
[19:51] http://pe.usps.com/text/dmm300/173.htm
[19:51] Uuuuh.
[19:51] Bullet two: Media Mail can be examined by postal staff to determine if the right price has been paid. If the package is wrapped in a way that makes it impossible to examine, it will be charged the First-Class rate.
[19:51] check out 4.1 I): Computer-readable media containing prerecorded information and guides or scripts prepared solely for use with such media.
[19:51] the DDM, quoted here, is the final arbiter of mailability
[19:52] s/DDM/DMM/
[19:53] Wyatt: where are you reading this? because it contravenes the DMM.
[19:53] Huh, so a different part of usps.com has different rules. Interesting.
[19:53] https://www.usps.com/send/media-mail.htm
[19:53] Under the Rules and Restrictions tab
[19:53] yeah
[19:54] well. that is wrong.
[19:54] if the postman questions it, cite the DMM at him.
[19:54] Sweet.
[19:54] Wyatt: so what're you mailing?
[19:55] http://archiveteam.org/index.php?title=Car_Loans_For_Bad_Credit_Issues_12
[19:55] At the moment, nothing. I just hadn't heard about media mail and Jason mentioned it.
[19:55] probably spam
[19:55] Sounded handy, so I googled.
[19:55] Wyatt: ahh
[19:55] yup
[19:55] Though I do have a couple boxes of magazines for him once I head back for a visit to my folks.
[19:57] (Though it's tempting to just drive to New York and visit Strong Museum as a mini vacation sometime.)
[20:55] MEDIA MAIL
[20:56] MEDIA MAAAIILLLLLLLLLLLLLLLLLLLLLLLLLL
[20:56] YOU'LL NEVER FAIL / WITH MEDIA MAIL / IT MOVES LIKE A SNAIL / BUT SO CHEAP SO SALE
[20:57] watch out for the sketchcow
[20:57] or he will raise his left brow
[20:57] stare you down with evil sight
[20:57] "hope you got your backups right"
[20:58] http://www.archive.org/details/playboybraile00nlsu by the way
[20:58] I'm writing a blog entry on it, and when I do, I want reddit juice from you bastards
[20:58] haha
[21:02] * underaway juices SketchCow's reddit
[21:05] * alard thinks it would be a fun image processing exercise to write a braille OCR tool.
[21:05] definitely
[21:08] http://www.hitechcrimesolutions.com/?p=58688 discusses archiveteam, right at the end.
[21:11] good night
[22:09] http://i.imgur.com/TkwQ9.gif
[22:18] Win
[22:26] awww
[22:33] CurateCamp Hangout! https://talkgadget.google.com/hangouts/3cf9fc8494ac1c6454f1104c38d5e4827e1fdfc5?authuser=0&hl=en
[22:33] Hop in
[22:33] Don't be loud, just watch the show
[22:52] fuck google
[22:52] just fuck google
[22:52] with a rusty chainsaw
[22:54] On it
[23:07] https://talkgadget.google.com/hangouts/b239ac58e9dd5b06df6b5176a7bb1e78685a52d4?authuser=0&hl=en-US#