[00:06] http://thechive.files.wordpress.com/2012/02/a712c3b009a8175cbb751b33ce62573aec0b20ab.gif [00:20] hmm the textfiles bitsavers mirror is rather behind by the looks of it [00:23] * Coderjoe grumbles about pulling cable [00:24] even better.... pulling cable through a ceiling after all kinds of furniture is already in place [00:24] with instructions to leave out some runs to be completed at a later date [00:25] Archive Team loose in the warehouse again. https://lh3.googleusercontent.com/-Z4Xr4BRbg6U/TqfBUmSO08I/AAAAAAAAf0s/R2uUdpFbUN4/s270/1276570199204.gif [00:25] the bitsavers mirror is behind because I do them by hand [00:25] I see them delete documents out [00:25] So I have to set time to move those away [00:32] that's an excellent gif. [00:32] SketchCow: Sent again, hopefully [00:37] Working now [00:37] I hate watching gifs like that :( [00:43] Awesome [00:44] I know I'm 3 years too late, but anyone want ~16g of geocities? I've got 7z running on it, but it'll take a while. [00:47] Of course. [00:47] It's NEVER too late to get more geocities. [00:48] ^ [00:48] I keep running into missing sites. [00:48] It'll only extract on unix systems because of weird filenames. Hopefully 7z won't lose anything compared to tar/bz2, but it might give me better compression. [00:48] tar/7z is good ;) [00:49] oh, didn't think of that. Won't 7z itself do better? [00:49] actually tar/xz is what I've found to work well [00:49] xz is lzma like 7zip [00:50] I'll do tar/xz then. My desktop has 8gb ram, that should be enough [00:54] of course you remember that lzma is heavily biased towards slow compress/fast decompress ;) [00:55] How long should it take on a netbook? If it's overnight, I'm fine [00:55] should be fine then [01:35] anybody home? [01:35] I would like to help with mobile.me [01:35] or whatever else is on fire [01:38] hybernaut: helping with mobileme is easy.. all you really need is a good amount of free HD space and a linux install [01:38] have you looked at this yet? http://archiveteam.org/index.php?title=MobileMe [01:39] no, that's what I was looking for, thanks [01:51] hmm web.archive.org/http://foo doesn't work anymore [01:55] DFJustin: http://wayback.archive.org/web/*/http://foo [01:55] yeah I know [01:56] but a) that's more typing and b) it doesn't take you to the newest version automatically [01:56] I am archiving [01:57] kenethre: you here? [01:58] hybernaut: yo [01:58] are you responsible for the MeMac Dash? [01:58] I am not [01:58] there's more than one heroku-humper around here? excellent [01:58] haha, nah [01:58] well I think it's very well done [01:58] hybernaut: my scraping is unrelated to the dashboard [01:59] he humped heroku so much he married it [01:59] compliments to the chef [01:59] well i gave them rights to all the intellectual property I didn't think to claim as my own [01:59] so i guess that counts as marriage [02:00] marriage is much, much worse [02:02] hahahaha [02:02] who are you kidding. you're their bitch [02:07] re: MeMac, does it make sense to run multiple processes on my one machine? [02:08] not usually, unless you have insane bandwidth [02:08] 50Mbit+ [02:08] hybernaut: the bottleneck is typically your box, not mobileme [02:09] ok, this one should suffice, then, thank you [02:09] it tends to be a few large files, so network speed is the limiter rather than rtt [03:03] SketchCow: http://outer-court.com/basic/echo/T1084.HTM [03:04] A deceased friend of mine, getting help in 1992 [03:04] just throwin that out there.
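For reference, a minimal sketch of the tar/xz approach suggested above; the directory and file names are placeholders, not what tsp_ actually used, and the xz level is just an illustrative assumption.

    # pack the rip as tar and compress with xz (lzma), as suggested above;
    # XZ_OPT passes options through GNU tar's -J. -9e maximises the ratio but
    # needs roughly 700 MB of RAM to compress (decompression needs far less --
    # the slow-compress/fast-decompress bias mentioned above); drop to -6 on a
    # netbook if memory is tight.
    XZ_OPT=-9e tar -cJf geocities-rip.tar.xz geocities-rip/

    # sanity-check the archive before deleting anything
    xz -t geocities-rip.tar.xz
    tar -tJf geocities-rip.tar.xz | head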
[03:20] You surely must know how much of my time is being called for and taken right now. [03:20] sorry [03:20] I just wanted to share [03:20] Save it and the thing it's a part of, or let me know why I should be checking it out more than the look. [03:20] because it's like your talk [03:20] that something is so meaningless at one time [03:20] & then so important later [03:20] It IS like a few of my talks. [03:20] Right. [03:21] So save that, and write why it's important. [03:21] That has power. [03:21] oh I have [03:21] I didn't mean to waste your time [03:21] just wanted to share [03:21] I'm rendering out Bill Budge talking about why Michael Abrash's Quake assembly language routine is so amazing [03:22] GDC Vault? [03:34] No, this is my documentary [03:34] One of them [03:38] http://cargo.dcurt.is/weird_ipad.png [03:38] ha ha pwn [03:38] What's that? [03:39] the world's first WQXGA iPad [03:40] huh [03:40] unfortunately it is also 30" diagonal [03:40] It was well within his power to become what in hip-hop is called a 'weed carrier'. This is the guy who acts as a hype man at concerts, maybe gets a couple solo tracks or an opening set, and depending on the ritziness of the situation may literally be in charge of holding the drugs so the stars don't have to worry about getting busted. It's a good job. There's a guy called Spliff Star who's been doing this for Busta Rhymes for 15 years now. I bet [03:41] that's Bill Budge again? [04:15] Yes [04:15] The weed carrier. [04:21] Phew [04:21] I was feeling good, then I scrolled down the bitsavers list... [04:43] The more you do, the more I have. [04:43] It's not about being The Person Who Did All Of Them [04:48] I know [04:48] But still, sheesh, that's a lot of documents [04:49] Bitsavers has been at it since 1996. [04:49] I think Al scans while he's doing other work, in a corner [04:49] (did my last mail go through?) [04:50] Awesome, it did, okay [05:25] SketchCow: i'm watching your talk at shmoocon 2010 [05:26] i'm also backing up shmoocon videos [05:27] also the shmoocon 2011 video links are dead [05:29] the bass in your video is off again [06:30] worst audio recording ever!!! [06:34] https://twitter.com/maxfenton/status/174744236699303936 [08:03] man. there are some MADs in here that use some kick ass hand-drawn artwork [08:03] ooh~ MAD :3 [08:04] i think i have two collections of MAD at my parents' place [08:05] what the hell... [08:18] what [08:28] some old movie clip or something that I'm not sure how to describe [08:28] heh [08:29] footage: Rozen Maiden, audio: Cruel Angel's Thesis (Evangelion opening theme) [08:47] hahah [08:47] a japanese-subtitled version of the German rage kid [08:57] heh [08:57] lol [08:57] video: Lucky Star, audio: Hare Hare Yukai [09:29] DFJustin: http://liveweb.archive.org/http://domain works fine for me [14:09] 8.5g compressed. Now where to put it without running people out of bandwidth. I think my dropbox can handle it, split up, if someone downloads the pieces as I put them up [14:10] whatcha got cookin?
[14:11] geocities data [15:59] freenode has a channel for internet archive #archive and there are people talking [16:00] i remember entering a long time ago and it was empty, i have it in autojoin, and now, months later, there are people inside [16:01] looks like archive.org staff [16:05] oh [16:05] lol fail [16:05] it is archive.vg [16:09] sigh [16:10] : D [16:12] http://www.archive.org/post/38352/who-wants-to-start-an-irc-channel [16:12] weird [16:13] the founder is msikma, looks like later some people from archive.vg occupied it [16:15] Nice sending a response on a six-year-old forum thread [16:15] It's not that weird actually :) [16:16] i used to reply to very old messages in wikipedia talk pages, it's common [16:17] I'm sure that works out nicely [16:21] yes, it is not about the persons but the article content, so it makes sense [16:41] emijrp, you should ask the founder to come back, set the topic, op you and kickban the others :p [16:49] hmm [16:49] ha [16:50] http://archiveteam.org/index.php?title=Main_Page&useskin=monobook gets me around the viagra spam [16:50] fascinating [17:17] topaz: ahh. what skin do you normally use? [17:21] db48x: I am new to the archiveteam wiki, actually [17:21] but the crux of it is that adding &useskin to the URL parameters seems to work around the spam crud [17:22] actually changing my skin in my preferences doesn't help [17:39] I've never run into a problem with spam on the AT wiki [17:39] however, I don't browse it [17:39] I just go to various pages [17:40] maybe that's why [17:43] topaz: interesting [17:52] argh what the fuck [17:52] someone here has a webapp that has 500 MB of private dirty RAM [17:52] per instance [17:52] Ruby developers [17:52] * yipdw sighs [17:52] oh wait, wrong channel [17:53] meh whatever [17:59] heh [17:59] heh [18:00] I don't mind 500 MB of private dirty if it's really justified, but per worker instance is ridiculous [18:00] and as we run dozens of applications on just a few servers it's not just an annoyance but rather a real bogarting problem [18:35] topaz are you there? [18:35] hey [18:36] what address do you get for www.archiveteam.org? [18:42] $ host www.archiveteam.org [18:42] www.archiveteam.org has address 69.163.228.28 [18:42] I'm still curious where that spam is coming from [18:44] I'm assuming that it's giving it to me because it's convinced that my IP address belongs to a search engine, or something [18:46] I'm really not chuffed about it and sorry I kicked up such a fuss, now that I know it's chiefly affecting me for some bizarre reason and now that I know a workaround :-) [18:46] but it's certainly puzzling [18:56] yea, I don't really understand what's going on, I'm just curious [18:59] if you do curl http://www.google.com do you get junk? [19:02] emijrp: no [19:03] I get a big mess of JavaScript as you might expect but I am not sure I'm prepared to say with 100% assurance that it's legit [19:04] though the returned code includes links to the Gioachino Rossini Google Doodle they're currently running [19:05] ok [19:09] topaz, try ?foo=bar and you'll probably get the same result [19:09] it's just working around the cache I guess [19:10] correct [19:10] which cache, though?
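A quick sketch of the cache-versus-spam check being discussed here, assuming GNU grep; the throwaway query parameter and output file names are arbitrary, the URLs are the ones from the chat.

    # fetch the main page as normally served (possibly from the wiki's cache)
    # and again with a junk parameter that bypasses the cached copy, then look
    # for the spam markers topaz is seeing
    curl -s 'http://archiveteam.org/index.php?title=Main_Page' > cached.html
    curl -s 'http://archiveteam.org/index.php?title=Main_Page&foo=bar' > fresh.html
    grep -Eci 'cialis|viagra' cached.html fresh.html

    # ask MediaWiki to rebuild its cached copy of the page (some wikis ask
    # anonymous users for confirmation instead of purging straight away)
    curl -s 'http://archiveteam.org/index.php?title=Main_Page&action=purge' > /dev/null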
[19:11] it's not my browser cache [19:21] dunno, whatever cache the wiki uses [19:21] you could try ?action=purge, who knows [19:23] if it's the wiki caching the page on the server side I'd expect someone other than me to be seeing it :-) [19:24] lol, funny when people start talking with random tech words [19:24] and you're right, adding &foo=bar or &givemeviagra=yes or anything else disables it. fascinating. [19:24] because those URLs are not cached [19:25] if the wiki is returning me a cached spam page, then why isn't it returning the same cached page to you? [19:25] no idea [19:25] simplest answer: I'm logged in [19:26] anyway, retry the main page [19:27] $ curl -s 'http://archiveteam.org/index.php?title=Main_Page' | grep -i cialis | head -1 [19:27] [19:38] topaz: I see the same [19:44] hybernaut: you see the same output from curl? but not in your browser? [19:45] yes, but I believe the engine returns different content based on your user-agent string [19:45] I don't know, tho [19:46] if you do the same curl with '-A Mozilla', you don't get it [19:47] I do [19:47] $ curl -A Mozilla -s 'http://archiveteam.org/index.php?title=Main_Page' | grep -i cialis | head -1 [19:47] [20:33] part 1 is up, 2 coming as fast as I can upload. If someone can figure out where there's enough space, I can send links as I go. [20:33] of what [20:34] ~8.5gb of compressed geocities rips [20:36] Have you asked SketchCow? [20:37] Not yet, I'm just putting it up on my dropbox. Once done, can just wget it from there as the parts complete [20:37] And where should it go, eventually? [20:37] I have no idea [20:37] archive.org, perhaps? [20:38] That's as good of a place as any [20:39] What kind of files do you have? Is there a link to have a look? [20:39] It's just one enormous tar.xz split into 8 parts. I ran wget/some custom python script on a crapload of fanfiction sites [20:40] Well, then it should probably become an item on archive.org, with your 8 parts, for starters. [20:40] the 8 parts are just split with unix split, you'll have to recombine them [20:43] Do you have a link somewhere? Then I'll download it and put it back together. [20:43] I have no idea what's involved in getting something up on archive.org. Would they accept it? [20:43] Yes, as far as I know they accept anything, and they take it down when someone complains. [20:43] Working on that, my upstream is slow. I'm feeding the parts to dropbox now [20:44] You just create an account, create an item and upload the files. [20:44] Okay, let me know when you have a link. [20:44] Or links. [20:45] * tsp_ nods [21:05] I've modified the mobileme tool to enforce bandwidth limits [21:05] anyone know if this is worth a pull request? [21:19] Why not. Put your fork on GitHub and perhaps it's useful to someone. [21:21] tsp_: What is it originally? I mean inside the tar.xz :P Besides "A part of geocities"? [21:28] any idea why the following example a stops right away but the example b works fine? [21:28] a) wget -m -np http://forumplanet.gamespy.com/quake_1_mapping/b50020/ [21:28] b) wget -m -np http://forumplanet.gamespy.com/quake_1_mapping/b50022/ [21:31] I guess that depends on a) what the server responds with b) more precisely, what content it responds with [21:33] it looks identical to me, just different links/content inside though [21:33] identical in terms of http and being html etc [21:34] hmm, I'd make wget output a log and go through that first [21:34] ersi: just wgetted/pulled html files [21:34] site rips [21:34] yeah, but a random selection?
or what? :o [21:35] mostly me googling for fanfiction and pulling anything that seemed to be related [21:37] ah [21:37] stupid me [21:37] ersi: thanks for the nudge. --debug showed me how it decides to crawl further [21:38] and i was the culprit [21:39] awesome :) Good going on finding it ^_^ [21:40] heh [21:40] i will try to get forumplanet backed up [21:41] ign stopped giving a fuck [21:41] there are incredible amounts of spam [21:43] still haven't found out the proper way to combine --span-hosts and --page-requisites [21:43] i only want page requisites from those other hosts [21:43] archive.. ALL THE SPAM! [21:49] .... archive ALL the spam? [21:50] topaz, yes. Compresses nicely with 7z due to repetition [21:51] yeah, first step is getting it all. [21:51] it seems to grab the same page 2-3 times under different URLs too [21:51] You gotta catch 'em all! [21:52] but url structure is nice so this is an easy (but long) job [21:52] Is there any way to send a 66 MB long POST request through curl or something? [21:52] * Nemo_bis whistles [21:52] Of course there is Nemo_bis [21:53] ersi, tell me :D [21:53] But it depends on the target, if he chooses to accept that kind of long/large data [21:53] well, let's try [21:53] misconfiguration happens [21:53] sorry, I thought ersi was making a hyperbole and a half reference [21:53] I was just doing the call-and-response. [21:53] "man curl", I'll say. :P curl -X POST for just the method though [21:54] :p [21:54] topaz: I'm always half serious and half a jack ass [21:54] I guess that makes me fit in here with all you serious jack asses [21:54] ;) [21:56] Nemo_bis: Seems like it's just curl -X POST -d http://target/derp/api/herp?apikey=archiveteaaaaaaaam [21:56] oh good, then I won't be alone. [21:56] ersi, I think -X POST isn't even needed [21:57] that'll be formencoded though. If you want binary you need to --data-binary it seems [21:57] at least according to https://www.mediawiki.org/wiki/Manual:Parameters_to_Special:Export [21:57] and with -d @somefile you'll post the content of the file instead. [21:57] Nemo_bis: Never hurts, hehe [21:57] alard: neat [21:57] but he told me it was too long [21:57] I do like the manual page for curl though. It's very useful and easy to search in [22:02] it's actually a bash error; bash: /usr/bin/curl: Elenco degli argomenti troppo lungo [22:02] (list of arguments too long) [22:05] i love bash scripting [22:07] hm, yes, what alard said is the solution anyway [22:12] 800 pages with 20MB -> 280KB 7z :) [22:14] a MediaWiki page I downloaded once was compressed over 5000 times [22:15] I wonder whether this poor server will send me a pony back after this 66 MB request taking ages [22:16] mirroring sites is weird [22:16] "should i go brute force and hammer the server quickly so they do not notice me in time to stop" [22:16] or "should i be nice and slow, hoping they will not notice at all" [22:19] If they're closing down shortly, rape the fuck out of it [22:19] did the latter, now trying the former [22:19] if it's uncertain/you don't know, rape the fuck out of it [22:19] just kidding, do both in variations :) [22:20] heh [22:20] i am more a guy for consensual [22:21] Duct tape turns "No no no" into "mmh mmh mmh" [22:23] Why look, a Swede! [22:25] Too bad emijrp's offline.
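A sketch of the file-based POST alard describes, which sidesteps the "argument list too long" error because the 66 MB payload never touches the command line; the wiki URL and title list file are hypothetical, and the pages/curonly fields are the ones documented in the Special:Export manual linked above.

    # titles.txt holds one page title per line; --data-urlencode pages@titles.txt
    # reads the whole file, url-encodes it and sends it as the "pages" field.
    # curonly=1 asks for the current revision only.
    curl -s 'http://wiki.example.org/index.php?title=Special:Export' \
         --data-urlencode pages@titles.txt \
         --data curonly=1 \
         -o dump.xml

For a raw, non-form body the --data-binary @somefile variant mentioned above does the same trick of reading the payload from a file.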
That dude from the IA forum about the #archive chan just joined #archive [22:25] Nemo_bis: ^ [22:25] ersi, oh, good [22:25] He's gonna come over here [22:25] there he is, it's dada_ :-) [22:26] hello [22:26] we're a loose bunch of hobby archivists ^_^ [22:27] with contacts at IA/archive.org [22:27] hello again dada_ :) [22:28] what are you going to do, push the occupiers out? :p [22:29] I'm gonna go catch some sleep, but hang around - check out our site and chat with people about doing awesome archival stuff~ \o [22:30] hm, looks like (40million posts/~8) threads [22:30] but nicely done with ~15 posts per page [22:30] I was actually thinking about an archive project not too long ago. although it would be a bit big in scope. an archive of all DOS games ever released (these archives actually exist and are complete for a couple of consoles). it would be hard due to various technical reasons, though [22:30] there are folks doing that [22:31] e.g. DOSCollection [22:31] getting each page 2-3 times but since it compresses so well i don't care [22:31] there is a project for that on underground-gamer [22:31] amazing work [22:31] http://www.underground-gamer.com/wiki/index.php/Projects [22:32] yeah, I hear there are a few projects like that underway already, although iirc they don't use lossless disk images to do it? [22:33] i think http://www.underground-gamer.com/wiki/index.php/Redump.org_IBM_PC_Compatible does [22:33] looks like they have solid archives free of cruft, though [22:34] (one thing I was thinking of: some older games used technical copy protection means, such as bad sectors on the floppy disk. you'd be unable to use those games unless the disk image had that information encoded in it.) [22:34] SPS is theoretically going to do PC floppies eventually but they're kind of glacial [22:35] when you get to later years (mid 90s) games start becoming huge due to the FMV game hype [22:37] red book audio is a space hog too [22:39] there's also http://www.demu.org/ which is transitioning to IA [22:40] I honestly don't think there even is a disk image format that supports these copy protection tricks. maybe the amiga disk image format does... [22:41] IPF [22:41] hm, can i >/dev/null but get stderr output "like" stdout? [22:42] so if i was running a script and used this inside, i would get that line's stderr as stdout of the script [22:48] Schbirid: 2>&1 [22:48] i thought that was directing it to the same location as its stdout? [22:49] emijrp: hey, dada_'s here. He's the one that made that forum post. And registered #archive on freenode ;p [22:49] OH! What you want is &>/dev/null [22:49] Now I'm really off for some sleep, fixed up some boring household chores [22:49] Then everything goes down the drain [22:49] nah [22:49] hm, i will try 2>&1 [22:50] can you explain what you actually want? [22:50] i tried [22:50] Schbirid: it depends on where the 2>&1 appears. "foo > /dev/null 2>&1" is different from "foo 2>&1 > /dev/null".... this is a bit tricky to get right (I also have to try it anew every time ;-) [22:50] Like, "redirect stderr to stdout" [22:51] yes, i want to redirect stderr to stdout AND redirect stdout to devnull.
so in the end i want to see stderr as stdout, not being sent to devnull [22:51] Schbirid, dada_: i'm not sure if they do older stuff but redump.org does stuff similar to that archival project [22:51] try "foo 2>&1 > /dev/null" [22:51] that should work [22:52] arrith: that is only metadata afaik [22:52] Yeah, 2>&1 must be first for some reason [22:52] ok, cheers [22:52] I thought it was the other way around [22:52] as I said, the outcome is different [22:52] Schbirid: well like the GoodSets or MAME releases, i think people release redump sets [22:52] if you put it the other way round, everything goes to /dev/null [22:54] "foo 2>&1 > /dev/null" can be interpreted like "overwrite file descriptor 2 with the value of file descriptor 1, *and then* overwrite the file descriptor of #1 with a new one going to /dev/null" [22:55] if you do it the other way round it becomes "overwrite fd #1 with a new one for /dev/null, *and then* overwrite fd #2 with the value of fd #1" [22:55] it is already past my bedtime, you are making it worse :P [22:55] sorry ;-) [22:56] alright, started some big forums, so i better leave this unattended and killing kittens [22:56] good night :D [22:56] Yeah. But I too have to rediscover this each time I need to do it [23:39] OKAY BACK [23:41] greetz [23:52] you missed some 6502 action in #messdev :) http://git.redump.net/mess/commit/?id=e262ae8ed2a4438dd05ce633a632f42ac89a3bca
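A minimal demonstration of the redirection ordering explained above; the brace group just writes one line to each stream so you can see where each ends up.

    # redirections are applied left to right.

    # stderr stays on the terminal, stdout is discarded -- what Schbirid wanted:
    { echo out; echo err >&2; } 2>&1 >/dev/null     # prints: err

    # reversed order sends both streams to /dev/null:
    { echo out; echo err >&2; } >/dev/null 2>&1     # prints nothing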