[00:01] I see a number of 500 errors on italian profiles [00:01] though I suppose those are blocked by the .incomplete check [00:02] hmm [00:02] data/it/m/ma/man/mani1985/wget-phase-3-0f08s3.splinder.com.log:2011-11-22 04:03:01 ERROR 504: Gateway Time-out. [00:02] data/it/m/ma/man/mani1985/wget-phase-3-0f08s3.splinder.com.log:2011-11-22 04:38:35 ERROR 502: Bad Gateway. [00:04] possibly also masked by a .incomplete file [00:10] yipdw^: I added a file to my gist with the times of the noconn pages [00:27] hmm. should the script only check italian profiles for splinder_noconn.html? [00:29] Wyatt: in case you see this, you still need to upload your Belios grab; get a rsync slot from SketchCow [00:46] closure: I'm aware. Also have some google groups stuff. I'll bring my external to work tomorrow [00:47] thanks. need to finish berlios soon [00:52] alard: ./dld-me-com.sh: line 324: 6893 Segmentation fault (core dumped) $WGET_WARC -U "$USER_AGENT" [00:52] alard: probably when my disk filled up [00:53] closure: Yeah, meant to get that up two weeks ago before I left town. Sorry I've been behind on that. [01:00] winr4r: Glad to hear I could help your buddy. [01:01] 254M categories [01:01] 4.0K letter.txt [01:01] 53G poems [01:01] [db48x@celebdil unified]$ du -chs * [01:01] 4.0K robots.txt [01:01] 120M site [01:01] [01:04] oooh: http://aktuell.ruhr-uni-bochum.de/pm2011/pm00386.html.en [01:08] I need to study various MitM techniques more; I feel like there's a lot of ground to cover there. [01:20] yea, it's the main way cryptographic protocols are broken [01:56] Blasting older magazines into the archive now. [01:59] SketchCow: So is this still the phase where you're aggregating things that are available from elsewhere? [02:00] Telling tracker that 'it:the_Godfather' is done. [02:18] Yes, still DEEP in that. [02:18] There's so many left to do. [02:19] Oh wow, it's still got to wind down on 3 users. 
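The masking question above — are those 502/504 errors hidden by the `.incomplete` check? — can be made concrete: scan each profile's wget logs for 5xx errors, but skip any profile directory still carrying a `.incomplete` marker. A minimal sketch, assuming the `wget-phase-*.log` naming from the paste above; `profiles_with_errors` is a made-up name, not part of the splinder-grab scripts:

```python
import os
import re

# Matches lines like "2011-11-22 04:03:01 ERROR 504: Gateway Time-out."
ERROR_RE = re.compile(r"ERROR 5\d\d:")

def profiles_with_errors(data_root):
    """Return profile directories whose wget logs contain 5xx errors,
    ignoring profiles that are still marked .incomplete."""
    flagged = []
    for dirpath, _dirnames, filenames in os.walk(data_root):
        if ".incomplete" in filenames:
            continue  # download still running; errors here are expected
        for name in filenames:
            if name.startswith("wget-phase") and name.endswith(".log"):
                with open(os.path.join(dirpath, name), errors="replace") as fh:
                    if any(ERROR_RE.search(line) for line in fh):
                        flagged.append(dirpath)
                        break
    return flagged
```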
[02:20] That's really impressive how many disparate full magazine runs are floating around. [02:28] What's the copyright status of these things? Has anyone come forward to C&D their magazine archive? [02:28] Ask me another one [02:31] SketchCow: I have somewhere around, I don't know, a couple of hundred users. [02:32] I need an rsync slot. [02:32] Ah, I see. ;) [02:32] wait, you CAN rsync while the script's wind- nevermind. [02:32] there's literally 3 users left before it's finished winding down. [02:33] i've been on 3 to 7 users for two days [02:33] (two different machines) [02:37] Crap. [02:38] Well, time to clean incomplete users up. [02:38] be careful. the ones still running are also incomplete [02:39] Finish it, then we'll go [02:44] Right. [02:52] http://www.archive.org/search.php?query=collection%3Acomputer-magazine-rack&sort=-publicdate [02:54] This is going to be quite a spectacular set of variant magazines [03:12] http://batcave.textfiles.com/spec/HappyComputer/Issue8405/Pages/HappyComputer840500001.jpg [03:13] :D [03:18] mmm, pepperoni [03:22] Happy Computer was pretty awesome, loading them in now. [03:22] You're on a roll today :D [03:23] I'm on a roll every day. [03:23] But not everyone sees the day's roll. [03:24] 2453122 [03:24] [db48x@celebdil poems]$ time find -name '*.html' | wc -l [03:24] real 134m16.059s [03:24] sys 5m33.791s [03:24] user 0m32.564s [03:25] SketchCow: do you have anything more from lulupoetry, that you know of? [03:25] Might show up [03:26] ok, then I'll wrap this up [03:27] will I be able to amend the existing item on archive.org, or do you have to do that since you created it? [03:30] I can add anything [03:30] I can change anything [03:30] rename anything [03:30] rederive anything [03:30] BUT I CANNOT LOVE [03:30] WHY CAN'T I LOVE [03:30] aww [03:31] ls [03:31] WHY CAN'T I LS [03:31] . ..
[03:31] 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 [03:34] SketchCow: listen to Miserere by Gregorio Allegri; that will help with the love thing [03:35] on the other hand, you would have to spend 10 minutes not archiving anything in order to listen to it [03:35] so it's a trade-off [03:36] db48x: not really worth it. [03:36] Oh hey, it finished grabbing another user! [03:36] Approximately 20 minutes ago. [03:36] heh [03:37] alard: hey, are you about? [04:16] http://www.archive.org/details/whichmicro-1983-04 [04:33] git status [04:34] er [04:34] whoops [04:37] So, dreamhost just took an enormous shit [04:39] What's the copyright status of these things? Has anyone come forward to C&D their magazine archive? [04:39] USER WAS BANNED FOR THIS POST [04:39] Dreamhost did? [04:40] They do that a lot [04:40] I don't think my VPS has had more than a week of uptime in the past 3 months [04:40] my dreamhost site is up [04:40] Mine is most of the time but it just seems that the server crashes itself every few days [04:41] A lot of mine are on the same server, which means down. [04:41] Dreamhost is pretty fucked [04:41] Their shit is pwned [04:42] Care to elucidate? pwned you say? [05:13] Yep, that machine is down [05:13] Good job, boys [05:13] I'll just keep doing my other tasks until then [05:14] Down to 60 items in mailbox, this is a big deal [05:16] http://tools.ietf.org/html/rfc2324 we need to get this implemented. [05:17] Weird, I thought they were using ceph storage nodes for everything. [05:17] I don't care if it's an April Fools' joke. [05:19] "418 I'm a teapot" [05:20] Huh, that one may even be possible. To be sure, if people can implement RFC 1149 and put NetBSD on a toaster, surely this is within reach! [05:21] It must be! [05:22] Huh, missed that one: "In an attempt to anthropomorphize the bit streams on countless physical layer networks throughout the world, we propose a TCP option to express packet mood [DSM-IV]."
[05:25] Wyatt|Wor: Just sites get hacked a lot on there [05:26] cf archiveteam wiki [05:26] Never try and use RFC 1149 over IEEE 1394 as it results in dead carrier [05:27] Yeah, they splatted this thing [05:28] underscor: "sites get hacked" can mean a lot of things, though. People think they've been hacked when they install a malignant theme and it defaces their homepage. [05:28] Right. I don't mean that kind [05:29] I mean the "people installed scripts/modified shit in my public_html" from within the server [05:30] "OTHER people installed scripts/modified shit in my public_html" [05:33] Ah, interesting. I wonder how they're doing that? Perhaps a variation on the symlink attacks we've been trying to find a good containment strategy for here. [05:35] cf: archiveteam wiki (seen here: http://multimedia.cx/eggs/cloaked-archive-wiki/ ) [05:42] And the compromise vector? [05:44] i don't know the vector. [05:45] but the start and location of the compromise: [05:45] May 13 11 22:17:23 Ah, found it. [05:45] May 13 11 22:17:32 WebStart.php: Wyatt|Wor: OH, do you work at dreamhost? [05:50] underscor: No, but I work for a hosting company. [05:50] Oh, good [05:51] I was like, "Whoops, I'm a dick" [05:51] hahaha [05:53] i don't use dreamhost, so I don't know what's going on there. pretty sure it is other people that have accounts with them, though. [05:53] If I did I probably wouldn't be offended anyway. Security isn't the sort of thing you can bluff, bluster, and deny until it stops nagging you. [05:54] Some companies think otherwise [05:55] There's this interesting ongoing problem where +FollowSymlinks allows any user to view / and family. [05:56] Yeah, I usually disable that [05:57] Why not chroot it? [05:57] I do that with my thttpd processes to shield that attack [05:58] I think it's called mod_security [05:58] +FollowSymlinksIfOwnersMatch [05:59] Wyatt|Wor: Jumpline, or a reseller of Jumpline? [05:59] underscor: A few things. It can be reenabled in a local .htaccess. 
Symlinks can be uploaded already in place. Symlinks are provided by the kernel; ln isn't the only way to make them. [05:59] +FollowSymlinksIfOwnersMatch [06:00] Also, chroots fix that [06:00] ln -s / would just link to the chroot [06:00] chroot is a kernel api. a process in a chroot cannot access stuff outside it (not including any kernel vulns that might be discovered) [06:00] Coderjoe: http://httpd.apache.org/docs/2.2/mod/core.html#options [06:02] i know the options. check out AllowOverride [06:02] There are patches for apache that turn FollowSymlinks into SymlinksIfOwnersMatch, but I haven't gotten clearance to deploy that yet [06:02] and in a shared hosting environment like dreamhost, all the users on a machine share an apache instance [06:02] I LOVE SECURITY LIKE THAT [06:02] yea [06:03] Yeah, it's not exactly tenable, but this is the reality I walked into. [06:03] the option to make it check the permissions is the way to go [06:04] underscor: Jumpline itself. (Also Hostican, Christian Web Host, HostingRails, GemStream, OpenSourceHost, 100MegsWeb Hosting, Web Intellects, WebHost Giant, and possibly a couple more I've forgotten). [06:05] That's... [06:05] A lot of brands [06:06] Yeah, tell me about it. :/ [06:07] I think the Gemstream and HostingRails brands are getting killed soon. [06:08] Cannibalism as a business model has its drawbacks. [06:14] That's quote worthy [06:14] Do they do different things, or is it just all rebranded shared hosting? [06:15] love it [06:16] underscor: They're all formerly separate hosting companies. The Man shops around, buys a company, and we migrate them in. [06:16] Apparently it's quite lucrative. [06:16] Aha [06:16] That's interesting [06:17] Yeah, I was surprised too. But it's more common than I would have guessed; I hear Bluehost and Dreamhost both do a lot of that.
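The ownership check discussed above (the channel writes it SymlinksIfOwnersMatch; Apache spells the directive SymLinksIfOwnerMatch) amounts to comparing the owner of the symlink with the owner of its target before following it. A minimal sketch of that semantics in Python, not Apache's actual code:

```python
import os

def may_follow(path):
    """Follow a symlink only when the link itself and whatever it
    resolves to are owned by the same user, mirroring Apache's
    SymLinksIfOwnerMatch option."""
    link_owner = os.lstat(path).st_uid    # owner of the link itself
    target_owner = os.stat(path).st_uid   # owner of the resolved target
    return link_owner == target_owner
```

This is why `ln -s /` from an untrusted account stops working: the attacker owns the link, but root owns `/`, so the owners differ and the link is refused.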
[06:17] I suppose it makes sense [06:18] I wouldn't want my hosting transferred to either of those companies though [06:18] * underscor buys hosting from jumpline so he can directly complain to Wyatt|Wor in here [06:18] heh [06:20] Oh right, Coderjoe, AllowOverride would work, but apparently it would break a LOT of Joomla installs. [06:30] underscor: i can't get js-wikireader working [06:34] http://imgur.com/gallery/Thedr [06:56] * SketchCow is down to 48 letters in inbox. [06:56] Hurrrrrraaahhhhhhhhhhhhhh [06:57] time to mailbomb sketchcow [06:58] http://imgur.com/gallery/53uLJ [06:59] I just got rid of an e-mail [06:59] I wrote this "here's what I'm up to" [07:00] Someone just wrote, essentially, "So, I am hijacking your announcement to say I would love your talents in pointing out to me a bunch of historical textfiles related to bandits and being considered heroes". [07:00] And he is a history student. [07:00] Well, I decided at this late hour that my spending 20 years collecting all sorts of materials connected to these subjects to then have them all done by a search engine is PERHAPS ENOUGH WORK [07:01] I sat on it a while before deciding. [07:01] My suggestion to Jason is to delegate, delegate, delegate. Find someone else to build/dupe the torrent. Coordinate, but clearly point out who is actually in charge of each effort. [07:01] For the donation/DVD stuff, I humbly recommend a part-time office helper. Maybe just a friend or family member, but having some help could make the difference. Of course, my expectations were always low since I know he does this work as a side project, but it would be nice to have a slightly better response rate. [07:01] WHat where what. 
[07:02] * underscor wouldn't mind being a Jason Office Monkey™ [07:02] http://ascii.textfiles.com/archives/3395 [07:05] Acceptance, 2011-11-21, 11:43:00, LANSDOWNE, PA, 19050 [07:05] Arrive USPS Sort Facility, 2011-11-23, 05:33:00, KANSAS CITY, MO, 64121 [07:05] Processed through USPS Sort Facility, 2011-11-21, 21:17:00, PHILADELPHIA, PA, 19176 [07:05] what the fuck [07:05] why would you go from pennsylvania to missouri to get to northern virginia? [07:06] question not the ways of shippers [07:06] All of this over a 1GB disk? [07:06] 1TB, no? [07:06] (re: the escalation post) [07:06] ah [07:07] "I'd be mailing a 1gig external usb" [07:07] He typoed [07:07] gotcha [07:07] yeah geocities = 0.9 gigabytes. [07:08] SketchCow: would love to hear if he responds [07:18] He doesn't know about the post. [07:18] He sent me mail saying if I didn't respond by Monday, he'd pull the trigger [07:21] just getting to the threats [07:21] So wait [07:21] You guys get a site if you go to ascii.textfiles.com? [07:21] I do [07:21] Good [07:21] yes [07:22] I will see if I can find the problem here [07:25] 0 2:25AM:alex@alex-desktop:~/emcore 156 π curl -I ascii.textfiles.com [07:25] HTTP/1.1 200 OK [07:25] Kinda a shitty route to it though [07:34] okay I got this js-wikireader thing, I produced a dump file, I installed the extension in chrome, I'm at the "add dumps" page, now how in the name of fuck do I add my dump [07:34] Dunno, I'll try and figure it out with kevin tomorrow [07:34] s/kevin/antimatter15/ [07:45] what is js-wikireader? [07:49] is it possible to use the -k option (--convert-links) outside of the original execution? Like on a site you have already archived? [07:49] this is for wget [07:50] nope [08:29] https://github.com/antimatter15/js-wikireader [08:35] Great. [08:36] Aurora froze when i went to open a new tab, so now I have the right click menu stuck on my sc- [08:36] oh, just went. [10:12] is someone able to fix this wget problem, please?
http://archiveteam.org/index.php?title=Talk:Blogger [10:13] --exclude-domains=LIST comma-separated list of rejected domains. [10:13] -D, --domains=LIST comma-separated list of accepted domains. [10:13] also, seeing the logs or some sample page would help [10:13] and level=2 could be a problem [10:14] I tried to accept those domains but they weren't followed [10:14] I'm trying -m now [10:14] -m, --mirror shortcut for -N -r -l inf --no-remove-listing. [10:14] yep [10:15] which shows -l 2 is a problem, probably [10:15] all that does differently is turn on timestamps and set level to infinity [10:15] yes [10:16] but a thing I don't understand is: you can use --page-requisites, but how can you follow and download all and only the links from an embedded image, which usually go to the larger version of it? [10:16] listing all the domains of such larger versions is not very handy [10:19] perhaps one can play with --ignore-tags but I'm not able to [11:39] Wow, so everything got really interesting at the end of this shift! The New Guy used passwd on a shared server. The carnage...rootkits everywhere. [11:42] Whee! [11:49] that fast, eh [11:51] * chronomex zzzz [11:53] No, he changed it sometime yesterday from what we've been able to tell. [11:53] Things exploded about an hour and a half ago [11:53] From an IP in Lebanon. [12:10] Ugh, I'm going to go attempt sleeping. Night all. [15:14] NotGLaDOS: http://httpbin.org/status/418 [15:56] t Wyatt|Wor You guys should scan for that stuff daily, if not every few hours :V [17:04] Back [17:06] morning! [17:57] is there a place for splinder to go when we're ready to upload it? [18:24] Yes [20:54] SketchCow: my suggestion is to find the person who decided to erase the apollo tapes, interview them, then eat them [20:59] WHO DARES ERASE THE APOLLO TAPES?!?!? [20:59] bsmith095: they actually did [20:59] no seriously, who?, and also, what are they? 
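The `-D` / `--exclude-domains` behaviour being debugged above boils down to a host-matching predicate over each discovered link. A simplified model of that check — not wget's actual code, and `host_accepted` is a made-up name — looks like this:

```python
from urllib.parse import urlparse

def host_accepted(url, accept=None, reject=None):
    """Model wget's domain filtering: a link's host must match the
    -D (accept) list when one is given, and must not match the
    --exclude-domains (reject) list."""
    host = urlparse(url).hostname or ""

    def matches(domains):
        # A host matches a domain if it equals it or is a subdomain of it.
        return any(host == d or host.endswith("." + d) for d in domains)

    if reject and matches(reject):
        return False
    if accept:
        return matches(accept)
    return True
```

Note that this filter only rejects hosts; it does nothing about `-l 2`, which is why the depth limit, not the domain list, was the likely culprit above.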
[20:59] the original telemetry and TV tapes from apollo 11 were erased and re-used, probably in the 1980s [20:59] OMFG you're kidding me [21:00] this is nasa, here, they keep everything how could they erase those?! [21:00] bsmith095: for the same reason that history is often destroyed because of very-short-term expediency [21:00] they don't keep everything [21:01] for example, they don't even have most of the design documents for the apollo spacecraft [21:03] again AFAIK, nasa has kept nearly every physical part, of every spacecraft they have, in a boneyard somewhere [21:04] can i time an rsync after its already started, automatically [21:04] Hopefully the digital age solves this. [21:04] bsmith095: the parts of the spacecraft themselves that weren't sent into solar orbit or didn't burn up on re-entry, yes [21:04] well yeah thats what i meant [21:04] bsmith095: we're still missing lots and lots of things [21:05] as in pretty much all the design documents, what we have now is a rounding error on what has been destroyed [21:05] are these important? sending incremental file list [21:05] rsync: link_stat "/home/ben/splinder-grab/data/it/a/al/alb/albe^" failed: No such file or directory (2) [21:05] rsync: link_stat "/home/ben/splinder-grab/data/it/g/gy/gyp/gypsy!" failed: No such file or directory (2) [21:06] seems like the special chars, killed the dump [21:06] * winr4r doesn't know [21:09] 155,381 items, totalling 8.3 GB in splinder-grab [21:11] :/ [21:11] good that you saved it, ":/" that stuff is just being destroyed [21:12] well *I* think it's a lot [21:12] yes, it is [21:14] i remember this one part of distributed preservation of service attack, jason's recent defcon talk, where he detailed all the projects yahoo killed recently, one of them, yahoo briefcase, 25mb of space, probably less than 2 gb total, shutdown, and purged, "why, no spare usb key?!" [21:17] haha mhm [21:18] its on youtube if you havent seen it [21:19] i have!
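The `link_stat` failures above on names like `albe^` and `gypsy!` look like shell-expansion casualties: those characters are special to the shell, so paths passed unquoted get mangled before rsync ever sees them. One fix is to shell-quote every path when building the command; a sketch (the paths and `rsync_command` helper are illustrative, not from the splinder-grab scripts):

```python
import shlex

def rsync_command(paths, dest):
    """Build an rsync command line with every source path shell-quoted,
    so names containing !, ^, spaces, etc. survive intact."""
    quoted = " ".join(shlex.quote(p) for p in paths)
    return "rsync -av " + quoted + " " + shlex.quote(dest)
```

Another option that avoids the shell entirely is rsync's `--files-from=-` with one path per line on stdin.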
[21:19] i'm sure i saved a copy somewhere, too [21:34] yeah me, too, he's great when he's not reading off a script [21:55] does he ever read from a script? [21:57] his speeches are awesome though, i pointed one of my friends to one of them last night, he mentioned geocities, she was like "oh i had some stuff on geocities that i can't get back" and i was like "yeah, talk to jason" [21:58] she ended up getting back a poem that she wrote about her son who was killed by a drunk driver [21:59] once again, archive team scores. [22:00] archive team = internet superheroes [22:00] * winr4r can only sit back and admire from afar, since he has a 20gb/month bandwidth cap and 384kbit/s upstream [22:10] only one profile left on the ec2. been spinning down for almost 5 days now. (7 hours short of that) [22:14] underscor: Re your earlier comment, Yup. And now everyone who ISN'T me also knows this. [22:38] i'm backing up the screen savers [22:38] off of youtube [22:39] or more of what was not in techtv torrent [22:41] Hi all: I added a Splinder verification script to the git repository. It should do more/better checks than the check and fix scripts. [22:42] hiya alard [22:42] I think I covered the errors that I know of, but it may be useful if someone could check if anything is missing. https://github.com/ArchiveTeam/splinder-grab/blob/master/verify-splinder-profile.py [22:42] heh. so, like a dumbass, I erased my EC2 ssh keys [22:42] probably still got a copy on a desktop at home [22:44] you can download them from ec2, iirc [22:45] I've not seen that option [22:45] alard: is it correct to say that that script supersedes check-dld.sh? [22:46] i've been thinking we might want to start using python for the downloader scripts... aside from wget, it reduces the number of things to worry about (like bash being new enough for associative arrays, proper expr support, etc) [22:46] I agree [22:47] alard: does this also check for the stuff check-dld.sh checked?
[22:47] I never knew there were so many differences in bash versions until I started doing AT stuff [22:48] to say nothing of the differences between BSD and GNU versions of utilities [22:50] i'm not seeing a check for splinder_noconn.html [22:50] https://github.com/ArchiveTeam/splinder-grab/blob/master/verify-splinder-profile.py#L171 [22:50] Coderjoe: https://github.com/ArchiveTeam/splinder-grab/blob/master/verify-splinder-profile.py#L171-173 [22:50] er, yeah [22:51] ok. irblind [22:52] eek. needs hanzo. planning on distributing it with the script? [22:52] I agree with bash/python point, by the way. (I originally thought the advantage of bash was that it's more or less the same everywhere and you don't have to install anything, but that turns out to be false. What's left is a somewhat clumsy scripting language.) [22:52] The hanzo files should be there. [22:53] Coderjoe: it's in the source tree [22:53] i haven't looked outside this file so I didn't know they were also checked in [22:54] Coderjoe: Yes, this could be a replacement for check-dld.sh. (Although check-dld.sh is much faster.) [22:54] this doesn't check for a .incomplete file... are you leaving it to the caller to not pass profiles that are still being downloaded? [22:55] This was more or less intended as an after-uploading check to run on batcave. [22:55] If there is an .incomplete file, the tests should not pass. The WARC files are incomplete, for example. [22:55] if you've got an incomplete download, you shouldn't have the wget metadata resources [22:55] -s +records [22:57] yipdw: btw, go to aws.amazon.com, drop down the account box and pick "security credentials" [22:58] I think you can get the keys from there [22:58] hmm [22:58] ok, that directs you to the ec2 management console [22:58] hm [22:58] bush [22:59] bust [22:59] "For your protection, AWS does not retain your private key."
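Per alard's remark above that a leftover `.incomplete` marker should make the tests fail (the WARCs being incomplete), the entry point of such a check might look like this. This is a deliberately simplified sketch with a made-up `profile_verifiable` name; the real, much more thorough checks live in verify-splinder-profile.py:

```python
import os

def profile_verifiable(profile_dir):
    """A profile passes only if its download actually finished:
    no .incomplete marker, and every WARC present is non-empty."""
    if os.path.exists(os.path.join(profile_dir, ".incomplete")):
        return False  # still (or abortively) downloading
    warcs = [f for f in os.listdir(profile_dir) if f.endswith(".warc.gz")]
    if not warcs:
        return False  # nothing was grabbed at all
    return all(os.path.getsize(os.path.join(profile_dir, f)) > 0
               for f in warcs)
```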
[22:59] it's fine [22:59] yeah, I just saw that [22:59] I should have a copy of it on my desktop at home [22:59] it just means I can't access the instance from here [22:59] so it'll have to wait until I get back [23:26] ok, this is sad [23:26] JSON::ParserError: 710: unexpected token at '"2011-11-26 16:26:55 -0700"' [23:26] ruby-1.9.2-p290 :004 > JSON.parse(Time.now.to_json) [23:27] er, wait [23:27] never mind, that was me being dumb [23:34] in today's apple II floppy dumping stack: http://interbutt.com/temp/drruth.png [23:34] oddly enough on a bootleg disk from a store which was apparently in saudi arabia [23:37] DFJustin: cool!