[03:01] Anyone seen a howto for backing up your life?
[03:03] especially keeping everything sane so your next-of-kin can actually make sense of it?
[03:15] there are services that will email next of kin when you die?
[03:28] perfinion: sure. I've got a dead man's switch. more trying to figure out how to build a personal archive that my grandkids can use.
[03:29] oh
[03:29] but I found this, so I feel bad for not searching first: http://www.archive.org/details/personalarchiveconf
[03:29] i'd write a long letter and then GPG it and give someone the passphrase and key
[03:30] oh, do you guys have any use for a coder with mad system automation skills?
[03:30] always
[03:31] ive been busy recently so not sure what the current projects are
[03:31] got a bug tracker I can bang my head against?
[03:31] but we always need more ppl
[03:31] no, no bugtracker
[03:31] its mainly done in different channels for each project
[03:33] well the only relevant experience I've got is my wikileaks mirror and years of grooming my backups
[03:34] but I've got a weird interest in usenet and mailing lists, if there are any relevant projects
[04:01] excellent
[04:01] I'd like to have a copy of google groups in mbox format.
[04:01] been a dream for a little while
[04:01] just throwing that out
[04:01] which "google groups"?
[04:02] there are 3 things under that name
[04:02] right.
[04:02] 1) rescue the usenet archives gap from google
[04:02] (the bits between where the early dump ends, and what you can get on retention right now.)
[04:03] 2) archive mailing list histories
[04:03] 3) what's the 3rd thing?
[04:03] what providers has retention been checked on?
[04:03] (and which groups)
[04:03] I haven't looked very much at all
[04:03] yeah, the gap is about 1991-2005 from what people tell me about what's available on giganews
[04:03] right
[04:03] sounds about right
[04:04] iirc, there was some forums thing under the groups name as well
[04:04] hmm.
[04:04] google goops
[04:04] oh good, PDA2011, was looking forward to watching those talks
[04:05] pick a group and I'll check retention on a few providers
[04:05] wish the IA made it easier to get all the urls of every item in a collection
[04:05] Coderjoe: I usually use comp.dcom.telecom
[04:05] Coderjoe: that's been continuously active since forever, so it's a good canary
[04:12] oldest I'm currently seeing (though I am not sure if this program is showing me everything) is
[04:12] Date: Mon, 9 Jul 2007 00:04:59 EDT
[04:13] what program are you using?
[04:14] the wrong one for the job, since it is primarily a binary downloader
[04:14] ayup
[04:14] i'm about to connect in and talk to the servers manually
[04:17] astraweb: Date: Sun, 21 Sep 2008 10:16:24 -0400 (EDT)
[04:18] newshosting: Date: Tue, 8 Sep 2009 13:26:28 -0400 (EDT)
[04:20] giganews: Date: Mon, 23 Jun 2003 11:17:25 -0400
[04:21] easynews: Date: Tue, 8 Sep 2009 13:26:28 -0400 (EDT)
[04:22] giganews is the winner in text retention I see
[04:22] (so far)
[04:22] at least that is the oldest one given to me in the group command response
[04:22] and I'm done
[04:22] aye
[04:23] some providers might be doing a fake lower retention due to the tier the account is on
[04:25] yeah, I've seen that
[04:25] or heard of it at least
[04:25] the funny thing is that usually, if you ask for an article by articleID, it still gives it to you, even if it is older than your date limit
[04:26] hmm
[04:26] so if you have really old headers or nzbs or such, you can still get the older posts
[04:26] or replies
[04:30] yeah... I think that newshosting group response is fake..
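A minimal sketch of the retention check being done by hand above: ask each server how much of a canary group it still holds, read the Date header of its oldest article, and try a known old Message-ID to see whether it is served past the advertised range. The host names and Message-ID are placeholders, and it assumes Python's stdlib nntplib (present through 3.12); add credentials where the provider requires them.

    import nntplib

    GROUP = "comp.dcom.telecom"                  # continuously active forever, so a good canary
    SERVERS = ["news.provider-a.example", "news.provider-b.example"]   # placeholders
    OLD_MSGID = "<known-old-article@example>"    # e.g. one taken from the giganews copy

    for host in SERVERS:
        try:
            s = nntplib.NNTP(host)               # user=..., password=... if the account needs it
            resp, count, first, last, name = s.group(GROUP)
            resp, info = s.head(str(first))      # headers of the oldest article still held
            date = next((l for l in info.lines if l.lower().startswith(b"date:")), b"Date: ?")
            print(f"{host}: {count} articles, oldest {date.decode(errors='replace')}")
            try:
                s.head(OLD_MSGID)                # by Message-ID: often works past the date limit
                print(f"{host}: still serves {OLD_MSGID}")
            except nntplib.NNTPError:
                print(f"{host}: no such article")
            s.quit()
        except (nntplib.NNTPError, OSError) as e:
            print(f"{host}: failed: {e}")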
[04:31] because that 2007 message I had earlier has newshosting as the last path element
[04:31] i'll grab the article ID from the giganews one and see if I can get it from others
[04:34] Message-ID:
[04:36] astraweb: no such article
[04:42] easynews: no such article
[04:43] newshosting: no such article
[04:44] and I double-checked at giganews to make sure I didn't screw up the command
[04:46] i wonder if any schools still run usenet servers, and what retention they might have
[04:50] my school quit two years ago. they had kind of shit retention.
[05:01] http://blog.longnow.org/2011/09/08/the-archive-team/
[05:01] That is a HELL of an endorsement
[05:01] oh
[05:02] I have a bunch of shit I pulled down from divx' stage6 site before they went down
[05:02] my stats from it are here: http://wegetsignal.org/stage6.php
[05:03] whoa, long now?!?
[05:04] (I don't think I had heard of archiveteam at that point)
[05:05] I had scripts all written up and a centralized database and all that, and had 3 different systems on different networks doing the downloading work
[05:05] tunneling mysql connections across ssh to talk to the db
[05:05] nice
[05:08] one tool just pulled video IDs from search result listing pages (and other similar listing pages)
[05:08] another tool scraped metadata
[05:09] and a third actually fetched the video file
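A rough sketch of the three-stage layout just described (ID lister, metadata scraper, file fetcher) sharing one table of work items. The original setup used MySQL tunneled over ssh across several machines; sqlite and the stage6 URL patterns here are stand-ins, since the site is long gone.

    import re, sqlite3, urllib.request

    db = sqlite3.connect("stage6.db")
    db.execute("CREATE TABLE IF NOT EXISTS videos "
               "(id TEXT PRIMARY KEY, title TEXT, fetched INTEGER DEFAULT 0)")

    def list_ids(listing_url):
        """Stage 1: pull video IDs out of a search/listing page."""
        html = urllib.request.urlopen(listing_url).read().decode("utf-8", "replace")
        for vid in re.findall(r"/video/(\d+)/", html):          # placeholder pattern
            db.execute("INSERT OR IGNORE INTO videos (id) VALUES (?)", (vid,))
        db.commit()

    def scrape_metadata(vid):
        """Stage 2: fetch the video page and record metadata (title only, here)."""
        html = urllib.request.urlopen(f"http://stage6.example/video/{vid}/").read().decode("utf-8", "replace")
        m = re.search(r"<title>(.*?)</title>", html, re.S)
        db.execute("UPDATE videos SET title = ? WHERE id = ?",
                   (m.group(1).strip() if m else None, vid))
        db.commit()

    def fetch_video(vid):
        """Stage 3: download the actual file and mark it done."""
        urllib.request.urlretrieve(f"http://stage6.example/video/{vid}/.divx", f"{vid}.divx")
        db.execute("UPDATE videos SET fetched = 1 WHERE id = ?", (vid,))
        db.commit()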
[05:34] I just had someone *EXPLODE* at me over e-mail
[05:35] They took the absolute, complete, total misreading of the short thing I said in return to an excellent work they're doing. I mean, the absolute worst.
[05:35] And then went from there on a massive rant tear, up to and including telling me to step aside for my total disrespect and insult to their abilities, goals, and issues.
[05:36] SketchCow: scripts, collection.
[05:36] I'm going to connect you with my boss and move it forward, OK
[05:36] k
[05:50] Damn, that's the worst sort of misunderstanding.
[07:06] Yeah, I had to punt and send the mail to my superior as a do-over
[07:06] I withdrew
[07:06] There's no going back
[07:15] Uh, haha - wget is taking up 1.5GB memory
[07:16] SketchCow: Ouch.
[07:17] Even I know when I am beeting
[07:17] And beeting
[07:17] beaten
[07:17] When someone takes "Don't worry about this" to mean "You are incapable of understanding this", there's nowhere to go
[07:17] wwwwwww
[07:18] It's like someone screaming at you for offering to take out the garbage, because you just implied they're incapable of it
[07:18] some people are too fragile
[07:18] That's the sign of the "Abort mission! Abort! Abort!"
[07:18] He wants assurance that grabbing the magnetic image off a cray disk will be legally protected.
[07:18] Amount IA will discuss this: 0
[07:19] We're just not qualified and we don't have anyone qualified to
[07:19] I'd suggest he call a whambulance
[07:19] But he wants to know he has some sort of gold medal saying he can do whatever
[07:19] He's worried cray will sue
[07:20] Coward
[07:20] I bet cray is kind of quietly happy about this
[07:20] ... over something on some really outdated hard drive?
[07:20] Yes
[07:20] It's free fuckin' PR
[07:20] Anyway, I punted, sent it to my superior, I am done with it
[07:20] I like to help, but not with divas
[07:20] All the geeks and nerds go "Jizzpants!" hearing about it
[07:21] besides the few who say "Cray who?"
[07:21] it would be awesome if someone with hardware and filesystem knowledge would come forward and help
[07:21] I think I should have run wget on a machine other than my work machine >_>
[07:22] that's what --continue is for
[07:23] not sure --continue works well with the options to modify links in downloaded files
[07:23] I'll just let it run, but it's gonna be a bit laggy :D
[07:23] that and the list of links to visit are the only things I can think of that would cause wget to eat so much ram
[07:24] Anyway, I was trying to give this nancyboy an out.
[07:24] tar: memory exhausted
[07:24] on a system with 4G of ram and very little else running
[07:24] Coderjoe: Party
[07:24] No swap?
[07:25] i didn't have swap at that time. I just added 4G of swap and am trying again
[07:25] ah
[07:25] 4G of ram? who needs swap? :D
[07:25] I got a little swap, even though I got 4GB memory
[07:26] I'll never let the installer choose again though, last machine I installed it auto set 12GB as swap
[07:26] fuckin retarded
[07:26] i ran into another case where a broken autotools config caused automake to forkbomb and consume vast amounts of ram
[07:56] mmm
[07:57] expected raw tar file size: 1452086937600 bytes
[08:25] and tar is up to 7.5G without having output anything yet
[08:26] that's kind of a big tarfile
[08:26] how are you making it?
[08:26] how do you mean, how am I making it?
[08:27] paste your tar commandline?
[08:27] tar shouldn't be using that much ram
[08:27] tar cf - --numeric-owner --no-recursion --totals -T 14346.LRWBU
[08:27] and the -T file is 4G
[08:27] ah
[08:27] nevermind
[09:16] sweet
[09:16] 4GB isn't enough swap
[09:16] Heh
[09:35] josephhol: Shamir's secret sharing algorithm - check it out
[09:40] Good morning, how is everyone doing this fine day?
[12:37] Umm.. I'm up at 1.8GB RAM on wget
[12:37] I fear coming back to a dead machine on monday :D
[12:47] ersi: whee!
[12:48] hullo
[12:54] Or maybe my disks will be full of instructables >_>
[12:55] * kin37ik goes back to poking fortunecity
[12:59] Hm, lots of small files - I'm "only" at 7.3GB instructables so far
[12:59] have any estimate as to how big instructable sis?
[13:00] instructables is*
[13:00] I have no idea whatsoever
[13:03] hmm
[13:04] We'll see.
[13:14] http://www.archiveteam.org/index.php?title=Projects#Dead_Projects "EmuWiki.com Complete Emulators Collection v0.2 [All platforms]" is at underground-gamer.com
[13:17] is it a package or are they using the actual wiki but updated? if it's a package itll be worth my time grabbing it
[13:19] it's a 13gb torrent
[13:20] link?
[13:20] http://www.underground-gamer.com/details.php?id=40311
[13:21] you would need an account though
[13:21] i cant sign up
[13:21] max user acc's
[13:21] ill keep that link bookmarked though
[13:22] http://pastebin.com/1Rkn4Ev4
[13:22] nah, if you want i will grab and give you a http link
[13:24] i just need the torrent file really
[13:24] unless you got an invite to UG?
[13:25] magnet link me
[13:26] the torrent would run under my account
[13:26] which is a no-no ;)
[13:27] But the magnet link is an ID to the torrent and can't be identified with you
[13:28] it's an account-based torrent tracker, this would not work
[13:29] demonoid.me is account based, and still
[13:29] anyone can just grab whatever and it wont log under that persons account
[13:29] magnet link = DHT
[13:30] i cant believe you guys do not know how these work :P
[13:31] throw me the magnet link and we shall see
[13:31] kin37ik: Have you never used a real private tracker?
[13:31] Aw man
[13:31] uhm
[13:31] dont know lol
[13:32] Heard about ratios?
[13:32] yep
[13:32] Private trackers are strict about only ONE user using each account, and keeping ratios good
[13:32] mmmm S:
[13:33] I think I have 10T credits on UG :)
[13:33] So you either get an account and download it from the members, or have a member download it for you - putting it somewhere :P
[13:33] well i cant create an account till the accounts are pruned, according to the site
[13:33] If you get an invite from a current member, that's another way in usually
[13:34] and those torrents are non public, no dht
[13:34] blame it on the retro mafia
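For reference on the magnet-link argument above: a magnet URI is essentially the SHA-1 of the torrent's bencoded info dictionary (the btih infohash) plus a display name, which DHT peers then look up -- which is also why it buys nothing for a private, DHT-disabled tracker torrent. A self-contained sketch that derives one from a local .torrent file (hand-rolled bencoder so nothing outside the stdlib is assumed):

    import hashlib, sys, urllib.parse

    def bdecode(data, i=0):
        """Decode one bencoded value starting at offset i; returns (value, next_offset)."""
        c = data[i:i+1]
        if c == b"i":                                   # integer: i<digits>e
            j = data.index(b"e", i)
            return int(data[i+1:j]), j + 1
        if c == b"l":                                   # list: l<items>e
            i, out = i + 1, []
            while data[i:i+1] != b"e":
                v, i = bdecode(data, i)
                out.append(v)
            return out, i + 1
        if c == b"d":                                   # dict: d<key><value>...e
            i, out = i + 1, {}
            while data[i:i+1] != b"e":
                k, i = bdecode(data, i)
                v, i = bdecode(data, i)
                out[k] = v
            return out, i + 1
        j = data.index(b":", i)                         # byte string: <length>:<bytes>
        n = int(data[i:j])
        return data[j+1:j+1+n], j + 1 + n

    def bencode(x):
        if isinstance(x, int):
            return b"i%de" % x
        if isinstance(x, bytes):
            return b"%d:%s" % (len(x), x)
        if isinstance(x, list):
            return b"l" + b"".join(bencode(v) for v in x) + b"e"
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in sorted(x.items())) + b"e"

    meta, _ = bdecode(open(sys.argv[1], "rb").read())
    info = meta[b"info"]
    infohash = hashlib.sha1(bencode(info)).hexdigest()  # the "btih" a magnet link names
    name = urllib.parse.quote(info.get(b"name", b"").decode("utf-8", "replace"))
    print("magnet:?xt=urn:btih:%s&dn=%s" % (infohash, name))
    # A torrent with "private": 1 in its info dict tells clients not to use DHT or
    # peer exchange, so this magnet link will not actually find peers for it.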
[13:35] anyone know what 'internal error DISC0272' on an HP Touchpad is? Seems I can't archive EVERY app before HP kills it off
[13:35] google it?
[13:36] no hits
[13:36] that unit is bad, send it to me
[13:36] what? cant be right
[13:37] i'm well over 1000 games downloaded, and many things fail at that level
[13:37] quickoffice takes 4 minutes to load, since it wants to index every little .txt file in every app folder.. which is just wrong
[13:38] can't seem to find where HP stored the first two issues of their online 'Pivot' mag either.. so those might be gone forever
[13:38] then do a system search for them?
[13:38] sept issue has 3 free app promos hidden on page 26 in case anyone cares :)
[13:39] i'm stupid, somehow i can't get TP to see my genned SSH keys, so no shell, thus no 'find'
[13:39] these older apps will be useful later.. all the newer 'updates' have phone home crap in them to enable adware
[13:41] only had 2 hard failures so far.. the KQED radio app and some game called 'J@cker'
[13:42] guess no-clobber doesnt like me today...
[13:45] and im still trying to piece together fortunecity's directory structure so i can poke it a bit more efficiently
[13:50] kin37ik: I've got a list of 400,000 fortunecity urls if you want them.
[13:50] alard: cheers, send them my way (:
[13:50] (I've been playing with Google a little bit.)
[13:51] alard: awesome, send them my way, pastebin or something?
[13:52] id better move wget to my secondary drive lol
[13:52] I'll have a look. First I've got to get them out of Redis.
[13:52] no worries
[13:52] are there google scrapers for result pages?
[13:53] I've written my own scraper that asks Redis for a word (from a set of dictionary words), searches on Google and extracts the urls, adds them to another set on Redis.
[13:54] awesome
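A sketch of the kind of scrape just described: pop a dictionary word from one Redis set, run it through Google as a site: search, and push any fortunecity URLs found into a second set. The key names and the URL pattern are assumptions, and Google throttles this sort of scraping quickly.

    import re, urllib.parse, urllib.request, redis

    r = redis.Redis()

    def scrape_one_word():
        word = r.spop("fortunecity:words")           # set of dictionary words left to try
        if word is None:
            return 0
        q = urllib.parse.urlencode({"q": "site:fortunecity.com " + word.decode()})
        req = urllib.request.Request("https://www.google.com/search?" + q,
                                     headers={"User-Agent": "Mozilla/5.0"})
        html = urllib.request.urlopen(req).read().decode("utf-8", "replace")
        urls = set(re.findall(r'https?://[a-z0-9.-]*fortunecity\.com[^"&\s<>]*', html, re.I))
        for url in urls:
            r.sadd("fortunecity:urls", url)          # collected URL set
        return len(urls)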
[13:56] geez i wish my nose would stop running like a tap and making my face burn when i want to sneeze
[14:07] uh oh
[14:09] nice
[14:09] woah
[14:10] that is a lot of urls
[14:10] i have a lot of poking to do this week
[14:15] ahaaa, so they did keep those sites
[14:15] alard: you sir, are a legend!
[14:46] that blackout was strange times :/ hope i didn't miss anything
[14:48] are there certain guidelines you have to follow when adding to a page on the archive team wiki?
[15:03] well, this is an interesting find
[15:04] kin37ik: Not that I know of. You just add a page and if it looks like spam it will be blocked later.
[15:04] alard: eh?
[15:04] Perhaps it's useful to copy the project panel from another project's page, if it is about a project. E.g. http://www.archiveteam.org/index.php?title=MobileMe
[15:04] ahh
[15:04] The wiki.
[15:05] ahh right
[15:05] head was in a different place
[15:09] was thinking a bit too much on these urls i think
[15:11] Ah, I see.
[15:12] yeah, i did a little bit of googling with some urls
[15:12] turns out that pages from the original website back from 96 and 97 still exist
[15:12] though half of them return 404s, i presume from people either buying a domain or just wiping out the contents
[15:40] goddamn
[15:53] enough poking for tonight, time for bed, laters
[17:21] G'morning
[17:31] hey sketchcow, get my email? just checking
[17:31] Yes
[17:31] Shoving through things today
[17:31] ok
[17:34] Today I set up an MRTG graph. I haven't done that in years.
[17:35] Probably since 2001.
[17:35] http://batcave.textfiles.com/ocrcount/
[18:06] SketchCow: watched your defcon talk earlier, you keep being an inspiration!
[18:07] I like talking!
[18:08] I like watching you talk!
[18:09] I have a presentation on the 30th at Derbycon
[18:15] huh
[18:15] emacs actually crashed
[18:44] SketchCow: ?
[18:45] I have been playing with khanacademy for a little bit, do you have any ideas on how to teach history?
[18:45] lesson 1: you will be taught the history of the winners
[18:46] lesson 2: The winners are assholes
[18:46] This is quite a question.
[19:14] Well have you ever thought about it?
[19:16] I teach people a lot and certainly create ways to teach people. Khan Academy is just another platform, one for video, that partially takes video from other sources and repackages it.
[19:19] What do you mean by "takes video from other sources", do you mean when he does a series based off practice test stuff (gmat, sat, etc)?
[19:24] http://www.khanacademy.org/video/salman-khan-talk-at-ted-2011--from-ted-com?playlist=Khan%20Academy-Related%20Talks%20and%20Interviews
[19:24] I mean like he takes TED videos, and puts them up.
[19:24] "Given that the mean length of a year is 365.2425 days, Office 365 only needs to maintain 99.93% uptime to stay true to its name."
[19:25] Is that Bill Gates?
[19:25] Likely.
[19:25] yeah, he talks with Sal at the end of his talk.
[19:26] I do find it strange he has a playlist of all his talks and media-related stuff on the khan academy site.
[20:56] what's with the topic? someone mess with media mail?
[21:00] <3 media mail
[21:00] I guess here is the perfect place to ask
[21:00] Is there a good way to diff two directories?
[21:00] two very large directories, that is
[21:01] I know there is at least 65% commonality between them and I want to deduplicate
[21:01] maybe just rsync?
[21:01] by hand if I must, by automation if I can
[21:01] eh?
[21:01] rsync from one into the other, delete original
[21:02] mmm
[21:02] but
[21:02] hmm
[21:02] I'm not sure that accomplishes what I want
[21:02] I'm not sure what you want
[21:02] ideally, I'd be moving things that weren't duplicated into their own folder
[21:03] you could probably hack some script out using md5sum and symlinks
[21:03] so I'd end up with: 1 folder with duplicated content and 1 folder with content only present in one of the previous two folders
[21:05] copy them together with a tool that auto-renames duplicates, like "name-2"
[21:05] then search for the "*-2" and move them somewhere?
[21:05] rsync --dry-run maybe
[21:06] ^
[21:06] I guess I'll have to figure out how to use rsync then :>
[21:06] protip: back up data before learning how to use rsync
[21:07] haha alright
[21:08] I've got a backup of the smaller folder, but the larger of the two has only a single copy
[21:09] of course, the bigger dataset is easier to lose :)
[21:09] yup
[21:14] -c will be useful
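A sketch of the md5sum-and-move idea floated above: hash every file under the two trees, then move content present in both into one folder and content present in only one tree into another. The directory names are placeholders, and since it moves files, it should be tried on a copy first.

    import hashlib, os, shutil

    def hashes(root):
        """Map md5 digest -> list of file paths under root."""
        out = {}
        for dirpath, _, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                h = hashlib.md5()
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        h.update(chunk)
                out.setdefault(h.hexdigest(), []).append(path)
        return out

    a, b = hashes("dir_a"), hashes("dir_b")
    os.makedirs("duplicated", exist_ok=True)
    os.makedirs("unique", exist_ok=True)

    # Content from dir_a goes to "duplicated" if the same bytes exist in dir_b,
    # otherwise to "unique". Basename collisions are not handled here.
    for digest, paths in a.items():
        dest = "duplicated" if digest in b else "unique"
        for p in paths:
            shutil.move(p, os.path.join(dest, os.path.basename(p)))

    # Content only present in dir_b also counts as unique; dir_b's copies of
    # duplicated content are left behind, ready to be deleted.
    for digest, paths in b.items():
        if digest not in a:
            for p in paths:
                shutil.move(p, os.path.join("unique", os.path.basename(p)))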
[21:35] ohhh yeahhhhhh, rsync is my friday night
[21:37] yea, rsync rocks
[21:42] it's surprising how clever the algorithm is
[21:42] I was reading about it a few weeks ago
[22:20] hey all. I'm working on my mirroring script. Wonder if anybody has some feedback on it. https://github.com/human39/scruffy/blob/master/scruffy.pl (feel free to fork, muck and push!)
[22:51] human39: http://twitter.com/geekmire/status/18108572789379072 http://twitter.com/geekmire/status/18216874642767872 http://twitter.com/geekmire/status/80495403572789248
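On the cleverness of the algorithm mentioned above: a toy version of the rolling weak checksum from the rsync technical report, which is what lets rsync slide a block checksum along a file one byte at a time instead of recomputing it from scratch. Not rsync's actual code, just the idea; the real tool pairs this with a strong per-block checksum to rule out false matches.

    # Weak rolling checksum: a is the byte sum, b weights earlier bytes more heavily.
    M = 1 << 16

    def weak_checksum(block):
        a = sum(block) % M
        b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
        return a, b

    def roll(a, b, out_byte, in_byte, blocklen):
        # Slide the window one byte: drop out_byte on the left, add in_byte on the right.
        a = (a - out_byte + in_byte) % M
        b = (b - blocklen * out_byte + a) % M
        return a, b

    data = b"the quick brown fox jumps over the lazy dog"
    n = 8
    a, b = weak_checksum(data[0:n])
    for k in range(1, len(data) - n + 1):
        a, b = roll(a, b, data[k - 1], data[k - 1 + n], n)
        assert (a, b) == weak_checksum(data[k:k + n])   # rolled value matches a full recompute
    print("rolling checksum matches direct computation for every window")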