[00:14] http://www.archive.org/search.php?query=%28collection%3Atop_domains%20OR%20mediatype%3Atop_domains%29%20AND%20-mediatype%3Acollection&sort=-downloads
[00:15] Why are all the top downloads from porn sites?
[00:15] admittedly, that is a great website
[00:15] because the internet is for porn
[00:16] is IA actually slurping the videos or just pages/images
[00:16] slurping, great choice of words
[00:16] I get that someone would go to a porn site to watch porn, but why go to Internet Archive and download a warc archive full of porn instead of just going to the site directly?
[00:17] soultcer: free, easy and (likely) virus-free porn, can't really complain
[00:17] porn sites tend not to be great for usability
[00:17] pop-ups, interstitial ads, etc
[00:17] Using the wayback machine probably won't improve that...
[00:18] not that I would know, mind you
[00:18] i wonder if that ever happens
[00:18] vintage internet porn via the wayback machine
[00:19] "Why yes, I do believe that your plumbing is broken"
[00:20] BlueMax: no, no. "Why yes, ma'am, I do believe that your plumbing is in need of repair." or "is in disrepair."
[00:21] Shown up on my own joke
[00:22] "Oh, no! How shall I remit payment for services to be rendered?"
[00:22] Joke's on Coderjoe for actually caring about the dialogue in porn movies.
[00:22] Good point!
[00:22] "If you don't notice the snatch I have some bad news for you..."
[00:22] soultcer: he was going for some old-timey kinda-steampunky vibe
[00:23] and this was before things got nekkid
[00:23] ;-)
[00:24] http://gizmodo.com/5884684/why-ill-never-trust-a-human-with-my-data-again
[00:25] haven't read more than the first paragraph, but... backups?
[00:25] also, you won't lose FTP and SSH just because the domain name goes down
[00:26] oh, several people are affected
[00:26] I have some experience with the service-provider-owner-vanishes situation
[00:27] I linked it because I thought it was slightly relevant
[00:28] a BBS/ISP I used and helped out had rented colo space from someone. at some point, the guy just vanished. Long court battles to get permission to enter and recover hardware.
[00:28] the article author says he's backing his own stuff up
[00:28] wonder if we can help these guys http://outercircle.wikispaces.com/
[00:29] at the very least grab dns for the domains
[00:29] what are the domains?
[00:30] http://outercircle.wikispaces.com/users_to_warn
[00:30] "There's a wiki that some customers have set up to help people recover data[outercircle.wikispaces.com], but that only helps those who are aware there's a problem."
[00:31] what about people that are not aware there is a problem
[00:33] yeah exactly
[00:35] the whole "some people are already locked out, due to an expired certificate"
[00:35] uh... no
[00:35] tell your browser to ignore that problem to get back in and save your shit
[00:35] well, tell them, not us
[00:36] OK, slammed out that stuff.
[00:36] Hiya SketchCow.
[00:37] here's the article dude's twitter https://twitter.com/mat
[00:37] he mentions Metafilter, but as far as I can see there aren't _that_ many posts about the hoster there
[00:38] I don't think Twitter is the place people would complain about something like that.
[00:39] I mean, somebody can poke him to connect with us
[00:42] I see at least 234 hostnames on ns(1|2).sabren.com, I wonder if he contacted every one of them
[00:43] It'd be better to operate under the assumption that he didn't
[00:46] is that the only domain being used for DNS?
also, what about people that were not using DNS there
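
A minimal sketch of how one might check which domains from a list are still delegated to the Sabren nameservers, using dig; the domains.txt input file is an assumption, not something anyone above actually produced:

    #!/bin/sh
    # For each candidate domain, ask DNS for its NS records and flag
    # the ones still delegated to sabren.com nameservers (customers to warn).
    while read -r domain; do
        if dig +short NS "$domain" | grep -qi 'sabren\.com'; then
            echo "$domain still uses sabren DNS"
        fi
    done < domains.txt
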
[00:49] from the gawker comments: CornerHost.com hosts 469 websites
[00:52] how nice
[00:52] another comment says "The owner/operator of Cornerhost is apparently active on github as of 4 days ago..." https://github.com/sabren
[00:54] By the way, I pulled SOMEBODY's twitter feed away this weekend.
[00:54] Someone was uploading new twitter stream captures.
[00:54] I can help us get it to the new machine and keep going, but I did kill that directory.
[00:56] Well, in case somebody wants to have a go at warning cornerhost customers or downloading their stuff: http://pastebin.com/ge54giU0
[00:58] ugh
[00:59] http://businessprofiles.com/details/SABREN_ENTERPRISES_INC/GA-0145887
[00:59] Status: Admin. Dissolved
[01:00] apparently money problems: http://withoutane.com/
[01:01] 2009?
[01:11] looks like he moved from Georgia to Texas
[01:12] man
[01:12] even his resume on his personal site is outdated: http://www.michalwallace.com/resume
[01:13] lists nov 2001 to present for sabren enterprises, but the businessprofiles listing says it was dissolved
[01:13] we should send a repo team
[01:13] "you have failed to maintain data integrity, we are here to take your servers"
[01:14] http://versionhost.com/
[01:14] http://versionhost.com/contact/
[01:15] if you have an emergency, you may also contact the missing guy on an atlanta-area number
[01:16] so it looks like versionhost is another hosting company affected by this
[01:16] old website
[01:16] http://michaljwallace.com/
[01:17] new one
[01:17] jeez
[01:17] https://twitter.com/#!/tangentstorm
[01:17] he plays minecraft
[01:18] What exactly are we doing here?
[01:18] https://twitter.com/mat/status/169226412992114688
[01:19] I expect to come out of this with a much stronger, effective, and reliable business system that isn't dependent on just me holding everything together.
[01:19] Best line ever
[01:19] lol
[01:21] gawker actually manages to do something GOOD?
[01:24] "10-year-labor-of-love web framework" is always a bad sign
[01:27] the "it was set to autorenew" attitude kinda sucks... does the registrar have current billing details to be able to do the autorenew? (especially after a 3-state-away move?)
[01:28] every registrar i've ever been with just keeps the domain if you don't pay to renew
[01:29] many registrars institute a grace period
[02:18] chronomex: Do you know Pi?
[02:18] (aka Anthony Martinez)
[02:30] hello everybody
[02:31] I doubt there's much that can be done about this and I doubt it's going to be a big deal, but I just wanna mention something ImageShack is doing
[02:31] I've been using their site since 2006 so I've accumulated a lot of images there. I checked the My Images section and see this message:
[02:31] You have 641 photos stored. Since you're over the 500 photo limit you'll need to upgrade to a Premium account or you'll only be able to keep 500 of your recent photos. Older photos will expire on the 1st of March.
[02:32] And the My Images section is pretty much the only place you'll see it.
It's not on the front page, and they didn't bother to send an email
[02:33] o shi
[02:34] luckily I only have 428 images, this is going to be hell for old forum threads and such though
[02:34] 500 is kind of a generous number so I doubt there will be too many people at risk here
[02:35] but maybe I'm underestimating
[02:35] hmm maybe
[02:37] I emailed them last night asking if there was a way to mass download my images (which there doesn't appear to be) and mentioned how they should publicize this more, but I just got an automated response with links to their FAQs and a message that said "If required, your email will be answered as soon as possible."
[02:37] and well, I didn't get another email
[02:38] suffice to say I won't be using them anymore
[02:38] actually the only reason I've still been using them to this day is because that's just where all my other images are
[02:57] rabidabid: you can use the mass-forumlink generator to get the links, then throw the list at wget or the like.
[02:59] hmm
[03:00] at least I've done that in the past to make a big gallery page of everything
[03:01] I might make a greasemonkey script to automate things a bit now, however
[03:01] hmm. I'm at 444
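
For anyone in the same spot, a rough sketch of the approach suggested above: collect your direct image links into a file, then hand the list to wget. The links.txt filename and the politeness delays are assumptions, not anything ImageShack documents:

    # Fetch every URL listed in links.txt (one per line), pausing
    # between requests so the site isn't hammered.
    wget --input-file=links.txt \
         --wait=1 --random-wait \
         --directory-prefix=imageshack-backup/
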
[03:38] underscor: yes, I know Pi and SaburWulf
[03:43] heh, gave my friend a heads-up, turns out he has 1207 images on imageshack
[03:43] ick
[03:43] so it may not be too rare
[03:53] chronomex: Awesome, that's cool
[03:53] Funny how small the world is
[03:54] there are not a huge number of furries in the world
[03:55] it just seems like it because they all whine about people who aren't furries
[03:55] very true
[03:55] chronomex: are you furry?
[03:56] fuck no
[03:56] and I'm not scalie either
[03:57] So that has a separate name
[03:57] And you knew it did...
[03:57] nitro2k01: depends on who you ask
[03:57] I've heard people call everything furry
[03:57] I consider furry = all anthropomorphized critters, regardless of skin
[03:57] chronomex: No need to be so defensive :P
[03:57] I have too many friends who are
[03:57] are you a microbie?
[03:57] aaaaugh
[03:57] lolol
[03:58] "I want to be an amoeba!"
[03:58] http://scalie.deviantart.com/art/No-Eternity-coloured-186255315
[03:58] (Googled for scalie)
[03:58] sabur was actually my roommate for some months before he left town to attend a loser school and fuck some girl who whines too much
[03:59] hahaha
[03:59] now they're engaged
[03:59] oic
[03:59] she still is bitchy
[04:00] let's stop this conversation
[04:01] chronomex: Do you want to, or shall I?
[04:01] Fuck it, I want to do it for once
[04:02] WOOT WOOT WOOT OFF TOPIC SIREN
[04:03] HONK HONK HONK OFF TOPIC HORN
[04:03] WOOF WOOF WOOF FURRY SIREN
[04:03] hahahahaha
[04:03] that was excellent
[04:04] woot mostly means "win" or similar btw
[04:06] yeah, I suppose
[04:07] I have it aliased to /ots
[04:07] woop woop woop off-topic siren
[04:07] oh, yeah
[04:07] It's woop, not woot
[04:07] damn
[04:08] Need a new offtopic siren? Why not Zoid-WOOP WOOP WOOP WOOP WOOP WOOP
[04:09] Did someone say zoid? Because I think I heard someone say zoid. http://canv.as/ugc/original/e922d098bd6083d2948bb5235203d8eed192b2f1.jpeg
[04:30] yeah... the automated downloader scripts on future projects need a way for the tracker to tell the downloaders to stop cleanly
[04:32] I just killed rsync on batcave
[04:32] Because I need to bzip2 stuff before it can go up.
[04:32] Let's put it this way.
[04:32] It had 15 simultaneous rsyncs going AND was transferring files AND was doing a massive bzip2
[04:47] ionice
[04:48] often makes these things more pleasant
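
ionice is part of util-linux; a minimal illustration of the advice above, running a big compression job in the idle I/O class so the rsyncs aren't starved (filename and PID are made up):

    # Start the compression in the idle I/O scheduling class (-c 3):
    # it only gets disk time when nothing else wants it.
    ionice -c 3 bzip2 huge-upload.tar

    # Or demote an already-running process by PID:
    ionice -c 3 -p 12345
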
[04:52] dumping some 500MB HDDs from a garage sale, what could be on them!
[04:56] Setting up an openbsd box to replace my actiontec router
[04:56] I'm tired of dealing with it's fucking 1024 entry nat table
[04:57] its*
[05:03] http://img22.imageshack.us/img22/9867/cableunplugged.jpg
[05:09] holy wtf
[06:40] bahahah
[06:40] https://afaikblog.wordpress.com/2012/02/10/a-new-approach-to-gnome-application-design/
[06:40] I think they're early for april fools day
[07:35] username: eigenart
[07:45] wonder if that is art made using eigenvalues
[11:00] SLAMMING METADATA
[11:00] SO BORING
[11:01] They see me slammin'
[11:01] they snorin'
[11:01] .. wut
[11:02] http://www.archive.org/details/bbc-rd-reports-1954-21
[11:03] haha, awesome subject
[11:03] http://www.archive.org/details/bbc-rd-reports-1954-23 is also good.
[11:04] I am adding things quickly, which merely means I'm adding them slowly.
[11:04] I'm trying to listen to interviews and podcasts
[11:04] I'm listening to a podcast, digitizing two tapes on two laptops
[11:04] Uploading friendster to archive.org
[11:04] And doing this metadata
[11:04] And I feel like I'm behind and moving too slow.
[11:04] That's the curse of what I have.
[11:04] whoa, I'm not really all that surprised @ BBC Research - it's within their focus/realm of knowledge.. but still, dang
[11:06] Yeah
[11:07] I'm happy to put it in
[11:07] Without this metadata, it's impossible to negotiate this thing.
[11:07] Metadata is very important
[11:08] http://www.archive.org/details/bbc-rd-reports-1954-28
[11:08] I agree these are all interesting essays
[11:08] Hence my wanting them up.
[11:10] I can add a new paper every 10 seconds right now.
[11:10] By hand.
[11:10] But there's 1,338 papers
[11:10] Calculate that, if you have a moment.
[11:10] I'm still adding.
[11:11] That'll take a while
[11:11] Multiply 1338 x 10 seconds.
[11:12] If you could
[11:13] 223 minutes. You took too long
[11:22] yeah, was bashing a systems integrator a little with a colleague
[11:30] http://www.archive.org/search.php?query=collection%3Abbc-rd-reports&sort=-publicdate&page=27 looking good!
[11:37] Found 1 TB of doubled data.
[11:37] Always quality.
[12:14] This report deals with the statistical analysis of questionnaires in which observers enter their opinions under a series of graded classifications, such as "Bad", "Indifferent", "Good".
[15:43] always a tricky task
[16:41] I've been blowing these R&D metadata sets in, and it feels like I've made all this headway, but it turns out I barely have.
[16:41] Even with short work, there are hours left.
[16:48] It can't be automated?
[16:48] It's about as automated as I can get it.
[16:48] Unless I REALLY want to write a rather needlessly complicated thing.
[16:49] Which will have little general use.
[16:49] I can add a new entry every 20 seconds.
[16:49] But at 1,300+ entries to do, that's still a lot of time.
[16:50] http://www.archive.org/details/bbc-rd-reports-1957-15 but sexy!
[16:53] Motherfuckers wrote a lot of reports.
[16:55] haha
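
For the curious: item creation and metadata on archive.org can be scripted against its S3-compatible API, so the "needlessly complicated thing" might look roughly like this. The papers.csv file of identifiers and titles, the collection name, and the credential variables are all a hypothetical sketch, not the actual workflow used for the BBC reports:

    # Create one item per "identifier,title" line in papers.csv,
    # setting metadata via x-archive-meta-* headers on the IA S3 API.
    while IFS=, read -r id title; do
        curl --location \
             --header "authorization: LOW $IA_ACCESS:$IA_SECRET" \
             --header "x-archive-auto-make-bucket:1" \
             --header "x-archive-meta-mediatype:texts" \
             --header "x-archive-meta-collection:bbc-rd-reports" \
             --header "x-archive-meta-title:$title" \
             --upload-file "$id.pdf" \
             "http://s3.us.archive.org/$id/$id.pdf"
    done < papers.csv
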
[17:24] kennethre: Help me understand the system you're using.
[17:24] SketchCow: what aspect of it?
[17:25] Like, what it is you're using that's then going into the machine.
[17:25] Because I think I may need you to set up a machine that I use as an augmenter for batcave/fortress
[17:25] Because I don't think the internal archiveteam portion of the infrastructure can ever handle the amount of pain your system is capable of.
[17:26] At least jamming you directly into the mobileme section, we can set something up where you see the result
[17:26] You were getting a great rate, but it left the system in a gutter wearing its bloody panties as a hat
[17:27] haha, it was essentially 300 VPS's running on a large number of very, very large ec2 boxes w/ a stupid amount of bandwidth
[17:27] Right
[17:27] I think we need a smartybox on your side that is then getting access to the mobileme subcollection
[17:27] each instance was just running the seesaw script
[17:27] See if you can get me a smartybox
[17:56] SketchCow: I don't follow
[18:28] is there a list available of all the google groups?
[18:41] SketchCow: http://video.fosdem.org/ conference videos, not sure if you know those already
[18:49] Jesus, webm
[18:49] :)
[18:49] Why not just do it in quickcam and get it over with
[18:50] http://www.archive.org/details/Fosdem2011Presentations
[18:54] lol
[18:59] man, I dunno if archive.org has enough space for these here xvids, I mean some of them are like 700mb
[19:01] it's like youtube stockholm syndrome
[19:08] DoubleJ: Your slot is back
[19:15] http://creativecommons.org/weblog/entry/31415
[19:15] worth archiving?
[19:17] looking forward to seeing this tomorrow http://video.fosdem.org/2012/lightningtalks/git_annex___manage_files_with_git,_without_checking_their_contents_into_git.webm
[19:17] I want to understand your annex thing
[19:18] that may help
[19:18] should be a good 15 minute intro, I hope
[19:18] Git-Annex is the solution to all my file management problems
[19:19] oh, you use it?
[19:19] The archiveteam git-annex needs a porn folder
[19:19] I've just been working on scaling git-annex to not leak memory when adding millions of files.
[19:20] Extensively. I have a roughly 3 TB repository of media files with my music, videos and backups in it. Thanks to the numcopies setting I can always sleep well knowing that I won't lose a file
[19:20] I'd like to know when that's the case, because I'd like to set one up for our saves.
[19:20] soultcer: awesome.. Maybe I forgot, but I don't remember you mentioning you used it before
[19:21] I do most of the urlteam stuff in git-annex as well. Various hosts run scrapers, and I simply use git annex get . to fetch all their data ;-)
[19:21] well, I think the scalability is fixed. At least, the limiting factor now is git's own memory bloat with a million files
[19:21] 2625 joey 20 0 90324 56m 3260 S 0.0 0.7 1:28.94 git-annex
[19:21] 2821 joey 20 0 198m 182m 1032 R 100.0 2.3 0:24.16 git
[19:21] soultcer: jesus, I had no clue
[19:21] I'd tell you how big the repositories are but I did a pacman update today and now git-annex is broken
[19:22] ahahah
[19:22] new ghc?
[19:22] http://vimeo.com/creativecommons classy
[19:23] Some major upgrades to various shared libraries, because a lot of other stuff broke as well
[19:26] soultcer: he pronounces git wrong, his opinion is irrelevant
[19:27] Who pronounces git wrong?
[19:27] the guy in that video
[19:27] oh, closure posted it.
[19:27] closure: see above :)
[19:27] lol, how does richih say git?
[19:27] Like JIT
[19:27] yeah
[19:27] it's weird
[19:27] oh well, I think he's german
[19:28] Found a way to slice 20 percent time off the adding of metadata to archive.org.
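
For anyone who hasn't tried git-annex, the basic flow behind the "git annex get ." trick mentioned above looks something like this. A minimal sketch: the repository name, paths, and descriptions are made up, and the git-config numcopies knob shown is one way it could be set in versions of this era:

    # Turn a directory of scrape output into an annex; git tracks
    # symlinks and location metadata, not the file contents themselves.
    git init scrapes && cd scrapes
    git annex init "scraper host"
    git annex add urlteam-dumps/
    git commit -m "add scrape output"

    # Refuse to drop any file that would fall below 3 known copies.
    git config annex.numcopies 3

    # On another clone, pull actual contents from whatever remote has them:
    git annex get .
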
[19:29] Given that git-annex speaks S3, an archiveteam.git that uses the Internet Archive to store the actual contents would work, right?
[19:29] git-annex has special archive.org S3 upload support
[19:30] I need to think through that.
[19:30] but, I'm not 100% happy with it.
[19:30] Bear in mind I am really treading water regarding it
[19:30] I'll happily play it up, but I understand it only surface-wise.
[19:30] At least for read access that would be pretty awesome, though I guess the web special remote would work too
[19:30] I only understand git a little
[19:31] Currently, each archive.org bucket has to be configured separately in git-annex
[19:31] I've used it, but if I had a ton of collections, it might not work
[19:31] http://git-annex.branchable.com/tips/Internet_Archive_via_S3/
[19:33] soultcer: yes, the web special remote is fine for stuff already in the IA
[19:35] Keeping in mind I am dangerously retarded....
[19:35] sure, let's get back to basics.. git. big files.
[19:35] using git-annex basically lets you know when network-connected items everywhere are providing resources that you can acquire from any other aspect
[19:36] So I go "where the fuck is my traci lords collection" and it goes "Oh, that's on the accounting server at your old work"
[19:36] yeah, basically. If you have access to the other remotes.
[19:36] Since I checked them in
[19:36] if work killed your account, you're screwed, obviously
[19:36] (I figured I'd use real-world examples)
[19:36] Well, that would be the equivalent of a disk failure
[19:36] yep
[19:37] So I have traci on the old work server AND on two usb drives at my friend's house
[19:37] Do they need to be connected all the time?
[19:37] Or do I call dave at 3am telling him to hook the USB to the server in the living room
[19:37] And then git-annex goes "ah, there we are"
[19:37] nope, things can be disconnected and offline. That's nearly the default state.
[19:37] as soon as it can get to the drive, it's good
[19:38] And then it rebuilds the traci folder on my mom's fuse-enabled gmail account
[19:38] I have a whole shoebox of 1 TB drives that I plug in when git-annex wants data from one of them
[19:38] That's viciously good
[19:39] I have drives I sneakernet around and run git-annex on whatever computer they're plugged into, it keeps track of where everything is
[19:39] Two questions, one tech, one pr
[19:40] tech: I assume there's some client program running that is querying git-annex, on windows boxes or linux or whatever
[19:40] pr: has git-annex had any major announcement or is that a slow burn
[19:40] tech: git-annex is a single standalone binary. No server. You just run it
[19:41] (I have not ported it to windows though, just unix/linux/freebsd/osx)
[19:41] also what happens when you find out traci was 16 in that video and you want to nuke everything, are there logs and filesystem tables everywhere
[19:41] DFJustin: yes, the forensics people will be very happy with the available data trail.
[19:41] :P
[19:42] Dude, I mean, fuck
[19:42] Everyone knows, at the end, Traci will always betray you.
[19:42] It's the yin to her delicious yang
[19:43] pr: Word's been getting out. There was an article in Linux Weekly News http://lwn.net/Articles/418337/ .. I presented git-annex at the GitTogether conference this fall
[19:43] I'll make noise next week
[19:43] It's not an archiveteam project but I do think it has archiveteam principles
[19:43] Saving traci lords from oblivion
[19:44] Let's not forget the automatic features. You can specify "keep x copies of all files in this directory" and it will make sure you have enough copies around by a) offering to only copy files that have too few copies and b) not allowing you to drop files with too few copies
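
The linked tip covers the per-bucket setup closure mentions; reconstructed from memory, it looks roughly like this, with placeholder remote and bucket names (check the tip itself for the exact parameters):

    # IA S3-style keys (from http://www.archive.org/account/s3.php):
    export AWS_ACCESS_KEY_ID=yourkey
    export AWS_SECRET_ACCESS_KEY=yoursecret

    # One special remote per archive.org bucket:
    git annex initremote archive-demo type=S3 \
        host=s3.us.archive.org bucket=archive-demo encryption=none

    # Then send content there like any other remote:
    git annex copy --to archive-demo .
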
[19:45] closure: Do you know the story of "no cat"?
[19:45] also you can have it store stuff encrypted. There are features, yes :) (Might even help with your Traci problem)
[19:45] no cat?
[19:46] Explaining wireless telegraph.
[19:46] "You see, wire telegraph is a kind of a very, very long cat. You pull
[19:46] his tail in New York and his head is meowing in Los Angeles. Do you
[19:46] understand this? And radio operates exactly the same way: you send
[19:46] signals here, they receive them there. The only difference is that
[19:46] there is no cat."
[19:46] aha, yes, heard it
[19:46] This is no cat all the way
[19:46] fine praise
[19:46] Imagine a fileserver that has all your drives connected, keeping track of things... except there is no fileserver.
[19:47] That's git-annex
[19:47] Right
[19:47] I work to make complicated concepts easier, it's all I do all day.
[19:47] Credit me with some influence or something, so I can die happy
[19:48] "I coded this by the glow of jason's guiding light"
[19:49] the light of his towering ire
[19:50] srsly, I would certainly not have written the same program if I were not in archiveteam
[19:50] (being in a cabin with only dialup helps too)
[19:52] taking full credit for inspiration
[19:52] * SketchCow poses like george washington crossing the delaware
[19:53] SketchCow: stuff for the archiveteam collection http://www.archive.org/details/something-awful-forums-2001 http://www.archive.org/details/konachan-siterip-2009
[19:54] Does git-annex do checksumming on web remotes?
[19:54] soultcer: git annex addurl pulls it down and checksums it, yes
[19:54] --fast does not
[20:00] DFJustin: Both swapped over now
[20:07] Sweet, sha256 is now the default backend
[20:10] soultcer: I feel a git annex status disksize comparison coming on
[20:11] http://pastebin.com/hR87NQge
[20:13] I will note that neither the konachan nor SA stuff has been deleted
[20:13] so this is more of a just-in-case kinda thing
[20:14] sorry if that was unclear
[20:16] closure: http://pastebin.com/PjGs3MN9
[20:16] Though pretty much all with numcopies=3
[20:17] I like how you use Disk: and Host: in your descriptions
[20:17] Well, it would still be pretty easy, since hosts are named for BSG characters, and disks are named for austrian politicians
[20:19] or I should say, the original sites are still up but the packs are gone from megaupload and filesonic (although the kona stuff is still on bittorrent)
[20:20] hmm, yeah, it only shows the size of one copy.. perhaps I should make it say known annex size: 2 terabytes (plus 4 terabytes of redundant copies)
[20:20] That would be pretty sweet
[20:20] Because then I know when to buy a new hdd
[20:20] or some such measure. Redundancy: 300%
[20:21] oh, I'd have to walk every location log to do it, pretty expensive I think
[21:19] kennethre: Do you know how much diskspace a dyno on heroku can use?
[21:19] alard: it's whatever
[21:19] it needs to be download and upload though, because a dyno can be killed at any time
[21:19] for splinder i was rsyncing every 2 minutes in the background
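
The "rsyncing every 2 minutes in the background" trick is easy to sketch; something like the loop below would run alongside the downloader, so a killed dyno loses at most the last couple of minutes of work. The data directory and rsync destination are hypothetical:

    # Push partial results upstream every two minutes, forever;
    # if the dyno dies, only the unsynced tail is lost.
    while true; do
        rsync -avz --partial data/ rsync://batcave.example.org/mobileme/
        sleep 120
    done &
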
[21:19] Yes, but could you download, say, 20GB?
[21:20] depends :)
[21:20] typically yes
[21:20] there's no official limit
[21:20] it's not typically an issue
[21:20] the host machine can't run out of disk space, obviously
[21:20] but i downloaded 2TB overnight
[21:21] and some of them were 20GB+
[21:21] So perhaps we could make something that lets you download and make chunks of 20 or 40GB, which you could then upload directly to archive.org.
[21:22] 10 files of 20GB would fill an item, and I don't think 10 files is too much.
[21:22] i think that'd be pretty prone to failure
[21:23] what's wrong with uploading one user at a time?
[21:23] The problem seems to be in SketchCow's batcave (or fortress) where you rsync to.
[21:24] You rsync to SketchCow, and then it's bundled and uploaded to the archive.
[21:24] i don't have to run at that ridiculous capacity
[21:24] No, that's true.
[21:24] And probably simpler.
[21:25] :)
[21:25] So SketchCow should find a good rate.
[21:26] that'd be ideal for me
[21:26] i doubt i could run like that constantly without someone noticing anyway
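
For reference, the chunk-and-upload idea floated above could look roughly like this: pack finished downloads into fixed-size pieces, then push each piece straight into an archive.org item over the same S3-compatible API. The item name, chunk size, and credential variables are all hypothetical:

    # Pack finished downloads into 20GB pieces...
    tar -cf - users/ | split -b 20G - chunk.tar.

    # ...then push each piece into a single archive.org item.
    for f in chunk.tar.*; do
        curl --location \
             --header "authorization: LOW $IA_ACCESS:$IA_SECRET" \
             --header "x-archive-auto-make-bucket:1" \
             --upload-file "$f" \
             "http://s3.us.archive.org/mobileme-chunk-001/$f"
    done
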