[00:11] man...
[00:12] "No doubt some will call the closure 'melodramatic' but they have no idea what's involved."
[00:12] Coderjoe: ?
[00:12] apparently, he's never heard of having someone he trusts manage the site for a while, or at least HELP him manage the site
[00:13] and I wonder about the "Tired of dealing with content thieves"
[00:15] if he is talking about image bandwidth thieves (those who post an image on their blog or whatever without rehosting the image), why doesn't he set up a referer check and be done with it?
[00:17] shaqfu: planetphillip.com
[00:19] apparently, something happened today that was the straw that broke the camel's back, and he flipped out and threw his site in the fire.
[00:19] (judging by the content of the closing message)
[00:21] Saw it :(
[00:21] nice robots.txt file, too
[00:22] Now that's just uncalled for
[00:23] though he changed it when he closed up
[00:23] modified: Sunday, July 08, 2012 5:24:03 AM
[00:23] Well that sucks. I was thinking at least some of it is archived, but now that will go dark
[00:24] He doesn't really care about any of the files or content he has hosted, then. If people are not going to appreciate him enough he will just throw it all in the trash
[03:27] love the MobileMe is closed sign, swinging in the wind like yeah, fuck you, get out of my store you dirty hippie.
[08:05] wow that planetphillip guy is a loser just like the above comments imply. For I am the only one to run this site and everyone can kiss my ass
[08:13] couldn't we find people who had files hosted on that site and then get them to ask the owner for the content?
[08:17] tbh he sounds like a dick and is unlikely to comply :S
[08:18] that is why you get someone who hosted his content
[08:19] maybe we should change 'Dead as a Doornail' to a hall of shame. Why not have a list of sites that are pro-actively destroying history
[08:19] Hopefully all the people on the site realise what a dick he is, and upload their content independently to IA, but it's not going to happen :(
[08:19] "You cannot help those who will not help themselves."
[08:20] For those people you threaten to beat them with the stick since the carrot is a waste
[08:20] that is why I am so fired up about doing pre-emptive stuff
[08:20] omf_: yeah
[08:20] except there is too much stuff :S
[08:21] * SmileyG goes off to do some work which makes him sad :(
[08:51] Can someone check my script for checking for valid, non-redirecting links on klik.me and then scraping them with wget. They close on the 20th of this month.
[08:51] http://pastebin.com/5xXsBJ76
[08:53] and we'll need a tracker to distribute the jobs if we want to get this done on time.
[08:53] Let me know what you guys think. I'm going to bed. nearly 3am here.
[09:36] arkhive1, I ran the script and it generated 5000 urls and it only starts at http://klik.me/yV5aE
[12:15] where's the best place to add news of a website dying? http://archiveteam.org/index.php?title=Deathwatch ?
[12:26] huh, getting an error when attempting to edit that page
[12:26] Fatal error: Call to undefined method Article::getSection() in /home/archivet/public_html/extensions/recaptcha/ConfirmEdit.php on line 620
[12:26] * edsu shrugs
[12:35] the best place to report a website closing is this channel
[13:04] arkhive (and others): chronomex and I have set up a shared tracker on http://tracker.archiveteam.org/ . We can give you an account if you have a project that needs tracking.
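A minimal sketch of the referer check suggested at [00:15], assuming an Apache server with mod_rewrite available; the domain and file extensions are illustrative, not taken from the site's actual configuration:

```apache
# .htaccess sketch: refuse image requests whose Referer is set but does not
# point back at the site itself (classic anti-hotlinking rule).
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?planetphillip\.com/ [NC]
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]
```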
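The pastebin script itself isn't reproduced in the log, so the following is only a rough sketch of the approach described at [08:51]: probe candidate klik.me short codes, keep the ones that answer 200 without redirecting, and hand them to wget. The codes-on-stdin design and the curl/wget flags are assumptions, not details of arkhive1's actual script.

```bash
#!/bin/bash
# Sketch: check klik.me short codes for valid, non-redirecting links,
# then mirror the valid ones with wget. Codes are read one per line from stdin.
while read -r code; do
    url="http://klik.me/$code"
    # curl does not follow redirects by default, so a 301/302 is reported as-is;
    # -o /dev/null discards the body, -w prints only the HTTP status code.
    status=$(curl -s -o /dev/null -w '%{http_code}' "$url")
    if [ "$status" = "200" ]; then
        # Valid, non-redirecting link: grab the page and its requisites.
        wget -q -p -E -H -nd -P "grab/$code" "$url"
    fi
done
```

The feasibility problem alard raises below still applies: one client cannot cover the whole code space before the 20th, which is why distributing the work through the shared tracker matters.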
It's basically the same tracker that was used for MeMac/Picplz/Tabblo et cetera, but with a web interface to upload and manage tasks.
[13:09] That said: it's a nice script you have there, but I don't think this is a feasible way to index klik.me. (We'd have to check ~1500 IDs per second.)
[13:31] omf_: oh well i just added news of kasabi closure to the deathwatch http://archiveteam.org/index.php?title=Deathwatch#Pining_for_the_Fjords_.28Dying.29
[16:19] alard: Ya, that was a concern of mine. Time is what we don't have. I do not know what way we should download klik.me since I can't go the API way.
[16:20] alard: and that would be cool if you could set me up with an account. Does anyone else have any more ideas on how we can back up klik.me ?
[16:34] * Schbirid extracts ~2 years worth of robots.txt grabs
[17:06] arkhive1: I sent you some information about the tracker.
[17:08] plz 2 send me the codes sanjay@hotmail.com
[17:36] http://www.ebay.com/itm/BIGGEST-COLLECTION-EVER-22-SEGA-NINTENDO-PC-ENGINE-FULLSETS-FACTORY-SEALED-/300736846867?pt=FR_Jeux_Vid%C3%A9o&hash=item4605501c13#ht_165766wt_1163 <<< instant archive?
[17:59] wow that is an archive
[18:07] jesus
[18:07] insta archive is right
[18:42] I'm trying to upload some newspapers digitized from the microfilm by fultonhistory.com; however, for all of the thousands of pages he has digitized, each page is a separate pdf file. I've tried to put some of them back into individual issues, but have quickly realized that it might take the rest of my life to do so. So my question is, do you all think I can just create one pdf for the entire digitized run of a newspaper he has and upload that?
[18:42] Obviously it's not ideal, but I'm worried about this website just disappearing
[18:44] "Sorry. This URL has been excluded from the Wayback Machine." :(
[18:44] / Provide alternate content for browsers that do not support scripting // or for those that have scripting disabled. Alternate HTML content should be placed here. This content requires the Macromedia Flash Player. Get Flash
[18:45] it's a hideous website, by the way
[18:45] O
[18:45] I've sent him an email asking permission to upload his newspapers, but have never heard back, so i'm just doing it
[18:47] yeah just do it
[18:47] there's a way to concatenate pdf files
[18:47] benuski: Is this the site with the flying fish
[18:47] shaqfu: yes it is
[18:47] Yeah, I have been using pdftk to concatenate individual issues, but I'm just going to merge each title into one big pdf
[18:49] pdfcat I think
[18:49] tho iirc it passes through TeX for some reason
[18:50] if the page is one big image, pdfimages is your man
[18:51] Is there some way to OCR each page, check for masthead, and start a new file if there is?
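For the merging benuski describes, either of the tools mentioned in the log does a straight concatenation; a small sketch with illustrative filenames:

```bash
# Merge per-page PDFs into one file per title (filenames are made up).
# pdfunite, from poppler-utils (the tool balrog recommends later in the log):
pdfunite page-001.pdf page-002.pdf page-003.pdf title-merged.pdf
# or pdftk, which benuski had been using for individual issues:
pdftk page-*.pdf cat output title-merged.pdf
```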
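As for the masthead question just above, one rough, untested way to automate the "big bar across the top" heuristic discussed next in the log, assuming each page is a single-page image PDF; the 12% crop height and the 0.5 darkness threshold are guesses that would need tuning per title:

```bash
#!/bin/bash
# Sketch: render the top strip of each page and flag pages whose top band is
# dark enough to look like a masthead. Requires poppler-utils (pdftoppm) and
# ImageMagick (convert); thresholds are assumptions, not measured values.
for pdf in page-*.pdf; do
    pdftoppm -png -r 72 -singlefile "$pdf" /tmp/page
    # Mean brightness (0 = black, 1 = white) of the top 12% of the page.
    mean=$(convert /tmp/page.png -crop 100%x12%+0+0 -colorspace Gray \
           -format '%[fx:mean]' info:)
    if awk -v m="$mean" 'BEGIN { exit !(m < 0.5) }'; then
        echo "$pdf looks like the first page of an issue"
    fi
done
```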
[18:52] there's probably a way that involves stupid amounts of custom programming
[18:52] Essentially pattern-match to "is there a big bar across the top"
[18:52] I've had good success with shellscripts that display an image and then ask me questions
[18:55] At least with the title I'm doing so far, they're all four pages long so dividing it into issues hasn't been that big of a deal, but this is one of the shorter ones and getting them uploaded is taking a while
[18:55] Like I said, it's not ideal, but at least it gets it out of this terrible website
[18:56] <3
[18:56] Petition Brewster to replace the little Greek temple logo with a bouncing head
[18:56] The newspapers will feel right at home
[18:58] I think the "original web crawler" would be a great logo for IA
[18:59] Hahahahaha
[18:59] chronomex: pdfunite
[19:01] ahhh
[19:29] At the moment if you change your password on archive.org you will no longer be able to use the ftp uploader
[19:29] just as an fyi
[19:29] * chronomex changes godane's password
[19:29] :D
[19:30] this also applies if you're a new user
[19:30] just so people know if someone comes crying in here about it
[19:30] :)
[19:30] oh I get it
[19:30] now archiveteam is IA's unpaid support team
[19:30] p much
[19:30] well, we're just something like 15% of the data uploaded ytd or something crazy
[19:31] so :P
[19:31] still extracting the robots.txt files, jesus
[19:31] that avfs mount of forumplanet hates my server
[19:32] i must find a better way to host that crap
[19:32] it actually amazes me that it's ONLY 15%
[19:32] likewise
[19:32] DFJustin: Let me get a real number :D
[19:35] query still running
[19:35] >:(
[19:36] is archive.org not responding for you guys too?
[19:37] just kidding to make underscor soil himself
[19:37] haha
[19:37] oh dear
[19:37] the surprised laugh of shame
[19:37] fortunately, the query runs on a different server
[19:37] and wouldn't affect mainsite
[19:37] :P
[19:37] you never know!
[19:37] HE ALWAYS KNOWS
[19:37] >:D
[19:38] Results (1-25) of 10,957 where: &w_collection=*archiveteam*&w_publicdate=2012*
[19:38] mmmmm
[19:38] size query is still running though
[19:43] underscor: also check the scrollback in #archiveteam-bs, godane found some magazine dvds on demonoid that would probably be more efficient to download straight into IA than have one of us reup
[20:11] 293104745259 KB uploaded by archiveteam in 2012
[20:11] 2304321571060 KB uploaded total in 2012
[20:12] So it's actually 12%
[20:12] 293 104 745 259 / 2 304 321 571 060 = 0.127197848
[20:12] for humans, that's 293T / 2.3P
[20:13] fun fact, wolframalpha uses appx data content of WBM as of 2006 (2P) as a comparative measure
[20:13] 2.3 new petabytes?!
[20:14] DFJustin: mhm!
[20:14] yea
[20:14] jumping jehoshaphat
[20:14] does that include the wayback machine?
[20:14] http://home.us.archive.org/~tracey/mrtg/du.html
[20:15] hmmmm
[20:15] that only says 1PB
[20:15] I guess tv archive probably takes a lot
[20:15] Oh, right!
[20:15] we added SOLO storage
[20:15] unpaired?
[20:15] mhm
[20:15] for "less important" stuff
[20:15] nighty
[20:15] like web crawls
[20:15] hmm
[20:15] so web crawls, the idea is that hopefully it'll be in multiple crawls and so you don't need to pair it?
[20:16] seems reasonable I guess
[20:16] that and we don't usually lose disks
[20:16] and having two copies is really expensive
[20:16] aye
[20:16] It's about $2k per TB, per node, over the lifetime of it
[20:17] you run single disks or raid1 in nodes?
[20:17] (that's the number brewster likes to roughly estimate with)
[20:17] and I assume paired space is on multiple nodes
[20:17] single disks
[20:17] paired space is two copies of a file
[20:17] ok
[20:17] file level dupe, instead of block level
[20:17] right
[20:17] we lost a lot more files the other way
[20:17] (back in like, 1997)
[20:17] mmmmm
[20:18] http://i.imgur.com/WUP1D.png
[20:19] Better graph, but there's not a public link to it
[20:19] That's free space
[20:19] that's nice
[20:19] I mean a neat graph
[20:19] those are K of terabytes?
[20:19] No, that's free space
[20:20] So it's just T
[20:20] on the axes it says k
[20:20] kGB
[20:20] presently there is only 1.2T free at IA?
[20:20] oh, oops
[20:21] I was thinkging you meant 1245 kTB
[20:21] thinking*
[20:21] There are 1,245TB free. 844TB solo, 401TB paired
[20:21] aye
[20:21] so if there's 5417.0 TB now, and 2.3 PB have been added so far in 2012 and we're only halfway through the year, that's quite a growth rate
[20:22] yeah!
[20:22] it's both scary and amazing
[20:22] I dare you to average it into megabytes/second
[20:27] http://www.wolframalpha.com/input/?i=%282.25031403E9%29%2F%281.642E7%29
[20:27] 137MB/s average, if my math is right
[20:27] how much of that gets backed up to egypt/holland
[20:27] 0
[20:28] It's a long story.
[20:28] But effectively nothing anymore.
[20:28] that sounds like a leakyscor story
[20:28] leakyscor?
[20:29] :P
[20:31] i just got a 3tb hard drive today
[20:32] backing up all my dvds later before it 'disappears'
[20:32] this is so all the gbtv shows i have will be safer (in theory)
[20:33] also does IA have a failsafe if an EMP strike happens?
[20:33] unpaired storage? so no redundant copy?
[20:33] godane: No.
[20:33] that scares me
[20:33] i don't have that kind of money
[20:33] LOL @ Yahoo having an account at Youtube.. :D "yahooMusicTV"
[20:33] Coderjoe: Me too.
[20:34] But it's not my decision, nor my money.
[20:34] We've been able to drain every drive that's failed so far though
[20:34] Which has been like 6
[20:34] in my experience, drives are less reliable now than they were back in the 90s
[20:34] (We catch them as "failing")
[20:35] Higher data density leading to decreased reliability?
[20:36] we need higher data density or you need 2000 cds to back up your hard drive
[20:36] where is the damn holographic memory?
[20:38] Make sure that they're "archival" gold cds :D
[20:40] so the scrappers can steal them to try and recover the gold content
[20:43] Come on, the vendor says they could last up to 300 years!
[20:45] lol
[20:52] And if you don't label it with a sharpie, it might
[20:58] label it with a number in the hub using a sharpie, then catalog the disc elsewhere using that number.
[20:59] ^
[21:00] horrified when I learned labeling disks destroys them quickly
[21:00] does it really?
[21:00] :p
[21:00] :o*
[21:01] Hi, gang.
[21:02] http://boards.straightdope.com/sdmb/showthread.php?t=594687
[21:13] wow
[23:17] balrog: thanks for the recommendation of pdfunite; way better than pdftk
[23:24] got a couple of new shareware collections: LaserMagic's Super Value Pack (251 titles) and RomTech's Galaxy of Tetrimania (part of their Galaxy of Games series apparently)
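A quick sanity check on the 137MB/s figure at [20:27], using the 2012 upload total from [20:11] and the ~1.642E7-second window (roughly the year to date) taken from the WolframAlpha query:

```bash
# 2304321571060 KB uploaded so far in 2012; convert KB -> MB, then divide by
# the elapsed seconds used in the WolframAlpha link (1.642e7 s).
echo "scale=1; 2304321571060 / 1024 / 16420000" | bc
# prints ~137.0, i.e. about 137 MB/s average ingest, matching the log
```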