#archiveteam 2012-07-09,Mon

Time Nickname Message
00:11 🔗 Coderjoe man...
00:12 🔗 Coderjoe "No doubt some will call the closure 'melodramatic' but they have no idea what's involved."
00:12 🔗 shaqfu Coderjoe: ?
00:12 🔗 Coderjoe apparently, he's never heard of having someone he trusts manage the site for awhile, or at least HELP him manage the site
00:13 🔗 Coderjoe and I wonder about the "Tired of dealing with content thieves"
00:15 🔗 Coderjoe if he is talking about image bandwidth thieves (those who post an image on their blog or whatever without rehosting the image), why doesn't he set up a referer check and be done with it?
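The referer check Coderjoe mentions can be sketched as a small decision function. This is a minimal illustration, not the site's actual configuration: the allowed hostnames and the choice to serve requests that carry no Referer header are assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical list of hosts allowed to embed the images (assumption).
ALLOWED_HOSTS = {"planetphillip.com", "www.planetphillip.com"}


def allow_image_request(referer, allowed_hosts=ALLOWED_HOSTS):
    """Return True if an image request should be served.

    `referer` is the raw value of the HTTP Referer header, or None/"" when
    the header is absent.
    """
    if not referer:
        # Assumption: requests without a Referer are allowed, since some
        # browsers and proxies strip the header.
        return True
    host = urlparse(referer).hostname  # None when the value is not a URL
    return host is not None and host in allowed_hosts
```

In practice a check like this usually lives in the web server configuration rather than in application code; the function only shows the logic.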
00:17 🔗 Coderjoe shaqfu: planetphillip.com
00:19 🔗 Coderjoe apparently, something happened today that was the straw that broke the camel's back, and he flipped out and threw his site in the fire.
00:19 🔗 Coderjoe (judging by the content of the closing message)
00:21 🔗 shaqfu Saw it :(
00:21 🔗 Coderjoe nice robots.txt file, too
00:22 🔗 shaqfu Now that's just uncalled for
00:23 🔗 Coderjoe though he changed it when he closed up
00:23 🔗 Coderjoe modified: Sunday, July 08, 2012 5:24:03 AM
00:23 🔗 Swizzle Well that sucks. I was thinking at least some of it is archived, but now that will go dark
00:24 🔗 Swizzle He really doesn't care about any of the files or content he has hosted then. If people are not going to appreciate him enough he will just throw it all in the trash
03:27 🔗 DrainLbry love the mobile me is closed sign, swinging in the wind like yeah, fuck you, get out of my store you dirty hippie.
08:05 🔗 omf_ wow that planetphillip guy is a loser just like the above comments imply. For I am the only one to run this site and everyone can kiss my ass
08:13 🔗 omf_ couldn't we find people who had files hosted on that site and then get them to ask the owner for the content?
08:17 🔗 SmileyG tbh he sounds like a dick and is unlikely to comply :S
08:18 🔗 omf_ that is why you get someone who hosted his content
08:19 🔗 omf_ maybe we should change 'Dead as a Doornail' to a hall of shame. Why not have a list of sites that are pro-actively destroying history
08:19 🔗 SmileyG Hopefully all the people on the site realise what a dick he is, and upload their content independently to IA, but it's not going to happen :(
08:19 🔗 SmileyG "You cannot help those who will not help themselves."
08:20 🔗 omf_ For those people you threaten to beat them with the stick since the carrot is a waste
08:20 🔗 omf_ that is why I am so fired up about doing pre-emptive stuff
08:20 🔗 SmileyG omf_: yeah
08:20 🔗 SmileyG except there is too much stuff :S
08:21 🔗 * SmileyG goes off to do some work which makes him sad :(
08:51 🔗 arkhive1 Can someone check my script for checking for valid, non redirecting links on klik.me then scraping them with wget. They close the 20th of this month.
08:51 🔗 arkhive1 http://pastebin.com/5xXsBJ76
08:53 🔗 arkhive1 and we'll need a tracker to distribute the jobs if we want to get this done on time.
08:53 🔗 arkhive1 Let me know what you guys think. I'm going to bed. nearly 3am here.
09:36 🔗 omf_ arkhive1, I ran the script and it generated 5000 urls and it only starts at http://klik.me/yV5aE
12:15 🔗 edsu where's the best place to add news of a website dying? http://archiveteam.org/index.php?title=Deathwatch ?
12:26 🔗 edsu huh, getting an error when attempting to edit that page
12:26 🔗 edsu Fatal error: Call to undefined method Article::getSection() in /home/archivet/public_html/extensions/recaptcha/ConfirmEdit.php on line 620
12:26 🔗 * edsu shrugs
12:35 🔗 omf_ the best place to report a website closing is this channel
13:04 🔗 alard arkhive (and others): chronomex and I have set up a shared tracker on http://tracker.archiveteam.org/ . We can give you an account if you have a project that needs tracking. It's basically the same tracker that was used for MeMac/Picplz/Tabblo et cetera, but with a web interface to upload and manage tasks.
13:09 🔗 alard That said: it's a nice script you have there, but I don't think this is a feasible way to index klik.me. (We'd have to check ~1500 IDs per second.)
13:31 🔗 edsu omf_: oh well i just added news of kasabi closure to the deathwatch http://archiveteam.org/index.php?title=Deathwatch#Pining_for_the_Fjords_.28Dying.29
16:19 🔗 arkhive1 alard: Ya, that was a concern of mine. Time is what we don't have. I do not know what way we should download klik.me since I can't go the API way.
16:20 🔗 arkhive1 alard: and that would be cool if you could set me up with an account. Does anyone else have any more ideas on how we can backup klik.me ?
16:34 🔗 * Schbirid extracts ~2 years worth of robots.txt grabs
17:06 🔗 alard arkhive1: I sent you some information about the tracker.
17:08 🔗 chronomex plz 2 send me the codes sanjay@hotmail.com
17:36 🔗 SmileyG http://www.ebay.com/itm/BIGGEST-COLLECTION-EVER-22-SEGA-NINTENDO-PC-ENGINE-FULLSETS-FACTORY-SEALED-/300736846867?pt=FR_Jeux_Vid%C3%A9o&hash=item4605501c13#ht_165766wt_1163 <<< instant archive?
17:59 🔗 omf_ wow that is an archive
18:07 🔗 Aranje jesus
18:07 🔗 Aranje insta archive is right
18:42 🔗 benuski I'm trying to upload some newspapers digitized from the microfilm by fultonhistory.com; however, for all of the thousands of pages he has digitized, each page is a separate pdf file. I've tried to put some of them back into individual issues, but have quickly realized that it might take the rest of my life to do so. So my question is, do you all think I can just create one pdf for the entire digitized run of a newspaper he has and upload t
18:42 🔗 benuski Obviously it's not ideal, but I'm worried about this website just disappearing
18:44 🔗 ersi "Sorry. This URL has been excluded from the Wayback Machine." :(
18:44 🔗 Schbirid / Provide alternate content for browsers that do not support scripting // or for those that have scripting disabled. Alternate HTML content should be placed here. This content requires the Macromedia Flash Player. Get Flash
18:45 🔗 benuski it's a hideous website, by the way
18:45 🔗 benuski O
18:45 🔗 benuski I've sent him an email asking permission to upload his newspapers, but have never heard back, so i'm just doing it
18:47 🔗 chronomex yeah just do it
18:47 🔗 chronomex there's a way to concatenate pdf files
18:47 🔗 shaqfu benuski: Is this the site with the flying fish
18:47 🔗 benuski shaqfu: yes it is
18:47 🔗 benuski Yeah, I have been using pdftk to concatenate individual issues, but I'm just going to merge each title into one big pdf
18:49 🔗 chronomex pdfcat I think
18:49 🔗 chronomex tho iirc it passes through TeX for some reason
18:50 🔗 chronomex if the page is one big image, pdfimages is your man
18:51 🔗 shaqfu Is there some way to OCR each page, check for masthead, and start a new file if there is?
18:52 🔗 chronomex there's probably a way that involves stupid amounts of custom programming
18:52 🔗 shaqfu Essentially pattern-match to "is there a big bar across the top"
18:52 🔗 chronomex I've had good success with shellscripts that display an image and then ask me questions
18:55 🔗 benuski At least with the title I'm doing so far, they're all four pages long so dividing it into issues hasn't been that big of a deal, but this is one of the shorter ones and getting them uploaded is taking awhile
18:55 🔗 benuski Like I said, it's not ideal, but at least it gets it out of this terrible website
18:56 🔗 chronomex <3
18:56 🔗 shaqfu Petition Brewster to replace the little Greek temple logo with a bouncing head
18:56 🔗 shaqfu The newspapers will feel right at home
18:58 🔗 benuski I think the "original web crawler" would be a great logo for IA
18:59 🔗 shaqfu Hahahahaha
18:59 🔗 balrog chronomex: pdfunite
19:01 🔗 chronomex ahhh
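For a title where every issue has the same number of pages (benuski mentions four), the per-page PDFs can be grouped into issues mechanically. The sketch below only builds `pdfunite` command lines and does not run them. The file naming is an assumption: it presumes zero-padded page names so that a plain sort yields page order. The `issue_NNNN.pdf` output names are also hypothetical.

```python
def issue_commands(page_files, pages_per_issue=4, prefix="issue"):
    """Group single-page PDFs into fixed-size issues.

    Returns one argv list per issue, in the form
    ["pdfunite", page1, ..., pageN, output]. A short final group is
    kept as its own issue.
    """
    pages = sorted(page_files)  # assumes names sort into page order
    commands = []
    for start in range(0, len(pages), pages_per_issue):
        chunk = pages[start:start + pages_per_issue]
        output = f"{prefix}_{start // pages_per_issue + 1:04d}.pdf"
        commands.append(["pdfunite", *chunk, output])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the grouping looks right. Titles with a varying page count per issue would still need manual review or the masthead detection shaqfu describes.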
19:29 🔗 underscor At the moment if you change your password on archive.org you will no longer be able to use the ftp uploader
19:29 🔗 underscor just as an fyi
19:29 🔗 * chronomex changes godane's password
19:29 🔗 underscor :D
19:30 🔗 underscor this also applies if you're a new user
19:30 🔗 underscor just so people know if someone comes crying in here about it
19:30 🔗 underscor :)
19:30 🔗 chronomex oh I get it
19:30 🔗 chronomex now archiveteam is IA's unpaid support team
19:30 🔗 underscor p much
19:30 🔗 underscor well, we're just something like 15% of the data uploaded ytd or something crazy
19:31 🔗 underscor so :P
19:31 🔗 Schbirid still extracting the robots.txt files, jesus
19:31 🔗 Schbirid that avfs mount of forumplanet hates my server
19:32 🔗 Schbirid i must find a better way to host that crap
19:32 🔗 DFJustin it actually amazes me that it's ONLY 15%
19:32 🔗 chronomex likewise
19:32 🔗 underscor DFJustin: Let me get a real number :D
19:35 🔗 underscor query still running
19:35 🔗 underscor >:(
19:36 🔗 Schbirid is archive.org not responding for you guys too?
19:37 🔗 Schbirid just kidding to make underscor soil himself
19:37 🔗 underscor haha
19:37 🔗 Aranje oh dear
19:37 🔗 Schbirid the surprised laugh of shame
19:37 🔗 underscor fortunately, the query runs on a different server
19:37 🔗 underscor and wouldn't affect mainsite
19:37 🔗 underscor :P
19:37 🔗 Schbirid you never know!
19:37 🔗 Aranje HE ALWAYS KNOWS
19:37 🔗 underscor >:D
19:38 🔗 underscor Results (1-25) of 10,957 where: &w_collection=*archiveteam*&w_publicdate=2012*
19:38 🔗 underscor mmmmm
19:38 🔗 underscor size query is still running though
19:43 🔗 DFJustin underscor: also check the scrollback in #archiveteam-bs, godane found some magazine dvds on demonoid that would probably be more efficient to download straight into IA than have one of us reup
20:11 🔗 underscor 293104745259 KB uploaded by archiveteam in 2012
20:11 🔗 underscor 2304321571060 KB uploaded total in 2012
20:12 🔗 underscor So it's actually 12%
20:12 🔗 underscor 293 104 745 259 / 2 304 321 571 060 = 0.127197848
20:12 🔗 chronomex for humans, that's 293T / 2.3P
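The share and the "for humans" conversion can be reproduced from the two figures underscor posted. One assumption is labeled in the code: a KB is treated as 1000 bytes, which is what matches chronomex's 293T / 2.3P reading.

```python
ARCHIVETEAM_KB = 293_104_745_259   # uploaded by archiveteam in 2012
TOTAL_KB = 2_304_321_571_060       # uploaded in total in 2012


def share(part, whole):
    """Fraction of `whole` accounted for by `part`."""
    return part / whole


def kb_to_tb(kb):
    """Convert KB to TB, assuming decimal units (1 KB = 1000 bytes)."""
    return kb * 1000 / 1e12


fraction = share(ARCHIVETEAM_KB, TOTAL_KB)
print(f"{fraction:.1%}")                        # archiveteam share of 2012 uploads
print(f"{kb_to_tb(ARCHIVETEAM_KB):.0f} TB")     # archiveteam uploads
print(f"{kb_to_tb(TOTAL_KB) / 1000:.1f} PB")    # all uploads
```

The fraction comes out at about 0.1272, so the share is closer to 12.7% than to a flat 12%.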
20:13 🔗 chronomex fun fact, wolframalpha uses appx data content of WBM as of 2006 (2P) as a comparative measure
20:13 🔗 DFJustin 2.3 new petabytes?!
20:14 🔗 underscor DFJustin: mhm!
20:14 🔗 chronomex yea
20:14 🔗 DFJustin jumping jehoshaphat
20:14 🔗 Schbirid does that include the wayback machine?
20:14 🔗 underscor http://home.us.archive.org/~tracey/mrtg/du.html
20:15 🔗 underscor hmmmm
20:15 🔗 underscor that only says 1PB
20:15 🔗 DFJustin I guess tv archive probably takes a lot
20:15 🔗 underscor Oh, right!
20:15 🔗 underscor we added SOLO storage
20:15 🔗 chronomex unpaired?
20:15 🔗 underscor mhm
20:15 🔗 underscor for "less important" stuff
20:15 🔗 Schbirid nighty
20:15 🔗 underscor like web crawls
20:15 🔗 chronomex hmm
20:15 🔗 chronomex so web crawls, the idea is that hopefully it'll be in multiple crawls and so you don't need to pair it?
20:16 🔗 chronomex seems reasonable I guess
20:16 🔗 underscor that and we don't usually lose disks
20:16 🔗 underscor and having two copies is really expensive
20:16 🔗 chronomex aye
20:16 🔗 underscor It's about $2k per TB, per node, over the lifetime of it
20:17 🔗 chronomex you run single disks or raid1 in nodes?
20:17 🔗 underscor (that's the number brewster likes to roughly estimate with)
20:17 🔗 chronomex and I assume paired space is on multiple nodes
20:17 🔗 underscor single disks
20:17 🔗 underscor paired space is two copies of a file
20:17 🔗 chronomex ok
20:17 🔗 underscor file level dupe, instead of block level
20:17 🔗 chronomex right
20:17 🔗 underscor we lost a lot more files the other way
20:17 🔗 underscor (back in like, 1997)
20:17 🔗 chronomex mmmmm
20:18 🔗 underscor http://i.imgur.com/WUP1D.png
20:19 🔗 underscor Better graph, but there's not a public link to it
20:19 🔗 underscor That's free space
20:19 🔗 chronomex that's nice
20:19 🔗 chronomex I mean a neat graph
20:19 🔗 chronomex those are K of terabytes?
20:19 🔗 underscor No, that's free space
20:20 🔗 underscor So it's just T
20:20 🔗 chronomex on the axes it says k
20:20 🔗 underscor kGB
20:20 🔗 chronomex presently there is only 1.2T free at IA?
20:20 🔗 underscor oh, oops
20:21 🔗 underscor I was thinkging you meant 1245 kTB
20:21 🔗 underscor thinking*
20:21 🔗 underscor There are 1,245TB free. 844TB solo, 401TB paired
20:21 🔗 chronomex aye
20:21 🔗 DFJustin so if there's 5417.0 TB now, and 2.3 PB have been added so far in 2012 and we're only halfway through the year, that's quite a growth rate
20:22 🔗 underscor yeah!
20:22 🔗 underscor it's both scary and amazing
20:22 🔗 chronomex I dare you to average it into megabytes/second
20:27 🔗 underscor http://www.wolframalpha.com/input/?i=%282.25031403E9%29%2F%281.642E7%29
20:27 🔗 underscor 137MB/s average, if my math is right
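The figures in the Wolfram Alpha link appear to be the 2012 total in MB (KB divided by 1024) over the seconds elapsed since January 1. A minimal sketch of the same calculation follows. The end point of midnight on July 9 is an assumption that happens to give roughly the 1.642E7 seconds used in the link.

```python
from datetime import datetime

TOTAL_KB = 2_304_321_571_060  # total uploaded in 2012, from the log


def average_mb_per_second(total_kb, start, end):
    """Average ingest rate in MB/s between two datetimes (1 MB = 1024 KB)."""
    total_mb = total_kb / 1024
    return total_mb / (end - start).total_seconds()


# Assumed window: start of 2012 up to the date of this log.
rate = average_mb_per_second(TOTAL_KB, datetime(2012, 1, 1), datetime(2012, 7, 9))
print(f"{rate:.0f} MB/s")
```

Under those assumptions the result agrees with underscor's 137 MB/s to the nearest whole number.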
20:27 🔗 DFJustin how much of that gets backed up to egypt/holland
20:27 🔗 underscor 0
20:28 🔗 underscor It's a long story.
20:28 🔗 underscor But effectively nothing anymore.
20:28 🔗 chronomex that sounds like a leakyscor story
20:28 🔗 underscor leakyscor?
20:29 🔗 underscor :P
20:31 🔗 godane i just got a 3tb hard drive today
20:32 🔗 godane backing up all my dvds later before it 'disappears'
20:32 🔗 godane this is so all the gbtv shows i have will be safer (in theory)
20:33 🔗 godane also does IA have failsafe if a EMP strike happens?
20:33 🔗 Coderjoe unpaired storage? so no redundant copy?
20:33 🔗 underscor godane: No.
20:33 🔗 Coderjoe that scares me
20:33 🔗 godane i don't have that kind of money
20:33 🔗 ersi LOL @ Yahoo having an account at Youtube.. :D "yahooMusicTV"
20:33 🔗 underscor Coderjoe: Me too.
20:34 🔗 underscor But it's not my decision, nor my money.
20:34 🔗 underscor We've been able to drain every drive that's failed so far though
20:34 🔗 underscor Which has been like 6
20:34 🔗 Coderjoe in my experience, drives are less reliable now than they were back in the 90s
20:34 🔗 underscor (We catch them as "failing")
20:35 🔗 underscor Higher data density leading to decreased reliability?
20:36 🔗 godane we need higher data density or you need 2000 cds to backup your hard drive
20:36 🔗 godane where is the damn holographic memory?
20:38 🔗 benuski Make sure that they're "archival" gold cds :D
20:40 🔗 Coderjoe so the scrappers can steal them to try and recover the gold content
20:43 🔗 benuski Come on, the vendor says they could last up to 300 years!
20:45 🔗 underscor lol
20:52 🔗 Aranje And if you don't label it with a sharpie, it might
20:58 🔗 Coderjoe label it with a number in the hub using a sharpie, then catalog the disc elsewhere using that number.
20:59 🔗 Aranje ^
21:00 🔗 Aranje horrified when I learned labeling disks destroys them quickly
21:00 🔗 underscor does it really?
21:00 🔗 underscor :p
21:00 🔗 underscor :o*
21:01 🔗 SketchCow Hi, gang.
21:02 🔗 DFJustin http://boards.straightdope.com/sdmb/showthread.php?t=594687
21:13 🔗 omf_ wow
23:17 🔗 benuski balrog: thanks for the recommendation of pdfunite; way better than pdftk
23:24 🔗 dashcloud got a couple of new shareware collections: LaserMagic's Super Value Pack (251 titles) and RomTech's Galaxy of Tetrimania (part of their Galaxy of Games series apparently)
