[00:13] So, the amount of support scripts I've written to ingest Hiphop Mixtapes cleanly and dependably from 18,000 torrent files has crossed a line into "I feel bad getting even slightly paid for this."
[00:14] Mostly, for those who want the post-mortem, it's because I was a little dumb and I used a second, easier source of mixtapes, and the second source KIND of sucks ass.
[00:14] It works when it works and does not when it doesn't.
[00:14] So I had to write something more resilient for source 1 (and I did)
[00:15] But now it has to go "1. Did I already add this. If so, delete. 2. Is it uploaded from the other place? Set aside to merge them. 3. Send off to the outbox to be uploaded."
[00:53] *** ndizzle has quit IRC (Read error: Connection reset by peer)
[01:15] *** ndiddy has joined #archiveteam-bs
[01:35] *** VADemon has joined #archiveteam-bs
[02:03] *** atrocity has quit IRC (Read error: Operation timed out)
[03:23] *** VADemon has quit IRC (Quit: left4dead)
[03:32] *** JesseW has joined #archiveteam-bs
[04:36] *** bsmith093 has quit IRC (Ping timeout: 244 seconds)
[04:40] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[04:46] *** BlueMaxim has joined #archiveteam-bs
[04:46] *** Sk1d has joined #archiveteam-bs
[04:52] *** BlueMaxim has quit IRC (Quit: Leaving)
[04:55] *** BlueMaxim has joined #archiveteam-bs
[05:17] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[05:25] *** JesseW has joined #archiveteam-bs
[05:40] *** bsmith093 has joined #archiveteam-bs
[05:42] *** Honno has joined #archiveteam-bs
[06:12] *** vitzli has joined #archiveteam-bs
[06:42] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[07:20] *** metalcamp has joined #archiveteam-bs
[08:01] *** schbirid has joined #archiveteam-bs
[08:08] *** ndiddy has quit IRC (Read error: Operation timed out)
[08:47] *** bsmith093 has quit IRC (Ping timeout: 499 seconds)
[08:59] *** bsmith093 has joined #archiveteam-bs
[09:08] *** dashcloud has quit IRC (Read error: Operation timed out)
[09:11] *** dashcloud has joined #archiveteam-bs
[09:47] *** bwn has quit IRC (Read error: Operation timed out)
[09:56] *** marvinw has quit IRC (Quit: Leaving)
[10:22] *** marvinw has joined #archiveteam-bs
[10:43] *** marvinw has quit IRC (Quit: Leaving)
[10:46] *** marvinw has joined #archiveteam-bs
[11:44] *** bwn has joined #archiveteam-bs
[12:42] *** toad2 has quit IRC (Read error: Operation timed out)
[12:43] *** toad1 has joined #archiveteam-bs
[12:44] *** BlueMaxim has quit IRC (Quit: Leaving)
[12:49] "Stop running local news index web pages, offering instead an open stream on the rolling Local Live service" fucking hell. What the heck BBC
[12:54] Wha
[12:55] What does that mean
[12:57] it means that bbc said, well no you guys don't need your own website, we can do that for you and make sure WE decide what kind of news you see and delete it after X days
[12:57] because storage.
[12:57] I see
[13:15] we should start a lot of BBC crawls
[13:22] *** atrocity has joined #archiveteam-bs
[13:27] *** dashcloud has quit IRC (Read error: Operation timed out)
[13:30] *** dashcloud has joined #archiveteam-bs
[14:44] just have one long continuous BBC crawl
[14:47] *** VADemon has joined #archiveteam-bs
[14:47] https://www.backblaze.com/blog/hard-drive-reliability-stats-q1-2016/
[15:09] *** JesseW has joined #archiveteam-bs
[15:33] Sorry to say, but these Backblaze drive stats are nice to have but produce very little actual information, since their approach is using consumer-grade drives for datacenter operation. These drives were never developed for running 24/7 and are not designed to run at the temperature and vibrations that occur in that specific usage.
[15:35] *** dashcloud has quit IRC (Read error: Operation timed out)
[15:36] As an ex-datacenter op, these rates are way higher than what we observed. We had many WDs with an annual failure rate of <0.5%, and we had way more load on them than Backblaze (write once, read once in a while)
[15:37] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[15:39] *** dashcloud has joined #archiveteam-bs
[16:24] *** schbirid has quit IRC (Ping timeout: 258 seconds)
[16:50] SketchCow, not every experiment pans out.
[17:01] Medowar: were you running enterprise drives?
[17:14] oh interesting http://archive.org/download/liveweb-20160509032438
[17:14] er wait never mind
[17:15] that's been listable for a long time, I thought the download status changed
[17:16] *** anomie_ has quit IRC (Read error: Operation timed out)
[17:24] will, "I'm not Edge, I promise!"
[17:24] haha
[17:24] re: BBC recipes, can't simply iterate over the IDs
[17:30] I sent the website an e-mail. They at least list that on their contact page.
[17:31] user-agents form so little of an HTTP request that at this point you might as well use it as a place to have some fun and acknowledge lineage
[17:31] The verboten string was "Lynx/2.8.8dev.12 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/2.12.18"
[17:33] * phillipsj tries the Edge string
[17:35] ...it worked! I made sure to accept the cookie so they know I changed user-agent strings.
[17:39] www.amazon.com/Avoid-Huge-Ships-John-Trimmer/dp/0870334336/ is incredible on several levels
[17:39] I don't know if "machine learning" is an artistic medium but oh my god it should be
[17:46] is that page archived? (I presume, but good to check). The reviews are delightful.
[17:47] it is now
[17:47] hopefully
[17:48] *** JW_work has quit IRC (Quit: Leaving.)
[17:49] "There is one major oversight in this generally well-written book, and that is that it addresses animate readers exclusively. As a large rock in the Tyrrhenian Sea off the coast of Giglio Island, I have recently been confronted with instances in which avoiding huge ships was of fundamental interest to my personal well-being. However, the methods presented in Capt. Trimmer's book were none too useful in my efforts to avoid huge
[17:49] ships, as I was recently struck by a very large ship indeed, a cruise vessel called the 'Costa Concordia'. ..."
[17:50] heh yeah
[17:58] *** vitzli has quit IRC (Quit: Leaving)
[18:05] *** JW_work has joined #archiveteam-bs
[18:06] The problem with recommendations as an art-form is that they are personalized. No idea why I get "Amazon.com Gift Card in a Greeting Card (Various Designs), Images You Should Not Masturbate To, Hutzler 571 Banana Slicer, Amazon.com Gift Card in Gift Box Reveal (Classic Black Card Design)"
[18:07] it's not too bad; there are many media that are individualized to the viewer
[18:08] there is also the opportunity to establish a single point of reference with a private browsing session at a known location etc
[18:08] FWIW, I get the same recommendations
[18:08] I used lynx with no cookies set.
[18:08] I guess this is an outgrowth of those multimedia installations
[18:29] User-agent strings should not be used for detection anyway. Browsers advertise what they can accept. The user-agent string is for troubleshooting if a client is acting weird.
[19:05] http://www.bbc.com/news/uk-36308976
[19:05] not sure if anybody saw that yet
[19:22] *** closure has joined #archiveteam-bs
[19:49] Kazzy: Yes, everything from blue, red, black and gold. Depending on the use case
[19:57] *** ndiddy has joined #archiveteam-bs
[20:27] I'd definitely like to see a BBC Food backup.
[20:27] The archive might be potentially tasty.
[20:42] *** ItsYoda has joined #archiveteam-bs
[20:47] *** Stiletto has quit IRC ()
[21:00] *** Honno has quit IRC (Read error: Operation timed out)
[21:05] every day an adventure with python https://gitlab.peach-bun.com/snippets/13
[21:12] Medowar: Backblaze's use of consumer drives is intentional
[21:12] they produce a lot of information, just about a different class of drives than you are using
[21:13] specifically, their reason for using consumer drives is that the TCO is lower - higher failure rate but lower drive cost and higher availability of fresh drives
[21:13] so they build a fault-tolerant infrastructure instead
[21:14] and just replace much cheaper drives a little more often
[21:14] (this is documented in one of their earlier blog posts, iirc)
[21:14] that's my reason for using consumer drives
[21:14] also so I can occasionally act indignant over getting a DOA drive in a batch of four
[21:14] maybe that's just newegg
[21:16] joepie91: I know, still consumer drives aren't made for the physical tasks of datacenter operation and running 24/7.
[21:17] Medowar: I'm sure. does it matter?
[21:17] So Backblaze is not actually producing any valuable information.
[21:17] ... yes, they are.
[21:17] they just aren't producing the specific information that -you- want.
[21:17] their data does not become any less correct or valuable.
[21:18] then for who exactly is this valuable?
[21:18] for people who want to know the reliability of consumer drives in high-stress environments like datacenters...?
[21:22] These use cases are still unrealistic. Normal operation of a consumer-grade drive involves many spin-ups and spin-downs, and Backblaze has basically none of them. So they are just testing a subset of hard drive life, and a portion that does not produce the most stress on drives.
[21:23] so i lost my cat to a coyote
[21:23] here is a picture of him: https://scontent-lga3-1.xx.fbcdn.net/t31.0-8/q81/s960x960/13217022_10204588349268713_1457902763586552308_o.jpg
[21:24] *** dashcloud has quit IRC (Read error: Operation timed out)
[21:24] we called him Romeo
[21:25] godane: poor kitty
[21:25] Medowar: The data is useless to you then, but that doesn't mean it's useless to everyone
[21:26] godane: sorry to hear that. https://archive.is/20160517212505/https://scontent-lga3-1.xx.fbcdn.net/t31.0-8/q81/s960x960/13217022_10204588349268713_1457902763586552308_o.jpg
[21:26] godane: aw :/
[21:27] **man hugs** godane - we're here for you
[21:27] *** dashcloud has joined #archiveteam-bs
[21:28] yeah :)
[21:29] based on what we can tell it was quick
[21:30] how old was he?
[21:30] godane: :'(
[21:30] also copied to http://imgur.com/toKufwb
[21:31] 15 years old
[21:31] https://scontent-lga3-1.xx.fbcdn.net/v/t1.0-9/13263951_10204588442711049_8140892414389226821_n.jpg?oh=8a8ce4dd363c59e012a47bb14340077f&oe=579F4A16
[21:31] another picture
[21:31] awww :-/
[21:36] that's sooo cute
[21:40] *** Stiletto has joined #archiveteam-bs
[21:51] :< godane that makes me sad
[22:33] *** bwn has quit IRC (Read error: Operation timed out)
[22:34] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[22:43] *** bwn has joined #archiveteam-bs
[22:48] *** tomwsmf-a has joined #archiveteam-bs
[22:59] *** w0rp has quit IRC (Read error: Operation timed out)
[23:00] *** w0rp has joined #archiveteam-bs
[23:21] https://github.com/lintool/warcbase and https://github.com/helgeho/ArchiveSpark should be added to http://archiveteam.org/index.php?title=The_WARC_Ecosystem
[23:23] *** BlueMaxim has joined #archiveteam-bs
[23:55] *** JordanJ2 has joined #archiveteam-bs