[00:00] *** odemg has quit IRC (Ping timeout: 265 seconds)
[00:03] *** odemg has joined #archiveteam-bs
[00:19] *** enowaldo_ has quit IRC (Read error: Operation timed out)
[00:52] *** achip has quit IRC (Ping timeout: 255 seconds)
[00:57] *** achip has joined #archiveteam-bs
[00:58] *** w00dsman has quit IRC (Leaving)
[01:34] *** HashbangI has quit IRC (Remote host closed the connection)
[01:35] *** enowaldo has joined #archiveteam-bs
[01:43] *** enowaldo has quit IRC (Ping timeout: 492 seconds)
[01:47] *** jeekl has joined #archiveteam-bs
[01:49] *** zino has joined #archiveteam-bs
[01:49] *** HashbangI has joined #archiveteam-bs
[02:08] *** odemg has quit IRC (Ping timeout: 265 seconds)
[02:08] *** xeam has joined #archiveteam-bs
[02:09] *** odemg has joined #archiveteam-bs
[02:12] *** xeam has left
[03:10] *** w00dsman has joined #archiveteam-bs
[03:17] *** qw3rty112 has joined #archiveteam-bs
[03:22] *** qw3rty111 has quit IRC (Read error: Operation timed out)
[03:41] *** odemgi_ has joined #archiveteam-bs
[03:43] *** odemgi has quit IRC (Read error: Operation timed out)
[03:43] *** odemg has quit IRC (Ping timeout: 265 seconds)
[03:43] *** enowaldo has joined #archiveteam-bs
[03:44] *** bobmcjr has joined #archiveteam-bs
[03:46] This is probably worth scraping given this notice: https://assemblergames.com/threads/this-forum-to-close-in-30-days.71032/
[03:48] Already in progress.
[03:51] *** enowaldo has quit IRC (Read error: Operation timed out)
[03:52] Cool
[03:55] *** odemg has joined #archiveteam-bs
[04:27] What am I supposed to do with warcs again? I have a currently dead forum I scraped a few months ago (minus stylesheets and a few icons, sorry).
[04:28] Upload them to archive.org
[04:29] They'll go into the warczone collection
[04:29] Alright. Webrecorder Player can't see any URLs in this warc for whatever reason. The content is there, and the format appears fine with a quick look in vim.
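One way to check the 04:29 problem (does the WARC actually contain records with target URIs?) is to list the `WARC-Target-URI` header of every record. This is a minimal sketch using only the Python standard library. It assumes a WARC/1.0 or 1.1 file that is either uncompressed or gzip-compressed per record (`.warc.gz`). The helper names `iter_records` and `list_target_uris` are hypothetical, and this is not how Webrecorder Player itself reads files.

```python
import gzip
from typing import BinaryIO, Iterator, List, Tuple


def _open_warc(path: str) -> BinaryIO:
    # A .warc.gz is a series of concatenated gzip members; gzip.open reads them in sequence.
    with open(path, "rb") as probe:
        magic = probe.read(2)
    return gzip.open(path, "rb") if magic == b"\x1f\x8b" else open(path, "rb")


def iter_records(stream: BinaryIO) -> Iterator[Tuple[str, dict]]:
    """Yield (version_line, headers) for each record, skipping the record blocks."""
    while True:
        line = stream.readline()
        if not line:
            return  # clean end of file
        if not line.strip():
            continue  # the CRLF CRLF that terminates each record
        version = line.decode("utf-8", "replace").strip()
        if not version.startswith("WARC/"):
            raise ValueError(f"expected a WARC version line, got {version!r}")
        headers = {}
        while True:
            hline = stream.readline()
            if hline in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header block
            name, _, value = hline.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        stream.read(int(headers.get("content-length", "0")))  # skip the block
        yield version, headers


def list_target_uris(path: str) -> List[Tuple[str, str]]:
    """Return (WARC-Type, WARC-Target-URI) for every record that has a target URI."""
    with _open_warc(path) as stream:
        return [
            # WARC/1.0 defined the URI inside <...>; WARC/1.1 dropped the brackets.
            (h.get("warc-type", ""), h["warc-target-uri"].strip("<>"))
            for _, h in iter_records(stream)
            if "warc-target-uri" in h
        ]
```

If `list_target_uris("forum.warc.gz")` comes back empty or raises on the first record, the problem is probably in the file's structure rather than in the player.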
[04:37] SketchCow: so finally something interesting in my search for japanese magazines
[04:37] i found scans someone had put up on mega.nz
[04:37] so i'm grabbing that
[04:38] it's about 10gb+ from what i can tell
[04:38] *** enowaldo has joined #archiveteam-bs
[04:46] *** Coderjo has quit IRC (Quit: new kernel)
[04:47] *** enowaldo has quit IRC (Ping timeout: 492 seconds)
[04:49] SketchCow: so i found out Kevin Savetz uploaded 3 ERIC items
[04:51] i'm going to have to go after that id range again, since i haven't touched it since christmas 2014:
[04:51] https://archive.org/details/ERIC_ED284545
[04:51] one of savetz's files: https://archive.org/details/ERIC_ED284540
[05:13] ok then, looks like savetz got a copy of that id from somewhere else
[05:14] because ED284540 doesn't have a url on its page and this url is 404:
[05:14] https://files.eric.ed.gov/fulltext/ED284540.pdf
[05:41] *** wyatt8740 has joined #archiveteam-bs
[05:46] checking AT&T Tech Channel, and their Family Affair video is blocked worldwide: https://polsy.org.uk/stuff/ytrestrict.cgi?ytid=H7BiihzcxkQ
[05:47] by MPI Media
[05:48] hmmmmmmmmmmmm
[05:48] *** bobmcjr has quit IRC (Read error: Operation timed out)
[06:09] *** Zerote has joined #archiveteam-bs
[06:22] *** c4rc4s has quit IRC (Ping timeout: 246 seconds)
[06:22] *** c4rc4s has joined #archiveteam-bs
[06:30] *** fuzzy8021 has quit IRC (Read error: Connection reset by peer)
[06:31] *** fuzzy8021 has joined #archiveteam-bs
[06:34] *** Coderjo has joined #archiveteam-bs
[06:39] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[06:39] *** enowaldo has joined #archiveteam-bs
[06:52] *** enowaldo has quit IRC (Read error: Operation timed out)
[07:16] *** fuzy802 has joined #archiveteam-bs
[07:21] *** Zerote has quit IRC (Ping timeout: 252 seconds)
[07:21] *** fuzzy8021 has quit IRC (Ping timeout: 615 seconds)
[07:26] *** fuzy802 is now known as fuzzy8021
[07:28] SketchCow: this may interest you: http://www.queenzone.com/forums/1449503/complete-list-of-documentaries-1979-2018-updated.aspx
[07:28] tons of queen documentaries
[07:29] i also found this: https://purplehippies.com/
[07:40] *** Zerote has joined #archiveteam-bs
[08:00] *** enowaldo has joined #archiveteam-bs
[08:05] *** enowaldo has quit IRC (Ping timeout: 252 seconds)
[08:19] *** m007a83 has quit IRC (Ping timeout: 252 seconds)
[08:50] *** m007a83 has joined #archiveteam-bs
[08:54] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[09:59] *** w00dsman has quit IRC (Remote host closed the connection)
[10:01] *** enowaldo has joined #archiveteam-bs
[10:15] *** enowaldo has quit IRC (Read error: Operation timed out)
[10:34] *** enowaldo has joined #archiveteam-bs
[10:53] *** enowaldo has quit IRC (Read error: Operation timed out)
[11:09] *** wp494 has quit IRC (Ping timeout: 268 seconds)
[11:10] *** wp494 has joined #archiveteam-bs
[11:26] *** enowaldo has joined #archiveteam-bs
[11:29] *** kiskabak has quit IRC (Ping timeout: 265 seconds)
[11:34] *** enowaldo has quit IRC (Ping timeout: 252 seconds)
[11:54] *** enowaldo has joined #archiveteam-bs
[12:43] *** icedice has joined #archiveteam-bs
[13:06] *** terry has joined #archiveteam-bs
[13:08] *** terry is now known as GLaDOS
[13:57] *** deevious has quit IRC (Quit: deevious)
[14:02] *** deevious has joined #archiveteam-bs
[15:05] *** Zerote has quit IRC (Ping timeout: 252 seconds)
[15:25] *** Zerote has joined #archiveteam-bs
[15:59] *** w00dsman has joined #archiveteam-bs
[16:12] *** icedice has quit IRC (Ping timeout: 252 seconds)
[16:15] *** w00dsman has quit IRC (Read error: Operation timed out)
[16:30] *** w00dsman has joined #archiveteam-bs
[16:42] *** anarcat has joined #archiveteam-bs
[16:42] hello window 51
[16:43] i'll have an estimate of the dataset size of cdn.media.ccc.de in ~8h
[16:46] Sweet
[16:47] *** icedice has joined #archiveteam-bs
[16:52] *** Dj-Wawa has quit IRC (Quit: Connection closed for inactivity)
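Going back over the ERIC ID range mentioned at 04:49 to 05:14 mostly means generating candidate URLs for each accession number and checking them. This is a minimal sketch that only builds the candidates and does not fetch anything. It assumes two patterns taken from the links above: `https://files.eric.ed.gov/fulltext/ED<number>.pdf` for ERIC's copy and `https://archive.org/details/ERIC_ED<number>` for existing IA items. It also assumes ED numbers are zero-padded to six digits. As the ED284540 case shows, a 404 on ERIC's side does not mean the document is unavailable elsewhere. `eric_candidates` is a hypothetical helper.

```python
from typing import Iterator, NamedTuple


class EricCandidate(NamedTuple):
    eric_id: str       # e.g. "ED284540"
    fulltext_url: str  # ERIC's own PDF location (may 404)
    ia_item_url: str   # archive.org item, following the ERIC_<id> naming seen above


def eric_candidates(first: int, last: int) -> Iterator[EricCandidate]:
    """Yield candidate URLs for ED-series documents first..last inclusive."""
    if first > last:
        raise ValueError("first must not exceed last")
    for number in range(first, last + 1):
        eric_id = f"ED{number:06d}"  # assumption: six-digit, zero-padded numbers
        yield EricCandidate(
            eric_id,
            f"https://files.eric.ed.gov/fulltext/{eric_id}.pdf",
            f"https://archive.org/details/ERIC_{eric_id}",
        )
```

For example, `eric_candidates(284540, 284545)` covers the two items linked above and the four IDs between them.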
[17:01] anarcat: if my maths are right, ~2 hours.
[17:02] *** astrid has quit IRC (Read error: Operation timed out)
[17:04] *** Somebody2 has quit IRC (Read error: Operation timed out)
[17:04] *** MrRadar_ has quit IRC (Read error: Operation timed out)
[17:04] *** swebb has quit IRC (Read error: Operation timed out)
[17:04] *** yipdw has quit IRC (Read error: Operation timed out)
[17:04] *** me has quit IRC (Read error: Operation timed out)
[17:04] *** phirephl- has quit IRC (Read error: Operation timed out)
[17:04] *** jrwr has quit IRC (Read error: Operation timed out)
[17:05] *** superkuh has quit IRC (Read error: Operation timed out)
[17:05] *** erin has quit IRC (Write error: Broken pipe)
[17:05] *** balrog_ has joined #archiveteam-bs
[17:05] *** chazchaz_ has quit IRC (Read error: Operation timed out)
[17:06] *** RichardG has quit IRC (Ping timeout: 360 seconds)
[17:06] *** RichardG has joined #archiveteam-bs
[17:06] *** swebb has joined #archiveteam-bs
[17:06] *** zino has quit IRC (Ping timeout: 360 seconds)
[17:07] *** Pixi` has joined #archiveteam-bs
[17:07] *** Pixi has quit IRC (Read error: Operation timed out)
[17:07] *** Darkstar has quit IRC (Read error: Operation timed out)
[17:07] *** chfoo has quit IRC (Ping timeout: 360 seconds)
[17:08] *** nightpool has quit IRC (Ping timeout: 360 seconds)
[17:08] *** Fionera_ has joined #archiveteam-bs
[17:09] *** unlobito has quit IRC (Read error: Operation timed out)
[17:09] *** phirephly has joined #archiveteam-bs
[17:09] *** superkuh has joined #archiveteam-bs
[17:09] *** unlobito has joined #archiveteam-bs
[17:09] *** godane has quit IRC (Ping timeout: 360 seconds)
[17:09] *** twigfoot has quit IRC (Ping timeout: 360 seconds)
[17:09] *** Darkstar has joined #archiveteam-bs
[17:10] anarcat: cdn.media.ccc.de for?
[17:10] *** GLaDOS has quit IRC (Read error: Operation timed out)
[17:10] my mirror password might still work
[17:10] can just rsync everything off of it
[17:11] i think around 1.5tb is what it was last time i had my mirror up
[17:11] *** chfoo has joined #archiveteam-bs
[17:12] yup, my password is still active
[17:13] *** schbirid has quit IRC (Read error: Operation timed out)
[17:13] *** nightpool has joined #archiveteam-bs
[17:14] *** balrog has quit IRC (Read error: Operation timed out)
[17:14] *** balrog_ is now known as balrog
[17:17] *** chirlu` has quit IRC (Read error: Operation timed out)
[17:17] *** Fionera has quit IRC (Read error: Operation timed out)
[17:17] *** twigfoot has joined #archiveteam-bs
[17:20] *** zino has joined #archiveteam-bs
[17:21] *** godane has joined #archiveteam-bs
[17:21] If that's all it is we can just chop it up into a bunch of -ao jobs
[17:21] and fire them at AB
[17:22] Please no.
[17:22] https://pastebin.com/UGyKqJdR
[17:22] Anyone seen this kind of SSL error before?
[17:23] WARNING ImportError: /tmp/_MEIK5xSzV/libssl.so.1.0.0: version `OPENSSL_1.0.2' not found (required by /usr/lib/python3.5/lib-dynload/_ssl.cpython-35m-x86_64-linux-gnu.so)
[17:23] *** schbirid has joined #archiveteam-bs
[17:23] Ubuntu 16.04 - wpull 2.0.1 & 1.2.3 - youtube-dl is what throws it
[17:23] "Your paste has triggered our automatic SPAM detection filter."
[17:23] Fixed.
[17:23] *** bobmcjr has joined #archiveteam-bs
[17:25] Fusl: So the thing is, an rsync mirror or similar would definitely be great if we want it as IA items. For the WBM though, we need to retrieve it over HTTP.
[17:25] So yeah, the question is where we want to put it and whether we want links in the WBM to work.
[17:25] total size is 8,453,818,507,398
[17:25] 8.5tb
[17:25] cdn.media.ccc.de URLs came up repeatedly in AB jobs, so there's clearly a good number of links out there.
[17:26] anarcat: ^ I guess you can stop your script.
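The rsync total at 17:25 (8,453,818,507,398 bytes) works out to about 8.45 TB in decimal units or about 7.69 TiB in binary units, so "8.5tb" holds either way and is far above the earlier 1.5tb recollection. The sketch below does three things: it reproduces that conversion, sums sizes from a file listing, and splits the listed paths into fixed-size URL lists, as would be needed if the data were split into separate jobs as suggested at 17:21. It rests on several assumptions. The mirror listing is assumed to look like `rsync --list-only` output (permissions, size with thousands separators, date, time, path). Every listed path is assumed to map to `https://cdn.media.ccc.de/<path>`. `human_size`, `parse_listing` and `chunk_urls` are hypothetical helpers.

```python
from typing import Iterable, Iterator, List, Tuple
from urllib.parse import quote

CDN_BASE = "https://cdn.media.ccc.de/"  # assumed URL prefix for listed paths


def human_size(n_bytes: int) -> Tuple[str, str]:
    """Return (decimal, binary) renderings of a byte count, e.g. TB vs TiB."""

    def fmt(value: float, base: int, names: List[str]) -> str:
        i = 0
        while value >= base and i < len(names) - 1:
            value /= base
            i += 1
        return f"{value:.2f} {names[i]}"

    return (
        fmt(n_bytes, 1000, ["B", "KB", "MB", "GB", "TB", "PB"]),
        fmt(n_bytes, 1024, ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]),
    )


def parse_listing(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (size, path) for regular files in `rsync --list-only`-style lines."""
    for line in lines:
        parts = line.split(None, 4)  # perms, size, date, time, path (may contain spaces)
        if len(parts) != 5 or not parts[0].startswith("-"):
            continue  # skip directories, symlinks and unparseable lines
        yield int(parts[1].replace(",", "")), parts[4].rstrip("\r\n")


def chunk_urls(paths: Iterable[str], per_chunk: int) -> Iterator[List[str]]:
    """Group percent-encoded CDN URLs into lists of at most per_chunk entries."""
    chunk: List[str] = []
    for path in paths:
        chunk.append(CDN_BASE + quote(path.lstrip("/")))
        if len(chunk) == per_chunk:
            yield chunk
            chunk = []
    if chunk:
        yield chunk
```

`human_size(8_453_818_507_398)` gives `("8.45 TB", "7.69 TiB")`. Summing the sizes from `parse_listing` over the full listing should reproduce rsync's total, if the listing format matches the assumption above.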
[17:26] also, cdn urls can't really be grabbed
[17:26] the cdn itself is a redirector to other domains
[17:27] https://cdn.media.ccc.de/congress/2016/webm-hd/33c3-8429-eng-deu-fra-33C3_Opening_Ceremony_webm-hd.webm?mirrorlist
[17:27] check this
[17:27] *** chirlu has joined #archiveteam-bs
[17:27] it does a 302 redirect: Location: https://ftp.halifax.rwth-aachen.de/ccc/congress/2016/webm-hd/33c3-8429-eng-deu-fra-33C3_Opening_Ceremony_webm-hd.webm
[17:27] That's fine.
[17:27] ic
[17:27] We'd just grab both the redirect and whatever it points to.
[17:28] Where that's stored exactly doesn't matter for the WBM.
[17:28] The CDN link would still work.
[17:28] well here's the full rsync file list: http://xor.meo.ws/TRBVQ6SkRNtSnqDJC6nyKeOrr6Uo1bAP/ccc.txt
[17:29] and here: https://cdn.media.ccc.de/INDEX
[17:35] *** erin has joined #archiveteam-bs
[17:36] *** me has joined #archiveteam-bs
[17:37] *** jrwr has joined #archiveteam-bs
[17:37] *** Fusl sets mode: +o jrwr
[17:39] *** astrid has joined #archiveteam-bs
[17:39] *** Fusl sets mode: +o astrid
[17:40] *** MrRadar has joined #archiveteam-bs
[17:41] *** chazchaz has joined #archiveteam-bs
[17:41] *** Somebody2 has joined #archiveteam-bs
[17:41] *** svchfoo1 sets mode: +o Somebody2
[17:41] *** svchfoo3 sets mode: +o Somebody2
[17:46] Hi Jason. I found out today that you blocked my bot (@shwayest) which allows people to tweet anonymously from IRC. I am not going to try and convince you to unblock it or anything like that as I respect your decision, however I'm just curious as to what caused you to block it? I am busy adding more features and like to gather data so I can improve existing ones, such as the anti-abuse stuff.
[17:46] The fuck is this
[17:47] https://twitter.com/shwayest ?
[17:47] ayeah
[17:48] The tweets there make me want to block it as well, and I don't even use Twitter.
[17:48] is IA choking on its search index or something
[17:48] "what caused you to block it" - because what you're doing is a bad idea?
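The redirect behaviour described at 17:26 to 17:27 is why a WBM-friendly grab needs both responses: the 302 from cdn.media.ccc.de and the file on whichever mirror it points to. This small sketch sends one request without following redirects and reports the status and the `Location` header, using only `http.client` from the standard library. It assumes the CDN answers `HEAD` the same way it answers `GET`; pass `method="GET"` if it does not. `peek_redirect` is a hypothetical helper.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def peek_redirect(url: str, method: str = "HEAD",
                  timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send a single request and return (status, Location header or None).

    http.client never follows redirects, so a 302 is reported as-is.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query  # keep query strings such as ?mirrorlist
    try:
        conn.request(method, target)
        resp = conn.getresponse()
        resp.read()  # drain any body so the connection can close cleanly
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

Running it against the 33C3 CDN URL above should return a 302 and a mirror URL, as long as the CDN still behaves the way the log describes. The mirror chosen may differ from request to request.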
[17:48] search is not working and vhsvault is empty
[17:49] godane: archive seems b0rked right now, website was fully down a few minutes ago
[17:49] https://twitter.com/search?q=from%3Ashwayest%20to%3Atextfiles&src=typd
[17:49] OK, I see now
[17:49] ok
[17:50] When he said "from IRC" I assumed he meant from here
[17:50] But he probably means some ridiculous channel somewhere
[17:50] Yeah
[17:50] And he was tweeting at me to tell me about ASSembler
[17:50] And that's how he found out I was blocked
[17:50] http://www.megachan.net/proxy-tweets/
[17:50] *** yipdw has joined #archiveteam-bs
[17:51] "I fully expect the account to be banned soon due to shitposts (shitweets?) from the IRC users"
[17:51] I like that he both acknowledges that it can't ever not be a vector for abuse, but also is saaaaaaaaaaaaaaad I blocked that shit
[17:51] you are expecting the worst, but you are complaining about the best outcome you had so far?
[17:51] SketchCow: yeah
[17:51] :D
[17:51] Anyway, sorry to distract, where am I
[17:51] literally just my words
[17:51] hahaha
[17:57] looks like FOS may be down too
[17:58] Yep, OK, so it wasn't my VM's fault that it couldn't connect to rsync then
[18:00] Error: Error: connect EHOSTUNREACH 208.70.31.102:21
[18:00] Thu May 30 2019 19:43:08 GMT+0200 (Central European Summer Time)
[18:00] yeah
[18:19] The entire datacenter is down.
[18:19] Fiber upgrade
[18:20] Was supposed to be an hour, but it's expanding, of course
[18:20] lol nice
[18:20] feels more like a downgrade if you ask me :P
[18:37] *** GLaDOS has joined #archiveteam-bs
[18:46] FOS is back.
[19:13] *** killsushi has joined #archiveteam-bs
[19:22] *** Despatche has quit IRC (Read error: Operation timed out)
[19:37] *** icedice2 has joined #archiveteam-bs
[19:40] *** icedice has quit IRC (Ping timeout: 252 seconds)
[19:47] *** icedice2 has quit IRC (Quit: Leaving)
[19:48] *** icedice has joined #archiveteam-bs
[20:10] *** Despatche has joined #archiveteam-bs
[20:25] SketchCow: i'm starting to upload some vhs tape rips i did 2 weeks ago
[20:26] these are Reader's Digest tapes on the Grand Canyon, Yellowstone, and Yosemite from 1988
[20:30] *** thuban4 has joined #archiveteam-bs
[20:32] *** w00dsman has quit IRC (Leaving)
[20:32] *** enowaldo has quit IRC (Ping timeout: 268 seconds)
[20:51] *** thuban has joined #archiveteam-bs
[20:52] *** thuban4 has quit IRC (Read error: Operation timed out)
[21:03] *** lindalap has joined #archiveteam-bs
[21:10] *** lindalap has quit IRC (Quit: lindalap)
[21:43] *** enowaldo has joined #archiveteam-bs
[21:45] *** BlueMax has joined #archiveteam-bs
[21:48] *** enowaldo has quit IRC (Ping timeout: 268 seconds)
[22:09] godane: Is there any recommended VHS ripping kit btw, or do they all produce about the same quality?
[22:10] i'm using a usb easycap
[22:11] it was my cheap solution to capture my home recordings
[22:11] then it captures everything Jason sends me
[22:23] *** DigiDigi has joined #archiveteam-bs
[22:34] Ok
[22:37] *** phiresky has quit IRC (Quit: The Lounge - https://thelounge.chat)
[22:38] *** Atom has joined #archiveteam-bs
[22:38] *** BlueMax has quit IRC (Quit: Leaving)
[23:03] *** Zerote has quit IRC (Ping timeout: 252 seconds)
[23:10] *** Hani has quit IRC (Ping timeout: 615 seconds)
[23:10] *** Hani has joined #archiveteam-bs
[23:27] *** exoire has joined #archiveteam-bs
[23:31] JAA: thanks for the heads up, stopped
[23:31] hum
[23:32] it was finished, oddly
[23:32] $ wc -l all-lengths
[23:32] 6727 all-lengths
[23:32] $ awk '{ total += $2 } END { print total }' < all-lengths
[23:32] 918453455252
[23:32] Maybe the server doesn't advertise the length for some (most?) URLs?
[23:32] that says 918GB
[23:32] but i would trust the other numbers we had before here better
[23:33] anyways
[23:52] *** godane has quit IRC (Ping timeout: 246 seconds)
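The `awk` one-liner at 23:32 sums column 2 and quietly adds 0 for an empty or non-numeric field. That fits the guess that the server does not advertise a length for many URLs, and would explain 918GB against the 8.5tb rsync total. This sketch does the same sum but also counts lines without a usable length, so the guess can be checked directly. It assumes `all-lengths` holds whitespace-separated `URL length` pairs, one per line; the log does not show the exact format. `summarize_lengths` is a hypothetical helper.

```python
from typing import Iterable, NamedTuple


class LengthSummary(NamedTuple):
    lines: int    # roughly what `wc -l` reports (one URL per line)
    total: int    # what awk's `total += $2` adds up for plain integer fields
    missing: int  # lines whose second field is absent or not a plain integer


def summarize_lengths(lines: Iterable[str]) -> LengthSummary:
    count = total = missing = 0
    for line in lines:
        count += 1
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            total += int(fields[1])
        else:
            missing += 1  # awk would silently add 0 for these
    return LengthSummary(count, total, missing)
```

Called as `summarize_lengths(open("all-lengths", encoding="utf-8"))`, a large `missing` count alongside 6727 lines would support the missing-`Content-Length` explanation.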