[00:03] *** octothorp has joined #archiveteam-bs
[00:41] *** Asparagir has joined #archiveteam-bs
[01:13] 444gb torrent site dump
[01:13] ah, godane beat me to it
[01:14] *** Asparagir has quit IRC (Asparagir)
[01:14] robogoat: I have the infohash-only dump which is 300mb gzipped
[01:22] i have the infohash-only dump too
[01:24] maybe some here will want to archive this: https://www.youtube.com/user/mikeziegler/videos
[01:34] I HAVE A SUTPID IDEA
[01:34] AS SUTPID AS IT GETS
[01:34] But I was thinking of it today.
[01:35] Maybe we should have some page on the Wiki, some obvious name, which is the HOT BUTTON topics, the ones we're getting a lot about, so people aren't dependent on the IRC scroll.
[01:35] Optional. Mostly reflective that I've been really busy with non IRC stuff, but if people think we're keeping up, great.
[01:36] godane: lol, just sampled 100 odd hashes - porn.
[01:38] so i'm now capturing a bad tape of mst3k
[01:38] thats going be 6 hours
[01:38] SketchCow: btw one tape i will have to fix cause there is a bit of plastic in it
[01:39] also the tape guard will have be replace
[01:40] left side goes up more the the right side
[01:44] *** jacketcha has quit IRC (Ping timeout: 252 seconds)
[01:53] *** Dimtree has joined #archiveteam-bs
[01:57] just found out YouTube deprecated video location fields: https://developers.google.com/youtube/v3/revision_history#release_notes_06_01_2017
[01:57] * Stilett0 has been living under a rock apparently
[01:58] obligatory "SketchCow: DOOMED"
[01:58] *** Stilett0 is now known as Stiletto
[02:01] SketchCow: we really need to get the show called 'The Site' to have some digitize full episodes
[02:01] there is no full episodes of it out there
[02:02] its that and digitizing any zdtv stuff from 1998/1999
[02:03] Over time a lot will come out, I'm sure.
[02:05] after this box i have to digitize the new box of stuff i bought
[02:06] i think that new box will take about 7 to 10 days to digitize
[02:18] *** bithippo has joined #archiveteam-bs
[02:19] *** Petri152 has quit IRC (Ping timeout: 246 seconds)
[02:19] btw i'm at 37,887 items now for this month
[02:19] 36,716 are the dtic docs
[02:20] *** Petri152 has joined #archiveteam-bs
[02:21] *** yuitimoth has quit IRC (Read error: Operation timed out)
[02:21] *** yuitimoth has joined #archiveteam-bs
[03:20] *** bithippo has quit IRC (Ping timeout: 260 seconds)
[03:25] *** Mateon1 has quit IRC (Remote host closed the connection)
[03:26] *** Mateon1 has joined #archiveteam-bs
[03:33] *** bithippo has joined #archiveteam-bs
[03:40] Is explicit permission require to add an item to the "Archive Team" collection?
[03:40] s/require/required
[04:02] *** bithippo has quit IRC (Quit: Page closed)
[04:04] *** Jens has quit IRC (Remote host closed the connection)
[04:05] *** Jens has joined #archiveteam-bs
[04:23] *** Sanqui has quit IRC (Ping timeout: 260 seconds)
[04:35] *** Sanqui has joined #archiveteam-bs
[04:39] *** qw3rty119 has joined #archiveteam-bs
[04:44] *** qw3rty118 has quit IRC (Ping timeout: 600 seconds)
[04:51] voltagex: I got the infohashes as well, are you just sampling by querying the DHT for the infohash?
[04:53] Yes
[04:54] Just be aware there's some fucked up stuff in there
[04:56] Left 10k hashes running, but I've got to get some paid work done today haha
[05:01] robogoat: ping me if you're interested in this
[05:13] *** Stilett0 has joined #archiveteam-bs
[05:14] *** Stilett0 has quit IRC (Client Quit)
[05:30] *** BlueMax has quit IRC (Leaving)
[05:35] If all goes well I'll have the 444gb dump tomorrow
[05:57] *** zyphlar_ has joined #archiveteam-bs
[07:40] *** Valentine has quit IRC (Ping timeout: 506 seconds)
[07:43] *** Valentine has joined #archiveteam-bs
[08:07] *** zyphlar_ has quit IRC (Quit: Connection closed for inactivity)
[09:42] *** ranavalon has quit IRC (Read error: Connection reset by peer)
[09:44] *** ranavalon has joined #archiveteam-bs
[10:32] *** Mateon1 has quit IRC (Read error: Operation timed out)
[10:32] *** Mateon1 has joined #archiveteam-bs
[12:07] https://www.troyhunt.com/ive-just-launched-pwned-passwords-version-2/
[12:08] Over 500 million hashes now, up from 320M last August.
[12:52] *** klondike has joined #archiveteam-bs
[16:01] SketchCow: so i think someone here could hack this to get a local wayback machine on rpi project going: https://github.com/alard/warc-proxy
[16:02] to me it would be the most simple way to get a jump start one
[16:02] i'm not good at python though so its up to some else to do the hard work
[16:20] Goddammit, why is URL parsing so damn complicated?
[18:26] *** Pixi has quit IRC (Quit: Pixi)
[18:27] *** Pixi has joined #archiveteam-bs
[18:35] JAA: use a library for it? :P
[18:38] *** bitBaron has joined #archiveteam-bs
[18:54] *** MrDignity has quit IRC (Remote host closed the connection)
[18:54] *** MrDignity has joined #archiveteam-bs
[19:00] *** bitBaron has quit IRC (Quit: My computer has gone to sleep. ZZZzzz…)
[19:02] *** bitBaron has joined #archiveteam-bs
[19:12] *** jschwart has joined #archiveteam-bs
[20:19] SketchCow: tape 24 is getting digitize
[20:20] tape 23 has no video signal for the last 40 minutes
[20:21] most likely going named like this: random-tv-mtv-letterman-hard-copy-making-of-oz-1990.mpg
[20:21] for full tape
[20:27] *** RichardG has quit IRC (Read error: Connection reset by peer)
[20:29] *** RichardG has joined #archiveteam-bs
[20:36] *** ola_norsk has joined #archiveteam-bs
[20:37] the webrecorder.io guys seem quite nice. They upped my storage from 1.5Gb to 7Gb when i said i used it to warc to IA
[20:38] bytes even*
[20:43] *** WubTheCap has joined #archiveteam-bs
[20:49] some claimed webrecorder was 'shady', but i forget the reason. I don't want to fall under some spell for 7GB of online storage..
[20:49] _why_ are they shady?
[21:02] *** atlogbot has quit IRC (Remote host closed the connection)
[21:02] *** swebb has quit IRC (Quit: badcheese.com - where crap sometimes gets done)
[21:14] *** atlogbot has joined #archiveteam-bs
[21:14] *** svchfoo1 sets mode: +v atlogbot
[21:23] *** WubTheCap has quit IRC (Read error: Connection reset by peer)
[21:47] joepie91: Yeah. I'm working with wpull currently, which already has that code. I guess I'll look into replacing it with urllib or something, but that's more of a long-term goal. I'm just trying to fix bugs right now to get wpull 2 into usable shape.
[21:51] NB, it already uses urllib for some things, but not for everything.
[21:52] I'm not sure what the reasoning behind that is, but I'm sure I'll find out about all the subtleties when I try to replace it.
[22:08] *** ola_norsk has quit IRC (It's all goblins and frogs! https://pastebin.com/raw/jeEdHUQC)
[23:01] *** jtn2 has quit IRC (Ping timeout: 492 seconds)
[23:04] *** jschwart has quit IRC (Quit: Konversation terminated!)
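Here is some context for the URL-parsing exchange between 16:20 and 21:52. The following is a minimal sketch of leaning on Python 3's standard `urllib.parse` module, as suggested at 18:35. It is not wpull's code, and the helper `describe` is hypothetical. The sketch also hints at why the topic is fiddly. `urlsplit` lowercases the scheme, and its `hostname` attribute comes back lowercased. However, it keeps an explicit default port such as `:80` and leaves `..` segments in the path untouched. `urljoin`, by contrast, does resolve `..` when it combines a base URL with a relative one.

```python
from urllib.parse import urlsplit, urljoin


def describe(url: str) -> dict:
    """Hypothetical helper: break a URL into components using only the stdlib."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,      # urlsplit lowercases the scheme
        "host": parts.hostname,      # .hostname is lowercased; .netloc is not
        "port": parts.port,          # explicit port kept as int; None if absent
        "path": parts.path,          # dot segments like ".." are NOT resolved
        "query": parts.query,
        "fragment": parts.fragment,
    }


info = describe("HTTP://Example.COM:80/a/b/../c?x=1#top")
print(info)

# urljoin applies RFC 3986 relative resolution, including removing "..".
print(urljoin("http://example.com/a/b/c", "../d"))
```

A crawler still has to decide, on top of this, whether two URLs that differ only in a default port, dot segments or percent-encoding should count as the same. That is probably where some of the subtleties mentioned at 21:52 come in.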
[23:08] *** Famicoman has joined #archiveteam-bs
[23:25] *** jtn2 has joined #archiveteam-bs
[23:26] https://archive.org/details/Popular_Science_1984-06_June_600_dpi
[23:27] So, unfortunately, his scanner was dirty so streaks on the left pages. I've asked him to rescan.
[23:27] But the point is there. Lovely.
[23:30] He's willing to do it all "right"
[23:31] And we've been working back and forth, lots of mail, and he's debinding magazines and off he goes. He's doing Popular Science and Byte
[23:46] how well do the debind scans come out?
[23:47] ive considered debinding some stuff to get better scans
[23:47] dunno if i really have the right tools for it though (xacto, cutting mat, and metal ruler)