[00:13] *** bwn has joined #archiveteam-bs
[00:27] *** GE has quit IRC (Remote host closed the connection)
[00:33] *** pizzaiolo has quit IRC (Ping timeout: 260 seconds)
[00:55] *** JAA has quit IRC (Quit: Page closed)
[00:59] *** matt_lock has quit IRC (Ping timeout: 268 seconds)
[01:31] *** n00b811 has quit IRC (Ping timeout: 268 seconds)
[01:46] *** icedice has quit IRC (Quit: Leaving)
[01:51] <godane> SketchCow: i'm close to getting tagesschau 20 clock evening news up to 1992-09-30
[01:51] <godane> i'm at 1992-09-29 right now
[01:52] <godane> https://archive.org/details/tagesschau-20-clock-evening-news-1992-09-29
[02:31] *** vitzli has joined #archiveteam-bs
[02:56] *** vitzli has quit IRC (Quit: Leaving)
[03:09] *** ndiddy has joined #archiveteam-bs
[03:17] *** ndiddy has quit IRC ()
[04:10] <dxrt> Somebody2: Have you run into any ratelimiting when using curl with the wayback save function?
[04:46] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[04:51] <Somebody2> dxrt: I've only been running it once a day, so no. :-)
[04:53] *** Sk1d has joined #archiveteam-bs
[07:17] *** JAA has joined #archiveteam-bs
[07:23] <JAA> Shit, my WunderBlogs grab ran out of space because of an MJPEG of several GB. :-|
[08:23] <wp494> JAA: expect to run into a little bit of that, satellite GIFs can be several hundreds of MBs if not more
[08:23] <wp494> at least GOES-R isn't fully operational yet because that would definitely drive lots more data
[08:33] <Sanqui> well a MJPEG can be of an infinite size
[08:33] <Sanqui> can't it
[08:34] *** GE has joined #archiveteam-bs
[08:39] <JAA> wp494: I just don't expect those huge images to be embedded in a webpage since it also leads to a terrible user experience (webpage loading forever). I'm not following external links, only grabbing page requisites.
[08:48] <Sanqui> JAA: mjpeg is used for streams. think webcams
[08:55] <JAA> Ooh, that makes sense
[08:57] <Sanqui> it's sort of a hack
[08:58] <Sanqui> from back when <video> nor flash didn't exist
[09:05] <JAA> "sort of"
[09:40] *** odemg has joined #archiveteam-bs
[09:40] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[09:41] *** BlueMaxim has joined #archiveteam-bs
[09:52] <JAA> What the hell? I added a --reject-regex for that MJPEG, but wpull tried downloading it anyway.
[10:04] *** GE has quit IRC (Remote host closed the connection)
[11:14] *** BlueMaxim has quit IRC (Quit: Leaving)
[11:55] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[11:55] *** dashcloud has joined #archiveteam-bs
[11:59] *** pizzaiolo has joined #archiveteam-bs
[12:04] <JAA> Terrific. When a torrent isn't found on Mininova, the download link returns HTTP status 200. Sigh...
[12:06] *** odemg has quit IRC (Remote host closed the connection)
[12:27] *** GE has joined #archiveteam-bs
[13:06] *** odemg has joined #archiveteam-bs
[13:27] *** icedice has joined #archiveteam-bs
[14:50] *** pnJay has joined #archiveteam-bs
[16:23] *** icedice has quit IRC (Quit: Leaving)
[16:45] *** odemg has quit IRC (Quit: fucked right off!!)
[16:46] *** odemg has joined #archiveteam-bs
[17:34] *** GE has quit IRC (Remote host closed the connection)
[18:15] <Frogging> done https://github.com/chfoo/wpull/pull/360
[18:30] *** odemg has quit IRC (Remote host closed the connection)
[18:30] *** odemg has joined #archiveteam-bs
[18:31] *** tpw_rules has left Textual IRC Client: www.textualapp.com
[18:53] *** GE has joined #archiveteam-bs
[18:53] *** JAA has quit IRC (Ping timeout: 268 seconds)
[18:55] *** Aranje has quit IRC (Read error: Connection reset by peer)
[18:59] *** JAA has joined #archiveteam-bs
[19:17] *** kyounko has quit IRC (Read error: Connection reset by peer)
[19:42] *** odemg has quit IRC (Remote host closed the connection)
[19:53] *** Aranje has joined #archiveteam-bs
[20:17] *** odemg has joined #archiveteam-bs
[20:31] *** odemg has quit IRC (Remote host closed the connection)
[20:32] *** odemg has joined #archiveteam-bs
[20:58] <JAA> Whoa, just discovered that ArchiveBot downloaded a 1.7 GB video file while grabbing Mininova
[21:01] <JAA> ... and the video's filename contains "part05". I guess that might explain why the grab is 68 GB (compressed).
[21:20] <JAA> Yep, that's definitely not the only one in those archives.
[21:48] <JAA> arkiver, rocode: Better estimate for Mininova torrent data size incoming.
[21:48] <JAA> I extracted all correctly downloaded (status 200) .torrent files from the ArchiveBot grab WARC uploaded to IA as of yesterday (just realised moments ago that the upload is still running; I analysed WARCs 00000 to 00024); there are 48294 files in these, 6 of which are empty.
[21:48] <JAA> The 48288 valid torrents contain 2530986261593 bytes = 2.30 TiB (average size 49.99 MiB). Extrapolating to the total torrent count of 72056 gives a total size of approximately 3.43 TiB.
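
Editorial note: the size estimate in the 21:48 messages can be checked with a few lines of arithmetic. The sketch below is not JAA's script (the log does not show one); it assumes the estimate is a plain linear scaling by torrent count, that the sampled torrents are representative of the rest, and that TiB and MiB mean 2^40 and 2^20 bytes. All variable and function names are made up for illustration; only the four input figures come from the log.

```python
# Figures quoted in the 21:48 messages above.
SAMPLE_BYTES = 2530986261593   # combined payload size of the valid sampled torrents
SAMPLE_FILES = 48294           # .torrent files extracted from WARCs 00000 to 00024
EMPTY_FILES = 6                # extracted files that turned out to be empty
TOTAL_TORRENTS = 72056         # total torrent count quoted in the log

MIB = 2 ** 20  # assumption: binary units, as the "MiB"/"TiB" labels suggest
TIB = 2 ** 40


def estimate_total_bytes(sample_bytes, valid_count, total_count):
    """Scale the sample linearly (assumes the sample is representative)."""
    return sample_bytes / valid_count * total_count


valid = SAMPLE_FILES - EMPTY_FILES
sample_tib = SAMPLE_BYTES / TIB
average_mib = SAMPLE_BYTES / valid / MIB
total_tib = estimate_total_bytes(SAMPLE_BYTES, valid, TOTAL_TORRENTS) / TIB

print(f"valid torrents: {valid}")
print(f"sample size:    {sample_tib:.2f} TiB")
print(f"average size:   {average_mib:.2f} MiB")
print(f"extrapolated:   {total_tib:.2f} TiB")
```

Under these assumptions the printed values match the figures in the log: 48288 valid torrents, 2.30 TiB sampled, 49.99 MiB on average, and roughly 3.43 TiB in total.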
[21:49] <JAA> Also xmc & HCross2 ^
[21:52] <JAA> My grab is at 146k done, 293k left, by the way
[21:53] *** dmt` has left
[21:53] <JAA> WunderBlogs update: 283k done, 463k left
[22:00] *** JAA has quit IRC (Quit: Page closed)
[22:00] *** JAA has joined #archiveteam-bs
[22:00] *** JAA has quit IRC (Client Quit)
[22:00] *** JAA has joined #archiveteam-bs
[22:01] <JAA> (Sorry about that)
[22:06] *** ZizzyDizz has quit IRC (Remote host closed the connection)
[22:06] *** ZizzyDizz has joined #archiveteam-bs
[22:10] *** Dark_Star has quit IRC (Ping timeout: 246 seconds)
[22:12] *** BlueMaxim has joined #archiveteam-bs
[22:15] *** Dark_Star has joined #archiveteam-bs
[22:19] *** pizzaiolo has left
[22:24] *** GE has quit IRC (Remote host closed the connection)
[22:27] *** pizzaiolo has joined #archiveteam-bs
[22:30] *** TC01 has quit IRC (Read error: Operation timed out)
[22:31] *** TC01 has joined #archiveteam-bs
[23:22] *** JAA has quit IRC (Quit: Page closed)
[23:37] *** kristian_ has joined #archiveteam-bs
[23:57] *** ZizzyDizz has quit IRC (Remote host closed the connection)