[00:16] *** BlueMaxim has joined #archiveteam-bs
[01:33] *** JesseW has joined #archiveteam-bs
[01:54] *** VADemon has joined #archiveteam-bs
[02:04] *** xXx_ndidd has joined #archiveteam-bs
[02:07] *** ndiddy has quit IRC (Ping timeout: 244 seconds)
[02:08] *** DoomTay has joined #archiveteam-bs
[02:14] so this is odd
[02:14] i found out that TBC in korea has mms and rtmp streams
[02:14] but none of them work for me
[02:15] example url thats from website today:
[02:15] rtmp://media.tbc.co.kr:1935/vod/_definst_/mp4:news_mp4/prime16-0702-160702013.mp4
[02:15] i can't get that url to work
[02:15] here is a older example url: mms://vod.tbc.co.kr/vod3/news/prime13-0815.wmv
[02:16] i end up with 404 bad request errors when trying mms
[02:17] the page for the rtmp stream : http://www.tbc.co.kr/tbc_news/n14_newsview.html?p_no=160702013&news_code=46
[02:32] *** vitzli has joined #archiveteam-bs
[03:00] i've had mobileme data sitting on my nas for a long time
[03:00] i finally got around to doing a few spot checks and it looks like there's nothing that wasn't uploaded
[03:01] so, time to reclaim that space
[03:47] *** xXx_ndidd is now known as ndiddy
[04:05] *** FalconK has quit IRC (Remote host closed the connection)
[04:50] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[04:56] *** Sk1d has joined #archiveteam-bs
[04:56] *** Sk1d has quit IRC (Connection closed)
[04:58] *** Sk1d has joined #archiveteam-bs
[05:27] *** dashcloud has quit IRC (Read error: Operation timed out)
[05:31] *** dashcloud has joined #archiveteam-bs
[05:43] *** FalconK has joined #archiveteam-bs
[05:43] aha!
[05:43] July 2 and it's 65 F here
[05:43] this is hilarious
[05:43] huh.
[05:43] Climate Change Simulator 2016
[05:43] here in calgary it has snowed in july back in the 80s
[05:43] every decade or so it happens once I think?
[05:44] * FalconK shrugs
[05:44] anyway
[05:44] so if I'm not very much mistaken, the uploading of the json file to fos is how fos knows the job is done
[05:44] but what else is in the json file?
[05:44] oh, the JSON file isn't really used as a signal
[05:44] packs are just packed up and shipped off at time intervals
[05:45] the JSON file contains some stuff like job URL, who did it, when, etc
[05:46] aah
[05:46] what is used as a signal then?
[05:46] does the pipeline make the appropriate redis calls itself?
[05:46] there's no completion signal
[05:47] well
[05:47] there is a finished signal once the pipeline finishes
[05:48] but that's about it
[05:49] mm
[05:50] I might actually have a little time to put into archivebot this week, since I'm between projets
[05:50] projects.
[05:51] the JSON file can be uploaded out-of-sequence with the WARCs, so it's not really that useful as a completion signal
[05:51] (concurrent upload processes and odd filesystem order etc)
[05:55] aah
[05:55] oh by the way, that job you asked about might be caught in the infinite epoll loop bug
[05:55] I should probably update ananiel since I gather our current version gets rid of that but
[05:55] so many long-term jobs
[05:56] I could murder them all and restart them after, but
[05:56] it would be nice to have more features to upgrade to, for that
[05:58] anyway I have to catch a flight in like 7 hours so I better get to bed
[06:00] Maybe temporary disallow !a jobs until all those others clear?
[06:01] you can't actually do that once it's started
[06:01] the only thing you can do to manipulate it is tell it to stop by putting in a stopfile
[06:02] we could use some additional features, like the ability to control such things, and the ability to save off state so you can restart it
[06:06] yeah a couple jobs with no reporting for 15min just started reporting again
[06:06] so usually I want a long time
[06:07] * FalconK shrugs
[06:07] another useful feature would be a ring buffer for wpull.log, which uses tons of space for long jobs
[06:07] SO MUCH DOWNLOADING
[06:08] * FalconK downloads a car
[06:09] The irony is there's a few jobs that look like they're close to completion, but are frozen
[06:18] well consider how it works
[06:19] suppose you were archiving a simple website that just had next and back buttons on each page, of 100000 pages, and a handful of images on each page, for buttons and one for content (say, a gallery)
[06:19] every page load, there would be ~10 duplicates that just get dropped from the queue, and two items enqueued
[06:20] now suppose for some reason the web server takes 3 hours to send the next page
[06:20] one item in queue, 50000 downloaded.
[06:20] the 50000 to go is unknowable.
[06:21] *** tomwsmf-a has quit IRC (Ping timeout: 258 seconds)
[06:21] *** tomwsmf-a has joined #archiveteam-bs
[06:25] Good point
[06:25] I think I actually saw that happen once
[06:26] all the time.
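
(Annotation, not part of the original log.) The next/back gallery example from 06:19 to 06:20 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ArchiveBot's or wpull's actual queue: `make_site` and `crawl` are hypothetical names, the site is invented (9 shared button images, one content image per page, next and back links), and duplicates are dropped with a plain in-memory set. It shows why the queue length says almost nothing about how much work is left.

```python
from collections import deque


def make_site(n_pages, n_shared_images=9):
    """Hypothetical site: every page links to the same button images,
    one content image of its own, and its back/next neighbours."""
    shared = [f"/img/button{i}.png" for i in range(n_shared_images)]

    def links(url):
        if not url.startswith("/page/"):
            return []  # images have no outgoing links
        i = int(url.rsplit("/", 1)[1])
        out = list(shared) + [f"/img/content{i}.jpg"]
        if i > 0:
            out.append(f"/page/{i - 1}")  # back button
        if i < n_pages - 1:
            out.append(f"/page/{i + 1}")  # next button
        return out

    return links


def crawl(start, links, max_fetches=None):
    """Breadth-first crawl with duplicate dropping.

    Returns the number of fetches and a history of
    (fetched so far, queue length) pairs, one per fetch."""
    seen = {start}
    queue = deque([start])
    fetched = 0
    history = []
    while queue and (max_fetches is None or fetched < max_fetches):
        url = queue.popleft()
        fetched += 1
        for link in links(url):
            if link not in seen:  # already-seen URLs are dropped here
                seen.add(link)
                queue.append(link)
        history.append((fetched, len(queue)))
    return fetched, history


if __name__ == "__main__":
    # Stop partway through a 100000-page site and look at the queue.
    links = make_site(100_000)
    fetched, history = crawl("/page/0", links, max_fetches=50_000)
    print("fetched:", fetched, "queued:", history[-1][1])
```

Once the shared images have been fetched, each page in this sketch yields ten duplicates and two new items, so the queue hovers at one or two entries no matter how many pages remain. A dashboard that only sees "queued" and "downloaded" cannot tell a job that is nearly done from one that is halfway through.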
[06:28] FalconK: yeah, for wpull.log I'd like multiple metawarcs
[06:28] I suspect wpull can do this
[06:28] I just haven't looked
[06:47] it certainly doesn't mind if I truncate the log to free up disk
[06:47] to be able to move it off, though, it'd have to be re-opening it every time it writes a log line
[07:08] *** DoomTay has quit IRC (Quit: Page closed)
[07:35] *** JesseW has quit IRC (Read error: Operation timed out)
[07:50] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[07:58] *** metalcamp has joined #archiveteam-bs
[08:03] *** metal_cam has joined #archiveteam-bs
[08:05] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[08:10] can anyone view this file in browser: http://www.tbc.co.kr/tbc_player/vod14_player.html?vodurl=top/top14-1014.mp4&imgurl=top/top14-1014.jpg&board_id=top14_vod&pro_cnt=2014%EB%85%84%2010%EC%9B%94%2014%EC%9D%BC%20%EB%B0%A9%EC%86%A1
[08:10] *** metal_cam has quit IRC (Ping timeout: 244 seconds)
[08:11] i can't seem to play any streams from tbc.co.kr
[08:11] there maybe news archives that go back to 2005
[08:11] *** metalcamp has joined #archiveteam-bs
[08:13] *** metal_cam has joined #archiveteam-bs
[08:14] *** ravetcofx has joined #archiveteam-bs
[08:16] ok now i'm getting something
[08:16] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[08:29] *** metal_cam is now known as metalcamp
[08:45] *** Stilett0 has quit IRC (Read error: Connection reset by peer)
[08:45] *** Stiletto has joined #archiveteam-bs
[09:16] anyways i'm grabbing the special event videos from tbc
[09:16] they got back to 2006
[09:40] godane, http://schoolsweek.co.uk/archive/ more newspapers for you :)
[09:59] HCross i sent that to archivebot for now
[10:00] i will grab the pdfs at later point
[10:00] ok :)
[10:43] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[10:45] *** metalcamp has joined #archiveteam-bs
[10:52] *** ris has joined #archiveteam-bs
[11:02] so based on the front pages for tbc.co.kr
[11:02] the mms streams may have stopped around summer 2014
[11:02] https://web.archive.org/web/20140702065231/http://www.tbc.co.kr/
[11:04] there are mms urls on the front page
[11:44] *** dashcloud has quit IRC (Read error: Operation timed out)
[11:47] *** dashcloud has joined #archiveteam-bs
[12:28] *** signius has quit IRC (Ping timeout: 260 seconds)
[12:34] *** signius has joined #archiveteam-bs
[13:09] *** dashcloud has quit IRC (Read error: Operation timed out)
[13:13] *** dashcloud has joined #archiveteam-bs
[13:15] *** BlueMaxim has quit IRC (Quit: Leaving)
[13:20] *** ndiddy has quit IRC (Read error: Connection reset by peer)
[13:25] *** ris has quit IRC ()
[13:41] *** kristian_ has joined #archiveteam-bs
[13:54] *** VADemon has quit IRC (Read error: Connection reset by peer)
[14:16] *** dashcloud has quit IRC (Read error: Operation timed out)
[14:19] *** dashcloud has joined #archiveteam-bs
[14:26] i'm at 740k items now
[14:32] *** atrocity has quit IRC (Ping timeout: 272 seconds)
[14:39] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[14:41] *** metalcamp has joined #archiveteam-bs
[14:50] *** kristian_ has quit IRC (Leaving)
[15:44] *** arkiver2 has joined #archiveteam-bs
[16:12] *** dashcloud has quit IRC (Read error: Operation timed out)
[16:15] *** dashcloud has joined #archiveteam-bs
[16:23] *** DoomTay has joined #archiveteam-bs
[16:26] *** JesseW has joined #archiveteam-bs
[17:19] Well I'll be. The beta version of Wayback Machine seems to have fixed almost all of the bugs I brought up with info@archive.org
[17:20] not mine yet :P
[17:20] *** dashcloud has quit IRC (Read error: Operation timed out)
[17:24] *** dashcloud has joined #archiveteam-bs
[17:31] *** arkiver2 has quit IRC (Ping timeout: 244 seconds)
[17:49] Hm... https://archive.org/details/@numbers_station -- just what it sounds like; since last April. No provenance, for what little that matters.
[19:05] *** bzc6p has joined #archiveteam-bs
[19:05] *** swebb sets mode: +o bzc6p
[19:06] *** tomwsmf-a has joined #archiveteam-bs
[19:13] Back onsite tomorrow
[19:14] And yes, there's amazing stuff every day up on the archive
[19:14] I go through the stacks to find things to make collections
[19:15] My internal rule is 100 good items or more, likely to be a collection.
[19:15] Under, it better be fucking AMAZING
[19:15] * CatButts waits for return of fie_ senpai
[19:17] I'm grabbing a collection of old operating system images.
[19:17] It's.... huge.
[19:17] multi-petabyte huge, or less than that?
[19:17] *** vitzli has quit IRC (Quit: Leaving)
[19:17] Dude, nothing ever multi-petabyte huge
[19:18] If something multi-petabyte huge, I'm in a meeting you're not invited to to be told not to do it
[19:18] That's why we don't have scientific datasets
[19:18] And why I had to turn down some satellite imagery
[19:18] They were all "10tb a day" and I was all "whateverrrrrrr"
[19:19] ok, so multi-*terabyte* huge, then. :-)
[19:20] * JesseW is perfectly happy not to be invited to such meetings
[19:23] Yeah, no happiness in those
[19:23] Multi-terabyte is more like it, yes
[19:23] I have agency but not THAT much
[19:24] And archive team beyond me is quite a load on the sets
[19:26] I do wonder what the software heritage people will eventually come up with. It's ... not very visible ... yet.
[19:26] garbage fires
[19:26] sorry I've been porting shit to Rails 5 and am annoyed
[19:27] Now now
[19:27] I'm to write them an endorsement
[19:27] oh the heritage people will do fine I'm sure
[19:27] it's the software itself
[19:36] *** bzc6p has left
[19:40] *** robink has joined #archiveteam-bs
[19:42] *** robink has quit IRC (Read error: Connection reset by peer)
[19:43] *** closure has joined #archiveteam-bs
[19:43] *** midas sets mode: +o closure
[20:03] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[20:04] *** metalcamp has quit IRC (Read error: Connection reset by peer)
[20:18] CatButts, still nothing
[20:19] ah
[20:19] I see
[20:20] *** DoomTay has quit IRC (Ping timeout: 268 seconds)
[20:43] *** j08nY has joined #archiveteam-bs
[20:58] *** DoomTay has joined #archiveteam-bs
[21:35] *** ring has quit IRC (Ping timeout: 260 seconds)
[21:43] *** ring has joined #archiveteam-bs
[21:59] You think 5qh8wqh219asr5433wy7rzgzt is taking forever? I'm surprised that 8o9ey88xpscwsvwbhudlu5dz5 is still going
[22:06] *** JesseW has joined #archiveteam-bs
[22:41] *** fie_ has quit IRC (Read error: Connection reset by peer)
[23:57] *** BlueMaxim has joined #archiveteam-bs