#archiveteam-bs 2016-07-03,Sun

Time Nickname Message
00:16 🔗 BlueMaxim has joined #archiveteam-bs
01:33 🔗 JesseW has joined #archiveteam-bs
01:54 🔗 VADemon has joined #archiveteam-bs
02:04 🔗 xXx_ndidd has joined #archiveteam-bs
02:07 🔗 ndiddy has quit IRC (Ping timeout: 244 seconds)
02:08 🔗 DoomTay has joined #archiveteam-bs
02:14 🔗 godane so this is odd
02:14 🔗 godane i found out that TBC in korea has mms and rtmp streams
02:14 🔗 godane but none of them work for me
02:15 🔗 godane example url thats from website today:
02:15 🔗 godane rtmp://media.tbc.co.kr:1935/vod/_definst_/mp4:news_mp4/prime16-0702-160702013.mp4
02:15 🔗 godane i can't get that url to work
02:15 🔗 godane here is an older example url: mms://vod.tbc.co.kr/vod3/news/prime13-0815.wmv
02:16 🔗 godane i end up with 404 bad request errors when trying mms
02:17 🔗 godane the page for the rtmp stream : http://www.tbc.co.kr/tbc_news/n14_newsview.html?p_no=160702013&news_code=46
02:32 🔗 vitzli has joined #archiveteam-bs
03:00 🔗 xmc i've had mobileme data sitting on my nas for a long time
03:00 🔗 xmc i finally got around to doing a few spot checks and it looks like there's nothing that wasn't uploaded
03:01 🔗 xmc so, time to reclaim that space
03:47 🔗 xXx_ndidd is now known as ndiddy
04:05 🔗 FalconK has quit IRC (Remote host closed the connection)
04:50 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
04:56 🔗 Sk1d has joined #archiveteam-bs
04:56 🔗 Sk1d has quit IRC (Connection closed)
04:58 🔗 Sk1d has joined #archiveteam-bs
05:27 🔗 dashcloud has quit IRC (Read error: Operation timed out)
05:31 🔗 dashcloud has joined #archiveteam-bs
05:43 🔗 FalconK has joined #archiveteam-bs
05:43 🔗 FalconK aha!
05:43 🔗 yipdw July 2 and it's 65 F here
05:43 🔗 yipdw this is hilarious
05:43 🔗 FalconK huh.
05:43 🔗 yipdw Climate Change Simulator 2016
05:43 🔗 FalconK here in calgary it has snowed in july back in the 80s
05:43 🔗 FalconK every decade or so it happens once I think?
05:44 🔗 * FalconK shrugs
05:44 🔗 FalconK anyway
05:44 🔗 FalconK so if I'm not very much mistaken, the uploading of the json file to fos is how fos knows the job is done
05:44 🔗 FalconK but what else is in the json file?
05:44 🔗 yipdw oh, the JSON file isn't really used as a signal
05:44 🔗 yipdw packs are just packed up and shipped off at time intervals
05:45 🔗 yipdw the JSON file contains some stuff like job URL, who did it, when, etc
05:46 🔗 FalconK aah
05:46 🔗 FalconK what is used as a signal then?
05:46 🔗 FalconK does the pipeline make the appropriate redis calls itself?
05:46 🔗 yipdw there's no completion signal
05:47 🔗 yipdw well
05:47 🔗 yipdw there is a finished signal once the pipeline finishes
05:48 🔗 yipdw but that's about it
05:49 🔗 FalconK mm
05:50 🔗 FalconK I might actually have a little time to put into archivebot this week, since I'm between projects
05:51 🔗 yipdw the JSON file can be uploaded out-of-sequence with the WARCs, so it's not really that useful as a completion signal
05:51 🔗 yipdw (concurrent upload processes and odd filesystem order etc)
05:55 🔗 FalconK aah
05:55 🔗 FalconK oh by the way, that job you asked about might be caught in the infinite epoll loop bug
05:55 🔗 FalconK I should probably update ananiel since I gather our current version gets rid of that but
05:55 🔗 FalconK so many long-term jobs
05:56 🔗 FalconK I could murder them all and restart them after, but
05:56 🔗 FalconK it would be nice to have more features to upgrade to, for that
05:58 🔗 FalconK anyway I have to catch a flight in like 7 hours so I better get to bed
06:00 🔗 DoomTay Maybe temporarily disallow !a jobs until all those others clear?
06:01 🔗 FalconK you can't actually do that once it's started
06:01 🔗 FalconK the only thing you can do to manipulate it is tell it to stop by putting in a stopfile
06:02 🔗 FalconK we could use some additional features, like the ability to control such things, and the ability to save off state so you can restart it
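The stopfile mechanism FalconK describes can be sketched roughly as below. This is only an illustration of the general pattern, not ArchiveBot's actual code: the function name `run_until_stopped` and the stop file path are hypothetical.

```python
import os


def run_until_stopped(work_items, stopfile_path):
    """Process items in order, checking for a stop file before each one.

    Hypothetical sketch: the only control the operator has is to create
    the file at stopfile_path, after which no further items are started.
    """
    done = []
    for item in work_items:
        if os.path.exists(stopfile_path):
            break  # operator asked us to stop; remaining items are skipped
        done.append(item)  # stand-in for the real work on this item
    return done
```

Nothing here lets an operator pause, reprioritise, or save state, which matches the limitation being discussed.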
06:06 🔗 FalconK yeah a couple jobs with no reporting for 15min just started reporting again
06:06 🔗 FalconK so usually I wait a long time
06:07 🔗 * FalconK shrugs
06:07 🔗 FalconK another useful feature would be a ring buffer for wpull.log, which uses tons of space for long jobs
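A ring buffer for log lines, as suggested for wpull.log, might look something like the sketch below. It assumes keeping only the most recent lines in memory is acceptable; the class name `RingLog` is made up for illustration and is not part of wpull.

```python
from collections import deque


class RingLog:
    """Keep only the most recent max_lines log lines (hypothetical helper)."""

    def __init__(self, max_lines):
        # A deque with maxlen silently drops the oldest entry when full.
        self._lines = deque(maxlen=max_lines)

    def write(self, line):
        self._lines.append(line.rstrip("\n"))

    def dump(self):
        """Return the retained lines, oldest first."""
        return list(self._lines)
```

The space used is bounded by `max_lines` no matter how long the job runs, at the cost of losing older history.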
06:07 🔗 SketchCow SO MUCH DOWNLOADING
06:08 🔗 * FalconK downloads a car
06:09 🔗 DoomTay The irony is there's a few jobs that look like they're close to completion, but are frozen
06:18 🔗 FalconK well consider how it works
06:19 🔗 FalconK suppose you were archiving a simple website that just had next and back buttons on each page, of 100000 pages, and a handful of images on each page, for buttons and one for content (say, a gallery)
06:19 🔗 FalconK every page load, there would be ~10 duplicates that just get dropped from the queue, and two items enqueued
06:20 🔗 FalconK now suppose for some reason the web server takes 3 hours to send the next page
06:20 🔗 FalconK one item in queue, 50000 downloaded.
06:20 🔗 FalconK the 50000 to go is unknowable.
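The point about the remaining work being unknowable can be shown with a toy crawl. This sketch simplifies the example above to pages only (no images), with page i linking to pages i-1 and i+1; `crawl_chain` is a hypothetical name and this is not how wpull is implemented.

```python
from collections import deque


def crawl_chain(total_pages, stop_after):
    """Breadth-first crawl of a linear site, paused after stop_after fetches.

    Returns (downloaded, queued): what an observer of the job can see.
    """
    seen = {0}
    queue = deque([0])
    downloaded = 0
    while queue and downloaded < stop_after:
        page = queue.popleft()
        downloaded += 1
        for link in (page - 1, page + 1):  # the "back" and "next" buttons
            if 0 <= link < total_pages and link not in seen:
                seen.add(link)
                queue.append(link)
            # already-seen links are duplicates and are simply dropped
    return downloaded, len(queue)
```

Under these assumptions a 100000-page site and a 60000-page site both report 50000 downloaded and one item queued after 50000 fetches, so the queue length says nothing about how much is left.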
06:21 🔗 tomwsmf-a has quit IRC (Ping timeout: 258 seconds)
06:21 🔗 tomwsmf-a has joined #archiveteam-bs
06:25 🔗 DoomTay Good point
06:25 🔗 DoomTay I think I actually saw that happen once
06:26 🔗 FalconK all the time.
06:28 🔗 yipdw FalconK: yeah, for wpull.log I'd like multiple metawarcs
06:28 🔗 yipdw I suspect wpull can do this
06:28 🔗 yipdw I just haven't looked
06:47 🔗 FalconK it certainly doesn't mind if I truncate the log to free up disk
06:47 🔗 FalconK to be able to move it off, though, it'd have to be re-opening it every time it writes a log line
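For comparison, Python's standard library has a handler built around this idea: `logging.handlers.WatchedFileHandler` checks the log path before each write and reopens it if the file has been moved or replaced. The sketch below only shows that stdlib behaviour on a POSIX system; whether wpull could be configured to log this way is not established here, and the logger name and `make_logger` helper are hypothetical.

```python
import logging
import logging.handlers


def make_logger(path):
    """Return a logger whose file can be moved away while it is in use."""
    logger = logging.getLogger("demo.movable_log")  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    # WatchedFileHandler stats the path on every emit and reopens it
    # when the underlying file has changed or disappeared.
    handler = logging.handlers.WatchedFileHandler(path)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.handlers = [handler]
    return logger
```

After the file is renamed, the next log call creates a fresh file at the original path, so the old one can be compressed or shipped elsewhere.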
07:08 🔗 DoomTay has quit IRC (Quit: Page closed)
07:35 🔗 JesseW has quit IRC (Read error: Operation timed out)
07:50 🔗 tomwsmf-a has quit IRC (Read error: Operation timed out)
07:58 🔗 metalcamp has joined #archiveteam-bs
08:03 🔗 metal_cam has joined #archiveteam-bs
08:05 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
08:10 🔗 godane can anyone view this file in browser: http://www.tbc.co.kr/tbc_player/vod14_player.html?vodurl=top/top14-1014.mp4&imgurl=top/top14-1014.jpg&board_id=top14_vod&pro_cnt=2014%EB%85%84%2010%EC%9B%94%2014%EC%9D%BC%20%EB%B0%A9%EC%86%A1
08:10 🔗 metal_cam has quit IRC (Ping timeout: 244 seconds)
08:11 🔗 godane i can't seem to play any streams from tbc.co.kr
08:11 🔗 godane there may be news archives that go back to 2005
08:11 🔗 metalcamp has joined #archiveteam-bs
08:13 🔗 metal_cam has joined #archiveteam-bs
08:14 🔗 ravetcofx has joined #archiveteam-bs
08:16 🔗 godane ok now i'm getting something
08:16 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
08:29 🔗 metal_cam is now known as metalcamp
08:45 🔗 Stilett0 has quit IRC (Read error: Connection reset by peer)
08:45 🔗 Stiletto has joined #archiveteam-bs
09:16 🔗 godane anyways i'm grabbing the special event videos from tbc
09:16 🔗 godane they go back to 2006
09:40 🔗 HCross godane, http://schoolsweek.co.uk/archive/ more newspapers for you :)
09:59 🔗 godane HCross i sent that to archivebot for now
10:00 🔗 godane i will grab the pdfs at later point
10:00 🔗 HCross ok :)
10:43 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
10:45 🔗 metalcamp has joined #archiveteam-bs
10:52 🔗 ris has joined #archiveteam-bs
11:02 🔗 godane so based on the front pages for tbc.co.kr
11:02 🔗 godane the mms streams may have stopped around summer 2014
11:02 🔗 godane https://web.archive.org/web/20140702065231/http://www.tbc.co.kr/
11:04 🔗 godane there are mms urls on the front page
11:44 🔗 dashcloud has quit IRC (Read error: Operation timed out)
11:47 🔗 dashcloud has joined #archiveteam-bs
12:28 🔗 signius has quit IRC (Ping timeout: 260 seconds)
12:34 🔗 signius has joined #archiveteam-bs
13:09 🔗 dashcloud has quit IRC (Read error: Operation timed out)
13:13 🔗 dashcloud has joined #archiveteam-bs
13:15 🔗 BlueMaxim has quit IRC (Quit: Leaving)
13:20 🔗 ndiddy has quit IRC (Read error: Connection reset by peer)
13:25 🔗 ris has quit IRC ()
13:41 🔗 kristian_ has joined #archiveteam-bs
13:54 🔗 VADemon has quit IRC (Read error: Connection reset by peer)
14:16 🔗 dashcloud has quit IRC (Read error: Operation timed out)
14:19 🔗 dashcloud has joined #archiveteam-bs
14:26 🔗 godane i'm at 740k items now
14:32 🔗 atrocity has quit IRC (Ping timeout: 272 seconds)
14:39 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
14:41 🔗 metalcamp has joined #archiveteam-bs
14:50 🔗 kristian_ has quit IRC (Leaving)
15:44 🔗 arkiver2 has joined #archiveteam-bs
16:12 🔗 dashcloud has quit IRC (Read error: Operation timed out)
16:15 🔗 dashcloud has joined #archiveteam-bs
16:23 🔗 DoomTay has joined #archiveteam-bs
16:26 🔗 JesseW has joined #archiveteam-bs
17:19 🔗 DoomTay Well I'll be. The beta version of Wayback Machine seems to have fixed almost all of the bugs I brought up with info@archive.org
17:20 🔗 joepie91 not mine yet :P
17:20 🔗 dashcloud has quit IRC (Read error: Operation timed out)
17:24 🔗 dashcloud has joined #archiveteam-bs
17:31 🔗 arkiver2 has quit IRC (Ping timeout: 244 seconds)
17:49 🔗 JesseW Hm... https://archive.org/details/@numbers_station -- just what it sounds like; since last April. No provenance, for what little that matters.
19:05 🔗 bzc6p has joined #archiveteam-bs
19:05 🔗 swebb sets mode: +o bzc6p
19:06 🔗 tomwsmf-a has joined #archiveteam-bs
19:13 🔗 SketchCow Back onsite tomorrow
19:14 🔗 SketchCow And yes, there's amazing stuff every day up on the archive
19:14 🔗 SketchCow I go through the stacks to find things to make collections
19:15 🔗 SketchCow My internal rule is 100 good items or more, likely to be a collection.
19:15 🔗 SketchCow Under, it better be fucking AMAZING
19:15 🔗 * CatButts waits for return of fie_ senpai
19:17 🔗 SketchCow I'm grabbing a collection of old operating system images.
19:17 🔗 SketchCow It's.... huge.
19:17 🔗 JesseW multi-petabyte huge, or less than that?
19:17 🔗 vitzli has quit IRC (Quit: Leaving)
19:17 🔗 SketchCow Dude, nothing ever multi-petabyte huge
19:18 🔗 SketchCow If something multi-petabyte huge, I'm in a meeting you're not invited to to be told not to do it
19:18 🔗 SketchCow That's why we don't have scientific datasets
19:18 🔗 SketchCow And why I had to turn down some satellite imagery
19:18 🔗 SketchCow They were all "10tb a day" and I was all "whateverrrrrrr"
19:19 🔗 JesseW ok, so multi-*terabyte* huge, then. :-)
19:20 🔗 * JesseW is perfectly happy not to be invited to such meetings
19:23 🔗 SketchCow Yeah, no happiness in those
19:23 🔗 SketchCow Multi-terabyte is more like it, yes
19:23 🔗 SketchCow I have agency but not THAT much
19:24 🔗 SketchCow And archive team beyond me is quite a load on the sets
19:26 🔗 JesseW I do wonder what the software heritage people will eventually come up with. It's ... not very visible ... yet.
19:26 🔗 yipdw garbage fires
19:26 🔗 yipdw sorry I've been porting shit to Rails 5 and am annoyed
19:27 🔗 SketchCow Now now
19:27 🔗 SketchCow I'm to write them an endorsement
19:27 🔗 yipdw oh the heritage people will do fine I'm sure
19:27 🔗 yipdw it's the software itself
19:36 🔗 bzc6p has left
19:40 🔗 robink has joined #archiveteam-bs
19:42 🔗 robink has quit IRC (Read error: Connection reset by peer)
19:43 🔗 closure has joined #archiveteam-bs
19:43 🔗 midas sets mode: +o closure
20:03 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
20:04 🔗 metalcamp has quit IRC (Read error: Connection reset by peer)
20:18 🔗 fie_ CatButts, still nothing
20:19 🔗 CatButts ah
20:19 🔗 CatButts I see
20:20 🔗 DoomTay has quit IRC (Ping timeout: 268 seconds)
20:43 🔗 j08nY has joined #archiveteam-bs
20:58 🔗 DoomTay has joined #archiveteam-bs
21:35 🔗 ring has quit IRC (Ping timeout: 260 seconds)
21:43 🔗 ring has joined #archiveteam-bs
21:59 🔗 DoomTay You think 5qh8wqh219asr5433wy7rzgzt is taking forever? I'm surprised that 8o9ey88xpscwsvwbhudlu5dz5 is still going
22:06 🔗 JesseW has joined #archiveteam-bs
22:41 🔗 fie_ has quit IRC (Read error: Connection reset by peer)
23:57 🔗 BlueMaxim has joined #archiveteam-bs
