Time |
Nickname |
Message |
00:16
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
01:33
🔗
|
|
JesseW has joined #archiveteam-bs |
01:54
🔗
|
|
VADemon has joined #archiveteam-bs |
02:04
🔗
|
|
xXx_ndidd has joined #archiveteam-bs |
02:07
🔗
|
|
ndiddy has quit IRC (Ping timeout: 244 seconds) |
02:08
🔗
|
|
DoomTay has joined #archiveteam-bs |
02:14
🔗
|
godane |
so this is odd |
02:14
🔗
|
godane |
i found out that TBC in korea has mms and rtmp streams |
02:14
🔗
|
godane |
but none of them work for me |
02:15
🔗
|
godane |
example url that's from the website today: |
02:15
🔗
|
godane |
rtmp://media.tbc.co.kr:1935/vod/_definst_/mp4:news_mp4/prime16-0702-160702013.mp4 |
02:15
🔗
|
godane |
i can't get that url to work |
02:15
🔗
|
godane |
here is an older example url: mms://vod.tbc.co.kr/vod3/news/prime13-0815.wmv |
02:16
🔗
|
godane |
i end up with 404 bad request errors when trying mms |
02:17
🔗
|
godane |
the page for the rtmp stream : http://www.tbc.co.kr/tbc_news/n14_newsview.html?p_no=160702013&news_code=46 |
02:32
🔗
|
|
vitzli has joined #archiveteam-bs |
03:00
🔗
|
xmc |
i've had mobileme data sitting on my nas for a long time |
03:00
🔗
|
xmc |
i finally got around to doing a few spot checks and it looks like there's nothing that wasn't uploaded |
03:01
🔗
|
xmc |
so, time to reclaim that space |
03:47
🔗
|
|
xXx_ndidd is now known as ndiddy |
04:05
🔗
|
|
FalconK has quit IRC (Remote host closed the connection) |
04:50
🔗
|
|
Sk1d has quit IRC (Ping timeout: 194 seconds) |
04:56
🔗
|
|
Sk1d has joined #archiveteam-bs |
04:56
🔗
|
|
Sk1d has quit IRC (Connection closed) |
04:58
🔗
|
|
Sk1d has joined #archiveteam-bs |
05:27
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
05:31
🔗
|
|
dashcloud has joined #archiveteam-bs |
05:43
🔗
|
|
FalconK has joined #archiveteam-bs |
05:43
🔗
|
FalconK |
aha! |
05:43
🔗
|
yipdw |
July 2 and it's 65 F here |
05:43
🔗
|
yipdw |
this is hilarious |
05:43
🔗
|
FalconK |
huh. |
05:43
🔗
|
yipdw |
Climate Change Simulator 2016 |
05:43
🔗
|
FalconK |
here in calgary it has snowed in july back in the 80s |
05:43
🔗
|
FalconK |
every decade or so it happens once I think? |
05:44
🔗
|
* |
FalconK shrugs |
05:44
🔗
|
FalconK |
anyway |
05:44
🔗
|
FalconK |
so if I'm not very much mistaken, the uploading of the json file to fos is how fos knows the job is done |
05:44
🔗
|
FalconK |
but what else is in the json file? |
05:44
🔗
|
yipdw |
oh, the JSON file isn't really used as a signal |
05:44
🔗
|
yipdw |
packs are just packed up and shipped off at time intervals |
05:45
🔗
|
yipdw |
the JSON file contains some stuff like job URL, who did it, when, etc |
05:46
🔗
|
FalconK |
aah |
05:46
🔗
|
FalconK |
what is used as a signal then? |
05:46
🔗
|
FalconK |
does the pipeline make the appropriate redis calls itself? |
05:46
🔗
|
yipdw |
there's no completion signal |
05:47
🔗
|
yipdw |
well |
05:47
🔗
|
yipdw |
there is a finished signal once the pipeline finishes |
05:48
🔗
|
yipdw |
but that's about it |
05:49
🔗
|
FalconK |
mm |
05:50
🔗
|
FalconK |
I might actually have a little time to put into archivebot this week, since I'm between projets |
05:50
🔗
|
FalconK |
projects. |
05:51
🔗
|
yipdw |
the JSON file can be uploaded out-of-sequence with the WARCs, so it's not really that useful as a completion signal |
05:51
🔗
|
yipdw |
(concurrent upload processes and odd filesystem order etc) |
05:55
🔗
|
FalconK |
aah |
05:55
🔗
|
FalconK |
oh by the way, that job you asked about might be caught in the infinite epoll loop bug |
05:55
🔗
|
FalconK |
I should probably update ananiel since I gather our current version gets rid of that but |
05:55
🔗
|
FalconK |
so many long-term jobs |
05:56
🔗
|
FalconK |
I could murder them all and restart them after, but |
05:56
🔗
|
FalconK |
it would be nice to have more features to upgrade to, for that |
05:58
🔗
|
FalconK |
anyway I have to catch a flight in like 7 hours so I better get to bed |
06:00
🔗
|
DoomTay |
Maybe temporarily disallow !a jobs until all those others clear? |
06:01
🔗
|
FalconK |
you can't actually do that once it's started |
06:01
🔗
|
FalconK |
the only thing you can do to manipulate it is tell it to stop by putting in a stopfile |
06:02
🔗
|
FalconK |
we could use some additional features, like the ability to control such things, and the ability to save off state so you can restart it |
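The stopfile control FalconK describes, together with the wished-for ability to save state, might look roughly like the sketch below. It is only an illustration: `run_with_stopfile`, the `STOP` filename, and the `process` callback are hypothetical names, not ArchiveBot's or wpull's actual interface.

```python
import os
import tempfile


def run_with_stopfile(items, stop_path, process):
    """Process items in order, checking for a stop file before each one.

    Returns (done, remaining). Persisting `remaining` is one way a job
    could be restarted later (an assumption, not an existing feature).
    """
    done = []
    pending = list(items)
    while pending:
        if os.path.exists(stop_path):
            break  # operator dropped the stop file: finish cleanly
        item = pending.pop(0)
        process(item)
        done.append(item)
    return done, pending


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        stop_path = os.path.join(workdir, "STOP")  # hypothetical filename

        def process(item):
            # Simulate an operator creating the stop file mid-job.
            if item == 3:
                open(stop_path, "w").close()

        print(run_with_stopfile(range(1, 7), stop_path, process))
```

The check happens only between items, so an item that is already in flight always finishes before the job stops.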
06:06
🔗
|
FalconK |
yeah a couple jobs with no reporting for 15min just started reporting again |
06:06
🔗
|
FalconK |
so usually I wait a long time |
06:07
🔗
|
* |
FalconK shrugs |
06:07
🔗
|
FalconK |
another useful feature would be a ring buffer for wpull.log, which uses tons of space for long jobs |
06:07
🔗
|
SketchCow |
SO MUCH DOWNLOADING |
06:08
🔗
|
* |
FalconK downloads a car |
06:09
🔗
|
DoomTay |
The irony is there's a few jobs that look like they're close to completion, but are frozen |
06:18
🔗
|
FalconK |
well consider how it works |
06:19
🔗
|
FalconK |
suppose you were archiving a simple website that just had next and back buttons on each page, of 100000 pages, and a handful of images on each page, for buttons and one for content (say, a gallery) |
06:19
🔗
|
FalconK |
every page load, there would be ~10 duplicates that just get dropped from the queue, and two items enqueued |
06:20
🔗
|
FalconK |
now suppose for some reason the web server takes 3 hours to send the next page |
06:20
🔗
|
FalconK |
one item in queue, 50000 downloaded. |
06:20
🔗
|
FalconK |
the 50000 to go is unknowable. |
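FalconK's thought experiment can be reproduced with a small simulation. This is a sketch under assumptions: the site layout (`links_on`, ten shared button images, one content image per page) and the first-in-first-out queue are invented for illustration and are not how wpull or ArchiveBot is actually implemented.

```python
from collections import deque


def links_on(page, total_pages, buttons=10):
    """Links found on one page of a hypothetical linear gallery."""
    links = [f"button{b}.png" for b in range(buttons)]  # shared by every page
    links.append(f"content{page}.jpg")
    if page > 0:
        links.append(f"page{page - 1}.html")  # back
    if page + 1 < total_pages:
        links.append(f"page{page + 1}.html")  # next
    return links


def simulate(total_pages, stop_after=None):
    """Crawl from page 0; return (downloaded, queue length, duplicates dropped)."""
    seen = {"page0.html"}
    queue = deque(["page0.html"])
    downloaded = 0
    dropped = 0
    while queue and (stop_after is None or downloaded < stop_after):
        url = queue.popleft()
        downloaded += 1
        if url.endswith(".html"):
            page = int(url[len("page"):-len(".html")])
            for link in links_on(page, total_pages):
                if link in seen:
                    dropped += 1  # duplicate, never enters the queue
                else:
                    seen.add(link)
                    queue.append(link)
    return downloaded, len(queue), dropped


if __name__ == "__main__":
    print(simulate(1000, stop_after=500))  # snapshot partway through
    print(simulate(1000))                  # the complete crawl
```

Partway through, the queue holds only one or two URLs, exactly as it does just before the crawl ends, so the queue length alone says nothing about how much work remains.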
06:21
🔗
|
|
tomwsmf-a has quit IRC (Ping timeout: 258 seconds) |
06:21
🔗
|
|
tomwsmf-a has joined #archiveteam-bs |
06:25
🔗
|
DoomTay |
Good point |
06:25
🔗
|
DoomTay |
I think I actually saw that happen once |
06:26
🔗
|
FalconK |
all the time. |
06:28
🔗
|
yipdw |
FalconK: yeah, for wpull.log I'd like multiple metawarcs |
06:28
🔗
|
yipdw |
I suspect wpull can do this |
06:28
🔗
|
yipdw |
I just haven't looked |
06:47
🔗
|
FalconK |
it certainly doesn't mind if I truncate the log to free up disk |
06:47
🔗
|
FalconK |
to be able to move it off, though, it'd have to be re-opening it every time it writes a log line |
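As a rough illustration of the size-capped log idea, Python's standard `logging.handlers.RotatingFileHandler` keeps a log within a fixed number of bounded files. Whether wpull can be configured this way is not established here; the logger name, sizes, and the `demo` helper below are assumptions for the sketch.

```python
import logging
import logging.handlers
import os
import tempfile


def make_capped_logger(path, max_bytes=4096, backups=2):
    """Return a logger whose output rolls over before exceeding max_bytes.

    At most `backups` older files (path.1, path.2, ...) are kept, so total
    disk use stays bounded at roughly (backups + 1) * max_bytes.
    """
    logger = logging.getLogger(f"capped-demo:{path}")  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups
    )
    logger.addHandler(handler)
    return logger


def demo(directory, lines=2000, max_bytes=4096, backups=2):
    """Write many lines, then list the files left in the directory."""
    path = os.path.join(directory, "wpull.log")
    logger = make_capped_logger(path, max_bytes, backups)
    for i in range(lines):
        logger.info("fetched item %d", i)
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)
    return sorted(os.listdir(directory))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(demo(workdir))
```

For the "move it off" case, the standard library also offers `logging.handlers.WatchedFileHandler`, which checks on each write whether the file was moved or replaced and reopens it if so (on Unix-like systems).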
07:08
🔗
|
|
DoomTay has quit IRC (Quit: Page closed) |
07:35
🔗
|
|
JesseW has quit IRC (Read error: Operation timed out) |
07:50
🔗
|
|
tomwsmf-a has quit IRC (Read error: Operation timed out) |
07:58
🔗
|
|
metalcamp has joined #archiveteam-bs |
08:03
🔗
|
|
metal_cam has joined #archiveteam-bs |
08:05
🔗
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
08:10
🔗
|
godane |
can anyone view this file in browser: http://www.tbc.co.kr/tbc_player/vod14_player.html?vodurl=top/top14-1014.mp4&imgurl=top/top14-1014.jpg&board_id=top14_vod&pro_cnt=2014%EB%85%84%2010%EC%9B%94%2014%EC%9D%BC%20%EB%B0%A9%EC%86%A1 |
08:10
🔗
|
|
metal_cam has quit IRC (Ping timeout: 244 seconds) |
08:11
🔗
|
godane |
i can't seem to play any streams from tbc.co.kr |
08:11
🔗
|
godane |
there may be news archives that go back to 2005 |
08:11
🔗
|
|
metalcamp has joined #archiveteam-bs |
08:13
🔗
|
|
metal_cam has joined #archiveteam-bs |
08:14
🔗
|
|
ravetcofx has joined #archiveteam-bs |
08:16
🔗
|
godane |
ok now i'm getting something |
08:16
🔗
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
08:29
🔗
|
|
metal_cam is now known as metalcamp |
08:45
🔗
|
|
Stilett0 has quit IRC (Read error: Connection reset by peer) |
08:45
🔗
|
|
Stiletto has joined #archiveteam-bs |
09:16
🔗
|
godane |
anyways i'm grabbing the special event videos from tbc |
09:16
🔗
|
godane |
they go back to 2006 |
09:40
🔗
|
HCross |
godane, http://schoolsweek.co.uk/archive/ more newspapers for you :) |
09:59
🔗
|
godane |
HCross i sent that to archivebot for now |
10:00
🔗
|
godane |
i will grab the pdfs at a later point |
10:00
🔗
|
HCross |
ok :) |
10:43
🔗
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
10:45
🔗
|
|
metalcamp has joined #archiveteam-bs |
10:52
🔗
|
|
ris has joined #archiveteam-bs |
11:02
🔗
|
godane |
so based on the front pages for tbc.co.kr |
11:02
🔗
|
godane |
the mms streams may have stopped around summer 2014 |
11:02
🔗
|
godane |
https://web.archive.org/web/20140702065231/http://www.tbc.co.kr/ |
11:04
🔗
|
godane |
there are mms urls on the front page |
11:44
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
11:47
🔗
|
|
dashcloud has joined #archiveteam-bs |
12:28
🔗
|
|
signius has quit IRC (Ping timeout: 260 seconds) |
12:34
🔗
|
|
signius has joined #archiveteam-bs |
13:09
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
13:13
🔗
|
|
dashcloud has joined #archiveteam-bs |
13:15
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
13:20
🔗
|
|
ndiddy has quit IRC (Read error: Connection reset by peer) |
13:25
🔗
|
|
ris has quit IRC () |
13:41
🔗
|
|
kristian_ has joined #archiveteam-bs |
13:54
🔗
|
|
VADemon has quit IRC (Read error: Connection reset by peer) |
14:16
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
14:19
🔗
|
|
dashcloud has joined #archiveteam-bs |
14:26
🔗
|
godane |
i'm at 740k items now |
14:32
🔗
|
|
atrocity has quit IRC (Ping timeout: 272 seconds) |
14:39
🔗
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
14:41
🔗
|
|
metalcamp has joined #archiveteam-bs |
14:50
🔗
|
|
kristian_ has quit IRC (Leaving) |
15:44
🔗
|
|
arkiver2 has joined #archiveteam-bs |
16:12
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
16:15
🔗
|
|
dashcloud has joined #archiveteam-bs |
16:23
🔗
|
|
DoomTay has joined #archiveteam-bs |
16:26
🔗
|
|
JesseW has joined #archiveteam-bs |
17:19
🔗
|
DoomTay |
Well I'll be. The beta version of Wayback Machine seems to have fixed almost all of the bugs I brought up with info@archive.org |
17:20
🔗
|
joepie91 |
not mine yet :P |
17:20
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
17:24
🔗
|
|
dashcloud has joined #archiveteam-bs |
17:31
🔗
|
|
arkiver2 has quit IRC (Ping timeout: 244 seconds) |
17:49
🔗
|
JesseW |
Hm... https://archive.org/details/@numbers_station -- just what it sounds like; since last April. No provenance, for what little that matters. |
19:05
🔗
|
|
bzc6p has joined #archiveteam-bs |
19:05
🔗
|
|
swebb sets mode: +o bzc6p |
19:06
🔗
|
|
tomwsmf-a has joined #archiveteam-bs |
19:13
🔗
|
SketchCow |
Back onsite tomorrow |
19:14
🔗
|
SketchCow |
And yes, there's amazing stuff every day up on the archive |
19:14
🔗
|
SketchCow |
I go through the stacks to find things to make collections |
19:15
🔗
|
SketchCow |
My internal rule is 100 good items or more, likely to be a collection. |
19:15
🔗
|
SketchCow |
Under, it better be fucking AMAZING |
19:15
🔗
|
* |
CatButts waits for return of fie_ senpai |
19:17
🔗
|
SketchCow |
I'm grabbing a collection of old operating system images. |
19:17
🔗
|
SketchCow |
It's.... huge. |
19:17
🔗
|
JesseW |
multi-petabyte huge, or less than that? |
19:17
🔗
|
|
vitzli has quit IRC (Quit: Leaving) |
19:17
🔗
|
SketchCow |
Dude, nothing ever multi-petabyte huge |
19:18
🔗
|
SketchCow |
If something multi-petabyte huge, I'm in a meeting you're not invited to to be told not to do it |
19:18
🔗
|
SketchCow |
That's why we don't have scientific datasets |
19:18
🔗
|
SketchCow |
And why I had to turn down some satellite imagery |
19:18
🔗
|
SketchCow |
They were all "10tb a day" and I was all "whateverrrrrrr" |
19:19
🔗
|
JesseW |
ok, so multi-*terabyte* huge, then. :-) |
19:20
🔗
|
* |
JesseW is perfectly happy not to be invited to such meetings |
19:23
🔗
|
SketchCow |
Yeah, no happiness in those |
19:23
🔗
|
SketchCow |
Multi-terabyte is more like it, yes |
19:23
🔗
|
SketchCow |
I have agency but not THAT much |
19:24
🔗
|
SketchCow |
And archive team beyond me is quite a load on the sets |
19:26
🔗
|
JesseW |
I do wonder what the software heritage people will eventually come up with. It's ... not very visible ... yet. |
19:26
🔗
|
yipdw |
garbage fires |
19:26
🔗
|
yipdw |
sorry I've been porting shit to Rails 5 and am annoyed |
19:27
🔗
|
SketchCow |
Now now |
19:27
🔗
|
SketchCow |
I'm to write them an endorsement |
19:27
🔗
|
yipdw |
oh the heritage people will do fine I'm sure |
19:27
🔗
|
yipdw |
it's the software itself |
19:36
🔗
|
|
bzc6p has left |
19:40
🔗
|
|
robink has joined #archiveteam-bs |
19:42
🔗
|
|
robink has quit IRC (Read error: Connection reset by peer) |
19:43
🔗
|
|
closure has joined #archiveteam-bs |
19:43
🔗
|
|
midas sets mode: +o closure |
20:03
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
20:04
🔗
|
|
metalcamp has quit IRC (Read error: Connection reset by peer) |
20:18
🔗
|
fie_ |
CatButts, still nothing |
20:19
🔗
|
CatButts |
ah |
20:19
🔗
|
CatButts |
I see |
20:20
🔗
|
|
DoomTay has quit IRC (Ping timeout: 268 seconds) |
20:43
🔗
|
|
j08nY has joined #archiveteam-bs |
20:58
🔗
|
|
DoomTay has joined #archiveteam-bs |
21:35
🔗
|
|
ring has quit IRC (Ping timeout: 260 seconds) |
21:43
🔗
|
|
ring has joined #archiveteam-bs |
21:59
🔗
|
DoomTay |
You think 5qh8wqh219asr5433wy7rzgzt is taking forever? I'm surprised that 8o9ey88xpscwsvwbhudlu5dz5 is still going |
22:06
🔗
|
|
JesseW has joined #archiveteam-bs |
22:41
🔗
|
|
fie_ has quit IRC (Read error: Connection reset by peer) |
23:57
🔗
|
|
BlueMaxim has joined #archiveteam-bs |