| Time |
Nickname |
Message |
|
00:10
🔗
|
arkiver |
thomas project is tested and working. |
|
00:10
🔗
|
arkiver |
I'm off to bed now, project will be started in the morning |
|
00:10
🔗
|
arkiver |
good night! |
|
00:16
🔗
|
|
BlueMaxim has joined #archiveteam |
|
00:23
🔗
|
luckcolor |
goodnight! |
|
00:51
🔗
|
|
WinterFox has joined #archiveteam |
|
00:52
🔗
|
|
WinterFox has quit IRC (Client Quit) |
|
00:52
🔗
|
|
WinterFox has joined #archiveteam |
|
01:21
🔗
|
|
wyatt8740 has quit IRC (Ping timeout: 250 seconds) |
|
01:30
🔗
|
godane |
!ao https://www.youtube.com/watch?v=d9gGYGbjMqM --youtube-dl |
|
01:31
🔗
|
godane |
it's in the archivebot channel now |
|
01:33
🔗
|
|
JesseW has joined #archiveteam |
|
01:54
🔗
|
|
VADemon has joined #archiveteam |
|
02:04
🔗
|
|
xXx_ndidd has joined #archiveteam |
|
02:07
🔗
|
|
ndiddy has quit IRC (Ping timeout: 244 seconds) |
|
02:08
🔗
|
|
philpem has quit IRC (Ping timeout: 260 seconds) |
|
02:08
🔗
|
|
DoomTay has joined #archiveteam |
|
02:32
🔗
|
|
vitzli has joined #archiveteam |
|
03:36
🔗
|
xmc |
spot check of my splinder data shows some warcs that don't seem to have made it to IA |
|
03:36
🔗
|
xmc |
what should i do? |
|
03:36
🔗
|
xmc |
i know it's been a long time ... |
|
03:36
🔗
|
xmc |
probably a few gigs only, but they're all mixed in with warcs that are in wayback |
|
03:39
🔗
|
|
MMovie has quit IRC (Read error: Connection reset by peer) |
|
03:47
🔗
|
|
xXx_ndidd is now known as ndiddy |
|
04:02
🔗
|
JesseW |
xmc: you probably know this already, but ... write something to compare them against https://archive.org/details/archiveteam-splinder then upload the non-matching stuff, and send an email to info@ listing what you uploaded. |
|
04:03
🔗
|
JesseW |
This may be relevant, also: https://archive.org/details/splinder-alternatives |
|
04:03
🔗
|
xmc |
yea |
|
04:03
🔗
|
xmc |
i'm working up a python script to do spot-checks of warcs |
|
04:05
🔗
|
|
FalconK has quit IRC (Remote host closed the connection) |
|
04:06
🔗
|
JesseW |
won't you need to do a comprehensive check, rather than spot checks, eventually? |
|
04:09
🔗
|
xmc |
statistical sampling is too much work; i'll probably just check them all anyway |
|
04:11
🔗
|
DFJustin |
I've got some of that kicking around too |
|
04:22
🔗
|
DoomTay |
MD5 hash comparisons? |
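A minimal sketch of the MD5 comparison raised here, in Python since that is what xmc is scripting in. It assumes you already hold a collection of MD5 hex strings for the files that made it to IA (how you gather them is left open); the names `md5_of_file` and `find_unmatched` are made up for illustration and are not from xmc's checker.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading in chunks so large WARCs are not loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_unmatched(local_dir, remote_md5s):
    """List local *.warc.gz files whose MD5 is not among the remote hashes.

    remote_md5s is an assumed input: any iterable of hex MD5 strings
    collected from the archive side.
    """
    known = {value.lower() for value in remote_md5s}
    return sorted(
        path.name
        for path in Path(local_dir).glob("*.warc.gz")
        if md5_of_file(path) not in known
    )
```

Anything `find_unmatched` returns is a candidate for re-upload. A hash match only shows that the same file exists on the archive side, not that it has been indexed into wayback.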
|
04:50
🔗
|
|
Sk1d has quit IRC (Ping timeout: 194 seconds) |
|
04:56
🔗
|
|
Sk1d has joined #archiveteam |
|
04:56
🔗
|
|
Sk1d has quit IRC (Connection closed) |
|
04:58
🔗
|
|
Sk1d has joined #archiveteam |
|
05:01
🔗
|
xmc |
there, i have a functional warc checker :) |
|
05:05
🔗
|
xmc |
gosh python is easy |
|
05:05
🔗
|
JesseW |
btw, arto is still the ArchiveTeam's Choice project on the warrior -- it should probably get moved over to Urlteam or Google Code. |
|
05:06
🔗
|
JesseW |
unless we're going to grab more from arto... |
|
05:13
🔗
|
DoomTay |
I thought arto shut down completely by now |
|
05:15
🔗
|
|
FalconK has joined #archiveteam |
|
05:16
🔗
|
xmc |
hrmph |
|
05:16
🔗
|
xmc |
wayback availability api ignores 302s |
|
05:16
🔗
|
xmc |
a new wrinkle! |
|
05:17
🔗
|
* |
FalconK yawn |
|
05:17
🔗
|
FalconK |
hmm |
|
05:17
🔗
|
xmc |
sup FalconK |
|
05:17
🔗
|
FalconK |
do you know if it behaves correctly on 503? |
|
05:17
🔗
|
FalconK |
not much |
|
05:17
🔗
|
JesseW |
cdx access should work |
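A hedged sketch of what going through CDX could look like. It assumes the Wayback CDX server's `output=json` form, where the first row is a header naming the fields (such as `statuscode`); the function names are illustrative only, and the status filter is a guess at what a checker would want.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(url):
    """Build a CDX query URL asking for JSON output for one original URL."""
    return CDX_ENDPOINT + "?" + urlencode({"url": url, "output": "json"})


def rows_to_records(rows):
    """Turn CDX JSON rows (header row first) into a list of dicts."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def captures_with_status(records, wanted=("200", "302")):
    """Keep captures whose status code is in wanted; redirects are kept by default."""
    return [record for record in records if record.get("statuscode") in wanted]
```

Because every capture comes back as its own row, 302 captures show up here like any other status.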
|
05:17
🔗
|
FalconK |
finally got around to updating my server in calgary for the first time in a couple months |
|
05:18
🔗
|
FalconK |
ironed out ipv6 support wrinkles now that the config syntax supports it properly in gentoo |
|
05:18
🔗
|
xmc |
JesseW: hm, ok, thx |
|
05:18
🔗
|
DoomTay |
My experience with the availability API says that doing too much in one sitting will yield a 503 |
|
05:18
🔗
|
DoomTay |
With the API itself |
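One way to live with those 503s, as a sketch only: query the availability endpoint and back off before retrying. The response shape (`archived_snapshots` containing a `closest` entry) is the documented one; the retry count and delay are guesses, and `check_url` and its parameters are hypothetical names.

```python
import json
import time
import urllib.error
import urllib.request
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def closest_snapshot(payload):
    """Return the 'closest' snapshot dict from an availability response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest
    return None


def fetch_json(url):
    """Fetch a URL and decode its JSON body (plain urllib, no extra dependencies)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def check_url(url, fetch=fetch_json, retries=3, delay=5.0, sleep=time.sleep):
    """Query the availability API, backing off and retrying when it answers 503."""
    query = AVAILABILITY_ENDPOINT + "?" + urlencode({"url": url})
    for attempt in range(retries):
        try:
            return closest_snapshot(fetch(query))
        except urllib.error.HTTPError as error:
            # Give up on anything other than 503, or once retries run out.
            if error.code != 503 or attempt == retries - 1:
                raise
            sleep(delay * (attempt + 1))  # linear backoff; the delay is a guess
```

`fetch` and `sleep` are parameters so the retry logic can be exercised without touching the network.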
|
05:18
🔗
|
FalconK |
I'm still not sure if anything I've uploaded to IA actually gets into the wayback machine :/ |
|
05:19
🔗
|
xmc |
bleh, i don't want to do a stupid parser |
|
05:19
🔗
|
DoomTay |
That's why all my archiving efforts from the past few days have been through the "save" link. As direct as it gets |
|
05:19
🔗
|
Lord_Nigh |
http://ifixit.org/blog/8210/rossmann-repair-legal-threat/ <- have we archived all those videos mentioned there yet? |
|
05:19
🔗
|
xmc |
FalconK: pick a warc that should have been uploaded, zless it, pick a url, then go to web-beta.archive.org and it should list the warc filename up in the header |
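The "zless it, pick a url" step can be scripted. A rough sketch, assuming a gzip-compressed WARC; it only scans for `WARC-Target-URI` header lines rather than parsing records properly, so a payload line that happens to start with the same text would also match. `target_uris` is an illustrative name.

```python
import gzip


def target_uris(warc_path, limit=10):
    """Return up to limit WARC-Target-URI values found in a .warc.gz file."""
    prefix = b"WARC-Target-URI:"
    found = []
    # gzip.open reads through concatenated gzip members, which is how
    # per-record-compressed WARCs are laid out.
    with gzip.open(warc_path, "rb") as handle:
        for line in handle:
            if line.startswith(prefix):
                found.append(line[len(prefix):].strip().decode("utf-8", "replace"))
                if len(found) >= limit:
                    break
    return found
```

Each URL it returns can then be looked up by hand in wayback, as described above.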
|
05:19
🔗
|
JesseW |
xmc: I wrote a stupid parser already: https://bitbucket.org/jesseweinstein/sundry-python-stuff/src/07dee229358685750af20be860028f80a0485541/wayback_cdx.py?fileviewer=file-view-default |
|
05:20
🔗
|
JesseW |
feel free to use it |
|
05:20
🔗
|
FalconK |
xmc: good idea! |
|
05:20
🔗
|
Lord_Nigh |
sad that apple defeats a repair/hacking protection bill in new york, and immediately c&ds a person showing board level repairs based in ny |
|
05:20
🔗
|
FalconK |
xmc: I'll have to give it a shot once I'm done verifying this server is working right |
|
05:20
🔗
|
JesseW |
Lord_Nigh: yes |
|
05:20
🔗
|
xmc |
k |
|
05:20
🔗
|
DoomTay |
I've heard people say that it might not have been apple |
|
05:20
🔗
|
Lord_Nigh |
copyright holder of schematics? |
|
05:20
🔗
|
Lord_Nigh |
foxconn? |
|
05:20
🔗
|
DoomTay |
I also heard that it's possible that it's really because he was showing schematics, not because of the repair videos in and of themselves |
|
05:21
🔗
|
Lord_Nigh |
the showing of schematic i'd think falls under fair use |
|
05:21
🔗
|
Lord_Nigh |
he's not making them available for download i don't think |
|
05:23
🔗
|
yipdw |
FalconK: I can confirm https://wayback-beta.archive.org/web/20160320233156/http://archive.usfirst.org/aboutus/first-honors-michael-bloomberg-will.i.am-and-diana-lee-guzman-for-advancing-stem-education made it |
|
05:24
🔗
|
yipdw |
which seems to correspond to http://archive.fart.website/archivebot/viewer/job/7avn9 |
|
05:24
🔗
|
FalconK |
ok |
|
05:25
🔗
|
FalconK |
I guess it just takes a few days then |
|
05:25
🔗
|
yipdw |
maybe, that or we have overlapping captures |
|
05:25
🔗
|
FalconK |
brb testing client config |
|
05:25
🔗
|
yipdw |
it's hard to tell, there's so many inputs to wayback |
|
05:25
🔗
|
|
FalconK has quit IRC (Quit: WeeChat 1.5) |
|
05:25
🔗
|
|
FalconK has joined #archiveteam |
|
05:25
🔗
|
xmc |
yes, fos usually batches daily or so, whenever it hits a size limit |
|
05:25
🔗
|
xmc |
and then |
|
05:25
🔗
|
xmc |
¯\_(ツ)_/¯ |
|
05:25
🔗
|
FalconK |
back I am. |
|
05:26
🔗
|
FalconK |
xmc: I'm uploading them not through fos |
|
05:26
🔗
|
xmc |
ah, well then |
|
05:26
🔗
|
xmc |
right |
|
05:26
🔗
|
FalconK |
so that I can still get things done even when people hammer fos for no reason with their incorrect configs |
|
05:26
🔗
|
FalconK |
200 parallel rsyncs! |
|
05:26
🔗
|
FalconK |
</3 |
|
05:26
🔗
|
FalconK |
and then all our pipelines filled up |
|
05:26
🔗
|
FalconK |
single points of failure :( |
|
05:27
🔗
|
FalconK |
oh, we do have a slight problem though |
|
05:27
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
05:27
🔗
|
FalconK |
occasionally there are transfer errors and the archive checksum fails, probably due to the extremely large file size |
|
05:27
🔗
|
xmc |
FalconK: can you point to one of your items again? i'm curious |
|
05:28
🔗
|
FalconK |
xmc: sure hang on |
|
05:28
🔗
|
yipdw |
well |
|
05:28
🔗
|
yipdw |
something else that's interesting is that http://archive.fart.website/archivebot/viewer/items/ doesn't list any falconk items beyond, uh, 3/2016 |
|
05:28
🔗
|
yipdw |
perhaps something changed |
|
05:30
🔗
|
FalconK |
here's one |
|
05:30
🔗
|
FalconK |
https://archive.org/details/archiveteam_archivebot_go_falconk_convos_by_20160307 |
|
05:30
🔗
|
yipdw |
hm https://web-beta.archive.org/#/explore/https://convos.by/ |
|
05:31
🔗
|
|
dashcloud has joined #archiveteam |
|
05:32
🔗
|
xmc |
ok i guess i won't be finishing this warc checker tonight |
|
05:32
🔗
|
FalconK |
so there are three captures on that day |
|
05:32
🔗
|
FalconK |
it may have been a poor example |
|
05:33
🔗
|
FalconK |
https://archive.org/details/archiveteam_archivebot_go_falconk_github_com_20160309 |
|
05:34
🔗
|
yipdw |
https://wayback-beta.archive.org/web/20160309021736/https://github.com/fail0verflow/ps4-linux/archive/ps4.zip seems to work |
|
05:34
🔗
|
xmc |
my checker so far if you're interested https://github.com/ArchiveTeam/warc-checker |
|
05:34
🔗
|
xmc |
DFJustin: ^ |
|
05:35
🔗
|
FalconK |
how did you get that URL out of it? the warc cdx is awkward to use. |
|
05:35
🔗
|
yipdw |
https://ia800207.us.archive.org/9/items/archiveteam_archivebot_go_20160311120001/github.com-shallow-20160309-030433-duz6a.json |
|
05:35
🔗
|
yipdw |
:P |
|
05:35
🔗
|
yipdw |
i'm cheating heh |
|
05:36
🔗
|
yipdw |
er |
|
05:36
🔗
|
yipdw |
wait a second |
|
05:36
🔗
|
yipdw |
that is the wrong pack |
|
05:36
🔗
|
yipdw |
shit |
|
05:36
🔗
|
FalconK |
aah |
|
05:36
🔗
|
FalconK |
https://github.com/fail0verflow/ps4-linux/archive/ps4.zip is in that pack though |
|
05:36
🔗
|
yipdw |
well it may not be the wrong pack per se |
|
05:37
🔗
|
yipdw |
there's nothing to force JSON and WARC files to be uploaded in the same pack |
|
05:37
🔗
|
yipdw |
in fact, none of your packs will have the JSON files -- those are still uploaded via fos in the RsyncUpload step |
|
05:37
🔗
|
yipdw |
so just because I'm using the JSON from a go pack doesn't mean that your WARCs aren't being used |
|
05:38
🔗
|
yipdw |
it looks like your uploads in the archivebot collection are being used, though |
|
05:38
🔗
|
FalconK |
mm |
|
05:39
🔗
|
FalconK |
it doesn't show pictures of what is in my packs but |
|
05:39
🔗
|
yipdw |
there was a screenshotter error earlier on, I think that's been resolved |
|
05:39
🔗
|
FalconK |
oh |
|
05:39
🔗
|
FalconK |
ok, that makes sense |
|
05:39
🔗
|
yipdw |
maybe I'm just missing it, but I don't see the WARC file as an HTTP header or whatnot |
|
05:39
🔗
|
yipdw |
it'd be neat to have that information if it's convenient |
|
05:39
🔗
|
FalconK |
and yeah, usually the json file and the last warc are uploaded to fox |
|
05:39
🔗
|
FalconK |
to fos |
|
05:40
🔗
|
FalconK |
since they are not uploaded by the uploader |
|
05:40
🔗
|
yipdw |
the last warc should make it via the uploader |
|
05:40
🔗
|
yipdw |
sometimes it doesn't if the process stalls |
|
05:40
🔗
|
FalconK |
I suggest that they should be, for better encapsulation, though doing so might make it take a moment for the job to show that it has cleared |
|
05:41
🔗
|
FalconK |
I definitely saw it uploading something large to fos synchronously in the pipeline |
|
05:41
🔗
|
yipdw |
yeah |
|
05:41
🔗
|
FalconK |
but it's been a while since I looked honestly |
|
05:41
🔗
|
yipdw |
that can happen if wpull is killed via SIGKILL or whatever |
|
05:41
🔗
|
yipdw |
and the WARC remains in the data directory |
|
05:41
🔗
|
FalconK |
the job json file is very small and must go to fos |
|
05:41
🔗
|
FalconK |
oh |
|
05:41
🔗
|
yipdw |
the RsyncUpload step just uploads everything at that point |
|
05:41
🔗
|
FalconK |
yeah, ok, that makes sense |
|
05:41
🔗
|
FalconK |
oh, that's why it is uploading the logs and such |
|
05:41
🔗
|
yipdw |
however on normal termination wpull will move the WARC to the given directory and the uploader kicks in |
|
05:42
🔗
|
FalconK |
perhaps it ought to be patched to move everything but the job completion json file to the directory beforehand |
|
05:42
🔗
|
FalconK |
there are two json files, right? |
|
05:42
🔗
|
yipdw |
one |
|
05:42
🔗
|
FalconK |
oh, hmm. |
|
05:42
🔗
|
yipdw |
and yes, that RsyncUpload step is an artifact |
|
05:42
🔗
|
yipdw |
I guess we can #-bs this |
|
05:42
🔗
|
FalconK |
kk |
|
06:21
🔗
|
|
tomwsmf-a has quit IRC (Ping timeout: 258 seconds) |
|
06:21
🔗
|
|
tomwsmf-a has joined #archiveteam |
|
07:08
🔗
|
|
DoomTay has quit IRC (Quit: Page closed) |
|
07:35
🔗
|
|
JesseW has quit IRC (Read error: Operation timed out) |
|
07:50
🔗
|
|
tomwsmf-a has quit IRC (Read error: Operation timed out) |
|
07:58
🔗
|
|
metalcamp has joined #archiveteam |
|
08:03
🔗
|
|
metal_cam has joined #archiveteam |
|
08:05
🔗
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
|
08:10
🔗
|
|
metal_cam has quit IRC (Ping timeout: 244 seconds) |
|
08:11
🔗
|
|
metalcamp has joined #archiveteam |
|
08:13
🔗
|
|
metal_cam has joined #archiveteam |
|
08:16
🔗
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
|
08:25
🔗
|
|
Tomcat_ has joined #archiveteam |
|
08:29
🔗
|
|
metal_cam is now known as metalcamp |
|
08:33
🔗
|
|
philpem has joined #archiveteam |
|
08:45
🔗
|
|
Stilett0 has quit IRC (Read error: Connection reset by peer) |
|
08:45
🔗
|
|
Stiletto has joined #archiveteam |
|
09:42
🔗
|
|
BartoCH has quit IRC (Read error: Connection reset by peer) |
|
09:51
🔗
|
|
BartoCH has joined #archiveteam |
|
10:09
🔗
|
|
Tomcat_ has quit IRC (Remote host closed the connection) |
|
10:43
🔗
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
|
10:45
🔗
|
|
metalcamp has joined #archiveteam |
|
10:52
🔗
|
|
ris has joined #archiveteam |
|
11:01
🔗
|
|
jmad980 has quit IRC (Ping timeout: 246 seconds) |
|
11:16
🔗
|
|
jmad980 has joined #archiveteam |
|
11:44
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
11:47
🔗
|
|
dashcloud has joined #archiveteam |
|
11:58
🔗
|
|
z00nx has quit IRC (Ping timeout: 244 seconds) |
|
12:11
🔗
|
|
Fake-Name has quit IRC (Ping timeout: 260 seconds) |
|
12:28
🔗
|
|
signius has quit IRC (Ping timeout: 260 seconds) |
|
12:30
🔗
|
|
WinterFox has quit IRC (Read error: Operation timed out) |
|
12:33
🔗
|
|
Tomcat_ has joined #archiveteam |
|
12:34
🔗
|
|
signius has joined #archiveteam |
|
12:37
🔗
|
|
z00nx has joined #archiveteam |
|
12:41
🔗
|
|
z00nx has quit IRC (Ping timeout: 244 seconds) |
|
13:09
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
13:13
🔗
|
|
dashcloud has joined #archiveteam |
|
13:15
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
|
13:20
🔗
|
|
ndiddy has quit IRC (Read error: Connection reset by peer) |
|
13:25
🔗
|
|
ris has quit IRC () |
|
13:25
🔗
|
|
z00nx has joined #archiveteam |
|
13:30
🔗
|
|
z00nx has quit IRC (Ping timeout: 244 seconds) |
|
13:41
🔗
|
|
kristian_ has joined #archiveteam |
|
13:49
🔗
|
|
z00nx has joined #archiveteam |
|
13:54
🔗
|
|
VADemon has quit IRC (Read error: Connection reset by peer) |
|
13:59
🔗
|
|
z00nx has quit IRC (Ping timeout: 244 seconds) |
|
14:16
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
14:19
🔗
|
|
dashcloud has joined #archiveteam |
|
14:32
🔗
|
|
atrocity has quit IRC (Ping timeout: 272 seconds) |
|
14:39
🔗
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
|
14:41
🔗
|
|
metalcamp has joined #archiveteam |
|
14:50
🔗
|
|
kristian_ has quit IRC (Leaving) |
|
15:44
🔗
|
|
z00nx has joined #archiveteam |
|
15:44
🔗
|
|
arkiver2 has joined #archiveteam |
|
15:44
🔗
|
|
swebb sets mode: +o arkiver2 |
|
15:49
🔗
|
|
z00nx has quit IRC (Ping timeout: 244 seconds) |
|
16:12
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
16:15
🔗
|
|
dashcloud has joined #archiveteam |
|
16:23
🔗
|
|
DoomTay has joined #archiveteam |
|
16:26
🔗
|
|
JesseW has joined #archiveteam |
|
17:01
🔗
|
|
jmad980 has quit IRC (Ping timeout: 246 seconds) |
|
17:11
🔗
|
|
jmad980 has joined #archiveteam |
|
17:20
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
17:24
🔗
|
|
dashcloud has joined #archiveteam |
|
17:31
🔗
|
|
arkiver2 has quit IRC (Ping timeout: 244 seconds) |
|
18:11
🔗
|
|
gibigiana has quit IRC (Ping timeout: 499 seconds) |
|
18:13
🔗
|
|
Fake-Name has joined #archiveteam |
|
18:16
🔗
|
|
gibigiana has joined #archiveteam |
|
18:45
🔗
|
|
nertzy has joined #archiveteam |
|
19:06
🔗
|
|
tomwsmf-a has joined #archiveteam |
|
19:17
🔗
|
|
vitzli has quit IRC (Quit: Leaving) |
|
19:29
🔗
|
|
kutas has joined #archiveteam |
|
19:30
🔗
|
|
kutas has quit IRC (Client Quit) |
|
19:34
🔗
|
|
Tomcat_ has quit IRC (Remote host closed the connection) |
|
19:40
🔗
|
|
robink has joined #archiveteam |
|
19:42
🔗
|
|
robink has quit IRC (Read error: Connection reset by peer) |
|
19:43
🔗
|
|
closure has joined #archiveteam |
|
20:03
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
|
20:04
🔗
|
|
metalcamp has quit IRC (Read error: Connection reset by peer) |
|
20:16
🔗
|
arkiver |
thomas project is running |
|
20:16
🔗
|
arkiver |
what are the three scout repos about? |
|
20:17
🔗
|
HCross |
joepie91 is doing something or other |
|
20:18
🔗
|
arkiver |
oh nice |
|
20:20
🔗
|
|
DoomTay has quit IRC (Ping timeout: 268 seconds) |
|
20:20
🔗
|
joepie91 |
yep, mine |
|
20:20
🔗
|
joepie91 |
:P |
|
20:23
🔗
|
Frogging |
arkiver: need more hosts on there? |
|
20:23
🔗
|
arkiver |
on thomas? |
|
20:23
🔗
|
Frogging |
yeah |
|
20:23
🔗
|
arkiver |
nah, it can't handle any more unfortunately |
|
20:23
🔗
|
Frogging |
kk |
|
20:34
🔗
|
joepie91 |
(thomas project?) |
|
20:37
🔗
|
arkiver |
joepie91: http://thomas.loc.gov/home/thomas.php |
|
20:43
🔗
|
|
j08nY has joined #archiveteam |
|
20:49
🔗
|
joepie91 |
mmm |
|
20:58
🔗
|
|
DoomTay has joined #archiveteam |
|
21:17
🔗
|
arkiver |
chfoo: can you please create a target on FOS for dnshistory? |
|
21:19
🔗
|
chfoo |
arkiver, ok done |
|
21:19
🔗
|
arkiver |
thanks! |
|
21:35
🔗
|
|
ring has quit IRC (Ping timeout: 260 seconds) |
|
21:43
🔗
|
|
ring has joined #archiveteam |
|
22:06
🔗
|
|
JesseW has joined #archiveteam |
|
22:41
🔗
|
|
fie_ has quit IRC (Read error: Connection reset by peer) |
|
23:08
🔗
|
|
Ymgve has quit IRC (Ping timeout: 506 seconds) |
|
23:12
🔗
|
|
Ymgve has joined #archiveteam |
|
23:57
🔗
|
|
BlueMaxim has joined #archiveteam |