Time |
Nickname |
Message |
00:10
🔗
|
arkiver |
thomas project is tested and working. |
00:10
🔗
|
arkiver |
I'm off to bed now, project will be started in the morning |
00:10
🔗
|
arkiver |
good night! |
00:16
🔗
|
|
BlueMaxim has joined #archiveteam |
00:23
🔗
|
luckcolor |
goodnight! |
00:51
🔗
|
|
WinterFox has joined #archiveteam |
00:52
🔗
|
|
WinterFox has quit IRC (Client Quit) |
00:52
🔗
|
|
WinterFox has joined #archiveteam |
01:21
🔗
|
|
wyatt8740 has quit IRC (Ping timeout: 250 seconds) |
01:30
🔗
|
godane |
!ao https://www.youtube.com/watch?v=d9gGYGbjMqM --youtube-dl |
01:31
🔗
|
godane |
it's in the archivebot channel now |
01:33
🔗
|
|
JesseW has joined #archiveteam |
01:54
🔗
|
|
VADemon has joined #archiveteam |
02:04
🔗
|
|
xXx_ndidd has joined #archiveteam |
02:07
🔗
|
|
ndiddy has quit IRC (Ping timeout: 244 seconds) |
02:08
🔗
|
|
philpem has quit IRC (Ping timeout: 260 seconds) |
02:08
🔗
|
|
DoomTay has joined #archiveteam |
02:32
🔗
|
|
vitzli has joined #archiveteam |
03:36
🔗
|
xmc |
spot check of my splinder data shows some warcs that don't seem to have made it to IA |
03:36
🔗
|
xmc |
what should i do? |
03:36
🔗
|
xmc |
i know it's been a long time ... |
03:36
🔗
|
xmc |
probably a few gigs only, but they're all mixed in with warcs that are in wayback |
03:39
🔗
|
|
MMovie has quit IRC (Read error: Connection reset by peer) |
03:47
🔗
|
|
xXx_ndidd is now known as ndiddy |
04:02
🔗
|
JesseW |
xmc: you probably know this already, but ... write something to compare them against https://archive.org/details/archiveteam-splinder then upload the non-matching stuff, and send an email to info@ listing what you uploaded. |
04:03
🔗
|
JesseW |
This may be relevant, also: https://archive.org/details/splinder-alternatives |
04:03
🔗
|
xmc |
yea |
04:03
🔗
|
xmc |
i'm working up a python script to do spot-checks of warcs |
04:05
🔗
|
|
FalconK has quit IRC (Remote host closed the connection) |
04:06
🔗
|
JesseW |
won't you need to do a comprehensive check, rather than spot checks, eventually? |
04:09
🔗
|
xmc |
statistical sampling is too much work i'll probably just check them all anyway |
04:11
🔗
|
DFJustin |
I've got some of that kicking around too |
04:22
🔗
|
DoomTay |
Md5 hash comparisons? |
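A minimal sketch of the MD5 comparison idea, not xmc's actual checker. It assumes the archive.org metadata JSON for each relevant item has already been fetched and parsed, and that its "files" entries carry an "md5" field. The function names and the "*.warc.gz" pattern are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5s_from_metadata(metadata):
    """Collect MD5 values from one item's parsed metadata JSON.

    Assumption: the response has a "files" list whose entries may have "md5".
    """
    return {entry["md5"] for entry in metadata.get("files", []) if "md5" in entry}


def find_missing(local_dir, remote_md5s, pattern="*.warc.gz"):
    """List local file names whose MD5 is not among the remote checksums."""
    missing = []
    for path in sorted(Path(local_dir).glob(pattern)):
        if md5_of_file(path) not in remote_md5s:
            missing.append(path.name)
    return missing
```

Matching on checksum rather than file name means a renamed but identical WARC still counts as uploaded. A WARC that was repacked before upload would not match, so a miss here is a candidate for a closer look rather than proof of absence.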
04:50
🔗
|
|
Sk1d has quit IRC (Ping timeout: 194 seconds) |
04:56
🔗
|
|
Sk1d has joined #archiveteam |
04:56
🔗
|
|
Sk1d has quit IRC (Connection closed) |
04:58
🔗
|
|
Sk1d has joined #archiveteam |
05:01
🔗
|
xmc |
there, i have a functional warc checker :) |
05:05
🔗
|
xmc |
gosh python is easy |
05:05
🔗
|
JesseW |
btw, arto is still the ArchiveTeam's Choice project on the warrior -- it should probably get moved over to Urlteam or Google Code. |
05:06
🔗
|
JesseW |
unless we're going to grab more from arto... |
05:13
🔗
|
DoomTay |
I thought arto shut down completely by now |
05:15
🔗
|
|
FalconK has joined #archiveteam |
05:16
🔗
|
xmc |
hrmph |
05:16
🔗
|
xmc |
wayback availability api ignores 302s |
05:16
🔗
|
xmc |
a new wrinkle! |
05:17
🔗
|
* |
FalconK yawn |
05:17
🔗
|
FalconK |
hmm |
05:17
🔗
|
xmc |
sup FalconK |
05:17
🔗
|
FalconK |
do you know if it behaves correctly on 503? |
05:17
🔗
|
FalconK |
not much |
05:17
🔗
|
JesseW |
cdx access should work |
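A rough sketch of what CDX access could look like, assuming the public Wayback CDX endpoint with its `url`, `output=json` and `limit` parameters. This is not JesseW's parser; the helper names are hypothetical. The JSON output is a list of rows whose first row names the fields, so captures with a 302 status show up as ordinary rows.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(url, limit=5):
    """Build a CDX query URL asking for JSON output."""
    params = {"url": url, "output": "json", "limit": limit}
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_json(text):
    """Turn CDX JSON output (a header row, then data rows) into dicts."""
    rows = json.loads(text) if text.strip() else []
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def captures(url, limit=5):
    """Fetch and parse captures for a URL (performs a network request)."""
    with urlopen(build_cdx_url(url, limit), timeout=30) as response:
        return parse_cdx_json(response.read().decode("utf-8"))
```

Each returned dict can then be checked for its `statuscode` and `timestamp` fields to decide whether a given URL from a local WARC already has a capture.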
05:17
🔗
|
FalconK |
finally got around to updating my server in calgary for the first time in a couple months |
05:18
🔗
|
FalconK |
ironed out ipv6 support wrinkles now that the config syntax supports it properly in gentoo |
05:18
🔗
|
xmc |
JesseW: hm, ok, thx |
05:18
🔗
|
DoomTay |
My experience with the availability API is that doing too much in one sitting will yield a 503 |
05:18
🔗
|
DoomTay |
With the API itself |
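One way a checker might cope with those 503s is to retry with growing delays. This is a generic exponential backoff sketch, not something the API documents as required. `with_backoff` and its parameters are hypothetical, and `fetch` is any zero-argument callable that raises `urllib.error.HTTPError` on failure.

```python
import time
from urllib.error import HTTPError


def with_backoff(fetch, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on HTTP 503, wait and retry with doubling delays."""
    for attempt in range(attempts):
        try:
            return fetch()
        except HTTPError as error:
            # Only 503 is treated as "slow down"; other errors propagate,
            # and the last attempt re-raises instead of sleeping again.
            if error.code != 503 or attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

The `sleep` argument is injectable so the retry logic can be exercised without actually waiting.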
05:18
🔗
|
FalconK |
I'm still not sure if anything I've uploaded to IA actually gets into the wayback machine :/ |
05:19
🔗
|
xmc |
bleh, i don't want to do a stupid parser |
05:19
🔗
|
DoomTay |
That's why all my archiving efforts from the past few days have been through the "save" link. As direct as it gets |
05:19
🔗
|
Lord_Nigh |
http://ifixit.org/blog/8210/rossmann-repair-legal-threat/ <- have we archived all those videos mentioned there yet? |
05:19
🔗
|
xmc |
FalconK: pick a warc that should have been uploaded, zless it, pick a url, then go to web-beta.archive.org and it should list the warc filename up in the header |
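The "zless it, pick a url" step can be approximated in Python. This is a plain line scan over a gzipped WARC for `WARC-Target-URI` headers, not a real WARC parser, so a payload line that happens to start with the same text would also match. The function name is hypothetical.

```python
import gzip


def target_uris(warc_path, limit=10):
    """Return up to `limit` WARC-Target-URI values from a .warc.gz file."""
    uris = []
    prefix = b"warc-target-uri:"
    # gzip.open reads through concatenated gzip members, which is how
    # per-record-compressed WARC files are laid out.
    with gzip.open(warc_path, "rb") as stream:
        for line in stream:
            if line.lower().startswith(prefix):
                value = line[len(prefix):].strip()
                uris.append(value.decode("utf-8", "replace"))
                if len(uris) >= limit:
                    break
    return uris
```

Any of the returned URLs can then be looked up in the Wayback Machine by hand, as xmc describes.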
05:19
🔗
|
JesseW |
xmc: I wrote a stupid parser already: https://bitbucket.org/jesseweinstein/sundry-python-stuff/src/07dee229358685750af20be860028f80a0485541/wayback_cdx.py?fileviewer=file-view-default |
05:20
🔗
|
JesseW |
feel free to use it |
05:20
🔗
|
FalconK |
xmc: good idea! |
05:20
🔗
|
Lord_Nigh |
sad that apple defeats a repair/hacking protection bill in new york, and immediately c&ds a person showing board level repairs based in ny |
05:20
🔗
|
FalconK |
xmc: I'll have to give it a shot once I'm done verifying this server is working right |
05:20
🔗
|
JesseW |
Lord_Nigh: yes |
05:20
🔗
|
xmc |
k |
05:20
🔗
|
DoomTay |
I've heard people say that it might not have been apple |
05:20
🔗
|
Lord_Nigh |
copyright holder of schematics? |
05:20
🔗
|
Lord_Nigh |
foxconn? |
05:20
🔗
|
DoomTay |
I also heard that it's possible that it's really because he was showing schematics, not because of the repair videos in and of themselves |
05:21
🔗
|
Lord_Nigh |
the showing of schematic i'd think falls under fair use |
05:21
🔗
|
Lord_Nigh |
he's not making them available for download i don't think |
05:23
🔗
|
yipdw |
FalconK: I can confirm https://wayback-beta.archive.org/web/20160320233156/http://archive.usfirst.org/aboutus/first-honors-michael-bloomberg-will.i.am-and-diana-lee-guzman-for-advancing-stem-education made it |
05:24
🔗
|
yipdw |
which seems to correspond to http://archive.fart.website/archivebot/viewer/job/7avn9 |
05:24
🔗
|
FalconK |
ok |
05:25
🔗
|
FalconK |
I guess it just takes a few days then |
05:25
🔗
|
yipdw |
maybe, that or we have overlapping captures |
05:25
🔗
|
FalconK |
brb testing client config |
05:25
🔗
|
yipdw |
it's hard to tell, there's so many inputs to wayback |
05:25
🔗
|
|
FalconK has quit IRC (Quit: WeeChat 1.5) |
05:25
🔗
|
|
FalconK has joined #archiveteam |
05:25
🔗
|
xmc |
yes, fos usually batches daily or so, whenever it hits a size limit |
05:25
🔗
|
xmc |
and then |
05:25
🔗
|
xmc |
¯\_(ツ)_/¯ |
05:25
🔗
|
FalconK |
back I am. |
05:26
🔗
|
FalconK |
xmc: I'm uploading them not through fos |
05:26
🔗
|
xmc |
ah, well then |
05:26
🔗
|
xmc |
right |
05:26
🔗
|
FalconK |
so that I can still get things done even when people hammer fos for no reason with their incorrect configs |
05:26
🔗
|
FalconK |
200 parallel rsyncs! |
05:26
🔗
|
FalconK |
</3 |
05:26
🔗
|
FalconK |
and then all our pipelines filled up |
05:26
🔗
|
FalconK |
single points of failure :( |
05:27
🔗
|
FalconK |
oh, we do have a slight problem though |
05:27
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
05:27
🔗
|
FalconK |
occasionally there are transfer errors and the archive checksum fails, probably due to the extremely large file size |
05:27
🔗
|
xmc |
FalconK: can you point to one of your items again? i'm curious |
05:28
🔗
|
FalconK |
xmc: sure hang on |
05:28
🔗
|
yipdw |
well |
05:28
🔗
|
yipdw |
something else that's interesting is that http://archive.fart.website/archivebot/viewer/items/ doesn't list any falconk items beyond, uh, 3/2016 |
05:28
🔗
|
yipdw |
perhaps something changed |
05:30
🔗
|
FalconK |
here's one |
05:30
🔗
|
FalconK |
https://archive.org/details/archiveteam_archivebot_go_falconk_convos_by_20160307 |
05:30
🔗
|
yipdw |
hm https://web-beta.archive.org/#/explore/https://convos.by/ |
05:31
🔗
|
|
dashcloud has joined #archiveteam |
05:32
🔗
|
xmc |
ok i guess i won't be finishing this warc checker tonight |
05:32
🔗
|
FalconK |
so there are three captures on that day |
05:32
🔗
|
FalconK |
it may have been a poor example |
05:33
🔗
|
FalconK |
https://archive.org/details/archiveteam_archivebot_go_falconk_github_com_20160309 |
05:34
🔗
|
yipdw |
https://wayback-beta.archive.org/web/20160309021736/https://github.com/fail0verflow/ps4-linux/archive/ps4.zip seems to work |
05:34
🔗
|
xmc |
my checker so far if you're interested https://github.com/ArchiveTeam/warc-checker |
05:34
🔗
|
xmc |
DFJustin: ^ |
05:35
🔗
|
FalconK |
how did you get that URL out of it? the warc cdx is awkward to use. |
05:35
🔗
|
yipdw |
https://ia800207.us.archive.org/9/items/archiveteam_archivebot_go_20160311120001/github.com-shallow-20160309-030433-duz6a.json |
05:35
🔗
|
yipdw |
:P |
05:35
🔗
|
yipdw |
i'm cheating heh |
05:36
🔗
|
yipdw |
er |
05:36
🔗
|
yipdw |
wait a second |
05:36
🔗
|
yipdw |
that is the wrong pack |
05:36
🔗
|
yipdw |
shit |
05:36
🔗
|
FalconK |
aah |
05:36
🔗
|
FalconK |
https://github.com/fail0verflow/ps4-linux/archive/ps4.zip is in that pack though |
05:36
🔗
|
yipdw |
well it may not be the wrong pack per se |
05:37
🔗
|
yipdw |
there's nothing to force JSON and WARC files to be uploaded in the same pack |
05:37
🔗
|
yipdw |
in fact, none of your packs will have the JSON files -- those are still uploaded via fos in the RsyncUpload step |
05:37
🔗
|
yipdw |
so just because I'm using the JSON from a go pack doesn't mean that your WARCs aren't being used |
05:38
🔗
|
yipdw |
it looks like your uploads in the archivebot collection are being used, though |
05:38
🔗
|
FalconK |
mm |
05:39
🔗
|
FalconK |
it doesn't show pictures of what is in my packs but |
05:39
🔗
|
yipdw |
there was a screenshotter error earlier on, I think that's been resolved |
05:39
🔗
|
FalconK |
oh |
05:39
🔗
|
FalconK |
ok, that makes sense |
05:39
🔗
|
yipdw |
maybe I'm just missing it, but I don't see the WARC file as an HTTP header or whatnot |
05:39
🔗
|
yipdw |
it'd be neat to have that information if it's convenient |
05:39
🔗
|
FalconK |
and yeah, usually the json file and the last warc are uploaded to fox |
05:39
🔗
|
FalconK |
to fos |
05:40
🔗
|
FalconK |
since they are not uploaded by the uploader |
05:40
🔗
|
yipdw |
the last warc should make it via the uploader |
05:40
🔗
|
yipdw |
sometimes it doesn't if the process stalls |
05:40
🔗
|
FalconK |
I suggest that they should be, for better encapsulation, though doing so might make it take a moment for the job to show that it has cleared |
05:41
🔗
|
FalconK |
I definitely saw it uploading something large to fos synchronously in the pipeline |
05:41
🔗
|
yipdw |
yeah |
05:41
🔗
|
FalconK |
but it's been a while since I looked honestly |
05:41
🔗
|
yipdw |
that can happen if wpull is killed via SIGKILL or whatever |
05:41
🔗
|
yipdw |
and the WARC remains in the data directory |
05:41
🔗
|
FalconK |
the job json file is very small and must go to fos |
05:41
🔗
|
FalconK |
oh |
05:41
🔗
|
yipdw |
the RsyncUpload step just uploads everything at that point |
05:41
🔗
|
FalconK |
yeah, ok, that makes sense |
05:41
🔗
|
FalconK |
oh, that's why it is uploading the logs and such |
05:41
🔗
|
yipdw |
however on normal termination wpull will move the WARC to the given directory and the uploader kicks in |
05:42
🔗
|
FalconK |
perhaps it ought to be patched to move everything but the job completion json file to the directory beforehand |
05:42
🔗
|
FalconK |
there are two json files, right? |
05:42
🔗
|
yipdw |
one |
05:42
🔗
|
FalconK |
oh, hmm. |
05:42
🔗
|
yipdw |
and yes, that RsyncUpload step is an artifact |
05:42
🔗
|
yipdw |
I guess we can #-bs this |
05:42
🔗
|
FalconK |
kk |
06:21
🔗
|
|
tomwsmf-a has quit IRC (Ping timeout: 258 seconds) |
06:21
🔗
|
|
tomwsmf-a has joined #archiveteam |
07:08
🔗
|
|
DoomTay has quit IRC (Quit: Page closed) |
07:35
🔗
|
|
JesseW has quit IRC (Read error: Operation timed out) |
07:50
🔗
|
|
tomwsmf-a has quit IRC (Read error: Operation timed out) |
07:58
🔗
|
|
metalcamp has joined #archiveteam |
08:03
🔗
|
|
metal_cam has joined #archiveteam |
08:05
🔗
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
08:10
🔗
|
|
metal_cam has quit IRC (Ping timeout: 244 seconds) |
08:11
🔗
|
|
metalcamp has joined #archiveteam |
08:13
🔗
|
|
metal_cam has joined #archiveteam |
08:16
🔗
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
08:25
🔗
|
|
Tomcat_ has joined #archiveteam |
08:29
🔗
|
|
metal_cam is now known as metalcamp |
08:33
🔗
|
|
philpem has joined #archiveteam |
08:45
🔗
|
|
Stilett0 has quit IRC (Read error: Connection reset by peer) |
08:45
🔗
|
|
Stiletto has joined #archiveteam |
09:42
🔗
|
|
BartoCH has quit IRC (Read error: Connection reset by peer) |
09:51
🔗
|
|
BartoCH has joined #archiveteam |
10:09
🔗
|
|
Tomcat_ has quit IRC (Remote host closed the connection) |
10:43
🔗
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
10:45
🔗
|
|
metalcamp has joined #archiveteam |
10:52
🔗
|
|
ris has joined #archiveteam |
11:01
🔗
|
|
jmad980 has quit IRC (Ping timeout: 246 seconds) |
11:16
🔗
|
|
jmad980 has joined #archiveteam |
11:44
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
11:47
🔗
|
|
dashcloud has joined #archiveteam |
11:58
🔗
|
|
z00nx has quit IRC (Ping timeout: 244 seconds) |
12:11
🔗
|
|
Fake-Name has quit IRC (Ping timeout: 260 seconds) |
12:28
🔗
|
|
signius has quit IRC (Ping timeout: 260 seconds) |
12:30
🔗
|
|
WinterFox has quit IRC (Read error: Operation timed out) |
12:33
🔗
|
|
Tomcat_ has joined #archiveteam |
12:34
🔗
|
|
signius has joined #archiveteam |
12:37
🔗
|
|
z00nx has joined #archiveteam |
12:41
🔗
|
|
z00nx has quit IRC (Ping timeout: 244 seconds) |
13:09
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
13:13
🔗
|
|
dashcloud has joined #archiveteam |
13:15
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
13:20
🔗
|
|
ndiddy has quit IRC (Read error: Connection reset by peer) |
13:25
🔗
|
|
ris has quit IRC () |
13:25
🔗
|
|
z00nx has joined #archiveteam |
13:30
🔗
|
|
z00nx has quit IRC (Ping timeout: 244 seconds) |
13:41
🔗
|
|
kristian_ has joined #archiveteam |
13:49
🔗
|
|
z00nx has joined #archiveteam |
13:54
🔗
|
|
VADemon has quit IRC (Read error: Connection reset by peer) |
13:59
🔗
|
|
z00nx has quit IRC (Ping timeout: 244 seconds) |
14:16
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
14:19
🔗
|
|
dashcloud has joined #archiveteam |
14:32
🔗
|
|
atrocity has quit IRC (Ping timeout: 272 seconds) |
14:39
🔗
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
14:41
🔗
|
|
metalcamp has joined #archiveteam |
14:50
🔗
|
|
kristian_ has quit IRC (Leaving) |
15:44
🔗
|
|
z00nx has joined #archiveteam |
15:44
🔗
|
|
arkiver2 has joined #archiveteam |
15:44
🔗
|
|
swebb sets mode: +o arkiver2 |
15:49
🔗
|
|
z00nx has quit IRC (Ping timeout: 244 seconds) |
16:12
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
16:15
🔗
|
|
dashcloud has joined #archiveteam |
16:23
🔗
|
|
DoomTay has joined #archiveteam |
16:26
🔗
|
|
JesseW has joined #archiveteam |
17:01
🔗
|
|
jmad980 has quit IRC (Ping timeout: 246 seconds) |
17:11
🔗
|
|
jmad980 has joined #archiveteam |
17:20
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
17:24
🔗
|
|
dashcloud has joined #archiveteam |
17:31
🔗
|
|
arkiver2 has quit IRC (Ping timeout: 244 seconds) |
18:11
🔗
|
|
gibigiana has quit IRC (Ping timeout: 499 seconds) |
18:13
🔗
|
|
Fake-Name has joined #archiveteam |
18:16
🔗
|
|
gibigiana has joined #archiveteam |
18:45
🔗
|
|
nertzy has joined #archiveteam |
19:06
🔗
|
|
tomwsmf-a has joined #archiveteam |
19:17
🔗
|
|
vitzli has quit IRC (Quit: Leaving) |
19:29
🔗
|
|
kutas has joined #archiveteam |
19:30
🔗
|
|
kutas has quit IRC (Client Quit) |
19:34
🔗
|
|
Tomcat_ has quit IRC (Remote host closed the connection) |
19:40
🔗
|
|
robink has joined #archiveteam |
19:42
🔗
|
|
robink has quit IRC (Read error: Connection reset by peer) |
19:43
🔗
|
|
closure has joined #archiveteam |
20:03
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
20:04
🔗
|
|
metalcamp has quit IRC (Read error: Connection reset by peer) |
20:16
🔗
|
arkiver |
thomas project is running |
20:16
🔗
|
arkiver |
what are the three scout repos about? |
20:17
🔗
|
HCross |
joepie91 is doing something or other |
20:18
🔗
|
arkiver |
oh nice |
20:20
🔗
|
|
DoomTay has quit IRC (Ping timeout: 268 seconds) |
20:20
🔗
|
joepie91 |
yep, mine |
20:20
🔗
|
joepie91 |
:P |
20:23
🔗
|
Frogging |
arkiver: need more hosts on there? |
20:23
🔗
|
arkiver |
on thomas? |
20:23
🔗
|
Frogging |
yeah |
20:23
🔗
|
arkiver |
nah, it can't handle any more unfortunately |
20:23
🔗
|
Frogging |
kk |
20:34
🔗
|
joepie91 |
(thomas project?) |
20:37
🔗
|
arkiver |
joepie91: http://thomas.loc.gov/home/thomas.php |
20:43
🔗
|
|
j08nY has joined #archiveteam |
20:49
🔗
|
joepie91 |
mmm |
20:58
🔗
|
|
DoomTay has joined #archiveteam |
21:17
🔗
|
arkiver |
chfoo: can you please create a target on FOS for dnshistory? |
21:19
🔗
|
chfoo |
arkiver, ok done |
21:19
🔗
|
arkiver |
thanks! |
21:35
🔗
|
|
ring has quit IRC (Ping timeout: 260 seconds) |
21:43
🔗
|
|
ring has joined #archiveteam |
22:06
🔗
|
|
JesseW has joined #archiveteam |
22:41
🔗
|
|
fie_ has quit IRC (Read error: Connection reset by peer) |
23:08
🔗
|
|
Ymgve has quit IRC (Ping timeout: 506 seconds) |
23:12
🔗
|
|
Ymgve has joined #archiveteam |
23:57
🔗
|
|
BlueMaxim has joined #archiveteam |