#archiveteam 2016-07-03,Sun


Time Nickname Message
00:10 🔗 arkiver thomas project is tested and working.
00:10 🔗 arkiver I'm off to bed now, project will be started in the morning
00:10 🔗 arkiver good night!
00:16 🔗 BlueMaxim has joined #archiveteam
00:23 🔗 luckcolor goodnight!
00:51 🔗 WinterFox has joined #archiveteam
00:52 🔗 WinterFox has quit IRC (Client Quit)
00:52 🔗 WinterFox has joined #archiveteam
01:21 🔗 wyatt8740 has quit IRC (Ping timeout: 250 seconds)
01:30 🔗 godane !ao https://www.youtube.com/watch?v=d9gGYGbjMqM --youtube-dl
01:31 🔗 godane its in archivebot channel now
01:33 🔗 JesseW has joined #archiveteam
01:54 🔗 VADemon has joined #archiveteam
02:04 🔗 xXx_ndidd has joined #archiveteam
02:07 🔗 ndiddy has quit IRC (Ping timeout: 244 seconds)
02:08 🔗 philpem has quit IRC (Ping timeout: 260 seconds)
02:08 🔗 DoomTay has joined #archiveteam
02:32 🔗 vitzli has joined #archiveteam
03:36 🔗 xmc spot check of my splinder data shows some warcs that don't seem to have made it to IA
03:36 🔗 xmc what should i do?
03:36 🔗 xmc i know it's been a long time ...
03:36 🔗 xmc probably a few gigs only, but they're all mixed in with warcs that are in wayback
03:39 🔗 MMovie has quit IRC (Read error: Connection reset by peer)
03:47 🔗 xXx_ndidd is now known as ndiddy
04:02 🔗 JesseW xmc: you probably know this already, but ... write something to compare them against https://archive.org/details/archiveteam-splinder then upload the non-matching stuff, and send an email to info@ listing what you uploaded.
04:03 🔗 JesseW This may be relevant, also: https://archive.org/details/splinder-alternatives
04:03 🔗 xmc yea
04:03 🔗 xmc i'm working up a python script to do spot-checks of warcs
04:05 🔗 FalconK has quit IRC (Remote host closed the connection)
04:06 🔗 JesseW won't you need to do a comprehensive check, rather than spot checks, eventually?
04:09 🔗 xmc statistical sampling is too much work i'll probably just check them all anyway
04:11 🔗 DFJustin I've got some of that kicking around too
04:22 🔗 DoomTay Md5 hash comparisons?
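[Editor's sketch] The MD5 comparison DoomTay raises could look roughly like the following. This is a minimal sketch, not xmc's actual checker: the function names `md5_of_file` and `find_unmatched` are hypothetical, and it assumes you have already collected the MD5 values of the files that are on IA (for example from an item's file listing) into a plain list.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading in chunks so large WARCs are not loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_unmatched(local_paths, remote_md5s):
    """Return the local paths whose MD5 does not appear among the remote MD5s.

    remote_md5s is an assumption: any iterable of hex digests gathered elsewhere.
    """
    remote = set(remote_md5s)
    return [path for path in local_paths if md5_of_file(path) not in remote]
```

Anything returned by `find_unmatched` would be a candidate for re-upload. A hash match only proves that an identical file exists, not that it was indexed into the Wayback Machine.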
04:50 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
04:56 🔗 Sk1d has joined #archiveteam
04:56 🔗 Sk1d has quit IRC (Connection closed)
04:58 🔗 Sk1d has joined #archiveteam
05:01 🔗 xmc there, i have a functional warc checker :)
05:05 🔗 xmc gosh python is easy
05:05 🔗 JesseW btw, arto is still the ArchiveTeam's Choice project on the warrior -- it should probably get moved over to Urlteam or Google Code.
05:06 🔗 JesseW unless we're going to grab more from arto...
05:13 🔗 DoomTay I thought arto shut down completely by now
05:15 🔗 FalconK has joined #archiveteam
05:16 🔗 xmc hrmph
05:16 🔗 xmc wayback availability api ignores 302s
05:16 🔗 xmc a new wrinkle!
05:17 🔗 * FalconK yawn
05:17 🔗 FalconK hmm
05:17 🔗 xmc sup FalconK
05:17 🔗 FalconK do you know if it behaves correctly on 503?
05:17 🔗 FalconK not much
05:17 🔗 JesseW cdx access should work
05:17 🔗 FalconK finally got around to updating my server in calgary for the first time in a couple months
05:18 🔗 FalconK ironed out ipv6 support wrinkles now that the config syntax supports it properly in gentoo
05:18 🔗 xmc JesseW: hm, ok, thx
05:18 🔗 DoomTay My experience with availability API says that doing too much in one sitting will yield a 503
05:18 🔗 DoomTay With the API itself
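[Editor's sketch] One common way to cope with the 503s described above is to retry with exponential backoff. The helper below is a hypothetical illustration, not something anyone in the channel is known to use: `call_with_backoff`, its parameters, and the delay schedule are all assumptions. `fetch` is any zero-argument callable that performs the request and raises `urllib.error.HTTPError` on failure.

```python
import time
from urllib.error import HTTPError


def call_with_backoff(fetch, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on HTTP 503 wait and retry, doubling the delay each time.

    Any other HTTP error, or a 503 on the final attempt, is re-raised.
    The sleep argument is injectable so the logic can be tested without waiting.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except HTTPError as error:
            if error.code != 503 or attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

With the defaults this waits 1, 2, 4, and 8 seconds between the five attempts. Whether that is gentle enough for the availability API is not something the log establishes.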
05:18 🔗 FalconK I'm still not sure if anything I've uploaded to IA actually gets into the wayback machine :/
05:19 🔗 xmc bleh, i don't want to do a stupid parser
05:19 🔗 DoomTay That's why all my archiving efforts from the past few days have been through the "save" link. As direct as it gets
05:19 🔗 Lord_Nigh http://ifixit.org/blog/8210/rossmann-repair-legal-threat/ <- have we archived all those videos mentioned there yet?
05:19 🔗 xmc FalconK: pick a warc that should have been uploaded, zless it, pick a url, then go to web-beta.archive.org and it should list the warc filename up in the header
05:19 🔗 JesseW xmc: I wrote a stupid parser already: https://bitbucket.org/jesseweinstein/sundry-python-stuff/src/07dee229358685750af20be860028f80a0485541/wayback_cdx.py?fileviewer=file-view-default
05:20 🔗 JesseW feel free to use it
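[Editor's sketch] For readers who want a feel for what CDX access involves, here is a small sketch that is independent of JesseW's linked parser. It assumes the public Wayback CDX endpoint with `output=json`, where the first row of the response names the columns. The function names are hypothetical. Because each capture carries its own `statuscode`, redirects such as the 302s xmc mentions can be counted or ignored explicitly.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the public Wayback CDX server.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def parse_cdx_json(rows):
    """Turn CDX JSON output (header row, then capture rows) into a list of dicts."""
    if not rows:
        return []
    header, *captures = rows
    return [dict(zip(header, row)) for row in captures]


def fetch_captures(url, limit=50):
    """Query the CDX server for captures of url (performs a network request)."""
    query = urlencode({"url": url, "output": "json", "limit": limit})
    with urlopen(f"{CDX_ENDPOINT}?{query}", timeout=30) as response:
        body = response.read().decode("utf-8")
    return parse_cdx_json(json.loads(body)) if body.strip() else []


def has_capture(captures, include_redirects=True):
    """Report whether any capture has a 2xx status, optionally counting 3xx too."""
    for capture in captures:
        status = capture.get("statuscode", "")
        if status.startswith("2") or (include_redirects and status.startswith("3")):
            return True
    return False
```

A checker could call `has_capture(fetch_captures(some_url))` for each URL sampled from a WARC.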
05:20 🔗 FalconK xmc: good idea!
05:20 🔗 Lord_Nigh sad that apple defeats a repair/hacking protection bill in new york, and immediately c&ds a person showing board level repairs based in ny
05:20 🔗 FalconK xmc: I'll have to give it a shot once I'm done verifying this server is working right
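[Editor's sketch] The "zless it, pick a url" step xmc suggests can be approximated in Python by scanning a gzipped WARC for `WARC-Target-URI` header lines. This is a rough sketch with a hypothetical function name. It does not parse records properly, so a payload line that happens to start with the same text would be reported too.

```python
import gzip


def iter_target_uris(warc_path, limit=None):
    """Yield the WARC-Target-URI values found in a .warc.gz file, in file order.

    Naive line scan: no record parsing, so treat the output as candidates
    for a manual spot check rather than an exact index.
    """
    found = 0
    with gzip.open(warc_path, "rb") as stream:
        for line in stream:
            if line.startswith(b"WARC-Target-URI:"):
                value = line.split(b":", 1)[1].strip().decode("utf-8", "replace")
                # Some older WARC writers wrap the URI in angle brackets.
                yield value.strip("<>")
                found += 1
                if limit is not None and found >= limit:
                    return
```

Each yielded URL can then be looked up in the Wayback Machine by hand, as described above.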
05:20 🔗 JesseW Lord_Nigh: yes
05:20 🔗 xmc k
05:20 🔗 DoomTay I've heard people say that it might not have been apple
05:20 🔗 Lord_Nigh copyright holder of schematics?
05:20 🔗 Lord_Nigh foxconn?
05:20 🔗 DoomTay I also heard that it's possible that it's really because he was showing schematics, not because of the repair videos in and of themselves
05:21 🔗 Lord_Nigh the showing of schematic i'd think falls under fair use
05:21 🔗 Lord_Nigh he's not making them available for download i don't think
05:23 🔗 yipdw FalconK: I can confirm https://wayback-beta.archive.org/web/20160320233156/http://archive.usfirst.org/aboutus/first-honors-michael-bloomberg-will.i.am-and-diana-lee-guzman-for-advancing-stem-education made it
05:24 🔗 yipdw which seems to correspond to http://archive.fart.website/archivebot/viewer/job/7avn9
05:24 🔗 FalconK ok
05:25 🔗 FalconK I guess it just takes a few days then
05:25 🔗 yipdw maybe, that or we have overlapping captures
05:25 🔗 FalconK brb testing client config
05:25 🔗 yipdw it's hard to tell, there's so many inputs to wayback
05:25 🔗 FalconK has quit IRC (Quit: WeeChat 1.5)
05:25 🔗 FalconK has joined #archiveteam
05:25 🔗 xmc yes, fos usually batches daily or so, whenever it hits a size limit
05:25 🔗 xmc and then
05:25 🔗 xmc ¯\_(ツ)_/¯
05:25 🔗 FalconK back I am.
05:26 🔗 FalconK xmc: I'm uploading them not through fos
05:26 🔗 xmc ah, well then
05:26 🔗 xmc right
05:26 🔗 FalconK so that I can still get things done even when people hammer fos for no reason with their incorrect configs
05:26 🔗 FalconK 200 parallel rsyncs!
05:26 🔗 FalconK </3
05:26 🔗 FalconK and then all our pipelines filled up
05:26 🔗 FalconK single points of failure :(
05:27 🔗 FalconK oh, we do have a slight problem though
05:27 🔗 dashcloud has quit IRC (Read error: Operation timed out)
05:27 🔗 FalconK occasionally there are transfer errors and the archive checksum fails, probably due to the extremely large file size
05:27 🔗 xmc FalconK: can you point to one of your items again? i'm curious
05:28 🔗 FalconK xmc: sure hang on
05:28 🔗 yipdw well
05:28 🔗 yipdw something else that's interesting is that http://archive.fart.website/archivebot/viewer/items/ doesn't list any falconk items beyond, uh, 3/2016
05:28 🔗 yipdw perhaps something changed
05:30 🔗 FalconK here's one
05:30 🔗 FalconK https://archive.org/details/archiveteam_archivebot_go_falconk_convos_by_20160307
05:30 🔗 yipdw hm https://web-beta.archive.org/#/explore/https://convos.by/
05:31 🔗 dashcloud has joined #archiveteam
05:32 🔗 xmc ok i guess i won't be finishing this warc checker tonight
05:32 🔗 FalconK so there are three captures on that day
05:32 🔗 FalconK it may have been a poor example
05:33 🔗 FalconK https://archive.org/details/archiveteam_archivebot_go_falconk_github_com_20160309
05:34 🔗 yipdw https://wayback-beta.archive.org/web/20160309021736/https://github.com/fail0verflow/ps4-linux/archive/ps4.zip seems to work
05:34 🔗 xmc my checker so far if you're interested https://github.com/ArchiveTeam/warc-checker
05:34 🔗 xmc DFJustin: ^
05:35 🔗 FalconK how did you get that URL out of it? the warc cdx is awkward to use.
05:35 🔗 yipdw https://ia800207.us.archive.org/9/items/archiveteam_archivebot_go_20160311120001/github.com-shallow-20160309-030433-duz6a.json
05:35 🔗 yipdw :P
05:35 🔗 yipdw i'm cheating heh
05:36 🔗 yipdw er
05:36 🔗 yipdw wait a second
05:36 🔗 yipdw that is the wrong pack
05:36 🔗 yipdw shit
05:36 🔗 FalconK aah
05:36 🔗 FalconK https://github.com/fail0verflow/ps4-linux/archive/ps4.zip is in that pack though
05:36 🔗 yipdw well it may not be the wrong pack per se
05:37 🔗 yipdw there's nothing to force JSON and WARC files to be uploaded in the same pack
05:37 🔗 yipdw in fact, none of your packs will have the JSON files -- those are still uploaded via fos in the RsyncUpload step
05:37 🔗 yipdw so just because I'm using the JSON from a go pack doesn't mean that your WARCs aren't being used
05:38 🔗 yipdw it looks like your uploads in the archivebot collection are being used, though
05:38 🔗 FalconK mm
05:39 🔗 FalconK it doesn't show pictures of what is in my packs but
05:39 🔗 yipdw there was a screenshotter error earlier on, I think that's been resolved
05:39 🔗 FalconK oh
05:39 🔗 FalconK ok, that makes sense
05:39 🔗 yipdw maybe I'm just missing it, but I don't see the WARC file as an HTTP header or whatnot
05:39 🔗 yipdw it'd be neat to have that information if it's convenient
05:39 🔗 FalconK and yeah, usually the json file and the last warc are uploaded to fox
05:39 🔗 FalconK to fos
05:40 🔗 FalconK since they are not uploaded by the uploader
05:40 🔗 yipdw the last warc should make it via the uploader
05:40 🔗 yipdw sometimes it doesn't if the process stalls
05:40 🔗 FalconK I suggest that they should be, for better encapsulation, though doing so might make it take a moment for the job to show that it has cleared
05:41 🔗 FalconK I definitely saw it uploading something large to fos synchronously in the pipeline
05:41 🔗 yipdw yeah
05:41 🔗 FalconK but it's been a while since I looked honestly
05:41 🔗 yipdw that can happen if wpull is killed via SIGKILL or whatever
05:41 🔗 yipdw and the WARC remains in the data directory
05:41 🔗 FalconK the job json file is very small and must go to fos
05:41 🔗 FalconK oh
05:41 🔗 yipdw the RsyncUpload step just uploads everything at that point
05:41 🔗 FalconK yeah, ok, that makes sense
05:41 🔗 FalconK oh, that's why it is uploading the logs and such
05:41 🔗 yipdw however on normal termination wpull will move the WARC to the given directory and the uploader kicks in
05:42 🔗 FalconK perhaps it ought to be patched to move everything but the job completion json file to the directory beforehand
05:42 🔗 FalconK there are two json files, right?
05:42 🔗 yipdw one
05:42 🔗 FalconK oh, hmm.
05:42 🔗 yipdw and yes, that RsyncUpload step is an artifact
05:42 🔗 yipdw I guess we can #-bs this
05:42 🔗 FalconK kk
06:21 🔗 tomwsmf-a has quit IRC (Ping timeout: 258 seconds)
06:21 🔗 tomwsmf-a has joined #archiveteam
07:08 🔗 DoomTay has quit IRC (Quit: Page closed)
07:35 🔗 JesseW has quit IRC (Read error: Operation timed out)
07:50 🔗 tomwsmf-a has quit IRC (Read error: Operation timed out)
07:58 🔗 metalcamp has joined #archiveteam
08:03 🔗 metal_cam has joined #archiveteam
08:05 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
08:10 🔗 metal_cam has quit IRC (Ping timeout: 244 seconds)
08:11 🔗 metalcamp has joined #archiveteam
08:13 🔗 metal_cam has joined #archiveteam
08:16 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
08:25 🔗 Tomcat_ has joined #archiveteam
08:29 🔗 metal_cam is now known as metalcamp
08:33 🔗 philpem has joined #archiveteam
08:45 🔗 Stilett0 has quit IRC (Read error: Connection reset by peer)
08:45 🔗 Stiletto has joined #archiveteam
09:42 🔗 BartoCH has quit IRC (Read error: Connection reset by peer)
09:51 🔗 BartoCH has joined #archiveteam
10:09 🔗 Tomcat_ has quit IRC (Remote host closed the connection)
10:43 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
10:45 🔗 metalcamp has joined #archiveteam
10:52 🔗 ris has joined #archiveteam
11:01 🔗 jmad980 has quit IRC (Ping timeout: 246 seconds)
11:16 🔗 jmad980 has joined #archiveteam
11:44 🔗 dashcloud has quit IRC (Read error: Operation timed out)
11:47 🔗 dashcloud has joined #archiveteam
11:58 🔗 z00nx has quit IRC (Ping timeout: 244 seconds)
12:11 🔗 Fake-Name has quit IRC (Ping timeout: 260 seconds)
12:28 🔗 signius has quit IRC (Ping timeout: 260 seconds)
12:30 🔗 WinterFox has quit IRC (Read error: Operation timed out)
12:33 🔗 Tomcat_ has joined #archiveteam
12:34 🔗 signius has joined #archiveteam
12:37 🔗 z00nx has joined #archiveteam
12:41 🔗 z00nx has quit IRC (Ping timeout: 244 seconds)
13:09 🔗 dashcloud has quit IRC (Read error: Operation timed out)
13:13 🔗 dashcloud has joined #archiveteam
13:15 🔗 BlueMaxim has quit IRC (Quit: Leaving)
13:20 🔗 ndiddy has quit IRC (Read error: Connection reset by peer)
13:25 🔗 ris has quit IRC ()
13:25 🔗 z00nx has joined #archiveteam
13:30 🔗 z00nx has quit IRC (Ping timeout: 244 seconds)
13:41 🔗 kristian_ has joined #archiveteam
13:49 🔗 z00nx has joined #archiveteam
13:54 🔗 VADemon has quit IRC (Read error: Connection reset by peer)
13:59 🔗 z00nx has quit IRC (Ping timeout: 244 seconds)
14:16 🔗 dashcloud has quit IRC (Read error: Operation timed out)
14:19 🔗 dashcloud has joined #archiveteam
14:32 🔗 atrocity has quit IRC (Ping timeout: 272 seconds)
14:39 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
14:41 🔗 metalcamp has joined #archiveteam
14:50 🔗 kristian_ has quit IRC (Leaving)
15:44 🔗 z00nx has joined #archiveteam
15:44 🔗 arkiver2 has joined #archiveteam
15:44 🔗 swebb sets mode: +o arkiver2
15:49 🔗 z00nx has quit IRC (Ping timeout: 244 seconds)
16:12 🔗 dashcloud has quit IRC (Read error: Operation timed out)
16:15 🔗 dashcloud has joined #archiveteam
16:23 🔗 DoomTay has joined #archiveteam
16:26 🔗 JesseW has joined #archiveteam
17:01 🔗 jmad980 has quit IRC (Ping timeout: 246 seconds)
17:11 🔗 jmad980 has joined #archiveteam
17:20 🔗 dashcloud has quit IRC (Read error: Operation timed out)
17:24 🔗 dashcloud has joined #archiveteam
17:31 🔗 arkiver2 has quit IRC (Ping timeout: 244 seconds)
18:11 🔗 gibigiana has quit IRC (Ping timeout: 499 seconds)
18:13 🔗 Fake-Name has joined #archiveteam
18:16 🔗 gibigiana has joined #archiveteam
18:45 🔗 nertzy has joined #archiveteam
19:06 🔗 tomwsmf-a has joined #archiveteam
19:17 🔗 vitzli has quit IRC (Quit: Leaving)
19:29 🔗 kutas has joined #archiveteam
19:30 🔗 kutas has quit IRC (Client Quit)
19:34 🔗 Tomcat_ has quit IRC (Remote host closed the connection)
19:40 🔗 robink has joined #archiveteam
19:42 🔗 robink has quit IRC (Read error: Connection reset by peer)
19:43 🔗 closure has joined #archiveteam
20:03 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
20:04 🔗 metalcamp has quit IRC (Read error: Connection reset by peer)
20:16 🔗 arkiver thomas project is running
20:16 🔗 arkiver what are the three scout repo's about?
20:17 🔗 HCross joepie91 is doing something or other
20:18 🔗 arkiver oh nice
20:20 🔗 DoomTay has quit IRC (Ping timeout: 268 seconds)
20:20 🔗 joepie91 yep, mine
20:20 🔗 joepie91 :P
20:23 🔗 Frogging arkiver: need more hosts on there?
20:23 🔗 arkiver on thomas?
20:23 🔗 Frogging yeah
20:23 🔗 arkiver nah, it can't handle anymore unfortunately
20:23 🔗 Frogging kk
20:34 🔗 joepie91 (thomas project?)
20:37 🔗 arkiver joepie91: http://thomas.loc.gov/home/thomas.php
20:43 🔗 j08nY has joined #archiveteam
20:49 🔗 joepie91 mmm
20:58 🔗 DoomTay has joined #archiveteam
21:17 🔗 arkiver chfoo: can you please create a target on FOS for dnshistory?
21:19 🔗 chfoo arkiver, ok done
21:19 🔗 arkiver thanks!
21:35 🔗 ring has quit IRC (Ping timeout: 260 seconds)
21:43 🔗 ring has joined #archiveteam
22:06 🔗 JesseW has joined #archiveteam
22:41 🔗 fie_ has quit IRC (Read error: Connection reset by peer)
23:08 🔗 Ymgve has quit IRC (Ping timeout: 506 seconds)
23:12 🔗 Ymgve has joined #archiveteam
23:57 🔗 BlueMaxim has joined #archiveteam
