[00:10] thomas project is tested and working.
[00:10] I'm off to bed now, project will be started in the morning
[00:10] good night!
[00:16] *** BlueMaxim has joined #archiveteam
[00:23] goodnight!
[00:51] *** WinterFox has joined #archiveteam
[00:52] *** WinterFox has quit IRC (Client Quit)
[00:52] *** WinterFox has joined #archiveteam
[01:21] *** wyatt8740 has quit IRC (Ping timeout: 250 seconds)
[01:30] !ao https://www.youtube.com/watch?v=d9gGYGbjMqM --youtube-dl
[01:31] its in archivebot channel now
[01:33] *** JesseW has joined #archiveteam
[01:54] *** VADemon has joined #archiveteam
[02:04] *** xXx_ndidd has joined #archiveteam
[02:07] *** ndiddy has quit IRC (Ping timeout: 244 seconds)
[02:08] *** philpem has quit IRC (Ping timeout: 260 seconds)
[02:08] *** DoomTay has joined #archiveteam
[02:32] *** vitzli has joined #archiveteam
[03:36] spot check of my splinder data shows some warcs that don't seem to have made it to IA
[03:36] what should i do?
[03:36] i know it's been a long time ...
[03:36] probably a few gigs only, but they're all mixed in with warcs that are in wayback
[03:39] *** MMovie has quit IRC (Read error: Connection reset by peer)
[03:47] *** xXx_ndidd is now known as ndiddy
[04:02] xmc: you probably know this already, but ... write something to compare them against https://archive.org/details/archiveteam-splinder then upload the non-matching stuff, and send an email to info@ listing what you uploaded.
[04:03] This may be relevant, also: https://archive.org/details/splinder-alternatives
[04:03] yea
[04:03] i'm working up a python script to do spot-checks of warcs
[04:05] *** FalconK has quit IRC (Remote host closed the connection)
[04:06] won't you need to do a comprehensive check, rather than spot checks, eventually?
[04:09] statistical sampling is too much work i'll probably just check them all anyway
[04:11] I've got some of that kicking around too
[04:22] Md5 hash comparisons?
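(Editor's sketch, not from the channel: one way the MD5 comparison suggested above could look. It assumes the set of MD5 hashes for the already-uploaded files has been collected separately, for example from the item's file metadata. The function names and the "*.warc.gz" pattern are hypothetical. It only detects byte-identical files, so if the uploaded WARCs were repacked, a per-record check would be needed instead.)

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_unmatched(local_dir, remote_md5s, pattern="*.warc.gz"):
    """List local files whose MD5 is absent from remote_md5s.

    remote_md5s is an iterable of hex digests gathered elsewhere (assumption);
    the comparison is case-insensitive.
    """
    known = {value.lower() for value in remote_md5s}
    return sorted(
        path
        for path in Path(local_dir).rglob(pattern)
        if md5_of_file(path) not in known
    )
```

(Anything returned by find_unmatched would be a candidate for upload, in line with the "upload the non-matching stuff" advice above.)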
[04:50] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[04:56] *** Sk1d has joined #archiveteam
[04:56] *** Sk1d has quit IRC (Connection closed)
[04:58] *** Sk1d has joined #archiveteam
[05:01] there, i have a functional warc checker :)
[05:05] gosh python is easy
[05:05] btw, arto is still the ArchiveTeam's Choice project on the warrior -- it should probably get moved over to Urlteam or Google Code.
[05:06] unless we're going to grab more from arto...
[05:13] I thought arto shut down completely by now
[05:15] *** FalconK has joined #archiveteam
[05:16] hrmph
[05:16] wayback availability api ignores 302s
[05:16] a new wrinkle!
[05:17] * FalconK yawn
[05:17] hmm
[05:17] sup FalconK
[05:17] do you know if it behaves correctly on 503?
[05:17] not much
[05:17] cdx access should work
[05:17] finally got around to updating my server in calgary for the first time in a couple months
[05:18] ironed out ipv6 support wrinkles now that the config syntax supports it properly in gentoo
[05:18] JesseW: hm, ok, thx
[05:18] My experience with availability API says that doing too much in one sitting will yield a 503
[05:18] With the API itself
[05:18] I'm still not sure if anything I've uploaded to IA actually gets into the wayback machine :/
[05:19] bleh, i don't want to do a stupid parser
[05:19] That's why all my archiving efforts from the past few days have been through the "save" link. As direct as it gets
[05:19] http://ifixit.org/blog/8210/rossmann-repair-legal-threat/ <- have we archived all those videos mentioned there yet?
[05:19] FalconK: pick a warc that should have been uploaded, zless it, pick a url, then go to web-beta.archive.org and it should list the warc filename up in the header
[05:19] xmc: I wrote a stupid parser already: https://bitbucket.org/jesseweinstein/sundry-python-stuff/src/07dee229358685750af20be860028f80a0485541/wayback_cdx.py?fileviewer=file-view-default
[05:20] feel free to use it
[05:20] xmc: good idea!
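(Editor's sketch, not from the channel and not the wayback_cdx.py linked above: a minimal parser for plain-text CDX output, assuming the usual seven space-separated fields urlkey, timestamp, original, mimetype, statuscode, digest, length. The names Capture, parse_cdx and has_capture are hypothetical. Because the status code is kept, redirect captures such as 302s stay visible to the caller.)

```python
from typing import NamedTuple


class Capture(NamedTuple):
    """One CDX row; every field is kept as the string found in the line."""
    urlkey: str
    timestamp: str
    original: str
    mimetype: str
    statuscode: str
    digest: str
    length: str


def parse_cdx(text):
    """Parse space-separated CDX lines into Capture tuples.

    Lines that do not have exactly seven fields are skipped.
    """
    captures = []
    for line in text.splitlines():
        parts = line.split(" ")
        if len(parts) == len(Capture._fields):
            captures.append(Capture(*parts))
    return captures


def has_capture(captures, statuses=("200", "301", "302")):
    """Return True if any capture has one of the accepted status codes."""
    return any(capture.statuscode in statuses for capture in captures)
```

(Passing statuses=("200",) would count only direct captures and ignore redirects.)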
[05:20] sad that apple defeats a repair/hacking protection bill in new york, and immediately c&ds a person showing board level repairs based in ny
[05:20] xmc: I'll have to give it a shot once I'm done verifying this server is working right
[05:20] Lord_Nigh: yes
[05:20] k
[05:20] I've heard people say that it might not have been apple
[05:20] copyright holder of schematics?
[05:20] foxconn?
[05:20] I also heard that it's possible that it's really because he was showing schematics, not because of the repair videos in and of themselves
[05:21] the showing of schematic i'd think falls under fair use
[05:21] he's not making them available for download i don't think
[05:23] FalconK: I can confirm https://wayback-beta.archive.org/web/20160320233156/http://archive.usfirst.org/aboutus/first-honors-michael-bloomberg-will.i.am-and-diana-lee-guzman-for-advancing-stem-education made it
[05:24] which seems to correspond to http://archive.fart.website/archivebot/viewer/job/7avn9
[05:24] ok
[05:25] I guess it just takes a few days then
[05:25] maybe, that or we have overlapping captures
[05:25] brb testing client config
[05:25] it's hard to tell, there's so many inputs to wayback
[05:25] *** FalconK has quit IRC (Quit: WeeChat 1.5)
[05:25] *** FalconK has joined #archiveteam
[05:25] yes, fos usually batches daily or so, whenever it hits a size limit
[05:25] and then
[05:25] ¯\_(ツ)_/¯
[05:25] back I am.
[05:26] xmc: I'm uploading them not through fos
[05:26] ah, well then
[05:26] right
[05:26] so that I can still get things done even when people hammer fos for no reason with their incorrect configs
[05:26] 200 parallel rsyncs!
[05:26] and then all our pipelines filled up
[05:26] single points of failure :(
[05:27] oh, we do have a slight problem though
[05:27] *** dashcloud has quit IRC (Read error: Operation timed out)
[05:27] occasionally there are transfer errors and the archive checksum fails, probably due to the extremely large file size
[05:27] FalconK: can you point to one of your items again? i'm curious
[05:28] xmc: sure hang on
[05:28] well
[05:28] something else that's interesting is that http://archive.fart.website/archivebot/viewer/items/ doesn't list any falconk items beyond, uh, 3/2016
[05:28] perhaps something changed
[05:30] here's one
[05:30] https://archive.org/details/archiveteam_archivebot_go_falconk_convos_by_20160307
[05:30] hm https://web-beta.archive.org/#/explore/https://convos.by/
[05:31] *** dashcloud has joined #archiveteam
[05:32] ok i guess i won't be finishing this warc checker tonight
[05:32] so there are three captures on that day
[05:32] it may have been a poor example
[05:33] https://archive.org/details/archiveteam_archivebot_go_falconk_github_com_20160309
[05:34] https://wayback-beta.archive.org/web/20160309021736/https://github.com/fail0verflow/ps4-linux/archive/ps4.zip seems to work
[05:34] my checker so far if you're interested https://github.com/ArchiveTeam/warc-checker
[05:34] DFJustin: ^
[05:35] how did you get that URL out of it? the warc cdx is awkward to use.
[05:35] https://ia800207.us.archive.org/9/items/archiveteam_archivebot_go_20160311120001/github.com-shallow-20160309-030433-duz6a.json
[05:35] :P
[05:35] i'm cheating heh
[05:36] er
[05:36] wait a second
[05:36] that is the wrong pack
[05:36] shit
[05:36] aah
[05:36] https://github.com/fail0verflow/ps4-linux/archive/ps4.zip is in that pack though
[05:36] well it may not be the wrong pack per se
[05:37] there's nothing to force JSON and WARC files to be uploaded in the same pack
[05:37] in fact, none of your packs will have the JSON files -- those are still uploaded via fos in the RsyncUpload step
[05:37] so just because I'm using the JSON from a go pack doesn't mean that your WARCs aren't being used
[05:38] it looks like your uploads in the archivebot collection are being used, though
[05:38] mm
[05:39] it doesn't show pictures of what is in my packs but
[05:39] there was a screenshotter error earlier on, I think that's been resolved
[05:39] oh
[05:39] ok, that makes sense
[05:39] maybe I'm just missing it, but I don't see the WARC file as an HTTP header or whatnot
[05:39] it'd be neat to have that information if it's convenient
[05:39] and yeah, usually the json file and the last warc are uploaded to fox
[05:39] to fos
[05:40] since they are not uploaded by the uploader
[05:40] the last warc should make it via the uploader
[05:40] sometimes it doesn't if the process stalls
[05:40] I suggest that they should be, for better encapsulation, though doing so might make it take a moment for the job to show that it has cleared
[05:41] I definitely saw it uploading something large to fos synchronously in the pipeline
[05:41] yeah
[05:41] but it's been a while since I looked honestly
[05:41] that can happen if wpull is killed via SIGKILL or whatever
[05:41] and the WARC remains in the data directory
[05:41] the job json file is very small and must go to fox
[05:41] oh
[05:41] the RsyncUpload step just uploads everything at that point
[05:41] yeah, ok, that makes sense
[05:41] oh, that's why it is uploading the logs and such
[05:41] however on normal termination wpull will move the WARC to the given directory and the uploader kicks in
[05:42] perhaps it ought to be patched to move everything but the job completion json file to the directory beforehand
[05:42] there are two json files, right?
[05:42] one
[05:42] oh, hmm.
[05:42] and yes, that RsyncUpload step is an artifact
[05:42] I guess we can #-bs this
[05:42] kk
[06:21] *** tomwsmf-a has quit IRC (Ping timeout: 258 seconds)
[06:21] *** tomwsmf-a has joined #archiveteam
[07:08] *** DoomTay has quit IRC (Quit: Page closed)
[07:35] *** JesseW has quit IRC (Read error: Operation timed out)
[07:50] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[07:58] *** metalcamp has joined #archiveteam
[08:03] *** metal_cam has joined #archiveteam
[08:05] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[08:10] *** metal_cam has quit IRC (Ping timeout: 244 seconds)
[08:11] *** metalcamp has joined #archiveteam
[08:13] *** metal_cam has joined #archiveteam
[08:16] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[08:25] *** Tomcat_ has joined #archiveteam
[08:29] *** metal_cam is now known as metalcamp
[08:33] *** philpem has joined #archiveteam
[08:45] *** Stilett0 has quit IRC (Read error: Connection reset by peer)
[08:45] *** Stiletto has joined #archiveteam
[09:42] *** BartoCH has quit IRC (Read error: Connection reset by peer)
[09:51] *** BartoCH has joined #archiveteam
[10:09] *** Tomcat_ has quit IRC (Remote host closed the connection)
[10:43] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[10:45] *** metalcamp has joined #archiveteam
[10:52] *** ris has joined #archiveteam
[11:01] *** jmad980 has quit IRC (Ping timeout: 246 seconds)
[11:16] *** jmad980 has joined #archiveteam
[11:44] *** dashcloud has quit IRC (Read error: Operation timed out)
[11:47] *** dashcloud has joined #archiveteam
[11:58] *** z00nx has quit IRC (Ping timeout: 244 seconds)
[12:11] *** Fake-Name has quit IRC (Ping timeout: 260 seconds)
[12:28] *** signius has quit IRC (Ping timeout: 260 seconds)
[12:30] *** WinterFox has quit IRC (Read error: Operation timed out)
[12:33] *** Tomcat_ has joined #archiveteam
[12:34] *** signius has joined #archiveteam
[12:37] *** z00nx has joined #archiveteam
[12:41] *** z00nx has quit IRC (Ping timeout: 244 seconds)
[13:09] *** dashcloud has quit IRC (Read error: Operation timed out)
[13:13] *** dashcloud has joined #archiveteam
[13:15] *** BlueMaxim has quit IRC (Quit: Leaving)
[13:20] *** ndiddy has quit IRC (Read error: Connection reset by peer)
[13:25] *** ris has quit IRC ()
[13:25] *** z00nx has joined #archiveteam
[13:30] *** z00nx has quit IRC (Ping timeout: 244 seconds)
[13:41] *** kristian_ has joined #archiveteam
[13:49] *** z00nx has joined #archiveteam
[13:54] *** VADemon has quit IRC (Read error: Connection reset by peer)
[13:59] *** z00nx has quit IRC (Ping timeout: 244 seconds)
[14:16] *** dashcloud has quit IRC (Read error: Operation timed out)
[14:19] *** dashcloud has joined #archiveteam
[14:32] *** atrocity has quit IRC (Ping timeout: 272 seconds)
[14:39] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[14:41] *** metalcamp has joined #archiveteam
[14:50] *** kristian_ has quit IRC (Leaving)
[15:44] *** z00nx has joined #archiveteam
[15:44] *** arkiver2 has joined #archiveteam
[15:44] *** swebb sets mode: +o arkiver2
[15:49] *** z00nx has quit IRC (Ping timeout: 244 seconds)
[16:12] *** dashcloud has quit IRC (Read error: Operation timed out)
[16:15] *** dashcloud has joined #archiveteam
[16:23] *** DoomTay has joined #archiveteam
[16:26] *** JesseW has joined #archiveteam
[17:01] *** jmad980 has quit IRC (Ping timeout: 246 seconds)
[17:11] *** jmad980 has joined #archiveteam
[17:20] *** dashcloud has quit IRC (Read error: Operation timed out)
[17:24] *** dashcloud has joined #archiveteam
[17:31] *** arkiver2 has quit IRC (Ping timeout: 244 seconds)
[18:11] *** gibigiana has quit IRC (Ping timeout: 499 seconds)
[18:13] *** Fake-Name has joined #archiveteam
[18:16] *** gibigiana has joined #archiveteam
[18:45] *** nertzy has joined #archiveteam
[19:06] *** tomwsmf-a has joined #archiveteam
[19:17] *** vitzli has quit IRC (Quit: Leaving)
[19:29] *** kutas has joined #archiveteam
[19:30] *** kutas has quit IRC (Client Quit)
[19:34] *** Tomcat_ has quit IRC (Remote host closed the connection)
[19:40] *** robink has joined #archiveteam
[19:42] *** robink has quit IRC (Read error: Connection reset by peer)
[19:43] *** closure has joined #archiveteam
[20:03] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[20:04] *** metalcamp has quit IRC (Read error: Connection reset by peer)
[20:16] thomas project is running
[20:16] what are the three scout repo's about?
[20:17] joepie91 is doing something or other
[20:18] oh nice
[20:20] *** DoomTay has quit IRC (Ping timeout: 268 seconds)
[20:20] yep, mine
[20:20] :P
[20:23] arkiver: need more hosts on there?
[20:23] on thomas?
[20:23] yeah
[20:23] nah, it can't handle anymore unfortunately
[20:23] kk
[20:34] (thomas project?)
[20:37] joepie91: http://thomas.loc.gov/home/thomas.php
[20:43] *** j08nY has joined #archiveteam
[20:49] mmm
[20:58] *** DoomTay has joined #archiveteam
[21:17] chfoo: can you please create a target on FOS for dnshistory?
[21:19] arkiver, ok done
[21:19] thanks!
[21:35] *** ring has quit IRC (Ping timeout: 260 seconds)
[21:43] *** ring has joined #archiveteam
[22:06] *** JesseW has joined #archiveteam
[22:41] *** fie_ has quit IRC (Read error: Connection reset by peer)
[23:08] *** Ymgve has quit IRC (Ping timeout: 506 seconds)
[23:12] *** Ymgve has joined #archiveteam
[23:57] *** BlueMaxim has joined #archiveteam