[00:19] *** chazchaz_ has joined #archiveteam-bs
[00:28] *** BlueMaxim has joined #archiveteam-bs
[00:32] *** drumstick has quit IRC (Ping timeout: 248 seconds)
[00:49] *** icedice2 has quit IRC (Read error: Connection reset by peer)
[01:01] *** pizzaiolo has quit IRC (pizzaiolo)
[01:02] *** pizzaiolo has joined #archiveteam-bs
[01:06] *** pizzaiolo has quit IRC (Client Quit)
[01:06] *** pizzaiolo has joined #archiveteam-bs
[01:11] *** pizzaiolo has quit IRC (Client Quit)
[01:12] *** pizzaiolo has joined #archiveteam-bs
[01:14] *** pizzaiolo has quit IRC (Client Quit)
[01:14] Would someone be interested in helping me verify my github collection process
[01:15] I've grabbed about 1% so far
[01:15] But I want to make sure I'm safely capturing everything
[01:17] *** pizzaiolo has joined #archiveteam-bs
[01:21] *** DFJustin has joined #archiveteam-bs
[01:21] *** swebb sets mode: +o DFJustin
[01:27] *** Mayonaise has quit IRC (Read error: Operation timed out)
[01:31] *** pizzaiolo has quit IRC (pizzaiolo)
[01:31] *** pizzaiolo has joined #archiveteam-bs
[01:36] *** pizzaiolo has quit IRC (pizzaiolo)
[01:37] *** pizzaiolo has joined #archiveteam-bs
[01:39] *** Mayonaise has joined #archiveteam-bs
[01:44] *** drumstick has joined #archiveteam-bs
[01:46] *** pizzaiolo has quit IRC (pizzaiolo)
[01:47] *** pizzaiolo has joined #archiveteam-bs
[01:51] *** pizzaiolo has quit IRC (Client Quit)
[01:52] *** pizzaiolo has joined #archiveteam-bs
[01:52] *** pizzaiolo has quit IRC (Client Quit)
[02:10] *** Asparagir has quit IRC (Asparagir)
[02:12] *** Asparagir has joined #archiveteam-bs
[02:13] *** svchfoo3 sets mode: +o Asparagir
[02:47] *** username1 has joined #archiveteam-bs
[02:51] *** drumstick has quit IRC (Ping timeout: 248 seconds)
[02:52] *** schbirid2 has quit IRC (Read error: Operation timed out)
[02:55] *** drumstick has joined #archiveteam-bs
[03:01] *** dashcloud has quit IRC (Remote host closed the connection)
[03:57] *** drumstick has quit IRC (Quit: Leaving)
[03:57] *** drumstick has joined #archiveteam-bs
[04:06] *** qw3rty10 has joined #archiveteam-bs
[04:12] *** qw3rty9 has quit IRC (Read error: Operation timed out)
[04:43] *** jspiros has quit IRC (Ping timeout: 493 seconds)
[04:54] *** jspiros has joined #archiveteam-bs
[05:02] *** Mateon1 has quit IRC (Ping timeout: 248 seconds)
[05:02] *** Mateon1 has joined #archiveteam-bs
[06:13] *** schbirid2 has joined #archiveteam-bs
[06:20] *** username1 has quit IRC (Read error: Operation timed out)
[06:33] *** Asparagir has quit IRC (Asparagir)
[07:34] *** jspiros has quit IRC (leaving)
[07:58] so i'm upload the 24 episodes from 2005 to FOS
[08:18] *** jspiros has joined #archiveteam-bs
[08:23] *** j08nY has joined #archiveteam-bs
[08:54] *** j08nY has quit IRC (Read error: Operation timed out)
[09:05] *** icedice has joined #archiveteam-bs
[09:17] *** j08nY has joined #archiveteam-bs
[09:27] *** j08nY has quit IRC (Read error: Operation timed out)
[09:34] rbraun, arkiver, astrid: Backslashes are nowadays sort-of acceptable in URLs. They're treated exactly like forward slashes, except they also trigger a parsing error (but that doesn't make the URL invalid or change its meaning; think of it as an "error"-type message in a log somewhere).
[09:34] So this is really a bug in the URL parser used by wpull.
[09:36] It's correct though that backslashes were originally invalid in URLs. Because Microsoft had to change the path delimiter on their OS, they also had to support the backslash in URLs in IE. Other browsers then had to do so as well because stupid people wrote stupid HTML containing invalid URLs.
[09:37] That's how it ended up as a valid but also erroneous alternative to forward slashes in the current URL specs.
[09:38] s/parser error/validation error/g
[09:40] See e.g. point 2 at https://url.spec.whatwg.org/#file-state
[09:42] (That specific case handles file://\path, which is identical to file:///path except for the validation error.)
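To make the WHATWG behaviour described above concrete, here is a simplified Python model of it. It is only a sketch, not a conforming URL parser: it assumes the URL Standard's list of "special" schemes, converts backslashes only before the first `?` or `#`, and reports whether a validation error would have been raised. The function name `normalize_backslashes` is hypothetical.

```python
from typing import Tuple

# Schemes the WHATWG URL Standard calls "special"; only these get backslash handling.
SPECIAL_SCHEMES = {"ftp", "file", "http", "https", "ws", "wss"}


def normalize_backslashes(url: str) -> Tuple[str, bool]:
    """Simplified model of WHATWG backslash handling (not a full URL parser).

    For special schemes, each backslash before the query/fragment is treated
    like "/" and flagged as a validation error; the URL's meaning is unchanged.
    Returns (normalized_url, had_validation_error).
    """
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() not in SPECIAL_SCHEMES:
        return url, False  # non-special schemes keep their backslashes
    # The query and fragment are not subject to this conversion.
    cut = len(rest)
    for delim in "?#":
        idx = rest.find(delim)
        if idx != -1:
            cut = min(cut, idx)
    head, tail = rest[:cut], rest[cut:]
    fixed = head.replace("\\", "/")
    return scheme + ":" + fixed + tail, fixed != head
```

For example, `normalize_backslashes("file://\\path")` yields `("file:///path", True)`: the same URL as `file:///path`, plus the validation error mentioned at 09:42.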
[10:06] *** tuluu has quit IRC (Ping timeout: 250 seconds)
[10:12] *** MadArchiv has joined #archiveteam-bs
[10:14] Quick question: how do I make it so the grabs I do through grab-site display in the Wayback Machine once I upload them on archive.org?
[10:14] You need to upload them with mediatype "web".
[10:15] Ok, thanks!
[10:20] *** tuluu has joined #archiveteam-bs
[10:30] *** icedice has quit IRC (Read error: Connection reset by peer)
[10:50] *** MadArchiv has quit IRC (Ping timeout: 246 seconds)
[10:52] *** MadArchiv has joined #archiveteam-bs
[11:07] *** drumstick has quit IRC (Read error: Operation timed out)
[11:12] *** drumstick has joined #archiveteam-bs
[11:16] JAA: By the way, setting the mediatype to "web" only works with WARC files, right?
[11:17] *** pizzaiolo has joined #archiveteam-bs
[11:19] MadArchiv: The mediatype attribute is for the entire item, not just single files, but yes, it only works for WARC (and ARC, but you should never use that nowadays).
[11:20] ("working" meaning that it does something with the Wayback Machine.)
[11:20] As far as I know, anyway.
[11:35] *** icedice has joined #archiveteam-bs
[11:35] *** icedice has quit IRC (Client Quit)
[11:36] *** j08nY has joined #archiveteam-bs
[11:46] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[12:02] JAA: the issue here is that the backslash is in the hyperlink but the /server/ considered it invalid
[12:02] i.e. their own links don't work but are easily corrected, and the directories have indexing enabled to boot
[12:12] rbraun: Ah, so the server expected that the client already converted the backslashes, I see. Not entirely sure what the correct behaviour for the client would be.
[12:13] I'm also not sure what the HTTP specs say, i.e. whether backslashes are "valid" in the path there.
[12:17] JAA: i mean, i think they just goofed
[12:17] and put the links up without correcting them
[12:20] According to RFC 7230, it's not allowed to GET \path in HTTP/1.1.
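The mediatype advice from 10:14 and 11:19 can be sketched as a small upload helper. This is a minimal sketch assuming the `internetarchive` Python package (its `upload(identifier, files=..., metadata=...)` function) and already-configured archive.org credentials. Following the discussion, `mediatype` is set once for the whole item and is only useful for WARC files, so the helper refuses anything else. The names `build_web_item` and `upload_grab` are hypothetical.

```python
# Suffixes accepted here; per the discussion, mediatype "web" only does something
# for WARC files (plain ARC also works but should not be used nowadays).
WARC_SUFFIXES = (".warc", ".warc.gz")


def build_web_item(files):
    """Check that every file is a WARC and return the item-level metadata."""
    bad = [str(f) for f in files if not str(f).endswith(WARC_SUFFIXES)]
    if bad:
        raise ValueError(f"not WARC files: {bad}")
    # mediatype applies to the entire item, not to individual files.
    return {"mediatype": "web"}


def upload_grab(identifier, files):
    """Upload a grab-site WARC set as one archive.org item.

    Requires `pip install internetarchive` and configured credentials.
    """
    from internetarchive import upload  # lazy import so build_web_item works without it

    metadata = build_web_item(files)
    return upload(identifier, files=[str(f) for f in files], metadata=metadata)
```

A call would look like `upload_grab("example-grab-item", ["example-00000.warc.gz"])`, where both the identifier and the file name are placeholders. Whether and when the item then shows up in the Wayback Machine is decided on archive.org's side.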
[12:22] RFC 7540 also just directly refers to RFC 3986, which doesn't allow backslashes in the path either.
[12:22] Interesting.
[12:22] That means that backslashes are valid in URIs but not for HTTP requests, so the client is expected to replace them.
[12:23] In other words, this is a bug in wpull or the URL library it's using.
[12:23] (My money's on wpull)
[12:24] So it's not their fault at all.
[12:24] (Well, it's really the fault of whoever thought it'd be a great idea to replace forward with backward slashes as path delimiters on Windows, but whatever.)
[12:25] s/Windows/MS-DOS/, I guess.
[13:24] *** jspiros has quit IRC (leaving)
[13:27] *** jspiros has joined #archiveteam-bs
[13:55] My tool for searching the ArchiveBot archives is online: https://github.com/JustAnotherArchivist/archivebot-archives
[13:56] The archives branch contains one YAML file per IA item. Using grep, you can find which items contain the files for a particular job.
[13:56] No web frontend yet, so it's not a real replacement for the viewer yet, but at least it works.
[13:57] I was hoping that maybe the GitHub search does the trick, but it doesn't look like it.
[14:04] Yeah, apparently GitHub only indexes the master branch. Meh.
[14:16] JAA: hmm, why is it separated into branches rather than folders?
[14:20] joepie91: Yeah, I guess I could do that. Not sure why I didn't, to be honest.
[14:20] Just seemed a bit cleaner to me. Keeping the changes of the code separate from the changes in the data.
[14:21] at that point you might just want two separate repos :P
[14:22] But the things still belong together, so having it in one repo makes more sense to me.
[14:30] Screw it, I'll collapse it to one branch.
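The conclusion at 12:22 can be checked mechanically. The point is that backslashes are tolerated in URLs but not allowed in an HTTP request target, so the client has to convert them. The sketch below encodes the RFC 7230 origin-form grammar, built from the RFC 3986 `pchar` rule, as a regular expression. It adds a hypothetical client-side `fix_request_target` helper that rewrites path backslashes the way a browser would for http(s) URLs. It illustrates the idea under those assumptions and is not wpull's actual code or fix.

```python
import re

# RFC 3986: pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
# RFC 7230: origin-form = absolute-path [ "?" query ]
#           absolute-path = 1*( "/" segment ), query = *( pchar / "/" / "?" )
ORIGIN_FORM = re.compile(r"(?:/" + PCHAR + r"*)+(?:\?(?:" + PCHAR + r"|[/?])*)?")


def is_valid_origin_form(target: str) -> bool:
    """True if `target` may appear as-is after GET in an HTTP/1.1 request line."""
    return ORIGIN_FORM.fullmatch(target) is not None


def fix_request_target(target: str) -> str:
    """Client-side fix: treat backslashes in the path as "/" before sending.

    Only the path is touched; a backslash in the query is left for the caller
    to handle (e.g. by percent-encoding it), so in that case the result can
    still fail is_valid_origin_form().
    """
    path, sep, query = target.partition("?")
    return path.replace("\\", "/") + sep + query
```

With this, `/dir\page.html` is rejected as a request target, while `fix_request_target` turns it into the valid `/dir/page.html`. That is the conversion the server in this case apparently expected the client to perform.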
[14:33] JAA: it's really annoying how github doesn't have a notion of 'projects'
[14:33] it leads to lots of people doing lots of weird hacks like this
[14:35] Agreed
[14:43] *** pizzaiolo has quit IRC (Ping timeout: 246 seconds)
[14:44] *** Pixi has quit IRC (ny.us.hub west.us.hub)
[14:44] *** mundus201 has quit IRC (ny.us.hub west.us.hub)
[14:44] *** superkuh has quit IRC (ny.us.hub west.us.hub)
[14:44] *** zino has quit IRC (ny.us.hub west.us.hub)
[14:44] *** Zebranky has quit IRC (ny.us.hub west.us.hub)
[14:47] *** Ceryn has joined #archiveteam-bs
[14:47] *** sep332 has joined #archiveteam-bs
[14:50] joepie91: Everything on one branch now, and the search works. :-)
[14:51] E.g. https://github.com/JustAnotherArchivist/archivebot-archives/search?utf8=%E2%9C%93&q=7c7jz&type=
[15:01] :)
[15:01] *** drumstick has quit IRC (Read error: Operation timed out)
[15:14] *** Pixi has joined #archiveteam-bs
[15:14] *** mundus201 has joined #archiveteam-bs
[15:14] *** superkuh has joined #archiveteam-bs
[15:14] *** zino has joined #archiveteam-bs
[15:14] *** Zebranky has joined #archiveteam-bs
[15:22] https://archive.org/details/Kingpin_Voicemail_Collection
[15:25] *** pizzaiolo has joined #archiveteam-bs
[15:46] *** Ravenloft has joined #archiveteam-bs
[15:59] *** MadArchiv has quit IRC (Read error: Connection reset by peer)
[16:24] *** Stiletto has quit IRC ()
[16:28] *** Asparagir has joined #archiveteam-bs
[16:28] *** svchfoo1 sets mode: +o Asparagir
[17:22] JAA: (re: backslashes) yeah. computer standards have to be descriptivist, at least to some extent
[17:25] JAA: and to answer your question "why does dos use the \ like a chud" read this post https://blogs.msdn.microsoft.com/larryosterman/2005/06/24/why-is-the-dos-path-character/
[17:27] Yeah, I've read about that before elsewhere.
[17:29] The term "technical debt" comes to mind.
[17:32] Cool.
[17:32] Cool term.
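The grep-based lookup described at 13:56 is essentially a substring search over the per-item YAML files, which is what the GitHub search at 14:51 does for job `7c7jz`. Below is a minimal Python equivalent. It assumes a local checkout in which each IA item has its own `.yaml` file named after the item; that exact file naming is an assumption. `find_items` is a hypothetical name.

```python
from pathlib import Path
from typing import List


def find_items(archive_dir: str, needle: str) -> List[str]:
    """Return the names of YAML files under archive_dir whose text contains needle.

    Plain substring search, roughly what `grep -rl needle archive_dir` does.
    Assumes one <item identifier>.yaml file per IA item (hypothetical layout).
    """
    hits = []
    for path in sorted(Path(archive_dir).rglob("*.yaml")):
        if needle in path.read_text(encoding="utf-8", errors="replace"):
            hits.append(path.stem)
    return hits
```

Running `find_items("archivebot-archives", "7c7jz")` on such a checkout would list the items that mention that job ID, with no web frontend needed.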
[17:36] so i found a tape with sex in the city on it
[17:37] that was taped over a documentary about nbc thursday shows
[17:38] *** j08nY has quit IRC (Read error: Operation timed out)
[17:40] it was called 20 years of must see tv
[17:42] whats sad is that not on youtube
[17:43] *** j08nY has joined #archiveteam-bs
[17:44] *** Ravenloft has quit IRC (Read error: Operation timed out)
[17:47] so, I didn't know that electric sheep saver used IA as its backing download server
[17:47] thats neat
[17:48] *** Asparagir has quit IRC (Asparagir)
[17:59] *** Asparagir has joined #archiveteam-bs
[18:00] *** svchfoo1 sets mode: +o Asparagir
[18:16] *** jschwart has joined #archiveteam-bs
[18:36] Yay, it's working: https://github.com/JustAnotherArchivist/archivebot-archives/commit/45eedbdd6786559876aae0f661d3060a776f79df :-)
[18:44] What have you been working on?
[18:45] JAA: ^
[18:56] *** MrDignity has quit IRC (Read error: Connection reset by peer)
[18:59] *** MrDignity has joined #archiveteam-bs
[19:02] *** Pixi has quit IRC (Ping timeout: 255 seconds)
[19:03] *** Pixi has joined #archiveteam-bs
[19:06] Ceryn: A working alternative to the broken ArchiveBot viewer, which allows for searching the archives of ArchiveBot for files for a particular domain or job.
[19:09] If someone wants to check some random repos and make sure they contain everything I would appreciate it: rsync burn.za3k.com::github-repos/set-0001/ -l
[19:10] Cool.
[19:10] Sorry if this is a resend, think there was a netsplit last time
[19:12] Please don't try to copy everything though
[19:13] Oh, only reachable over IPv6, is that intentional kisspunch?
[19:13] Ah crap--it is but I forgot
[19:14] And apparently IPv6 is broken on at least one of my machines, even though it worked a while ago when I configured the network.
Grrr
[19:22] There may be some repos with 'auth_failed' in them, in which case the actual github repo should be inaccessible too
[19:22] here are the latest tapes uploaded: https://www.patreon.com/posts/digitize-tapes-15313676
[19:29] *** Ravenloft has joined #archiveteam-bs
[19:32] kisspunch: Every second line in REPOS is "loltime". I doubt that's on purpose, right? Debugging statement somewhere in the code?
[19:49] *** Pixi has quit IRC (Ping timeout: 255 seconds)
[20:07] *** Pixi has joined #archiveteam-bs
[20:09] *** Ravenloft has quit IRC (Read error: Operation timed out)
[20:10] *** Ravenloft has joined #archiveteam-bs
[20:57] *** Asparagir has quit IRC (Asparagir)
[21:19] *** Atom__ has joined #archiveteam-bs
[21:23] *** Atom-- has quit IRC (Read error: Operation timed out)
[21:24] *** jschwart has quit IRC (Read error: Operation timed out)
[21:49] *** atrocity has quit IRC (Remote host closed the connection)
[21:49] *** RichardG_ has quit IRC (Read error: Connection reset by peer)
[21:50] *** atrocity has joined #archiveteam-bs
[21:51] *** RichardG has joined #archiveteam-bs
[21:54] *** Stilett0 has joined #archiveteam-bs
[22:06] SketchCow: https://traveloregon.com/thegame/
[22:28] kisspunch: So I looked at a few random repos. One thing I noticed is that not all data about commits is actually from the repository, apparently.
[22:28] For example, https://github.com/iciclespider/GoldenCheetah/commit/99c330edc2e2fac304031ec2ebf63cd2c9ee2e32 lists this as the author information: "rclasen committed with jknotzke on 8 Oct 2010"
[22:29] But that reference to jknotzke is nowhere to be found in the actual commit.
[22:30] Oh wait, I'm stupid, it is there. jknotzke is the committer, rclasen the author.
[22:30] So I guess the only thing missing there would be the mapping between author/committer identities and GitHub usernames.
[22:31] Otherwise, I didn't see anything missing.
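The author/committer confusion at 22:28-22:30 comes from git storing two separate identities in every commit object. GitHub's "X committed with Y" line shows both, mapped to GitHub accounts, and that account mapping is the only part not contained in the repository itself. The hypothetical helper below reads both identity lines from the raw object text printed by `git cat-file -p <sha>`. It is a sketch that assumes the usual `author Name <email> <timestamp> <tz>` header format.

```python
from typing import Dict


def parse_commit_identities(raw: str) -> Dict[str, str]:
    """Pull the author and committer out of a raw commit object.

    `raw` is the text printed by `git cat-file -p <commit-sha>`: header lines,
    a blank line, then the message. Each identity line has the form
    "author Name <email> <unix-timestamp> <tz-offset>".
    """
    identities = {}
    for line in raw.split("\n"):
        if line == "":
            break  # end of headers; the commit message starts here
        key, _, value = line.partition(" ")
        if key in ("author", "committer"):
            # Strip the trailing timezone offset, then the timestamp.
            without_tz = value.rsplit(" ", 1)[0]
            identities[key] = without_tz.rsplit(" ", 1)[0]
    return identities
```

When the two entries differ, as in the GoldenCheetah example, the author wrote the change and the committer applied it. Both are present in the mirrored bare repository.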
[22:31] One suggestion though: since these are really just bare repositories, I'd recommend naming the directories user/repo.git rather than user/repo.
[22:32] (I mean inside the tars, primarily.)
[22:53] *** Stilett0 has quit IRC (Read error: Operation timed out)
[22:55] *** Pixi has quit IRC (Ping timeout: 255 seconds)
[22:59] *** ZexaronS- has quit IRC (Quit: Leaving)
[23:07] *** Pixi has joined #archiveteam-bs
[23:16] *** drumstick has joined #archiveteam-bs
[23:29] *** Pixi` has joined #archiveteam-bs
[23:31] *** Pixi has quit IRC (Ping timeout: 255 seconds)
[23:59] *** Asparagir has joined #archiveteam-bs
[23:59] *** svchfoo1 sets mode: +o Asparagir
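If the naming suggestion at 22:31 were applied before tarring, a sketch like the following could do the renaming. It assumes an extracted tree laid out as `root/user/repo`, as described in the chat. It recognises a bare repository with a simple heuristic: a top-level `HEAD` file plus `objects/` and `refs/` directories. `add_git_suffix` is a hypothetical name, and an existing `repo.git` target is never overwritten.

```python
from pathlib import Path
from typing import List


def looks_like_bare_repo(path: Path) -> bool:
    """Heuristic: a bare repository has HEAD, objects/ and refs/ at its top level."""
    return (path / "HEAD").is_file() and (path / "objects").is_dir() and (path / "refs").is_dir()


def add_git_suffix(root: str) -> List[Path]:
    """Rename root/user/repo to root/user/repo.git for every bare repo found."""
    renamed = []
    for user_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # sorted() materialises the listing, so renaming inside the loop is safe.
        for repo in sorted(p for p in user_dir.iterdir() if p.is_dir()):
            if repo.suffix != ".git" and looks_like_bare_repo(repo):
                target = repo.with_name(repo.name + ".git")
                if not target.exists():
                    repo.rename(target)
                    renamed.append(target)
    return renamed
```

Directories that do not look like bare repositories are left alone, so a stray non-repo directory does not get a misleading `.git` suffix.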