#archiveteam-bs 2017-11-10,Fri

Time Nickname Message
00:19 🔗 chazchaz_ has joined #archiveteam-bs
00:28 🔗 BlueMaxim has joined #archiveteam-bs
00:32 🔗 drumstick has quit IRC (Ping timeout: 248 seconds)
00:49 🔗 icedice2 has quit IRC (Read error: Connection reset by peer)
01:01 🔗 pizzaiolo has quit IRC (pizzaiolo)
01:02 🔗 pizzaiolo has joined #archiveteam-bs
01:06 🔗 pizzaiolo has quit IRC (Client Quit)
01:06 🔗 pizzaiolo has joined #archiveteam-bs
01:11 🔗 pizzaiolo has quit IRC (Client Quit)
01:12 🔗 pizzaiolo has joined #archiveteam-bs
01:14 🔗 pizzaiolo has quit IRC (Client Quit)
01:14 🔗 kisspunch Would someone be interested in helping me verify my github collection process
01:15 🔗 kisspunch I've grabbed about 1% so far
01:15 🔗 kisspunch But I want to make sure I'm safely capturing everything
01:17 🔗 pizzaiolo has joined #archiveteam-bs
01:21 🔗 DFJustin has joined #archiveteam-bs
01:21 🔗 swebb sets mode: +o DFJustin
01:27 🔗 Mayonaise has quit IRC (Read error: Operation timed out)
01:31 🔗 pizzaiolo has quit IRC (pizzaiolo)
01:31 🔗 pizzaiolo has joined #archiveteam-bs
01:36 🔗 pizzaiolo has quit IRC (pizzaiolo)
01:37 🔗 pizzaiolo has joined #archiveteam-bs
01:39 🔗 Mayonaise has joined #archiveteam-bs
01:44 🔗 drumstick has joined #archiveteam-bs
01:46 🔗 pizzaiolo has quit IRC (pizzaiolo)
01:47 🔗 pizzaiolo has joined #archiveteam-bs
01:51 🔗 pizzaiolo has quit IRC (Client Quit)
01:52 🔗 pizzaiolo has joined #archiveteam-bs
01:52 🔗 pizzaiolo has quit IRC (Client Quit)
02:10 🔗 Asparagir has quit IRC (Asparagir)
02:12 🔗 Asparagir has joined #archiveteam-bs
02:13 🔗 svchfoo3 sets mode: +o Asparagir
02:47 🔗 username1 has joined #archiveteam-bs
02:51 🔗 drumstick has quit IRC (Ping timeout: 248 seconds)
02:52 🔗 schbirid2 has quit IRC (Read error: Operation timed out)
02:55 🔗 drumstick has joined #archiveteam-bs
03:01 🔗 dashcloud has quit IRC (Remote host closed the connection)
03:57 🔗 drumstick has quit IRC (Quit: Leaving)
03:57 🔗 drumstick has joined #archiveteam-bs
04:06 🔗 qw3rty10 has joined #archiveteam-bs
04:12 🔗 qw3rty9 has quit IRC (Read error: Operation timed out)
04:43 🔗 jspiros has quit IRC (Ping timeout: 493 seconds)
04:54 🔗 jspiros has joined #archiveteam-bs
05:02 🔗 Mateon1 has quit IRC (Ping timeout: 248 seconds)
05:02 🔗 Mateon1 has joined #archiveteam-bs
06:13 🔗 schbirid2 has joined #archiveteam-bs
06:20 🔗 username1 has quit IRC (Read error: Operation timed out)
06:33 🔗 Asparagir has quit IRC (Asparagir)
07:34 🔗 jspiros has quit IRC (leaving)
07:58 🔗 godane so i'm uploading the 24 episodes from 2005 to FOS
08:18 🔗 jspiros has joined #archiveteam-bs
08:23 🔗 j08nY has joined #archiveteam-bs
08:54 🔗 j08nY has quit IRC (Read error: Operation timed out)
09:05 🔗 icedice has joined #archiveteam-bs
09:17 🔗 j08nY has joined #archiveteam-bs
09:27 🔗 j08nY has quit IRC (Read error: Operation timed out)
09:34 🔗 JAA rbraun, arkiver, astrid: Backslashes are nowadays sort-of acceptable in URLs. They're treated exactly like forward slashes, except they also trigger a parsing error (but that doesn't make the URL invalid or change its meaning; think of it as an "error"-type message in a log somewhere).
09:34 🔗 JAA So this is really a bug in the URL parser used by wpull.
09:36 🔗 JAA It's correct though that backslashes were originally invalid in URLs. Because Microsoft had to change the path delimiter on their OS, they also had to support the backslash in URLs in IE. Other browsers then had to do so as well because stupid people wrote stupid HTML containing invalid URLs.
09:37 🔗 JAA That's how it ended up as a valid but also erroneous alternative to forward slashes in the current URL specs.
09:38 🔗 JAA s/parser error/validation error/g
09:40 🔗 JAA See e.g. point 2 at https://url.spec.whatwg.org/#file-state
09:42 🔗 JAA (That specific case handles file://\path, which is identical to file:///path except for the validation error.)
10:06 🔗 tuluu has quit IRC (Ping timeout: 250 seconds)
10:12 🔗 MadArchiv has joined #archiveteam-bs
10:14 🔗 MadArchiv Quick question: how do I make it so the grabs I do through grab-site display in the Wayback Machine once I upload them on archive.org?
10:14 🔗 JAA You need to upload them with mediatype "web".
10:15 🔗 MadArchiv Ok, thanks!
10:20 🔗 tuluu has joined #archiveteam-bs
10:30 🔗 icedice has quit IRC (Read error: Connection reset by peer)
10:50 🔗 MadArchiv has quit IRC (Ping timeout: 246 seconds)
10:52 🔗 MadArchiv has joined #archiveteam-bs
11:07 🔗 drumstick has quit IRC (Read error: Operation timed out)
11:12 🔗 drumstick has joined #archiveteam-bs
11:16 🔗 MadArchiv JAA: By the way, setting the mediatype to "web" only works with WARC files, right?
11:17 🔗 pizzaiolo has joined #archiveteam-bs
11:19 🔗 JAA MadArchiv: The mediatype attribute is for the entire item, not just single files, but yes, it only works for WARC (and ARC, but you should never use that nowadays).
11:20 🔗 JAA ("working" meaning that it does something with the Wayback Machine.)
11:20 🔗 JAA As far as I know, anyway.
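
A minimal sketch of the upload step JAA describes, assuming the third-party `internetarchive` Python package is installed and configured with credentials. The helper names `is_warc`, `build_upload_request`, and `upload_warcs` are hypothetical, and the sketch only sets the item-level mediatype. It does not guarantee that the item will appear in the Wayback Machine.

```python
from pathlib import Path


def is_warc(path):
    """Return True for file names that look like WARC files."""
    name = Path(path).name.lower()
    return name.endswith(".warc") or name.endswith(".warc.gz")


def build_upload_request(identifier, paths):
    """Collect the WARC files and the item-level metadata for one item.

    The mediatype applies to the whole item, not to single files.
    """
    warcs = [str(p) for p in paths if is_warc(p)]
    if not warcs:
        raise ValueError("no WARC files to upload")
    return {
        "identifier": identifier,
        "files": warcs,
        "metadata": {"mediatype": "web"},
    }


def upload_warcs(identifier, paths):
    """Upload the WARC files (needs network access and IA credentials)."""
    # Imported here so the helpers above work without the package installed.
    from internetarchive import upload

    request = build_upload_request(identifier, paths)
    return upload(
        request["identifier"],
        files=request["files"],
        metadata=request["metadata"],
    )
```

Calling `upload_warcs("my-example-item", ["site.warc.gz"])` would create or update the item `my-example-item`. Both the identifier and the file name are placeholders.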
11:35 🔗 icedice has joined #archiveteam-bs
11:35 🔗 icedice has quit IRC (Client Quit)
11:36 🔗 j08nY has joined #archiveteam-bs
11:46 🔗 BlueMaxim has quit IRC (Read error: Connection reset by peer)
12:02 🔗 rbraun JAA: the issue here is that the backslash is in the hyperlink but the /server/ considered it invalid
12:02 🔗 rbraun i.e. their own links don't work but are easily corrected, and the directories have indexing enabled to boot
12:12 🔗 JAA rbraun: Ah, so the server expected that the client already converted the backslashes, I see. Not entirely sure what the correct behaviour for the client would be.
12:13 🔗 JAA I'm also not sure what the HTTP specs say, i.e. whether backslashes are "valid" in the path there.
12:17 🔗 rbraun JAA: i mean, i think they just goofed
12:17 🔗 rbraun and put the links up without correcting them
12:20 🔗 JAA According to RFC 7230, it's not allowed to GET \path in HTTP/1.1.
12:22 🔗 JAA RFC 7540 also just directly refers to RFC 3986, which doesn't allow backslashes in the path either.
12:22 🔗 JAA Interesting.
12:22 🔗 JAA That means that backslashes are valid in URIs but not for HTTP requests, so the client is expected to replace them.
12:23 🔗 JAA In other words, this is a bug in wpull or the URL library it's using.
12:23 🔗 JAA (My money's on wpull)
12:24 🔗 JAA So it's not their fault at all.
12:24 🔗 JAA (Well, it's really the fault of whoever thought it'd be a great idea to replace forward with backward slashes as path delimiters on Windows, but whatever.)
12:25 🔗 JAA s/Windows/MS-DOS/, I guess.
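
A rough sketch of the client-side fix discussed above: before building the HTTP request, replace backslashes with forward slashes in the part of the URL before the query and fragment. This is not wpull's code. The function name `normalize_backslashes` is hypothetical, and restricting the rewrite to the WHATWG "special" schemes is a simplification of the full parsing algorithm.

```python
# Schemes the WHATWG URL spec calls "special"; only these treat "\" like "/".
SPECIAL_SCHEMES = {"http", "https", "ftp", "ws", "wss", "file"}


def normalize_backslashes(url):
    """Replace backslashes with slashes before the query/fragment.

    URLs with a non-special scheme (or no scheme) are returned unchanged.
    """
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() not in SPECIAL_SCHEMES:
        return url

    # Find where the query or fragment starts, whichever comes first.
    cut = len(rest)
    for marker in "?#":
        index = rest.find(marker)
        if index != -1:
            cut = min(cut, index)

    head, tail = rest[:cut], rest[cut:]
    # Backslashes in the query or fragment are left untouched.
    return scheme + ":" + head.replace("\\", "/") + tail


print(normalize_backslashes("http://example.com\\dir\\file.txt"))
print(normalize_backslashes("file://\\path"))
```

With this, a link such as `http://example.com\dir\file.txt` would be requested as `/dir/file.txt`, which is what a server that rejects backslashes expects.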
13:24 🔗 jspiros has quit IRC (leaving)
13:27 🔗 jspiros has joined #archiveteam-bs
13:55 🔗 JAA My tool for searching the ArchiveBot archives is online: https://github.com/JustAnotherArchivist/archivebot-archives
13:56 🔗 JAA The archives branch contains one YAML file per IA item. Using grep, you can find which items contain the files for a particular job.
13:56 🔗 JAA No web frontend yet, so it's not a real replacement for the viewer yet, but at least it works.
13:57 🔗 JAA I was hoping that maybe the GitHub search does the trick, but it doesn't look like it.
14:04 🔗 JAA Yeah, apparently GitHub only indexes the master branch. Meh.
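
The grep approach JAA mentions could be sketched in Python as follows. The layout of the repository's YAML files is not described in the log, so this assumes only that each item has one `.yaml` file and that the job ID appears somewhere in its text. `find_items` is a hypothetical helper, not part of the tool.

```python
from pathlib import Path


def find_items(root, job_id):
    """Return the YAML files under root whose text contains job_id."""
    matches = []
    for path in sorted(Path(root).rglob("*.yaml")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if job_id in text:
            matches.append(path)
    return matches
```

Run against a local checkout, for example `find_items("archivebot-archives", "7c7jz")`, this lists the item files that mention the job ID (the checkout path is a placeholder).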
14:16 🔗 joepie91 JAA: hmm, why is it separated into branches rather than folders?
14:20 🔗 JAA joepie91: Yeah, I guess I could do that. Not sure why I didn't, to be honest.
14:20 🔗 JAA Just seemed a bit cleaner to me. Keeping the changes of the code separate from the changes in the data.
14:21 🔗 joepie91 at that point you might just want two separate repos :P
14:22 🔗 JAA But the things still belong together, so having it in one repo makes more sense to me.
14:30 🔗 JAA Screw it, I'll collapse it to one branch.
14:33 🔗 joepie91 JAA: it's really annoying how github doesn't have a notion of 'projects'
14:33 🔗 joepie91 it leads to lots of people doing lots of weird hacks like this
14:35 🔗 JAA Agreed
14:43 🔗 pizzaiolo has quit IRC (Ping timeout: 246 seconds)
14:44 🔗 Pixi has quit IRC (ny.us.hub west.us.hub)
14:44 🔗 mundus201 has quit IRC (ny.us.hub west.us.hub)
14:44 🔗 superkuh has quit IRC (ny.us.hub west.us.hub)
14:44 🔗 zino has quit IRC (ny.us.hub west.us.hub)
14:44 🔗 Zebranky has quit IRC (ny.us.hub west.us.hub)
14:47 🔗 Ceryn has joined #archiveteam-bs
14:47 🔗 sep332 has joined #archiveteam-bs
14:50 🔗 JAA joepie91: Everything on one branch now, and the search works. :-)
14:51 🔗 JAA E.g. https://github.com/JustAnotherArchivist/archivebot-archives/search?utf8=%E2%9C%93&q=7c7jz&type=
15:01 🔗 joepie91 :)
15:01 🔗 drumstick has quit IRC (Read error: Operation timed out)
15:14 🔗 Pixi has joined #archiveteam-bs
15:14 🔗 mundus201 has joined #archiveteam-bs
15:14 🔗 superkuh has joined #archiveteam-bs
15:14 🔗 zino has joined #archiveteam-bs
15:14 🔗 Zebranky has joined #archiveteam-bs
15:22 🔗 SketchCow https://archive.org/details/Kingpin_Voicemail_Collection
15:25 🔗 pizzaiolo has joined #archiveteam-bs
15:46 🔗 Ravenloft has joined #archiveteam-bs
15:59 🔗 MadArchiv has quit IRC (Read error: Connection reset by peer)
16:24 🔗 Stiletto has quit IRC ()
16:28 🔗 Asparagir has joined #archiveteam-bs
16:28 🔗 svchfoo1 sets mode: +o Asparagir
17:22 🔗 astrid JAA: (re: backslashes) yeah. computer standards have to be descriptivist, at least to some extent
17:25 🔗 astrid JAA: and to answer your question "why does dos use the \ like a chud" read this post https://blogs.msdn.microsoft.com/larryosterman/2005/06/24/why-is-the-dos-path-character/
17:27 🔗 JAA Yeah, I've read about that before elsewhere.
17:29 🔗 JAA The term "technical debt" comes to mind.
17:32 🔗 Ceryn Cool.
17:32 🔗 Ceryn Cool term.
17:36 🔗 godane so i found a tape with Sex and the City on it
17:37 🔗 godane that was taped over a documentary about nbc thursday shows
17:38 🔗 j08nY has quit IRC (Read error: Operation timed out)
17:40 🔗 godane it was called 20 years of must see tv
17:42 🔗 godane what's sad is that it's not on youtube
17:43 🔗 j08nY has joined #archiveteam-bs
17:44 🔗 Ravenloft has quit IRC (Read error: Operation timed out)
17:47 🔗 jrwr so, I didn't know that electric sheep saver used IA as its backing download server
17:47 🔗 jrwr thats neat
17:48 🔗 Asparagir has quit IRC (Asparagir)
17:59 🔗 Asparagir has joined #archiveteam-bs
18:00 🔗 svchfoo1 sets mode: +o Asparagir
18:16 🔗 jschwart has joined #archiveteam-bs
18:36 🔗 JAA Yay, it's working: https://github.com/JustAnotherArchivist/archivebot-archives/commit/45eedbdd6786559876aae0f661d3060a776f79df :-)
18:44 🔗 Ceryn What have you been working on?
18:45 🔗 Ceryn JAA: ^
18:56 🔗 MrDignity has quit IRC (Read error: Connection reset by peer)
18:59 🔗 MrDignity has joined #archiveteam-bs
19:02 🔗 Pixi has quit IRC (Ping timeout: 255 seconds)
19:03 🔗 Pixi has joined #archiveteam-bs
19:06 🔗 JAA Ceryn: A working alternative to the broken ArchiveBot viewer, which allows for searching the archives of ArchiveBot for files for a particular domain or job.
19:09 🔗 kisspunch If someone wants to check some random repos and make sure they contain everything I would appreciate it: rsync burn.za3k.com::github-repos/set-0001/ -l
19:10 🔗 Ceryn Cool.
19:10 🔗 kisspunch Sorry if this is a resend, think there was a netsplit last time
19:12 🔗 kisspunch Please don't try to copy everything though
19:13 🔗 JAA Oh, only reachable over IPv6, is that intentional kisspunch?
19:13 🔗 kisspunch Ah crap--it is but I forgot
19:14 🔗 JAA And apparently IPv6 is broken on at least one of my machines, even though it worked a while ago when I configured the network. Grrr
19:22 🔗 kisspunch There may be some repos with 'auth_failed' in them, in which case the actual github repo should be inaccessible too
19:22 🔗 godane here are the latest tapes uploaded: https://www.patreon.com/posts/digitize-tapes-15313676
19:29 🔗 Ravenloft has joined #archiveteam-bs
19:32 🔗 JAA kisspunch: Every second line in REPOS is "loltime". I doubt that's on purpose, right? Debugging statement somewhere in the code?
19:49 🔗 Pixi has quit IRC (Ping timeout: 255 seconds)
20:07 🔗 Pixi has joined #archiveteam-bs
20:09 🔗 Ravenloft has quit IRC (Read error: Operation timed out)
20:10 🔗 Ravenloft has joined #archiveteam-bs
20:57 🔗 Asparagir has quit IRC (Asparagir)
21:19 🔗 Atom__ has joined #archiveteam-bs
21:23 🔗 Atom-- has quit IRC (Read error: Operation timed out)
21:24 🔗 jschwart has quit IRC (Read error: Operation timed out)
21:49 🔗 atrocity has quit IRC (Remote host closed the connection)
21:49 🔗 RichardG_ has quit IRC (Read error: Connection reset by peer)
21:50 🔗 atrocity has joined #archiveteam-bs
21:51 🔗 RichardG has joined #archiveteam-bs
21:54 🔗 Stilett0 has joined #archiveteam-bs
22:06 🔗 joepie91 SketchCow: https://traveloregon.com/thegame/
22:28 🔗 JAA kisspunch: So I looked at a few random repos. One thing I noticed is that not all data about commits is actually from the repository, apparently.
22:28 🔗 JAA For example, https://github.com/iciclespider/GoldenCheetah/commit/99c330edc2e2fac304031ec2ebf63cd2c9ee2e32 lists this as the author information: "rclasen committed with jknotzke on 8 Oct 2010"
22:29 🔗 JAA But that reference to jknotzke is nowhere to be found in the actual commit.
22:30 🔗 JAA Oh wait, I'm stupid, it is there. jknotzke is the committer, rclasen the author.
22:30 🔗 JAA So I guess the only thing missing there would be the mapping between author/committer identities and GitHub usernames.
22:31 🔗 JAA Otherwise, I didn't see anything missing.
22:31 🔗 JAA One suggestion though: since these are really just bare repositories, I'd recommend naming the directories user/repo.git rather than user/repo.
22:32 🔗 JAA (I mean inside the tars, primarily.)
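
To illustrate the author/committer distinction JAA ran into: a git commit object stores both identities as separate header lines, which is why both names are present in the cloned repository even though GitHub's username mapping is not. The sketch below parses those headers from the text that `git cat-file -p <commit>` prints. The sample commit is made up (placeholder tree hash, e-mail addresses, and timestamps), and `parse_identities` is a hypothetical helper.

```python
import re

# Matches header lines such as: author Name <email> 1286500000 +0200
IDENT_RE = re.compile(r"^(author|committer) (.*) <(.*)> (\d+) ([+-]\d{4})$")


def parse_identities(raw_commit):
    """Extract the author and committer headers from a raw commit object."""
    identities = {}
    for line in raw_commit.split("\n"):
        if line == "":
            break  # headers end at the first blank line; the message follows
        match = IDENT_RE.match(line)
        if match:
            role, name, email, timestamp, tz = match.groups()
            identities[role] = {
                "name": name,
                "email": email,
                "timestamp": int(timestamp),
                "tz": tz,
            }
    return identities


# Made-up commit text; only the two names are taken from the discussion.
SAMPLE = """tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
author rclasen <rclasen@example.invalid> 1286500000 +0200
committer jknotzke <jknotzke@example.invalid> 1286550000 -0400

Example commit message
"""

ids = parse_identities(SAMPLE)
print(ids["author"]["name"], ids["committer"]["name"])
```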
22:53 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
22:55 🔗 Pixi has quit IRC (Ping timeout: 255 seconds)
22:59 🔗 ZexaronS- has quit IRC (Quit: Leaving)
23:07 🔗 Pixi has joined #archiveteam-bs
23:16 🔗 drumstick has joined #archiveteam-bs
23:29 🔗 Pixi` has joined #archiveteam-bs
23:31 🔗 Pixi has quit IRC (Ping timeout: 255 seconds)
23:59 🔗 Asparagir has joined #archiveteam-bs
23:59 🔗 svchfoo1 sets mode: +o Asparagir
