#archiveteam-bs 2017-11-10,Fri

Who What When
***chazchaz_ has joined #archiveteam-bs [00:19]
BlueMaxim has joined #archiveteam-bs
drumstick has quit IRC (Ping timeout: 248 seconds)
[00:28]
.... (idle for 17mn)
icedice2 has quit IRC (Read error: Connection reset by peer) [00:49]
pizzaiolo has quit IRC (pizzaiolo)
pizzaiolo has joined #archiveteam-bs
pizzaiolo has quit IRC (Client Quit)
pizzaiolo has joined #archiveteam-bs
[01:01]
pizzaiolo has quit IRC (Client Quit)
pizzaiolo has joined #archiveteam-bs
pizzaiolo has quit IRC (Client Quit)
[01:11]
kisspunchWould someone be interested in helping me verify my github collection process
I've grabbed about 1% so far
But I want to make sure I'm safely capturing everything
[01:14]
***pizzaiolo has joined #archiveteam-bs
DFJustin has joined #archiveteam-bs
swebb sets mode: +o DFJustin
[01:17]
Mayonaise has quit IRC (Read error: Operation timed out)
pizzaiolo has quit IRC (pizzaiolo)
pizzaiolo has joined #archiveteam-bs
[01:27]
pizzaiolo has quit IRC (pizzaiolo)
pizzaiolo has joined #archiveteam-bs
Mayonaise has joined #archiveteam-bs
[01:36]
drumstick has joined #archiveteam-bs
pizzaiolo has quit IRC (pizzaiolo)
pizzaiolo has joined #archiveteam-bs
pizzaiolo has quit IRC (Client Quit)
pizzaiolo has joined #archiveteam-bs
pizzaiolo has quit IRC (Client Quit)
[01:44]
.... (idle for 18mn)
Asparagir has quit IRC (Asparagir)
Asparagir has joined #archiveteam-bs
svchfoo3 sets mode: +o Asparagir
[02:10]
....... (idle for 34mn)
username1 has joined #archiveteam-bs
drumstick has quit IRC (Ping timeout: 248 seconds)
schbirid2 has quit IRC (Read error: Operation timed out)
drumstick has joined #archiveteam-bs
[02:47]
dashcloud has quit IRC (Remote host closed the connection) [03:01]
............ (idle for 56mn)
drumstick has quit IRC (Quit: Leaving)
drumstick has joined #archiveteam-bs
[03:57]
qw3rty10 has joined #archiveteam-bs [04:06]
qw3rty9 has quit IRC (Read error: Operation timed out) [04:12]
....... (idle for 31mn)
jspiros has quit IRC (Ping timeout: 493 seconds) [04:43]
jspiros has joined #archiveteam-bs [04:54]
Mateon1 has quit IRC (Ping timeout: 248 seconds)
Mateon1 has joined #archiveteam-bs
[05:02]
............... (idle for 1h11mn)
schbirid2 has joined #archiveteam-bs [06:13]
username1 has quit IRC (Read error: Operation timed out) [06:20]
Asparagir has quit IRC (Asparagir) [06:33]
............. (idle for 1h1mn)
jspiros has quit IRC (leaving) [07:34]
..... (idle for 24mn)
godaneso i'm uploading the 24 episodes from 2005 to FOS [07:58]
..... (idle for 20mn)
***jspiros has joined #archiveteam-bs [08:18]
j08nY has joined #archiveteam-bs [08:23]
....... (idle for 31mn)
j08nY has quit IRC (Read error: Operation timed out) [08:54]
icedice has joined #archiveteam-bs [09:05]
j08nY has joined #archiveteam-bs [09:17]
j08nY has quit IRC (Read error: Operation timed out) [09:27]
JAArbraun, arkiver, astrid: Backslashes are nowadays sort-of acceptable in URLs. They're treated exactly like forward slashes, except they also trigger a parsing error (but that doesn't make the URL invalid or change its meaning; think of it as an "error"-type message in a log somewhere).
So this is really a bug in the URL parser used by wpull.
It's correct though that backslashes were originally invalid in URLs. Because Microsoft had to change the path delimiter on their OS, they also had to support the backslash in URLs in IE. Other browsers then had to do so as well because stupid people wrote stupid HTML containing invalid URLs.
That's how it ended up as a valid but also erroneous alternative to forward slashes in the current URL specs.
s/parser error/validation error/g
See e.g. point 2 at https://url.spec.whatwg.org/#file-state
(That specific case handles file://\path, which is identical to file:///path except for the validation error.)
[09:34]
..... (idle for 24mn)
***tuluu has quit IRC (Ping timeout: 250 seconds) [10:06]
MadArchiv has joined #archiveteam-bs [10:12]
MadArchivQuick question: how do I make it so the grabs I do through grab-site display in the Wayback Machine once I upload them on archive.org? [10:14]
JAAYou need to upload them with mediatype "web". [10:14]
MadArchivOk, thanks! [10:15]
***tuluu has joined #archiveteam-bs [10:20]
icedice has quit IRC (Read error: Connection reset by peer) [10:30]
..... (idle for 20mn)
MadArchiv has quit IRC (Ping timeout: 246 seconds)
MadArchiv has joined #archiveteam-bs
[10:50]
.... (idle for 15mn)
drumstick has quit IRC (Read error: Operation timed out) [11:07]
drumstick has joined #archiveteam-bs [11:12]
MadArchivJAA: By the way, setting the mediatype to "web" only works with WARC files, right? [11:16]
***pizzaiolo has joined #archiveteam-bs [11:17]
JAAMadArchiv: The mediatype attribute is for the entire item, not just single files, but yes, it only works for WARC (and ARC, but you should never use that nowadays).
("working" meaning that it does something with the Wayback Machine.)
As far as I know, anyway.
[11:19]
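A minimal sketch of the upload JAA describes, assuming the third-party `internetarchive` Python package is installed and configured with credentials. The helper names, the `title` field, and the identifier passed in are hypothetical; only the `mediatype` value "web" comes from the conversation. Whether the Wayback Machine actually picks the item up is not something this code can confirm.

```python
from pathlib import Path

# File suffixes treated as WARC files in this sketch.
WARC_SUFFIXES = (".warc", ".warc.gz")


def build_web_item_metadata(title):
    """Item-level metadata; mediatype applies to the whole item, not to single files."""
    return {"mediatype": "web", "title": title}


def warc_files(directory):
    """Return the sorted paths of all WARC files directly inside `directory`."""
    return sorted(
        str(path)
        for path in Path(directory).iterdir()
        if path.is_file() and path.name.endswith(WARC_SUFFIXES)
    )


def upload_warcs(identifier, directory, title):
    """Upload every WARC in `directory` to one archive.org item."""
    # Imported lazily so the helpers above work without the package installed.
    from internetarchive import upload

    return upload(
        identifier,
        files=warc_files(directory),
        metadata=build_web_item_metadata(title),
    )
```

Filtering the file list first keeps non-WARC files (logs, databases) out of the item, since only WARC files are said to matter for the Wayback Machine.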
.... (idle for 15mn)
***icedice has joined #archiveteam-bs
icedice has quit IRC (Client Quit)
j08nY has joined #archiveteam-bs
[11:35]
BlueMaxim has quit IRC (Read error: Connection reset by peer) [11:46]
.... (idle for 16mn)
rbraunJAA: the issue here is that the backslash is in the hyperlink but the /server/ considered it invalid
i.e. their own links don't work but are easily corrected, and the directories have indexing enabled to boot
[12:02]
JAArbraun: Ah, so the server expected that the client already converted the backslashes, I see. Not entirely sure what the correct behaviour for the client would be.
I'm also not sure what the HTTP specs say, i.e. whether backslashes are "valid" in the path there.
[12:12]
rbraunJAA: i mean, i think they just goofed
and put the links up without correcting them
[12:17]
JAAAccording to RFC 7230, it's not allowed to GET \path in HTTP/1.1.
RFC 7540 also just directly refers to RFC 3986, which doesn't allow backslashes in the path either.
Interesting.
That means that backslashes are valid in URIs but not for HTTP requests, so the client is expected to replace them.
In other words, this is a bug in wpull or the URL library it's using.
(My money's on wpull)
So it's not their fault at all.
(Well, it's really the fault of whoever thought it'd be a great idea to replace forward with backward slashes as path delimiters on Windows, but whatever.)
s/Windows/MS-DOS/, I guess.
[12:20]
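The replacement JAA expects the client to perform could be sketched roughly as follows. This is not wpull's code; the function name is hypothetical, and the sketch only rewrites the path component, leaving the query untouched. It does not cover backslashes in the authority position (such as `http:\\host`), which the WHATWG URL spec also handles.

```python
from urllib.parse import urlsplit, urlunsplit

# Schemes the WHATWG URL spec calls "special"; only these treat "\" like "/".
SPECIAL_SCHEMES = {"http", "https", "ftp", "ws", "wss", "file"}


def normalize_backslashes(url):
    """Replace backslashes with forward slashes in the path of a special-scheme URL."""
    parts = urlsplit(url)
    if parts.scheme.lower() not in SPECIAL_SCHEMES:
        # Non-special schemes (e.g. mailto:) keep backslashes as ordinary characters.
        return url
    fixed_path = parts.path.replace("\\", "/")
    return urlunsplit(parts._replace(path=fixed_path))
```

Applied to a link extracted from HTML before the request is built, this keeps a literal backslash out of the HTTP request target.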
............ (idle for 59mn)
***jspiros has quit IRC (leaving)
jspiros has joined #archiveteam-bs
[13:24]
...... (idle for 28mn)
JAAMy tool for searching the ArchiveBot archives is online: https://github.com/JustAnotherArchivist/archivebot-archives
The archives branch contains one YAML file per IA item. Using grep, you can find which items contain the files for a particular job.
No web frontend yet, so it's not a real replacement for the viewer yet, but at least it works.
I was hoping that maybe the GitHub search does the trick, but it doesn't look like it.
[13:55]
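For anyone without grep at hand, the same lookup can be sketched in Python. This assumes a local checkout of the repository and treats each YAML file as plain text; the directory layout, the `.yaml` extension, and the function name are assumptions, since the log does not describe the file structure.

```python
from pathlib import Path


def find_items(archive_dir, needle):
    """Return the names of YAML files under `archive_dir` whose text contains `needle`."""
    matches = []
    for path in sorted(Path(archive_dir).rglob("*.yaml")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if needle in text:
            matches.append(path.name)
    return matches
```

Passing a job ID or a domain as `needle` returns the files (one per IA item) that mention it, which is the same information a recursive grep would give.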
Yeah, apparently GitHub only indexes the master branch. Meh. [14:04]
joepie91JAA: hmm, why is it separated into branches rather than folders? [14:16]
JAAjoepie91: Yeah, I guess I could do that. Not sure why I didn't, to be honest.
Just seemed a bit cleaner to me. Keeping the changes of the code separate from the changes in the data.
[14:20]
joepie91at that point you might just want two separate repos :P [14:21]
JAABut the things still belong together, so having it in one repo makes more sense to me. [14:22]
Screw it, I'll collapse it to one branch. [14:30]
joepie91JAA: it's really annoying how github doesn't have a notion of 'projects'
it leads to lots of people doing lots of weird hacks like this
[14:33]
JAAAgreed [14:35]
***pizzaiolo has quit IRC (Ping timeout: 246 seconds)
Pixi has quit IRC (ny.us.hub west.us.hub)
mundus201 has quit IRC (ny.us.hub west.us.hub)
superkuh has quit IRC (ny.us.hub west.us.hub)
zino has quit IRC (ny.us.hub west.us.hub)
Zebranky has quit IRC (ny.us.hub west.us.hub)
Ceryn has joined #archiveteam-bs
sep332 has joined #archiveteam-bs
[14:43]
JAAjoepie91: Everything on one branch now, and the search works. :-)
E.g. https://github.com/JustAnotherArchivist/archivebot-archives/search?utf8=%E2%9C%93&q=7c7jz&type=
[14:50]
joepie91:) [15:01]
***drumstick has quit IRC (Read error: Operation timed out) [15:01]
Pixi has joined #archiveteam-bs
mundus201 has joined #archiveteam-bs
superkuh has joined #archiveteam-bs
zino has joined #archiveteam-bs
Zebranky has joined #archiveteam-bs
[15:14]
SketchCowhttps://archive.org/details/Kingpin_Voicemail_Collection [15:22]
***pizzaiolo has joined #archiveteam-bs [15:25]
..... (idle for 21mn)
Ravenloft has joined #archiveteam-bs [15:46]
MadArchiv has quit IRC (Read error: Connection reset by peer) [15:59]
...... (idle for 25mn)
Stiletto has quit IRC ()
Asparagir has joined #archiveteam-bs
svchfoo1 sets mode: +o Asparagir
[16:24]
........... (idle for 54mn)
astridJAA: (re: backslashes) yeah. computer standards have to be descriptivist, at least to some extent
JAA: and to answer your question "why does dos use the \ like a chud" read this post https://blogs.msdn.microsoft.com/larryosterman/2005/06/24/why-is-the-dos-path-character/
[17:22]
JAAYeah, I've read about that before elsewhere.
The term "technical debt" comes to mind.
[17:27]
CerynCool.
Cool term.
[17:32]
godaneso i found a tape with sex and the city on it
that was taped over a documentary about nbc thursday shows
[17:36]
***j08nY has quit IRC (Read error: Operation timed out) [17:38]
godaneit was called 20 years of must see tv
what's sad is that it's not on youtube
[17:40]
***j08nY has joined #archiveteam-bs
Ravenloft has quit IRC (Read error: Operation timed out)
[17:43]
jrwrso, I didn't know that electric sheep saver used IA as its backing download server
thats neat
[17:47]
***Asparagir has quit IRC (Asparagir) [17:48]
Asparagir has joined #archiveteam-bs
svchfoo1 sets mode: +o Asparagir
[17:59]
.... (idle for 16mn)
jschwart has joined #archiveteam-bs [18:16]
..... (idle for 20mn)
JAAYay, it's working: https://github.com/JustAnotherArchivist/archivebot-archives/commit/45eedbdd6786559876aae0f661d3060a776f79df :-) [18:36]
CerynWhat have you been working on?
JAA: ^
[18:44]
***MrDignity has quit IRC (Read error: Connection reset by peer)
MrDignity has joined #archiveteam-bs
Pixi has quit IRC (Ping timeout: 255 seconds)
Pixi has joined #archiveteam-bs
[18:56]
JAACeryn: A working alternative to the broken ArchiveBot viewer, which allows for searching the archives of ArchiveBot for files for a particular domain or job. [19:06]
kisspunchIf someone wants to check some random repos and make sure they contain everything I would appreciate it: rsync burn.za3k.com::github-repos/set-0001/ -l [19:09]
CerynCool. [19:10]
kisspunchSorry if this is a resend, think there was a netsplit last time
Please don't try to copy everything though
[19:10]
JAAOh, only reachable over IPv6, is that intentional kisspunch? [19:13]
kisspunchAh crap--it is but I forgot [19:13]
JAAAnd apparently IPv6 is broken on at least one of my machines, even though it worked a while ago when I configured the network. Grrr [19:14]
kisspunchThere may be some repos with 'auth_failed' in them, in which case the actual github repo should be inaccessible too [19:22]
godanehere are the latest tapes uploaded: https://www.patreon.com/posts/digitize-tapes-15313676 [19:22]
***Ravenloft has joined #archiveteam-bs [19:29]
JAAkisspunch: Every second line in REPOS is "loltime". I doubt that's on purpose, right? Debugging statement somewhere in the code? [19:32]
.... (idle for 17mn)
***Pixi has quit IRC (Ping timeout: 255 seconds) [19:49]
.... (idle for 18mn)
Pixi has joined #archiveteam-bs
Ravenloft has quit IRC (Read error: Operation timed out)
Ravenloft has joined #archiveteam-bs
[20:07]
.......... (idle for 47mn)
Asparagir has quit IRC (Asparagir) [20:57]
..... (idle for 22mn)
Atom__ has joined #archiveteam-bs
Atom-- has quit IRC (Read error: Operation timed out)
jschwart has quit IRC (Read error: Operation timed out)
[21:19]
...... (idle for 25mn)
atrocity has quit IRC (Remote host closed the connection)
RichardG_ has quit IRC (Read error: Connection reset by peer)
atrocity has joined #archiveteam-bs
RichardG has joined #archiveteam-bs
Stilett0 has joined #archiveteam-bs
[21:49]
joepie91SketchCow: https://traveloregon.com/thegame/ [22:06]
..... (idle for 22mn)
JAAkisspunch: So I looked at a few random repos. One thing I noticed is that not all data about commits is actually from the repository, apparently.
For example, https://github.com/iciclespider/GoldenCheetah/commit/99c330edc2e2fac304031ec2ebf63cd2c9ee2e32 lists this as the author information: "rclasen committed with jknotzke on 8 Oct 2010"
But that reference to jknotzke is nowhere to be found in the actual commit.
Oh wait, I'm stupid, it is there. jknotzke is the committer, rclasen the author.
So I guess the only thing missing there would be the mapping between author/committer identities and GitHub usernames.
Otherwise, I didn't see anything missing.
One suggestion though: since these are really just bare repositories, I'd recommend naming the directories user/repo.git rather than user/repo.
(I mean inside the tars, primarily.)
[22:28]
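The author/committer distinction JAA ran into is visible in the raw commit object (for example the output of `git cat-file -p <commit>`). A minimal sketch of reading both identities and of the suggested `user/repo.git` naming follows; the function names are hypothetical, and the mapping from these identities to GitHub usernames is not stored in the repository, so it cannot be recovered this way.

```python
import re

# Header lines look like: "author Name <email> 1286530000 +0200"
IDENT_RE = re.compile(r"^(author|committer) (.*) <(.*)> (\d+) ([+-]\d{4})$")


def parse_identities(raw_commit):
    """Extract the author and committer from the header of a raw commit object."""
    identities = {}
    for line in raw_commit.split("\n"):
        if line == "":
            break  # a blank line separates the header from the commit message
        match = IDENT_RE.match(line)
        if match:
            role, name, email, timestamp, tz = match.groups()
            identities[role] = {
                "name": name,
                "email": email,
                "timestamp": int(timestamp),
                "tz": tz,
            }
    return identities


def bare_repo_path(user, repo):
    """Directory name for a bare clone, following the user/repo.git convention."""
    return f"{user}/{repo}.git"
```

Stopping at the first blank line matters, because a commit message may itself contain lines that start with "author" or "committer".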
..... (idle for 21mn)
***Stilett0 has quit IRC (Read error: Operation timed out)
Pixi has quit IRC (Ping timeout: 255 seconds)
ZexaronS- has quit IRC (Quit: Leaving)
[22:53]
Pixi has joined #archiveteam-bs [23:07]
drumstick has joined #archiveteam-bs [23:16]
Pixi` has joined #archiveteam-bs
Pixi has quit IRC (Ping timeout: 255 seconds)
[23:29]
...... (idle for 28mn)
Asparagir has joined #archiveteam-bs
svchfoo1 sets mode: +o Asparagir
[23:59]
