#internetarchive.bak 2016-06-12,Sun

Time Nickname Message
04:16 VADemon has quit IRC (Read error: Connection reset by peer)
06:56 JesseW has quit IRC (Read error: Operation timed out)
09:06 demize has joined #internetarchive.bak
09:19 closure has quit IRC (Ping timeout: 250 seconds)
09:19 closure has joined #internetarchive.bak
09:20 svchfoo3 sets mode: +o closure
09:21 demize Is it to be expected to get a bunch of 403 errors from the IA when doing git-annex-get's?
09:22 demize or 417 Expectation Failed
09:26 HCross is it doing it on the metadata?
12:09 VADemon has joined #internetarchive.bak
13:40 demize On actual files, like https://ptpb.pw/mWuT
14:03 demize It seems that the 39 files that still only have 1 copy in shard 10 are ones that return either 403, 404, or 417.
15:33 db48x yep
15:33 db48x sometimes the IA has to hide items
16:02 GLaDOS has quit IRC (Quit: Oh crap, I died.)
16:03 GLaDOS has joined #internetarchive.bak
16:11 Rotab has quit IRC (Read error: Connection reset by peer)
16:11 demize I'd expect all hidden items to have the same status code though.
16:30 db48x I'm sure there are plenty of interesting edge cases
16:31 JesseW has joined #internetarchive.bak
17:10 demize Wish things like that was documented, hmm.
17:10 JesseW demize: like what?
17:10 demize The status code of missing files.
17:10 * JesseW is reading the logs now
17:10 demize Getting a mix of 403s, 404s, and 417s.
17:11 JesseW well, we can document them now. :-) That's one of the points of this project. :-)
17:12 demize It's hard to document them when there's no information given about them though.
17:12 JesseW well, we can start by documenting what the range of responses is
17:13 JesseW and include those on the wiki page (or a subpage): http://archiveteam.org/index.php?title=Talk:INTERNETARCHIVE.BAK
17:14 JesseW demize: one very useful way to investigate particular items on IA is (while logged in) go to https://archive.org/history/IDENTIFIER (where IDENTIFIER is the item identifier, in this case crankygeeks_080_episode )
17:15 JesseW This shows that the item was (likely incorrectly) marked as spam back on 2015-09-30
17:15 JesseW I'll send an email to info@ and they will likely make it visible again.
17:16 JesseW demize: please drop the full list of other items that are generating errors into a pastebin
17:20 demize Cool, I'll save the list.
17:20 JesseW Want me to CC you on the email to info@?
17:22 demize Sure, johannes@kyriasis.com
17:23 demize Should we also make a sub page for each shard with a list of files that cannot be mirrored?
17:23 JesseW yes
17:24 JesseW ok, email sent
17:34 demize Hmm, https://archive.org/download/tolcher2005-03-13.shnf/tolcher2005-03-13.shnf_64kb_mp3.zip is another one that fails, with 417, with the page saying "missing required path parameter", hmm.
17:38 JesseW Hm, will check /history/
17:38 db48x that item doesn't have a zip file in it
17:38 JesseW ok, yeah, that's a *different* (and interesting) error
17:38 db48x https://ia902305.us.archive.org/25/items/tolcher2005-03-13.shnf/tolcher2005-03-13.shnf_files.xml
17:40 JesseW I don't see anything in the history that would explain why that file used to exist but doesn't now...
17:41 JesseW ah, maybe this task: https://catalogd.archive.org/log/345422350
17:41 demize https://catalogd.archive.org/log/400391779
17:41 demize IMPROPERLY _files.xml MARKED LIKELY DERIVATIVES:
17:41 demize ...
17:41 demize DELETING tolcher2005-03-13.shnf_64kb_mp3.zip
17:41 JesseW ha
17:41 JesseW well, that explains it yeah
17:42 db48x the error is probably weird because it's interacting with their online zip browser
17:42 db48x which lets you load individual files from inside archives
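The check db48x describes above (looking in the item's _files.xml to see whether the zip is listed) could be scripted roughly like this. This is only a sketch: `lists_file` is a hypothetical helper name, the sample XML is an illustrative stand-in rather than the real listing, and it assumes the file list is a `<files>` root with `<file name="...">` children.

```python
import xml.etree.ElementTree as ET

# Illustrative stand-in for an item's _files.xml; NOT the real listing.
SAMPLE_FILES_XML = """<files>
  <file name="tolcher2005-03-13.shnf_files.xml" source="original"/>
</files>"""


def lists_file(files_xml, filename):
    """Return True if filename appears as a <file name="..."> entry.

    Hypothetical helper; assumes a <files> root with <file> children.
    """
    root = ET.fromstring(files_xml)
    return any(entry.get("name") == filename for entry in root.iter("file"))
```

A shard entry whose name is no longer listed would then be a candidate for the "deleted" bucket rather than the "darked" one.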
17:43 JesseW this is certainly exactly the sort of thing the IA.BAK project is intended to flush out :-)
17:46 demize https://ptpb.pw/M2GY are the 39 files that aren't available in shard 10
17:46 JesseW nice!
17:47 db48x we should just remove the zips from the shard
17:47 demize Yeah
17:47 JesseW https://archive.org/download/hhtm2008-12-04.Schoeps_64_24_bit/hhtm2008-12-04_mk4_2496_FLAC/.pureftpd-upload.4939a348.15.4bd9.b521510 -- this looks like a temporary file that got swept up in the shard
17:47 JesseW agreed about removing the zips
17:48 db48x ekafon048kmutant was darked
17:49 demize Yeah, .pureftpd-upload.* files are essentially .part files.
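A minimal sketch of how a shard file list could be screened for the `.pureftpd-upload.*` leftovers mentioned above. The function name `is_partial_upload` is hypothetical and this is not taken from the actual shard creation script; it only looks at the last path component.

```python
import posixpath


def is_partial_upload(path):
    """True if the last path component looks like a .pureftpd-upload.* temp file."""
    return posixpath.basename(path).startswith(".pureftpd-upload.")


def drop_partial_uploads(paths):
    """Return the paths that do not look like partial uploads, in order."""
    return [p for p in paths if not is_partial_upload(p)]
```

Matching on the basename rather than the whole path keeps directories that merely contain the string elsewhere from being dropped.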
17:50 demize It seems that different darked files get either 403 or 404
17:51 JesseW Hm, I wonder why the difference.
17:51 demize Actually, hmmm. Maybe the darked ones only get 403
17:51 demize And the 404s are just ones that were deleted due to <reason>
17:51 JesseW that would make sense, yeah
17:53 demize Seems many of the 404s are mp3s that are previews
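To start documenting the range of responses, failed downloads could be bucketed by status code along the lines guessed at above. This is a sketch only: the `LIKELY_CAUSE` labels restate the hypotheses from this conversation, not any documented IA behaviour, and `group_by_cause` is a hypothetical helper.

```python
from collections import defaultdict

# Guesses from this log only; none of this is documented IA behaviour.
LIKELY_CAUSE = {
    403: "item darked (hidden)",
    404: "file deleted, or URL encoded wrongly",
    417: "zip no longer in item (zip browser error)",
}


def group_by_cause(results):
    """Group (url, status) pairs by the guessed cause of the failure."""
    groups = defaultdict(list)
    for url, status in results:
        groups[LIKELY_CAUSE.get(status, "unknown")].append(url)
    return dict(groups)
```

The resulting groups could be pasted onto a per-shard wiki subpage as the list of files that cannot be mirrored.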
17:54 db48x we should leave the dark items in the shard, in case they come back later
17:54 demize Also
17:54 demize "get nacion-libre-diy/NXL064/03. By my self (Feat. Raz - Anti-Ven%C3%B6m).mp3 (from web...) "
17:54 db48x it's even possible we got a copy before they went dark
17:55 JesseW yep :-)
17:55 demize Is there, but getting a 404, which seems to be due to weird URL encoding
17:55 JesseW well, that's a bug
17:55 db48x hrm. I had thought that bug was fixed
17:55 demize Real URL is https://archive.org/download/NXL064/03.%20By%20my%20self%20%28Feat.%20Raz%20-%20Anti-Ven%25C3%25B6m%29.mp3
17:55 demize But it tries https://archive.org/download/NXL064/03.%20By%20my%20self%20(Feat.%20Raz%20-%20Anti-Ven%C3%B6m).mp3_meta.txt
17:56 demize err, https://archive.org/download/NXL064/03.%20By%20my%20self%20(Feat.%20Raz%20-%20Anti-Ven%C3%B6m).mp3
17:57 demize Actually
17:57 demize I think it might be that the ö in Venöm was a different encoding and was then normalized..
18:00 demize Or, no...
18:00 demize Gah, it's double URL encoded.
18:01 demize %C3%B6 turned into %25 C3 %25 B6
18:01 db48x hah
18:01 JesseW That would be a problem, yeah
18:01 db48x that's probably why it's not fixed
18:02 db48x the fix would be inelegant :)
18:02 demize Yeah..
18:02 JesseW It's still a bug in iabak, that we got the wrong encoding, no?
18:02 db48x I wonder if IA would fix it on their end
18:03 demize https://catalogd.archive.org/log/151966441
18:03 demize So the file seems to have been uploaded with a percent-encoded filename.
18:03 demize And then iabak didn't percent-encode the percent-encoded filename.
18:03 db48x yep
18:04 db48x technically iabak doesn't encode anything
18:04 db48x but the script that creates the shard could
18:05 db48x keep up the good work; I'll be around later
18:06 JesseW yeah, I was thinking of iabak in the sense of the whole project
18:07 JesseW specifically the shard creation script
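The encoding problem above comes down to treating the stored filename as literal text, so a `%` that is already in the name becomes `%25` in the URL. A minimal Python sketch of that step follows; `download_url` is a hypothetical helper, not part of iabak or the shard creation script.

```python
from urllib.parse import quote


def download_url(identifier, filename):
    """Build a download URL, percent-encoding the filename as literal text.

    Hypothetical helper. quote() leaves '/' alone by default and encodes
    '%' as '%25', so a name that already contains percent escapes stays intact.
    """
    return "https://archive.org/download/" + quote(identifier) + "/" + quote(filename)


# Filename as stored in the item, with literal percent signs in it.
name = "03. By my self (Feat. Raz - Anti-Ven%C3%B6m).mp3"
url = download_url("NXL064", name)
```

With that input, `url` matches the "Real URL" demize quotes above, with `%C3%B6` in the name coming out as `%25C3%25B6`.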
18:07 HCross2 has joined #internetarchive.bak
19:26 JesseW has quit IRC (Read error: Operation timed out)
19:41 Start has quit IRC (Read error: Connection reset by peer)
19:41 Start has joined #internetarchive.bak
19:42 svchfoo3 sets mode: +o Start
21:43 JesseW has joined #internetarchive.bak
23:04 xperia64 has joined #internetarchive.bak
23:55 demize has quit IRC (Ping timeout: 250 seconds)
