#internetarchive 2018-05-04,Fri

Time Nickname Message
04:06 🔗 odemg has quit IRC (Read error: Operation timed out)
04:11 🔗 odemg has joined #internetarchive
06:28 🔗 sahya has joined #internetarchive
12:02 🔗 Nemo_bis Why does this not update existing metadata with the internetarchive library? https://github.com/WikiTeam/wikiteam/commit/5db991bfbbf10dc86a29243eceeb6aa6fd22cbd9#diff-78e323ef0b4f5972e99865640365018bR281
12:06 🔗 HCross has quit IRC (Read error: Operation timed out)
12:07 🔗 HCross has joined #internetarchive
12:10 🔗 JAA Huh. That's exactly what 'ia metadata' does as well, which has always worked for me (except during a derive, which is an issue I reported to Jake just yesterday).
12:11 🔗 JAA Unless it returns an error and that's ignored. 'ia metadata' does this error checking: https://github.com/jjjake/internetarchive/blob/27e387a7245699a1ead14e2214261bff5629333d/internetarchive/cli/ia_metadata.py#L72-L87
12:12 🔗 JAA Item.modify_metadata does no error-checking whatsoever, it just sends the request and returns the response.
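Editor's sketch: since Item.modify_metadata only returns the HTTP response, a caller has to inspect it to notice a failed update. The helper below is a minimal illustration of that idea, not the library's own code. The name `check_modify_response` is hypothetical, and the JSON keys `success`, `error` and `log` are an assumption based on what the linked 'ia metadata' code appears to read.

```python
def check_modify_response(response):
    """Return (ok, message) for a response object from Item.modify_metadata.

    Assumes a requests-style response: a .status_code attribute and a
    .json() method. The keys 'success', 'error' and 'log' are assumed,
    not confirmed by the discussion.
    """
    try:
        body = response.json()
    except ValueError:
        # The body was not valid JSON, so treat the update as failed.
        return False, "HTTP {}: response was not JSON".format(response.status_code)

    if response.status_code == 200 and body.get("success"):
        # On success the service is assumed to return a task log reference.
        return True, str(body.get("log", ""))

    return False, "HTTP {}: {}".format(
        response.status_code, body.get("error", "unknown error")
    )
```

Possible usage, assuming `item` is an internetarchive Item: `ok, msg = check_modify_response(item.modify_metadata(md))`, then log or raise when `ok` is false rather than silently continuing.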
12:13 🔗 Nemo_bis "except during a derive" is probably the issue, since we're launching that right after the upload
12:13 🔗 Nemo_bis I could try to set the upload to not trigger a derive, or wait some seconds
12:15 🔗 JAA Even during a derive, most metadata should be fine. I've only had it happen for the 'date' field, which gets reset to the previous value when the derive finishes on mediatype:web items with WARC files.
12:16 🔗 Nemo_bis Ah right, I had forgotten this.
12:17 🔗 Nemo_bis At any rate, let's disable the derive.
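Editor's sketch: one way to disable the derive from the Python library is the `queue_derive` argument of the upload call. The function name `upload_then_update` is hypothetical, and whether skipping the derive actually fixes the metadata problem is not established in this discussion.

```python
def upload_then_update(item, files, metadata):
    """Upload files without queueing a derive, then push the metadata.

    `item` is assumed to behave like an internetarchive Item, with
    upload() and modify_metadata() methods.
    """
    # queue_derive=False asks the service not to start a derive task
    # after the upload finishes.
    upload_responses = item.upload(files, metadata=metadata, queue_derive=False)

    # The metadata passed to upload() may only apply when the item is
    # created, so send it again through the metadata write API.
    update_response = item.modify_metadata(metadata)
    return upload_responses, update_response
```

The returned `update_response` should still be checked by the caller, because modify_metadata itself does not raise on a rejected change.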
12:17 🔗 JAA Also, it seems that it should be sufficient to specify the metadata on the upload call.
12:18 🔗 JAA Though the documentation for Item.upload_file about the metadata parameter says "Metadata used to create a new item.", so not sure.
12:18 🔗 JAA It's a shame that the backend isn't open source.
12:20 🔗 Nemo_bis The IA S3 API had a specific option to make the new metadata override the old, but I rarely had success using it.
12:22 🔗 Nemo_bis "to update _meta.xml do a bucket PUT with the header x-archive-ignore-preexisting-bucket:1 this will erase the old _meta.xml and replace it with a new _meta.xml generated from the x-archive-meta-* headers in the PUT"
12:22 🔗 Nemo_bis https://archive.org/help/abouts3.txt
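Editor's sketch: the quoted S3 instructions amount to sending one override header plus one `x-archive-meta-*` header per metadata field on a bucket PUT. The helper below only builds that header dict and sends nothing. The name `build_meta_headers` is hypothetical, it handles single-valued fields only, and authentication headers are deliberately left out.

```python
def build_meta_headers(metadata, replace_existing=True):
    """Build S3-style headers for a bucket PUT from a metadata dict.

    Only single-valued fields are handled in this sketch.
    """
    headers = {}
    if replace_existing:
        # Per the quoted help text, this erases the old _meta.xml and
        # replaces it with one generated from the x-archive-meta-* headers.
        headers["x-archive-ignore-preexisting-bucket"] = "1"
    for key, value in metadata.items():
        headers["x-archive-meta-{}".format(key)] = str(value)
    return headers
```

Because the old _meta.xml is replaced rather than merged, the dict passed in would need to contain every field that should survive, not just the changed ones.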
12:22 🔗 JAA Yeah, just saw that.
12:23 🔗 JAA So I guess the metadata dict is really only used on item creation.
12:23 🔗 JAA Unless you specify that header, that is.
12:25 🔗 Nemo_bis A header which I think we haven't been supposed to use since 2013, maybe https://blog.archive.org/2013/07/04/metadata-api/
12:26 🔗 Nemo_bis The only mention I see in the repo is this curl response https://github.com/jjjake/internetarchive/issues/48#issuecomment-33986273
12:27 🔗 JAA :-| The API situation at IA is really a mess...
12:27 🔗 JAA But yeah, that metadata API is the one Item.modify_metadata uses.
12:34 🔗 HCross has quit IRC (Read error: Connection reset by peer)
12:39 🔗 HCross has joined #internetarchive
12:56 🔗 mistym has quit IRC (Quit: ZNC - http://znc.in)
13:05 🔗 mistym has joined #internetarchive
13:11 🔗 HCross_ has joined #internetarchive
13:16 🔗 HCross has quit IRC (Read error: Operation timed out)
13:16 🔗 HCross_ is now known as HCross
17:58 🔗 sahya has quit IRC (Read error: Operation timed out)
18:59 🔗 sahya has joined #internetarchive
19:47 🔗 sahya has quit IRC (Read error: Operation timed out)
