Time |
Nickname |
Message |
04:06
🔗
|
|
odemg has quit IRC (Read error: Operation timed out) |
04:11
🔗
|
|
odemg has joined #internetarchive |
06:28
🔗
|
|
sahya has joined #internetarchive |
12:02
🔗
|
Nemo_bis |
Why does this not update existing metadata with the internetarchive library? https://github.com/WikiTeam/wikiteam/commit/5db991bfbbf10dc86a29243eceeb6aa6fd22cbd9#diff-78e323ef0b4f5972e99865640365018bR281 |
12:06
🔗
|
|
HCross has quit IRC (Read error: Operation timed out) |
12:07
🔗
|
|
HCross has joined #internetarchive |
12:10
🔗
|
JAA |
Huh. That's exactly what 'ia metadata' does as well, which has always worked for me (except during a derive, which is an issue I reported to Jake just yesterday). |
12:11
🔗
|
JAA |
Unless it returns an error and that's ignored. 'ia metadata' does this error checking: https://github.com/jjjake/internetarchive/blob/27e387a7245699a1ead14e2214261bff5629333d/internetarchive/cli/ia_metadata.py#L72-L87 |
12:12
🔗
|
JAA |
Item.modify_metadata does no error-checking whatsoever, it just sends the request and returns the response. |
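A minimal sketch of the kind of check JAA describes, wrapped around `Item.modify_metadata`. The helper name `modify_metadata_checked` is hypothetical, and the sketch assumes the response is a `requests`-style object whose JSON body carries a `success` flag and an `error` message, which is what the linked `ia metadata` code appears to rely on.

```python
def modify_metadata_checked(item, metadata):
    """Call item.modify_metadata() and raise if the response reports failure.

    `item` is expected to behave like internetarchive.Item. The `success`
    and `error` keys in the JSON body are an assumption, not a documented
    guarantee.
    """
    response = item.modify_metadata(metadata)
    try:
        body = response.json()
    except ValueError:
        # The body was not valid JSON; treat it as a failure below.
        body = {}
    if not isinstance(body, dict):
        body = {}
    if response.status_code != 200 or not body.get("success"):
        reason = body.get("error", "no error message in response")
        raise RuntimeError(
            "metadata update failed with HTTP %s: %s"
            % (response.status_code, reason)
        )
    return response
```

With the real library this would be called as `modify_metadata_checked(internetarchive.get_item(identifier), {...})`, so that a rejected update surfaces as an exception instead of being silently ignored.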
12:13
🔗
|
Nemo_bis |
"except during a derive" is probably the issue, since we're launching that right after the upload |
12:13
🔗
|
Nemo_bis |
I could try to set the upload to not trigger a derive, or wait some seconds |
12:15
🔗
|
JAA |
Even during a derive, most metadata should be fine. I've only had it happen for the 'date' field, which gets reset to the previous value when the derive finishes on mediatype:web items with WARC files. |
12:16
🔗
|
Nemo_bis |
Ah right, I had forgotten this. |
12:17
🔗
|
Nemo_bis |
At any rate, let's disable the derive. |
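Disabling the derive could look roughly like the sketch below. It assumes the upload goes through `Item.upload` from the internetarchive library and uses its `queue_derive` parameter; the wrapper name `upload_without_derive` is hypothetical and not taken from the WikiTeam code.

```python
def upload_without_derive(item, files, metadata):
    """Upload files to an item without queueing a derive task afterwards.

    `item` is expected to behave like internetarchive.Item.
    """
    # queue_derive=False asks the library not to trigger a derive after
    # the upload, so a later metadata change does not race with it.
    return item.upload(files, metadata=metadata, queue_derive=False)
```

The metadata update can then be sent once the upload call has returned.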
12:17
🔗
|
JAA |
Also, it seems that it should be sufficient to specify the metadata on the upload call. |
12:18
🔗
|
JAA |
Though the documentation for Item.upload_file about the metadata parameter says "Metadata used to create a new item.", so not sure. |
12:18
🔗
|
JAA |
It's a shame that the backend isn't open source. |
12:20
🔗
|
Nemo_bis |
The IA S3 API had a specific option to make the new metadata override the old, but I rarely had success using it. |
12:22
🔗
|
Nemo_bis |
o to update _meta.xml do a bucket PUT with the header |
12:22
🔗
|
Nemo_bis |
x-archive-ignore-preexisting-bucket:1 |
12:22
🔗
|
Nemo_bis |
this will erase the old _meta.xml and replace it with |
12:22
🔗
|
Nemo_bis |
a new _meta.xml generated from the x-archive-meta-* headers in the PUT |
12:22
🔗
|
Nemo_bis |
https://archive.org/help/abouts3.txt |
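For illustration, the headers for such a bucket PUT could be assembled as in the sketch below. The function name is hypothetical, the `LOW accesskey:secret` authorization form is taken from the same abouts3.txt document, and only simple single-valued metadata keys are handled. Sending the request is left out, and whether this header is still meant to be used is uncertain.

```python
def build_replace_metadata_headers(metadata, access_key, secret_key):
    """Build headers for a bucket PUT that replaces an item's _meta.xml."""
    headers = {
        # IA S3 "LOW" authorization scheme, as described in abouts3.txt.
        "authorization": "LOW %s:%s" % (access_key, secret_key),
        # Ask the server to discard the old _meta.xml and regenerate it
        # from the x-archive-meta-* headers of this request.
        "x-archive-ignore-preexisting-bucket": "1",
    }
    for key, value in metadata.items():
        headers["x-archive-meta-%s" % key] = str(value)
    return headers
```

Because the old `_meta.xml` is erased, the `metadata` dict would need to contain every field the item should keep, not just the ones being changed.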
12:22
🔗
|
JAA |
Yeah, just saw that. |
12:23
🔗
|
JAA |
So I guess the metadata dict is really only used on item creation. |
12:23
🔗
|
JAA |
Unless you specify that header, that is. |
12:25
🔗
|
Nemo_bis |
That's a header which I think we've not been supposed to use since 2013, maybe: https://blog.archive.org/2013/07/04/metadata-api/ |
12:26
🔗
|
Nemo_bis |
The only mention I see in the repo is this curl response https://github.com/jjjake/internetarchive/issues/48#issuecomment-33986273 |
12:27
🔗
|
JAA |
:-| The API situation at IA is really a mess... |
12:27
🔗
|
JAA |
But yeah, that metadata API is the one Item.modify_metadata uses. |
12:34
🔗
|
|
HCross has quit IRC (Read error: Connection reset by peer) |
12:39
🔗
|
|
HCross has joined #internetarchive |
12:56
🔗
|
|
mistym has quit IRC (Quit: ZNC - http://znc.in) |
13:05
🔗
|
|
mistym has joined #internetarchive |
13:11
🔗
|
|
HCross_ has joined #internetarchive |
13:16
🔗
|
|
HCross has quit IRC (Read error: Operation timed out) |
13:16
🔗
|
|
HCross_ is now known as HCross |
17:58
🔗
|
|
sahya has quit IRC (Read error: Operation timed out) |
18:59
🔗
|
|
sahya has joined #internetarchive |
19:47
🔗
|
|
sahya has quit IRC (Read error: Operation timed out) |