Time | Nickname | Message
00:05 | | Raccoon has quit IRC (Ping timeout: 745 seconds)
00:06 | | Raccoon` has quit IRC (Ping timeout: 745 seconds)
01:11 | | Ryz has quit IRC (Remote host closed the connection)
01:11 | | kiska1825 has quit IRC (Remote host closed the connection)
01:12 | | Ryz has joined #archiveteam-ot
01:12 | | kiska1825 has joined #archiveteam-ot
01:15 | | merami has quit IRC (Remote host closed the connection)
01:16 | | merami has joined #archiveteam-ot
01:17 | merami | I'm devastated, my VM just stopped working halfway through my crawl.
01:18 | merami | Is there any way to resume a crawl?
01:22 | merami | Is it "--dir=DIR: Put control files, temporary files, and unfinished WARCs in DIR (default: a directory name based on the URL, date, and first 8 characters of the id)"?
01:22 | JAA | Hmm, not entirely sure. I believe grab-site has some resumption options, but I don't actually use it myself.
01:22 | JAA | ivan: ^
01:53 | | yano_ is now known as yano
03:52 | | qw3rty__ has joined #archiveteam-ot
03:59 | | qw3rty_ has quit IRC (Read error: Operation timed out)
03:59 | | BlueMax has joined #archiveteam-ot
04:30 | | DLoader has quit IRC (Read error: Connection reset by peer)
04:30 | | DLoader has joined #archiveteam-ot
04:47 | | logchfo_1 starts logging #archiveteam-ot at Thu Sep 10 04:47:34 2020
04:47 | | logchfo_1 has joined #archiveteam-ot
04:48 | | scorche has quit IRC (Read error: Operation timed out)
04:49 | | phirephl- has joined #archiveteam-ot
04:49 | | superkuh_ has joined #archiveteam-ot
04:49 | | superkuh has quit IRC (Read error: Operation timed out)
04:49 | | phirephly has quit IRC (Read error: Operation timed out)
04:49 | | godane has quit IRC (Ping timeout: 260 seconds)
04:49 | | godane has joined #archiveteam-ot
05:17 | | Raccoon has joined #archiveteam-ot
05:42 | | scorche has joined #archiveteam-ot
06:06 | | HP_Archiv has joined #archiveteam-ot
06:06 | | HP_Archiv has quit IRC (Client Quit)
07:03 | ivan | grab-site does not support resumption
07:03 | ivan | you can pull URLs out of the queue and grab them with --1 if you want
07:04 | ivan | cc merami
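ivan's suggestion (pull the unfinished URLs out of the queue, then grab them with `--1`) can be sketched in Python. This is a minimal sketch assuming the crawl directory contains grab-site's `wpull.db` SQLite file, and that it follows a wpull-style schema with a `queued_urls` table (columns `url_string_id`, `status`) joined to a `url_strings` table (columns `id`, `url`). Those table, column, and status names are assumptions: inspect your own file with `sqlite3 wpull.db .schema` before trusting the output. The script and function names are hypothetical.

```python
import sqlite3
import sys
from contextlib import closing
from pathlib import Path

# Queue statuses treated as "not finished". These names are an assumption
# about wpull's schema; confirm them against your own wpull.db first.
UNFINISHED = ("todo", "in_progress")


def unfinished_urls(db_path):
    """Return URLs from a wpull-style queue that were never completed.

    Assumes tables queued_urls(url_string_id, status, ...) and
    url_strings(id, url); check with `sqlite3 wpull.db .schema`.
    """
    placeholders = ",".join("?" for _ in UNFINISHED)
    query = (
        "SELECT url_strings.url FROM queued_urls "
        "JOIN url_strings ON queued_urls.url_string_id = url_strings.id "
        f"WHERE queued_urls.status IN ({placeholders})"
    )
    # Open read-only so the crashed crawl's database is left untouched.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        return [row[0] for row in conn.execute(query, UNFINISHED)]


# Usage: python dump_queue.py DIR/wpull.db > urls.txt
if __name__ == "__main__" and len(sys.argv) == 2:
    for url in unfinished_urls(sys.argv[1]):
        print(url)
```

The resulting list could then be fed to a fresh run, for example with `grab-site --1 -i urls.txt`, where `--1` grabs each listed URL without recursing. Check `grab-site --help` for the input-file option in your version. Note that this starts a new crawl seeded from the old queue; it does not continue the old WARCs.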
07:05 | | Ctrl has quit IRC (Read error: Operation timed out)
07:22 | jodizzle | I think one time I "resumed" a grab-site run by assembling a wpull command similar to the one grab-site uses, but referencing the already existing queue. But unfortunately I don't remember the details.
07:22 | jodizzle | It involved following several open GitHub tickets.
07:43 | ivan | there's a grab-site option to dump the wpull args that it would use
08:11 | jodizzle | Yeah, I think that's part of how I constructed the command I used.
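jodizzle's approach can be sketched as a small command-line rewrite. This sketch rests on two assumptions. First, you already have the wpull command grab-site would run; grab-site's `--which-wpull-command` option is meant to print it, but check `grab-site --help` for your version. Second, wpull's `--database FILE` option is where the queue lives. The helper name is hypothetical. Whether wpull really continues from a reused database depends on the wpull build grab-site ships, so treat this as a starting point rather than a supported resume path.

```python
import shlex


def point_at_existing_queue(command, db_path):
    """Rewrite a wpull command line so it uses an existing database file.

    Drops any `--database PATH` or `--database=PATH` already present and
    appends `--database db_path`. Needs Python 3.8+ for shlex.join.
    """
    args = shlex.split(command)
    kept = []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "--database":
            i += 2  # skip the flag and the path that follows it
            continue
        if arg.startswith("--database="):
            i += 1  # skip the combined flag=value form
            continue
        kept.append(arg)
        i += 1
    return shlex.join(kept + ["--database", db_path])


# Hypothetical printed command and paths, for illustration only.
print(point_at_existing_queue("wpull --database new/wpull.db https://example.com/", "old/wpull.db"))
# prints: wpull https://example.com/ --database old/wpull.db
```

Only `--database` is touched. Every other argument, including output paths, is passed through exactly as printed, so review those by hand before running the result.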
09:38 | | SynMonger has quit IRC (Ping timeout: 272 seconds)
09:38 | | Laverne has quit IRC (Ping timeout: 272 seconds)
09:38 | | SynMonger has joined #archiveteam-ot
09:39 | | katocala has quit IRC (Ping timeout: 272 seconds)
09:39 | | katocala has joined #archiveteam-ot
09:56 | | scorche has quit IRC (hub.efnet.us irc.Prison.NET)
10:18 | | yano has quit IRC (Remote host closed the connection)
10:20 | | yano has joined #archiveteam-ot
10:21 | | NatarajBt has joined #archiveteam-ot
10:21 | | Laverne has joined #archiveteam-ot
10:30 | | Laverne has quit IRC (Ping timeout: 272 seconds)
10:30 | | NatarajBt has quit IRC (Ping timeout: 272 seconds)
10:39 | | BlueMax has quit IRC (Quit: Leaving)
11:11 | | NatarajBt has joined #archiveteam-ot
11:12 | | Laverne has joined #archiveteam-ot
11:42 | | scorche has joined #archiveteam-ot
12:09 | | qw3rty has joined #archiveteam-ot
12:09 | | qw3rty__ has quit IRC (Read error: Connection reset by peer)
14:05 | merami | Is uploading partial archives of a site acceptable? What is the best way to upload a small collection of WARCs (20-40 GB) to the ArchiveTeam collection? Will https://gist.github.com/Asparagirl/6206247 put me on the right track?
14:05 | merami | Thanks, ivan and JAA, for answering my dumb questions.
15:25 | JAA | merami: A partial archive is better than no archive, so sure. Just make sure to explain it clearly in the metadata (description). As for the upload, yes, you can follow that guide. Or you can use the official `ia` CLI, which is a bit more user-friendly than using curl. Documentation at https://archive.org/services/docs/api/internetarchive/
15:26 | JAA | Make sure to include mediatype:web in the initial upload. It's not possible to change that later.
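Following JAA's advice, here is a minimal sketch using the `internetarchive` Python library, the package behind the `ia` CLI. It sends `mediatype: web` together with the upload that creates the item, and puts the "this is partial" explanation in the description. The identifier, directory layout, and helper names are hypothetical. From the CLI, the rough equivalent is `ia upload IDENTIFIER *.warc.gz --metadata="mediatype:web" --metadata="title:My partial crawl"`.

```python
from pathlib import Path


def build_metadata(title, description):
    """Metadata for a new web-archive item; mediatype is fixed at creation."""
    return {
        "mediatype": "web",  # must be present when the item is first created
        "title": title,
        "description": description,  # say clearly that the crawl is partial
    }


def upload_warcs(identifier, warc_dir, title, description):
    """Upload every *.warc.gz in warc_dir to a new item in one call."""
    files = sorted(str(p) for p in Path(warc_dir).glob("*.warc.gz"))
    if not files:
        raise FileNotFoundError(f"no .warc.gz files in {warc_dir}")
    # Imported here so build_metadata works without the package installed.
    # Needs `pip install internetarchive` and credentials from `ia configure`.
    from internetarchive import upload

    return upload(identifier, files=files, metadata=build_metadata(title, description))
```

A call such as `upload_warcs("example-partial-crawl-2020", "DIR", "example.com (partial crawl)", "Crawl stopped halfway when the VM failed.")` would return the list of HTTP responses from `internetarchive.upload`, one per file. The identifier, directory, and text here are placeholders.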
15:26 | JAA | I'm not sure what the criteria for whitelisting for inclusion in the Wayback Machine are.
15:44 | | Arcorann_ has quit IRC (Read error: Connection reset by peer)
16:29 | | systwi has quit IRC (Ping timeout: 622 seconds)
16:52 | merami | @JAA I am not allowed to set mediatype:web
16:56 | merami | https://archive.org/post/336796/how-to-change-mediatype-from-image-to-audio (third post in this thread)
16:57 | JAA | merami: You can't change it, but as far as I know, everyone can set it. That's why I said that it has to be included in the initial upload of the first file that creates the item.
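Because the mediatype can only be set by the upload that creates the item, a quick pre-flight check on the identifier can catch a mistake before gigabytes are transferred. This is a sketch assuming the `internetarchive` Python library (`get_item`, with the `exists` and `metadata` attributes of the returned item). The function names and the "new"/"ok"/"locked" labels are made up for illustration.

```python
def mediatype_status(existing_metadata, wanted="web"):
    """Classify an identifier before uploading, given its current metadata.

    existing_metadata is {} when no item exists yet under that identifier.
    """
    if not existing_metadata:
        return "new"  # the next upload creates the item: send mediatype now
    if existing_metadata.get("mediatype") == wanted:
        return "ok"
    return "locked"  # already created with another mediatype


def fetch_metadata(identifier):
    """Current metadata of an item, or {} if it does not exist."""
    # Needs `pip install internetarchive`; imported lazily so
    # mediatype_status can be used without it.
    from internetarchive import get_item

    item = get_item(identifier)
    return dict(item.metadata) if item.exists else {}
```

Typical use would be `mediatype_status(fetch_metadata("my-partial-crawl"))` before starting an upload. A "locked" result means the identifier already exists with a different mediatype; going by the statement above that it can't be changed later, choosing a fresh identifier is the simpler route.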
16:57 | JAA | I'm pretty sure what Jeff means there is that this data is not ingested into the Wayback Machine by default.
16:58 | JAA | Unless this is a new restriction I've never heard of before.
16:59 | merami | I am unable to set mediatype:web in "more options"; it gives me an error.
16:59 | JAA | 'more options'?
17:00 | JAA | Are you using the web interface?
17:02 | merami | Yes, I was testing with the web interface.
17:03 | JAA | Right, no idea about that. I only know it's pretty awful for anything sizeable.
17:03 | JAA | I'd recommend trying the CLI.
17:05 | | DogsRNice has joined #archiveteam-ot
18:44 | | scorche has quit IRC (hub.efnet.us irc.Prison.NET)
19:01 | | bithippo has joined #archiveteam-ot
19:02 | | Raccoon has quit IRC (Remote host closed the connection)
19:03 | | Raccoon has joined #archiveteam-ot
19:07 | | scorche has joined #archiveteam-ot
19:47 | | bithippo has quit IRC (Textual IRC Client: www.textualapp.com)
19:51 | | lunik1 has quit IRC (Ping timeout: 265 seconds)
19:53 | | lunik1 has joined #archiveteam-ot
19:56 | | HP_Archiv has joined #archiveteam-ot
20:04 | | HP_Archiv has quit IRC (Quit: Leaving)
21:20 | | BlueMax has joined #archiveteam-ot
21:53 | | Ctrl has joined #archiveteam-ot
21:54 | | britmob_ has quit IRC (Read error: Connection reset by peer)
21:58 | | britmob_ has joined #archiveteam-ot
22:10 | | BlueMax has quit IRC (Read error: Connection reset by peer)
23:43 | | Arcorann_ has joined #archiveteam-ot
23:50 | | Laverne has quit IRC (Ping timeout: 272 seconds)
23:50 | | NatarajBt has quit IRC (Ping timeout: 272 seconds)
23:54 | | scorche has quit IRC (ircd.choopa.net irc.Prison.NET)