#archiveteam-ot 2020-09-10,Thu


Time Nickname Message
00:05 🔗 Raccoon has quit IRC (Ping timeout: 745 seconds)
00:06 🔗 Raccoon` has quit IRC (Ping timeout: 745 seconds)
01:11 🔗 Ryz has quit IRC (Remote host closed the connection)
01:11 🔗 kiska1825 has quit IRC (Remote host closed the connection)
01:12 🔗 Ryz has joined #archiveteam-ot
01:12 🔗 kiska1825 has joined #archiveteam-ot
01:15 🔗 merami has quit IRC (Remote host closed the connection)
01:16 🔗 merami has joined #archiveteam-ot
01:17 🔗 merami im devastated my vm just stopped working halfway through my crawl
01:18 🔗 merami is there any way to start a crawl again
01:22 🔗 merami is it "--dir=DIR: Put control files, temporary files, and unfinished WARCs in DIR (default: a directory name based on the URL, date, and first 8 characters of the id)."
01:22 🔗 JAA Hmm, not entirely sure. I believe grab-site has some resumption options, but I don't actually use it myself.
01:22 🔗 JAA ivan: ^
01:53 🔗 yano_ is now known as yano
03:52 🔗 qw3rty__ has joined #archiveteam-ot
03:59 🔗 qw3rty_ has quit IRC (Read error: Operation timed out)
03:59 🔗 BlueMax has joined #archiveteam-ot
04:30 🔗 DLoader has quit IRC (Read error: Connection reset by peer)
04:30 🔗 DLoader has joined #archiveteam-ot
04:47 🔗 logchfo_1 starts logging #archiveteam-ot at Thu Sep 10 04:47:34 2020
04:47 🔗 logchfo_1 has joined #archiveteam-ot
04:48 🔗 scorche has quit IRC (Read error: Operation timed out)
04:49 🔗 phirephl- has joined #archiveteam-ot
04:49 🔗 superkuh_ has joined #archiveteam-ot
04:49 🔗 superkuh has quit IRC (Read error: Operation timed out)
04:49 🔗 phirephly has quit IRC (Read error: Operation timed out)
04:49 🔗 godane has quit IRC (Ping timeout: 260 seconds)
04:49 🔗 godane has joined #archiveteam-ot
05:17 🔗 Raccoon has joined #archiveteam-ot
05:42 🔗 scorche has joined #archiveteam-ot
06:06 🔗 HP_Archiv has joined #archiveteam-ot
06:06 🔗 HP_Archiv has quit IRC (Client Quit)
07:03 🔗 ivan grab-site does not support resumption
07:03 🔗 ivan you can pull URLs out of the queue and grab them with --1 if you want
07:04 🔗 ivan cc merami
07:05 🔗 Ctrl has quit IRC (Read error: Operation timed out)
07:22 🔗 jodizzle I think one time I "resumed" a grab-site run by assembling a wpull command similar to the one grab-site uses, but referencing the already existing queue. But unfortunately I don't remember the details.
07:22 🔗 jodizzle It involved following several open GitHub tickets.
07:43 🔗 ivan there's a grab-site option to dump the wpull args that it would use
08:11 🔗 jodizzle Yeah, I think that's part of how I constructed the command I used.
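
A rough sketch of ivan's suggestion to pull the remaining URLs out of the queue, assuming the queue is a SQLite file such as DIR/wpull.db. The table and column names (`queued_urls`, `url_strings`, `url_string_id`, `status`) and the `todo` status value are assumptions about the database layout, not something confirmed in this log, so inspect the real schema before relying on it.

```python
import sqlite3


def dump_queued_urls(conn, status="todo"):
    """Return the URLs that have the given status in a wpull-style queue.

    Assumed schema (hypothetical, verify against the real database):
      queued_urls(url_string_id, status)
      url_strings(id, url)
    """
    rows = conn.execute(
        "SELECT url_strings.url FROM queued_urls "
        "JOIN url_strings ON queued_urls.url_string_id = url_strings.id "
        "WHERE queued_urls.status = ?",
        (status,),
    ).fetchall()
    # Each row is a 1-tuple; unwrap it to a plain list of URL strings.
    return [row[0] for row in rows]
```

To use it, open the database with `sqlite3.connect("DIR/wpull.db")`, where the path is a placeholder for the interrupted crawl's directory, and pass the connection in. Each returned URL could then be fetched individually with `--1`, as ivan describes. This recovers unfetched URLs only; it does not resume the original crawl.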
09:38 🔗 SynMonger has quit IRC (Ping timeout: 272 seconds)
09:38 🔗 Laverne has quit IRC (Ping timeout: 272 seconds)
09:38 🔗 SynMonger has joined #archiveteam-ot
09:39 🔗 katocala has quit IRC (Ping timeout: 272 seconds)
09:39 🔗 katocala has joined #archiveteam-ot
09:56 🔗 scorche has quit IRC (hub.efnet.us irc.Prison.NET)
10:18 🔗 yano has quit IRC (Remote host closed the connection)
10:20 🔗 yano has joined #archiveteam-ot
10:21 🔗 NatarajBt has joined #archiveteam-ot
10:21 🔗 Laverne has joined #archiveteam-ot
10:30 🔗 Laverne has quit IRC (Ping timeout: 272 seconds)
10:30 🔗 NatarajBt has quit IRC (Ping timeout: 272 seconds)
10:39 🔗 BlueMax has quit IRC (Quit: Leaving)
11:11 🔗 NatarajBt has joined #archiveteam-ot
11:12 🔗 Laverne has joined #archiveteam-ot
11:42 🔗 scorche has joined #archiveteam-ot
12:09 🔗 qw3rty has joined #archiveteam-ot
12:09 🔗 qw3rty__ has quit IRC (Read error: Connection reset by peer)
14:05 🔗 merami is uploading partial archives of a site acceptable? What is the best way to upload a small collection of warcs 20-40 gb to archiveteam collection, https://gist.github.com/Asparagirl/6206247 will this put me on the right track?
14:05 🔗 merami thanks ivan and JAA for answering my dumb questions
15:25 🔗 JAA merami: A partial archive is better than no archive, so sure. Just make sure to explain it clearly in the metadata (description). As for the upload, yes, you can follow that guide. Or you can use the official `ia` CLI, which is a bit more user-friendly than using curl. Documentation at https://archive.org/services/docs/api/internetarchive/
15:26 🔗 JAA Make sure to include mediatype:web in the initial upload. It's not possible to change that later.
15:26 🔗 JAA I'm not sure what the criteria for whitelisting for inclusion in the Wayback Machine are.
15:44 🔗 Arcorann_ has quit IRC (Read error: Connection reset by peer)
16:29 🔗 systwi has quit IRC (Ping timeout: 622 seconds)
16:52 🔗 merami @JAA i am not allowed to set mediatype:web
16:56 🔗 merami https://archive.org/post/336796/how-to-change-mediatype-from-image-to-audio third post in this thread
16:57 🔗 JAA merami: You can't change it, but as far as I know, everyone can set it. That's why I said that it has to be included in the initial upload of the first file that creates the item.
16:57 🔗 JAA I'm pretty sure what Jeff means there is that this data is not ingested into the Wayback Machine by default.
16:58 🔗 JAA Unless this is a new restriction I've never heard of before.
16:59 🔗 merami i am unable to set mediatype:web in more options it gives me an error
16:59 🔗 JAA 'more options'?
17:00 🔗 JAA Are you using the web interface?
17:02 🔗 merami yes i was testing with web interface
17:03 🔗 JAA Right, no idea about that. I only know it's pretty awful for anything sizeable.
17:03 🔗 JAA I'd recommend trying the CLI.
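
JAA's advice (set mediatype:web on the upload that creates the item, and explain in the description that the archive is partial) can be sketched with the `internetarchive` Python library, which is the same package that provides the `ia` CLI. The helper names, the identifier, and the directory layout below are hypothetical. The sketch assumes the library is installed and that credentials were already set up with `ia configure`.

```python
from pathlib import Path


def build_metadata(title, description):
    """Build metadata for a new item.

    mediatype is included here because, per the discussion above, it has to
    be set when the item is created and cannot be changed afterwards.
    """
    return {
        "mediatype": "web",
        "title": title,
        "description": description,
    }


def upload_warcs(identifier, warc_dir, metadata):
    """Upload every .warc.gz file in warc_dir to the item `identifier`."""
    files = sorted(str(p) for p in Path(warc_dir).glob("*.warc.gz"))
    if not files:
        raise FileNotFoundError(f"no .warc.gz files found in {warc_dir}")
    # Imported here so build_metadata stays usable without the library.
    from internetarchive import upload

    return upload(identifier, files=files, metadata=metadata)
```

A call might look like `upload_warcs("example-site-partial-2020-09", "warcs/", build_metadata("example.com partial crawl", "Partial grab-site crawl; the VM stopped midway."))`, where every value is a placeholder. Whether the result is later included in the Wayback Machine is a separate matter that this sketch does not address.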
17:05 🔗 DogsRNice has joined #archiveteam-ot
18:44 🔗 scorche has quit IRC (hub.efnet.us irc.Prison.NET)
19:01 🔗 bithippo has joined #archiveteam-ot
19:02 🔗 Raccoon has quit IRC (Remote host closed the connection)
19:03 🔗 Raccoon has joined #archiveteam-ot
19:07 🔗 scorche has joined #archiveteam-ot
19:47 🔗 bithippo has quit IRC (Textual IRC Client: www.textualapp.com)
19:51 🔗 lunik1 has quit IRC (Ping timeout: 265 seconds)
19:53 🔗 lunik1 has joined #archiveteam-ot
19:56 🔗 HP_Archiv has joined #archiveteam-ot
20:04 🔗 HP_Archiv has quit IRC (Quit: Leaving)
21:20 🔗 BlueMax has joined #archiveteam-ot
21:53 🔗 Ctrl has joined #archiveteam-ot
21:54 🔗 britmob_ has quit IRC (Read error: Connection reset by peer)
21:58 🔗 britmob_ has joined #archiveteam-ot
22:10 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
23:43 🔗 Arcorann_ has joined #archiveteam-ot
23:50 🔗 Laverne has quit IRC (Ping timeout: 272 seconds)
23:50 🔗 NatarajBt has quit IRC (Ping timeout: 272 seconds)
23:54 🔗 scorche has quit IRC (ircd.choopa.net irc.Prison.NET)
