#archiveteam 2012-09-30,Sun


Time	Nickname	Message
10:40	underscor	http://archive.org/about/dmca.php I had no idea this was a thing
10:40	underscor	\o/
16:07	godane	i got the blazetv doc called The Project
16:59	dashcloud	for wget warc do I need to include a header?
17:06	dashcloud	apparently you can't use two separate warc headers- everything has to be combined into a single --warc-header command
17:58	alard	dashcloud: You should be able to use multiple --warc-header options.
18:00	alard	You could do something like wget --warc-header="operator: Archive Team" --warc-header="x-something-else: value"
18:01	alard	You can use any header you want, as long as it follows the name: value format. The headers will be stored in the warc-info record at the top of the warc file.
18:03	dashcloud	ah- that explains it
18:03	dashcloud	I didn't have a colon in the second header command
18:05	dashcloud	so how much should I set recursion to in order to avoid infinite loops?
18:40	alard	I'm not sure if Wget checks the headers, it might just copy the strings.
18:40	alard	Recursion, well, that depends on what you're doing, I guess.
18:42	alard	It can be lower for very shallow sites, but must be high for sites with a deep structure. You could also try to ignore the looping URLs with one of the ignore options.
18:52	dashcloud	thanks
19:31	dashcloud	hi folks, I did a basic grab of touchatag.com using these settings: http://pastebin.com/nzSnPfz7 and it would be great if someone could double check it- I appear to have missed this page: http://www.touchatag.com/downloads and I'm not quite sure how
20:00	alard	dashcloud: I'm getting http://www.touchatag.com/downloads , so no idea what's wrong. (You might want to add --page-requisites, but that's something else.)
21:01	dashcloud	thanks!
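
Editor's sketch (not part of the log): the advice above about repeating --warc-header, keeping each header in name: value form, and choosing a recursion depth can be illustrated with a small Python helper that only builds the wget argument list. The function names, the WARC file name "touchatag", and the depth of 5 are assumptions made for illustration; they are not taken from dashcloud's pastebin. The colon check is a precaution of this sketch, since alard was not sure whether Wget validates the headers itself.

```python
def check_header(header):
    """Return the header normalised to 'name: value', or raise ValueError.

    This check belongs to the sketch; the log does not confirm that wget
    itself rejects a header without a colon.
    """
    name, sep, value = header.partition(":")
    if not sep or not name.strip() or not value.strip():
        raise ValueError(f"header must look like 'name: value': {header!r}")
    return f"{name.strip()}: {value.strip()}"


def build_wget_warc_command(url, warc_name, headers, depth=None,
                            page_requisites=False):
    """Build a wget argument list with one --warc-header option per header."""
    args = ["wget", f"--warc-file={warc_name}"]
    for header in headers:
        # One separate option per header, as alard suggests.
        args.append(f"--warc-header={check_header(header)}")
    if depth is not None:
        # Limit recursion depth; a suitable value depends on the site.
        args += ["--recursive", f"--level={depth}"]
    if page_requisites:
        args.append("--page-requisites")
    args.append(url)
    return args


if __name__ == "__main__":
    command = build_wget_warc_command(
        "http://www.touchatag.com/",
        "touchatag",  # hypothetical WARC file name
        ["operator: Archive Team", "x-something-else: value"],
        depth=5,  # illustrative value only
        page_requisites=True,
    )
    print(command)
```

The list can be passed to subprocess.run as-is, which avoids shell quoting problems with the space inside "operator: Archive Team". A header such as "x-something-else value" (no colon) raises ValueError before wget is ever started, which is the mistake dashcloud describes at 18:03.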
