#archiveteam 2012-09-30,Sun

Time Nickname Message
10:40 🔗 underscor http://archive.org/about/dmca.php I had no idea this was a thing
10:40 🔗 underscor \o/
16:07 🔗 godane i got the blazetv doc called The Project
16:59 🔗 dashcloud for wget warc do I need to include a header?
17:06 🔗 dashcloud apparently you can't use two separate warc headers- everything has to be combined into a single --warc-header command
17:58 🔗 alard dashcloud: You should be able to use multiple --warc-header options.
18:00 🔗 alard You could do something like wget --warc-header="operator: Archive Team" --warc-header="x-something-else: value"
18:01 🔗 alard You can use any header you want, as long as it follows the name: value format. The headers will be stored in the warc-info record at the top of the warc file.
18:03 🔗 dashcloud ah- that explains it
18:03 🔗 dashcloud I didn't have a colon in the second header command
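[Editor's sketch] The header exchange above comes down to two points: --warc-header can be repeated, and each value should look like "name: value". Since alard is not sure whether Wget validates the string itself, a small pre-check can catch a missing colon before a long crawl starts. The helper names (is_warc_header, build_wget_cmd), the example URL, and the --warc-file name below are hypothetical and not from the log. The script only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument has a non-empty
# name followed by a colon, i.e. roughly the "name: value" shape.
is_warc_header() {
  case "$1" in
    ?*:*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Hypothetical helper: print a wget command line with one
# --warc-header option per header; refuse headers without a colon.
# Usage: build_wget_cmd URL HEADER [HEADER...]
build_wget_cmd() {
  url=$1
  shift
  cmd="wget --warc-file=example"   # output name is an assumption
  for h in "$@"; do
    if ! is_warc_header "$h"; then
      echo "not in name: value format: $h" >&2
      return 1
    fi
    cmd="$cmd --warc-header=\"$h\""
  done
  printf '%s %s\n' "$cmd" "$url"
}

build_wget_cmd http://example.com/ \
  "operator: Archive Team" \
  "x-something-else: value"
```

Printing the command first makes it easy to review the headers that would end up in the warc-info record before committing to the grab.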
18:05 🔗 dashcloud so how much should I set recursion to in order to avoid infinite loops?
18:40 🔗 alard I'm not sure if Wget checks the headers, it might just copy the strings.
18:40 🔗 alard Recursion, well, that depends on what you're doing, I guess.
18:42 🔗 alard It can be lower for very shallow sites, but must be high for sites with a deep structure. You could also try to ignore the looping urls with one of the ignore options.
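[Editor's sketch] One way to act on the recursion advice above is to combine a depth limit (--level) with a reject pattern (--reject-regex, available from Wget 1.14) for the URL family that loops. The depth of 5, the /calendar/ pattern, and the example URLs are illustrative assumptions, not values from the log. The would_fetch helper is hypothetical and only approximates Wget's matching with grep -E, so treat it as a quick sanity check on the pattern rather than a guarantee of what Wget will skip.

```shell
#!/bin/sh
# Illustrative values; pick them per site.
depth=5
reject_pattern='/calendar/'

# Hypothetical helper: succeed if the URL does NOT match the reject
# pattern, i.e. a crawl using --reject-regex would still consider it.
would_fetch() {
  ! printf '%s\n' "$1" | grep -Eq "$reject_pattern"
}

for u in http://example.com/about http://example.com/calendar/2012/10; do
  if would_fetch "$u"; then
    echo "keep   $u"
  else
    echo "reject $u"
  fi
done

# Print (do not run) the corresponding crawl command.
echo "wget --recursive --level=$depth --reject-regex='$reject_pattern' --warc-file=example http://example.com/"
```

Testing the pattern against a handful of known URLs first helps avoid rejecting pages that should be part of the grab.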
18:52 🔗 dashcloud thanks
19:31 🔗 dashcloud hi folks, I did a basic grab of touchatag.com using these settings: http://pastebin.com/nzSnPfz7 and it would be great if someone could double check it- I appear to have missed this page: http://www.touchatag.com/downloads and I'm not quite sure how
20:00 🔗 alard dashcloud: I'm getting http://www.touchatag.com/downloads , so no idea what's wrong. (You might want to add --page-requisites, but that's something else.)
21:01 🔗 dashcloud thanks!
