[10:40] http://archive.org/about/dmca.php I had no idea this was a thing
[10:40] \o/
[16:07] i got the blazetv doc called The Project
[16:59] for wget warc do I need to include a header?
[17:06] apparently you can't use two separate warc headers- everything has to be combined into a single --warc-header command
[17:58] dashcloud: You should be able to use multiple --warc-header options.
[18:00] You could do something like wget --warc-header="operator: Archive Team" --warc-header="x-something-else: value"
[18:01] You can use any header you want, as long as it follows the name: value format. The headers will be stored in the warcinfo record at the top of the warc file.
[18:03] ah- that explains it
[18:03] I didn't have a colon in the second header command
[18:05] so how much should I set recursion to in order to avoid infinite loops?
[18:40] I'm not sure if Wget checks the headers, it might just copy the strings.
[18:40] Recursion, well, that depends on what you're doing, I guess.
[18:42] It can be lower for very shallow sites, but must be high for sites with a deep structure. You could also try to ignore the looping urls with one of the ignore options.
[18:52] thanks
[19:31] hi folks, I did a basic grab of touchatag.com using these settings: http://pastebin.com/nzSnPfz7 and it would be great if someone could double check it- I appear to have missed this page: http://www.touchatag.com/downloads and I'm not quite sure how
[20:00] dashcloud: I'm getting http://www.touchatag.com/downloads , so no idea what's wrong. (You might want to add --page-requisites, but that's something else.)
[21:01] thanks!
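
For reference, the advice in the log (several --warc-header options, each in name: value form, a bounded recursion depth, an ignore rule for looping URLs, and --page-requisites) could be put together roughly as below. This is only a sketch, not the settings from the pastebin: the depth of 5, the output name "touchatag", the /calendar/ reject pattern and the is_warc_header helper are all assumptions made for illustration. It only prints the command unless RUN_WGET=1 is set.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): accept a header only if it looks
# like "name: value". Per the log, wget may just copy the string unchecked,
# so a missing colon is easier to catch before the grab starts.
is_warc_header() {
  case "$1" in
    ?*:*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Build the wget arguments in the positional parameters.
# --level=5 and --warc-file=touchatag are illustrative values only;
# the /calendar/ pattern is a made-up example of a looping URL to skip.
set -- --recursive --level=5 --page-requisites \
       --reject-regex=/calendar/ --warc-file=touchatag

# One --warc-header option per header; they end up in the warcinfo record.
for header in "operator: Archive Team" "x-something-else: value"; do
  if is_warc_header "$header"; then
    set -- "$@" "--warc-header=$header"
  else
    echo "skipping malformed header: $header" >&2
  fi
done

set -- "$@" "http://www.touchatag.com/"

# Dry run by default: print the command instead of starting a crawl.
if [ "${RUN_WGET:-0}" = "1" ]; then
  wget "$@"
else
  printf 'wget'
  printf ' %s' "$@"
  printf '\n'
fi
```

A deep site would need a larger --level, and the reject pattern would have to match whatever URLs actually loop on the site being grabbed.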