#archiveteam 2014-01-09,Thu


Time Nickname Message
15:28 🔗 balrog Nemo_bis: as the deadline for lists.apple.com gets closer, I'd like to re-archive just the 2014 messages. is there an easy way to grab the index URLs for these from the wget log and feed them to wget-warc to produce an "update warc"?
15:28 🔗 balrog right now it's still going... there's a lot of stuff here.
15:31 🔗 Nemo_bis can't you just use wget patterns so that it rejects anything not from 2014?
15:31 🔗 balrog I could but then it would re-crawl a bunch of stuff
15:31 🔗 Nemo_bis I've only archived pipermail archives, not that custom kind
15:32 🔗 balrog they're custom but pretty simple; it's all html based
15:32 🔗 balrog lists.apple.com/archives/LISTNAME/year/month/msg#####.html
15:33 🔗 balrog I probably could run a regex on the log looking for urls matching lists.apple.com/archives/*/2014/January/index.html
15:34 🔗 balrog hmm then again
15:34 🔗 balrog that would miss stuff if currently there aren't posts from 2014 and someone adds a post
15:34 🔗 balrog your method would probably be better
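The regex idea balrog floats at 15:33 could be sketched roughly as follows. This is a minimal sketch, not what was actually run: it assumes the wget log is plain text containing full URLs, and that index pages follow the lists.apple.com/archives/LISTNAME/2014/MONTH/index.html shape quoted above. The function name `extract_2014_index_urls` is hypothetical.

```python
import re

# Assumed URL shape, based on the pattern quoted in the log:
#   lists.apple.com/archives/LISTNAME/2014/MONTH/index.html
# LISTNAME and MONTH are matched loosely as any run of non-slash,
# non-whitespace characters.
INDEX_2014 = re.compile(
    r"https?://lists\.apple\.com/archives/[^/\s]+/2014/[^/\s]+/index\.html"
)


def extract_2014_index_urls(log_text):
    """Return the sorted, de-duplicated 2014 index URLs found in log_text."""
    return sorted(set(INDEX_2014.findall(log_text)))
```

The resulting URLs could be written one per line to a file and passed to wget through its `-i` (input file) option. As balrog notes at 15:34, this only finds index pages that already existed when the log was written, so a list that gets its first 2014 post later would be missed.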
23:05 🔗 SketchCow DFJustin: That Chatnfiles FTP grab is going to be a month, I can feel it.
23:06 🔗 SketchCow The bandwidth is essentially smoke signals and one of the indians is drunk
23:41 🔗 DFJustin may be better off contacting the guy and working something out, there is a shoutout to you on the front page of chatnfiles.com
23:49 🔗 DFJustin so it's not enemy territory
