#archiveteam 2012-04-29,Sun


Time Nickname Message
00:17 🔗 Wyatt|Wor wepp@Hildr ~/Archiveteam/sdb $ du -sh mobileme-grab04/data/t/ta/tak/take_junichiro/
00:17 🔗 Wyatt|Wor 32G mobileme-grab04/data/t/ta/tak/take_junichiro/
00:17 🔗 Wyatt|Wor Hmmmmm....
00:18 🔗 Wyatt|Wor Well, that explains why it's been going for two and a half days now.
00:23 🔗 SmileyG_ lol
00:54 🔗 Pronoiac Wyatt|Wor: When that happens, do you check the wget log for problems?
00:56 🔗 Wyatt|Wor Pronoiac: I haven't yet. I only just noticed that it was happening an hour ago
00:58 🔗 shaqfu Hm, time to figure out why wget refused to follow links within these pages
00:59 🔗 shaqfu Despite all being under the same structure
01:01 🔗 Wyatt|Wor Pronoiac: Good call. This is in loop-hell
01:03 🔗 Wyatt|Wor Looking for even this one file, http://dpaste.com/739562/
01:04 🔗 Wyatt|Wor I cut it off at 50, but it shows in the log 486 times...
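Chasing repeats like this is easiest with a frequency count over the wget log. A minimal sketch — the log path and sample lines below are invented, not from the actual run:

```shell
# Fabricate a few wget-style log lines so the pipeline is self-contained,
# then count how often each URL was fetched, most-repeated first.
printf '%s\n' \
  '--2012-04-29--  http://example.com/a//b.html' \
  '--2012-04-29--  http://example.com/a//b.html' \
  '--2012-04-29--  http://example.com/c.html' \
  > /tmp/wget.log
grep -o 'http://[^ ]*' /tmp/wget.log | sort | uniq -c | sort -rn | head
```

A URL that shows up hundreds of times at the top of this listing is a strong sign of a loop.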
01:05 🔗 Wyatt|Wor Should I just remove that whole directory?
01:08 🔗 Pronoiac Uh, I'm not an authority on this, but I just ctrl-C'ed it & restarted seesaw.
01:09 🔗 Pronoiac I'd leave the directory in place, unless you need the space.
01:09 🔗 Pronoiac It might be useful for diagnosis later.
01:09 🔗 Pronoiac I figure another pass will be useful for weird items like this.
01:15 🔗 Pronoiac I have some technical thoughts on the recursive loops. Should I share?
01:22 🔗 Wyatt|Wor Pronoiac: I've got no objections. I'll note that alard is AFAIK the maintainer for universal-tracker and the downloader clients.
01:23 🔗 Pronoiac Should I braindump here, or contact alard?
01:24 🔗 Wyatt|Wor I think he'll see it when ever he sees it if you put it here. Also opens it up for commentary from others.
01:37 🔗 Pronoiac Okay, so here's one problem: some items get stuck in a recursive loop caused by multiple slashes - / -> // -> /// etc.
01:37 🔗 Pronoiac I *think* the norm to avoid recursive loops in wget uses timestamping (-N) to avoid re-fetching items.
01:37 🔗 Pronoiac This option's disabled by default, to avoid warc searching or something, even when we're saving files into the usual tree.
01:37 🔗 Pronoiac I edited a local wget-warc to enable that, but it didn't work -
01:37 🔗 Pronoiac after fetching a // file, it would parse it before checking for an already-existing file. (This might have been due to my faulty coding.)
01:37 🔗 Pronoiac So, I see a couple of ways forward:
01:37 🔗 Pronoiac Option 1. Fix timestamping in wget-warc, & check for an existent file before trying to fetch or parse.
01:38 🔗 Pronoiac Option 2. Instead of doing an automatic recursive fetch, build a list of files somehow, and pass that list to wget.
01:38 🔗 Pronoiac As a bonus, option 2 would avoid a problem I've seen elsewhere, where hundreds of references to 404'ed files result in hundreds of requests to those files, with wget using gigabytes of memory and getting oom-killed.
01:39 🔗 Pronoiac Cool?
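A minimal sketch of what option 2 could look like — the directory, sample page, and URL pattern below are all hypothetical, not the real memac scripts: harvest in-scope links from pages already on disk into a deduplicated list.

```shell
# Build a sample fetched page, then extract only the links that stay inside
# the target user's area and deduplicate them into a flat URL list.
mkdir -p /tmp/memac-demo
cat > /tmp/memac-demo/index.html <<'EOF'
<a href="http://homepage.mac.com/take_junichiro/photos.html">photos</a>
<a href="http://homepage.mac.com/take_junichiro/photos.html">photos again</a>
<a href="http://homepage.mac.com/someone_else/">off-scope</a>
EOF
grep -rhoE 'http://homepage\.mac\.com/take_junichiro/[^"]*' /tmp/memac-demo \
  | sort -u > /tmp/urls.txt
cat /tmp/urls.txt
```

The resulting list could then be handed to wget with `--input-file=/tmp/urls.txt` (plus the usual WARC options), so only pre-approved URLs get fetched and loops can't form.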
01:39 🔗 underscor So, I got my drivers license yesterday
01:39 🔗 underscor FIRST TIME ALONE?
01:39 🔗 underscor Going to Burger King
01:39 🔗 underscor hell yes
01:39 🔗 underscor hahahahah
02:12 🔗 Wyatt|Wor See, the annoying thing here is that wget can't suppress multiple consecutive slashes without it leading to unexpected behaviour.
02:13 🔗 Wyatt|Wor According to this thread, it used to but no longer does: http://marc.info/?l=wget&m=108972466200930&w=2
02:13 🔗 balrog_ ... that's a problem :[
02:17 🔗 Wyatt|Wor Also relevant http://osdir.com/ml/bug-wget-gnu/2010-06/msg00012.html
02:20 🔗 Wyatt|Wor Seems the proposal for a URI slash-normalising option was never acked.
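Lacking such an option in wget itself, slash normalisation can be approximated outside it; a sketch using sed (the example URL is invented):

```shell
# Collapse runs of slashes in the path of a URL while leaving the "://"
# after the scheme untouched: require a non-colon character before the run.
url='http://homepage.mac.com/take_junichiro///Sites//index.html'
echo "$url" | sed -E 's#([^:])/{2,}#\1/#g'
```

Running a filter like this over an extracted link list (rather than patching wget) is one way to keep / -> // -> /// chains from ever being requested.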
02:27 🔗 Wyatt|Wor Now that I look...this looks like it was just spidering over a bunch of other users' homepage.mac.com sites
02:29 🔗 Wyatt|Wor Yeah, totally.
02:29 🔗 Wyatt|Wor I just downloaded about 340 people's homepage sites. :/
02:30 🔗 balrog_ is that bad?
02:30 🔗 balrog_ or have those already been downloaded?
02:32 🔗 Wyatt|Wor balrog_: I have no way of knowing that, but it's probably somewhat suboptimal that attempting to mirror homepage.mac.com/take_junichiro/ started me off on spidering what could have been all of homepage.mac.com
02:32 🔗 balrog_ this is true.
02:35 🔗 Pronoiac That sounds familiar - let me see if I can help narrow down the problem.
02:37 🔗 Pronoiac I had swampyhatto spread out over around 150 others.
02:51 🔗 Pronoiac Okay, I've done a bit of grepping - I think the problem came from weirdly formed links - like <link href="http://homepage.mac.com/[elsewhere]" ...>
02:52 🔗 Wyatt|Wor That's what I thought.
02:53 🔗 Pronoiac If you want to do similar diagnosis, go into the data directory that it's fetching into, "ls -alt | tail", and recursively grep for some of the oldest stuff in the right folder.
02:54 🔗 Pronoiac I think it's meant for stylesheets & whatnot, so wget thinks they're necessary for proper display.
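A hedged reconstruction of that diagnosis recipe, using a throwaway directory and a fabricated page rather than real fetch data:

```shell
# Create a fake fetch directory containing a page with a cross-user link
# (the kind of <link href=...> tag suspected of causing the spidering).
mkdir -p /tmp/fetch-demo
printf '<link href="http://homepage.mac.com/other_user/style.css" rel="stylesheet">\n' \
  > /tmp/fetch-demo/page.html
# Step 1: the oldest entries (last lines of ls -alt) are the earliest fetches.
ls -alt /tmp/fetch-demo | tail
# Step 2: grep recursively for homepage.mac.com links that leave the
# target user's area (take_junichiro is the user from this log).
grep -rn 'homepage\.mac\.com/' /tmp/fetch-demo | grep -v '/take_junichiro/'
```

Any hit from the final grep is a candidate for the link that dragged wget into another user's site.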
05:20 🔗 SketchCo1 Ops, please.
05:58 🔗 LordNlptp cannot give ops, sry
05:59 🔗 LordNlptp someone op SketchCo1 plz
07:52 🔗 quaggmyre greetings dan
07:52 🔗 quaggmyre how's Michigan tonight?
07:53 🔗 dan hello
07:53 🔗 dan cold.
07:53 🔗 quaggmyre almost may, should be warming up!
07:53 🔗 dan would be nice.
15:16 🔗 SketchCo1 OK.
15:17 🔗 SketchCo1 1. Batcave had the vast majority of material fly over. Almost done with it. So, alard - please start moving people away from batcave.
15:18 🔗 SketchCo1 2. A new batcave will replace it in the future, but I didn't think it fair to be chop-chop when it took me 2 months to get batcave clean
15:19 🔗 SketchCo1 3. The temporary holdoff worked - I've gotten us caught up on the mobileme uploads. 40% disk utilization and dropping on that disk.
15:19 🔗 kennethre oli: is this you? http://flask.pocoo.org/snippets/57/
15:48 🔗 ersi_ kennethre: hm? microtransactions? ;o
15:58 🔗 archinet /whois archinet
16:45 🔗 godane SketchCo1: I can't fix this error: http://archive.org/details/this_week_in_fun
16:45 🔗 godane SketchCo1: this one also has the same problem: http://archive.org/details/abbys_road
16:46 🔗 SketchCo1 Yeah
16:46 🔗 SketchCo1 Yes.
16:46 🔗 SketchCo1 Yes, your XML makes the system crash.
16:46 🔗 godane it's because some meta tags in the mp3s are weird
16:46 🔗 SketchCo1 Yes.
16:47 🔗 SketchCo1 I'll mention it to the devs.
16:47 🔗 godane thanks
16:48 🔗 godane i emailed in their error report on friday and no reply yet
16:49 🔗 godane i think i stopped this from happening with roz_rows_the_pacific since i had to tell archive to look at .nfo as text files
16:50 🔗 godane i just noticed the titles of the mp3s was off
16:51 🔗 godane SketchCo1: this new podcast can be added to twit-podcasts: http://archive.org/details/the_laporte_report
17:00 🔗 godane i'm now uploading jumping monkeys
17:17 🔗 Schbirid oh jamendo, http://blog.jamendo.com/2012/04/26/jamendo-has-a-new-look/
19:46 🔗 alard SketchCo1: Thanks. I have switched the memac upload target. The current uploads will finish via batcave, new ones will go directly to s3.
20:00 🔗 godane Jumping Monkeys is uploaded finally: http://archive.org/details/jumping_monkeys
21:12 🔗 godane i found game shark magazine
22:49 🔗 underscor http://twitter.com/#!/h0mfr0g/status/28509114379
22:50 🔗 nitro2k01 OH SNAP
