Time |
Nickname |
Message |
00:17
🔗
|
Wyatt|Wor |
wepp@Hildr ~/Archiveteam/sdb $ du -sh mobileme-grab04/data/t/ta/tak/take_junichiro/ |
00:17
🔗
|
Wyatt|Wor |
32G mobileme-grab04/data/t/ta/tak/take_junichiro/ |
00:17
🔗
|
Wyatt|Wor |
Hmmmmm.... |
00:18
🔗
|
Wyatt|Wor |
Well, that explains why it's been going for two and a half days now. |
00:23
🔗
|
SmileyG_ |
lol |
00:54
🔗
|
Pronoiac |
Wyatt|Wor: When that happens, do you check the wget log for problems? |
00:56
🔗
|
Wyatt|Wor |
Pronoiac: I haven't yet. I only just noticed that it was happening an hour ago |
00:58
🔗
|
shaqfu |
Hm, time to figure out why wget refused to follow links within these pages |
00:59
🔗
|
shaqfu |
Despite all being under the same structure |
01:01
🔗
|
Wyatt|Wor |
Pronoiac: Good call. This is in loop-hell |
01:03
🔗
|
Wyatt|Wor |
Looking for even this one file, http://dpaste.com/739562/ |
01:04
🔗
|
Wyatt|Wor |
I cut it off at 50, but it shows in the log 486 times... |
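A rough sketch of how a count like that could be pulled out of a wget log. It assumes request lines of the form `--YYYY-MM-DD HH:MM:SS--  URL`; that layout and the `repeated_requests` name are assumptions for illustration, not something taken from the seesaw scripts.

```python
import re
from collections import Counter

# Assumed layout of a wget request line: "--2012-04-29 01:03:00--  http://..."
REQUEST_LINE = re.compile(r"^--\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}--\s+(\S+)")


def repeated_requests(log_lines, threshold=2):
    """Return (url, count) pairs for URLs requested at least `threshold` times."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    # most_common() sorts by count, highest first
    return [(url, n) for url, n in counts.most_common() if n >= threshold]
```

Feeding it the lines of the log (for example `repeated_requests(open(path))`, with `path` pointing at the item's wget log) would list the URLs that keep coming back.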
01:05
🔗
|
Wyatt|Wor |
Should I just remove that whole directory? |
01:08
🔗
|
Pronoiac |
Uh, I'm not an authority on this, but I just ctrl-C'ed it & restarted seesaw. |
01:09
🔗
|
Pronoiac |
I'd leave the directory in place, unless you need the space. |
01:09
🔗
|
Pronoiac |
It might be useful for diagnosis later. |
01:09
🔗
|
Pronoiac |
I figure another pass will be useful for weird items like this. |
01:15
🔗
|
Pronoiac |
I have some technical thoughts on the recursive loops. Should I share? |
01:22
🔗
|
Wyatt|Wor |
Pronoiac: I've got no objections. I'll note that alard is AFAIK the maintainer for universal-tracker and the downloader clients. |
01:23
🔗
|
Pronoiac |
Should I braindump here, or contact alard? |
01:24
🔗
|
Wyatt|Wor |
I think he'll see it whenever he sees it if you put it here. Also opens it up for commentary from others. |
01:37
🔗
|
Pronoiac |
Okay, so here's one problem: Some items form a recursive loop with multiple slashes - / -> // -> /// etc. |
01:37
🔗
|
Pronoiac |
I *think* the usual way to avoid recursive loops in wget is timestamping (-N), so items aren't re-fetched. |
01:37
🔗
|
Pronoiac |
This option's disabled by default, to avoid warc searching or something, even when we're saving files into the usual tree. |
01:37
🔗
|
Pronoiac |
I edited a local wget-warc to enable that, but it didn't work - |
01:37
🔗
|
Pronoiac |
after fetching a // file, it would parse it before checking for an already-existing file. (This might have been due to my faulty coding.) |
01:37
🔗
|
Pronoiac |
So, I see a couple of ways forward: |
01:37
🔗
|
Pronoiac |
Option 1. Fix timestamping in wget-warc, & check for an existent file before trying to fetch or parse. |
01:38
🔗
|
Pronoiac |
Option 2. Instead of doing an automatic recursive fetch, build a list of files somehow, and pass that list to wget. |
01:38
🔗
|
Pronoiac |
As a bonus, option 2 would avoid a problem I've seen elsewhere, where hundreds of references to 404'ed files result in hundreds of requests to those files, with wget using gigabytes of memory and getting oom-killed. |
01:39
🔗
|
Pronoiac |
Cool? |
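A minimal sketch of what option 2 could look like: walk the links breadth-first with a seen-set, so every URL (including a 404 that hundreds of pages point at) ends up on the list exactly once. `build_fetch_list`, `get_links` and `max_urls` are hypothetical names; `get_links` stands in for whatever fetches a page and returns its links, and is not part of wget or seesaw.

```python
from collections import deque


def build_fetch_list(start_url, get_links, max_urls=10000):
    """Collect each reachable URL once, in breadth-first order.

    get_links(url) is a caller-supplied (hypothetical) function that returns
    the links found at url, or an empty list for a missing page.
    """
    seen = {start_url}
    queue = deque([start_url])
    ordered = []
    # max_urls is a backstop against loops that keep producing new URLs
    while queue and len(ordered) < max_urls:
        url = queue.popleft()
        ordered.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return ordered
```

The resulting list could then be written to a file and handed to wget with its `-i` / `--input-file` option instead of letting it recurse. Note that a plain seen-set does not stop the / -> // -> /// case by itself, since each of those is a different string.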
01:39
🔗
|
underscor |
So, I got my drivers license yesterday |
01:39
🔗
|
underscor |
FIRST TIME ALONE? |
01:39
🔗
|
underscor |
Going to Burger King |
01:39
🔗
|
underscor |
hell yes |
01:39
🔗
|
underscor |
hahahahah |
02:12
🔗
|
Wyatt|Wor |
See the annoying thing here is wget can't suppress multiple consecutive slashes or it leads to unexpected behaviour. |
02:13
🔗
|
Wyatt|Wor |
According to this thread, it used to but no longer does: http://marc.info/?l=wget&m=108972466200930&w=2 |
02:13
🔗
|
balrog_ |
... that's a problem :[ |
02:17
🔗
|
Wyatt|Wor |
Also relevant http://osdir.com/ml/bug-wget-gnu/2010-06/msg00012.html |
02:20
🔗
|
Wyatt|Wor |
Seems the proposal for a URI slash-normalising option was never acked. |
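For illustration, slash-normalising could be sketched like this outside of wget. `collapse_slashes` is a made-up helper, not a wget option, and as the thread above notes, collapsing slashes can change what some servers return, so treat it as a way to spot duplicates rather than a safe rewrite.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url):
    """Collapse runs of '/' in the path only; the query and fragment are left alone."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

Used as a dedup key, `/`, `//` and `///` under the same directory all map to one entry.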
02:27
🔗
|
Wyatt|Wor |
Now that I look...this looks like it was just spidering over a bunch of other users' homepage.mac.com |
02:29
🔗
|
Wyatt|Wor |
Yeah, totally. |
02:29
🔗
|
Wyatt|Wor |
I just downloaded about 340 people's homepage sites. :/ |
02:30
🔗
|
balrog_ |
is that bad? |
02:30
🔗
|
balrog_ |
or have those already been downloaded? |
02:32
🔗
|
Wyatt|Wor |
balrog_: I have no way of knowing that, but it's probably somewhat suboptimal that attempting to mirror homepage.mac.com/take_junichiro/ started me off on spidering what could have been all of homepage.mac.com |
02:32
🔗
|
balrog_ |
this is true. |
02:35
🔗
|
Pronoiac |
That sounds familiar - let me see if I can help narrow down the problem. |
02:37
🔗
|
Pronoiac |
I had swampyhatto spread out over around 150 others. |
02:51
🔗
|
Pronoiac |
Okay, I've done a bit of grepping - I think the problem came from weirdly formed links - like <link href="http://homepage.mac.com/[elsewhere]" ...> |
02:52
🔗
|
Wyatt|Wor |
That's what I thought. |
02:53
🔗
|
Pronoiac |
If you want to do similar diagnosis, go into the data directory that it's fetching into, "ls -alt | tail", and recursively grep for some of the oldest stuff in the right folder. |
02:54
🔗
|
Pronoiac |
I think it's meant for stylesheets & whatnot, so wget thinks they're necessary for proper display. |
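The grepping described above could also be sketched as a small script: pull the href out of every `<link>` tag in a saved page and keep the absolute ones that fall outside the user's own prefix. `out_of_scope_links` and the sample prefix are illustrative assumptions; this only reports such links and says nothing about how wget itself decides what to follow.

```python
from html.parser import HTMLParser


class LinkTagCollector(HTMLParser):
    """Collect href values from <link> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def out_of_scope_links(html, scope_prefix):
    """Return absolute <link href> targets that do not start with scope_prefix."""
    parser = LinkTagCollector()
    parser.feed(html)
    return [
        href
        for href in parser.hrefs
        if href.startswith("http") and not href.startswith(scope_prefix)
    ]
```

Run over the oldest files in an item's data directory with a prefix like `http://homepage.mac.com/take_junichiro/`, it would point at the pages that first led the crawl into other users' sites.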
05:20
🔗
|
SketchCo1 |
Ops, please. |
05:58
🔗
|
LordNlptp |
cannot give ops, sry |
05:59
🔗
|
LordNlptp |
someone op SketchCo1 plz |
07:52
🔗
|
quaggmyre |
greetings dan |
07:52
🔗
|
quaggmyre |
how's Michigan tonight? |
07:53
🔗
|
dan |
hello |
07:53
🔗
|
dan |
cold. |
07:53
🔗
|
quaggmyre |
almost may, should be warming up! |
07:53
🔗
|
dan |
would be nice. |
15:16
🔗
|
SketchCo1 |
OK. |
15:17
🔗
|
SketchCo1 |
1. Batcave had the vast majority of material fly over. Almost done with it. So, alard - please start moving people away from batcave. |
15:18
🔗
|
SketchCo1 |
2. A new batcave will replace it in the future, but I didn't think it fair to be chop-chop when I took 2 months to get batcave clean |
15:19
🔗
|
SketchCo1 |
3. The temporary holdoff worked - I've gotten us caught up on the mobileme uploads. 40% disk utilization and dropping on that disk. |
15:19
🔗
|
kennethre |
oli: is this you? http://flask.pocoo.org/snippets/57/ |
15:48
🔗
|
ersi_ |
kennethre: hm? microtransactions? ;o |
15:58
🔗
|
archinet |
/whois archinet |
16:45
🔗
|
godane |
SketchCo1: I can't fix this error: http://archive.org/details/this_week_in_fun |
16:45
🔗
|
godane |
SketchCo1: this also has the same problem: http://archive.org/details/abbys_road |
16:46
🔗
|
SketchCo1 |
Yeah |
16:46
🔗
|
SketchCo1 |
Yes. |
16:46
🔗
|
SketchCo1 |
Yes, your XML makes the system crash. |
16:46
🔗
|
godane |
its cause some meta tags in the mp3s are weird |
16:46
🔗
|
SketchCo1 |
Yes. |
16:47
🔗
|
SketchCo1 |
I'll mention it to the devs. |
16:47
🔗
|
godane |
thanks |
16:48
🔗
|
godane |
i emailed in the error report on friday and no reply yet |
16:49
🔗
|
godane |
i think i stopped this from happening with roz_rows_the_pacific, since i had to tell archive to treat .nfo as text files |
16:50
🔗
|
godane |
i just noticed the titles of the mp3s were off |
16:51
🔗
|
godane |
SketchCo1: this new podcast can be added to twit-podcasts: http://archive.org/details/the_laporte_report |
17:00
🔗
|
godane |
i'm now uploading jumping monkeys |
17:17
🔗
|
Schbirid |
oh jamendo, http://blog.jamendo.com/2012/04/26/jamendo-has-a-new-look/ |
19:46
🔗
|
alard |
SketchCo1: Thanks. I have switched the memac upload target. The current uploads will finish via batcave, new ones will go directly to s3. |
20:00
🔗
|
godane |
Jumping Monkeys is uploaded finally: http://archive.org/details/jumping_monkeys |
21:12
🔗
|
godane |
i found game shark magazine |
22:49
🔗
|
underscor |
http://twitter.com/#!/h0mfr0g/status/28509114379 |
22:50
🔗
|
nitro2k01 |
OH SNAP |