Time |
Nickname |
Message |
00:58
🔗
|
SketchCow |
I'm actually pulling down Jamendo. |
01:11
🔗
|
db48x |
slurrp |
01:12
🔗
|
SketchCow |
http://www.flickr.com/photos/ssmcintyre/6027454101/ |
01:13
🔗
|
db48x |
nice hat |
01:26
🔗
|
Aranje |
best title slide |
02:00
🔗
|
SketchCow |
Wow |
02:01
🔗
|
* |
SketchCow is listening to Brewster address some visiting Chinese officials related to libraries. |
02:01
🔗
|
SketchCow |
He switched completely to phraseology related to status and respect. |
02:01
🔗
|
SketchCow |
AWESOME |
02:01
🔗
|
SketchCow |
He speaks multiple social constructs |
03:53
🔗
|
db48xOthe |
SketchCow: link? |
04:03
🔗
|
SketchCow |
Which |
04:03
🔗
|
SketchCow |
No, dude, I meant IN FRONT OF ME |
04:04
🔗
|
SketchCow |
http://www.flickr.com/photos/textfiles/6031354308/in/photostream |
04:10
🔗
|
db48xOthe |
heh |
04:13
🔗
|
db48xOthe |
VGA Spectrum |
04:15
🔗
|
db48xOthe |
aww, I missed the Linotype book |
04:15
🔗
|
db48xOthe |
guess I'll just have to go back |
04:24
🔗
|
swebb |
ndurner: is the google groups profile tracker broken or is there just no more work? |
04:28
🔗
|
db48xOthe |
http://www.youtube.com/watch?v=YzG94ct2d5k |
05:06
🔗
|
db48xOthe |
doh |
05:06
🔗
|
db48xOthe |
alard: wget now crashes when I run it |
05:14
🔗
|
SketchCow |
Disappearing for the night. |
05:14
🔗
|
SketchCow |
Let's check on things tomorrow! |
05:14
🔗
|
SketchCow |
I'm onsite, we can have fun. |
05:21
🔗
|
db48xOthe |
SketchCow: later |
05:22
🔗
|
db48xOthe |
#0 0x0000003c92446781 in _IO_vfprintf_internal (s=<value optimized out>, format=<value optimized out>, ap=<value optimized out>) at vfprintf.c:1567 |
05:22
🔗
|
db48xOthe |
#1 0x000000000041ed2b in log_vprintf_internal () |
05:22
🔗
|
db48xOthe |
#2 0x000000000041efc0 in logprintf () |
05:22
🔗
|
db48xOthe |
#3 0x0000000000429fb4 in warc_start_new_file () |
05:22
🔗
|
db48xOthe |
#4 0x000000000042a208 in warc_init () |
05:22
🔗
|
db48xOthe |
#5 0x00000000004202e8 in main () |
05:22
🔗
|
db48xOthe |
sadly, no line numbers |
05:28
🔗
|
db48xOthe |
hrm |
05:28
🔗
|
db48xOthe |
nothing jumps out at me in the source |
05:28
🔗
|
* |
db48xOthe will recompile with debug info |
06:46
🔗
|
ersi |
jch: haha, no.. sorry :) |
09:09
🔗
|
alard |
db48x: Sorry, I really can't reproduce your Wget errors. Have you updated and recompiled everything? Which command line arguments are you using? |
14:19
🔗
|
emijrp |
thinking about... twitter |
14:20
🔗
|
emijrp |
it may be a solution for the digital dark age about user communication |
14:20
🔗
|
emijrp |
we cannot archive emails, but we can archive tweets |
14:25
🔗
|
Wyatt |
Is not the Library of Congress sufficient in this regard? |
14:28
🔗
|
db48x2 |
can't archive all tweets |
14:29
🔗
|
db48x2 |
and I would say that not being able to archive all emails is a feature, not a drawback |
14:32
🔗
|
db48x2 |
sweet |
14:33
🔗
|
db48x2 |
rsync finally finished |
14:33
🔗
|
db48x2 |
a stupendous average speed of 300kBps |
16:36
🔗
|
underscor |
http://www.flickr.com/photos/textfiles/6031354308/in/photostream Sparticus? |
16:36
🔗
|
underscor |
Spartacus* |
16:56
🔗
|
SketchCow |
Spartacus as a concept is a whole thing related to what we now call anonymous |
16:58
🔗
|
SketchCow |
Along with all the usual caveats of success has a thousand fathers, I actually came up with a framework for what you'd call a peer to peer system in 1995. |
16:58
🔗
|
SketchCow |
And the name was Spartacus. |
17:18
🔗
|
underscor |
Really? |
17:18
🔗
|
underscor |
That's neat |
17:21
🔗
|
SketchCow |
Well, people come up with ideas all the time. |
17:21
🔗
|
SketchCow |
This one didn't happen either. |
17:22
🔗
|
SketchCow |
http://www.archive.org/details/ColossalCookbook |
17:22
🔗
|
SketchCow |
COLOSSAL |
17:22
🔗
|
SketchCow |
COOKBOOK |
17:22
🔗
|
SketchCow |
Only 63mb |
17:34
🔗
|
DFJustin |
ha why does it have a picture of the enterprise |
17:37
🔗
|
SketchCow |
NOT SURE |
17:40
🔗
|
alard |
SketchCow: I uploaded a download of Sprouter.com to blindtiger/gv14 (assuming you're interested). |
17:40
🔗
|
alard |
They're shutting down, or not, they don't know yet. |
17:43
🔗
|
db48x |
SketchCow: my upload finished, so you can have the slot back |
17:45
🔗
|
SketchCow |
OK. |
17:45
🔗
|
SketchCow |
To both. |
17:45
🔗
|
SketchCow |
alard, I need you using batcave from now on. |
17:46
🔗
|
SketchCow |
Also, did I hear this right, you've successfully gotten wget to accept the warc? |
17:46
🔗
|
SketchCow |
I.e. the project? |
17:46
🔗
|
db48x |
no, we've presented it on the mailing list and the response was favorable |
17:47
🔗
|
alard |
SketchCow: Okay. Same login details? |
17:47
🔗
|
SketchCow |
No, not even close. :) |
17:47
🔗
|
SketchCow |
I still need to set it all up, I think. |
17:48
🔗
|
alard |
The wget maintainer will look at the WARC code in the coming days, but he seems to be interested. |
17:48
🔗
|
alard |
So with a bit of luck, it will be accepted eventually. |
17:50
🔗
|
db48x |
there was some push-back in the form of questions about the coding style, FSF copyright assignment, introduction of new libraries, etc |
17:50
🔗
|
db48x |
but less than I expected |
17:52
🔗
|
db48x |
in fact, they didn't even ask why anyone would possibly want such a thing, which is good |
19:08
🔗
|
alard |
Hey, would it be a good idea to start downloading stuff from mac.com / mobileme? |
19:09
🔗
|
alard |
There's still a lot of time before it closes, but maybe people will start moving their sites well before that. |
19:10
🔗
|
alard |
That, in turn, would lead to broken links. |
19:11
🔗
|
SketchCow |
I am not against some tests in trying. |
19:14
🔗
|
emijrp |
SketchCow: where are you downloading jamendo, at home or at IA? |
19:15
🔗
|
emijrp |
I think we can create IA items using the metadata database |
19:37
🔗
|
SketchCow |
IA |
20:37
🔗
|
alard |
db48x: I think I may have found the source of the Wget-warc crashes. I guess you logged to a file (with -o )? |
20:38
🔗
|
alard |
If you did, then it should be fixed now. |
20:41
🔗
|
alard |
Also: the MobileMe scripts are ready. They work for me, but if anyone else wants to try them, I more than welcome your views. :) |
20:41
🔗
|
alard |
http://archiveteam.org/index.php?title=MobileMe |
20:44
🔗
|
db48x |
alard: ah, yes |
20:47
🔗
|
alard |
Apparently you're not allowed to use the same variable arguments list twice. |
20:48
🔗
|
alard |
It even says so in the comments above the logging function. I should have read that before I copied the print line. |
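(Editor's sketch of the C rule alard is describing, not the actual Wget or wget-warc code. Once a va_list has been handed to a v*printf function its value is indeterminate, so a second formatting pass needs its own copy made with va_copy. The helper name format_twice is hypothetical.)

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper (not from Wget): formats the same message into two
   buffers of equal size, e.g. one for the screen and one for a log file.
   Returns the length vsnprintf reports for the first buffer. */
int format_twice(char *first, char *second, size_t size, const char *fmt, ...)
{
    va_list args;
    va_list args_copy;
    int n;

    va_start(args, fmt);
    /* Copy BEFORE the first use: after vsnprintf returns, args must not be
       read again without a fresh va_start. */
    va_copy(args_copy, args);

    n = vsnprintf(first, size, fmt, args);    /* consumes args */
    vsnprintf(second, size, fmt, args_copy);  /* consumes the copy */

    va_end(args_copy);
    va_end(args);
    return n;
}
```

Passing args to both vsnprintf calls instead of using the copy is undefined behavior. On x86-64 it typically crashes inside vfprintf, which is consistent with the backtrace pasted earlier.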
20:51
🔗
|
db48x |
gotcha |
20:51
🔗
|
db48x |
I never did manage a successful debug build |
20:52
🔗
|
db48x |
it failed to compile |
20:52
🔗
|
db48x |
has anyone else archived the Google Friends Newsletter? |
20:52
🔗
|
SketchCow |
didn't I ask you to do that? |
20:53
🔗
|
SketchCow |
Or did another person do it. |
20:53
🔗
|
db48x |
yea |
20:53
🔗
|
db48x |
but wget is failing me in this case, so I wanted to know if anyone else had tried it |
20:59
🔗
|
alard |
You can get an RSS feed that lists all messages on google groups (from May 2005) |
20:59
🔗
|
alard |
http://groups.google.com/group/google-friends/browse_thread/thread/42d1d0cdbef323f7 |
20:59
🔗
|
alard |
Sorry. |
21:00
🔗
|
db48x |
that would be one way to do it |
21:00
🔗
|
alard |
http://groups.google.com/group/google-friends/feed/rss_v2_0_topics.xml?num=500 |
21:00
🔗
|
db48x |
the problem is that wget just never recurses into the threads |
21:00
🔗
|
chronomex |
does the rss feed give just message urls or all the content? |
21:00
🔗
|
chronomex |
oh wait I can click on that can't i |
21:01
🔗
|
db48x |
bbl |
21:01
🔗
|
chronomex |
harrumph |
21:01
🔗
|
alard |
No, it only gives the message urls. |
21:02
🔗
|
alard |
But if you download that, you'll get the message. |
21:04
🔗
|
alard |
Okay, got a list of docids now. |
21:06
🔗
|
SketchCow |
So one bummer with Jamendo is that when it fails, and it does fail, it wastes a lot of time re-checking the files. |
21:06
🔗
|
SketchCow |
A resume that did some investigation would be nice. |
21:10
🔗
|
yipdw |
alard: re: MobileMe archiving -- do you know of a way we can grab a username set? |
21:10
🔗
|
alard |
I have a list of +/- 250,000 of those. (Gathered from scraping Google and the wayback machine.) |
21:10
🔗
|
yipdw |
ok |
21:10
🔗
|
yipdw |
I was just plugging site:web.me.com into Google |
21:11
🔗
|
alard |
But there may be other sources too. |
21:11
🔗
|
alard |
I did something like that. But you'll probably find different ones than the ones I have. |
21:11
🔗
|
yipdw |
it's possible, yeah |
21:11
🔗
|
yipdw |
want to merge lists? |
21:12
🔗
|
yipdw |
(I've not made one yet) |
21:17
🔗
|
alard |
Sure, I'll upload it somewhere. |
21:23
🔗
|
alard |
yipdw: http://db.tt/TUFAMav |
21:23
🔗
|
yipdw |
awesome |
21:23
🔗
|
yipdw |
I've almost finished writing a scraper for web.me.com/[x], will run it for a day or so and see what I can get |
21:24
🔗
|
yipdw |
how often can you hit google.com/search without getting flagged as a robot? I'm waiting 5 seconds between requests |
21:25
🔗
|
alard |
I'm not sure. I used the ipv6 trick: with 1024 ip addresses, I just continued to send one request at a time, without pausing. |
21:25
🔗
|
yipdw |
ahh |
21:25
🔗
|
alard |
Each time using an ip from a different subnet. |
21:25
🔗
|
* |
yipdw should really get a 6to4 tunnel |
21:25
🔗
|
alard |
They're fun. |
21:26
🔗
|
yipdw |
huh |
21:26
🔗
|
yipdw |
I actually found one that isn't in your list |
21:26
🔗
|
yipdw |
chado2010 |
21:26
🔗
|
yipdw |
go figure |
21:26
🔗
|
yipdw |
that was fast |
21:28
🔗
|
alard |
There must be much more than 250,000. |
21:28
🔗
|
yipdw |
indeed |
21:29
🔗
|
alard |
(And my list contains some artifacts from my parser, so not everything is really a username.) |
21:32
🔗
|
alard |
SketchCow: db48x: Here's a tar.gz with the Google Friends archive. http://db.tt/hKYfWbM |
21:34
🔗
|
db48x |
alard: cool |
21:34
🔗
|
db48x |
doh |
21:34
🔗
|
db48x |
wget failed to compile :P |
21:34
🔗
|
alard |
Again? |
21:34
🔗
|
db48x |
wget_warc.c:16:20: fatal error: base32.h: No such file or directory |
21:34
🔗
|
alard |
./bootstrap |
21:34
🔗
|
db48x |
gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\" -DLOCALEDIR=\"/usr/local/share/locale\" -I. -I../lib -I../lib -I../libwarc/warc-tools/lib/public -I../libwarc/warc-tools/lib/private -I../libwarc/warc-tools/lib/private/os -I/include -O2 -Wall -MT wget_warc.o -MD -MP -MF .deps/wget_warc.Tpo -c -o wget_warc.o wget_warc.c |
21:34
🔗
|
db48x |
did that |
21:34
🔗
|
* |
db48x does it again |
21:35
🔗
|
db48x |
alard: did you create that archive by parsing the feed? |
21:36
🔗
|
alard |
More or less. I grepped the feed to get the docids of the messages. |
21:36
🔗
|
db48x |
right |
21:36
🔗
|
alard |
Then I used wget to get the original messages: http://groups.google.com/group/google-friends/msg/eb52c83c13717d31?dmode=source&output=gplain |
21:36
🔗
|
alard |
(and filling in the docid) |
21:36
🔗
|
alard |
Plus some tricks to get the date in the filename. |
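(Editor's sketch of the "filling in the docid" step, assuming the docids are already available as plain hex strings. The URL shape is the one alard pasted above; the function name message_source_url is hypothetical, and the date-in-filename tricks are not reproduced because the log does not describe them.)

```c
#include <stdio.h>

/* Hypothetical helper: builds the raw-source URL for one Google Groups
   message from its docid. Returns what snprintf returns, so a value
   >= size means the URL was truncated. */
int message_source_url(char *out, size_t size, const char *docid)
{
    return snprintf(out, size,
                    "http://groups.google.com/group/google-friends/msg/%s"
                    "?dmode=source&output=gplain",
                    docid);
}
```

Printing one such URL per docid gives a list that could be handed to wget with its -i option.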
21:38
🔗
|
alard |
(That's when I found out that the January 2006 newsletter has January 2005 in its subject line.) |
21:39
🔗
|
db48x |
lol |
21:43
🔗
|
yipdw |
ok |
21:43
🔗
|
yipdw |
Google's about to get a lot more traffic from a particularly bored Linux user running Firefox 3.6 |
21:44
🔗
|
yipdw |
in other words, totally normal usage profile |
21:44
🔗
|
alard |
db48x: Just tried a fresh clone, didn't compile. Missing m4/base32 |
21:45
🔗
|
alard |
Another .gitignore problem: I didn't commit the gnulib-local/m4 directory because it was hidden. I've added it now. |
21:46
🔗
|
alard |
yipdw: :) |
21:47
🔗
|
db48x |
alard: aha |
21:47
🔗
|
db48x |
yipdw: lol |
22:28
🔗
|
underscor |
Command is the most irritating fucking key ever invented |
22:32
🔗
|
db48x |
underscor: yup |
22:33
🔗
|
chronomex |
no, sysrq is |
22:53
🔗
|
db48x |
sysrq is merely unused |
22:56
🔗
|
Wyatt |
Hehe, not _always_. (Reboot System Even If Utterly Broken :D) |
22:58
🔗
|
chronomex |
ok ok capslock is more irritating than cmd |
22:59
🔗
|
Wyatt |
Caps lock is a plague |
22:59
🔗
|
chronomex |
yup |
23:13
🔗
|
underscor |
I use sysrq all the time |
23:14
🔗
|
underscor |
chronomex: I usually remap backspace to capslock |
23:14
🔗
|
underscor |
Saves a pinky extension |
23:22
🔗
|
dashcloud |
chronomex: do you have a keyboard where print screen and sysrq don't share the same key? |
23:32
🔗
|
SketchCow |
Google Friends Newsletter looks like 200k .tar'd. Right? |
23:49
🔗
|
chronomex |
I usually make "capslock" be a control key and I make "pause/break" be a capslock key |
23:49
🔗
|
chronomex |
dashcloud: no |
23:55
🔗
|
swebb |
I'm guessing that the google groups profiles job is done? |