[00:58] I'm actually pulling down Jamendo.
[01:11] slurrp
[01:12] http://www.flickr.com/photos/ssmcintyre/6027454101/
[01:13] nice hat
[01:26] best title slide
[02:00] Wow
[02:01] * SketchCow is listening to Brewster address some visiting Chinese officials related to libraries.
[02:01] He switched completely to phraseology related to status and respect.
[02:01] AWESOME
[02:01] He speaks multiple social constructs
[03:53] SketchCow: link?
[04:03] Which
[04:03] No, dude, I meant IN FRONT OF ME
[04:04] http://www.flickr.com/photos/textfiles/6031354308/in/photostream
[04:10] heh
[04:13] VGA Spectrum
[04:15] aww, I missed the Linotype book
[04:15] guess I'll just have to go back
[04:24] ndurner: is the google groups profile tracker broken or is there just no more work?
[04:28] http://www.youtube.com/watch?v=YzG94ct2d5k
[05:06] doh
[05:06] alard: wget now crashes when I run it
[05:14] Disappearing for the night.
[05:14] Let's check on things tomorrow!
[05:14] I'm onsite, we can have fun.
[05:21] SketchCow: later
[05:22] #0 0x0000003c92446781 in _IO_vfprintf_internal (s=, format=, ap=) at vfprintf.c:1567
[05:22] #1 0x000000000041ed2b in log_vprintf_internal ()
[05:22] #2 0x000000000041efc0 in logprintf ()
[05:22] #3 0x0000000000429fb4 in warc_start_new_file ()
[05:22] #4 0x000000000042a208 in warc_init ()
[05:22] #5 0x00000000004202e8 in main ()
[05:22] sadly, no line numbers
[05:28] hrm
[05:28] nothing jumps out at me in the source
[05:28] * db48x will recompile with debug info
[06:46] jch: haha, no.. sorry :)
[09:09] db48x: Sorry, I really can't reproduce your Wget errors. Have you updated and recompiled everything? Which command line arguments are you using?
[14:19] thinking about... twitter
[14:20] it may be a solution for the digital dark age about user communication
[14:20] we cannot archive emails, but we can archive tweets
[14:25] Is not the Library of Congress sufficient in this regard?
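The backtrace above has no line numbers because wget was built without debug info, which is why db48x plans to recompile. A minimal sketch of what that recompile looks like, assuming a standard autotools build (the configure flags and the gdb invocation are illustrative assumptions, not wget's documented recipe); the runnable part below only demonstrates the effect of `-g` on a toy program:

```shell
# Hypothetical wget rebuild with debug symbols (commented out, needs a checkout):
#   ./configure CFLAGS="-g -O0" && make clean && make
#   gdb --args src/wget --warc-file=test -o wget.log http://example.com/

# The effect of -g, shown on a toy program: only the -g build
# carries a .debug_info section, which is what gives gdb line numbers.
cat > demo.c <<'EOF'
int main(void) { return 0; }
EOF
gcc -O2 -o demo_plain demo.c
gcc -g -O0 -o demo_debug demo.c
readelf -S demo_debug | grep -c 'debug_info'
```

With debug info present, the `warc_start_new_file ()` frames in the trace above would carry file and line references.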
[14:28] can't archive all tweets
[14:29] and I would say that not being able to archive all emails is a feature, not a drawback
[14:32] sweet
[14:33] rsync finally finished
[14:33] a stupendous average speed of 300kBps
[16:36] http://www.flickr.com/photos/textfiles/6031354308/in/photostream Sparticus?
[16:36] Spartacus*
[16:56] Spartacus as a concept is a whole thing related to what we now call anonymous
[16:58] Along with all the usual caveats of success having a thousand fathers, I actually came up with a framework for what you'd call a peer-to-peer system in 1995.
[16:58] And the name was Spartacus.
[17:18] Really?
[17:18] That's neat
[17:21] Well, people come up with ideas all the time.
[17:21] This one didn't happen either.
[17:22] http://www.archive.org/details/ColossalCookbook
[17:22] COLOSSAL
[17:22] COOKBOOK
[17:22] Only 63mb
[17:34] ha why does it have a picture of the enterprise
[17:37] NOT SURE
[17:40] SketchCow: I uploaded a download of Sprouter.com to blindtiger/gv14 (assuming you're interested).
[17:40] They're shutting down, or not, they don't know yet.
[17:43] SketchCow: my upload finished, so you can have the slot back
[17:45] OK.
[17:45] To both.
[17:45] alard, I need you using batcave from now on.
[17:46] Also, did I hear this right, you've successfully gotten wget to accept the warc?
[17:46] I.e. the project?
[17:46] no, we've presented it on the mailing list and the response was favorable
[17:47] SketchCow: Okay. Same login details?
[17:47] No, not even close. :)
[17:47] I still need to set it all up, I think.
[17:48] The wget maintainer will look at the WARC code in the coming days, but he seems to be interested.
[17:48] So with a bit of luck, it will be accepted eventually.
[17:50] there was some push-back in the form of questions about the coding style, FSF copyright assignment, introduction of new libraries, etc
[17:50] but less than I expected
[17:52] in fact, they didn't even ask why anyone would possibly want such a thing, which is good
[19:08] Hey, would it be a good idea to start downloading stuff from mac.com / mobileme?
[19:09] There's still a lot of time before it closes, but maybe people will start moving their sites well before that.
[19:10] That, in turn, would lead to broken links.
[19:11] I am not against some tests in trying.
[19:14] SketchCow: where are you downloading jamendo, at home or at IA?
[19:15] I think we can create IA items using the metadata database
[19:37] IA
[20:37] db48x: I think I may have found the source of the Wget-warc crashes. I guess you logged to a file (with -o )?
[20:38] If you did, then it should be fixed now.
[20:41] Also: the MobileMe scripts are ready. They work for me, but if anyone else wants to try them, I more than welcome your views. :)
[20:41] http://archiveteam.org/index.php?title=MobileMe
[20:44] alard: ah, yes
[20:47] Apparently you're not allowed to use the same variable arguments list twice.
[20:48] It even says so in the comments above the logging function. I should have read that before I copied the print line.
[20:51] gotcha
[20:51] I never did manage a successful debug build
[20:52] it failed to compile
[20:52] has anyone else archived the Google Friends Newsletter?
[20:52] didn't I ask you to do that?
[20:53] Or did another person do it.
[20:53] yea
[20:53] but wget is failing me in this case, so I wanted to know if anyone else had tried it
[20:59] You can get an RSS feed that lists all messages on google groups (from May 2005)
[20:59] http://groups.google.com/group/google-friends/browse_thread/thread/42d1d0cdbef323f7
[20:59] Sorry.
[21:00] that would be one way to do it
[21:00] http://groups.google.com/group/google-friends/feed/rss_v2_0_topics.xml?num=500
[21:00] the problem is that wget just never recurses into the threads
[21:00] does the rss feed give just message urls or all the content?
[21:00] oh wait I can click on that can't i
[21:01] bbl
[21:01] harrumph
[21:01] No, it only gives the message urls.
[21:02] But if you download that, you'll get the message.
[21:04] Okay, got a list of docids now.
[21:06] So one bummer with Jamendo is that when it fails, and it does fail, it wastes a lot of time re-checking the files.
[21:06] A resume that did some investigation would be nice.
[21:10] alard: re: MobileMe archiving -- do you know of a way we can grab a username set?
[21:10] I have a list of +/- 250,000 of those. (Gathered from scraping Google and the wayback machine.)
[21:10] ok
[21:10] I was just plugging site:web.me.com into Google
[21:11] But there may be other sources too.
[21:11] I did something like that. But you'll probably find different ones than the ones I have.
[21:11] it's possible, yeah
[21:11] want to merge lists?
[21:12] (I've not made one yet)
[21:17] Sure, I'll upload it somewhere.
[21:23] yipdw: http://db.tt/TUFAMav
[21:23] awesome
[21:23] I've almost finished writing a scraper for web.me.com/[x], will run it for a day or so and see what I can get
[21:24] how often can you hit google.com/search without getting flagged as a robot? I'm waiting 5 seconds between requests
[21:25] I'm not sure. I used the ipv6 trick, with 1024 ip addresses I just continued to send one request at a time, without pausing.
[21:25] ahh
[21:25] Each time using an ip from a different subnet.
[21:25] * yipdw should really get a 6to4 tunnel
[21:25] They're fun.
[21:26] huh
[21:26] I actually found one that isn't in your list
[21:26] chado2010
[21:26] go figure
[21:26] that was fast
[21:28] There must be much more than 250,000.
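The feed-to-docids approach discussed here can be sketched in shell. The URL patterns are the ones that appear in this log; the `grep` pattern (a 16-hex-digit docid in `msg/<docid>` links) is an assumption about the feed's link format, and the network steps are left commented out:

```shell
#!/bin/sh
# Sketch of the Google Friends pipeline: pull the topic feed, extract
# message docids, fetch each message in raw form. The docid format
# (16 hex digits) is an assumption, not a verified spec.

FEED=rss_v2_0_topics.xml

# 1. Download the feed (commented out: requires network access)
# wget -O "$FEED" 'http://groups.google.com/group/google-friends/feed/rss_v2_0_topics.xml?num=500'

# 2. Extract the unique docids from msg/<docid> links
extract_docids() {
    grep -o 'msg/[0-9a-f]\{16\}' "$1" | cut -d/ -f2 | sort -u
}

# 3. Fetch each message as plain text (commented out)
# extract_docids "$FEED" | while read -r id; do
#     wget -O "msg-$id.txt" \
#       "http://groups.google.com/group/google-friends/msg/$id?dmode=source&output=gplain"
# done
```

The `sort -u` also deduplicates, since the same message can be linked from several feed entries.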
[21:28] indeed
[21:29] (And my list contains some artifacts from my parser, so not everything is really a username.)
[21:32] SketchCow: db48x: Here's a tar.gz with the Google Friends archive. http://db.tt/hKYfWbM
[21:34] alard: cool
[21:34] doh
[21:34] wget failed to compile :P
[21:34] Again?
[21:34] wget_warc.c:16:20: fatal error: base32.h: No such file or directory
[21:34] ./bootstrap
[21:34] gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\" -DLOCALEDIR=\"/usr/local/share/locale\" -I. -I../lib -I../lib -I../libwarc/warc-tools/lib/public -I../libwarc/warc-tools/lib/private -I../libwarc/warc-tools/lib/private/os -I/include -O2 -Wall -MT wget_warc.o -MD -MP -MF .deps/wget_warc.Tpo -c -o wget_warc.o wget_warc.c
[21:34] did that
[21:34] * db48x does it again
[21:35] alard: did you create that archive by parsing the feed?
[21:36] More or less. I grepped the feed to get the docids of the messages.
[21:36] right
[21:36] Then I used wget to get the original messages: http://groups.google.com/group/google-friends/msg/eb52c83c13717d31?dmode=source&output=gplain
[21:36] (and filling in the docid)
[21:36] Plus some tricks to get the date in the filename.
[21:38] (That's when I found out that the January 2006 newsletter has January 2005 in its subject line.)
[21:39] lol
[21:43] ok
[21:43] Google's about to get a lot more traffic from a particularly bored Linux user running Firefox 3.6
[21:44] in other words, totally normal usage profile
[21:44] db48x: Just tried a fresh clone, didn't compile. Missing m4/base32
[21:45] Another .gitignore problem: I didn't commit the gnulib-local/m4 directory because it was hidden. I've added it now.
[21:46] yipdw: :)
[21:47] alard: aha
[21:47] yipdw: lol
[22:28] Command is the most irritating fucking key ever invented
[22:32] underscor: yup
[22:33] no, sysrq is
[22:53] sysrq is merely unused
[22:56] Hehe, not _always_.
[22:56] (Reboot System Even If Utterly Broken :D)
[22:58] ok ok capslock is more irritating than cmd
[22:59] Caps lock is a plague
[22:59] yup
[23:13] I use sysrq all the time
[23:14] chronomex: I usually remap backspace to capslock
[23:14] Saves a pinky extension
[23:22] chronomex: do you have a keyboard where print screen and sysrq don't share the same key?
[23:32] Google Friends Newsletter looks like 200k .tar'd. Right?
[23:49] I usually make "capslock" be a control key and I make "pause/break" be a capslock key
[23:49] dashcloud: no
[23:55] I'm guessing that the google groups profiles job is done?
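The caps-as-control remap described above can be written as an `.Xmodmap` fragment. A sketch assuming X11 and `xmodmap`; the Pause keysym name may vary by keyboard, so treat the last two lines as an assumption:

```
! ~/.Xmodmap sketch: Caps Lock becomes Control, Pause/Break becomes Caps Lock
remove Lock = Caps_Lock
keysym Caps_Lock = Control_L
add Control = Control_L
keysym Pause = Caps_Lock
add Lock = Caps_Lock
```

Load it with `xmodmap ~/.Xmodmap`; on XKB-based setups, `setxkbmap -option ctrl:nocaps` achieves the caps-to-control half without a config file.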