#archiveteam 2011-08-11,Thu


Time Nickname Message
00:58 🔗 SketchCow I'm actually pulling down Jamendo.
01:11 🔗 db48x slurrp
01:12 🔗 SketchCow http://www.flickr.com/photos/ssmcintyre/6027454101/
01:13 🔗 db48x nice hat
01:26 🔗 Aranje best title slide
02:00 🔗 SketchCow Wow
02:01 🔗 * SketchCow is listening to Brewster address some visiting Chinese officials related to libraries.
02:01 🔗 SketchCow He switched completely to phraseology related to status and respect.
02:01 🔗 SketchCow AWESOME
02:01 🔗 SketchCow He speaks multiple social constructs
03:53 🔗 db48xOthe SketchCow: link?
04:03 🔗 SketchCow Which
04:03 🔗 SketchCow No, dude, I meant IN FRONT OF ME
04:04 🔗 SketchCow http://www.flickr.com/photos/textfiles/6031354308/in/photostream
04:10 🔗 db48xOthe heh
04:13 🔗 db48xOthe VGA Spectrum
04:15 🔗 db48xOthe aww, I missed the Linotype book
04:15 🔗 db48xOthe guess I'll just have to go back
04:24 🔗 swebb ndurner: is the google groups profile tracker broken or is there just no more work?
04:28 🔗 db48xOthe http://www.youtube.com/watch?v=YzG94ct2d5k
05:06 🔗 db48xOthe doh
05:06 🔗 db48xOthe alard: wget now crashes when I run it
05:14 🔗 SketchCow Disappearing for the night.
05:14 🔗 SketchCow Let's check on things tomorrow!
05:14 🔗 SketchCow I'm onsite, we can have fun.
05:21 🔗 db48xOthe SketchCow: later
05:22 🔗 db48xOthe #0 0x0000003c92446781 in _IO_vfprintf_internal (s=<value optimized out>, format=<value optimized out>, ap=<value optimized out>) at vfprintf.c:1567
05:22 🔗 db48xOthe #1 0x000000000041ed2b in log_vprintf_internal ()
05:22 🔗 db48xOthe #2 0x000000000041efc0 in logprintf ()
05:22 🔗 db48xOthe #3 0x0000000000429fb4 in warc_start_new_file ()
05:22 🔗 db48xOthe #4 0x000000000042a208 in warc_init ()
05:22 🔗 db48xOthe #5 0x00000000004202e8 in main ()
05:22 🔗 db48xOthe sadly, no line numbers
05:28 🔗 db48xOthe hrm
05:28 🔗 db48xOthe nothing jumps out at me in the source
05:28 🔗 * db48xOthe will recompile with debug info
06:46 🔗 ersi jch: haha, no.. sorry :)
09:09 🔗 alard db48x: Sorry, I really can't reproduce your Wget errors. Have you updated and recompiled everything? Which command line arguments are you using?
14:19 🔗 emijrp thinking about... twitter
14:20 🔗 emijrp it may be a solution for the digital dark age about user communication
14:20 🔗 emijrp we cannot archive emails, but we can archive tweets
14:25 🔗 Wyatt Is not the Library of Congress sufficient in this regard?
14:28 🔗 db48x2 can't archive all tweets
14:29 🔗 db48x2 and I would say that not being able to archive all emails is a feature, not a drawback
14:32 🔗 db48x2 sweet
14:33 🔗 db48x2 rsync finally finished
14:33 🔗 db48x2 a stupendous average speed of 300kBps
16:36 🔗 underscor http://www.flickr.com/photos/textfiles/6031354308/in/photostream Sparticus?
16:36 🔗 underscor Spartacus*
16:56 🔗 SketchCow Spartacus as a concept is a whole thing related to what we now call anonymous
16:58 🔗 SketchCow Along with all the usual caveats of success has a thousand fathers, I actually came up with a framework for what you'd call a peer to peer system in 1995.
16:58 🔗 SketchCow And the name was Spartacus.
17:18 🔗 underscor Really?
17:18 🔗 underscor That's neat
17:21 🔗 SketchCow Well, people come up with ideas all the time.
17:21 🔗 SketchCow This one didn't happen either.
17:22 🔗 SketchCow http://www.archive.org/details/ColossalCookbook
17:22 🔗 SketchCow COLOSSAL
17:22 🔗 SketchCow COOKBOOK
17:22 🔗 SketchCow Only 63mb
17:34 🔗 DFJustin ha why does it have a picture of the enterprise
17:37 🔗 SketchCow NOT SURE
17:40 🔗 alard SketchCow: I uploaded a download of Sprouter.com to blindtiger/gv14 (assuming you're interested).
17:40 🔗 alard They're shutting down, or not, they don't know yet.
17:43 🔗 db48x SketchCow: my upload finished, so you can have the slot back
17:45 🔗 SketchCow OK.
17:45 🔗 SketchCow To both.
17:45 🔗 SketchCow alard, I need you using batcave from now on.
17:46 🔗 SketchCow Also, did I hear this right, you've successfully gotten wget to accept the warc?
17:46 🔗 SketchCow I.e. the project?
17:46 🔗 db48x no, we've presented it on the mailing list and the response was favorable
17:47 🔗 alard SketchCow: Okay. Same login details?
17:47 🔗 SketchCow No, not even close. :)
17:47 🔗 SketchCow I still need to set it all up, I think.
17:48 🔗 alard The wget maintainer will look at the WARC code in the coming days, but he seems to be interested.
17:48 🔗 alard So with a bit of luck, it will be accepted eventually.
17:50 🔗 db48x there was some push-back in the form of questions about the coding style, FSF copyright assignment, introduction of new libraries, etc
17:50 🔗 db48x but less than I expected
17:52 🔗 db48x in fact, they didn't even ask why anyone would possibly want such a thing, which is good
19:08 🔗 alard Hey, would it be a good idea to start downloading stuff from mac.com / mobileme?
19:09 🔗 alard There's still a lot of time before it closes, but maybe people will start moving their sites well before that.
19:10 🔗 alard That, in turn, would lead to broken links.
19:11 🔗 SketchCow I am not against some tests in trying.
19:14 🔗 emijrp SketchCow: where are you downloading jamendo, at home or at IA?
19:15 🔗 emijrp I think we can create IA items using the metadata database
19:37 🔗 SketchCow IA
20:37 🔗 alard db48x: I think I may have found the source of the Wget-warc crashes. I guess you logged to a file (with -o )?
20:38 🔗 alard If you did, then it should be fixed now.
20:41 🔗 alard Also: the MobileMe scripts are ready. They work for me, but if anyone else wants to try them, I more than welcome your views. :)
20:41 🔗 alard http://archiveteam.org/index.php?title=MobileMe
20:44 🔗 db48x alard: ah, yes
20:47 🔗 alard Apparently you're not allowed to use the same variable arguments list twice.
20:48 🔗 alard It even says so in the comments above the logging function. I should have read that before I copied the print line.
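
[Editor's sketch, not part of the log.] The rule alard mentions can be illustrated in C roughly as follows. The log does not show the actual Wget-warc fix, so this is only a generic example of the standard technique: once a `va_list` has been passed to a `v*printf` function it must not be reused, and a second consumer needs its own copy made with `va_copy`. The function name `format_twice` is hypothetical and does not come from Wget.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not Wget's real logging code: formats the same
   message into two buffers of the same size. Each vsnprintf call consumes
   the va_list it is given, so the second call gets its own copy. */
int format_twice(char *first, char *second, size_t size,
                 const char *fmt, ...)
{
    va_list args, args_copy;
    int n1, n2;

    assert(first != NULL && second != NULL && fmt != NULL);

    va_start(args, fmt);
    va_copy(args_copy, args);   /* copy before args is consumed */

    n1 = vsnprintf(first, size, fmt, args);
    n2 = vsnprintf(second, size, fmt, args_copy);

    /* Every va_start and va_copy needs a matching va_end. */
    va_end(args_copy);
    va_end(args);

    /* Return the formatted length, or -1 if the two results disagree. */
    return (n1 == n2 && strcmp(first, second) == 0) ? n1 : -1;
}
```

Passing `args` to both `vsnprintf` calls instead would be undefined behaviour, which on some platforms shows up as a crash inside `vfprintf`, much like the backtrace earlier in the log.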
20:51 🔗 db48x gotcha
20:51 🔗 db48x I never did manage a successful debug build
20:52 🔗 db48x it failed to compile
20:52 🔗 db48x has anyone else archived the Google Friends Newsletter?
20:52 🔗 SketchCow didn't I ask you to do that?
20:53 🔗 SketchCow Or did another person do it.
20:53 🔗 db48x yea
20:53 🔗 db48x but wget is failing me in this case, so I wanted to know if anyone else had tried it
20:59 🔗 alard You can get an RSS feed that lists all messages on google groups (from May 2005)
20:59 🔗 alard http://groups.google.com/group/google-friends/browse_thread/thread/42d1d0cdbef323f7
20:59 🔗 alard Sorry.
21:00 🔗 db48x that would be one way to do it
21:00 🔗 alard http://groups.google.com/group/google-friends/feed/rss_v2_0_topics.xml?num=500
21:00 🔗 db48x the problem is that wget just never recurses into the threads
21:00 🔗 chronomex does the rss feed give just message urls or all the content?
21:00 🔗 chronomex oh wait I can click on that can't i
21:01 🔗 db48x bbl
21:01 🔗 chronomex harrumph
21:01 🔗 alard No, it only gives the message urls.
21:02 🔗 alard But if you download that, you'll get the message.
21:04 🔗 alard Okay, got a list of docids now.
21:06 🔗 SketchCow So one bummer with Jamendo is that when it fails, and it does fail, it wastes a lot of time re-checking the files.
21:06 🔗 SketchCow A resume that did some investigation would be nice.
21:10 🔗 yipdw alard: re: MobileMe archiving -- do you know of a way we can grab a username set?
21:10 🔗 alard I have a list of +/- 250,000 of those. (Gathered from scraping Google and the wayback machine.)
21:10 🔗 yipdw ok
21:10 🔗 yipdw I was just plugging site:web.me.com into Google
21:11 🔗 alard But there may be other sources too.
21:11 🔗 alard I did something like that. But you'll probably find different ones than the ones I have.
21:11 🔗 yipdw it's possible, yeah
21:11 🔗 yipdw want to merge lists?
21:12 🔗 yipdw (I've not made one yet)
21:17 🔗 alard Sure, I'll upload it somewhere.
21:23 🔗 alard yipdw: http://db.tt/TUFAMav
21:23 🔗 yipdw awesome
21:23 🔗 yipdw I've almost finished writing a scraper for web.me.com/[x], will run it for a day or so and see what I can get
21:24 🔗 yipdw how often can you hit google.com/search without getting flagged as a robot? I'm waiting 5 seconds between requests
21:25 🔗 alard I'm not sure. I used the ipv6 trick, with 1024 ip addresses I just continued to send one request at a time, without pausing.
21:25 🔗 yipdw ahh
21:25 🔗 alard Each time using an ip from a different subnet.
21:25 🔗 * yipdw should really get a 6to4 tunnel
21:25 🔗 alard They're fun.
21:26 🔗 yipdw huh
21:26 🔗 yipdw I actually found one that isn't in your list
21:26 🔗 yipdw chado2010
21:26 🔗 yipdw go figure
21:26 🔗 yipdw that was fast
21:28 🔗 alard There must be much more than 250,000.
21:28 🔗 yipdw indeed
21:29 🔗 alard (And my list contains some artifacts from my parser, so not everything is really a username.)
21:32 🔗 alard SketchCow: db48x: Here's a tar.gz with the Google Friends archive. http://db.tt/hKYfWbM
21:34 🔗 db48x alard: cool
21:34 🔗 db48x doh
21:34 🔗 db48x wget failed to compile :P
21:34 🔗 alard Again?
21:34 🔗 db48x wget_warc.c:16:20: fatal error: base32.h: No such file or directory
21:34 🔗 alard ./bootstrap
21:34 🔗 db48x gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\" -DLOCALEDIR=\"/usr/local/share/locale\" -I. -I../lib -I../lib -I../libwarc/warc-tools/lib/public -I../libwarc/warc-tools/lib/private -I../libwarc/warc-tools/lib/private/os -I/include -O2 -Wall -MT wget_warc.o -MD -MP -MF .deps/wget_warc.Tpo -c -o wget_warc.o wget_warc.c
21:34 🔗 db48x did that
21:34 🔗 * db48x does it again
21:35 🔗 db48x alard: did you create that archive by parsing the feed?
21:36 🔗 alard More or less. I grepped the feed to get the docids of the messages.
21:36 🔗 db48x right
21:36 🔗 alard Then I used wget to get the original messages: http://groups.google.com/group/google-friends/msg/eb52c83c13717d31?dmode=source&output=gplain
21:36 🔗 alard (and filling in the docid)
21:36 🔗 alard Plus some tricks to get the date in the filename.
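
[Editor's sketch, not part of the log.] The approach alard describes (pull the docids out of the feed, then request each message in source form) might look roughly like this in C. It assumes that message links in the feed contain `/msg/` followed by a hexadecimal docid, as in the URL quoted above; the real feed layout is not shown in the log. The names `extract_docids`, `build_source_url` and `DOCID_MAX` are hypothetical, and the date-in-filename tricks are not covered.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define DOCID_MAX 32   /* hypothetical upper bound on docid length + 1 */

/* Scan feed text for "/msg/<hex docid>" and copy each docid into ids[].
   Assumption: links in the feed follow the pattern of the quoted URL.
   Returns the number of docids stored (duplicates are not removed). */
size_t extract_docids(const char *feed, char ids[][DOCID_MAX], size_t max_ids)
{
    static const char marker[] = "/msg/";
    size_t count = 0;
    const char *p = feed;

    assert(feed != NULL && ids != NULL);

    while (count < max_ids && (p = strstr(p, marker)) != NULL) {
        size_t len = 0;
        p += sizeof marker - 1;            /* skip past "/msg/" */
        while (len < DOCID_MAX - 1 && isxdigit((unsigned char)p[len]))
            len++;
        if (len > 0) {
            memcpy(ids[count], p, len);
            ids[count][len] = '\0';
            count++;
        }
        p += len;
    }
    return count;
}

/* Build the "original message" URL for one docid, using the query
   parameters from the URL quoted in the log. Returns 0 on success,
   -1 if the buffer is too small. */
int build_source_url(char *out, size_t size, const char *docid)
{
    int n = snprintf(out, size,
                     "http://groups.google.com/group/google-friends/msg/%s"
                     "?dmode=source&output=gplain", docid);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

Each URL built this way could then be handed to wget, one docid at a time.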
21:38 🔗 alard (That's when I found out that the January 2006 newsletter has January 2005 in its subject line.)
21:39 🔗 db48x lol
21:43 🔗 yipdw ok
21:43 🔗 yipdw Google's about to get a lot more traffic from a particularly bored Linux user running Firefox 3.6
21:44 🔗 yipdw in other words, totally normal usage profile
21:44 🔗 alard db48x: Just tried a fresh clone, didn't compile. Missing m4/base32
21:45 🔗 alard Another .gitignore problem: I didn't commit the gnulib-local/m4 directory because it was hidden. I've added it now.
21:46 🔗 alard yipdw: :)
21:47 🔗 db48x alard: aha
21:47 🔗 db48x yipdw: lol
22:28 🔗 underscor Command is the most irritating fucking key ever invented
22:32 🔗 db48x underscor: yup
22:33 🔗 chronomex no, sysrq is
22:53 🔗 db48x sysrq is merely unused
22:56 🔗 Wyatt Hehe, not _always_. (Reboot System Even If Utterly Broken :D)
22:58 🔗 chronomex ok ok capslock is more irritating than cmd
22:59 🔗 Wyatt Caps lock is a plague
22:59 🔗 chronomex yup
23:13 🔗 underscor I use sysrq all the time
23:14 🔗 underscor chronomex: I usually remap backspace to capslock
23:14 🔗 underscor Saves a pinky extension
23:22 🔗 dashcloud chronomex: do you have a keyboard where print screen and sysrq don't share the same key?
23:32 🔗 SketchCow Google Friends Newsletter looks like 200k .tar'd. Right?
23:49 🔗 chronomex I usually make "capslock" be a control key and I make "pause/break" be a capslock key
23:49 🔗 chronomex dashcloud: no
23:55 🔗 swebb I'm guessing that the google groups profiles job is done?
