[00:05] Where's my hug
[00:05] * marceloan hugs SketchCow
[00:05] LOL
[00:44] likwid_: Willing to share that script?
[00:44] ndurner1: wow, you're burning up the wires
[00:47] for a long time i wanted to record links people posted in irc and couldn't get a mysql database on the machine irssi ran on, so i wrote a script which tweeted the links
[00:47] literally the first perl script i wrote after "hello world"
[00:57] underscor: sure, not done yet tho http://pastebin.com/JFgMKKJc
[00:57] underscor: right now it only gets 200 entries, but it does work
[00:58] THIS PLACE NEEDS MORE FIRE
[00:58] I went down down down in a burning ring of fire
[00:58] underscor: so I need to build in "pagination", kinda. Basically I have to refactor line 244 down into a function that I can call a bunch of times until results returned > 200
[00:59] oic, cool
[00:59] likwid_: are you using twitter's search api
[00:59] What goes in config.yaml?
[00:59] underscor: hopefully i'll have it done tonight, but not sure since I just remembered that I need to back up my LJ, hahahaha
[00:59] haha
[00:59] underscor: it should build that for you
[00:59] Well, gimme a poke when it's done, I'm definitely interested
[00:59] likwid_: Oh, great!
[01:00] underscor: cool :) yeah check out the create_default_config_file method
[01:00] the twitter boilerplate crap is from another thing I made
[01:00] but it works
[01:00] are these consumer key and secret tied to your account?
[01:00] underscor: they're tied to my developer account, yeah
[01:01] every app that uses the Twitter API needs an API key
[01:01] I meant are they okay for me to use, or should I sub them out?
[01:01] they're fine
[01:01] k, awesome
[01:01] just bound to a registered application that I own (twitter-avatar-update)... I'll probably make new ones before I release this
[01:01] I see :)
[01:02] damn, I've had bad luck with these Western Digital Green drives
[01:02] we have a stack of dead ones at work and my 2TB just died (WD20EARS)
[01:03] they last ~7 months
[01:31] At the speed we're downloading AnyHub, when do you think we'll finish?
[01:34] Well, we've done 28k in what, 3 days?
[01:34] so roughly 10k a day
[01:34] probably sometime tomorrow afternoon
[02:03] Hey, is anyhub down?
[02:03] Everything on the recently added page is empty
[02:05] Should I touch stop?
[02:05] Dunno, maybe those are legitimately empty
[02:06] don't touch that dial!
[02:06] Better to run them for now
[02:06] I touched the STOP
[02:40] Good Night
[04:36] Hi, is SketchCow here?
[04:36] Might be
[04:37] This is the best time to catch him usually
[04:37] (late at night, not this exact time specifically)
[04:37] Man
[04:38] Did I miss him?
[04:40] http://games.yahoo.com/blogs/plugged-in/peta-slams-mario-over-fur-suit-211025773.html
[04:40] Nah, he flits in and out
[04:41] lol
[04:42] * Hydriz needs an rsync slot
[04:50] http://lefty333boy.splinder.com/ this is 4.4 GB, and counting
[04:50] christ
[04:51] haha
[04:52] Gah, so many projects :(
[04:52] I wish they were more spaced out
[05:00] oh, I missed SketchCow 4 hours ago...
[07:53] Hello, is SketchCow online now?
[07:54] Hydriz: I don't think so
[07:55] ...
[07:55] Hydriz: why do you ask?
[07:56] I need an rsync module
[07:56] ah
[07:56] *slot
[07:56] send him a private message, he may see it before he hits the sack
[07:57] maybe, it's like 3am where he lives.
[07:57] though idfk when he sleeps
[07:58] SketchCow!
[07:58] Hey!!!!!!!!!
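For the pagination likwid_ describes at 00:58 - one fetch function called repeatedly until the API stops returning full pages - here is a minimal sketch in Ruby, assuming the twitter gem's user_timeline and its max_id paging; fetch_all_tweets and the client setup are illustrative guesses, not the pastebin script's actual code:

    require 'twitter'   # twitter gem; client credentials/config omitted

    # Walk a timeline backwards 200 tweets at a time; a short page
    # means we've hit the end (or Twitter's 3,200-tweet ceiling).
    def fetch_all_tweets(client, user)
      tweets = []
      max_id = nil
      loop do
        opts = { :count => 200 }
        opts[:max_id] = max_id if max_id
        page = client.user_timeline(user, opts)
        break if page.empty?
        tweets.concat(page)
        max_id = page.last.id - 1   # ask for everything older than the last tweet seen
        break if page.size < 200
      end
      tweets
    end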
[07:58] I need an rsync slot to upload the AnyHub grab
[07:58] clearly he doesn't sleep right now
[07:59] lol
[08:00] I have about 60+ GB to transfer
[08:01] are you able to provide one?
[08:01] Yes
[08:02] Give me a moment, the IRC client a-sploded
[08:02] okie :)
[08:04] erm, do I have a time limit?
[08:05] Can you see the sun outside?
[08:05] yes
[08:05] OK.
[08:05] If that burns out, stop uploading.
[08:05] Ah, service.
[08:05] hahaha
[08:05] though it is cloudy now haha
[08:05] RUN!
[08:06] ** TODO [#A] upload continuously
[08:06] DEADLINE: <5000000000-01-01>
[08:07] So, here's a project.
[08:08] A non-profit group just made a global social downloader.
[08:08] Deadline: 4,294,967,295
[08:08] Most of the guys in the group, I don't personally like.
[08:08] But the main coder? Love her. LOVE HER.
[08:08] So she made this thing.
[08:08] It's called ThinkUp
[08:08] * db48x googles
[08:08] It's basically a personal social network backup
[08:08] Run it, it constantly keeps your shit backed up off these sites.
[08:08] http://thinkupapp.com/ ?
[08:08] Sounds right
[08:09] If it's got Anil Dash bloviating, that's it
[08:09] Looks jazzyshebang
[08:09] Yeah
[08:09] Then it drops you right into a fucking github
[08:10] * ersi laughs
[08:10] Regardless
[08:10] it's a good idea
[08:10] I'd love for us to look at it
[08:10] It's open source
[08:10] That is cool
[08:10] if it's really good, we should grab it and work with it
[08:10] Ah, gina's the main coder
[08:10] LOVE HER
[08:10] bumped into her on teh netz before
[08:11] ah, yeah - she writes for Lifehacker as well :-)
[08:11] SketchCow: Where is that rsync server located?
[08:11] In sunny california, where the disks - always go hot!
[08:11] 280 forks
[08:12] California...
[08:12] Yeah, that's where the Internet Archive lives
[08:13] ok, then after I rsync I can delete everything I have?
[08:13] rsync a few times first
[08:13] nevermind, that was a dumb question
[08:13] just to make sure you got everything
[08:13] ok
[08:13] if you don't need the space immediately, it's always Very nice to have an extra copy laying around
[08:13] ^
[08:13] just worried that we can't archive everything in time
[08:13] it is closing in two days
[08:14] I mean, mistakes happen.. things disappear and stuff
[08:14] I think we'll get anyhub
[08:14] we request you keep everything around unless you run out of space
[08:14] I know
[08:14] :)
[08:14] but this is not my server
[08:14] aye
[08:14] I'm throwing all my bandwidth at it, 300gb until I'm at my download limit
[08:14] Hydriz: For uploading anyhub, please use the upload script from github. It'll upload things in a more or less standard way, which is handy for postprocessing.
[08:14] I ran 20 terminals for it :(
[08:14] yeap
[08:14] following the script
[08:14] Good!
[08:15] Yeah, I'm running 20 instances too
[08:15] but something slowed things down
[08:15] I am stopping already
[08:15] restarting soon
[08:15] Yeah, things have slowed down
[08:15] Generate SSH Keys (http://help.github.com/msysgit-key-setup/)
[08:15] Install Git for Windows (http://help.github.com/win-git-installation/)
[08:15] Install XAMPP (http://www.apachefriends.org/en/xampp.html)
[08:15] Get the latest ThinkUp files from GitHub: git clone git://github.com/ginatrapani/ThinkUp.git
[08:15] Make a MySQL database
[08:15] Edit httpd.conf to point to the webapp directory
[08:15] Register the app with Twitter:
[08:15] Application Website: http://yourdomain.com/thinkup
[08:15] Application Type: Browser
[08:15] Callback URL: http://yourdomain.com/thinkup/plugins/twitter/auth.php
[08:15] Copy the Consumer key and Consumer secret
[08:15] Run the web-based install at http://yourdomain.com/thinkup/install/ and follow its steps
[08:16] Authorize ThinkUp to use your Twitter account
[08:16] Create a batch file to run the crawler automatically
[08:16] A SNAP
[08:16] whoa dude, everyone stopped at MySQL
[08:16] it's sad when my first reaction to a wall of steps is "is there a Puppet script or Chef recipe for that"
[08:17] means I've become lazy as hell
[08:17] yipdw: friend of mine did a lot of work on Chef :)
[08:18] chronomex: I like it a lot; been rolling it out at work
[08:18] have you seen lockerproject.org? Similar concept, being done by a company, and I know one of the guys.
[08:19] http://pastebin.com/DJrGrzT4 <- TweetDelete.rb, lets you archive and/or delete tweets from a twitter account
[08:20] chronomex: it's probably the biggest thing that has made me not go completely insane over SELinux configuration, because I can shove all of that into recipes and comment just what the hell is going on
[08:20] current limitations: it's slow when you're destroying the tweets, processing at about 0.62 tweets/sec (but that does include inserting them into SQLite)
[08:20] yipdw: heh, seriously
[08:21] it can use any DB that the Sequel gem can use, which includes SQLite, MySQL, Postgres, DB2, Oracle, etc. I will work tomorrow on a version that uses Ruby fibers to get better speeds. The limiting factor is the Twitter API (it is very slow)
[08:21] The thing is, yeah. MySQL.
[08:21] Why can't they do a flatfile?
[08:22] I mean, at most, let's say you're a fucking maniac
[08:22] You do 30,000 tweets
[08:22] Wow, what a maniac
[08:22] 30. thousand.
[08:22] sqlite is a happy medium in most cases, flatfile is futureproof
[08:22] I have 15K
[08:22] tweets
[08:22] 30k is a decent amount
[08:22] That's, like, I mean, probably a meg or two.
[08:22] You can only access the 3,200 most recent tweets, can't you?
[08:22] another problem is that Twitter will only let you get the most recent 3,200
[08:22] Surely we need a massive database infrastructure.
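A rough sketch of the archive-then-delete loop TweetDelete.rb is described as doing (08:19-08:21), assuming the twitter and sequel gems; the table layout and method names here are guesses for illustration, not the pastebin's actual code:

    require 'twitter'
    require 'sequel'

    DB = Sequel.sqlite('tweets.db')        # any Sequel-supported DB works here
    DB.create_table?(:tweets) do
      Bignum :id, :primary_key => true     # Twitter's own status id
      String :text
      Time   :created_at
    end

    # Archive each tweet locally before destroying it. Deleting is the
    # slow part: every destroy is a separate rate-limited API call.
    def archive_and_delete(client, user)
      client.user_timeline(user, :count => 200).each do |t|
        DB[:tweets].insert(:id => t.id, :text => t.text, :created_at => t.created_at)
        client.destroy_status(t.id)        # name varies across twitter gem versions
      end
    end

The 3,200-most-recent limit at 08:22 still applies to each pass over the timeline.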
[08:22] yes
[08:23] hmm, SQLite can dump to CSV easily
[08:23] sqlite is plenty for most everything
[08:23] It needs some archiveteam
[08:23] sms message store on android and iphone uses sqlite happily
[08:23] yea, sqlite is actually pretty awesome
[08:23] chronomex: except apps that are multi-user :)
[08:24] likwid_: yeah, well, that's out of the defined use-case ;)
[08:24] chronomex: indeed, but I've seen lots of webapps using SQLite
[08:24] plus, the user doesn't have to install and set up a database server
[08:24] So, I propose a parallel version
[08:24] "DrinkUp"
[08:25] And we play around with it
[08:25] does Twitter have some way of hooking into tweet feeds via XMPP/AMQP/whatever
[08:25] yeah, that will come tomorrow because it's 3AM here :)
[08:25] ?
[08:25] I guess I should RTFAPIDocs
[08:25] but I "solved" the 3,200 limit by making it remove all your tweets
[08:25] yipdw: xmpp> hell no
[08:25] so then, theoretically, that "3,200" window slides back in time and you can collect all your data
[08:26] likwid_: weird, but okay.
[08:26] chronomex: well, there is no other way to do it
[08:26] suckass.
[08:26] chronomex: I am OK with this since I'm rebooting my public persona anyway
[08:26] oh, twitter has a streaming thing
[08:26] 00:22:22 <@SketchCow> You do 30,000 tweets
[08:26] 00:22:24 <@SketchCow> Wow, what a maniac
[08:26] chronomex: but yeah, for most users it's not a workable solution
[08:26] haha
[08:26] SketchCow: you have 12,000 tweets.
[08:26] doesn't help for archival of past tweets, but going forward
[08:26] Yes
[08:27] what a maniac!
[08:27] I am saying that even that much is nothing
[08:27] yeah, Twitter actually has some crazy shit like the "Firehose"
[08:27] No mysql needed
[08:27] but you basically need to be DHS to get access to that
[08:28] what, bloated and incompetent
[08:28] creepy and overfunded.
[08:28] there is stuff like this: https://www.tweetscan.com/data.php but that is pulling from their own DB
[08:28] touching teenage girls
[08:29] what is the data mining company that starts with PH something?
[08:29] maybe it's just P.... they're in Palo Alto
[08:29] Palantir
[08:29] yes
[08:29] I was considering, for about 10 seconds, working there
[08:29] it seems to be popular amongst Stanford grads
[08:29] but it seems that if you want a job in which you get to play with huge data sets
[08:30] there must be better places to be
[08:30] eh
[08:30] they also have all the fun new toys
[08:30] better meaning "I won't look back on this in 30 years and hate myself"
[08:30] yes, that
[08:30] Can I ask: if I have an rsync slot and would like to upload another project, do I still have to ask?
[08:30] Hydriz: I don't think you're allowed to stop
[08:30] archive.org would be much more fun as a job for me anyway :)
[08:30] wait, what?
[08:31] put em in different directories.
[08:31] it'll be FINE.
[08:31] oh good haha
[08:31] Hydriz: ;)
[08:31] don't stop archeeving, hold on to that feeling
[08:31] SketchCow: I read the article about scanning the braille Playboy. Cool stuff :)
[08:32] no, I mean if we are, let's say, finished with AnyHub
[08:32] (( http://ianews.wordpress.com/2011/08/17/scanning-a-braille-playboy/ ))
[08:32] Yes.
[08:32] (I guarantee the people in here.)
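On the flatfile point from 08:21-08:23: once the tweets are in SQLite, exporting a future-proof flatfile takes only a few lines. A minimal sketch, assuming the tweets.db layout from the previous snippet (the sqlite3 shell's .mode csv does the same thing without any code):

    require 'sequel'
    require 'csv'

    # Dump the SQLite tweet archive to a plain CSV flatfile.
    DB = Sequel.sqlite('tweets.db')
    CSV.open('tweets.csv', 'w') do |csv|
      csv << %w[id text created_at]
      DB[:tweets].order(:id).each do |row|
        csv << [row[:id], row[:text], row[:created_at]]
      end
    end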
[08:32] (Have seen it)
[08:34] anyway, it got me thinking about converting stuff like that to Portable Embosser Format to allow electronic distribution of braille documents
[08:35] or, taking it a step further, making a PEF -> plaintext converter so that you could eventually text->speech that stuff
[08:44] haha, the downside of cache-busting query-string parameters
[08:45] splinder has a TON of them for images
[08:47] heh
[08:47] so as a result, journal.splinder.com is at 48,000 PNGs and counting
[08:47] or something
[08:48] jeepers
[08:48] it's been downloading for almost three das
[08:48] days
[08:49] actually, wait a sec, it's not cache-busting
[08:50] different releases?
[08:50] it's 47,218 PNGs, all with unique URLs modulo query string
[08:50] it looks like... a bunch of shrunken avatars
[08:50] oh, christ
[08:50] I think this might end up downloading a sizable chunk of all Splinder users' avatars
[08:50] heh
[08:50] because journal.splinder.com is like the Splinder dev journal or something
[09:53] Erm, any professional helper here?
[09:54] what kind of professional?
[09:55] I mean, installing wget-warc
[09:55] I am grabbing from a shell provider, but want to do it on my personal comp
[09:55] ah.
[09:55] but it always seems to fail
[09:56] can you help?
[09:57] I receive this error: configure: error: --with-ssl was given, but GNUTLS is not available.
[09:57] and try installing GNUTLS
[09:57] apt-get install gnutls-dev ?
[09:58] hmm
[09:58] lemme try
[10:02] looks okay for now...
[10:02] yeah! Thanks
[10:02] :)
[10:02] :D
[10:55] !likes
[10:55] oops :)
[11:30] yipdw: Same with one of the ones I'm downloading. I have hundreds of thousands of files from files.splinder.com.
[11:44] I lost the connection for some 20 min, now it's downloading very little; how do I know if a process is just long or stale?
[15:12] http://adamsworld.name/copy_calc.php lol at that message
[15:13] "Website Offline, No Cached Version Available"
[15:13] and they're a.... caching service.
[15:13] good game, asshats
[15:44] HULLO
[18:10] why are processes so different in a "pidstat -u -T CHILD -C dld-client"? http://p.defau.lt/?y9WaTFshFkhXK868M0zVoQ
[18:10] Does it depend on whether their children are still alive?
[18:15] woot! I've passed the 100 thousand users downloaded mark on splinder
[18:15] ndurner is gonna pass me tho
[18:15] yep, he's accelerating :-p
[18:16] closure, can you answer my first question above?
[18:16] no idea
[18:17] closure, I mean: Nemo_bis> I lost the connection for some 20 min, now it's downloading very little, how do I know if a process is just long or stale?
[18:17] oh, sure, I just strace the wget to see what it's doing
[18:17] closure, eh, if only I knew the pid to do it selectively
[18:18] or is there a way to do it quickly on all of them and automatically find the stale ones?
[18:18] ps -fax
[18:36] closure, thanks, looks like they're working
[18:37] yeah, I've seen no full stalls, just some big blogs downloading for days on end
[18:37] wget is remarkably stable even on a crazy connection, better than expected :-D
[18:38] closure, is there any way to tell wget to use more memory and write to disk less often?
[19:28] Nemo_bis: maybe you want to ionice -c 3 wget?
[19:30] closure, interesting, does it work on all running processes?
[19:32] sure
[19:39] closure, sure? «-p pid  Pass in process PID(s) to view or change already running processes. If this argument is not given, ionice will run the listed program with the given parameters.»
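For Nemo_bis's long-or-stale question (11:44 and 18:17): besides stracing individual wgets as closure suggests, one crude automated check is to sample each process's I/O counters twice and flag the ones that haven't moved. This is a sketch only, and entirely an assumption on top of the chat: it relies on Linux's /proc/<pid>/io (readable only for your own processes) and on the workers being plain wget:

    # Flag wget processes whose read counter hasn't moved in a minute.
    def rchar(pid)
      File.read("/proc/#{pid}/io")[/^rchar:\s*(\d+)/, 1].to_i
    rescue Errno::ENOENT, Errno::EACCES
      nil   # process exited, or isn't ours to inspect
    end

    pids   = `pgrep -x wget`.split.map { |p| p.to_i }
    before = {}
    pids.each { |pid| before[pid] = rchar(pid) }
    sleep 60
    before.each do |pid, bytes|
      now = rchar(pid)
      next if bytes.nil? || now.nil?
      puts "pid #{pid} looks stale" if now == bytes
    end

rchar counts all read() traffic, sockets included, so a healthy download keeps it climbing. And per the man page excerpt above, closure's ionice tip can be applied to already-running processes with ionice -c 3 -p <pid>.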
[20:45] Hi Splinder downloaders: Nothing urgent, but if you want full credit for your downloads, please do a git pull. (No restarts necessary.) It turns out that recent versions didn't report the size of the media.warc.
[20:46] heh, oops
[20:50] closure, looks better now, I think: http://p.defau.lt/?9ahUP_jXIxJTSIWKX_KDZw (300 s intervals)
[20:51] alard, yes, it's something like 1/4 of the real total
[20:51] The media bit is, well, somewhat smaller than it should be.
[20:51] What's happening?
[20:51] Hi Splinder downloaders: Nothing urgent, but if you want full credit for your downloads, please do a git pull. (No restarts necessary.) It turns out that recent versions didn't report the size of the media.warc.
[20:51] Marcelo:
[20:54] I'll go back to midpSSH and continue downloading. Bye.
[21:58] hah, told you it was underreporting :P
[22:08] closure: Heh, yes. (But you were not convincing enough. :)
[22:40] huh, this is a fun anyhub-grab error
[22:41] 2011-11-16 16:35:37 URL:http://f.anyhub.net/4o0J [84178/84178] -> "/dev/null" [1]
[22:41] Cannot write to `/dev/null' (Success).
[22:50] yipdw: I had that too, today, multiple times.
[22:59] ASCII a stupid question, get a stupid ANSI!
[22:59] EBCDICK
[23:08] The fun's begun - uploading gigabytes of cool game convention videos to the GDC Vault
[23:08] Everything I upload goes into their free section.
[23:09] Awesome!
[23:10] Only 5k anyhub users left!
[23:23] SketchCow, how do I upload Splinder data to the batcave?
[23:24] You ask me for an rsync slot.
[23:24] Dod you have one?
[23:24] Do you?
[23:25] SketchCow, no
[23:25] I need one for donbex also, if possible
[23:26] (a friend who's downloading Splinder)