[00:08] For those with scanners and all please Don't forget to pay black mail to concord pd and it's dispatchers for privacy. If not they take over internet and email as well as intercept any type of delivery mail, package and etc. ALL nothing to small or microscopic. Dispatchers at concord pd also signed a binding administrative document stating TOP SECRET, HANDS OFF, CLASSIFIED, EXTREEME PRIVACY
[00:08] . Even I heard that is not all that is VIOLATED. This is is just a few things fact fully well that made up criminal charges, registered as child molester, retard, liar. I just found out that I'm not the only one
[00:31] 58gb left on 4data
[00:31] I estimate another day to download
[07:33] And here we go - now I am setting up machines at my desk for CD/DVD ripping
[07:35] Aw shit, son. Remember I have a lot of DVDs/CDs
[08:29] Do we need people for Yahoo Messages who are running warriors, or what.
[08:29] 45gb left on 4data
[08:39] I'll run the warrior on Yahoo if there's a project for it
[08:59] SketchCow: wonderfullll
[08:59] I have a few more discs here
[09:00] Given that sending paper is currently impossible, I guess I'll send some soon
[09:53] InternetCensus2012
[09:53] done 583096.0 MB Rate: 15636.4 / 0.0 KB Uploaded: 4343169.8 MB [ R: 7.45]
[09:53] Well, I don't believe it.
[09:53] Da boom
[09:55] GLaDOS: appently you should be able to grab the webseed for geocities from the IA.
[09:55] *appently*
[09:55] I don't understand exactly what webseeds are etc.
[09:55] Basically a HTTP download I believe.
[09:57] ah yes I see now
[09:57] we got anything on the dedi box which we could play with to do that? :D
[09:57] rtorrent? :D
[09:58] I wonder how many terabytes of uploads are being a good citizen.
[09:58] rtorrent is on it.
[09:59] SketchCow: "more"
[10:03] I feel that way with the DVDs/CDs.
[10:03] Once you get me on that, man, game over.
[10:03] I will fucking destroy this pile.
[10:04] GLaDOS: so b/w isn't ap roblem?
[10:04] Nope!
[10:05] What I want is basically a video of SketchCow putting a CD in a computer, closing the drive, ripping the disk, taking the disc out, and repeating the process so I can loop it for a screensaver
[10:05] GLaDOS: hmmm we should go -bs
[11:21] alard: Punchfork is 100% uploaded into the group now.
[11:21] So fuck that guy.
[11:22] :D
[11:23] yay
[11:25] I plan to dl a copy to do some nlp on it. Where else can I find that much recipe data in one place
[15:15] how the bleep do we grab pages that do weird json stuff? :S
[15:15] With wget-lua
[15:16] * Smiley googles
[15:17] ok yeah I'm lost ¬_¬
[15:17] I think we already had a project were we fetched some JSON and parsed it using wget-lua
[15:18] I *think* it's json, I don't actually know
[15:18] Someone mentioned json, but maybe they were wrong :S
[15:18] Well, what are you trying to download?
[15:20] saintsrow.com/community/forums
[15:20] errr forum*
[15:27] The content is directly in the html, not loaded via JSON?
[15:37] hmm
[15:37] well i tried to crawl with wget + cookies and failed :(
[15:38] Weird
[15:44] yeah
[20:51] SketchCow: http://archive.org/details/archiveteam_punchfork_index
[21:04] http://politics.slashdot.org/story/13/03/21/203249/political-pressure-pushes-nasa-technical-reports-offline ... ugh
[21:05] and of course there used to be a robots.txt that said to allow "archive.org_bot", not "ia_archiver", so nothing's archived in wayback.
[21:05] v_v
[21:06] may be a good idea to edit http://archive.org/details/archive.org_bot to clarify...
[21:07] http://www.fas.org/blog/secrecy/2013/03/ntrs_dark.html *sigh*
[21:07] is there a way to reduce the rate-limiting on the yahoo messages tracker?
[21:09] hdevalenc: No. There is no rate limit on the tracker. The rate limiting messages you see are coming from Yahoo: they block you really fast.
[21:09] (Go to http://messages.yahoo.com/ to see the message when you're blocked.)
[21:09] yeah, I meant strategies for dealing with their rate limiting
[21:09] change your ip
[21:13] alard: I'm getting "No item received. Retrying after 30 seconds..." with the grabber
[21:15] balrog_: With Yahoo Messages?
[21:15] yeah
[21:16] oh wait
[21:16] my bad :P
[21:16] Ah, good. :)
[21:17] how many parallel instances are recommended?
[21:17] I don't know. I think they count.
[21:17] But maybe Yahoo does something more difficult, I don't know. One instance is enough to get you blocked, so running more probably won't help.
[21:18] A breakdown of the most popular domains in the twitter unrolled urls: https://gist.github.com/scumola/5216839
[21:18] I've not gotten the second column processed at this time.
[21:18] swebb: be warned, t.co may redirect through bit.ly/etc
[21:19] Yea, I know. I follow those too.
[21:19] ok
[21:19] I recursively follow redirects.
[21:19] bit.ly is #2
[21:19] approx 10% of the t.co urls went through bit.ly
[21:20] what the hell
[21:21] I'd like to see a distribution of the shortener chain length
[21:21] I only posted the top 1000 domains. I have all of the domains if you'd like to see them all.
[21:22] maybe sure
[21:22] 860726 domains in all.
[21:22] later
[21:22] :)
[21:24] po.st o_o
[21:25] archive.org is on the list
[21:26] Smiley, soultcer: Yes, wget-lua can do JSON parsing (with the Lua JSON parser, see the punchfork repository, for example).
[21:28] ooo
[21:29] Here is the list of most popular domains that people were linking to: https://gist.github.com/scumola/5216839#file-col2-histogram
[21:29] the top-1000 anyway
[21:30] We seem to have missed the gist of that gist you were looking for.
[21:30] Whoops.
[21:31] damn
[21:31] https://gist.github.com/scumola/5216839/raw/7d5dfbd48f2d2b216501990d9daa3e2b23cbe7c6/col2.histogram
[21:31] crap
[21:31] that didn't work either
[21:31] o_)O
[21:31] My gists are gone.
[21:32] :(
[21:32] :<
[21:34] email github
[21:34] stupid github
[21:34] I had the same problem a year or so ago and they fixed it all up
[21:35] https://www.dropbox.com/s/b6qq7avhtwsos3s/col1.histogram.bz2
[21:43] I got the gist open in a tab
[21:47] https://www.dropbox.com/s/oe0sh1qze8qbs5p/col2.histogram.bz2
[21:47] Those are the full lists.
[21:49] https://gist.github.com/ersi/5217120 if you just want to take a quickie look on col1 (copy of swebb's gist)
[23:02] alard: You have name and date reversed in the columns.
[23:41] SketchCow: where will the new CDs you're ripping end up? any link I can watch to see the latest ones?
[23:42] All well end up in cdbbsarchive
[23:45] https://archive.org/services/collection-rss.php?collection=cdbbsarchive
[23:55] The "problem" right now is if I rip CD-ROMs, it can rip them so fast, it breaks up me doing anything else.
[23:56] can you slow down the process? rip at a lower speed to give you a longer break between disc swaps?
[23:56] No point. It'll just be done each time I look at it.
[23:56] But ironically, I could get more "data" from the DVD-ROM ripping since I could run two machines as well as work.
[23:57] Now it slams through a CD-ROM in under a minute and a half.
[23:57] shame you weren't able to get a scanning intern- that would be a perfect job
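
Note on the 21:18-21:21 exchange: the idea of recursively following t.co redirects, counting the final domains, and looking at the distribution of shortener chain length could be sketched roughly as below. This is not swebb's actual code. The function names (`follow_chain`, `summarize`), the `next_hop` lookup, the `max_hops` limit, and the sample URLs are all made up for illustration; a real run would replace the dictionary lookup with HTTP requests.

```python
from collections import Counter
from urllib.parse import urlsplit


def follow_chain(url, next_hop, max_hops=10):
    """Return the list of URLs visited, starting at `url`.

    `next_hop` is a caller-supplied (hypothetical) function that returns the
    redirect target for a URL, or None when the URL does not redirect.
    """
    chain = [url]
    seen = {url}
    while len(chain) <= max_hops:
        target = next_hop(chain[-1])
        if target is None or target in seen:  # final page, or a redirect loop
            break
        chain.append(target)
        seen.add(target)
    return chain


def summarize(urls, next_hop):
    """Count final domains and redirect-chain lengths for a batch of URLs."""
    domains = Counter()
    hops = Counter()
    for url in urls:
        chain = follow_chain(url, next_hop)
        domains[urlsplit(chain[-1]).hostname] += 1
        hops[len(chain) - 1] += 1  # number of redirects followed
    return domains, hops


# Made-up sample data standing in for real HTTP lookups.
SAMPLE_REDIRECTS = {
    "http://t.co/a": "http://bit.ly/x",
    "http://bit.ly/x": "http://archive.org/details/example",
    "http://t.co/b": "http://example.com/page",
}

domains, hops = summarize(["http://t.co/a", "http://t.co/b"], SAMPLE_REDIRECTS.get)
print(domains.most_common())  # final domains, most frequent first
print(sorted(hops.items()))   # (redirect count, number of URLs) pairs
```

Keeping the lookup as a passed-in function means the counting logic can be checked offline against a small table before pointing it at live shorteners, and the `seen` set plus `max_hops` keep a redirect loop from running forever.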