#archiveteam 2013-03-21,Thu

Time Nickname Message
00:08 🔗 STR8FWD For those with scanners and all please Don't forget to pay black mail to concord pd and it's dispatchers for privacy. If not they take over internet and email as well as intercept any type of delivery mail, package and etc. ALL nothing to small or microscopic. Dispatchers at concord pd also signed a binding administrative document stating TOP SECRET, HANDS OFF, CLASSIFIED, EXTREEME PRIVACY
00:08 🔗 STR8FWD . Even I heard that is not all that is VIOLATED. This is is just a few things fact fully well that made up criminal charges, registered as child molester, retard, liar. I just found out that I'm not the only one
00:31 🔗 omf_ 58gb left on 4data
00:31 🔗 omf_ I estimate another day to download
07:33 🔗 SketchCow And here we go - now I am setting up machines at my desk for CD/DVD ripping
07:35 🔗 SketchCow Aw shit, son. Remember I have a lot of DVDs/CDs
08:29 🔗 SketchCow Do we need people for Yahoo Messages who are running warriors, or what.
08:29 🔗 omf_ 45gb left on 4data
08:39 🔗 BlueMax I'll run the warrior on Yahoo if there's a project for it
08:59 🔗 Nemo_bis SketchCow: wonderfullll
08:59 🔗 Nemo_bis I have a few more discs here
09:00 🔗 Nemo_bis Given that sending paper is currently impossible, I guess I'll send some soon
09:53 🔗 SketchCow InternetCensus2012
09:53 🔗 SketchCow done 583096.0 MB Rate: 15636.4 / 0.0 KB Uploaded: 4343169.8 MB [ R: 7.45]
09:53 🔗 SketchCow Well, I don't believe it.
09:53 🔗 SketchCow Da boom
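The ratio in the status line above can be checked by hand. A minimal sketch, assuming the "R:" field is the share ratio (uploaded divided by completed size) and that both figures are in MB as printed; the function name is made up for illustration:

```python
def share_ratio(uploaded_mb: float, done_mb: float) -> float:
    """Return uploaded / completed size, rounded to two decimals."""
    return round(uploaded_mb / done_mb, 2)


if __name__ == "__main__":
    # Figures copied from the pasted status line.
    print(share_ratio(4343169.8, 583096.0))
```

On those figures the result agrees with the 7.45 shown in the paste.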
09:55 🔗 Smiley GLaDOS: appently you should be able to grab the webseed for geocities from the IA.
09:55 🔗 Smiley *appently*
09:55 🔗 Smiley I don't understand exactly what webseeds are etc.
09:55 🔗 GLaDOS Basically a HTTP download I believe.
09:57 🔗 Smiley ah yes I see now
09:57 🔗 Smiley we got anything on the dedi box which we could play with to do that? :D
09:57 🔗 Smiley rtorrent? :D
09:58 🔗 SketchCow I wonder how many terabytes of uploads are being a good citizen.
09:58 🔗 GLaDOS rtorrent is on it.
09:59 🔗 Smiley SketchCow: "more"
10:03 🔗 SketchCow I feel that way with the DVDs/CDs.
10:03 🔗 SketchCow Once you get me on that, man, game over.
10:03 🔗 SketchCow I will fucking destroy this pile.
10:04 🔗 Smiley GLaDOS: so b/w isn't a problem?
10:04 🔗 GLaDOS Nope!
10:05 🔗 BlueMax What I want is basically a video of SketchCow putting a CD in a computer, closing the drive, ripping the disk, taking the disc out, and repeating the process so I can loop it for a screensaver
10:05 🔗 Smiley GLaDOS: hmmm we should go -bs
11:21 🔗 SketchCow alard: Punchfork is 100% uploaded into the group now.
11:21 🔗 SketchCow So fuck that guy.
11:22 🔗 Smiley :D
11:23 🔗 BlueMax yay
11:25 🔗 omf_ I plan to dl a copy to do some nlp on it. Where else can I find that much recipe data in one place
15:15 🔗 Smiley how the bleep do we grab pages that do weird json stuff? :S
15:15 🔗 soultcer With wget-lua
15:16 🔗 * Smiley googles
15:17 🔗 Smiley ok yeah I'm lost ¬_¬
15:17 🔗 soultcer I think we already had a project where we fetched some JSON and parsed it using wget-lua
15:18 🔗 Smiley I *think* it's json, I don't actually know
15:18 🔗 Smiley Someone mentioned json, but maybe they were wrong :S
15:18 🔗 soultcer Well, what are you trying to download?
15:20 🔗 Smiley saintsrow.com/community/forums
15:20 🔗 Smiley errr forum*
15:27 🔗 soultcer The content is directly in the html, not loaded via JSON?
15:37 🔗 Smiley hmm
15:37 🔗 Smiley well i tried to crawl with wget + cookies and failed :(
15:38 🔗 soultcer Weird
15:44 🔗 Smiley yeah
20:51 🔗 alard SketchCow: http://archive.org/details/archiveteam_punchfork_index
21:04 🔗 balrog_ http://politics.slashdot.org/story/13/03/21/203249/political-pressure-pushes-nasa-technical-reports-offline ... ugh
21:05 🔗 balrog_ and of course there used to be a robots.txt that said to allow "archive.org_bot", not "ia_archiver", so nothing's archived in wayback.
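The user-agent mismatch balrog_ describes can be reproduced with Python's standard robots.txt parser. This is only a sketch: the robots.txt body and the example URL below are hypothetical, reconstructed from the description in the log, since the real file's contents are not given.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is allowed, everyone else is not.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(agent: str, url: str = "http://example.gov/report.pdf") -> bool:
    """Return True if `agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("archive.org_bot", "ia_archiver"):
        print(agent, allowed(agent))
```

Under this assumed file, a crawler identifying as "ia_archiver" falls through to the catch-all rule and is refused, which matches the outcome described in the log.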
21:05 🔗 chronomex v_v
21:06 🔗 balrog_ may be a good idea to edit http://archive.org/details/archive.org_bot to clarify...
21:07 🔗 balrog_ http://www.fas.org/blog/secrecy/2013/03/ntrs_dark.html *sigh*
21:07 🔗 hdevalenc is there a way to reduce the rate-limiting on the yahoo messages tracker?
21:09 🔗 alard hdevalenc: No. There is no rate limit on the tracker. The rate limiting messages you see are coming from Yahoo: they block you really fast.
21:09 🔗 alard (Go to http://messages.yahoo.com/ to see the message when you're blocked.)
21:09 🔗 hdevalenc yeah, I meant strategies for dealing with their rate limiting
21:09 🔗 balrog_ change your ip
21:13 🔗 balrog_ alard: I'm getting "No item received. Retrying after 30 seconds..." with the grabber
21:15 🔗 alard balrog_: With Yahoo Messages?
21:15 🔗 balrog_ yeah
21:16 🔗 balrog_ oh wait
21:16 🔗 balrog_ my bad :P
21:16 🔗 alard Ah, good. :)
21:17 🔗 balrog_ how many parallel instances are recommended?
21:17 🔗 alard I don't know. I think they count.
21:17 🔗 alard But maybe Yahoo does something more difficult, I don't know. One instance is enough to get you blocked, so running more probably won't help.
21:18 🔗 swebb A breakdown of the most popular domains in the twitter unrolled urls: https://gist.github.com/scumola/5216839
21:18 🔗 swebb I've not gotten the second column processed at this time.
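A domain breakdown like the one swebb mentions could be produced roughly as follows. This is a sketch, not swebb's actual pipeline; the function name and the sample URLs are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_histogram(urls):
    """Count URLs per hostname, most common first."""
    counts = Counter()
    for url in urls:
        host = urlsplit(url).hostname  # already lowercased; None if missing
        if host:
            counts[host] += 1
    return counts.most_common()


if __name__ == "__main__":
    sample = [
        "http://bit.ly/a",
        "http://bit.ly/b",
        "https://archive.org/details/x",
    ]
    for host, n in domain_histogram(sample):
        print(n, host)
```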
21:18 🔗 balrog_ swebb: be warned, t.co may redirect through bit.ly/etc
21:19 🔗 swebb Yea, I know. I follow those too.
21:19 🔗 balrog_ ok
21:19 🔗 swebb I recursively follow redirects.
21:19 🔗 swebb bit.ly is #2
21:19 🔗 swebb approx 10% of the t.co urls went through bit.ly
21:20 🔗 chronomex what the hell
21:21 🔗 chronomex I'd like to see a distribution of the shortener chain length
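The chain-length distribution chronomex asks about could be computed along these lines. A minimal sketch, assuming the redirects have already been recorded offline as a mapping from each URL to its redirect target; the mapping, the hop limit and the function names are all hypothetical.

```python
from collections import Counter


def chain_length(url, redirects, max_hops=20):
    """Follow `redirects` from `url`; return (hops, final_url).

    Stops at a URL with no recorded redirect, at a loop, or after max_hops.
    """
    seen = {url}
    hops = 0
    while url in redirects and hops < max_hops:
        nxt = redirects[url]
        if nxt in seen:  # redirect loop: stop where we are
            break
        seen.add(nxt)
        url = nxt
        hops += 1
    return hops, url


def chain_length_distribution(start_urls, redirects):
    """Map hop count -> number of start URLs with that chain length."""
    return Counter(chain_length(u, redirects)[0] for u in start_urls)


if __name__ == "__main__":
    redirects = {
        "http://t.co/a": "http://bit.ly/x",
        "http://bit.ly/x": "http://example.com/page",
        "http://t.co/b": "http://example.org/",
    }
    starts = ["http://t.co/a", "http://t.co/b"]
    print(sorted(chain_length_distribution(starts, redirects).items()))
```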
21:21 🔗 swebb I only posted the top 1000 domains. I have all of the domains if you'd like to see them all.
21:22 🔗 chronomex maybe sure
21:22 🔗 swebb 860726 domains in all.
21:22 🔗 chronomex later
21:22 🔗 swebb :)
21:24 🔗 ersi po.st o_o
21:25 🔗 swebb archive.org is on the list
21:26 🔗 alard Smiley, soultcer: Yes, wget-lua can do JSON parsing (with the Lua JSON parser, see the punchfork repository, for example).
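The general idea of parsing a JSON response to discover further URLs can be sketched as below. This is plain Python, not wget-lua and not the punchfork code; the JSON document and the function name are invented for illustration.

```python
import json


def collect_urls(node, found=None):
    """Walk a decoded JSON value and collect every string that looks like a URL."""
    if found is None:
        found = []
    if isinstance(node, dict):
        for value in node.values():
            collect_urls(value, found)
    elif isinstance(node, list):
        for value in node:
            collect_urls(value, found)
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        found.append(node)
    return found


if __name__ == "__main__":
    # Hypothetical API response.
    body = '{"next": "http://example.com/page/2", "items": [{"img": "https://example.com/a.jpg", "id": 7}]}'
    for url in collect_urls(json.loads(body)):
        print(url)
```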
21:28 🔗 Smiley ooo
21:29 🔗 swebb Here is the list of most popular domains that people were linking to: https://gist.github.com/scumola/5216839#file-col2-histogram
21:29 🔗 swebb the top-1000 anyway
21:30 🔗 Smiley We seem to have missed the gist of that gist you were looking for.
21:30 🔗 Smiley Whoops.
21:31 🔗 swebb damn
21:31 🔗 swebb https://gist.github.com/scumola/5216839/raw/7d5dfbd48f2d2b216501990d9daa3e2b23cbe7c6/col2.histogram
21:31 🔗 swebb crap
21:31 🔗 swebb that didn't work either
21:31 🔗 Smiley o_)O
21:31 🔗 swebb My gists are gone.
21:32 🔗 swebb :(
21:32 🔗 Smiley :<
21:34 🔗 Famicoman email github
21:34 🔗 swebb stupid github
21:34 🔗 Famicoman I had the same problem a year or so ago and they fixed it all up
21:35 🔗 swebb https://www.dropbox.com/s/b6qq7avhtwsos3s/col1.histogram.bz2
21:43 🔗 ersi I got the gist open in a tab
21:47 🔗 swebb https://www.dropbox.com/s/oe0sh1qze8qbs5p/col2.histogram.bz2
21:47 🔗 swebb Those are the full lists.
21:49 🔗 ersi https://gist.github.com/ersi/5217120 if you just want to take a quickie look on col1 (copy of swebb's gist)
23:02 🔗 SketchCow alard: You have name and date reversed in the columns.
23:41 🔗 dashcloud SketchCow: where will the new CDs you're ripping end up? any link I can watch to see the latest ones?
23:42 🔗 SketchCow All will end up in cdbbsarchive
23:45 🔗 DFJustin https://archive.org/services/collection-rss.php?collection=cdbbsarchive
23:55 🔗 SketchCow The "problem" right now is if I rip CD-ROMs, it can rip them so fast, it breaks up me doing anything else.
23:56 🔗 dashcloud can you slow down the process? rip at a lower speed to give you a longer break between disc swaps?
23:56 🔗 SketchCow No point. It'll just be done each time I look at it.
23:56 🔗 SketchCow But ironically, I could get more "data" from the DVD-ROM ripping since I could run two machines as well as work.
23:57 🔗 SketchCow Now it slams through a CD-ROM in under a minute and a half.
23:57 🔗 dashcloud shame you weren't able to get a scanning intern- that would be a perfect job
