#archiveteam 2011-12-10,Sat


Time Nickname Message
00:15 🔗 yipdw| heh
00:15 🔗 yipdw| running ffgrab.rb on AT&T's 3G network is an exercise in omg
00:19 🔗 Coderjoe eek
01:21 🔗 bsmith093 /home/ben/.rvm/gems/ruby-1.9.3-p0/gems/girl_friday-0.9.7/lib/girl_friday/actor.rb:212:in `block in initialize'
01:21 🔗 bsmith093 /home/ben/.rvm/gems/ruby-1.9.3-p0/gems/girl_friday-0.9.7/lib/girl_friday/actor.rb:212:in `initialize'
01:21 🔗 bsmith093 /home/ben/.rvm/gems/ruby-1.9.3-p0/gems/girl_friday-0.9.7/lib/girl_friday/actor.rb:377:in `watchdog'
01:21 🔗 bsmith093 /home/ben/.rvm/gems/ruby-1.9.3-p0/gems/girl_friday-0.9.7/lib/girl_friday/actor.rb:66:in `new'
01:21 🔗 bsmith093 yipdw im getting a whole mess of these
01:21 🔗 bsmith093 /home/ben/.rvm/gems/ruby-1.9.3-p0/gems/girl_friday-0.9.7/lib/girl_friday/actor.rb:69:in `block (2 levels) in spawn'
01:21 🔗 bsmith093 /home/ben/.rvm/gems/ruby-1.9.3-p0/gems/girl_friday-0.9.7/lib/girl_friday/actor.rb:66:in `block in spawn'
01:21 🔗 bsmith093 E, [2011-12-09T20:21:18.680575 #18455] ERROR -- : Exception Errno::ECONNREFUSED (Connection refused - Unable to connect to Redis on 127.0.0.1:6379) raised while scraping /cartoon/; requeuing.
01:26 🔗 yipdw| yeah
01:26 🔗 yipdw| I said you need a Redis instance
01:31 🔗 bsmith093 i figured out how to run Redis, now where is it saving these links to, or isn't it?
01:32 🔗 yipdw| all data is being saved in the Redis database
01:32 🔗 yipdw| the list of story IDs is present in the "stories" key
01:32 🔗 yipdw| all other keys are discovery state
01:32 🔗 bsmith093 which is where?
01:33 🔗 yipdw| the stories key is in the Redis database
01:33 🔗 yipdw| redis-cli is the Redis command-line interface
01:33 🔗 yipdw| to query, launch it and run e.g. scard stories, smembers stories
01:33 🔗 yipdw| more info available at http://redis.io
01:34 🔗 yipdw| also, you should pull again, because I fixed a problem with usage of the If-Modified-Since header
01:34 🔗 bsmith093 wow, 5k stories in 45 sec, that's fast
01:34 🔗 yipdw| it's not actually pulling story data
01:35 🔗 yipdw| but, yeah, it's decently quick
01:35 🔗 yipdw| it can be faster without the random waits, but
01:35 🔗 bsmith093 i know, it's just saving valid ids, right?
01:35 🔗 yipdw| I feel bad about taking that out
01:35 🔗 yipdw| if you run it on JRuby 1.6+, it is possible to scale it to a very high number of threads
01:36 🔗 bsmith093 do i need to run anything else before running ffgrab again after a pull?
01:36 🔗 yipdw| but again, I don't feel good about doing that, because I don't know fanfiction.net's capabilities
01:36 🔗 yipdw| no, terminate it
01:36 🔗 bsmith093 i did
01:36 🔗 yipdw| ok
01:36 🔗 yipdw| then just re-run it
01:36 🔗 bsmith093 and i did a git pull, and now i'm running again
01:36 🔗 yipdw| the scraper will pick up from where it left off
01:37 🔗 yipdw| anyway, brb
01:37 🔗 yipdw| well bbl more like, heh
01:37 🔗 bsmith093 thanks bye
06:55 🔗 Coderjoe http://retropc.net/
06:57 🔗 Coderjoe the japanese counterpart to SketchCow?
07:01 🔗 Coderjoe man... why didn't I buy another HR-S9911U or two before they disappeared?
07:38 🔗 DFJustin the japanese guys are like negative sketchcow, they collect shitloads of stuff and then never digitize any of it ever
07:40 🔗 DFJustin there are a few exceptions
08:47 🔗 Coderjoe not sure if there was a link from the main page of the site, so: http://retropc.net/alice/
09:49 🔗 yipdw well this is awesome
09:49 🔗 yipdw 9) "/Wrestling_and_CSI_Miami_Crossovers/230/1686/_cache_control"
09:50 🔗 chronomex woooo
09:50 🔗 chronomex the best
09:50 🔗 Coderjoe O_O
09:51 🔗 Coderjoe fan fiction gets weeeeeird
09:51 🔗 yipdw 47) "/Frasier_and_Megami_Tensei_Crossovers/381/1074/_cache_control"
09:51 🔗 Coderjoe i wonder how many ff.net stories have self-insertion
09:51 🔗 dnova fan fiction starts weird, remains weird, and ends weird.
09:51 🔗 chronomex is _cache_control some sort of strange fanfic thing?
09:52 🔗 yipdw no, that's my Cache-Control observance mechanism
09:52 🔗 chronomex :P
09:52 🔗 yipdw it's kind of hacky
09:52 🔗 yipdw but it works
09:54 🔗 Schbirid argh, what is the linux tool to compare two textfiles where you can specify eg "only show entries that appear in a but not b". not diff, something simpler
09:55 🔗 chronomex grep -v -f b a
09:56 🔗 chronomex works best when B is small and A is large
09:56 🔗 Schbirid it was something with the flags -1 , -2, -3
09:56 🔗 Schbirid but that sounds good too :)
09:56 🔗 chronomex it's not quite there, but that's where I would start.
09:57 🔗 chronomex you also want to tell grep to match whole lines only (dunno the option), and interpret the lines as fixed strings rather than patterns (-F, I think)
10:00 🔗 Schbirid chronomex: comm it is :)
10:00 🔗 chronomex ah
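
[Editor's note: the two approaches discussed above can be sketched side by side. This is a minimal sketch, not something anyone in the channel ran; the temporary files and their contents are hypothetical sample data. The whole-line option chronomex could not recall is grep's -x, and comm expects both inputs to be sorted.]

```shell
# Hypothetical sample data; file names and contents are illustrative only.
tmpdir=$(mktemp -d)
printf '%s\n' cherry apple banana > "$tmpdir/a"
printf '%s\n' banana > "$tmpdir/b"

# comm requires sorted input. -2 drops lines unique to b, -3 drops lines
# common to both, so only lines unique to a remain.
sort "$tmpdir/a" > "$tmpdir/a.sorted"
sort "$tmpdir/b" > "$tmpdir/b.sorted"
only_in_a_comm=$(comm -23 "$tmpdir/a.sorted" "$tmpdir/b.sorted")

# grep variant: -F treats patterns as fixed strings, -x matches whole lines,
# -v inverts the match, -f reads the patterns from b.
# Unlike comm, the output keeps a's original line order.
only_in_a_grep=$(grep -F -x -v -f "$tmpdir/b" "$tmpdir/a")

printf 'comm:\n%s\n' "$only_in_a_comm"
printf 'grep:\n%s\n' "$only_in_a_grep"
rm -r "$tmpdir"
```

[Both variants list the lines of a that are absent from b. The comm form needs the extra sort step; the grep form works on unsorted files.]
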
10:08 🔗 DFJustin ah this alice soft archive is nice
13:36 🔗 Schbirid http://dtrace.org/blogs/brendan/2011/12/08/2000x-performance-win/
21:05 🔗 chronomex Schbirid: I bet it would have really flown with fgrep.
21:58 🔗 bsmith093 gsc-game.com wget-warc appears to be done, 1.5gb total, including cdx and warc file
21:58 🔗 bsmith093 44.185 items, at 1.8gb
23:14 🔗 SketchCow Yes, once again, archiveteam's site has spam on it.
23:18 🔗 SketchCow You know, for being all twitchy about the fact I've been porting their digitized magazines to archive.org, this site has done a spectacularly shitty job getting consistent spams.
23:18 🔗 SketchCow scans.
23:19 🔗 SketchCow Some of these are literally mish-mashes where issues 1 2 3 8 and 9 are collections of JPG files, and then 4 5 6 are pdfs and then 7 is a set of jpg files in two file directories.
23:47 🔗 underscor SketchCow: hah
23:58 🔗 chronomex scanning is boring, let's get high first
