[00:39] slow mobile me is still slow
[01:54] I can.t put my finger on the precise Reasons.
[01:54] I think the Audio-Quality needs work.
[01:54] I hear a bit of echo near the start.
[01:54] The Face (especially the eyes) is in some shots slightly out of Focus.
[01:54] The Moving shot is a bit of a risk with fixed lenses.
[01:54] for mug shots
[01:54] The Focussing Bit near the End distracts a bit (I think Camera Lenses overshoot to fast when adjusting).
[01:54] Barely literate critics, have to love them.
[02:10] hahaha
[02:19] Tell him that certain types of eyes simply absorb vast amounts of light into their cones, throwing the shot slightly out of focus.. it's rare, it's unavoidable, shit happens.
[02:19] lol
[02:20] "camera lenses overshoot to fast when adjusting"
[02:20] lenses
[02:20] what
[02:22] eesh, yikes
[02:22] [ec2-user@ip-10-243-119-16 files.splinder.com]$ pwd; ls -1 | wc -l
[02:22] 22046
[02:34] * SketchCow boots zetathust, and does this: http://www.youtube.com/watch?v=Mu71EAdnjQ0
[02:43] if someone's got a better place for me to upload the 7z, tell me; otherwise here's a link to it on mediafire: http://www.mediafire.com/?49kgs4umrb79a34
[02:44] If you get a dropbox you can upload it there and copy a public url
[02:44] I don't actually
[02:45] I don't really have anywhere else to put it online that I can share from, so my apologies there - but it is a pretty small download
[02:58] Why not just throw it on batcave?
[02:58] I can make it browsable.
[03:02] I don't have any logins or access - if you want to PM me something, I can throw it up there right away
[03:03] you don't have rsync?
[03:05] I do have rsync
[03:10] so how would I go about using rsync to get the folder onto batcave?
[03:15] okay - it's uploading to batcave
[03:19] okay - it's up there
[03:23] just as a note - there are some gaps in the archive, because some pages the site points to simply aren't there anymore
[03:33] anything for the ffnet scrape
[03:36] is it possible to upload directly into IA, using ftp?
[03:47] no, but you can use http
[03:47] http://www.archive.org/help/abouts3.txt
[03:47] oh, that's awesome
[03:47] I didn't know IA had an S3 interface
[03:48] that means I can reuse AWS::S3 and all the fun related bits
[03:48] yeah, it's super rad.
[03:49] that, and being able to specify metadata with http headers, means you can drop items into archive.org from shellscripts with 0 hassle
[03:49] I quite like that
[03:51] http://www.archive.org/create.php?ftp=1
[03:53] oh, yeah, you can do that but it's kind of lousy.
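The header-driven upload flow mentioned at [03:47]-[03:49] is documented in abouts3.txt; a minimal Ruby sketch of it, where the item name, filename, and credentials are hypothetical (the endpoint and x-archive-* headers come from that doc):

    require 'net/http'

    # PUT a file into a (hypothetical) item, creating the item on the fly
    # and attaching metadata as plain HTTP headers, per abouts3.txt.
    http = Net::HTTP.new('s3.us.archive.org', 80)
    req = Net::HTTP::Put.new('/my-example-item/upload.7z')
    req['authorization'] = 'LOW ACCESSKEY:SECRET'      # your IA S3 keys
    req['x-archive-auto-make-bucket'] = '1'            # create the item if it doesn't exist
    req['x-archive-meta-title'] = 'Example upload'     # metadata via x-archive-meta-* headers
    req.body = File.binread('upload.7z')
    puts http.request(req).code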
[03:58] yes, but it's easier to do that than to use ftp to log in first and create the meta.xml by hand
[03:58] btw, as a library, IA kicks LoC in the nutsack
[03:59] LoC's catalog is really nice
[04:01] mostly because, when I search for anything in the Archive, I can take the url of the *search* and dump it into jdownloader, which will then proceed to load and look for links, and find *every single result on the page*, and give me human-readable links for them so I can pick and choose, without even having to click on each individual result
[04:02] seriously though, LoC's website search is worse than useless, because it makes me give up rather than keep looking; it's just that bad
[04:03] yeah, the interface sucks
[04:03] but at least the metadata is correct and somewhat consistent
[04:03] yipdw: Unless you need to create items with directories
[04:03] Then it sucks
[04:03] Although, it's a lot easier now that I have internal access
[04:04] oh, I was thinking of using it to shove WARCs at the IA
[04:05] Then it's probably perfect
[04:07] TECHNICALLY it's not S3
[04:07] It's S3-like.
[04:08] Until this calms down: http://www.archive.org/~tracey/mrtg/derivesg.html
[04:08] I'll be focusing on other things.
[04:10] Oh wow
[04:11] It's mostly ximm with all his forever-running heritrix crawls
[04:14] yipdw: but wouldn't you need to hand-create an XML for each WARC file?
[04:15] bsmith093: why would I need to hand-create it
[04:16] S3 automatically creates the necessary XML based off of the headers you pass in
[05:13] http://www.poe-news.com/forums/sp.php?pi=1002546492
[05:13] poe-news.com has announced they're shutting down.
[05:14] start the warc
[05:19] this good? wget-warc -mpke robots=off -U "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" --warc-cdx --warc-file=poe-news.com_12022011 www.poe-news.com
[05:20] bsmith093: can you give me a very succinct idea of the current state of the ffnet project?
[05:20] or a very meandering, sloppy narrative
[05:20] that'll work too
[05:20] have ideas, can't code, got something half-baked and done-ish
[05:21] underscor's working on a script to grab reviews and stories with storyinator
[05:21] I'm just iterating through every possible ffnet ID, and culling the bad ones to make a linklist
[05:28] underscor's way is almost certainly faster
[05:32] spidering the site like yipdw suggested might be the fastest
[05:33] arrith: can you explain #2 in the "extra credit"?
[05:33] http://learnpythonthehardway.org/book/ex8.html
[05:35] dnova: notice where double quotes get used versus where single quotes get used
[05:35] there's something unique about the double-quoted sentence
[05:35] OH.
[05:35] the single quote wasn't escaped
[05:36] kinda
[05:36] just that there is a single quote
[05:36] usually when there's a single quote people use doubles
[05:36] hmph. well ok. thanks :)
[05:36] but yeah, you can escape it
[05:37] I dunno actually if people usually escape or not
[05:37] I've only seen doubles used then, but I've only seen tutorialish code
[05:56] spidering - I don't know how to tell wget to spider and save a linklist to then go back to
[05:56] not spider with wget
[05:56] spider with a ruby script that goes through the categories
[05:56] also, on IA is it possible to edit an existing item?
[05:56] or python script
[05:59] where's the script, and how do I run it?
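For reference, a rough Ruby sketch of the brute-force ID walk bsmith093 describes at [05:21]; the /s/<id>/1/ URL pattern and the cull-by-response-code test are assumptions here, not details of his actual script (the 10,000,000 ceiling is the estimate discussed later in the log):

    require 'net/http'

    # probe candidate story IDs and keep the ones that resolve
    File.open('linklist.txt', 'w') do |list|
      (1..10_000_000).each do |id|
        res = Net::HTTP.get_response('www.fanfiction.net', "/s/#{id}/1/")
        # assume dead IDs don't come back as a plain 200
        list.puts "http://www.fanfiction.net/s/#{id}/1/" if res.is_a?(Net::HTTPSuccess)
      end
    end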
[06:00] I've got something by underscor from a repo, that looks like Ruby
[06:01] there isn't one
[06:01] you gotta make it
[06:01] ugh
[06:02] pardon me, by yipdw: git://gist.github.com/1432483.git
[06:10] eh?
[06:10] oh
[06:10] yeah, how's that going, any updates?
[06:10] yeah, I maintain that only hitting what you need to hit is the fastest way to do it
[06:10] I haven't touched it since then
[06:10] other work, etc.
[06:11] I think arrith wanted to port it to Python
[06:11] you can run it right now, if you have a Ruby 1.9 environment with the connection_pool, girl_friday and mechanize gems installed
[06:12] ok, wonderful, now how do I get those modules installed?
[06:14] rubygems1.9.1 or 1.9
[06:19] haha
[06:20] yipdw: yeah, I was basically waiting to see what underscor ends up with and go from there
[06:20] seriously, how do I get those Ruby modules installed?
[06:20] possibly switching to a spidering method to get updates
[06:20] bsmith093: http://www.google.com/search?q=rubygems+ubuntu
[06:31] good god
[06:31] bsmith: how many stories are on ffnet?
[06:31] do we know?
[06:33] less than or equal to 10,000,000 it looks like?
[06:35] or: what is the highest valid ID you've found?
[06:36] ~7 million
[06:36] can some kind person walk me through how to install the girl_friday gem? I've found the darn thing but it won't install with gem install
[06:37] https://github.com/mperham/girl_friday.git
[06:41] anyone?
[06:43] I have no ruby experience, sorry.
[06:44] arrith
[06:46] bsmith,
[06:46] I think you need to relax just a little bit
[06:48] I added the project to the wiki frontpage
[06:49] yeah, I know, I'm overtired and really need to sleep
[06:49] dnova: spot on.
[06:50] ooh, thanks, chronomex
[06:50] any ideas/critiques are welcome
[06:51] I meant with respect to relaxing, but the link looks good :)
[06:51] oh, lol
[06:52] dang, it's been two months since I've uploaded anything
[06:52] get busy time
[06:53] are you running the fix-dld script or what
[06:53] where are you getting all those splinder profiles!!
[06:53] me?
[06:53] I'm fix-dld
[06:53] ahh, figured :D
[06:53] was offline for a while.
[06:53] I'm downloading 2 users. have been for like 4 days
[06:53] one is over 12gb
[06:53] one is over 3
[06:54] I lost one that was over 10gb because I ran out of ram+swap :(
[06:54] using tmpfs?
[06:54] tmpfs is only a good idea for when you're doing a bunch of threads simultaneously
[06:54] not the way it's supposed to be (i.e., not a ramdisk)
[06:55] ?
[06:55] no, the upload I'm doing now is to archive.org and not an archiveteam thing.
[06:56] http://www.archive.org/details/bellsystem_PK-1C901-01
[06:56] well, g'night/g'morning, all, I'm gonna go sleep like I should have done 2 hrs ago, bye
[06:56] bsmith093: sleep well.
[06:56] sleep well!
[06:56] arrrgh
[06:56] :D
[06:57] chronomex: ok, now what?
[06:57] bsmith093: ?
[06:57] you said aargh
[06:57] nvm
[06:58] k night bye
[06:59] heh.
[07:07] bsmith093: easiest way to install it is to get a Ruby environment, get Bundler (gem install bundler), and then install all the gems in the bundle (bundle install)
[07:31] Ops, please
[16:42] http://rbelmont.mameworld.info/?p=689
[17:37] SketchCow: http://fromthepage.balboaparkonline.org/display/display_page?ol=w_rw_p_pl&page_id=1363#page/n0/mode/1up
[19:03] Nice
[19:09] Aww yeah
[19:09] Got my new VDSL2 hooked up.
[19:09] 263.90kB/s uploading to alard
[19:09] alard: more anyhub is coming!
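The gems named at [06:11] are what Bundler pulls in at [07:07]; the gist's Gemfile isn't shown in this log, but it presumably looks something like this sketch (versions and exact contents are assumptions):

    # Gemfile
    source 'http://rubygems.org'

    gem 'connection_pool'  # needs Ruby 1.9 -- it uses BasicObject
    gem 'girl_friday'
    gem 'mechanize'

With a Gemfile like that in the working directory, gem install bundler followed by bundle install fetches everything in one go.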
[20:57] and I just installed the gem connection_pool
[20:57] ok, I got Rubygems to install finally, and they're all set up, except I'm still getting this error: ffgrab.rb:1:in `require': no such file to load -- connection_pool (LoadError)
[21:01] bsmith093: ruby -v
[21:02] actually, just send me your full terminal log
[21:02] ruby 1.8.7 (2010-01-10 patchlevel 249) [i486-linux]
[21:02] connection_pool does not work with Ruby 1.8.7, because it uses BasicObject, which only exists in Ruby 1.9
[21:02] also, Ruby 1.9 automatically loads Rubygems; 1.8.7 doesn't
[21:02] apt install ruby1.9
[21:02] which is where the error you're seeing comes from
[21:02] 'cause I think I did that
[21:03] ruby1.9 is already the newest version.
[21:03] ruby1.9 -v
[21:03] ruby 1.9.0 (2008-10-04 revision 19669) [i486-linux]
[21:04] ugh
[21:04] that's...way behind
[21:04] ah, another repo?
[21:04] Ruby (and projects like it) move too fast for Debian/Ubuntu to keep up, IMO
[21:04] oh wait, yeah, I just noticed the 2008 thing, wow, that's old
[21:04] unless I can control the Ruby packages (e.g. for production environments) I use https://rvm.beginrescueend.com/
[21:05] it bypasses package management, but for me, the benefit outweighs that cost
[21:07] got rvm now, grabbing ruby 1.9.3
[21:07] should I dump the Ubuntu repo ruby?
[21:07] only if you want to, it's not necessary
[21:07] to dump it
[21:08] k then, will this install it like a normal package?
[21:08] RVM does not use apt, so no
[21:08] lol @ a language moving so fast you can't package it
[21:09] it will, however, modify your environment's PATH to work out
[21:09] ersi: it's not that uncommon
[21:09] yeah, I've never heard of that
[21:09] sounds more like a dialect that forks all the time
[21:09] I actually more often construct development environments directly from upstream than I do via OS packages
[21:09] although I must say, this is the smoothest complex thing I've ever done
[21:10] how do I keep it updated?
[21:10] ersi: in particular, I've found that following upstream directly pays off for Node.js, factor, and GHC
[21:10] bsmith093: rvm install [Ruby version]
[21:11] so I have to know the version I have, or the version I want to get?
[21:11] ersi: also, the syntax and semantics of Ruby don't change that often (although ruby-core has been doing some WTFs in that regard lately)
[21:11] ersi: the libraries, on the other hand
[21:11] bsmith093: yes; rvm list will show you those
[21:11] oh wow, this is cool, I've also never had this much feedback from a compiler that I could actually follow
[21:12] I'm having a hard time understanding how a 10+ year language can move so fast it's bleeding edge all the time
[21:12] the language itself does not
[21:12] implementations and libraries do
[21:14] you know what would be nice? a dummy package for every Linux distro that does [language]-all, and grabs everything in the repos for that language
[21:14] that would be infeasibly huge
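A quick way to check for the two 1.8.7 pitfalls diagnosed at [21:02] - no BasicObject, and no auto-loaded Rubygems - before running anything; a minimal sketch:

    # run this with whichever interpreter you plan to use
    puts RUBY_VERSION

    if defined?(BasicObject)
      puts 'BasicObject is present (1.9+), connection_pool can load'
    else
      puts 'no BasicObject (1.8.x), connection_pool will not work'
    end

    # 1.9 loads Rubygems automatically; 1.8.7 needs it required by hand
    require 'rubygems' if RUBY_VERSION < '1.9'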
[21:14] how big could that possibly be?
[21:14] for Ruby alone there's 31,503 libraries
[21:15] Java would be an order of magnitude larger
[21:15] mother of Turing, that's a lot of development
[21:15] and to be fair, Java mostly takes care of itself as it needs to
[21:16] keep the JVM updated and AFAIK that's all you need to worry about
[21:16] Hackage lists, uh
[21:17] something around 3633 packages for Haskell
[21:17] ok, ok, so languages are much bigger than I thought, in their entirety
[21:18] yeah -- I find that a language is really nothing without its libraries
[21:18] I mean, sure, you can install an implementation of a language
[21:18] but it's really pretty useless on its own
[21:18] hey, another thing: does a sudo operation keep root until it's done, or is there a timer somewhere?
[21:19] because I've had things crap out asking for rights halfway through
[21:19] ruby's done
[21:20] annnnd.. same error as last time, only this time ruby -v gives ruby 1.8.7 (2010-01-10 patchlevel 249) [i486-linux]
[21:20] rvm use 1.9.3
[21:21] /home/ben/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require': cannot load such file -- connection_pool (LoadError)
[21:21] from /home/ben/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
[21:21] from ffgrab.rb:1:in `<main>'
[21:21] paste me the full terminal output
[21:22] /home/ben/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require': cannot load such file -- connection_pool (LoadError)
[21:22] ben@ben-laptop:~/1432483$
[21:22] from /home/ben/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
[21:22] from ffgrab.rb:1:in `<main>'
[21:22] ruby ffgrab.rb
[21:22] that's what I get
[21:22] gem install bundler; bundle install
[21:23] the Gemfile in the gist repo is a dependency manifest
[21:23] for Bundler
[21:24] I thought that was important - I kept trying ruby Gemfile on the off chance something would happen; this is not an intuitive lang to install
[21:25] Fetching source index for http://rubygems.org/
[21:25] now that seems like I would need that for gems, because that's where I found connection_pool and girl_friday
[21:25] Rubygems is a packaging mechanism; Bundler's a tool for managing packages
[21:25] they're related, but Rubygems is independent of Bundler
[21:26] well, it's finding the deps and installing them, so whoo.
[21:26] holy crap, it's running
[21:27] I'd like to again point out that it doesn't do anything to record its results
[21:27] and apparently it's timed itself to 6 decimal places?
[21:27] times what
[21:27] timestamp goes out to seconds.######
[21:28] that's the default behavior of the Ruby logger library
[21:28] but, yeah, there's no point in running that as-is for a long time
[21:28] man, that's precise
[21:28] because it doesn't yet actually do anything aside from spit results to the console
[21:29] I'm not even sure if it handles pages correctly -- I *think* it does, but I haven't run it long enough to see how they get processed in the queue
[21:30] I just had a thought: do user profiles show all the stories on one page, regardless of how many there are? 'cause that might be a help.
[21:30] possibly, but AFAIK there is no way to get a list of all users
[21:31] other than doing my original idea, and yours is much faster and uses fewer resources overall, on both ends
[21:32] I can tell you that my method results in a lot of duplicates
[21:32] in particular, it doesn't yet account for the "last page" link in each story
[21:32] that will have to be filtered out in the discovery logic
[21:33] yeah, I don't really have any thoughts for that
[21:33] it's just more HTML scraping
[21:33] although the chapter is just a number appended to the link
[21:33] not hard, just needs to be done
[21:33] the next and back buttons are JavaScript, I think
[21:34] https://gist.github.com/705cd333e06178057dec
[21:34] that's a list of 4,506 story links recovered by ffgrab
[21:34] well
[21:34] 4506 / 2 roughly
[21:34] wait, the number before the title - that's the last chapter?
[21:34] that's a chapter indicator
[21:35] so let ffgrab run till it's done, then grep for dupes and keep the highest number
[21:35] I'd rather fix it in the grabber
[21:35] to ignore it, you'll have to change what stories_and_categories_of does at lines 12-13
[21:36] I'm not sure what the change is, as I haven't looked at ff.net's page structure closely enough to make the call
[21:36] it's still faster than iterating through 10 mil semi-fake links
[21:37] I am also suspicious of results like this:
[21:37] I, [2011-12-07T15:33:31.381205 #75544] INFO -- : Found 0 categories, 0 stories from /book/My_Sweet_Audrina/
[21:37] in that case, there really are no entries that show up
[21:37] but any 0/0 results make me suspicious that the script is missing something
[21:38] I was right, it is JS here
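Pending a fix in the grabber itself, the grep-for-dupes idea from [21:35] could look something like this in Ruby - collapsing /s/<id>/<chapter>/ links to one entry per story and keeping the highest chapter seen. The link format is inferred from the gist output, and linklist.txt is a hypothetical dump of ffgrab's results:

    best = {}
    File.foreach('linklist.txt') do |line|
      next unless line =~ %r{/s/(\d+)/(\d+)}
      id, chapter = $1, $2.to_i
      # keep only the highest chapter indicator seen for each story ID
      best[id] = chapter if chapter > (best[id] || 0)
    end
    best.each { |id, ch| puts "http://www.fanfiction.net/s/#{id}/#{ch}/" }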