#archiveteam 2011-12-07,Wed


Time Nickname Message
00:39 ๐Ÿ”— pberry slow mobile me is still slow
01:54 ๐Ÿ”— SketchCow I can.t put my finger on the precise Reasons.
01:54 ๐Ÿ”— SketchCow I think the Audio-Quality needs work.
01:54 ๐Ÿ”— SketchCow I hear a bit of echo near the start.
01:54 ๐Ÿ”— SketchCow The Face (especially the eyes) is in some shots slightly out of Focus.
01:54 ๐Ÿ”— SketchCow The Moving shot is a bit of a risk with fixed lenses.
01:54 ๐Ÿ”— zetathust for mug shots
01:54 ๐Ÿ”— SketchCow The Focussing Bit near the End distracts a bit (I think Camera Lenses overshoot to fast when adjusting).
01:54 ๐Ÿ”— SketchCow Barely literate critics, have to love them.
02:10 ๐Ÿ”— underscor hahaha
02:19 ๐Ÿ”— rude___ Tell him that certain types of eyes simply absorb vast amounts of light into their cones, throwing the shot slightly out of focus.. it's rare, it's unavoidable, shit happens.
02:19 ๐Ÿ”— underscor lol
02:20 ๐Ÿ”— yipdw "camera lenses overshoot to fast when adjusting"
02:20 ๐Ÿ”— zetathust lenses
02:20 ๐Ÿ”— yipdw what
02:22 ๐Ÿ”— yipdw eesh, yikes
02:22 ๐Ÿ”— yipdw [ec2-user@ip-10-243-119-16 files.splinder.com]$ pwd; ls -1 | wc -l
02:22 ๐Ÿ”— yipdw 22046
02:34 ๐Ÿ”— * SketchCow boots zetathust, and does this: http://www.youtube.com/watch?v=Mu71EAdnjQ0
02:43 ๐Ÿ”— dashcloud if someone's got a better place for me to upload the 7z tell me, otherwise here's a link to it on mediafire: http://www.mediafire.com/?49kgs4umrb79a34
02:44 ๐Ÿ”— PatC If you get a dropbox you can upload it there and copy a public url
02:44 ๐Ÿ”— dashcloud I don't actually
02:45 ๐Ÿ”— dashcloud I don't really have anywhere else to put it online that I can share from, so my apologies there- but it is a pretty small download
02:58 ๐Ÿ”— SketchCow Why not just throw on batcave?
02:58 ๐Ÿ”— SketchCow I can make it browsable.
03:02 ๐Ÿ”— dashcloud I don't have any logins or access- if you want to PM me something, I can throw it up there right away
03:03 ๐Ÿ”— Coderjoe you don't have rsync?
03:05 ๐Ÿ”— dashcloud I do have rsync
03:10 ๐Ÿ”— dashcloud so how would I go about using rsync to get the folder onto batcave?
03:15 ๐Ÿ”— dashcloud okay- it's uploading to batcave
03:19 ๐Ÿ”— dashcloud okay- it's up there
03:23 ๐Ÿ”— dashcloud just as a note- there are some gaps in the archive, because some pages the site points to simply aren't there anymore
03:33 ๐Ÿ”— bsmith093 anything for the ffnet scrape
03:36 ๐Ÿ”— bsmith093 is it possible to upload directly into ia, using ftp
03:47 ๐Ÿ”— chronomex no, but you can use http
03:47 ๐Ÿ”— chronomex http://www.archive.org/help/abouts3.txt
03:47 ๐Ÿ”— yipdw oh, that's awesome
03:47 ๐Ÿ”— yipdw I didn't know IA had an S3 interface
03:48 ๐Ÿ”— yipdw that means I can reuse AWS::S3 and all the fun related bits
03:48 ๐Ÿ”— chronomex yeah, it's super rad.
03:49 ๐Ÿ”— chronomex that, and being able to specify metadata with http headers, means you can drop items into archive.org from shellscripts with 0 hassle
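A sketch of that header-driven upload, based on the abouts3.txt document linked above. The item name, filename, credentials, and metadata values are all placeholders; the command is echoed rather than executed, since a real run needs real IA S3 keys.

```shell
# Sketch of an ias3-style upload (per http://www.archive.org/help/abouts3.txt).
# ACCESSKEY/SECRET, the item name, and the metadata headers are placeholders.
item="example-warc-item"
file="site-20111207.warc.gz"
echo curl --location \
  --header "authorization: LOW ACCESSKEY:SECRET" \
  --header "x-amz-auto-make-bucket:1" \
  --header "x-archive-meta-mediatype:web" \
  --header "x-archive-meta-title:Example WARC upload" \
  --upload-file "$file" \
  "http://s3.us.archive.org/$item/$file"
```

The `x-archive-meta-*` headers become item metadata, which is what lets a shellscript create a fully described item in one request.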
03:49 ๐Ÿ”— yipdw I quite like that
03:51 ๐Ÿ”— DFJustin http://www.archive.org/create.php?ftp=1
03:53 ๐Ÿ”— chronomex oh, yeah, you can do that but it's kind of lousy.
03:58 ๐Ÿ”— bsmith093 yes but its easier to do that, than to use ftp to log in first and create the xml by hand
03:58 ๐Ÿ”— bsmith093 btw, as a library, IA kicks LoC in the nutsack
03:59 ๐Ÿ”— DFJustin loc's catalog is really nice
04:01 ๐Ÿ”— bsmith093 mostly because, when I search for anything in the Archive, i can take the url of the *search* and dump it into jdownloader, which will then proceed to load and look for links, and find *every single result on the page*, and give me human readable links for them so i can pick and choose, without even having to click on each individual result
04:02 ๐Ÿ”— bsmith093 seriously though, LoC website search is worse than useless, because it makes me give up, rather than keep looking, its just that bad
04:03 ๐Ÿ”— DFJustin yeah the interface sucks
04:03 ๐Ÿ”— DFJustin but at least the metadata is correct and somewhat consistent
04:03 ๐Ÿ”— underscor yipdw: Unless you need to create items with directories
04:03 ๐Ÿ”— underscor Then it sucks
04:03 ๐Ÿ”— underscor Although, it's a lot easier now that I have internal access
04:04 ๐Ÿ”— yipdw oh, I was thinking of using it to shove WARCs at the IA
04:05 ๐Ÿ”— underscor Then it's probably perfect
04:07 ๐Ÿ”— SketchCow TECHNICALLY it's not S3
04:07 ๐Ÿ”— SketchCow It's S3 like.
04:08 ๐Ÿ”— SketchCow Until this calms down: http://www.archive.org/~tracey/mrtg/derivesg.html
04:08 ๐Ÿ”— SketchCow I'll be focusing on other things.
04:10 ๐Ÿ”— underscor Oh wow
04:11 ๐Ÿ”— underscor It's mostly ximm with all his forever-running heritrix crawls
04:14 ๐Ÿ”— bsmith093 yipdw: but wouldn't you need to hand-create an xml for each warc file?
04:15 ๐Ÿ”— yipdw bsmith093: why would I need to hand-create it
04:16 ๐Ÿ”— underscor S3 automatically creates the necessary XML based off of the headers you pass in
05:13 ๐Ÿ”— SketchCow http://www.poe-news.com/forums/sp.php?pi=1002546492
05:13 ๐Ÿ”— SketchCow poe-news.com has announced they're shutting down.
05:14 ๐Ÿ”— bsmith093 start the warc
05:19 ๐Ÿ”— bsmith093 this good? wget-warc -mpke robots=off -U "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" --warc-cdx --warc-file=poe-news.com_12022011 www.poe-news.com
05:20 ๐Ÿ”— dnova bsmith093: can you give me a very succinct idea of the current state of ffnet project?
05:20 ๐Ÿ”— dnova or a very meandering, sloppy narrative
05:20 ๐Ÿ”— dnova that'll work too
05:20 ๐Ÿ”— bsmith093 have ideas, cant code, got something half-baked and done-ish
05:21 ๐Ÿ”— bsmith093 underscor's working on a script to grab reviews and stories with storyinator
05:21 ๐Ÿ”— bsmith093 im just iterating through every possible ffnet id, and culling the bad ones to make a linklist
05:28 ๐Ÿ”— bsmith093 underscor's way is almost certainly faster
05:32 ๐Ÿ”— arrith spidering the site like yipdw suggested might be the fastest
05:33 ๐Ÿ”— dnova arrith: can you explain #2 in the "extra credit"?
05:33 ๐Ÿ”— dnova http://learnpythonthehardway.org/book/ex8.html
05:35 ๐Ÿ”— arrith dnova: notice where double quotes get used versus where single quotes get used
05:35 ๐Ÿ”— arrith there's something unique about the double quoted sentence
05:35 ๐Ÿ”— dnova OH.
05:35 ๐Ÿ”— dnova the single quote wasn't escaped
05:36 ๐Ÿ”— arrith kinda
05:36 ๐Ÿ”— arrith just that there is a single quote
05:36 ๐Ÿ”— arrith usually when there's a single quote people use doubles
05:36 ๐Ÿ”— dnova hmph. well ok. thanks :)
05:36 ๐Ÿ”— arrith but yeah, you can escape it
05:37 ๐Ÿ”— arrith i dunno actually if people usually escape or not
05:37 ๐Ÿ”— arrith i've only seen doubles used then but i've only seen tutorialish code
05:56 ๐Ÿ”— bsmith093 spidering, i dont know how to tell wget to spider and save a linklist to then go back to
05:56 ๐Ÿ”— arrith not spider with wget
05:56 ๐Ÿ”— arrith spider with a ruby script that goes through the categories
05:56 ๐Ÿ”— bsmith093 also, on IA is it possible to edit an existing item?
05:56 ๐Ÿ”— arrith or python script
05:59 ๐Ÿ”— bsmith093 wheres the script, and how do i run it?
06:00 ๐Ÿ”— bsmith093 ive got something by underscor from a repo, that looks like ruby
06:01 ๐Ÿ”— arrith there isn't one
06:01 ๐Ÿ”— arrith you gotta make it
06:01 ๐Ÿ”— bsmith093 ugh
06:02 ๐Ÿ”— bsmith093 pardon me by yipdw git://gist.github.com/1432483.git
06:10 ๐Ÿ”— yipdw eh?
06:10 ๐Ÿ”— yipdw oh
06:10 ๐Ÿ”— bsmith093 yeah hows that going, any updates
06:10 ๐Ÿ”— yipdw yeah, I maintain that only hitting what you need to hit is the fastest way to do it
06:10 ๐Ÿ”— yipdw I haven't touched it since then
06:10 ๐Ÿ”— yipdw other work, etc.
06:11 ๐Ÿ”— yipdw I think arrith wanted to port it to Python
06:11 ๐Ÿ”— yipdw you can run it right now, if you have a Ruby 1.9 environment with the connection_pool, girl_friday and mechanize gems installed
06:12 ๐Ÿ”— bsmith093 ok , wonderful, now, how do i get those modules installed?
06:14 ๐Ÿ”— bsmith093 rubygems1.9.1 or 1.9
06:19 ๐Ÿ”— arrith haha
06:20 ๐Ÿ”— arrith yipdw: yeah i was basically waiting to see what underscor ends up with and go from there
06:20 ๐Ÿ”— bsmith093 seriously how do i get those ruby modules installed?
06:20 ๐Ÿ”— arrith possibly switching to a spidering method to get updates
06:20 ๐Ÿ”— arrith bsmith093: http://www.google.com/search?q=rubygems+ubuntu
06:31 ๐Ÿ”— dnova good god
06:31 ๐Ÿ”— dnova bsmith: how many stories are on ffnet?
06:31 ๐Ÿ”— dnova do we know?
06:33 ๐Ÿ”— dnova less than or equal to 10,000,000 it looks like?
06:35 ๐Ÿ”— dnova or: what is the highest valid ID you've found?
06:36 ๐Ÿ”— bsmith093 ~7million
06:36 ๐Ÿ”— bsmith093 can some kind person walk me through how to install the girl_friday gem, ive found the darn thing but it wont install with gem install
06:37 ๐Ÿ”— bsmith093 https://github.com/mperham/girl_friday.git
06:41 ๐Ÿ”— bsmith093 anyone?
06:43 ๐Ÿ”— dnova I have no ruby experience, sorry.
06:44 ๐Ÿ”— bsmith093 arrith
06:46 ๐Ÿ”— dnova bsmith,
06:46 ๐Ÿ”— dnova I think you need to relax just a little bit
06:48 ๐Ÿ”— dnova I added the project to the wiki frontpage
06:49 ๐Ÿ”— bsmith093 yeah , i know, im overtired and really need to sleep
06:49 ๐Ÿ”— chronomex dnova: spot on.
06:50 ๐Ÿ”— dnova ooh, thanks, chronomex
06:50 ๐Ÿ”— dnova any ideas/critiques are welcome
06:51 ๐Ÿ”— chronomex I meant with respect to relaxing, but the link looks good :)
06:51 ๐Ÿ”— dnova oh, lol
06:52 ๐Ÿ”— chronomex dang, it's been two months since I've uploaded anything
06:52 ๐Ÿ”— chronomex get busy time
06:53 ๐Ÿ”— dnova are you running the fix-dld script or what
06:53 ๐Ÿ”— dnova where are you getting all those splinder profiles!!
06:53 ๐Ÿ”— chronomex me?
06:53 ๐Ÿ”— chronomex I'm fix-dld
06:53 ๐Ÿ”— dnova ahh figured :D
06:53 ๐Ÿ”— chronomex was offline for a while.
06:53 ๐Ÿ”— dnova I'm downloading 2 users. have been for like 4 days
06:53 ๐Ÿ”— dnova one is over 12gb
06:53 ๐Ÿ”— dnova one is over 3
06:54 ๐Ÿ”— dnova I lost one that was over 10gb because I ran out of ram+swap :(
06:54 ๐Ÿ”— chronomex using tmpfs?
06:54 ๐Ÿ”— chronomex tmpfs is only a good idea for when you're doing a bunch of threads simultaneously
06:54 ๐Ÿ”— dnova not the way its supposed to be (i.e., not a ramdisk)
06:55 ๐Ÿ”— chronomex ?
06:55 ๐Ÿ”— chronomex no, the upload I'm doing now is to archive.org and not an archiveteam thing.
06:56 ๐Ÿ”— chronomex http://www.archive.org/details/bellsystem_PK-1C901-01
06:56 ๐Ÿ”— bsmith093 well, gnight/gmorning ,all, im gonna go sleep like i should have done 2hrs ago bye
06:56 ๐Ÿ”— dnova bsmith093: sleep well.
06:56 ๐Ÿ”— chronomex sleep well!
06:56 ๐Ÿ”— chronomex arrrgh
06:56 ๐Ÿ”— dnova :D
06:57 ๐Ÿ”— bsmith093 chronomex: ook now what?
06:57 ๐Ÿ”— chronomex bsmith093: ?
06:57 ๐Ÿ”— bsmith093 you said aargh
06:57 ๐Ÿ”— chronomex nvm
06:58 ๐Ÿ”— bsmith093 k night bye
06:59 ๐Ÿ”— dnova heh.
07:07 ๐Ÿ”— yipdw bsmith093: easiest way to install it is to get a Ruby environment, get Bundler (gem install bundler), and then install all the gems in the bundle (bundle install)
07:31 ๐Ÿ”— SketchCow Ops, please
16:42 ๐Ÿ”— DFJustin http://rbelmont.mameworld.info/?p=689
17:37 ๐Ÿ”— emijrp SketchCow: http://fromthepage.balboaparkonline.org/display/display_page?ol=w_rw_p_pl&page_id=1363#page/n0/mode/1up
19:03 ๐Ÿ”— SketchCow Nice
19:09 ๐Ÿ”— PepsiMax Aww yeah
19:09 ๐Ÿ”— PepsiMax Got my new VDSL2 hooked up.
19:09 ๐Ÿ”— PepsiMax 263.90kB/s uploading to alard
19:09 ๐Ÿ”— PepsiMax alard: more anyhub is coming!
20:57 ๐Ÿ”— bsmith093 and i just installed the gem connection_pool
20:57 ๐Ÿ”— bsmith093 ok i got rubygems to install finally, and they're all set up, except im still getting this error ffgrab.rb:1:in `require': no such file to load -- connection_pool (LoadError)
21:01 ๐Ÿ”— yipdw bsmith093: ruby -v
21:02 ๐Ÿ”— yipdw actually, just send me your full terminal log
21:02 ๐Ÿ”— bsmith093 ruby 1.8.7 (2010-01-10 patchlevel 249) [i486-linux]
21:02 ๐Ÿ”— yipdw connection_pool does not work with Ruby 1.8.7, because it uses BasicObject, which only exists in Ruby 1.9
21:02 ๐Ÿ”— yipdw also, Ruby 1.9 automatically loads Rubygems; 1.8.7 doesn't
21:02 ๐Ÿ”— bsmith093 apt install ruby1.9
21:02 ๐Ÿ”— yipdw which is where the error you're seeing comes from
21:02 ๐Ÿ”— bsmith093 cause i think i did that
21:03 ๐Ÿ”— bsmith093 ruby1.9 is already the newest version.
21:03 ๐Ÿ”— yipdw ruby1.9 -v
21:03 ๐Ÿ”— bsmith093 ruby 1.9.0 (2008-10-04 revision 19669) [i486-linux]
21:04 ๐Ÿ”— yipdw ugh
21:04 ๐Ÿ”— yipdw that's...way behind
21:04 ๐Ÿ”— bsmith093 ah, another repo?
21:04 ๐Ÿ”— yipdw Ruby (and projects like it) move too fast for Debian/Ubuntu to keep up, IMO
21:04 ๐Ÿ”— bsmith093 oh wait yeah i just noticed the 2008 thing, wow, thats old
21:04 ๐Ÿ”— yipdw unless I can control the Ruby packages (e.g. for production environments) I use https://rvm.beginrescueend.com/
21:05 ๐Ÿ”— yipdw it bypasses package management, but for me, the benefit outweighs that cost
21:07 ๐Ÿ”— bsmith093 got rvm now, grabbing ruby 1.9.3
21:07 ๐Ÿ”— bsmith093 should i dump the ubuntu repo ruby?
21:07 ๐Ÿ”— yipdw only if you want to, it's not necessary
21:07 ๐Ÿ”— yipdw to dump it
21:08 ๐Ÿ”— bsmith093 k then, will this install it like a normal package?
21:08 ๐Ÿ”— yipdw RVM does not use apt, so no
21:08 ๐Ÿ”— ersi lol @ a language moving so fast you can't package it
21:09 ๐Ÿ”— yipdw it will, however, modify your environment's PATH to work out
21:09 ๐Ÿ”— yipdw ersi: it's not that uncommon
21:09 ๐Ÿ”— bsmith093 yeah ive never heard of that
21:09 ๐Ÿ”— ersi sounds more like a dialect, that forks all the time
21:09 ๐Ÿ”— yipdw I actually more often construct development environments directly from upstream than I do via OS packages
21:09 ๐Ÿ”— bsmith093 although i must say, this is the smoothest complex thing ive ever done
21:10 ๐Ÿ”— bsmith093 how do i keep it updated?
21:10 ๐Ÿ”— yipdw ersi: in particular, I've found that following upstream directly pays off for Node.js, factor, and GHC
21:10 ๐Ÿ”— yipdw bsmith093: rvm install [Ruby version]
21:11 ๐Ÿ”— bsmith093 so i have to know the version i have, or the version i want to get?
21:11 ๐Ÿ”— yipdw ersi: also, the syntax and semantics of Ruby don't change that often (although ruby-core has been doing some WTFs in that regard lately)
21:11 ๐Ÿ”— yipdw ersi: the libraries, on the other hand
21:11 ๐Ÿ”— yipdw bsmith093: yes; rvm list will show you those
21:11 ๐Ÿ”— bsmith093 oh, wow this is cool, ive also never had this much feedback from a compiler that i could actually follow
21:12 ๐Ÿ”— ersi I'm having a hard time understanding how a 10+year language can move so fast it's bleeding edge all the time
21:12 ๐Ÿ”— yipdw the language itself does not
21:12 ๐Ÿ”— yipdw implementations and libraries do
21:14 ๐Ÿ”— bsmith093 you know what would be nice? a dummy package for every linux distro, that does [language]-all, and grabs everything in the repos for that language
21:14 ๐Ÿ”— yipdw that would be infeasibly huge
21:14 ๐Ÿ”— bsmith093 how big could that possibly be?
21:14 ๐Ÿ”— yipdw for Ruby alone there's 31,503 libraries
21:15 ๐Ÿ”— yipdw Java would be an order of magnitude larger
21:15 ๐Ÿ”— bsmith093 mother of Turing, that's a lot of development
21:15 ๐Ÿ”— bsmith093 and to be fair, java mostly takes care of itself as it needs to
21:16 ๐Ÿ”— bsmith093 keep jvm updated and afaik thats all u need to worry about
21:16 ๐Ÿ”— yipdw Hackage lists, uh
21:17 ๐Ÿ”— yipdw something around 3633 packages for Haskell
21:17 ๐Ÿ”— bsmith093 ok, ok, so languages are much bigger that I thought, in their entirety
21:18 ๐Ÿ”— yipdw yeah -- I find that a language is really nothing without its libraries
21:18 ๐Ÿ”— yipdw I mean, sure, you can install an implementation of a language
21:18 ๐Ÿ”— yipdw but it's really pretty useless on its own
21:18 ๐Ÿ”— bsmith093 hey another thing, does a sudo operation keep root until its done, or is there a timer somewhere?
21:19 ๐Ÿ”— bsmith093 because ive had things crap out asking for rights halfway through
21:19 ๐Ÿ”— bsmith093 rubys's done
21:20 ๐Ÿ”— bsmith093 annnnd.. same error as last time only this time ruby -v ruby 1.8.7 (2010-01-10 patchlevel 249) [i486-linux]
21:20 ๐Ÿ”— yipdw rvm use 1.9.3
21:21 ๐Ÿ”— bsmith093 /home/ben/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require': cannot load such file -- connection_pool (LoadError)
21:21 ๐Ÿ”— bsmith093 from /home/ben/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
21:21 ๐Ÿ”— bsmith093 from ffgrab.rb:1:in `<main>'
21:21 ๐Ÿ”— yipdw paste me the full terminal output
21:22 ๐Ÿ”— bsmith093 /home/ben/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require': cannot load such file -- connection_pool (LoadError)
21:22 ๐Ÿ”— bsmith093 ben@ben-laptop:~/1432483$
21:22 ๐Ÿ”— bsmith093 from /home/ben/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
21:22 ๐Ÿ”— bsmith093 from ffgrab.rb:1:in `<main>'
21:22 ๐Ÿ”— bsmith093 ruby ffgrab.rb
21:22 ๐Ÿ”— bsmith093 thats what i get
21:22 ๐Ÿ”— yipdw gem install bundler; bundle install
21:23 ๐Ÿ”— yipdw the Gemfile in the gist repo is a dependency manifest
21:23 ๐Ÿ”— yipdw for Bundler
21:24 ๐Ÿ”— bsmith093 i thought that was important, i kept trying ruby Gemfile on the offchance something would happen, this is not an intuitive lang to install
21:25 ๐Ÿ”— bsmith093 Fetching source index for http://rubygems.org/
21:25 ๐Ÿ”— bsmith093 now that seems like i would need that for gems, because thats where i found connection_pool and girl_friday
21:25 ๐Ÿ”— yipdw Rubygems is a packaging mechanism; bundler's a tool for managing packages
21:25 ๐Ÿ”— yipdw they're related, but Rubygems is independent of Bundler
21:26 ๐Ÿ”— bsmith093 well its finding the deps, and installing them, so whoo.
21:26 ๐Ÿ”— bsmith093 holy crap its running
21:27 ๐Ÿ”— yipdw I'd like to again point out that it doesn't do anything to record its results
21:27 ๐Ÿ”— bsmith093 and apparently, its timed itself to 6 decimal places?
21:27 ๐Ÿ”— yipdw times what
21:27 ๐Ÿ”— bsmith093 timestamp goes out to seconds.######
21:28 ๐Ÿ”— yipdw that's the default behavior of the Ruby logger library
21:28 ๐Ÿ”— yipdw but, yeah, there's no point in running that as-is for a long time
21:28 ๐Ÿ”— bsmith093 man, thats precise
21:28 ๐Ÿ”— yipdw because it doesn't yet actually do anything aside from spit results to the console
21:29 ๐Ÿ”— yipdw I'm not even sure if it handles pages correctly -- I *think* it does, but I haven't run it long enough to see how they get processed in the queue
21:30 ๐Ÿ”— bsmith093 i just had a thought, do user profiles show the stories all on one page, regardless of how many there are, cause that might be a help.
21:30 ๐Ÿ”— yipdw possibly, but AFAIK there is no way to get a list of all users
21:31 ๐Ÿ”— bsmith093 other than doing my original idea, and yours is much faster and uses less resources over all, on both ends
21:32 ๐Ÿ”— yipdw I can tell you that my method results in a lot of duplicates
21:32 ๐Ÿ”— yipdw in particular, it doesn't yet account for the "last page" link in each story
21:32 ๐Ÿ”— yipdw that will have to be filtered out in the discovery logic
21:33 ๐Ÿ”— bsmith093 yeah i dont really have any thoughts for that
21:33 ๐Ÿ”— yipdw it's just more HTML scraping
21:33 ๐Ÿ”— bsmith093 although the chapter is just a number appended to the link
21:33 ๐Ÿ”— yipdw not hard, just needs to be done
21:33 ๐Ÿ”— bsmith093 the next and back buttons are javascript, i think
21:34 ๐Ÿ”— yipdw https://gist.github.com/705cd333e06178057dec
21:34 ๐Ÿ”— yipdw that's a list of 4,506 story links recovered by ffgrab
21:34 ๐Ÿ”— yipdw well
21:34 ๐Ÿ”— yipdw 4506 / 2 roughly
21:34 ๐Ÿ”— bsmith093 wait the number before the title, thats the last chapter?
21:34 ๐Ÿ”— yipdw that's a chapter indicator
21:35 ๐Ÿ”— bsmith093 so let ffgrab run till its done then grep for dupes and keep the highest number
21:35 ๐Ÿ”— yipdw I'd rather fix it in the grabber
21:35 ๐Ÿ”— yipdw to ignore it, you'll have to change what stories_and_categories_of does at lines 12-13
21:36 ๐Ÿ”— yipdw I'm not sure what the change is, as I haven't looked at ff.net's page structure close enough to make the discernment
21:36 ๐Ÿ”— bsmith093 its still faster than iterating through 10mil semi-fake links
21:37 ๐Ÿ”— yipdw I am also suspicious of results like this:
21:37 ๐Ÿ”— yipdw I, [2011-12-07T15:33:31.381205 #75544] INFO -- : Found 0 categories, 0 stories from /book/My_Sweet_Audrina/
21:37 ๐Ÿ”— yipdw in that case, there really are no entries that show up
21:37 ๐Ÿ”— yipdw but any 0/0 results make me suspicious that the script is missing something
21:38 ๐Ÿ”— bsmith093 i was right, it is js here <input value="&nbsp;&lt; Prev&nbsp;" onclick="self.location='/s/7066342/6/The_Same_Will_Never_Happen_to_You'" type="BUTTON"> <select title="chapter navigation" name="chapter" onchange="self.location = '/s/7066342/'+ this.options[this.selectedIndex].value + '/The_Same_Will_Never_Happen_to_You';"><option value="1">1. Such a Shame</option><option value="2">2. Don't Do This</option><option value="3">3. The
21:39 ๐Ÿ”— yipdw that's not the link I was talking about
21:39 ๐Ÿ”— yipdw look at e.g. http://www.fanfiction.net/comic/300
21:39 ๐Ÿ”— yipdw see the » link?
21:39 ๐Ÿ”— yipdw that's the link to the last completed chapter
21:39 ๐Ÿ”— yipdw which is the link that the discovery code is picking up (and shouldn't pick up)
21:40 ๐Ÿ”— yipdw there's a few ways to fix that
21:40 ๐Ÿ”— bsmith093 hey I never noticed that before
21:41 ๐Ÿ”— yipdw anyway, I need to try to finish up some webapp work at work
21:41 ๐Ÿ”— yipdw which is a shitload of fuck related to the DOM and event propagation
21:41 ๐Ÿ”— yipdw as James Rolfe might put it
21:41 ๐Ÿ”— yipdw brb
21:41 ๐Ÿ”— bsmith093 wait so grab fanfiction.net/storyid/1 and that last chapter link, and generate all the rest of the links between them
21:41 ๐Ÿ”— bsmith093 yeah work comes first
21:42 ๐Ÿ”— bsmith093 lol nice reference
21:42 ๐Ÿ”— yipdw that's one possibility; another possibility is to just have wget-warc follow the links
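The enumeration idea floated above (take the story id, read the last chapter number from the last-chapter link, and generate every chapter URL in between) might look like this in shell. The helper name is invented; the `/s/<id>/<n>/<slug>` URL pattern follows the onclick handlers pasted earlier in the log.

```shell
# Hypothetical helper: print every chapter URL for a story, given its id,
# last chapter number, and title slug.
story_chapter_urls() {
  id="$1"; last="$2"; slug="$3"
  n=1
  while [ "$n" -le "$last" ]; do
    echo "http://www.fanfiction.net/s/$id/$n/$slug"
    n=$((n + 1))
  done
}

# Prints three URLs, chapters /1/ through /3/:
story_chapter_urls 7066342 3 The_Same_Will_Never_Happen_to_You
```

Feeding a list like this to wget-warc avoids both the duplicate last-chapter links and the 10-million-id brute force.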
21:42 ๐Ÿ”— bsmith093 thats what i tried, it only grabbed 300k files
21:43 ๐Ÿ”— bsmith093 wget-warc -mcpke robots off with ua for firefox
21:43 ๐Ÿ”— bsmith093 speaking of which, I'm still grabbing poe-news
21:59 ๐Ÿ”— emijrp IA is going to create a collection for Occupy movement http://blog.archive.org/2011/12/07/archive-it-team-encourages-your-contributions-to-the-%E2%80%9Coccupy-movement%E2%80%9D-collection/
21:59 ๐Ÿ”— emijrp but I think that there is no collection for Spanish Revolution or Arab Spring
22:06 ๐Ÿ”— emijrp I have many links to share if IA creates an Archive-It collection. I offered my help some weeks ago.
22:07 ๐Ÿ”— emijrp (I mean about Spanish Rev.)
22:20 ๐Ÿ”— DFJustin you can upload the stuff now and the collection can be made later
22:23 ๐Ÿ”— emijrp I prefer to use the Archive-It system. I don't want to upload a tarball with websites that can be viewed online.
22:24 ๐Ÿ”— emijrp Or 200 gb of videos (i have 6000+) because i cannot upload that with my home connection.
22:24 ๐Ÿ”— yipdw bsmith093: ok, so, I've got a variant of ffgrab recording story IDs in a Redis instance
22:25 ๐Ÿ”— emijrp I'm tired of content being uploaded to IA as huge boxes that cant be viewed easily.
22:27 ๐Ÿ”— bsmith093 emijrp: meaning what, exactly
22:27 ๐Ÿ”— bsmith093 huge iso files?
22:28 ๐Ÿ”— emijrp scrapes of forums, blogs hostings, geocities, wikis, yahoo videos
22:28 ๐Ÿ”— bsmith093 whats wrong with that?
22:29 ๐Ÿ”— DFJustin it's not great, but it's better to get the stuff backed up in some form first
22:29 ๐Ÿ”— emijrp that you cant use them easily
22:29 ๐Ÿ”— bsmith093 IA, afaik, isn't really meant as a mirror, its an archive, of the raw data, meant for historical research purposes
22:30 ๐Ÿ”— bsmith093 im sure there's a script for that somewhere
22:30 ๐Ÿ”— bsmith093 besides that, complain to them, not archiveteam.
22:31 ๐Ÿ”— emijrp IA has always offered content in a viewable way (wayback, videos and audio with metadata)
22:31 ๐Ÿ”— bsmith093 theres an entire section of ia dedicated to geocities
22:31 ๐Ÿ”— emijrp archiveteam is uploading dozen-GB tarballs with packed content
22:32 ๐Ÿ”— DFJustin it's just a manpower thing, right now it's all jason can do just to keep up with the tarballs coming in
22:33 ๐Ÿ”— bsmith093 i would imagine so, you cant grep through a tarball that i know of, and im sure he's doing the best he can, speaking of which, he's not the only person there doing what he's doing , is he?
22:33 ๐Ÿ”— bsmith093 SketchCow: voworkers?
22:33 ๐Ÿ”— DFJustin priority #1 has to be getting things off random people's hard drives and into IA's backup infrastructure so it doesn't just go poof
22:34 ๐Ÿ”— bsmith093 true and thats better than nothing by a long shot
22:34 ๐Ÿ”— bsmith093 but emijrp has a valid point, there needs to be a way to search through all this crapload of otherwise nearly-useless data
22:34 ๐Ÿ”— DFJustin for sure
22:35 ๐Ÿ”— dnova yeah. you download it, untar it, and look through it
22:35 ๐Ÿ”— bsmith093 seriously, is it possible to search through a remote tarball, cause that would be awesome
22:35 ๐Ÿ”— dnova most of these collections aren't meant for casual browsing, afaik.
22:35 ๐Ÿ”— yipdw bsmith093: curl http://[host][path][file].tar.gz | gunzip -c
22:35 ๐Ÿ”— bsmith093 and some, like the utzoo tapes are slightly damaged and should be repaired
22:36 ๐Ÿ”— bsmith093 yipdw: thats a remote search, as in it doesnt download all of it first
22:36 ๐Ÿ”— yipdw that doesn't download all of it
22:36 ๐Ÿ”— emijrp YES, I'm going to download the 600GB Geocities pack to watch a site. The good approach is geocities.ws or the mirrors people created.
22:36 ๐Ÿ”— yipdw it only goes until you terminate the site
22:36 ๐Ÿ”— yipdw er, connection
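The curl-pipe trick just described can be combined with tar to list or grep member names without ever storing the tarball. It's demonstrated here against a locally built archive, with the remote form shown as a comment; the URL in the comment is a placeholder.

```shell
# Remote form: curl -s http://host/path/file.tar.gz | tar tzf - | grep pattern
# tar reads the gzipped stream from stdin and prints member names as they
# arrive, so the transfer can be stopped as soon as you've seen enough.
mkdir -p /tmp/tar_demo/site
echo "<html>hello</html>" > /tmp/tar_demo/site/index.html
tar czf /tmp/tar_demo.tar.gz -C /tmp/tar_demo site
cat /tmp/tar_demo.tar.gz | tar tzf - | grep index.html   # prints site/index.html
```

Extracting a single member works the same way with `tar xzf - site/index.html`; anything fancier than name matching needs a real index, as noted below.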
22:36 ๐Ÿ”— Ymgve http://news.slashdot.org/story/11/12/07/2034200/library-of-congress-to-receive-entire-twitter-archive
22:36 ๐Ÿ”— Ymgve cool
22:37 ๐Ÿ”— dnova emijrp: nobody is stopping YOU from making mirrors
22:37 ๐Ÿ”— yipdw if you want something more sophisticated than that, you need to build an index
22:37 ๐Ÿ”— dnova not everyone can afford to host these things.
22:39 ๐Ÿ”— yipdw hmm, my ff link grabber is still a bit retarded
22:39 ๐Ÿ”— yipdw I, [2011-12-07T16:39:37.370755 #78706] INFO -- : Found 0 categories, 0 stories from /r/6551377/
22:39 ๐Ÿ”— yipdw should ignore these:
22:39 ๐Ÿ”— yipdw I, [2011-12-07T16:39:39.987928 #78706] INFO -- : Found 0 categories, 0 stories from /u/1148547/Hobbit4Lyfe
22:39 ๐Ÿ”— yipdw oh well
22:39 ๐Ÿ”— DFJustin videos can be uploaded to archive.org right now to the community videos collection, and then once there's a bunch of them it should be easy to poke someone and get them to create a collection
22:40 ๐Ÿ”— DFJustin if you don't have bandwidth then recruit some buddies
22:40 ๐Ÿ”— yipdw heh
22:41 ๐Ÿ”— yipdw 1.9.2-p290 :015 > b = Redis.new.smembers('stories').map(&:to_i).sort; [b.length, b.min, b.max]
22:41 ๐Ÿ”— yipdw => [89974, 158, 7617073]
22:41 ๐Ÿ”— yipdw that's a pretty sparsely inhabited space
22:42 ๐Ÿ”— yipdw granted, that doesn't include any of the crossovers etc
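To put those Redis numbers in perspective: 89,974 discovered IDs against a maximum observed ID of 7,617,073 is around 1% occupancy for the slice crawled so far, which is what "sparsely inhabited" means here. A quick integer check, using only the figures from the log:

```shell
# Density of discovered story IDs over the observed ID range (log figures).
found=89974
max_id=7617073
permille=$(( found * 1000 / max_id ))   # integer per-mille, avoids floats
echo "$permille"   # prints 11, i.e. roughly 1.1% of the id space
```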
22:57 ๐Ÿ”— bsmith093 wait 80% full is sparsely inhabited
22:59 ๐Ÿ”— bsmith093 afaik, every genre page has its own crossover page, and would it kill somebody to back this script up by sorting the good/bad story ids?
23:00 ๐Ÿ”— bsmith093 because the lowest story id is 4, if im reading that right, yours says 158
