#archiveteam 2012-01-08,Sun


Time Nickname Message
00:48 🔗 yipdw proust position post: pull progressing; projection pending
01:18 🔗 yipdw and I just learned that not all x86_64 CPUs from Intel do 40-bit physical addressing
01:18 🔗 yipdw that's so weird
01:23 🔗 Wyatt|NOC Just a quick note before I forget; seems Tim Follin has floppies with his old source in "Einstein" format (Tatung Einstein?). Do we have the capability to read them?
01:27 🔗 Wyatt|NOC Speaking of those (this is the first I've heard of them), this page is pretty amazing http://www.tatungeinstein.co.uk/front/bandhcomputers.htm
01:27 🔗 Wyatt|NOC In various meanings.
01:44 🔗 yipdw SketchCow: do you think it makes sense to archive private stories on Proust?
01:45 🔗 yipdw we don't get much out of it
01:45 🔗 yipdw and from what I'm seeing, they are (1) the majority of stories; (2) easily identified
01:47 🔗 yipdw or, to put it another way, 100 * (12 / 174) = 6.90% of downloaded stories are public
03:09 🔗 Wyatt|NOC But that is a snazzy hat, SketchCow. How could people _not_ want to be interviewed by that?
03:19 🔗 chronomex for serious
03:38 🔗 nitro2k01 yipdw: Which don't? It would make sense if the early P4s or some budget chips don't for example
03:43 🔗 yipdw nitro2k01: the Xeon E5430 does 38-bit physical addressing, or at least the one reported on my EC2 instance is
03:43 🔗 yipdw a friend's Core i5-2500K reports 36-bit
03:43 🔗 yipdw my E5520 reports 40-bit
03:43 🔗 yipdw I think Intel just cripples their chips for some market segmentation dealie
03:46 🔗 Wyatt|Wor You know, I never thought about all that. How does that affect performance, exactly?
03:47 🔗 yipdw it doesn't really
03:47 🔗 yipdw it mostly affects how much RAM you can access
03:48 🔗 yipdw because all those chips do 48-bit virtual addressing, and in 64-bit mode the software is juggling 64-bit pointers anyway
03:54 🔗 Wyatt|Wor So if you put more than 8GB memory in a machine with 36-bit physical addressing, it will...?
03:55 🔗 yipdw the memory above 8 GiB won't be addressable without bank-switching tomfoolery
03:56 🔗 Wyatt|Wor So it'll degrade performance a bit.
03:57 🔗 yipdw actually, I don't know if x86_64 can even address > 8 GiB in that situation
03:58 🔗 Wyatt|Wor I think it should be able to. PAE has been available on commodity parts for years.
03:58 🔗 yipdw IIRC, PAE is only applicable to 32-bit processors
03:59 🔗 yipdw or more specifically 32-bit modes
04:25 🔗 Wyatt|Wor Okay, my bad. I forgot that memory is byte-addressable. 36-bit memory address width is 64GB
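
The arithmetic behind that correction can be checked with a small sketch. It assumes byte-addressable memory, so an n-bit physical address reaches 2^n bytes; the helper name `addressable_gib` is made up for illustration. The widths are the ones reported above for the i5-2500K, the E5430 and the E5520.

```ruby
# Size of a byte-addressable physical address space, in GiB,
# for a given physical address width in bits.
GIB = 2**30

def addressable_gib(bits)
  (2**bits) / GIB
end

# 36-bit, 38-bit and 40-bit are the widths mentioned in the conversation.
[36, 38, 40].each do |bits|
  puts "#{bits}-bit physical addressing: #{addressable_gib(bits)} GiB"
end
```

Under that assumption a 36-bit part tops out at 64 GiB, not 8 GiB, which matches the 64GB figure in the correction.
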
04:57 🔗 bsmith093 is torrent.textfiles.com still available?
05:00 🔗 Coderjoe by your powers combined, i am...
05:00 🔗 Coderjoe CAPTAIN ARCHIVE!
05:00 🔗 Wyatt|Wor Wilford Brimley!
05:04 🔗 bsmith093 i had to google for that ref, good one
05:07 🔗 PatC Coderjoe, you the Coderjoe from tgg on freenode?
05:27 🔗 Coderjoe ...
05:45 🔗 underscor alard: Are we gonna try and get the new anyhub prefixes?
06:08 🔗 SketchCow yipdw: is archiving a private story hard?
06:08 🔗 SketchCow If you can get to it from the net, it's not private.
06:19 🔗 chronomex tumblr is the geocities of 2010: http://gutsygumshoe.tumblr.com/
07:05 🔗 yipdw SketchCow: I haven't looked into whether or not having an account gives you further access to said private stories
07:05 🔗 yipdw I'll check that
07:06 🔗 yipdw hmm, nope, no further access
07:09 🔗 no2pencil private stories? Sounds provocative
07:09 🔗 yipdw there is GET http://www.proust.com/ac/story/export/generate
07:09 🔗 yipdw that relies on session data
07:10 🔗 yipdw and for Proust, said session data is server-side
07:15 🔗 yipdw well, maybe
07:15 🔗 bsmith093 so what IS the status of the klol script?
07:15 🔗 yipdw underscor: what email address did you use?
07:15 🔗 yipdw underscor: to register with Proust
07:15 🔗 yipdw I wonder if I can trick it to download other users' data
07:15 🔗 yipdw "it" being the PDF exporter
07:16 🔗 bsmith093 wouldn't that mean they had really horrible security?
07:16 🔗 yipdw it's not uncommon
07:16 🔗 balrog hi SketchCow
07:16 🔗 balrog might want to be careful with SoftDisk for Apple II, those are still legitimately distributed.
07:16 🔗 yipdw also, it's the only way I see to actually get the stories that are shared only with family and friends
07:16 🔗 yipdw I mean, I *could* also just friend everyone on Proust and hope they reciprocate
07:16 🔗 yipdw but for now I'm just getting ones marked as public
07:18 🔗 bsmith093 my upload from yesterday is done, it's in bsmith on batcave as a 7z called ffnet_dump_and_script.7z, just fyi if someone wants to continue where i left off: 112025 stories grabbed out of 3.6 million, in the folder books
07:18 🔗 bsmith093 bye now, good luck with proust
07:18 🔗 yipdw just as an FYI, none of us have read access to that
07:19 🔗 Wyatt|Wor bsmith093: Is the script up on github?
07:19 🔗 bsmith093 ah well ok then ummm it should be, hold on
07:20 🔗 bsmith093 http://code.google.com/p/fanficdownloader-fork/downloads/detail?name=fanficdownloader-fork0.0.1.7z
07:20 🔗 bsmith093 no, not github, but heres the link to a repo i set up
07:20 🔗 yipdw 7.2 megabytes?
07:20 🔗 bsmith093 that has everything except the stories
07:20 🔗 bsmith093 damn you're quick
07:21 🔗 bsmith093 run automate.sh link to grab all the stories in sequence
07:22 🔗 bsmith093 automate runs download.py using every line of link in order, it will take several months to complete and there will be new stories by then anyway but this is a complete list as of several weeks ago
07:22 🔗 bsmith093 i recommend using a vps or something you dont have to leave on yourself
07:23 🔗 yipdw there's no way it has to take several months
07:23 🔗 bsmith093 then you fix the code then, i just ran it for a week straight, and got only 112k stories
07:23 🔗 yipdw I did fix it :P
07:24 🔗 bsmith093 you and your ruby voodoo, this is why i like bash, it Just Works (TM)
07:24 🔗 yipdw bash actually has some serious portability problems
07:24 🔗 yipdw we've hit them quite often here
07:24 🔗 bsmith093 anyway storis is the raw id list and link is the id list wrapped up into url form
07:25 🔗 yipdw for example, du-helper.sh in splinder-grab exists solely to paper over differences between GNU and BSD du
07:25 🔗 yipdw and it's not really perfect
07:25 🔗 yipdw to be fair, that's not bash per se, but a dependency of a bash script
07:25 🔗 bsmith093 well ruby has some serious noob coder issues, and likes to spit back cryptic error messages to me
07:25 🔗 yipdw but even within bash-the-language there's real problems between versions
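
One way to sidestep the GNU/BSD `du` difference mentioned above is to not shell out at all. The following is only a sketch of that idea, not what du-helper.sh does: it walks a directory with Ruby's standard `find` library and sums apparent file sizes. The method name `tree_size_bytes` is hypothetical, and the result is apparent size in bytes, not allocated disk blocks, so it will not match `du` output exactly.

```ruby
require "find"

# Sum the apparent size, in bytes, of every regular file under `root`.
# Symlinks are not followed (lstat), and directories themselves are not counted.
def tree_size_bytes(root)
  total = 0
  Find.find(root) do |path|
    stat = File.lstat(path)
    total += stat.size if stat.file?
  end
  total
end

puts tree_size_bytes(".") if $PROGRAM_NAME == __FILE__
```

The trade-off is that the script then depends on a Ruby interpreter instead of on one particular flavour of `du`.
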
07:26 🔗 bsmith093 ruby story_grab.rb 8
07:26 🔗 bsmith093 story_grab.rb:1:in `require': no such file to load -- mechanize (LoadError)
07:26 🔗 bsmith093 from story_grab.rb:1
07:26 🔗 bsmith093 for example, i thought i fixed this last night?!?!
07:26 🔗 yipdw that means that a file called "mechanize" can't be loaded
07:27 🔗 yipdw make sure you're using the right Ruby installation
07:27 🔗 bsmith093 rvm use 1.9.3
07:27 🔗 bsmith093 using 1.9.3p0
07:28 🔗 bsmith093 now what?
07:28 🔗 yipdw ensure the mechanize gem is present
07:29 🔗 bsmith093 gem install mechanize
07:29 🔗 yipdw gem list -i mechanize
07:29 🔗 bsmith093 true
07:29 🔗 yipdw then it's installed
07:29 🔗 yipdw run it again
07:29 🔗 yipdw the girl_friday and connection_pool gems are also used
07:29 🔗 bsmith093 stack trace
07:30 🔗 yipdw ok
07:30 🔗 yipdw what is it that you want to save from fanfiction.net?
07:30 🔗 yipdw http://archiveteam.org/index.php?title=FanFiction.Net doesn't state what
07:31 🔗 bsmith093 the stories, minimum, the reviews and author profiles would be really nice
07:31 🔗 yipdw ok
07:32 🔗 bsmith093 ruby story_grab.rb 8 maybe this is a stupid question, but i am running this right, right?
07:32 🔗 yipdw so stories, reviews, author profiles
07:32 🔗 yipdw yes, that's correct
07:33 🔗 yipdw although that script will not handle stories without chapters correctly; it needs to be modified for that
07:33 🔗 bsmith093 yes, that would be great
07:33 🔗 bsmith093 every story has atleast one chapter
07:33 🔗 yipdw that script will not handle stories without >= 2 chapters
07:34 🔗 bsmith093 ohhhh thats what u meant?!
07:34 🔗 bsmith093 ok that makes more sense, check the link i gave u, they solved that problem, the google group in fanficdownloader
07:34 🔗 yipdw I know what the problem is
07:34 🔗 bsmith093 really, what
07:35 🔗 yipdw see lines 24-28
07:35 🔗 yipdw there's an assumption that the chapter box is present
07:35 🔗 yipdw as I mentioned, that script is just a test
07:35 🔗 Jofo if anyone, I feel like this group would appreciate this link http://www.therestartpage.com/#
07:35 🔗 yipdw to demonstrate that it is possible to download a multi-chapter story in less than 2.5 seconds per chapter
07:35 🔗 yipdw for actual use it needs to be expanded
07:36 🔗 bsmith093 check downloader py and all the stuff it refs
07:36 🔗 bsmith093 they solved this somehow
07:36 🔗 yipdw I know how to solve it
07:36 🔗 bsmith093 ....and?
07:36 🔗 yipdw (1) download the first page; (2) if a chapter box is present, add chapters (2..n) to the queue
07:37 🔗 yipdw and I'm not working for you, so I haven't solved it?
07:37 🔗 bsmith093 ummm, ok so grep the page and see if the chapter box is there?
07:37 🔗 yipdw yes, and if it's there then initiate further downloads
07:37 🔗 yipdw if it isn't there, you're done
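
The two-step approach described here (fetch the first page, then queue chapters 2..n only if a chapter box exists) could be sketched as follows. This is not the code from story_grab.rb. The `<select name="chapter">` markup matched below is an assumption about what the chapter box looks like, and `remaining_chapter_urls` is a made-up name. Only the `/s/<id>/<chapter>/` URL shape comes from the log. Fetching is left out so the sketch runs without network access.

```ruby
# Hypothetical markers for the chapter box; the real page markup may differ.
CHAPTER_BOX  = /<select[^>]*name=["']?chapter["']?[^>]*>(.*?)<\/select>/mi
OPTION_VALUE = /<option[^>]*value=["']?(\d+)/i

# Given the HTML of chapter 1, return the URLs of chapters 2..n.
# Returns an empty list when no chapter box is present, i.e. the
# story has a single chapter and the first page is all there is.
def remaining_chapter_urls(story_id, first_page_html)
  box = first_page_html[CHAPTER_BOX, 1]
  return [] if box.nil?

  chapters = box.scan(OPTION_VALUE).flatten.map(&:to_i).uniq.sort
  chapters.reject { |n| n <= 1 }
          .map { |n| "http://www.fanfiction.net/s/#{story_id}/#{n}/" }
end

# Usage with two hand-written sample pages:
multi  = '<select name="chapter"><option value="1">1<option value="2">2' \
         '<option value="3">3</select>'
single = "<html><body>just the story text</body></html>"

p remaining_chapter_urls(8, multi)
p remaining_chapter_urls(8, single)
```

A real downloader would more likely query the parsed page with mechanize than run regular expressions over raw HTML. The regex version is used here only to keep the example free of gem dependencies.
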
07:39 🔗 yipdw I can expand the story finder and downloader, but I don't know when
07:39 🔗 bsmith093 sorry for being so rude, i thought this was a bigger issue than it turned out to be
07:39 🔗 yipdw it isn't
07:39 🔗 yipdw downloading fanfiction.net is really trivial
07:39 🔗 yipdw well, at least the reviews, stories, and user profiles
07:39 🔗 yipdw however I am working on other things
07:43 🔗 yipdw hmm
07:43 🔗 yipdw that said, if I say it's really trivial, I guess I better go do it, right
07:44 🔗 bsmith093 so, im looking through the mechanize docs, and this looks like some if's and an agent.search thing
07:45 🔗 bsmith093 http://mechanize.rubyforge.org/GUIDE_rdoc.html way at the bottom
07:51 🔗 yipdw yeah, that's pretty much it
07:54 🔗 bsmith093 if agents.search("chapter" i am horrible with syntax
08:38 🔗 yipdw bsmith093: https://gist.github.com/1577729 is a set of scripts that will grab stories, reviews, and profile for that story
08:38 🔗 yipdw https://s3.amazonaws.com/nw-depot/example_run.tar.gz is an example of two runs of get_one_story.rb
08:38 🔗 bsmith093 thanks, seriously.
08:39 🔗 yipdw one on story ID 8, and one on story 4089014, which I chose because it has 701 reviews and 60 chapters
08:39 🔗 yipdw I have not yet inspected the WARCs
08:39 🔗 yipdw but they should work
08:39 🔗 yipdw actually, they might be slightly broken -- I'm not sure if --page-requisites is doing what I think it's doing
08:39 🔗 yipdw time to fire up wayback
08:39 🔗 yipdw so, yeah
08:40 🔗 yipdw I don't think you need to grab one profile per story
08:40 🔗 yipdw it is probably better to queue all the URLs up and just fetch once per unique URL
08:40 🔗 yipdw but that depends on your approach
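
The "fetch once per unique URL" idea can be sketched as a queue that remembers everything it has ever been given. This is an illustration, not part of the gist: the class name `UniqueUrlQueue` and the profile URL in the usage lines are hypothetical.

```ruby
require "set"

# A fetch queue that hands out each URL at most once, so a profile
# shared by many stories is only downloaded a single time.
class UniqueUrlQueue
  def initialize
    @seen = Set.new
    @pending = []
  end

  # Returns true if the URL was newly queued, false if it was seen before.
  def push(url)
    return false unless @seen.add?(url)

    @pending << url
    true
  end

  # Next URL to fetch, or nil when the queue is empty.
  def shift
    @pending.shift
  end

  def size
    @pending.size
  end
end

# Usage: two stories by the same (made-up) author share one profile URL.
queue = UniqueUrlQueue.new
queue.push("http://www.fanfiction.net/s/8/1/")
queue.push("http://example.invalid/profile/1")
queue.push("http://www.fanfiction.net/s/4/1/")
queue.push("http://example.invalid/profile/1") # ignored, already queued
puts queue.size
```

Keeping the seen-set separate from the pending list means a URL is not re-queued after it has already been fetched and shifted off.
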
08:41 🔗 bsmith093 warcs can be fixed later, to be honest, i have no idea why session data is useful to anyone, even the archivers.
08:42 🔗 yipdw request/response headers tell you the circumstances under which a resource was retrieved, which is important for determining what state that resource is in
08:42 🔗 yipdw because Web resources can change their content depending on headers
08:43 🔗 bsmith093 they're that dynamic?
08:43 🔗 yipdw Web resources can change based on *anything*
08:43 🔗 bsmith093 i need to learn to type slower
08:43 🔗 bsmith093 oy "P
08:44 🔗 yipdw oh, fuck
08:44 🔗 yipdw yeah, I didn't fetch the images or CSS
08:44 🔗 yipdw that needs to be fixed
08:45 🔗 yipdw oh, damnit
08:45 🔗 yipdw the chapter selector doesn't work in the WARC
08:45 🔗 yipdw because it suffixes the name of the story
08:45 🔗 yipdw that's fairly annoying
08:46 🔗 yipdw bsmith093: if you want to see what I'm "oh, fuck"ing about: https://s3.amazonaws.com/nw-depot/wayback1.png
08:48 🔗 bsmith093 i would say thats fine the images dont change much ever
08:48 🔗 yipdw it's not fine, it's incomplete
08:48 🔗 bsmith093 grab once and link to them
08:48 🔗 yipdw just needs some wget tweaks though
08:48 🔗 Wyatt|Wor bsmith093: There's no emergency, so there's no reason not to do it right.
08:48 🔗 bsmith093 ok, then
08:48 🔗 yipdw also I want to find a way to get that chapter selector working
08:49 🔗 yipdw ALL THAT SAID
08:49 🔗 yipdw if all you want is the text, the text is there
08:50 🔗 yipdw hmm
08:50 🔗 yipdw I wonder how hard it'd be to set up our own Wayback Machine
08:50 🔗 yipdw with a WARC upload UI
08:50 🔗 yipdw that'd make checking archives pretty snazy
08:50 🔗 yipdw snazzy, too
08:51 🔗 * yipdw tries
08:51 🔗 Wyatt|Wor Haha, I was just thinking a warc viewer for my phone would be neat too.
08:51 🔗 yipdw wayback seems to already support that in some capacity, so maybe I just need to throw on some UI code
08:51 🔗 yipdw Wyatt|Wor: I wish there was a lightweight WARC viewer out there
08:51 🔗 Wyatt|Wor Actually, are there browser plugins for warc files or something?
08:51 🔗 yipdw I wish :P
08:51 🔗 Wyatt|Wor I didn't even think to look.
08:51 🔗 yipdw if you find one, let me know
08:52 🔗 Wyatt|Wor Ah...will do.
08:52 🔗 yipdw wayback is the only thing I've found that will render a WARC's content in a Web browser
08:52 🔗 bsmith093 WARNING: Installing to ~/.gem since /var/lib/gems/1.8 and /var/lib/gems/1.8/bin aren't both writable. WARNING: You don't have /home/ben/.gem/ruby/1.8/bin in your PATH, gem executables will not run.
08:52 🔗 yipdw and it's pretty heavy
08:52 🔗 bsmith093 thats the output of gem install mechanize
08:52 🔗 bsmith093 it worked but i figure huge warnings are notable
08:52 🔗 yipdw bsmith093: if you're using your system's Ruby installation, that'll happen
08:53 🔗 * Wyatt|Wor flinches at the mention of ruby gems.
08:53 🔗 bsmith093 happens every time i try to run make_story_urls
08:53 🔗 yipdw you either need to grant your user write permission to those directories (ick) or use a Ruby distribution that your user controls
08:53 🔗 bsmith093 ive got rvm in my home dir
08:53 🔗 yipdw rvm is good for setting up the latter
08:54 🔗 Wyatt|Wor bsmith093: ...or set your $PATH.
08:54 🔗 yipdw rvm isn't a Ruby distribution; it just manages distributions
08:54 🔗 yipdw yeah, or that
08:54 🔗 yipdw but mechanize's executables are not used by make_story_urls so
08:54 🔗 bsmith093 wheres $PATH, in the configs
08:54 🔗 Wyatt|Wor Or use your distro's package manager to install the gem.
08:54 🔗 yipdw PATH is an environment variable, but don't worry about it
08:54 🔗 bsmith093 require': no such file to load -- mechanize (LoadError)
08:54 🔗 bsmith093 from make_story_urls.rb:3
08:54 🔗 bsmith093 before and after
08:54 🔗 yipdw do this
08:54 🔗 yipdw add require "rubygems" to the top of all Ruby source files
08:55 🔗 yipdw I don't like to do that for various reasons but it will ensure Rubygems is loaded
08:55 🔗 yipdw (the main reason is that Ruby programs should not have any dependency on a specific package manager)
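
A middle ground between the two positions is to try a plain `require` first and only load RubyGems when that fails. The sketch below is one possible pattern, not what the gist does, and `require_with_gem_fallback` is a made-up helper name. On Ruby 1.9 and later RubyGems is normally loaded at startup, so the fallback mostly matters on 1.8 installs like the one in the stack traces here.

```ruby
# Try a plain require first; fall back to loading RubyGems only if needed.
# If the library still cannot be found, the LoadError propagates to the caller.
def require_with_gem_fallback(name)
  require name
rescue LoadError
  require "rubygems"
  require name
end

# Usage with a standard-library file, so the example runs anywhere:
require_with_gem_fallback("json")
puts JSON.generate("loaded" => true)
```

That way the program still works when the library comes from apt-get or another package manager, and RubyGems is only touched as a last resort.
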
08:56 🔗 bsmith093 /usr/lib/ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require': no such file to load -- /home/ben/1577729/url_generators (LoadError)
08:56 🔗 bsmith093 from /usr/lib/ruby/1.8/rubygems/custom_require.rb:31:in `require'
08:56 🔗 bsmith093 ran for a sec then that
08:56 🔗 yipdw you'll need that file from the gist, too
08:56 🔗 yipdw oh it's not in there
08:56 🔗 yipdw https://gist.github.com/1577729#file_url_generators.rb
08:56 🔗 Wyatt|Wor yipdw: That's the most salient argument I think I've heard against gem from a non-Debian/Gentoo developer.
08:57 🔗 yipdw Wyatt|Wor: heh
08:57 🔗 yipdw Wyatt|Wor: yeah, it's largely a theoretical argument but that is a good point
08:57 🔗 yipdw ruby programs shouldn't break just because you installed a library via Rubygems or apt-get or whatever
08:59 🔗 Wyatt|Wor yipdw: Oh I wouldn't say it's theoretical. If our experience in buying a couple "rails hosting" brands is any indication it's more like...a tsunami of ass-pain.
08:59 🔗 Wyatt|Wor See also: flameeyes adventures in gem packaging.
08:59 🔗 yipdw I feel really bad for people who have to package Ruby gems
09:00 🔗 yipdw gems move really, really freaking fast
09:00 🔗 bsmith093 ok actual code error this time /home/ben/.gem/ruby/1.8/gems/mechanize-2.1/lib/mechanize/http/agent.rb:303:in `fetch': 404 => Net::HTTPNotFound (Mechanize::ResponseCodeError) from /home/ben/.gem/ruby/1.8/gems/mechanize-2.1/lib/mechanize.rb:319:in `get' from make_story_urls.rb:16
09:00 🔗 Wyatt|Wor Kind of. Sometimes.
09:00 🔗 yipdw bsmith093: yeah, that script has no graceful error handling at all
09:00 🔗 yipdw but uh
09:00 🔗 yipdw are you sure you passed a valid story ID as the first argument
09:00 🔗 yipdw to get_one_story
09:01 🔗 bsmith093 oh, um i was running "ruby make_story_urls.rb" errr, whoops :D
09:02 🔗 yipdw Wyatt|Wor: actually, for most languages I work with -- python, ruby, occasionally haskell and node -- I've actually begun to not use the OS' package manager
09:02 🔗 yipdw and have been instead using easy_install, rubygems, cabal, npm
09:02 🔗 yipdw it's way more complex and makes my package manifest incomplete, but there's so many other people who just publish libraries in those languages in their specific package managers
09:03 🔗 yipdw which is quite a bit of inertia to overcome
09:03 🔗 yipdw the only language I can think of that I work in and use the distribution's packages is C/C++
09:03 🔗 yipdw and that's not entirely true for things like Qt :P
09:03 🔗 Wyatt|Wor Not familiar with the latter two, but python eggs have a lot of the same issues as gems, as far as I'm aware.
09:04 🔗 Wyatt|Wor At least CPAN gets it right~
09:04 🔗 yipdw I think I'll become rich and famous if I find a way to encapsulate a gem/egg/whatever as a deb or whatever
09:04 🔗 bsmith093 short version could not find gem custom_require locally or in a repository
09:04 🔗 bsmith093 YES YOU WILL, fantastically so
09:04 🔗 Wyatt|Wor Just formalise package metadata about the gem to the extent perl does and you can.
09:05 🔗 yipdw Wyatt|Wor: what does CPAN do? I've just used perl -MCPAN -e 'install ...'
09:05 🔗 yipdw is there a way to do it that doesn't involve doing that
09:05 🔗 yipdw or, more specifically, respects the OS' package management system
09:05 🔗 Wyatt|Wor yipdw: cpan itself is just software. It's all because they have a good packaging format that we can have things like g-cpan.
09:05 🔗 yipdw ahh
09:06 🔗 yipdw actually, that reminds me
09:06 🔗 yipdw the source code that drives rubygems.org is available
09:07 🔗 yipdw perhaps it is feasible to add a service endpoint to it that makes it behave as an apt repo
09:08 🔗 Wyatt|Wor BTW, here's the horse's mouth on the subject: http://blog.flameeyes.eu/2008/12/14/rubygems-cpan-and-other-languages
09:11 🔗 yipdw ahh
09:11 🔗 yipdw yeah, I agree with all of those points
09:11 🔗 yipdw there has been *some* success on the standardization front though
09:11 🔗 yipdw namely, running "rake" in an increasing number of projects runs the testsuite
09:11 🔗 yipdw regardless of test harness
09:12 🔗 yipdw but, yes, the file format of gems is scattersht
09:12 🔗 yipdw shot
09:17 🔗 Wyatt|Wor Ergh, yeah, If last week's tirade about mongo_mapper is any indication.
09:17 🔗 Wyatt|Wor Well a couple weeks, I guess.
09:18 🔗 yipdw oh, I had no idea mongo_mapper sucked that bad
09:19 🔗 Wyatt|Wor Oh, did you read his post about it?
09:19 🔗 yipdw yeah
09:20 🔗 yipdw I also realized that the gems I maintain do not include test files or a Rakefile in their gem form
09:20 🔗 yipdw under the rationale that tests and build process are useful only to a developer
09:20 🔗 yipdw I'll have to change that
09:20 🔗 yipdw somehow it didn't click that someone might want to use the *.gem and repackage it in a package manager that does things like run tests
09:21 🔗 Wyatt|Wor Hehe, yeah. The most recent post is a semi-continuation of the mongo_mapper post, too. This happens every couple months, or so, btw
09:22 🔗 yipdw I'm surprised he's stuck with it
09:22 🔗 yipdw (I didn't :P)
09:22 🔗 yipdw try to get gems to play nice with the package manager that is
09:22 🔗 Wyatt|Wor And thanks! I'm not a Ruby user, personally, but I'm always thankful when release engineering is improved.
09:23 🔗 Wyatt|Wor Yeah, he _really_ loves him some ruby
09:23 🔗 yipdw yeah, no problem
09:23 🔗 yipdw thanks for pointing out flameeyes' blog
09:24 🔗 yipdw I'll follow it, as he is the first person I've seen who is still sticking with it
09:24 🔗 yipdw most other people I know who do Ruby use rvm + bundler to just throw all of an application's dependencies into a directory
09:24 🔗 yipdw I mean, it works, and it isolates things
09:24 🔗 yipdw but it is very heavy
09:25 🔗 yipdw it makes sense on systems that don't really try to define their system configuration in terms of packages
09:25 🔗 yipdw like Windows, OS X
09:25 🔗 Wyatt|Wor RVM is kind of neat for developers.
09:26 🔗 Wyatt|Wor But it's a nightmare for our setup.
09:26 🔗 Wyatt|Wor (Speaking of dependency hell, http://blog.flameeyes.eu/files/bones-dependencies-graph.png)
09:28 🔗 yipdw haha what
09:28 🔗 yipdw oh bones
09:28 🔗 yipdw ugh
09:28 🔗 yipdw I do not like bones, jeweler, hoe
09:29 🔗 yipdw they make the process of making a gem so ridiculously complex
09:29 🔗 Wyatt|Wor Apparently they make the packaging difficult, too
09:29 🔗 yipdw actually, the gem command in bundler is very minimal and seems to do it bets
09:29 🔗 yipdw best
09:35 🔗 yipdw oh!
09:35 🔗 yipdw fanfiction.net pages include the canonical URL
09:35 🔗 yipdw badass
09:42 🔗 Wyatt|Wor They generate a lot of stuff into their pages, as I recall it.
09:42 🔗 yipdw yeah, really helps with retrieval
09:52 🔗 yipdw "To modrenaissancewoman: Thank you for pointing that out. I thought French kissing is the one where friends give each other on their cheeks. My mistakes."
09:52 🔗 yipdw whoops
09:52 🔗 Wyatt|Wor A common mistake.
09:52 🔗 yipdw yeah, but the implications are funny
09:53 🔗 Wyatt|Wor lol, was joking.
09:53 🔗 bsmith093 ruby get_one_story.rb http://www.fanfiction.net/s/4/1/ get_one_story.rb:11: warning: already initialized constant VERSION get_one_story.rb:11: command not found: ./make_story_urls.rb http://www.fanfiction.net/s/4/1/ get_one_story.rb:26:in `initialize': No such file or directory - /home/ben/1577729/data/h/ht/htt/http://www.fanfiction.net/s/4/1//http://www.fanfiction.net/s/4/1/_urls (Errno::ENOENT) from get_one_story.rb:26:in `open
09:53 🔗 yipdw bsmith093: it's just the ID, not the full URL
09:53 🔗 bsmith093 /home/ben/1577729/wget-warc -U 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.54 Safari/535.2' -o /home/ben/1577729/data/4/4/4/4/4.log -e 'robots=off' --warc-file=/home/ben/1577729/data/4/4/4/4/4 --warc-max-size=inf --warc-header='operator: Archive Team' --warc-header='ff-download-script-version: 20120108.01' -nd -nv --no-timestamping --page-requisites -i /home/ben/1577729/d
09:53 🔗 bsmith093 get_one_story.rb:11: command not found: ./make_story_urls.rb 4
09:53 🔗 bsmith093 get_one_story.rb:11: warning: already initialized constant VERSION
09:53 🔗 bsmith093 ruby get_one_story.rb 4
09:53 🔗 bsmith093 sh: /home/ben/1577729/wget-warc: not found
09:53 🔗 bsmith093 that, then
09:53 🔗 yipdw you need wget-warc
09:54 🔗 yipdw or some wget that does WARC
09:54 🔗 yipdw adjust the WGET_WARC constant as required
09:54 🔗 bsmith093 oy right hold on
09:56 🔗 bsmith093 whats gnutls and do i want wgetwarc compiled with it
09:57 🔗 bsmith093 get_one_story.rb:11: command not found: ./make_story_urls.rb 4
09:57 🔗 yipdw man, fanfiction.net really does not want to be archived
09:57 🔗 yipdw in addition to their robots.txt file there's a ROBOTS=NOARCHIVE meta tag in every generated output
09:58 🔗 yipdw I feel bad doing this
09:58 🔗 bsmith093 well it is technically against the tos not that i care
09:58 🔗 Wyatt|Wor yipdw: Yeah, I mentioned that a while back, I think.
09:58 🔗 yipdw hm
09:58 🔗 yipdw yeah
09:58 🔗 Wyatt|Wor bsmith093: Oh dear, I might get the account I don't have banned.
09:58 🔗 yipdw I think at this point I'll just stop
09:59 🔗 * chronomex slides in
09:59 🔗 yipdw I mean, yes, I understand the point of archiving this, but on the other hand ignoring all of those signs is really shitty netizen behavior
10:00 🔗 chronomex shitty netizen on one hand, but fanfic people on the other hand
10:00 🔗 chronomex the noarchive bullshit is just ff.n trying to force the internet to depend on its continued existence
10:01 🔗 Wyatt|Wor I think that's debatable. It's not very good netizenship to put yourself in a position where millions of users' work could just disappear, either.
10:01 🔗 yipdw right
10:01 🔗 chronomex I know that the existence of public logs is going to cause me to regret saying so eventually, but fuck that shit.
10:01 🔗 yipdw a moral quandary
10:01 🔗 Wyatt|Wor History lasts longer than any one website.
10:02 🔗 bsmith093 story_page = agent.get(UrlGenerators::STORY_URL[sid, '']) this line in make_story_urls is throwing an error
10:02 🔗 Wyatt|Wor Which is about as weird a way as I could have found to express that.
10:02 🔗 chronomex in my experience, fanfic people can be rabidly anti-archivism, and I have no idea why -- especially because all the fannish people I've met save webpages religiously
10:03 🔗 Wyatt|Wor And they don't tend to keep backups.
10:03 🔗 Wyatt|Wor Of their own stuff, at least.
10:03 🔗 chronomex well, maybe.
10:03 🔗 yipdw chronomex: http://ansuz.sooke.bc.ca/entry/35 is one theory
10:03 🔗 bsmith093 the its my story, and ill kill it if i want to line of thought
10:03 🔗 yipdw bsmith093: there's more to it than that
10:03 🔗 chronomex bsmith093: yes, that. exactly.
10:04 🔗 bsmith093 ben@ben-laptop:~/1577729$ ruby make_story_urls.rb
10:04 🔗 bsmith093 /home/ben/.gem/ruby/1.8/gems/mechanize-2.1/lib/mechanize/http/agent.rb:303:in `fetch': 404 => Net::HTTPNotFound (Mechanize::ResponseCodeError)
10:04 🔗 bsmith093 from /home/ben/.gem/ruby/1.8/gems/mechanize-2.1/lib/mechanize.rb:319:in `get'
10:04 🔗 bsmith093 from make_story_urls.rb:16
10:04 🔗 bsmith093 sorry forgot to sump line breaks
10:04 🔗 yipdw a lot of fandoms are actually very sensitive to the legal complications surrounding their fandom
10:05 🔗 yipdw bsmith093: make_story_urls is meant to be called from get_one_story, and it requires a story ID
10:05 🔗 bsmith093 oy well that explains it
10:05 🔗 bsmith093 ruby make_story_urls.rb 4
10:06 🔗 bsmith093 worked perfectly
10:08 🔗 yipdw lol wtf
10:08 🔗 yipdw http://b.fanfiction.net/static/styles/fanfiction42.css
10:08 🔗 yipdw I do not know how the fuck that is coming back
10:09 🔗 yipdw if I get that with curl, I get gzipped CSS (?!)
10:09 🔗 yipdw if I get that with Chrome, I get an HTML page that has the CSS between <pre> tags
10:09 🔗 Wyatt|Wor yipdw: gzipped CSS!?
10:09 🔗 yipdw and I mean it's gzipped CSS, not merely sent with Content-Encoding: gzip and compressed by the server
10:09 🔗 yipdw Wyatt|Wor: yeah, try it
10:10 🔗 Wyatt|Wor I...
10:10 🔗 Wyatt|Wor What.
10:10 🔗 yipdw I am amazed that works
10:11 🔗 chronomex no <pre> tags in opera
10:11 🔗 yipdw oh
10:11 🔗 yipdw that might just be the web inspector
10:11 🔗 chronomex are you viewing-source in chrome?
10:11 🔗 yipdw I am now
10:11 🔗 yipdw and yeah, that appears fine
10:12 🔗 yipdw but that is so weird
10:12 🔗 bsmith093 quick thing i have a list of id numbers in a file, and they work individually, but the autogeneration part of the script seems to be tripping over itself
10:13 🔗 bsmith093 could you just package the id list into the repo
10:13 🔗 yipdw well, wait
10:13 🔗 yipdw it IS sent with Content-Encoding: gzip
10:13 🔗 yipdw so I guess that's valid
10:13 🔗 Wyatt|Wor Huh, interesting.
10:13 🔗 yipdw I expected curl to inflate the stream, though
10:13 🔗 yipdw to say nothing of wget
10:14 🔗 yipdw are they gzipping gzipped data?
10:14 🔗 chronomex Content-Encoding: gzip
10:14 🔗 yipdw right
10:14 🔗 yipdw I thought curl/wget would be able to handle that by inflating the stream
10:14 🔗 chronomex it is single-gzipped
10:14 🔗 chronomex (curl | gunzip) --> plaintext
10:15 🔗 yipdw yeah, that works
10:17 🔗 yipdw ohh
10:17 🔗 yipdw b.fanfiction.net sends that regardless of Accept-Encoding
10:17 🔗 yipdw that's...broken
10:21 🔗 yipdw I guess we just need to download and gunzip that separately
10:21 🔗 yipdw or something
10:21 🔗 yipdw tricksy
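
The "download and gunzip that separately" step could look something like the sketch below, using Ruby's standard `zlib` library. It assumes the downloaded body is already in memory and checks for the two-byte gzip magic number (0x1f 0x8b) so that bodies the server did not compress pass through untouched. The name `maybe_gunzip` is hypothetical.

```ruby
require "zlib"
require "stringio"

GZIP_MAGIC = "\x1f\x8b".b

# Return the body inflated if it starts with the gzip magic bytes,
# otherwise return it unchanged.
def maybe_gunzip(body)
  bytes = body.b
  return body unless bytes.start_with?(GZIP_MAGIC)

  Zlib::GzipReader.new(StringIO.new(bytes)).read
end

# Usage: simulate a server that gzips regardless of Accept-Encoding.
css = "body { color: #333; }"
puts maybe_gunzip(Zlib.gzip(css))
puts maybe_gunzip(css)
```

This only post-processes a saved copy. The WARC itself would still hold the response exactly as the server sent it, gzip encoding included.
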
10:37 🔗 bsmith093 well its 536am est so being in ny, im going to bed, keep the repo updated, ciao, night | morning depending on timezones
10:43 🔗 chronomex nite
11:46 🔗 SketchCow And here I am!!
11:46 🔗 SketchCow Packing the car up
11:46 🔗 Wyatt|Wor Gah, I thought dotwizards.com would be some cool Japanese pixel art site. Alas, corporate coaching.
11:47 🔗 Wyatt|Wor SketchCow: Ah, have a good Magfest?
11:47 🔗 SketchCow I had a very good magfest.
11:47 🔗 Wyatt|Wor Awesome. That couple with the arcade sounds like it's going to be an awesome...err, episode?
11:49 🔗 SketchCow Just more filming stuff
11:49 🔗 SketchCow But yeah, I like them a lot.
11:50 🔗 SketchCow http://www.facebook.com/SavePointMD
11:51 🔗 Wyatt|Wor How much do you think Arcade will cover in terms of pinball's role in arcades? I mean, yeah there's Tilt! (which I need to get a copy of, come to think of it), but I'm a fanatic. ;)
12:54 🔗 SketchCow Good question, no answer.
14:42 🔗 SketchCow can someoneca
14:43 🔗 SketchCow hey
14:44 🔗 SketchCow underscor gave a rough address in here a ways back. google link. can someone tell it to me?
14:45 🔗 underscor http://g.co/maps/zge32
14:45 🔗 SketchCow on phone, keyboard fuckery limited.
14:45 🔗 SketchCow just give me the location, kid
14:45 🔗 underscor it's grassy knoll ct woodbridge, va 22193
14:45 🔗 SketchCow om
14:46 🔗 PatC Is there a nickserv here?
14:46 🔗 underscor no
14:46 🔗 underscor no services on efnet
14:46 🔗 underscor Besides chanfix
14:46 🔗 PatC ok
14:47 🔗 underscor SketchCow: does this mean you'll be here in like 45 minutes?
14:47 🔗 underscor or are you just planning ahead
14:47 🔗 SketchCow may e
14:47 🔗 underscor oh
14:47 🔗 underscor damn, we have church at 11
14:48 🔗 SketchCow see soon!
14:48 🔗 SketchCow when do you get back?
14:49 🔗 SketchCow no rush.
14:49 🔗 SketchCow no ticking off family.
14:49 🔗 underscor 1:15ish
14:49 🔗 underscor hahah
14:51 🔗 SketchCow see you around then.
20:07 🔗 closure Someone posted some data to usenet in 1982 and I made a visualization of it today. http://olduse.net/blog/current_usenet_map/ fun collaboration :)
20:14 🔗 closure I especially like the tall doubly linked list of systems at the bottom. we don't build networks like that anymore.
20:15 🔗 nitro2k01 Token Link?
20:15 🔗 nitro2k01 Token Ring rather
20:15 🔗 nitro2k01 Kill me!
20:17 🔗 closure could be token ring, more likely it was a dozen systems talking over 300 baud dialup
20:18 🔗 closure hmm, actually, token ring seems to be 1985 or so, not 1982
20:18 🔗 nitro2k01 Seems like an expensive way of connecting
20:18 🔗 nitro2k01 If the middle box needs to reach out, it needs to rely on a bunch of telephone lines
20:19 🔗 closure and it probably takes it *days* to get new traffic
20:19 🔗 nitro2k01 Damn (whoever) for not providing more metadata
20:20 🔗 closure yeah, I hope for a future dataset with more info
20:20 🔗 nitro2k01 Also, why the double arrows everywhere?
20:20 🔗 nitro2k01 Seems like they don't provide additional information
20:20 🔗 closure (of course, telehack.org has a newer, much more extensive uucp map they use in their simulation)
20:21 🔗 closure bidirectional links, each system could call the other
20:21 🔗 nitro2k01 Right. But this applied to ALL of the links?
20:21 🔗 nitro2k01 Except the wormhole :p
20:22 🔗 closure according to Mark, it did, yes
20:22 🔗 nitro2k01 Wait, look at eagle and mhux*
20:23 🔗 nitro2k01 Multiple links
20:23 🔗 nitro2k01 mhuxj -> eagle *2
20:23 🔗 nitro2k01 mhuxj <-> mhuxm *2
20:23 🔗 closure yeah, I've been fixing a few that he doubled
20:24 🔗 nitro2k01 Oh, so that's not even useful data? ._.
20:25 🔗 closure well, look at the original post :P
20:25 🔗 closure it was like a bunch of badly formatted lines from 1982
20:25 🔗 nitro2k01 But that's like text and stuff
20:25 🔗 nitro2k01 I can't read text
20:25 🔗 closure hahahaha
20:26 🔗 nitro2k01 Like, if someone would send me a link to textfiles.com
20:26 🔗 nitro2k01 I'd be lost
20:27 🔗 closure this is why I thought a graphical map would be nice.. I personally prefer the handdrawn ascii ones below it though
20:27 🔗 nitro2k01 In fact, if I didn't have this program that translated IRC messages to pictures of fruit, I couldn't have this conversation
20:27 🔗 closure <dingbat> <cloud> <mushroom> <teletype>
20:30 🔗 nitro2k01 http://www.textfiles.com/conspiracy/art-04.txt
20:30 🔗 nitro2k01 Just look at the second paragraph
20:31 🔗 nitro2k01 How nicely sliced it is
20:31 🔗 nitro2k01 A single diagonal stroke
20:31 🔗 nitro2k01 Same with the first too actually
22:20 🔗 ndurner_c Hi
22:21 🔗 ndurner_c Any emergency downloads going on right now?
22:29 🔗 ndurner_c (will read the log tomorrow.. gn8)
