[00:28] who needs to set down standards for a parallel project?
[00:31] [NFSW] http://i.imgur.com/qVOmg.jpg
[00:31] what's ns abuot that?
[00:33] Reread the tag
[00:33] tag?
[00:33] oh god damnit
[00:33] hahahahaha
[00:36] argh
[00:37] lysdexia
[00:38] :D
[01:11] http://www.gsc-game.com/ is ready to be rsynced
[01:14] the info should still be the same. just put it in a different directory on the server
[01:16] got it
[01:21] thanks
[01:22] I'm uploading it as a single targz, with the cdx, warc, and site folder dump, in it
[01:26] how big is batcave, anyway, it must be enormous? multi dozens of terabytes?
[01:37] bsmith093: SketchCow said once in one of his talks, but I forget the number he gave
[01:37] the system also has 96GB of ram
[01:46] !
[01:46] Jiminy christmas, thats a lot of memory
[02:42] bsmith093: btw, if the crawler froze on you, you may be hitting frustrating issues with Ruby 1.9.3's threading capabilities
[02:43] I suggest a Ruby implementation that gets threads more right, like JRuby or Rubinius
[02:43] how do i get this to use that?
[02:44] at this point, i only vaguely understand the code, and how it does wha its doing
[02:45] rvm install jruby, rvm use jruby, bundle install, run it
[02:45] also fetching latest would help
[02:46] ah now thats concise, thanks :) a lot, seriously, it took me forever to figure out rvm
[02:46] rvm help
[02:46] rvm fetch latest?
[02:46] fetching latest version of the crawler, I mean
[02:47] oh right
[02:48] the crawler has been adjusted to cache data for 3 days
[02:48] hey it updated since several hours ago!
[02:48] because although their pages change frequently, the stories themselves do not
[02:48] the things that update are (I think) things like hit count
[02:48] which messes up the Last-Modified header
[02:48] so I made a guess
[02:49] oh so the last modified date, string shceker wont do dupes?
[02:49] the Last-Modified date is stored and is sent with subsequent requests as an If-Modified-Since header
[02:49] if the page hasn't changed since that date, their server returns a 304
[02:49] and the cached result is used
[02:51] where are all these headers stored, server, or in the pages themselves, or what?
[02:51] Redis
[02:52] more specifically, just the Last-Modified date and a cache flag
[02:53] do you also check for etags? (does their server even use etags?)
[02:54] :P
[02:54] no, and no
[02:55] i meant does any given webpage store its metadata, like the last modified date? within itself, or do you ask the server?
[02:55] the headers are present in an HTTP request
[02:55] they can be generated by a combination of the Web server and the Web application
[02:56] ohhh, yeah we just got to wireshark in my networking class, and omg there is a lot of info per packet, most of which is metadata
[02:57] a little higher than that
[02:57] Coderjoe: honestly I'm not sure what the point of checking both Last-Modified and ETag is
[02:57] this must be why the next class is web apps
[02:58] check one or the other, but both seems like a "what if the application is wrong" scenario
[02:59] more for if the server didn't return LM but did return an etag
[02:59] oh
[02:59] no, they seem to always return Last-Modified
[02:59] although the value is not useful
[02:59] for story archiving purposes
[03:00] correction, it's not *that* useful
[03:00] Building Nailgun jruby-1.6.5 - #installing to /home/ben/.rvm/rubies/jruby-1.6.5 jruby-1.6.5 - #importing default gemsets (/home/ben/.rvm/gemsets/ Copying across included gems ERROR: While executing gem ... (URI::InvalidURIError) bad URI(is not URI?): http://localhost:4001
[03:01] I've never seen that before
[03:01] update rvm?
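(editor's sketch) The Last-Modified / If-Modified-Since exchange described at 02:49 can be illustrated roughly as below. This is not the crawler's actual code. A plain Hash stands in for Redis, `fetch` is a hypothetical callable returning [status, headers, body], and the cached body is kept in the same Hash for simplicity, whereas the log says only the date and a cache flag are stored. The ETag branch is the fallback raised at 02:59, which the crawler in the log does not use.

```ruby
# Build the validator headers for a conditional request from what was
# remembered about the previous response (entry may be nil on first fetch).
def validator_headers(entry)
  headers = {}
  return headers unless entry
  headers['If-Modified-Since'] = entry[:last_modified] if entry[:last_modified]
  # Optional fallback for servers that send an ETag but no Last-Modified.
  headers['If-None-Match'] = entry[:etag] if entry[:etag]
  headers
end

# store: Hash standing in for Redis (assumption).
# fetch: hypothetical callable, fetch.call(url, headers) => [status, headers, body].
def conditional_get(url, store, fetch)
  entry = store[url]
  status, response_headers, body = fetch.call(url, validator_headers(entry))

  # 304 Not Modified: the server sent no body, so reuse the cached copy.
  return [:not_modified, entry[:body]] if status == 304 && entry

  # Otherwise remember the new validators alongside the fresh body.
  store[url] = {
    last_modified: response_headers['Last-Modified'],
    etag: response_headers['ETag'],
    body: body
  }
  [:fetched, body]
end
```

The first call for a URL sends no validators and stores whatever comes back. Later calls send the stored date, and a 304 reply means the cached copy is returned without downloading the page again.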
[03:01] I don't know why rubygems is trying to hit localhost:4001
[03:04] i just checked the link in ff8, it took me to a , what looked like an internal error oage, which would make sense, but it had external links to jodymym,xom
[03:04] https://anonymous-proxy-servers.net/, well this, anyway
[03:05] uh
[03:05] localhost is your machine
[03:05] if you're seeing that from going to localhost:4001, then you may be running something weird
[03:08] i just checked, apparently it was some weird proxy package id installed and forgotten about, anon-proxy
[03:10] purged it and i reran and i get this Using /home/ben/.rvm/gems/jruby-1.6.5
[03:10] /usr/local/lib/site_ruby/1.8/rubygems/dependency.rb:247:in `to_specs': Could not find bundler (>= 0) amongst [] (Gem::LoadError)
[03:10] from /usr/bin/bundle:18
[03:10] from /usr/local/lib/site_ruby/1.8/rubygems.rb:1203:in `gem'
[03:10] from /usr/local/lib/site_ruby/1.8/rubygems/dependency.rb:256:in `to_spec'
[03:12] gem install bundler
[03:12] localhost 4001 bad uri error
[03:13] 4001 isn't even in my services file
[03:14] paste me the result of gem env
[03:15] - EXECUTABLE DIRECTORY: /home/ben/.rvm/gems/jruby-1.6.5/bin
[03:15] - INSTALLATION DIRECTORY: /home/ben/.rvm/gems/jruby-1.6.5
[03:15] - RUBY EXECUTABLE: /home/ben/.rvm/rubies/jruby-1.6.5/bin/jruby
[03:15] - RUBY VERSION: 1.8.7 (2011-10-25 patchlevel 330) [java]
[03:15] - RUBYGEMS PLATFORMS:
[03:15] - RUBYGEMS VERSION: 1.8.9
[03:15] RubyGems Environment:
[03:15] - ruby
[03:15] - universal-java-1.6
[03:15] - GEM PATHS:
[03:15] - /home/ben/.rvm/gems/jruby-1.6.5
[03:15] - /home/ben/.rvm/gems/jruby-1.6.5@global
[03:15] - :update_sources => true
[03:15] - GEM CONFIGURATION:
[03:15] ok sorry should have used pastebin
[03:17] http://pastebin.com/dpQY4HVw
[03:19] I'm not sure why that's resolving as it is
[03:19] evidently rubygems.org on your machine is now resolving to localhost:4001
[03:19] you'll have to fix that
[03:22] works fine in firefox, where do i look?
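(editor's sketch) One possible place to look, assuming the purged proxy package left proxy environment variables behind. The log never confirms that this was the cause, so treat it as a guess. The helper name `proxy_settings` is made up for illustration. It lists any proxy variables that are still set in the shell that runs `gem`.

```ruby
# Environment variables that HTTP clients commonly consult for a proxy.
PROXY_VARS = %w[http_proxy HTTP_PROXY https_proxy HTTPS_PROXY all_proxy ALL_PROXY].freeze

# Return the proxy variables that are set and non-empty.
# env defaults to the real environment; any Hash-like object works.
def proxy_settings(env = ENV)
  PROXY_VARS.each_with_object({}) do |name, found|
    value = env[name]
    found[name] = value if value && !value.empty?
  end
end

# Print whatever is set so a stale localhost:4001 entry would stand out.
proxy_settings.each { |name, value| puts "#{name}=#{value}" }
```

If something like http_proxy=http://localhost:4001 shows up, the next things to check would be the shell startup files that export it. A browser keeps its own proxy settings, which would fit with the site working in Firefox.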
[03:23] to fix it locally
[03:23] should i just reinstall rvm?
[03:57] wiki down?
[03:57] oh there it is.
[05:50] Still fixing french magazines.
[12:19] somewhere there is a website with old computer system character maps
[12:19] anyone know the url?
[12:19] are you looking for one in particular?
[12:20] nope, looking for the site itself
[12:20] had the maps on the right site iirc
[12:20] nav on the left
[12:21] oh yeah, that one
[12:21] with the text
[12:21] dark background
[12:21] haha
[12:21] what color was it?
[12:21] lol
[12:21] there was bl
[12:21] there was black in it
[12:21] hey, i got a memory that works like that
[12:21] I found it: http://www.compukiss.com/basics/symbols-character-map-2.html
[12:21] compu-KISS
[12:22] with Sandy Berger
[12:22] i was wrong but i found it http://damieng.com/blog/2011/03/27/typography-in-16-bits-system-fonts
[12:22] dang, you gots some search engine skillz
[12:22] where is the google option for "mostly black site"?
[12:23] calm down
[20:05] I still have anyhub data, where can I drop it? alard his rsync doesn't seem to work anymore.
[20:14] HEY SO
[20:15] If anone has any other caches of magazines they find online, would love to find them.
[20:15] French is temporarily fired.
[20:19] * DFJustin 's ears perk up
[20:20] hungarian http://pcvilag.muskatli.hu/irodalom/begins/news.index.html
[20:20] spanish http://www.konamito.com/publicaciones-msx/
[20:21] various www.retromags.com
[20:23] one-offs http://electrickery.xs4all.nl/comp/dai/doc/ http://www.apple2online.com/index.php?p=1_65_Apple-IIGS-Buyer-s-Guide http://www.apple2online.com/index.php?p=1_70_inCider-Magazine http://www.apple2online.com/index.php?p=1_53_Newsletters http://bitsavers.org/pdf/creativeSolutions/ http://bitsavers.org/pdf/ti/ti-mix/
[20:25] http://www.oldgamemags.com/index.php?title=Magazine_Index
[20:32] another japanese magazine http://narod.ru/disk/23643150000/Super%20soft%20magazine.rar.html archive password is retropc98.narod.ru
[20:33] the filename encoding inside is hosed though fyi
[20:56] I'd like ones with sets of jpegs and PDFs, please.
[20:56] Also, just e-mail me, jason@textfiles.com, no need to fill the channel.
[20:57] hehe
[20:57] It was not enjoyable, pulling down 1,000 issues of french magazines. I would like to replace them.
[20:57] And when I say pulling down, they're in the archive forever.
[20:58] those should all be jpg/pdf sets
[21:01] PepsiMax: I thought you had started uploading to batcave? Should I switch the rsync back on? How much more do you have?
[21:10] Let me see
[21:15] alard: 42G /mnt/extdisk/archiveteam/anyhub-grab/
[21:18] PepsiMax: I have 48G from you here, so perhaps part of that is already sent?
[21:18] (I reopened the rsync port, by the way.)
[21:19] It's a bit weird to see that AnyHub is back, by the way.
[21:21] Yeah, I noticed that too
[21:21] We should make a rule for that: a site can only die once.
[21:22] We can't keep archiving them.
[21:25] I mean: take Google Video. You go through all the trouble to archive it, and then it doesn't die. That's not fair, is it?
[21:29] So, well, yeah I noticed
[21:29] That's fine.
[21:29] I would nother bother acctally
[21:29] I mean, we cause some of these to happen.
[21:29] I am fine with "Oh yeah, fuck you, a second option has arrived" followed by "Oh, well, then."
[21:29] Which is what happened in both those cases, anyhub and google video
[21:30] We should take pride in that, the two choices aren't the site stays up or it's deleted.
[21:31] OK, true dat. alard:
[21:31] alard: I'm sending what you don't have yet.
[22:42] alard: I'm still getting splinder, what is this about closing rsync?
[23:10] this is pretty interesting: http://blogs.loc.gov/digitalpreservation/2011/12/providing-access-to-70-million-copyright-records/
[23:17] So they're digitizing their card catalog for 1870 to 1977. Does this mean that 1977-present is already digital? Regardless, yes, interesting.
[23:17] yes. the feds had a big digital push in the 1970s.
[23:19] registration wasn't required after 1977
[23:19] every work was assumed to be in copyright unless the author specifically disclaimed it
[23:20] you could register it, which gave you a boost in case you ever needed to present it in a courtroom
[23:21] what would be interesting is seeing if there are any previously-unknown public domain works to be found in that catalog
[23:37] ah, yeah.
[23:52] son of a
[23:52] ERROR (3).
[23:52] Error downloading 'it:Redazione'.
[23:52] -rw-rw-r-- 1 ec2-user ec2-user 1216211055 Dec 10 04:38 splinder.com-Redazione-blog-journal.splinder.com.warc.gz
[23:52] ffffuuuuuu
[23:53] oh wtf
[23:53] there's errors like this
[23:53] Cannot write to `./tmpfs/it/Redazione/www.splinder.com/search/profile?from=480&i=la + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +musica' (File name too long).
[23:53] you know what, I'm just going to upload what I've got of Redazione
[23:53] I suspect that there is no way to actually archive it without errors at this point
[23:54] hahaha wtf
[23:54] nice filename
[23:59] no one will miss one, what looks like a search page
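(editor's sketch) The "File name too long" error at 23:53 happens when a single path component exceeds the filesystem's limit, which is commonly 255 bytes on Linux. That limit is assumed here, not taken from the log. One common workaround, not something the downloader in the log is known to do, is to truncate the component and append a short digest so that different long names stay distinct. `shorten_component` is a hypothetical helper.

```ruby
require 'digest'

# Typical per-component limit in bytes on Linux filesystems (assumption).
NAME_MAX = 255

# Return name unchanged if it fits; otherwise keep a readable prefix and
# append 12 hex characters of its SHA-1 so distinct names do not collide.
def shorten_component(name, limit = NAME_MAX)
  return name if name.bytesize <= limit

  digest = Digest::SHA1.hexdigest(name)[0, 12]
  keep = limit - digest.bytesize - 1 # leave room for the "-" separator
  # byteslice may cut a multi-byte character in half; scrub drops the debris.
  prefix = name.byteslice(0, keep).scrub('')
  "#{prefix}-#{digest}"
end
```

Applied to the last component of the URL in the error above, this would give a name that still begins with `profile?from=480&i=la` but fits on disk. The full URL would need to be recorded elsewhere, for example in the WARC, since the shortened name cannot be reversed.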