#archiveteam 2011-12-11,Sun

Time Nickname Message
00:28 🔗 Coderjoe who needs to set down standards for a parallel project?
00:31 🔗 underscor [NFSW] http://i.imgur.com/qVOmg.jpg
00:31 🔗 Coderjoe what's ns abuot that?
00:33 🔗 underscor Reread the tag
00:33 🔗 chronomex tag?
00:33 🔗 chronomex oh god damnit
00:33 🔗 underscor hahahahaha
00:36 🔗 Coderjoe argh
00:37 🔗 Coderjoe lysdexia
00:38 🔗 underscor :D
01:11 🔗 bsmith093 http://www.gsc-game.com/ is ready to be rsynced
01:14 🔗 Coderjoe the info should still be the same. just put it in a different directory on the server
01:16 🔗 bsmith093 got it
01:21 🔗 DFJustin thanks
01:22 🔗 bsmith093 I'm uploading it as a single targz, with the cdx, warc, and site folder dump, in it
01:26 🔗 bsmith093 how big is batcave, anyway, it must be enormous? multi dozens of terabytes?
01:37 🔗 Coderjoe bsmith093: SketchCow said once in one of his talks, but I forget the number he gave
01:37 🔗 Coderjoe the system also has 96GB of ram
01:46 🔗 bsmith093 !
01:46 🔗 bsmith093 Jiminy christmas, that's a lot of memory
02:42 🔗 yipdw| bsmith093: btw, if the crawler froze on you, you may be hitting frustrating issues with Ruby 1.9.3's threading capabilities
02:43 🔗 yipdw| I suggest a Ruby implementation that gets threads more right, like JRuby or Rubinius
02:43 🔗 bsmith093 how do i get this to use that?
02:44 🔗 bsmith093 at this point, i only vaguely understand the code, and how it does what it's doing
02:45 🔗 yipdw| rvm install jruby, rvm use jruby, bundle install, run it
02:45 🔗 yipdw| also fetching latest would help
02:46 🔗 bsmith093 ah now thats concise, thanks :) a lot, seriously, it took me forever to figure out rvm
02:46 🔗 yipdw| rvm help
02:46 🔗 bsmith093 rvm fetch latest?
02:46 🔗 yipdw| fetching latest version of the crawler, I mean
02:47 🔗 bsmith093 oh right
02:48 🔗 yipdw| the crawler has been adjusted to cache data for 3 days
02:48 🔗 bsmith093 hey it updated since several hours ago!
02:48 🔗 yipdw| because although their pages change frequently, the stories themselves do not
02:48 🔗 yipdw| the things that update are (I think) things like hit count
02:48 🔗 yipdw| which messes up the Last-Modified header
02:48 🔗 yipdw| so I made a guess
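The three-day cache described above amounts to a time-based freshness check. The following is only a rough sketch of that idea, not the crawler's actual code; the constant and method names are hypothetical.

```ruby
# Hypothetical sketch of a time-based cache check; names are illustrative
# and not taken from the crawler.
CACHE_TTL_SECONDS = 3 * 24 * 60 * 60 # three days, per the discussion above

# Returns true when a page fetched at `fetched_at` may still be served
# from cache at time `now` without contacting the server.
def fresh?(fetched_at, now = Time.now, ttl = CACHE_TTL_SECONDS)
  !fetched_at.nil? && (now - fetched_at) < ttl
end

now = Time.utc(2011, 12, 11, 2, 48)
puts fresh?(now - 2 * 86_400, now) # fetched two days earlier
puts fresh?(now - 4 * 86_400, now) # fetched four days earlier
```

A fixed window like this trades some staleness for fewer requests, which fits pages whose story text rarely changes.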
02:49 🔗 bsmith093 oh so the Last-Modified date string checker won't do dupes?
02:49 🔗 yipdw| the Last-Modified date is stored and is sent with subsequent requests as an If-Modified-Since header
02:49 🔗 yipdw| if the page hasn't changed since that date, their server returns a 304
02:49 🔗 yipdw| and the cached result is used
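The exchange yipdw describes is a standard HTTP conditional GET. A minimal sketch using Ruby's standard library might look like the following; the entry layout (a Hash with `:last_modified` and `:body`) and all method names are assumptions made for illustration, not the crawler's real structure.

```ruby
require 'net/http'
require 'uri'

# Builds request headers for a conditional GET from a cached entry.
# `entry` is a hypothetical Hash such as { last_modified: "...", body: "..." }.
def conditional_headers(entry)
  return {} if entry.nil? || entry[:last_modified].nil?
  { 'If-Modified-Since' => entry[:last_modified] }
end

# Decides what to keep once the response has arrived.
# 304 Not Modified: reuse the cached entry. 200: replace it with the new data.
def updated_entry(entry, status, last_modified, body)
  case status
  when 304 then entry
  when 200 then { last_modified: last_modified, body: body }
  else raise "unexpected HTTP status #{status}"
  end
end

# Performs the request over the network and returns the entry to cache.
def fetch(url, entry)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.get(uri.request_uri, conditional_headers(entry))
  end
  updated_entry(entry, response.code.to_i, response['Last-Modified'], response.body)
end
```

Keeping the header and status logic in separate methods from the network call makes the decision easy to check without hitting a server.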
02:51 🔗 bsmith093 where are all these headers stored, server, or in the pages themselves, or what?
02:51 🔗 yipdw| Redis
02:52 🔗 yipdw| more specifically, just the Last-Modified date and a cache flag
02:53 🔗 Coderjoe do you also check for etags? (does their server even use etags?)
02:54 🔗 yipdw| :P
02:54 🔗 yipdw| no, and no
02:55 🔗 bsmith093 i meant does any given webpage store its metadata, like the last modified date? within itself, or do you ask the server?
02:55 🔗 yipdw| the headers are present in an HTTP request
02:55 🔗 yipdw| they can be generated by a combination of the Web server and the Web application
02:56 🔗 bsmith093 ohhh, yeah we just got to wireshark in my networking class, and omg there is a lot of info per packet, most of which is metadata
02:57 🔗 Coderjoe a little higher than that
02:57 🔗 yipdw| Coderjoe: honestly I'm not sure what the point of checking both Last-Modified and ETag is
02:57 🔗 bsmith093 this must be why the next class is web apps
02:58 🔗 yipdw| check one or the other, but both seems like a "what if the application is wrong" scenario
02:59 🔗 Coderjoe more for if the server didn't return LM but did return an etag
02:59 🔗 yipdw| oh
02:59 🔗 yipdw| no, they seem to always return Last-Modified
02:59 🔗 yipdw| although the value is not useful
02:59 🔗 yipdw| for story archiving purposes
03:00 🔗 yipdw| correction, it's not *that* useful
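Coderjoe's point about servers that send an ETag but no Last-Modified can be sketched as follows. This is an illustration only (the log says this particular server sends no ETags), and the `cached` Hash layout is hypothetical.

```ruby
# Picks validators for the next request from what the server sent last time.
# `cached` is a hypothetical Hash with optional :last_modified and :etag keys.
def validator_headers(cached)
  headers = {}
  headers['If-Modified-Since'] = cached[:last_modified] if cached[:last_modified]
  headers['If-None-Match'] = cached[:etag] if cached[:etag]
  headers
end

# Server returned only an ETag on the previous fetch:
p validator_headers(etag: '"abc123"')
```

Sending whichever validators are available covers both kinds of server without the client having to know in advance which one it is talking to.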
03:00 🔗 bsmith093 Building Nailgun jruby-1.6.5 - #installing to /home/ben/.rvm/rubies/jruby-1.6.5 jruby-1.6.5 - #importing default gemsets (/home/ben/.rvm/gemsets/ Copying across included gems ERROR: While executing gem ... (URI::InvalidURIError) bad URI(is not URI?): http://localhost:4001
03:01 🔗 yipdw| I've never seen that before
03:01 🔗 bsmith093 update rvm?
03:01 🔗 yipdw| I don't know why rubygems is trying to hit localhost:4001
03:04 🔗 bsmith093 i just checked the link in ff8, it took me to what looked like an internal error page, which would make sense, but it had external links to jondonym.com
03:04 🔗 bsmith093 https://anonymous-proxy-servers.net/, well this, anyway
03:05 🔗 yipdw| uh
03:05 🔗 yipdw| localhost is your machine
03:05 🔗 yipdw| if you're seeing that from going to localhost:4001, then you may be running something weird
03:08 🔗 bsmith093 i just checked, apparently it was some weird proxy package I'd installed and forgotten about, anon-proxy
03:10 🔗 bsmith093 purged it and i reran and i get this Using /home/ben/.rvm/gems/jruby-1.6.5
03:10 🔗 bsmith093 /usr/local/lib/site_ruby/1.8/rubygems/dependency.rb:247:in `to_specs': Could not find bundler (>= 0) amongst [] (Gem::LoadError)
03:10 🔗 bsmith093 from /usr/bin/bundle:18
03:10 🔗 bsmith093 from /usr/local/lib/site_ruby/1.8/rubygems.rb:1203:in `gem'
03:10 🔗 bsmith093 from /usr/local/lib/site_ruby/1.8/rubygems/dependency.rb:256:in `to_spec'
03:12 🔗 yipdw| gem install bundler
03:12 🔗 bsmith093 localhost 4001 bad uri error
03:13 🔗 bsmith093 4001 isn't even in my services file
03:14 🔗 yipdw| paste me the result of gem env
03:15 🔗 bsmith093 - EXECUTABLE DIRECTORY: /home/ben/.rvm/gems/jruby-1.6.5/bin
03:15 🔗 bsmith093 - INSTALLATION DIRECTORY: /home/ben/.rvm/gems/jruby-1.6.5
03:15 🔗 bsmith093 - RUBY EXECUTABLE: /home/ben/.rvm/rubies/jruby-1.6.5/bin/jruby
03:15 🔗 bsmith093 - RUBY VERSION: 1.8.7 (2011-10-25 patchlevel 330) [java]
03:15 🔗 bsmith093 - RUBYGEMS PLATFORMS:
03:15 🔗 bsmith093 - RUBYGEMS VERSION: 1.8.9
03:15 🔗 bsmith093 RubyGems Environment:
03:15 🔗 bsmith093 - ruby
03:15 🔗 bsmith093 - universal-java-1.6
03:15 🔗 bsmith093 - GEM PATHS:
03:15 🔗 bsmith093 - /home/ben/.rvm/gems/jruby-1.6.5
03:15 🔗 bsmith093 - /home/ben/.rvm/gems/jruby-1.6.5@global
03:15 🔗 bsmith093 - :update_sources => true
03:15 🔗 bsmith093 - GEM CONFIGURATION:
03:15 🔗 bsmith093 ok sorry should have used pastebin
03:17 🔗 bsmith093 http://pastebin.com/dpQY4HVw
03:19 🔗 yipdw| I'm not sure why that's resolving as it is
03:19 🔗 yipdw| evidently rubygems.org on your machine is now resolving to localhost:4001
03:19 🔗 yipdw| you'll have to fix that
03:22 🔗 bsmith093 works fine in firefox, where do i look?
03:23 🔗 bsmith093 to fix it locally
03:23 🔗 bsmith093 should i just reinstall rvm?
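One possible place to look, which the log does not confirm, is a proxy setting left behind in the shell environment by the removed proxy package. The sketch below lists the usual proxy environment variables and reports whether each value parses as a URI; the variable list and method name are assumptions for illustration.

```ruby
require 'uri'

PROXY_VARS = %w[http_proxy HTTP_PROXY https_proxy HTTPS_PROXY].freeze

# Returns [name, value, problem] for each proxy variable set in `env`.
# `problem` is nil when the value parses as a URI, else the parser's message.
def proxy_report(env = ENV)
  PROXY_VARS.each_with_object([]) do |name, report|
    value = env[name]
    next if value.nil? || value.empty?
    problem = begin
      URI.parse(value)
      nil
    rescue URI::InvalidURIError => e
      e.message
    end
    report << [name, value, problem]
  end
end

proxy_report.each do |name, value, problem|
  status = problem ? "invalid: #{problem}" : 'parses OK'
  puts "#{name}=#{value.inspect} (#{status})"
end
```

If nothing is printed, no proxy variable is set in the current shell, and the setting would have to come from somewhere else, such as a gem configuration file.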
03:57 🔗 dnova wiki down?
03:57 🔗 dnova oh there it is.
05:50 🔗 SketchCow Still fixing french magazines.
12:19 🔗 Schbirid somewhere there is a website with old computer system character maps
12:19 🔗 Schbirid anyone know the url?
12:19 🔗 chronomex are you looking for one in particular?
12:20 🔗 Schbirid nope, looking for the site itself
12:20 🔗 Schbirid had the maps on the right side iirc
12:20 🔗 Schbirid nav on the left
12:21 🔗 chronomex oh yeah, that one
12:21 🔗 chronomex with the text
12:21 🔗 Schbirid dark background
12:21 🔗 Schbirid haha
12:21 🔗 chronomex what color was it?
12:21 🔗 dnova lol
12:21 🔗 chronomex there was bl
12:21 🔗 chronomex there was black in it
12:21 🔗 Schbirid hey, i got a memory that works like that
12:21 🔗 dnova I found it: http://www.compukiss.com/basics/symbols-character-map-2.html
12:21 🔗 dnova compu-KISS
12:22 🔗 dnova with Sandy Berger
12:22 🔗 Schbirid i was wrong but i found it http://damieng.com/blog/2011/03/27/typography-in-16-bits-system-fonts
12:22 🔗 chronomex dang, you gots some search engine skillz
12:22 🔗 chronomex where is the google option for "mostly black site"?
12:23 🔗 Schbirid calm down
20:05 🔗 PepsiMax I still have anyhub data, where can I drop it? alard his rsync doesn't seem to work anymore.
20:14 🔗 SketchCow HEY SO
20:15 🔗 SketchCow If anyone has any other caches of magazines they find online, would love to find them.
20:15 🔗 SketchCow French is temporarily fired.
20:19 🔗 * DFJustin 's ears perk up
20:20 🔗 DFJustin hungarian http://pcvilag.muskatli.hu/irodalom/begins/news.index.html
20:20 🔗 DFJustin spanish http://www.konamito.com/publicaciones-msx/
20:21 🔗 DFJustin various www.retromags.com
20:23 🔗 DFJustin one-offs http://electrickery.xs4all.nl/comp/dai/doc/ http://www.apple2online.com/index.php?p=1_65_Apple-IIGS-Buyer-s-Guide http://www.apple2online.com/index.php?p=1_70_inCider-Magazine http://www.apple2online.com/index.php?p=1_53_Newsletters http://bitsavers.org/pdf/creativeSolutions/ http://bitsavers.org/pdf/ti/ti-mix/
20:25 🔗 DFJustin http://www.oldgamemags.com/index.php?title=Magazine_Index
20:32 🔗 DFJustin another japanese magazine http://narod.ru/disk/23643150000/Super%20soft%20magazine.rar.html archive password is retropc98.narod.ru
20:33 🔗 DFJustin the filename encoding inside is hosed though fyi
20:56 🔗 SketchCow I'd like ones with sets of jpegs and PDFs, please.
20:56 🔗 SketchCow Also, just e-mail me, jason@textfiles.com, no need to fill the channel.
20:57 🔗 DFJustin hehe
20:57 🔗 SketchCow It was not enjoyable, pulling down 1,000 issues of french magazines. I would like to replace them.
20:57 🔗 SketchCow And when I say pulling down, they're in the archive forever.
20:58 🔗 DFJustin those should all be jpg/pdf sets
21:01 🔗 alard PepsiMax: I thought you had started uploading to batcave? Should I switch the rsync back on? How much more do you have?
21:10 🔗 PepsiMax Let me see
21:15 🔗 PepsiMax alard: 42G /mnt/extdisk/archiveteam/anyhub-grab/
21:18 🔗 alard PepsiMax: I have 48G from you here, so perhaps part of that is already sent?
21:18 🔗 alard (I reopened the rsync port, by the way.)
21:19 🔗 alard It's a bit weird to see that AnyHub is back, by the way.
21:21 🔗 underscor Yeah, I noticed that too
21:21 🔗 alard We should make a rule for that: a site can only die once.
21:22 🔗 alard We can't keep archiving them.
21:25 🔗 alard I mean: take Google Video. You go through all the trouble to archive it, and then it doesn't die. That's not fair, is it?
21:29 🔗 PepsiMax So, well, yeah I noticed
21:29 🔗 SketchCow That's fine.
21:29 🔗 PepsiMax I would not bother actually
21:29 🔗 SketchCow I mean, we cause some of these to happen.
21:29 🔗 SketchCow I am fine with "Oh yeah, fuck you, a second option has arrived" followed by "Oh, well, then."
21:29 🔗 SketchCow Which is what happened in both those cases, anyhub and google video
21:30 🔗 SketchCow We should take pride in that, the two choices aren't the site stays up or it's deleted.
21:31 🔗 PepsiMax OK, true dat. alard:
21:31 🔗 PepsiMax alard: I'm sending what you don't have yet.
22:42 🔗 chronomex alard: I'm still getting splinder, what is this about closing rsync?
23:10 🔗 dashcloud this is pretty interesting: http://blogs.loc.gov/digitalpreservation/2011/12/providing-access-to-70-million-copyright-records/
23:17 🔗 Paradoks So they're digitizing their card catalog for 1870 to 1977. Does this mean that 1977-present is already digital? Regardless, yes, interesting.
23:17 🔗 chronomex yes. the feds had a big digital push in the 1970s.
23:19 🔗 dashcloud registration wasn't required after 1977
23:19 🔗 dashcloud every work was assumed to be in copyright unless the author specifically disclaimed it
23:20 🔗 dashcloud you could register it, which gave you a boost in case you ever needed to present it in a courtroom
23:21 🔗 dashcloud what would be interesting is seeing if there are any previously-unknown public domain works to be found in that catalog
23:37 🔗 chronomex ah, yeah.
23:52 🔗 yipdw son of a
23:52 🔗 yipdw ERROR (3).
23:52 🔗 yipdw Error downloading 'it:Redazione'.
23:52 🔗 yipdw -rw-rw-r-- 1 ec2-user ec2-user 1216211055 Dec 10 04:38 splinder.com-Redazione-blog-journal.splinder.com.warc.gz
23:52 🔗 chronomex ffffuuuuuu
23:53 🔗 yipdw oh wtf
23:53 🔗 yipdw there's errors like this
23:53 🔗 yipdw Cannot write to `./tmpfs/it/Redazione/www.splinder.com/search/profile?from=480&i=la + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +musica' (File name too long).
23:53 🔗 yipdw you know what, I'm just going to upload what I've got of Redazione
23:53 🔗 yipdw I suspect that there is no way to actually archive it without errors at this point
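The "File name too long" error arises when a single path component exceeds the filesystem's limit. One common workaround, shown here purely as an illustration and not as anything the download tool in the log does, is to truncate the name and append a digest of the full name. The 255-byte limit and the method name are assumptions.

```ruby
require 'digest/sha1'

MAX_NAME_BYTES = 255 # common per-component limit on Linux filesystems (assumed)

# Shortens one path component to fit `limit` bytes, appending a SHA-1 of the
# full name so that distinct long names stay distinct.
def safe_component(name, limit = MAX_NAME_BYTES)
  return name if name.bytesize <= limit
  digest = Digest::SHA1.hexdigest(name) # 40 hex characters
  keep = limit - digest.length - 1      # leave room for "-" plus the digest
  # scrub('') drops any multibyte character cut in half by byteslice
  "#{name.byteslice(0, keep).scrub('')}-#{digest}"
end

long_name = 'profile?from=480&i=la' + '+' * 300 + 'musica'
short = safe_component(long_name)
puts short.bytesize
```

The digest suffix keeps the mapping deterministic, so the same URL always lands in the same file across runs.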
23:54 🔗 chronomex hahaha wtf
23:54 🔗 chronomex nice filename
23:59 🔗 bsmith093 no one will miss one, what looks like a search page
