Time |
Nickname |
Message |
00:28
🔗
|
Coderjoe |
who needs to set down standards for a parallel project? |
00:31
🔗
|
underscor |
[NFSW] http://i.imgur.com/qVOmg.jpg |
00:31
🔗
|
Coderjoe |
what's ns abuot that? |
00:33
🔗
|
underscor |
Reread the tag |
00:33
🔗
|
chronomex |
tag? |
00:33
🔗
|
chronomex |
oh god damnit |
00:33
🔗
|
underscor |
hahahahaha |
00:36
🔗
|
Coderjoe |
argh |
00:37
🔗
|
Coderjoe |
lysdexia |
00:38
🔗
|
underscor |
:D |
01:11
🔗
|
bsmith093 |
http://www.gsc-game.com/ is ready to be rsynced |
01:14
🔗
|
Coderjoe |
the info should still be the same. just put it in a different directory on the server |
01:16
🔗
|
bsmith093 |
got it |
01:21
🔗
|
DFJustin |
thanks |
01:22
🔗
|
bsmith093 |
I'm uploading it as a single targz, with the cdx, warc, and site folder dump, in it |
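A single tar.gz holding the CDX, the WARC, and the site folder could be built roughly as follows. This is only a sketch in Python: the function name `bundle_grab` and the file names in the usage note are hypothetical, not the ones actually uploaded.

```python
import tarfile
from pathlib import Path


def bundle_grab(output: Path, members: list[Path]) -> list[str]:
    """Write every path in `members` into one gzip-compressed tar archive.

    Directories are added recursively. Returns the names stored in the archive.
    """
    with tarfile.open(output, "w:gz") as archive:
        for member in members:
            # arcname keeps only the final path component, so the archive
            # does not carry the local directory layout with it.
            archive.add(member, arcname=member.name)
    # Reopen the finished archive to report what it actually contains.
    with tarfile.open(output, "r:gz") as archive:
        return archive.getnames()
```

Called as `bundle_grab(Path("grab.tar.gz"), [Path("site.cdx"), Path("site.warc.gz"), Path("site")])`, it would pack the two files and the folder tree into one upload.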
01:26
🔗
|
bsmith093 |
how big is batcave, anyway, it must be enormous? multi dozens of terabytes? |
01:37
🔗
|
Coderjoe |
bsmith093: SketchCow said once in one of his talks, but I forget the number he gave |
01:37
🔗
|
Coderjoe |
the system also has 96GB of ram |
01:46
🔗
|
bsmith093 |
! |
01:46
🔗
|
bsmith093 |
Jiminy christmas, that's a lot of memory |
02:42
🔗
|
yipdw| |
bsmith093: btw, if the crawler froze on you, you may be hitting frustrating issues with Ruby 1.9.3's threading capabilities |
02:43
🔗
|
yipdw| |
I suggest a Ruby implementation that gets threads more right, like JRuby or Rubinius |
02:43
🔗
|
bsmith093 |
how do i get this to use that? |
02:44
🔗
|
bsmith093 |
at this point, i only vaguely understand the code, and how it does what it's doing |
02:45
🔗
|
yipdw| |
rvm install jruby, rvm use jruby, bundle install, run it |
02:45
🔗
|
yipdw| |
also fetching latest would help |
02:46
🔗
|
bsmith093 |
ah now that's concise, thanks :) a lot, seriously, it took me forever to figure out rvm |
02:46
🔗
|
yipdw| |
rvm help |
02:46
🔗
|
bsmith093 |
rvm fetch latest? |
02:46
🔗
|
yipdw| |
fetching latest version of the crawler, I mean |
02:47
🔗
|
bsmith093 |
oh right |
02:48
🔗
|
yipdw| |
the crawler has been adjusted to cache data for 3 days |
02:48
🔗
|
bsmith093 |
hey it updated since several hours ago! |
02:48
🔗
|
yipdw| |
because although their pages change frequently, the stories themselves do not |
02:48
🔗
|
yipdw| |
the things that update are (I think) things like hit count |
02:48
🔗
|
yipdw| |
which messes up the Last-Modified header |
02:48
🔗
|
yipdw| |
so I made a guess |
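The three-day cache window described above amounts to a freshness check. The sketch below is a Python illustration, not the crawler's actual Ruby code; `CACHE_TTL` and `is_fresh` are hypothetical names.

```python
from datetime import datetime, timedelta

# Assumption: the three-day window mentioned in the chat, as a fixed TTL.
CACHE_TTL = timedelta(days=3)


def is_fresh(fetched_at: datetime, now: datetime, ttl: timedelta = CACHE_TTL) -> bool:
    """Return True while a cached page is still inside its time-to-live."""
    return now - fetched_at < ttl
```

A page fetched two days ago would be served from cache, while one fetched three or more days ago would be requested again.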
02:49
🔗
|
bsmith093 |
oh so the last modified date, string checker won't do dupes? |
02:49
🔗
|
yipdw| |
the Last-Modified date is stored and is sent with subsequent requests as an If-Modified-Since header |
02:49
🔗
|
yipdw| |
if the page hasn't changed since that date, their server returns a 304 |
02:49
🔗
|
yipdw| |
and the cached result is used |
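The exchange just described (store Last-Modified, send it back as If-Modified-Since, reuse the cache on a 304) can be sketched like this. It is a Python illustration only: the `fetch` callable, the cache layout, and the function name are assumptions, and a plain dict stands in for whatever store the crawler really uses.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical fetcher: (url, request_headers) -> (status, response_headers, body)
Fetcher = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str], bytes]]


def fetch_with_validation(url: str, cache: Dict[str, dict], fetch: Fetcher) -> bytes:
    """Fetch `url`, reusing the cached body when the server answers 304."""
    entry: Optional[dict] = cache.get(url)
    request_headers: Dict[str, str] = {}
    if entry and entry.get("last_modified"):
        # Echo the stored Last-Modified value back as If-Modified-Since.
        request_headers["If-Modified-Since"] = entry["last_modified"]

    status, response_headers, body = fetch(url, request_headers)
    if status == 304 and entry is not None:
        # Not modified since that date: the cached result is used.
        return entry["body"]

    # Simplification: any other response replaces the cache entry. A real
    # HTTP client would offer case-insensitive header lookup; this sketch
    # assumes the exact name "Last-Modified".
    cache[url] = {"last_modified": response_headers.get("Last-Modified"), "body": body}
    return body
```

Passing the fetcher in as a parameter keeps the sketch free of network calls, so the 304 path can be exercised with a stub.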
02:51
🔗
|
bsmith093 |
where are all these headers stored, server, or in the pages themselves, or what? |
02:51
🔗
|
yipdw| |
Redis |
02:52
🔗
|
yipdw| |
more specifically, just the Last-Modified date and a cache flag |
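A per-page record holding just those two fields might look like the sketch below. A plain dict stands in for Redis so the example runs without a server; the `page:` key prefix, the field names, and both function names are hypothetical, and with a real Redis client the same fields could live in a hash.

```python
from typing import Dict, Optional, Tuple

# Stand-in for Redis: key -> hash of string fields.
Store = Dict[str, Dict[str, str]]


def cache_key(url: str) -> str:
    """Build the (hypothetical) key under which a page's state is kept."""
    return f"page:{url}"


def save_page_state(store: Store, url: str, last_modified: str, cached: bool) -> None:
    """Record the Last-Modified date and a cache flag for one page."""
    # Redis hash fields are strings, so the flag is encoded as "1" or "0".
    store[cache_key(url)] = {
        "last_modified": last_modified,
        "cached": "1" if cached else "0",
    }


def load_page_state(store: Store, url: str) -> Optional[Tuple[str, bool]]:
    """Return (last_modified, cached) for a page, or None if unseen."""
    fields = store.get(cache_key(url))
    if fields is None:
        return None
    return fields["last_modified"], fields["cached"] == "1"
```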
02:53
🔗
|
Coderjoe |
do you also check for etags? (does their server even use etags?) |
02:54
🔗
|
yipdw| |
:P |
02:54
🔗
|
yipdw| |
no, and no |
02:55
🔗
|
bsmith093 |
i meant does any given webpage store its metadata, like the last modified date? within itself, or do you ask the server? |
02:55
🔗
|
yipdw| |
the headers are present in an HTTP request |
02:55
🔗
|
yipdw| |
they can be generated by a combination of the Web server and the Web application |
02:56
🔗
|
bsmith093 |
ohhh, yeah we just got to wireshark in my networking class, and omg there is a lot of info per packet, most of which is metadata |
02:57
🔗
|
Coderjoe |
a little higher than that |
02:57
🔗
|
yipdw| |
Coderjoe: honestly I'm not sure what the point of checking both Last-Modified and ETag is |
02:57
🔗
|
bsmith093 |
this must be why the next class is web apps |
02:58
🔗
|
yipdw| |
check one or the other, but both seems like a "what if the application is wrong" scenario |
02:59
🔗
|
Coderjoe |
more for if the server didn't return LM but did return an etag |
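The fallback Coderjoe describes, using an ETag when no Last-Modified came back, could be handled by building the request headers from whichever validators are on hand. This is a Python sketch with a hypothetical function name; it is not something the crawler is known to do.

```python
from typing import Dict, Optional


def conditional_headers(last_modified: Optional[str], etag: Optional[str]) -> Dict[str, str]:
    """Build conditional-request headers from the validators that were stored."""
    headers: Dict[str, str] = {}
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    if etag:
        # ETags are sent back verbatim, quotes included.
        headers["If-None-Match"] = etag
    return headers
```

When both headers are sent, HTTP servers are expected to evaluate If-None-Match and ignore If-Modified-Since, so including both does no harm.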
02:59
🔗
|
yipdw| |
oh |
02:59
🔗
|
yipdw| |
no, they seem to always return Last-Modified |
02:59
🔗
|
yipdw| |
although the value is not useful |
02:59
🔗
|
yipdw| |
for story archiving purposes |
03:00
🔗
|
yipdw| |
correction, it's not *that* useful |
03:00
🔗
|
bsmith093 |
Building Nailgun jruby-1.6.5 - #installing to /home/ben/.rvm/rubies/jruby-1.6.5 jruby-1.6.5 - #importing default gemsets (/home/ben/.rvm/gemsets/ Copying across included gems ERROR: While executing gem ... (URI::InvalidURIError) bad URI(is not URI?): http://localhost:4001 |
03:01
🔗
|
yipdw| |
I've never seen that before |
03:01
🔗
|
bsmith093 |
update rvm? |
03:01
🔗
|
yipdw| |
I don't know why rubygems is trying to hit localhost:4001 |
03:04
🔗
|
bsmith093 |
i just checked the link in ff8, it took me to what looked like an internal error page, which would make sense, but it had external links to jodymym,xom |
03:04
🔗
|
bsmith093 |
https://anonymous-proxy-servers.net/, well this, anyway |
03:05
🔗
|
yipdw| |
uh |
03:05
🔗
|
yipdw| |
localhost is your machine |
03:05
🔗
|
yipdw| |
if you're seeing that from going to localhost:4001, then you may be running something weird |
03:08
🔗
|
bsmith093 |
i just checked, apparently it was some weird proxy package i'd installed and forgotten about, anon-proxy |
03:10
🔗
|
bsmith093 |
purged it and i reran and i get this Using /home/ben/.rvm/gems/jruby-1.6.5 |
03:10
🔗
|
bsmith093 |
/usr/local/lib/site_ruby/1.8/rubygems/dependency.rb:247:in `to_specs': Could not find bundler (>= 0) amongst [] (Gem::LoadError) |
03:10
🔗
|
bsmith093 |
from /usr/bin/bundle:18 |
03:10
🔗
|
bsmith093 |
from /usr/local/lib/site_ruby/1.8/rubygems.rb:1203:in `gem' |
03:10
🔗
|
bsmith093 |
from /usr/local/lib/site_ruby/1.8/rubygems/dependency.rb:256:in `to_spec' |
03:12
🔗
|
yipdw| |
gem install bundler |
03:12
🔗
|
bsmith093 |
localhost 4001 bad uri error |
03:13
🔗
|
bsmith093 |
4001 isn't even in my services file |
03:14
🔗
|
yipdw| |
paste me the result of gem env |
03:15
🔗
|
bsmith093 |
- EXECUTABLE DIRECTORY: /home/ben/.rvm/gems/jruby-1.6.5/bin |
03:15
🔗
|
bsmith093 |
- INSTALLATION DIRECTORY: /home/ben/.rvm/gems/jruby-1.6.5 |
03:15
🔗
|
bsmith093 |
- RUBY EXECUTABLE: /home/ben/.rvm/rubies/jruby-1.6.5/bin/jruby |
03:15
🔗
|
bsmith093 |
- RUBY VERSION: 1.8.7 (2011-10-25 patchlevel 330) [java] |
03:15
🔗
|
bsmith093 |
- RUBYGEMS PLATFORMS: |
03:15
🔗
|
bsmith093 |
- RUBYGEMS VERSION: 1.8.9 |
03:15
🔗
|
bsmith093 |
RubyGems Environment: |
03:15
🔗
|
bsmith093 |
- ruby |
03:15
🔗
|
bsmith093 |
- universal-java-1.6 |
03:15
🔗
|
bsmith093 |
- GEM PATHS: |
03:15
🔗
|
bsmith093 |
- /home/ben/.rvm/gems/jruby-1.6.5 |
03:15
🔗
|
bsmith093 |
- /home/ben/.rvm/gems/jruby-1.6.5@global |
03:15
🔗
|
bsmith093 |
- :update_sources => true |
03:15
🔗
|
bsmith093 |
- GEM CONFIGURATION: |
03:15
🔗
|
bsmith093 |
ok sorry should have used pastebin |
03:17
🔗
|
bsmith093 |
http://pastebin.com/dpQY4HVw |
03:19
🔗
|
yipdw| |
I'm not sure why that's resolving as it is |
03:19
🔗
|
yipdw| |
evidently rubygems.org on your machine is now resolving to localhost:4001 |
03:19
🔗
|
yipdw| |
you'll have to fix that |
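One place such a redirect can linger, offered as an assumption rather than a diagnosis of this machine, is the proxy environment variables that many command-line tools read. The Python sketch below lists any that are set; `find_proxy_settings` and `PROXY_VARS` are hypothetical names.

```python
import os
from typing import Dict, Mapping

# Conventional proxy variable names, compared case-insensitively.
PROXY_VARS = ("http_proxy", "https_proxy", "all_proxy")


def find_proxy_settings(environ: Mapping[str, str]) -> Dict[str, str]:
    """Return the proxy-related variables that are set to a non-empty value."""
    found: Dict[str, str] = {}
    for name, value in environ.items():
        if name.lower() in PROXY_VARS and value:
            found[name] = value
    return found


if __name__ == "__main__":
    # Print whatever proxy settings the current shell environment carries.
    for name, value in find_proxy_settings(os.environ).items():
        print(f"{name}={value}")
```

A leftover entry pointing at `localhost:4001` would be a candidate to unset before rerunning the install.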
03:22
🔗
|
bsmith093 |
works fine in firefox, where do i look? |
03:23
🔗
|
bsmith093 |
to fix it locally |
03:23
🔗
|
bsmith093 |
should i just reinstall rvm? |
03:57
🔗
|
dnova |
wiki down? |
03:57
🔗
|
dnova |
oh there it is. |
05:50
🔗
|
SketchCow |
Still fixing french magazines. |
12:19
🔗
|
Schbirid |
somewhere there is a website with old computer system character maps |
12:19
🔗
|
Schbirid |
anyone know the url? |
12:19
🔗
|
chronomex |
are you looking for one in particular? |
12:20
🔗
|
Schbirid |
nope, looking for the site itself |
12:20
🔗
|
Schbirid |
had the maps on the right side iirc |
12:20
🔗
|
Schbirid |
nav on the left |
12:21
🔗
|
chronomex |
oh yeah, that one |
12:21
🔗
|
chronomex |
with the text |
12:21
🔗
|
Schbirid |
dark background |
12:21
🔗
|
Schbirid |
haha |
12:21
🔗
|
chronomex |
what color was it? |
12:21
🔗
|
dnova |
lol |
12:21
🔗
|
chronomex |
there was bl |
12:21
🔗
|
chronomex |
there was black in it |
12:21
🔗
|
Schbirid |
hey, i got a memory that works like that |
12:21
🔗
|
dnova |
I found it: http://www.compukiss.com/basics/symbols-character-map-2.html |
12:21
🔗
|
dnova |
compu-KISS |
12:22
🔗
|
dnova |
with Sandy Berger |
12:22
🔗
|
Schbirid |
i was wrong but i found it http://damieng.com/blog/2011/03/27/typography-in-16-bits-system-fonts |
12:22
🔗
|
chronomex |
dang, you gots some search engine skillz |
12:22
🔗
|
chronomex |
where is the google option for "mostly black site"? |
12:23
🔗
|
Schbirid |
calm down |
20:05
🔗
|
PepsiMax |
I still have anyhub data, where can I drop it? alard's rsync doesn't seem to work anymore. |
20:14
🔗
|
SketchCow |
HEY SO |
20:15
🔗
|
SketchCow |
If anyone has any other caches of magazines they find online, would love to find them. |
20:15
🔗
|
SketchCow |
French is temporarily fired. |
20:19
🔗
|
* |
DFJustin 's ears perk up |
20:20
🔗
|
DFJustin |
hungarian http://pcvilag.muskatli.hu/irodalom/begins/news.index.html |
20:20
🔗
|
DFJustin |
spanish http://www.konamito.com/publicaciones-msx/ |
20:21
🔗
|
DFJustin |
various www.retromags.com |
20:23
🔗
|
DFJustin |
one-offs http://electrickery.xs4all.nl/comp/dai/doc/ http://www.apple2online.com/index.php?p=1_65_Apple-IIGS-Buyer-s-Guide http://www.apple2online.com/index.php?p=1_70_inCider-Magazine http://www.apple2online.com/index.php?p=1_53_Newsletters http://bitsavers.org/pdf/creativeSolutions/ http://bitsavers.org/pdf/ti/ti-mix/ |
20:25
🔗
|
DFJustin |
http://www.oldgamemags.com/index.php?title=Magazine_Index |
20:32
🔗
|
DFJustin |
another japanese magazine http://narod.ru/disk/23643150000/Super%20soft%20magazine.rar.html archive password is retropc98.narod.ru |
20:33
🔗
|
DFJustin |
the filename encoding inside is hosed though fyi |
20:56
🔗
|
SketchCow |
I'd like ones with sets of jpegs and PDFs, please. |
20:56
🔗
|
SketchCow |
Also, just e-mail me, jason@textfiles.com, no need to fill the channel. |
20:57
🔗
|
DFJustin |
hehe |
20:57
🔗
|
SketchCow |
It was not enjoyable, pulling down 1,000 issues of french magazines. I would like to replace them. |
20:57
🔗
|
SketchCow |
And when I say pulling down, they're in the archive forever. |
20:58
🔗
|
DFJustin |
those should all be jpg/pdf sets |
21:01
🔗
|
alard |
PepsiMax: I thought you had started uploading to batcave? Should I switch the rsync back on? How much more do you have? |
21:10
🔗
|
PepsiMax |
Let me see |
21:15
🔗
|
PepsiMax |
alard: 42G /mnt/extdisk/archiveteam/anyhub-grab/ |
21:18
🔗
|
alard |
PepsiMax: I have 48G from you here, so perhaps part of that is already sent? |
21:18
🔗
|
alard |
(I reopened the rsync port, by the way.) |
21:19
🔗
|
alard |
It's a bit weird to see that AnyHub is back, by the way. |
21:21
🔗
|
underscor |
Yeah, I noticed that too |
21:21
🔗
|
alard |
We should make a rule for that: a site can only die once. |
21:22
🔗
|
alard |
We can't keep archiving them. |
21:25
🔗
|
alard |
I mean: take Google Video. You go through all the trouble to archive it, and then it doesn't die. That's not fair, is it? |
21:29
🔗
|
PepsiMax |
So, well, yeah I noticed |
21:29
🔗
|
SketchCow |
That's fine. |
21:29
🔗
|
PepsiMax |
I would not bother actually |
21:29
🔗
|
SketchCow |
I mean, we cause some of these to happen. |
21:29
🔗
|
SketchCow |
I am fine with "Oh yeah, fuck you, a second option has arrived" followed by "Oh, well, then." |
21:29
🔗
|
SketchCow |
Which is what happened in both those cases, anyhub and google video |
21:30
🔗
|
SketchCow |
We should take pride in that, the two choices aren't the site stays up or it's deleted. |
21:31
🔗
|
PepsiMax |
OK, true dat. alard: |
21:31
🔗
|
PepsiMax |
alard: I'm sending what you don't have yet. |
22:42
🔗
|
chronomex |
alard: I'm still getting splinder, what is this about closing rsync? |
23:10
🔗
|
dashcloud |
this is pretty interesting: http://blogs.loc.gov/digitalpreservation/2011/12/providing-access-to-70-million-copyright-records/ |
23:17
🔗
|
Paradoks |
So they're digitizing their card catalog for 1870 to 1977. Does this mean that 1977-present is already digital? Regardless, yes, interesting. |
23:17
🔗
|
chronomex |
yes. the feds had a big digital push in the 1970s. |
23:19
🔗
|
dashcloud |
registration wasn't required after 1977 |
23:19
🔗
|
dashcloud |
every work was assumed to be in copyright unless the author specifically disclaimed it |
23:20
🔗
|
dashcloud |
you could register it, which gave you a boost in case you ever needed to present it in a courtroom |
23:21
🔗
|
dashcloud |
what would be interesting is seeing if there are any previously-unknown public domain works to be found in that catalog |
23:37
🔗
|
chronomex |
ah, yeah. |
23:52
🔗
|
yipdw |
son of a |
23:52
🔗
|
yipdw |
ERROR (3). |
23:52
🔗
|
yipdw |
Error downloading 'it:Redazione'. |
23:52
🔗
|
yipdw |
-rw-rw-r-- 1 ec2-user ec2-user 1216211055 Dec 10 04:38 splinder.com-Redazione-blog-journal.splinder.com.warc.gz |
23:52
🔗
|
chronomex |
ffffuuuuuu |
23:53
🔗
|
yipdw |
oh wtf |
23:53
🔗
|
yipdw |
there's errors like this |
23:53
🔗
|
yipdw |
Cannot write to `./tmpfs/it/Redazione/www.splinder.com/search/profile?from=480&i=la + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +musica' (File name too long). |
23:53
🔗
|
yipdw |
you know what, I'm just going to upload what I've got of Redazione |
23:53
🔗
|
yipdw |
I suspect that there is no way to actually archive it without errors at this point |
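The "File name too long" failure comes from a per-component length limit, commonly 255 bytes on Linux filesystems. A crawler that controls its own output names could shorten an overlong component and append a hash so distinct URLs still map to distinct files. The Python sketch below illustrates that idea under those assumptions; `safe_component` is a hypothetical helper, not part of the tool that produced the error above.

```python
import hashlib

# Assumption: 255 bytes, a common per-component limit on Linux filesystems.
MAX_COMPONENT_BYTES = 255


def safe_component(name: str, limit: int = MAX_COMPONENT_BYTES) -> str:
    """Return `name` unchanged if it fits, else a truncated, hash-suffixed form.

    Assumes `limit` is larger than the 41 bytes taken by the "-<sha1>" suffix.
    """
    raw = name.encode("utf-8")
    if len(raw) <= limit:
        return name
    digest = hashlib.sha1(raw).hexdigest()  # 40 hex characters
    keep = limit - len(digest) - 1  # leave room for "-" plus the digest
    # errors="ignore" drops a multi-byte character cut in half by the slice.
    prefix = raw[:keep].decode("utf-8", errors="ignore")
    return f"{prefix}-{digest}"
```

The hash is computed over the full original name, so two long names sharing the same first few hundred bytes still end up with different results.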
23:54
🔗
|
chronomex |
hahaha wtf |
23:54
🔗
|
chronomex |
nice filename |
23:59
🔗
|
bsmith093 |
no one will miss one, what looks like a search page |