Time | Nickname | Message
00:39 | pberry | slow mobile me is still slow
01:54 | SketchCow | I can.t put my finger on the precise Reasons.
01:54 | SketchCow | I think the Audio-Quality needs work.
01:54 | SketchCow | I hear a bit of echo near the start.
01:54 | SketchCow | The Face (especially the eyes) is in some shots slightly out of Focus.
01:54 | SketchCow | The Moving shot is a bit of a risk with fixed lenses.
01:54 | zetathust | for mug shots
01:54 | SketchCow | The Focussing Bit near the End distracts a bit (I think Camera Lenses overshoot to fast when adjusting).
01:54 | SketchCow | Barely literate critics, have to love them.
02:10 | underscor | hahaha
02:19 | rude___ | Tell him that certain types of eyes simply absorb vast amounts of light into their cones, throwing the shot slightly out of focus.. it's rare, it's unavoidable, shit happens.
02:19 | underscor | lol
02:20 | yipdw | "camera lenses overshoot to fast when adjusting"
02:20 | zetathust | lenses
02:20 | yipdw | what
02:22 | yipdw | eesh, yikes
02:22 | yipdw | [ec2-user@ip-10-243-119-16 files.splinder.com]$ pwd; ls -1 | wc -l
02:22 | yipdw | 22046
02:34 | * | SketchCow boots zetathust, and does this: http://www.youtube.com/watch?v=Mu71EAdnjQ0
02:43 | dashcloud | if someone's got a better place for me to upload the 7z tell me, otherwise here's a link to it on mediafire: http://www.mediafire.com/?49kgs4umrb79a34
02:44 | PatC | If you get a dropbox you can upload it there and copy a public url
02:44 | dashcloud | I don't actually
02:45 | dashcloud | I don't really have anywhere else to put it online that I can share from, so my apologies there- but it is a pretty small download
02:58 | SketchCow | Why not just throw on batcave?
02:58 | SketchCow | I can make it browsable.
03:02 | dashcloud | I don't have any logins or access- if you want to PM me something, I can throw it up there right away
03:03 | Coderjoe | you don't have rsync?
03:05 | dashcloud | I do have rsync
03:10 | dashcloud | so how would I go out using rsync to get the folder onto batcave?
03:15 | dashcloud | okay- it's uploading to batcave
03:19 | dashcloud | okay- it's up there
03:23 | dashcloud | just as a note- there are some gaps in the archive, because some pages the site points to simply aren't there anymore
03:33 | bsmith093 | anything for the ffnet scrape
03:36 | bsmith093 | is it possible to upload directly into ia, using ftp
03:47 | chronomex | no, but you can use http
03:47 | chronomex | http://www.archive.org/help/abouts3.txt
03:47 | yipdw | oh, that's awesome
03:47 | yipdw | I didn't know IA had an S3 interface
03:48 | yipdw | that means I can reuse AWS::S3 and all the fun related bits
03:48 | chronomex | yeah, it's super rad.
03:49 | chronomex | that, and being able to specify metadata with http headers, means you can drop items into archive.org from shellscripts with 0 hassle
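(A minimal sketch of the header-driven upload chronomex describes, in Ruby with net/http; the item name, file, and keys below are placeholders, and the authoritative header list is in the abouts3.txt link above.)

    # PUT a file into an archive.org item, with metadata carried entirely in headers.
    require 'net/http'
    require 'uri'

    uri = URI('http://s3.us.archive.org/example-item/example.warc.gz')

    req = Net::HTTP::Put.new(uri.request_uri)
    req['authorization']            = 'LOW ACCESS_KEY:SECRET_KEY'
    req['x-amz-auto-make-bucket']   = '1'      # create the item if it does not exist yet
    req['x-archive-meta-mediatype'] = 'web'
    req['x-archive-meta-title']     = 'Example WARC upload'
    req.body = File.binread('example.warc.gz')

    res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
    puts res.code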
03:49 | yipdw | I quite like that
03:51 | DFJustin | http://www.archive.org/create.php?ftp=1
03:53 | chronomex | oh, yeah, you can do that but it's kind of lousy.
03:58 | bsmith093 | yes but its easier to do that , than to use ftp to login firsat and create the mxml by hand
03:58 | bsmith093 | btw, as a library, IA kicks LoC in the nutsack
03:59 | DFJustin | loc's catalog is really nice
04:01 | bsmith093 | mostly because, when I search for anything in the Archvie, i can take the url of the *search* and dump it into jdownloader, which will then proceed to load and look for links, and find *every single result on the page*,and give me human readable links for them so i can poick and choose, without even having o click on each individual result
04:02 | bsmith093 | seriously thought, LoC website search is worse than useless, because it makes me give up, rather than keep looking, its just that bad
04:03 | DFJustin | yeah the interface sucks
04:03 | DFJustin | but at least the metadata is correct and somewhat consistent
04:03 | underscor | yipdw: Unless you need to create items with directories
04:03 | underscor | Then it sucks
04:03 | underscor | Although, it's a lot easier now that I have internal access
04:04 | yipdw | oh, I was thinking of using it to shove WARCs at the IA
04:05 | underscor | Then it's probably perfect
04:07 | SketchCow | TECHNICALLY it's not S3
04:07 | SketchCow | It's S3 like.
04:08 | SketchCow | Until this calms down: http://www.archive.org/~tracey/mrtg/derivesg.html
04:08 | SketchCow | I'll be focusing on other things.
04:10 | underscor | Oh wow
04:11 | underscor | It's mostly ximm with all his forever-running heritrix crawls
04:14 | bsmith093 | yipdw: but wouldn't you need to hand vreate an xml for each warc file?
04:15 | yipdw | bsmith093: why would I need to hand-create it
04:16 | underscor | S3 automatically creates the necessary XML based off of the headers you pass in
05:13 | SketchCow | http://www.poe-news.com/forums/sp.php?pi=1002546492
05:13 | SketchCow | poe-news.com has announced they're shutting down.
05:14 | bsmith093 | start the warc
05:19 | bsmith093 | this good? wget-warc -mpke robots=off -U "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" --warc-cdx --warc-file=poe-news.com_12022011 www.poe-news.com
05:20 | dnova | bsmith093: can you give me a very succinct idea of the current state of ffnet project?
05:20 | dnova | or a very meandering, sloppy narrative
05:20 | dnova | that'll work too
05:21 | bsmith093 | have ideas, cant code, got someihing half baked and don ish
05:21 | bsmith093 | underscor's working on a script to grab reviews and stories with storyinator
05:28 | bsmith093 | im just iterating thriugh every possible ffnet id, and culling the bad ones to make a linklist
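(The brute-force sweep bsmith093 describes might look roughly like this in Ruby; the /s/<id>/1/ URL pattern, the "Story Not Found" marker, the ID range, and the output file are all assumptions, and underscor's actual script is likely different.)

    # Walk candidate story IDs and keep the ones that resolve to a real story.
    require 'net/http'

    File.open('linklist.txt', 'w') do |out|
      (1..10_000_000).each do |id|
        path = "/s/#{id}/1/"
        res  = Net::HTTP.get_response('www.fanfiction.net', path)
        # Assumption: live stories return 200 without a "Story Not Found" notice.
        if res.is_a?(Net::HTTPSuccess) && !res.body.include?('Story Not Found')
          out.puts "http://www.fanfiction.net#{path}"
        end
        sleep 0.5   # be gentle; at this rate a full sweep is exactly why this approach is slow
      end
    end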
05:28
๐
|
bsmith093 |
underscor's way is almost certianly faster |
05:32
๐
|
arrith |
spidering the site like yipdw suggested might be the fastest |
05:33
๐
|
dnova |
arrith: can you explain #2 in the "extra credit"? |
05:33
๐
|
dnova |
http://learnpythonthehardway.org/book/ex8.html |
05:35
๐
|
arrith |
dnova: notice where double quotes get used versus where single quotes get used |
05:35
๐
|
arrith |
there's something unique about he double quoted sentence |
05:35
๐
|
dnova |
OH. |
05:35
๐
|
dnova |
the single quote wasn't escaped |
05:36
๐
|
arrith |
kinda |
05:36
๐
|
arrith |
just that there is a single quote |
05:36
๐
|
arrith |
usually when there's a single quote people use doubles |
05:36
๐
|
dnova |
hmph. well ok. thanks :) |
05:36
๐
|
arrith |
but yeah, you can escape it |
05:37
๐
|
arrith |
i dunno actually if people usually escape or not |
05:37
๐
|
arrith |
i've only seen doubles used then but i've only seen tutorialish code |
05:56
๐
|
bsmith093 |
spidering, i dont know how to tell wget to spider and save a linklist to then go back to |
05:56
๐
|
arrith |
not spider with wget |
05:56
๐
|
arrith |
spider with a ruby script that goes through the categories |
05:56
๐
|
bsmith093 |
also, on IA is it possible to edit an existing item? |
05:56
๐
|
arrith |
or python script |
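(A bare-bones sketch of the category spidering arrith is suggesting, using the mechanize gem; the /book/ index URL and the link patterns are guesses, not the contents of any actual script.)

    require 'mechanize'
    require 'set'

    agent   = Mechanize.new
    stories = Set.new

    index = agent.get('http://www.fanfiction.net/book/')
    index.links_with(href: %r{\A/book/[^/]+/\z}).each do |category|
      page = category.click
      page.links_with(href: %r{\A/s/\d+/}).each do |link|
        stories << link.href[%r{\A/s/(\d+)/}, 1]   # keep just the story ID
      end
    end

    puts "#{stories.size} story IDs found"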
06:00 | bsmith093 | wheres the script, and how do i runit?
06:01 | bsmith093 | ive got something by underscor from a repo, that looks like ruby
06:01 | arrith | there isn't one
06:01 | arrith | you gotta make it
06:02 | bsmith093 | ugh
06:10 | bsmith093 | pardon me by yipdw git://gist.github.com/1432483.git
06:10 | yipdw | eh?
06:10 | yipdw | oh
06:10 | bsmith093 | yeah hiws that going, any updates
06:10 | yipdw | yeah, I maintain that only hitting what you need to hit is the fastest way to do it
06:10 | yipdw | I haven't touched it since then
06:11 | yipdw | other work, etc.
06:11 | yipdw | I think arrith wanted to port it to Python
06:12 | yipdw | you can run it right now, if you have a Ruby 1.9 environment with the connection_pool, girl_friday and mechanize gems installed
06:14 | bsmith093 | ok , wonderful, now, how do i get those modules installed?
06:19 | bsmith093 | rubygems1.9.1 or 1.9
06:20 | arrith | haha
06:20 | arrith | yipdw: yeah i was basically waiting to see what underscor ends up with and go from there
06:20 | bsmith093 | seriously how do i get those ruby modules installed?
06:20 | arrith | possibly switching to a spidering method to get updates
06:31 | arrith | bsmith093: http://www.google.com/search?q=rubygems+ubuntu
06:31 | dnova | good god
06:31 | dnova | bsmith: how many stories are on ffnet?
06:33 | dnova | do we know?
06:35 | dnova | less than or equal to 10,000,000 it looks like?
06:36 | dnova | or: what is the highest valid ID you've found?
06:36 | bsmith093 | ~7million
06:37 | bsmith093 | can some kind [erson walk me through how to insatll girl_friday gem, ive found the darn thing but it wont install with gem install
06:41 | bsmith093 | https://github.com/mperham/girl_friday.git
06:43 | bsmith093 | anyone?
06:44 | dnova | I have no ruby experience, sorry.
06:46 | bsmith093 | arrith
06:46 | dnova | bsmith,
06:48 | dnova | I think you need to relax just a little bit
06:49 | dnova | I added the project to the wiki frontpage
06:49 | bsmith093 | yeah , i know, im overtired and really need to sleep
06:50 | chronomex | dnova: spot on.
06:50 | dnova | ooh, thanks, chronomex
06:51 | dnova | any ideas/critiques are welcome
06:51 | chronomex | I meant with respect to relaxing, but the link looks good :)
06:52 | dnova | oh, lol
06:52 | chronomex | dang, it's been two months since I've uploaded anything
06:53 | chronomex | get busy time
06:53 | dnova | are you running the fix-dld script or what
06:53 | dnova | where are you getting all those splinder profiles!!
06:53 | chronomex | me?
06:53 | chronomex | I'm fix-dld
06:53 | dnova | ahh figured :D
06:53 | chronomex | was offline for a while.
06:53 | dnova | I'm downloading 2 users. have been for like 4 days
06:53 | dnova | one is over 12gb
06:54 | dnova | one is over 3
06:54 | dnova | I lost one that was over 10gb because I ran out of ram+swap :(
06:54 | chronomex | using tmpfs?
06:54 | chronomex | tmpfs is only a good idea for when you're doing a bunch of threads simultaneously
06:55 | dnova | not the way its supposed to be (i.e., not a ramdisk)
06:55 | chronomex | ?
06:56 | chronomex | no, the upload I'm doing now is to archive.org and not an archiveteam thing.
06:56 | chronomex | http://www.archive.org/details/bellsystem_PK-1C901-01
06:56 | bsmith093 | well, gnight/gmorning ,all, im gonna go sleep like i should have done 2hrs ago bye
06:56 | dnova | bsmith093: sleep well.
06:56 | chronomex | sleep well!
06:56 | chronomex | arrrgh
06:57 | dnova | :D
06:57 | bsmith093 | chronomex: ook now what?
06:57 | chronomex | bsmith093: ?
06:57 | bsmith093 | you said aargh
06:58 | chronomex | nvm
06:59 | bsmith093 | k night bye
07:07 | dnova | heh.
07:31 | yipdw | bsmith093: easiest way to install it is to get a Ruby environment, get Bundler (gem install bundler), and then install all the gems in the bundle (bundle install)
16:42 | SketchCow | Ops, please
17:37 | DFJustin | http://rbelmont.mameworld.info/?p=689
19:03 | emijrp | SketchCow: http://fromthepage.balboaparkonline.org/display/display_page?ol=w_rw_p_pl&page_id=1363#page/n0/mode/1up
19:09 | SketchCow | Nice
19:09 | PepsiMax | Aww yeah
19:09 | PepsiMax | Got my new VDSL2 hooked up.
19:09 | PepsiMax | 263.90kB/s uploading to alard
20:57 | PepsiMax | alard: more anyhub is coming!
20:57 | bsmith093 | and i just installed the gem connection_pool
21:01 | bsmith093 | ok i got ruby gems to install finally, and their all setup, except im still getting this error ffgrab.rb:1:in `require': no such file to load -- connection_pool (LoadError)
21:02 | yipdw | bsmith093: ruby -v
21:02 | yipdw | actually, just send me your full terminal log
21:02 | bsmith093 | ruby 1.8.7 (2010-01-10 patchlevel 249) [i486-linux]
21:02 | yipdw | connection_pool does not work with Ruby 1.8.7, because it uses BasicObject, which only exists in Ruby 1.9
21:02 | yipdw | also, Ruby 1.9 automatically loads Rubygems; 1.8.7 doesn't
21:02 | bsmith093 | apt install ruby1.9
21:02 | yipdw | which is where the error you're seeing comes from
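(Put concretely: on 1.8.7 the script would also need an explicit require of Rubygems before any gem require, something like the two lines below, and even then connection_pool itself still wants 1.9.)

    require 'rubygems'          # only needed on Ruby 1.8.x; 1.9 loads Rubygems automatically
    require 'connection_pool'   # still fails on 1.8.7, since the gem relies on 1.9's BasicObject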
21:03 | bsmith093 | cause i think i did that
21:03 | bsmith093 | ruby1.9 is already the newest version.
21:03 | yipdw | ruby1.9 -v
21:04 | bsmith093 | ruby 1.9.0 (2008-10-04 revision 19669) [i486-linux]
21:04 | yipdw | ugh
21:04 | yipdw | that's...way behind
21:04 | bsmith093 | ah, another repo?
21:04 | yipdw | Ruby (and projects like it) move too fast for Debian/Ubuntu to keep up, IMO
21:04 | bsmith093 | oh wait yeah i just noticed the 2008 thing, wow, thats old
21:05 | yipdw | unless I can control the Ruby packages (e.g. for production environments) I use https://rvm.beginrescueend.com/
21:07 | yipdw | it bypasses package management, but for me, the benefit outweighs that cost
21:07 | bsmith093 | got rvm now, grabbing ruby 1.9.3
21:07 | bsmith093 | should i dump the ubuntu repo ruby?
21:07 | yipdw | only if you want to, it's not necessary
21:08 | yipdw | to dump it
21:08 | bsmith093 | k then, will this install it like a normal package?
21:08 | yipdw | RVM does not use apt, so no
21:09 | ersi | lol @ a language moving so fast you can't package it
21:09 | yipdw | it will, however, modify your environment's PATH to work out
21:09 | yipdw | ersi: it's not that uncommon
21:09 | bsmith093 | yeah ive never heard of that
21:09 | ersi | sounds more like a dialect, that forks all the time
21:09 | yipdw | I actually more often construct development environments directly from upstream than I do via OS packages
21:10 | bsmith093 | although i must say, this is the smoothest, complex thing i ve ever done
21:10 | bsmith093 | how do i keep it updated?
21:10 | yipdw | ersi: in particular, I've found that following upstream directly pays off for Node.js, factor, and GHC
21:11 | yipdw | bsmith093: rvm install [Ruby version]
21:11 | bsmith093 | so i have to know the version, i have , or the version i want to get?
21:11 | yipdw | ersi: also, the syntax and semantics of Ruby don't change that often (although ruby-core has been doing some WTFs in that regard lately)
21:11 | yipdw | ersi: the libraries, on the other hand
21:11 | yipdw | bsmith093: yes; rvm list will show you those
21:12 | bsmith093 | oh, wow this is cool, ive also never had this much feedback from a compiler that i could actually follow
21:12 | ersi | I'm having a hard time understanding how a 10+year language can move so fast it's bleeding edge all the time
21:12 | yipdw | the language itself does not
21:14 | yipdw | implementations and libraries do
21:14 | bsmith093 | you know what would be nice? a dummy package for every linux distro, that does [language]-all, and grabs everything in the repos for that language
21:14 | yipdw | that would be infeasibly huge
21:14 | bsmith093 | how big could that possibly be?
21:15 | yipdw | for Ruby alone there's 31,503 libraries
21:15 | yipdw | Java would be an order of magnitude larger
21:15 | bsmith093 | mother of Turing, that's a lot of development
21:16 | bsmith093 | and to be fair, java mostly take care of it self as it needs to
21:16 | bsmith093 | keep jvm updated and afaik thats all u need to worry about
21:17 | yipdw | Hackage lists, uh
21:17 | yipdw | something around 3633 packages for Haskell
21:18 | bsmith093 | ok, ok, so languages are much bigger that I thought, in their entirety
21:18 | yipdw | yeah -- I find that a language is really nothing without its libraries
21:18 | yipdw | I mean, sure, you can install an implementation of a language
21:18 | yipdw | but it's really pretty useless on its own
21:19 | bsmith093 | hey another thing, does a sudo operation keep root until its done, or is their a timer somewhere?
21:19 | bsmith093 | because ive had things crap out asking for rights halfway through
21:20 | bsmith093 | rubys's done
21:20 | bsmith093 | annnnd.. same error as last time only this time ruby -v ruby 1.8.7 (2010-01-10 patchlevel 249) [i486-linux]
21:21 | yipdw | rvm use 1.9.3
21:21 | bsmith093 | /home/ben/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require': cannot load such file -- connection_pool (LoadError)
21:21 | bsmith093 | from /home/ben/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
21:21 | bsmith093 | from ffgrab.rb:1:in `<main>'
21:22 | yipdw | paste me the full terminal outpuyt
21:22 | bsmith093 | /home/ben/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require': cannot load such file -- connection_pool (LoadError)
21:22 | bsmith093 | ben@ben-laptop:~/1432483$
21:22 | bsmith093 | from /home/ben/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
21:22 | bsmith093 | from ffgrab.rb:1:in `<main>'
21:22 | bsmith093 | ruby ffgrab.rb
21:22 | bsmith093 | thats what i get
21:23 | yipdw | gem install bundler; bundle install
21:23 | yipdw | the Gemfile in the gist repo is a dependency manifest
21:24 | yipdw | for Bundler
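(The Gemfile yipdw means is a plain Ruby file; reconstructed from the gems named earlier in the log rather than copied from the gist, it would look roughly like this.)

    # Gemfile -- the dependency manifest that `bundle install` reads
    source 'http://rubygems.org'

    gem 'connection_pool'
    gem 'girl_friday'
    gem 'mechanize'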
21:25 | bsmith093 | i nthought that was important, i kept trying ruby Gemfile on the offchance something would happen, this is not an intuitive lang to install
21:25 | bsmith093 | Fetching source index for http://rubygems.org/
21:25 | bsmith093 | now that seems like i would need that for gems, becasue thats where i found connection_pool and girl_friday
21:25 | yipdw | Rubygems is a packaging mechanism; bundler's a tool for managing packages
21:26 | yipdw | they're related, but Rubygems is independent of Bundler
21:26 | bsmith093 | well its finding the deps, and indtalling them , so whoo.
21:27 | bsmith093 | holy crap its running
21:27 | yipdw | I'd like to again point out that it doesn't do anything to record its results
21:27 | bsmith093 | and apparently, its timed itself to 6 decimal places?
21:27 | yipdw | times what
21:28 | bsmith093 | timestamp goes out to seconds.######
21:28 | yipdw | that's the default behavior of the Ruby logger library
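(For reference, the microsecond timestamps come from the stdlib Logger's default formatter; a tiny standalone example, with a made-up message and path:)

    require 'logger'

    log = Logger.new($stdout)
    log.info('Found 0 categories, 0 stories from /book/Example/')
    # prints something like (pid and timestamp will differ):
    # I, [2011-12-07T15:33:31.381205 #75544]  INFO -- : Found 0 categories, 0 stories from /book/Example/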
21:28 | yipdw | but, yeah, there's no point in running that as-is for a long time
21:28 | bsmith093 | man, thats precise
21:29 | yipdw | because it doesn't yet actually do anything aside from spit results to the console
21:30 | yipdw | I'm not even sure if it handles pages correctly -- I *think* it does, but I haven't run it long enough to see how they get processed in the queue
21:30 | bsmith093 | i just had a though, do user profiles show the stories all on one page, regardless of how many there are, cause that might be a help.
21:31 | yipdw | possibly, but AFAIK there is no way to get a list of all users
21:32 | bsmith093 | other than doing my original idea, and yours is much faster and uses less resources over all, on both ends
21:32 | yipdw | I can tell you that my method results in a lot of duplicates
21:32 | yipdw | in particular, it doesn't yet account for the "last page" link in each story
21:33 | yipdw | that will have to be filtered out in the discovery logic
21:33 | bsmith093 | yeah i dont really have any thoughts for that
21:33 | yipdw | it's just more HTML scraping
21:33 | bsmith093 | although the chapter is just a number appended to the link
21:33 | yipdw | not hard, just needs to be done
21:34 | bsmith093 | the next and back buttons are javascript, i think
21:34 | yipdw | https://gist.github.com/705cd333e06178057dec
21:34 | yipdw | that's a list of 4,506 story links recovered by ffgrab
21:34 | yipdw | well
21:34 | yipdw | 4506 / 2 roughly
21:34 | bsmith093 | wait the number before the title, thats the last chapter?
21:35 | yipdw | that's a chapter indicator
21:35 | bsmith093 | so let ffgrab run till its done then grep for dupes and keeep the higest number
21:35 | yipdw | I'd rather fix it in the grabber
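(For comparison, the post-processing bsmith093 suggests would amount to something like the sketch below; the link file name is made up, and yipdw's preference is to fix the discovery code instead.)

    # Collapse duplicate /s/<id>/<chapter>/ links, keeping the highest chapter seen per story.
    best = Hash.new(0)

    File.foreach('story_links.txt') do |line|
      next unless line =~ %r{/s/(\d+)/(\d+)}
      id, chapter = $1, $2.to_i
      best[id] = chapter if chapter > best[id]
    end

    best.each { |id, chapter| puts "http://www.fanfiction.net/s/#{id}/#{chapter}/" }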
21:36 | yipdw | to ignore it, you'll have to change what stories_and_categories_of does at lines 12-13
21:36 | yipdw | I'm not sure what the change is, as I haven't looked at ff.net's page structure close enough to make the discernment
21:37 | bsmith093 | its still faster that iterating through 10mil semi fake links
21:37 | yipdw | I am also suspicious of results like this:
21:37 | yipdw | I, [2011-12-07T15:33:31.381205 #75544] INFO -- : Found 0 categories, 0 stories from /book/My_Sweet_Audrina/
21:37 | yipdw | in that case, there really are no entries that show up
21:38 | yipdw | but any 0/0 results make me suspicious that the script is missing something
21:39 | bsmith093 | i was right, it is js here <input value=" < Prev " onclick="self.location='/s/7066342/6/The_Same_Will_Never_Happen_to_You'" type="BUTTON"> <select title="chapter navigation" name="chapter" onchange="self.location = '/s/7066342/'+ this.options[this.selectedIndex].value + '/The_Same_Will_Never_Happen_to_You';"><option value="1">1. Such a Shame</option><option value="2">2. Don't Do This</option><option value="3">3. The
21:39 | yipdw | that's not the link I was talking about
21:39 | yipdw | look at e.g. http://www.fanfiction.net/comic/300
21:39 | yipdw | see the » link?
21:39 | yipdw | that's the link to the last completed chapter
21:40 | yipdw | which is the link that the discovery code is picking up (and shouldn't pick up)
21:40 | yipdw | there's a few ways to fix that
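(One of those ways, sketched with mechanize against the page yipdw links above; the selector details are guesses, not the actual change to stories_and_categories_of.)

    require 'mechanize'

    page = Mechanize.new.get('http://www.fanfiction.net/comic/300')

    # Drop the "»" last-chapter arrows and reduce everything to bare story IDs,
    # so chapter links stop producing duplicates.
    links = page.links_with(href: %r{\A/s/\d+/}).reject { |l| l.text.strip == '»' }
    ids   = links.map { |l| l.href[%r{\A/s/(\d+)/}, 1] }.uniq
    puts ids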
21:41 | bsmith093 | hey I never noticed that before
21:41 | yipdw | anyway, I need to try to finish up some webapp work at work
21:41 | yipdw | which is a shitload of fuck related to the DOM and event propagation
21:41 | yipdw | as James Rolfe might put it
21:41 | yipdw | brb
21:41 | bsmith093 | wait so grab fanfiction.net/storyid/1 and that last chapter link, and generate all the rest of the links between them
21:42 | bsmith093 | yeah work comes first
21:42 | bsmith093 | lol nice reference
21:42 | yipdw | that's one possibility; another possibility is to just have wget-warc follow the links
21:43 | bsmith093 | thats what i tired it only grabbed 300k files
21:43 | bsmith093 | wget-warc -mcpke robots off with ua for firefox
21:59 | bsmith093 | speaking of which, I'm still grabbing poe-news
21:59 | emijrp | IA is going to create a collection for Occupy movement http://blog.archive.org/2011/12/07/archive-it-team-encourages-your-contributions-to-the-%E2%80%9Coccupy-movement%E2%80%9D-collection/
22:06 | emijrp | but I think that there is no collection for Spanish Revolution or Arab Spring
22:07 | emijrp | I have many links to share if IA creates an Archive-It collection. I offered my help some weeks ago.
22:20 | emijrp | (I mean about Spanish Rev.)
22:23 | DFJustin | you can upload the stuff now and the collection can be made later
22:24 | emijrp | I prefer to use the Archive-It system. I don't want to upload a tarball with websites that can be viewed online.
22:24 | emijrp | Or 200 gb of videos (i have 6000+) because i cannot with my home connection.
22:25 | yipdw | bsmith093: ok, so, I've got a variant of ffgrab recording story IDs in a Redis instance
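(Recording discovered IDs in a Redis set might look like the sketch below; the set name 'stories' matches the smembers call yipdw shows further down, everything else here is assumed.)

    require 'redis'

    redis = Redis.new   # assumes a local Redis on the default port

    # Inside the discovery loop: a set makes duplicate story IDs collapse for free.
    redis.sadd('stories', 1234567)   # 1234567 stands in for a discovered story ID

    # Later, the same set can be inspected:
    ids = redis.smembers('stories').map(&:to_i)
    puts [ids.length, ids.min, ids.max].inspect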
22:27 | emijrp | I'm tired of content being uploaded to IA as huge boxes that cant be viewed easily.
22:27 | bsmith093 | emijrp: meaning hat, exactly
22:28 | bsmith093 | huge iso files?
22:28 | emijrp | scrapes of forums, blogs hostings, geocities, wikis, yahoo videos
22:29 | bsmith093 | whats wrong with that?
22:29 | DFJustin | it's not great, but it's better to get the stuff backed up in some form first
22:29 | emijrp | that you cant use them easily
22:30 | bsmith093 | IA , afaik, isn't really meant as a mirror, its an archive, of the raw data, meant for historical research purpose s
22:30 | bsmith093 | im sure there's a script for that somewhere
22:31 | bsmith093 | besides that, complain to them, not archiveteam.
22:31 | emijrp | IA has always offered content in a viewable way (wayback, videos and audio with metadata)
22:31 | bsmith093 | thers an entire section of ia dedicated to geocities
22:32 | emijrp | archiveteam is uploading dozen-GB tarballs with apcked content
22:33 | DFJustin | it's just a manpower thing, right now it's all jason can do just to keep up with the tarballs coming in
22:33 | bsmith093 | i would imagine so, you cant grep through a tarball that i know of, and im sure he's doing the best he can, speaking of which, he's not the only person there doing what he's doing , is he?
22:33 | bsmith093 | SketchCow: voworkers?
22:34 | DFJustin | priority #1 has to be getting things off random people's hard drives and into IA's backup infrastructure so it doesn't just go poof
22:34 | bsmith093 | true and thats better than nothing by a long shot
22:34 | bsmith093 | but emijrp has a valid point, there needs to be a way to search through all this crapload of otherwise nearly-useless data
22:35 | DFJustin | for sure
22:35 | dnova | yeah. you download it, untar it, and look through it
22:35 | bsmith093 | seriously, is it opossible to search through a remote tarball, cause that would be awsome
22:35 | dnova | most of these collections aren't meant for casual browsing, afaik.
22:35 | yipdw | bsmith093: curl http://[host][path][file].tar.gz | gunzip -c
22:36 | bsmith093 | and some, like the utzoo tapes are slightly damaged and should be repaired
22:36 | bsmith093 | yip,thats a remote search, asin it doesnt download all of it first
22:36 | yipdw | that doesn't download all of it
22:36 | emijrp | YES, I going to download the 600GB Geogicites pack to watch a site. The good approach is geocities.ws or the mirrors people created.
22:36 | yipdw | it only goes until you terminate the site
22:36 | yipdw | er, connection
22:36 | Ymgve | http://news.slashdot.org/story/11/12/07/2034200/library-of-congress-to-receive-entire-twitter-archive
22:37 | Ymgve | cool
22:37 | dnova | emijrp: nobody is stopping YOU from making mirrors
22:37 | yipdw | if you want something more sophisticated than that, you need to build an index
22:39 | dnova | not everyone can afford to host these things.
22:39 | yipdw | hmm, my ff link grabber is still a bit retarded
22:39 | yipdw | I, [2011-12-07T16:39:37.370755 #78706] INFO -- : Found 0 categories, 0 stories from /r/6551377/
22:39 | yipdw | should ignore these:
22:39 | yipdw | I, [2011-12-07T16:39:39.987928 #78706] INFO -- : Found 0 categories, 0 stories from /u/1148547/Hobbit4Lyfe
22:39 | yipdw | oh well
22:40 | DFJustin | videos can be uploaded to archive.org right now to the community videos collection, and then once there's a bunch of them it should be easy to poke someone and get them to create a collection
22:40 | DFJustin | if you don't have bandwidth then recruit some buddies
22:41 | yipdw | heh
22:41 | yipdw | 1.9.2-p290 :015 > b = Redis.new.smembers('stories').map(&:to_i).sort; [b.length, b.min, b.max]
22:41 | yipdw | => [89974, 158, 7617073]
22:42 | yipdw | that's a pretty sparsely inhabited space
22:57 | yipdw | granted, that doesn't include any of the crossovers etc
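(For scale, using only the numbers above: 89,974 distinct IDs lying between 158 and 7,617,073 cover only about 1.2% of that numeric range so far, which is what makes the space look sparse.)

    # Density of the recovered ID set over the observed range (figures from the smembers call above).
    count, min_id, max_id = 89_974, 158, 7_617_073
    puts count.to_f / (max_id - min_id + 1)   # => ~0.0118, i.e. roughly 1.2%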
22:59 | bsmith093 | wait 80% full is sparsely inhabited
23:00 | bsmith093 | afaik, every genre page has its own crossover page, and would it kill somebody to back this script up by sorting the good/bad story ids?
23:00 | bsmith093 | becasue the lowest story id is 4, if im reading that right, yours says 158