Time |
Nickname |
Message |
01:39
🔗
|
swebb |
Oops. I just realized that my irc logger has been off for the last 3 days. Doah. http://badcheese.com/~steve/atlogs |
01:42
🔗
|
tef_ |
godane: I should have something in ~15 minutes |
01:52
🔗
|
tef_ |
godane: I think i'm done |
01:53
🔗
|
godane |
:-D |
01:53
🔗
|
godane |
code please? |
01:55
🔗
|
tef_ |
http://code.hanzoarchives.com/warc-tools/src/2a7976f9e7d7/warclinks.py |
01:55
🔗
|
tef_ |
should handle all sorts of links in warcs (only html though...) |
01:55
🔗
|
tef_ |
handles relative urls too |
01:55
🔗
|
tef_ |
I happened to have a html link extractor using the py stdlib kicking around |
01:55
🔗
|
tef_ |
and it helps I wrote a warc library :-) |
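[Editor's sketch] tef_ describes an HTML link extractor built on the Python standard library that also resolves relative URLs. The following is a minimal sketch of that idea, not the actual warclinks.py code: the class name `LinkCollector`, the helper `extract_links`, and the choice of `href`/`src` as the only attributes inspected are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes treated as link carriers; this short list is an assumption.
LINK_ATTRS = {"href", "src"}


class LinkCollector(HTMLParser):
    """Collects absolute URLs from href/src attributes of any tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in LINK_ATTRS and value:
                # urljoin resolves relative references against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all links found in html, resolved against base_url."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links
```

For a record fetched from `http://www.groklaw.net/articles/index.html`, a link written as `img/b.png` would come out as `http://www.groklaw.net/articles/img/b.png`.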
01:56
🔗
|
tef_ |
should be able to do hg clone ... (or grab a tarball) |
01:56
🔗
|
tef_ |
export PYTHONPATH=`pwd` |
01:56
🔗
|
tef_ |
python warclinks.py warc-files.... |
01:56
🔗
|
tef_ |
handles gzipped, non gzipped files |
01:57
🔗
|
tef_ |
if you have 6+ month old warc files from when wget-warc produced weird files, I can put a fix in for that, but warc2warc --wget-chunk-fix should sort it |
01:57
🔗
|
tef_ |
it doesn't keep a set of links |
01:57
🔗
|
tef_ |
it could produce a list of urls found in the links, that aren't in the warc |
01:57
🔗
|
tef_ |
but you can do warcdump ... | grep WARC-Target | cut ... |
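[Editor's sketch] The idea in the last few messages (list the URLs that pages link to but that have no record in the WARC) can be approximated without the warc-tools library. This sketch scans for `WARC-Target-URI` header lines, much like the `grep WARC-Target` pipeline, and takes a set difference. The function names are hypothetical, the line scan can also match text inside record bodies, and stripping angle brackets assumes some writers wrap the URI in `<...>`.

```python
import gzip


def open_maybe_gzip(path):
    """Open a file for binary reading, transparently handling gzip."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    return gzip.open(path, "rb") if magic == b"\x1f\x8b" else open(path, "rb")


def target_uris(path):
    """Yield every WARC-Target-URI header value found in the file."""
    prefix = b"WARC-Target-URI:"
    with open_maybe_gzip(path) as fh:
        for line in fh:
            if line.startswith(prefix):
                value = line[len(prefix):].strip().strip(b"<>")
                yield value.decode("utf-8", "replace")


def missing_links(links, warc_path):
    """Return the links that have no matching record in the WARC."""
    return sorted(set(links) - set(target_uris(warc_path)))
```

Feeding the extractor's output into `missing_links` gives a candidate list for a follow-up crawl.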
01:58
🔗
|
godane |
found an error |
01:58
🔗
|
tef_ |
any questions? I've only tested it a little |
01:58
🔗
|
tef_ |
ah balls |
01:58
🔗
|
tef_ |
can you pastebin ? |
01:59
🔗
|
godane |
http://pastebin.com/NfbFUy2Q |
01:59
🔗
|
tef_ |
hrm, it shouldn't be raising that |
01:59
🔗
|
tef_ |
oh i'm a muppet. |
01:59
🔗
|
tef_ |
hrm, you've got some lovely html there :-) |
02:00
🔗
|
godane |
i know |
02:00
🔗
|
godane |
this is the first time that grep -ohP doesn't work to grab/filter all urls |
02:01
🔗
|
godane |
i'm trying to use that to grab all images from sites like techcrunch and such |
02:01
🔗
|
tef_ |
I pushed a fix to skip them properly |
02:01
🔗
|
tef_ |
but I should replace it with something more reliable than python's built in parser |
02:01
🔗
|
tef_ |
maybe I should use beautiful soup or lxml |
02:02
🔗
|
tef_ |
but it will get you *some* of the urls, maybe, I hope, :-) |
02:05
🔗
|
tef_ |
ugh |
02:05
🔗
|
tef_ |
I am an idiot |
02:05
🔗
|
tef_ |
anyway, I'm gonna try and put beautiful soup in |
02:05
🔗
|
tef_ |
should handle everything |
02:06
🔗
|
godane |
ok |
02:06
🔗
|
tef_ |
rather than committing typos :3 |
02:29
🔗
|
tef_ |
godane: pushed |
02:29
🔗
|
tef_ |
should use lxml |
02:29
🔗
|
tef_ |
well almost pushed |
02:29
🔗
|
tef_ |
pushed *now* |
02:30
🔗
|
tef_ |
godane: ping |
02:30
🔗
|
godane |
hey |
02:30
🔗
|
godane |
i got it |
02:31
🔗
|
godane |
there look to be warnings of parse errors |
02:31
🔗
|
tef_ |
hrm |
02:32
🔗
|
tef_ |
you may need to install lxml, via python-lxml (apt) or easy_install lxml |
02:32
🔗
|
godane |
you didn't fix my problem |
02:32
🔗
|
godane |
the lines still break |
02:33
🔗
|
godane |
but this does look better and has more stuff in it now |
02:33
🔗
|
tef_ |
just fixing a bug |
02:33
🔗
|
tef_ |
well, maybe a bug |
02:34
🔗
|
tef_ |
godane: how are you running it, I get a whole slew of urls from the examples I try |
02:36
🔗
|
underscor |
tef_: are you the tef that recently visited #hackerfurs? |
02:36
🔗
|
godane |
python warclinks groklaw.net-articles-2006.warc.gz > log |
02:36
🔗
|
godane |
*warclinks.py |
02:37
🔗
|
tef_ |
underscor: yeah, I got dragged in by mithaldu |
02:37
🔗
|
tef_ |
I heard some furries were trash talking my code :-) |
02:37
🔗
|
tef_ |
I assume you're the same underscor there |
02:38
🔗
|
tef_ |
what's the lines still break thing ? |
02:38
🔗
|
tef_ |
hrm |
02:38
🔗
|
godane |
like i said |
02:39
🔗
|
tef_ |
i'm slow :3 |
02:39
🔗
|
godane |
this warc.gz is special |
02:39
🔗
|
tef_ |
oh so special |
02:39
🔗
|
tef_ |
i'd ask for a copy but I assume It's huge |
02:39
🔗
|
godane |
no just ~15mb |
02:41
🔗
|
underscor |
tef_: haha, yeah |
02:41
🔗
|
tef_ |
small world, innit |
02:41
🔗
|
tef_ |
I backed out cos well, I had a clearout of irssi windows |
02:42
🔗
|
underscor |
Aye |
02:46
🔗
|
godane |
tef_: you can download it here: http://archive.org/details/groklaw.net-articles-2006-20120827-mirror |
02:47
🔗
|
tef_ |
godane: fetching now |
02:50
🔗
|
tef_ |
oh *wow* |
02:51
🔗
|
godane |
you see what i mean now |
02:51
🔗
|
godane |
even doing a tr -d '\n' does nothing to it |
02:52
🔗
|
tef_ |
yeah |
02:52
🔗
|
tef_ |
that is rather amazing |
02:54
🔗
|
tef_ |
pushed a fix :3 |
02:54
🔗
|
tef_ |
godane: try now |
02:54
🔗
|
tef_ |
I can also try stripping fragments too, but I think sed can fix that |
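[Editor's sketch] Stripping fragments can be done with sed as tef_ says. A Python equivalent using the standard library's `urldefrag` might look like the following; the function name `strip_fragments` and the de-duplication step are assumptions added for illustration.

```python
from urllib.parse import urldefrag


def strip_fragments(urls):
    """Drop '#fragment' parts and de-duplicate, keeping first-seen order."""
    seen = set()
    for url in urls:
        bare = urldefrag(url).url
        if bare not in seen:
            seen.add(bare)
            yield bare
```

This collapses `page.html#top` and `page.html#comments` into a single `page.html` entry, which keeps a download list shorter.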
02:55
🔗
|
godane |
lots of errors now |
02:55
🔗
|
tef_ |
hrm ? I get a bunch of links out |
02:56
🔗
|
tef_ |
did python warclinks.py ~/Downloads/groklaw.net-articles-2006.warc.gz |sort|uniq |
02:56
🔗
|
tef_ |
and without newlines and such |
02:56
🔗
|
tef_ |
try repulling in case something weird happened |
02:57
🔗
|
godane |
file "warclinks.py", line 64, in extract_links_from_warcfh |
02:58
🔗
|
godane |
the error i have is in your fix |
02:58
🔗
|
tef_ |
hrm |
02:58
🔗
|
tef_ |
do you have a little bit more of that error ? |
02:59
🔗
|
tef_ |
it parses on mine, what version of python are you using ? |
02:59
🔗
|
godane |
yield link.translate(None, '\n\r\t') |
02:59
🔗
|
godane |
i'm using python2 |
02:59
🔗
|
Coderjoe |
2.6 or 2.7? |
02:59
🔗
|
tef_ |
can you paste the entire traceback |
02:59
🔗
|
godane |
2.7.3 |
02:59
🔗
|
godane |
i can't right now |
02:59
🔗
|
tef_ |
baws |
02:59
🔗
|
godane |
i'm on firefox proxy |
02:59
🔗
|
tef_ |
can you copy and paste the error message at least? |
03:00
🔗
|
tef_ |
rather than just the line |
03:00
🔗
|
tef_ |
which exception |
03:00
🔗
|
tef_ |
as it works on my machine (tm) |
03:02
🔗
|
tef_ |
http://secretvolcanobase.org/~tef/warc_links.txt.gz example output |
03:03
🔗
|
godane |
http://pastebin.com/NnaN79q1 |
03:04
🔗
|
tef_ |
2.7.3 weeerid |
03:04
🔗
|
tef_ |
http://docs.python.org/library/stdtypes.html#str.translate |
03:04
🔗
|
tef_ |
cos it says two arguments here |
03:05
🔗
|
tef_ |
anyway, the txt.gz file has the links you want, I hope |
03:05
🔗
|
tef_ |
hrm |
03:05
🔗
|
tef_ |
aaaaha |
03:05
🔗
|
tef_ |
for some reason on your machine it is sending in unicode |
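[Editor's sketch] The failing line, `link.translate(None, '\n\r\t')`, is the two-argument byte-string form of `translate` from Python 2. Python 2's `unicode.translate` takes a single mapping argument instead, which fits tef_'s diagnosis that the parser was handing back unicode. The contents of the pastebin are not reproduced here, so this is an inference. A form that does not depend on the input type, written for Python 3 with the hypothetical helper name `clean_link`, could be:

```python
# Mapping form of translate(): code point -> None means "delete this character".
STRIP_TABLE = {ord(ch): None for ch in "\n\r\t"}


def clean_link(link):
    """Remove newlines, carriage returns and tabs from a link."""
    if isinstance(link, bytes):
        # Decode first so a single code path handles both input types.
        link = link.decode("utf-8", "replace")
    return link.translate(STRIP_TABLE)
```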
03:08
🔗
|
tef_ |
godane: pull or try the output provided |
03:10
🔗
|
godane |
thank you |
03:11
🔗
|
tef_ |
fixed? |
03:11
🔗
|
godane |
yes |
03:11
🔗
|
tef_ |
\o/ |
03:11
🔗
|
godane |
i think |
03:12
🔗
|
tef_ |
well that took longer than 15 minutes :3 |
03:12
🔗
|
tef_ |
what an awful warc file |
04:00
🔗
|
godane |
looks like that warc had 700+mb of pdfs, mp3, ogg, and images from groklaw.net |
04:11
🔗
|
godane |
there is an error again |
04:12
🔗
|
godane |
tef_: ping ^ |
07:48
🔗
|
alard |
underscor: Thanks for the warctozip update. Although the new POST things don't really work: your Nginx config apparently has a very low client_max_body_size. Perhaps you can increase that a bit? (It would be even nicer if it didn't buffer the request at all, but that seems to be impossible with Nginx.) |
07:48
🔗
|
alard |
Similarly, it might be useful to disable proxy_buffering if it's enabled. That can also be done from the script with an extra HTTP header in the response, if that's easier. |
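[Editor's sketch] The "extra HTTP header in the response" that alard mentions for disabling proxy_buffering is, per the nginx proxy module documentation, `X-Accel-Buffering: no`. How the warctozip script is written is not shown in the log, so the WSGI app below is only an assumed shape for illustration. Note that `client_max_body_size` is different: it limits the request body and has to be raised in the nginx configuration itself.

```python
def app(environ, start_response):
    """Minimal WSGI app that asks nginx not to buffer this response."""
    headers = [
        ("Content-Type", "application/octet-stream"),
        # Per-response override of proxy_buffering, honoured by nginx.
        ("X-Accel-Buffering", "no"),
    ]
    start_response("200 OK", headers)
    # Placeholder payload; a real app would stream generated chunks here.
    return iter([b"chunk-1", b"chunk-2"])
```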
09:22
🔗
|
Schbirid |
thanks for the Aktuelles Software Magazine collection! |
09:36
🔗
|
Schbirid |
does someone have/know a tool to completely download a reddit thread? the increments when you click "more" get tiny, so it is quite annoying to do by hand |
09:37
🔗
|
ersi |
it's called a scripting language, and it's a very sharp tool |
09:37
🔗
|
ersi |
^_^ |
09:38
🔗
|
ersi |
Wonder how they do the comment collapsing, should take a look at that sometime |
09:39
🔗
|
Schbirid |
same would be handy for facebook, those threads are nearly impossible to get with a browser since they can't keep up rendering thousands of comments |
09:40
🔗
|
alard |
Wget+Lua! |
09:40
🔗
|
* |
Schbirid runs away |
09:41
🔗
|
ersi |
Ooh, should take a looksie at wget+lua sometime as well |
10:49
🔗
|
tef_ |
godane: ? |
13:16
🔗
|
godane |
tef_: hey |
13:16
🔗
|
godane |
i'm back |
13:16
🔗
|
godane |
it looks like some keys have problems with unicode |
13:16
🔗
|
godane |
like 0x94 |
13:16
🔗
|
godane |
and 0x31 |
13:17
🔗
|
tef_ |
hrm |
13:45
🔗
|
SketchCow |
I just asked archive.org a question about scanning. |
13:45
🔗
|
SketchCow |
Can we have a volunteer corps of people in the SF Bay area who come in and operate a bookscanner assigned to our group, who then scan computer historical documents. |
13:46
🔗
|
SketchCow |
If they say yes, I'll start harassing people about joining up. |
13:51
🔗
|
tef_ |
godane: put in a better fix, maybe |
14:57
🔗
|
underscor |
http://want.archive.org/ |
14:57
🔗
|
underscor |
alard: that will go through the load balancer instead of running on my dev box, if you want to update the demo app |
15:39
🔗
|
SketchCow |
underscor: Please add a line under "currently only for books/things with ISBNs" |
15:39
🔗
|
SketchCow |
Experimental: Do not use as a sign-off for large donations of books. Please contact info@archive.org. |
15:39
🔗
|
SketchCow |
Remove secret mode line |
15:50
🔗
|
godane |
i got over 8gb of groklaw.org |
15:50
🔗
|
godane |
:-D |
15:51
🔗
|
godane |
i did have to split some of the warc.gz files cause downloads stop sometimes |
15:52
🔗
|
godane |
it may be closer to 4gb cause i have the mirror .tar.gz and .warc.gz |
15:56
🔗
|
alard |
underscor: My want-it demo app is asleep, I don't know if I will wake it up again. (I ran the human.io app on my home computer.) |
15:56
🔗
|
alard |
Also, the want-it api is also visible on http://warctozip.archive.org/ ? |
16:15
🔗
|
tef_ |
godane: did the most recent fix, well, uh fix |
16:15
🔗
|
godane |
i don't know |
16:15
🔗
|
tef_ |
heh |
16:16
🔗
|
godane |
i see the error again with my groklaw.net 2011 dump |
16:16
🔗
|
tef_ |
godane: yeah I'm not sure why your lxml is returning unicode |
16:17
🔗
|
godane |
i think its mostly cause groklaw is special |
16:17
🔗
|
godane |
i also get some bad urls like this: http://www.groklaw.net/htt[://www.groklaw.net/pdf3/LodsysvCombay-26.pdf |
16:18
🔗
|
godane |
luckily all the bad urls are at the top of the list |
16:18
🔗
|
tef_ |
heh |
16:18
🔗
|
tef_ |
yeah I can't fix their broken links |
16:19
🔗
|
godane |
the thing is i checked for that file |
16:19
🔗
|
tef_ |
pushing a better check for unicode for what it is worth |
16:19
🔗
|
tef_ |
either way I hope you've got more stuff than you would have had without it |
16:19
🔗
|
tef_ |
despite it being buggy and crap :-) |
16:20
🔗
|
godane |
it has that same broken line problem from what i can tell |
16:22
🔗
|
tef_ |
baws |
16:23
🔗
|
tef_ |
I'm not going to have a lot of time, if any to keep playing hunt the bug when I'm struggling to recreate some of the weirder errors |
16:23
🔗
|
tef_ |
sorry :/ |
16:23
🔗
|
godane |
thats ok |
16:24
🔗
|
godane |
it filters out the bad urls better than before |
16:24
🔗
|
godane |
and i think it does fix most of the bad urls |
16:27
🔗
|
tef_ |
yay :D |
16:27
🔗
|
tef_ |
you might find google refine will be good for cleaning up large data sets like this |
16:39
🔗
|
DFJustin |
<Zuu_> I have a website that I would like to be archived, how would I do so? |
16:39
🔗
|
DFJustin |
<Zuu_> it's going down saturday sometime, i'll just leave this here: http://www.therevoltpress.org/ |
16:39
🔗
|
DFJustin |
did anyone do this |
16:39
🔗
|
DFJustin |
godane was disconnected at the time |
16:41
🔗
|
Patt |
it looks like the website is still up |
16:47
🔗
|
godane |
i will try to grab it soon |
16:48
🔗
|
godane |
my groklaw.net grab is very special so i don't want it to stop downloading |
16:48
🔗
|
Patt |
godane, let me know when/where you download it when you're done please |
16:50
🔗
|
godane |
good news is it doesn't look like it was updated since last year |
16:53
🔗
|
godane |
but their boards have been busy |
16:53
🔗
|
Patt |
yea, it will be until it closes |
16:53
🔗
|
Patt |
no ETA though |
16:53
🔗
|
SketchCow |
want.archive.org is apparently going to shift names, so don't get comfy with it. :) |
17:12
🔗
|
godane |
i have to login with a user name and password |
17:13
🔗
|
godane |
how do you do that with wget? |
17:14
🔗
|
alard |
godane: HTTP basic authentication? wget --help | grep user |
17:15
🔗
|
Patt |
godane, you can login with anonymous / anonymous |
17:15
🔗
|
Patt |
btw |
17:24
🔗
|
godane |
i'm getting this for the cookie: |
17:24
🔗
|
godane |
therevoltpress.org FALSE / FALSE 1377710618 bblastactivity 0 |
17:24
🔗
|
godane |
therevoltpress.org FALSE / FALSE 1377710618 bblastvisit 1346174618 |
17:24
🔗
|
godane |
its not working |
17:24
🔗
|
godane |
stupid me |
17:24
🔗
|
godane |
wrong url |
17:25
🔗
|
godane |
still doesn't work |
17:25
🔗
|
godane |
therevoltpress.org FALSE / FALSE 1377710728 bblastactivity 0 |
17:25
🔗
|
godane |
therevoltpress.org FALSE / FALSE 1377710728 bblastvisit 1346174728 |
17:28
🔗
|
godane |
i don't think i can mirror it |
17:35
🔗
|
godane |
what am i doing wrong here: |
17:35
🔗
|
godane |
wget "http://therevoltpress.org/boards/" --keep-session-cookies --load-cookies=cookies1.txt --content-disposition --mirror --warc-file=therevoltpress.org-20120828 --warc-cdx |
17:43
🔗
|
godane |
can anyone help me? |
17:43
🔗
|
godane |
its driving me nuts |
17:44
🔗
|
godane |
cause i have no idea on how to add cookies to wget the right way |
17:48
🔗
|
balrog_ |
godane: do you have a cookies.txt? |
17:48
🔗
|
balrog_ |
and is it properly formatted? |
17:48
🔗
|
godane |
yes |
17:48
🔗
|
godane |
its just like the other ones |
17:48
🔗
|
godane |
i'm using export cookies addon for firefox to get the cookie |
17:49
🔗
|
godane |
i may not know where to point it to though |
17:49
🔗
|
godane |
cause therevoltpress.org/boards/ is not working with wget |
17:49
🔗
|
godane |
even therevoltpress.org/boards/login.php doesn't work |
17:52
🔗
|
alard |
-U "Somethingelse." ? |
17:52
🔗
|
alard |
They may be blocking wget. |
17:53
🔗
|
godane |
that didn't work |
17:56
🔗
|
godane |
they're using vBulletin 3.8.0 if that helps |
17:58
🔗
|
godane |
it may be better for you guys to do this |
17:58
🔗
|
godane |
i can't do much here |
17:58
🔗
|
godane |
and even if i could get all of it, it may be more than 10gb |
17:59
🔗
|
godane |
and i don't think i can get that uploaded at my internet speed |
18:08
🔗
|
balrog_ |
godane: that's worked for me... |
18:08
🔗
|
balrog_ |
are you faking the UA? |
18:08
🔗
|
balrog_ |
I had to for one project |
18:15
🔗
|
godane |
yes |
18:15
🔗
|
godane |
show me your code please? |
18:15
🔗
|
godane |
and send me your cookie |
18:15
🔗
|
godane |
i'm getting false / false with my cookies for some reason |
18:19
🔗
|
godane |
balrog_: can you please send me the code? |
18:19
🔗
|
godane |
i'm dying here |
18:30
🔗
|
godane |
wget "http://therevoltpress.org/boards/login.php?do=login" --mirror --warc-file=therevoltpress.org-20120828 --warc-cdx -U "ArchiveTeam" --load-cookies=cookies1.txt |
18:30
🔗
|
godane |
thats my code |
18:30
🔗
|
godane |
you show my yours? |
18:30
🔗
|
godane |
*me |
18:31
🔗
|
godane |
or at least tell me the url you're using |
18:34
🔗
|
godane |
balrog_: where the hell are you? |
18:35
🔗
|
balrog_ |
busy, stuck at work |
18:35
🔗
|
godane |
can you please help me? |
18:35
🔗
|
godane |
i don't know why this site will not download |
18:35
🔗
|
godane |
and i don't know how the hell to save the cookies through wget anymore |
18:36
🔗
|
balrog_ |
what's in cookies1.txt before you start? |
18:36
🔗
|
godane |
therevoltpress.org FALSE / FALSE 1377696171 bblastvisit 1346174589 |
18:36
🔗
|
godane |
therevoltpress.org FALSE / FALSE 1377696451 bblastactivity 0 |
18:36
🔗
|
godane |
www.therevoltpress.org FALSE / FALSE 0 __utmc 1 |
18:36
🔗
|
godane |
www.therevoltpress.org FALSE / FALSE 1346159957 __utmb 1.2.10.1346158150 |
18:36
🔗
|
godane |
www.therevoltpress.org FALSE / FALSE 1361926157 __utmz 1.1346158150.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none) |
18:36
🔗
|
godane |
www.therevoltpress.org FALSE / FALSE 1409230157 __utma 1.882311859.1346158150.1346158150.1346158150.1 |
18:36
🔗
|
godane |
therevoltpress.org FALSE / FALSE 0 bbsessionhash a11c86836d5471bdda445db209cb2e5a |
18:36
🔗
|
godane |
thats all my therevoltpress.org cookies |
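[Editor's sketch] The lines godane pasted look like Netscape-style cookies.txt entries, which have seven fields: domain, include-subdomains flag, path, secure flag, expiry time, name, value. Read that way, the "FALSE / FALSE" in each line is the subdomain flag, the path `/`, and the secure flag, not an error report. The parser below is an illustrative sketch with hypothetical names (`Cookie`, `parse_cookie_line`). Real files are tab-separated, and splitting on any whitespace is an assumption made so the space-separated lines from the log also parse.

```python
from collections import namedtuple

Cookie = namedtuple(
    "Cookie", "domain include_subdomains path secure expires name value"
)


def parse_cookie_line(line):
    """Parse one line of a Netscape-style cookies.txt file.

    Returns None for blank lines and comment lines.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # At most 6 splits, so the value field keeps any embedded whitespace.
    fields = line.split(None, 6)
    if len(fields) != 7:
        raise ValueError("expected 7 fields, got %d" % len(fields))
    domain, subdomains, path, secure, expires, name, value = fields
    return Cookie(
        domain, subdomains == "TRUE", path, secure == "TRUE",
        int(expires), name, value,
    )
```

Parsing the dump this way shows only `bblastvisit`, `bblastactivity`, `bbsessionhash` and Google Analytics (`__utm*`) cookies.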
18:37
🔗
|
godane |
i have no idea why they're not working |
18:39
🔗
|
godane |
Patt: any ideas on how to mirror therevoltpress.org |
18:39
🔗
|
godane |
Patt: remember you asked for me by name |
18:40
🔗
|
underscor |
Alard: fixed. Thanks. |
20:58
🔗
|
SketchCow |
alard: When could we make the memac search public? |
21:00
🔗
|
alard |
Hadn't you already done that? |
21:01
🔗
|
alard |
I think it won't get more complete than it is now. The .zip download links work. It's a pity the .warc.gz download links don't work, but I think that's an issue with the archive.org tarviewer. |
21:11
🔗
|
SketchCow |
Well, I'm about to give it to a press person |
21:11
🔗
|
SketchCow |
So if it can be set up as ready to go for press, let's do it. |
21:13
🔗
|
chronomex |
the fixed-width font will scare muggles |
21:14
🔗
|
chronomex |
I'm all for it |
21:17
🔗
|
SketchCow |
WHY MUST YOU SELL FEAR |
21:17
🔗
|
SketchCow |
+1 for "muggles" |
21:18
🔗
|
SketchCow |
Always amazed how that one goes by |
21:18
🔗
|
SketchCow |
Treats them like cattle |
21:18
🔗
|
SketchCow |
Also liked how one book basically had magic dude show up in prime minister's office going "major shit going down lol brb" |
21:18
🔗
|
chronomex |
heh |
21:19
🔗
|
alard |
Yeah, so, well, the search page is here: http://archive.org/download/archiveteam-mobileme-index/mobileme-20120817.html |
21:20
🔗
|
alard |
It may or may not need a lot of text and explanations. |
21:20
🔗
|
alard |
Why am I not here? Why am I here? How did you hack my account? |
21:20
🔗
|
chronomex |
hm |
21:21
🔗
|
chronomex |
how the fuzz does this work anyway |
21:21
🔗
|
alard |
"Email complaints@archiveteam.org to get your things removed." |
21:21
🔗
|
chronomex |
ahhh |
21:21
🔗
|
alard |
It's just a 400MB JSON file sorted alphabetically. |
21:21
🔗
|
chronomex |
that's tricky |
21:22
🔗
|
chronomex |
not worthwhile to split it up? |
21:22
🔗
|
soultcer |
So searching requires me to download a 400 MB file? |
21:22
🔗
|
alard |
No, you just download small bits of it. |
21:22
🔗
|
chronomex |
ah, cool |
21:22
🔗
|
soultcer |
Magic |
21:22
🔗
|
chronomex |
it does some sort of binary windowing thing? |
21:23
🔗
|
alard |
https://ia600403.us.archive.org/30/items/archiveteam-mobileme-index/ |
21:23
🔗
|
alard |
There's an index to the large json file, with the locations of where items start. |
21:23
🔗
|
chronomex |
hot diggity damn |
21:24
🔗
|
alard |
Because it's sorted, you know that the item X should be in bytes n-m. |
21:24
🔗
|
alard |
(If that's abstract enough.) |
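[Editor's sketch] alard's description (a sorted data file plus an index of where items start, so a lookup only fetches bytes n-m) can be sketched as a binary search over the index followed by an HTTP `Range` request. The real index format is not shown in the log, so the `(first_key, start_offset)` layout and the function names here are assumptions.

```python
import bisect


def byte_range_for(key, index, total_size):
    """Return the inclusive (start, end) byte range that could hold key.

    index is a list of (first_key, start_offset) pairs sorted by key,
    one pair per block of the large sorted file.
    """
    keys = [first_key for first_key, _ in index]
    # Last block whose first key is <= the key being looked up.
    i = bisect.bisect_right(keys, key) - 1
    if i < 0:
        return None  # key sorts before the first block
    start = index[i][1]
    end = index[i + 1][1] if i + 1 < len(index) else total_size
    return start, end - 1  # HTTP byte ranges are inclusive


def range_header(start, end):
    """Build the request header that fetches only that slice."""
    return {"Range": "bytes=%d-%d" % (start, end)}
```

A client then requests just that slice and scans it for the exact key. A browser that ignores the `Range` header on such requests would end up downloading the whole file instead.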
21:24
🔗
|
chronomex |
hangs infinitely in opera |
21:25
🔗
|
alard |
Does it. |
21:25
🔗
|
chronomex |
yurp |
21:25
🔗
|
alard |
Any idea why? |
21:25
🔗
|
* |
chronomex shrugs |
21:25
🔗
|
chronomex |
opera's weird |
21:25
🔗
|
alard |
I tried it in Firefox and Chrome. |
21:25
🔗
|
chronomex |
yeah, works fine in chrome |
21:26
🔗
|
alard |
It's a bit tricky, so you need a modern browser. But it doesn't need a database. |
21:26
🔗
|
chronomex |
it's spiffy |
21:26
🔗
|
chronomex |
I like it |
21:27
🔗
|
chronomex |
this is the future |
21:28
🔗
|
alard |
It's the past. It's just a horribly slow search engine that can only search on one key. |
21:28
🔗
|
alard |
It's fast enough to be usable, though. |
21:28
🔗
|
chronomex |
yeah |
21:29
🔗
|
chronomex |
https://ia600403.us.archive.org/30/items/archiveteam-mobileme-index/mobileme-20120817.html#chronomex hah, I suppose I put my own name through the script at some point |
21:31
🔗
|
alard |
We're flooding the channel. :) |
21:33
🔗
|
ersi |
take it to #internetarchive, you! |
21:33
🔗
|
ersi |
or #nowwhat :D or.. -bs |
21:34
🔗
|
ersi |
endless possibilities |
21:34
🔗
|
alard |
We should have a hash function where you can enter a topic and it'll tell you to go to #archiveteam-${hash} |
21:35
🔗
|
alard |
Let's go to #nowwhat |
21:35
🔗
|
ersi |
or just a stab at random |
21:37
🔗
|
alard |
We'll just change channels after every second message. That's what real hackers do, I've heard. |
21:39
🔗
|
closure |
7 layers of channels |
21:57
🔗
|
alard |
Installed Opera, found the problem: Opera is stupid, it doesn't do Range: headers in XmlHttpRequest, so it starts downloading the full 400MB. |
21:58
🔗
|
alard |
(It also opens connections to ebay, booking.com and other sites, without my asking so.) |
21:59
🔗
|
alard |
SketchCow: Anything else you need to make the search thing ready to go to press? |
21:59
🔗
|
SketchCow |
http://archive.org/download/archiveteam-mobileme-index/mobileme-20120817.html is what we go with, right? |
21:59
🔗
|
alard |
Yes. It's possible to put it in an iframe somewhere on archiveteam.org, if that's better. |
22:02
🔗
|
dashcloud |
want.archive.org sounds great- how do you get books to IA? (is there going to be a blog post somewhere on this? or is it not public-ready yet?) |
22:02
🔗
|
SketchCow |
Not public ready |
22:02
🔗
|
SketchCow |
But you basically mail them books. I send mine in crates, media mail. |
22:02
🔗
|
SketchCow |
200 went out today |
22:03
🔗
|
SketchCow |
archive.org wants to take it under consideration before it becomes an official API |
22:03
🔗
|
chronomex |
I'd love to unload some books, I have way too many for a single man in a city :( |
22:04
🔗
|
chronomex |
I'll do an inventory eventually |
22:05
🔗
|
soultcer |
chronomex: Check out bookmooch.com, it allows you to trade books by mail |
22:05
🔗
|
chronomex |
meh, that sounds like a lot of work |
22:06
🔗
|
chronomex |
also I have *too many* books |
22:06
🔗
|
chronomex |
I should scan the rare ones. |
22:06
🔗
|
dashcloud |
I do as well- I've had to switch to ebooks because I don't really have more room for physical copies |
22:07
🔗
|
chronomex |
the space under my bed is about 80% books. |
22:07
🔗
|
dashcloud |
every shelf is full of books, and nearly the entire wall is lined with piles of books |
22:09
🔗
|
dashcloud |
I'd love to do the book scanning thing, but it takes a more disciplined and dedicated person than me to do that- I'd get distracted by reading parts of the pages as I flipped by, and it's a lot more tedious flipping pages and taking pictures than reading the book |
22:09
🔗
|
DFJustin |
haha I'm not the only one |
22:11
🔗
|
chronomex |
I've only scanned one in toto, which is probably the most valuable book I own - http://archive.org/details/TheElectronicSwitchingSystem |
22:11
🔗
|
dashcloud |
that instructable on the cardboard box bookscanner makes the whole thing look easy, but apart from the aforementioned issues, there's the post processing of each page- which is SO much easier if your pictures are uniform in each respect |
22:12
🔗
|
SketchCow |
This is why we're working on want.archive.org |
22:12
🔗
|
SketchCow |
Send them to archive.org, they get scanned in |
22:12
🔗
|
DFJustin |
I used to scan books on a flatbed for distributed proofreaders, you kids and your diy things |
22:14
🔗
|
chronomex |
DFJustin: gutenberg? |
22:14
🔗
|
DFJustin |
yeah |
22:14
🔗
|
DFJustin |
unfortunately the raw scans all ate it in an hdd crash, unless dp still has them |
22:15
🔗
|
chronomex |
:( ): :( |
22:15
🔗
|
DFJustin |
the pg guys made some wicked ebook editions though http://www.gutenberg.org/files/16410/16410-h/16410-h.htm |
22:15
🔗
|
dashcloud |
that's a great idea, except making space is only half the reason I'm scanning a book- the other is to have an ebook version of it (which I'm pretty sure I can't get from archive.org- books are too new) |
22:15
🔗
|
chronomex |
DFJustin: oh that's sexy |
22:16
🔗
|
chronomex |
I got 2/3 of the way through TeXifying that book too - http://gir.seattlewireless.net/~chronomex/bellsystem/morris/Morris.html |
22:17
🔗
|
dashcloud |
if you tell me I can get an electronic copy of every book I mail into IA, I'd crate up a large part of my books and send them very quickly |
22:17
🔗
|
chronomex |
yeah. |
22:17
🔗
|
DFJustin |
it's not legal since they want to lend out the electronic copy |
22:18
🔗
|
chronomex |
yeah :S |
22:19
🔗
|
dashcloud |
the other scanning project you proposed to archive.org sounds great as well- the historical computer document one |
22:47
🔗
|
SketchCow |
I made the formal proposal to archive.org about that |
23:26
🔗
|
Coderjoe |
I still would like a DIY bookscanner :D |
23:45
🔗
|
DFJustin |
wasn't SketchCow supposed to get one of those like 6+ months ago and CHANGE COMPUTER HISTORY |
23:45
🔗
|
SketchCow |
Yes |
23:45
🔗
|
SketchCow |
I've been needling the guy - little response |
23:46
🔗
|
SketchCow |
I've got a few "Getting that right to you (six months ago)" so I'm not going to get too het up |