#archiveteam 2012-08-28,Tue

Time Nickname Message
01:39 🔗 swebb Oops. I just realized that my irc logger has been off for the last 3 days. Doah. http://badcheese.com/~steve/atlogs
01:42 🔗 tef_ godane: I should have something in ~15 minutes
01:52 🔗 tef_ godane: I think i'm done
01:53 🔗 godane :-D
01:53 🔗 godane code please?
01:55 🔗 tef_ http://code.hanzoarchives.com/warc-tools/src/2a7976f9e7d7/warclinks.py
01:55 🔗 tef_ should handle all sorts of links in warcs (only html though...)
01:55 🔗 tef_ handles relative urls too
01:55 🔗 tef_ I happened to have a html link extractor using the py stdlib kicking around
01:55 🔗 tef_ and it helps I wrote a warc library :-)
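A minimal sketch of the kind of stdlib-only HTML link extraction tef_ describes, including resolution of relative URLs. This is not the code in warclinks.py: the class name `LinkCollector`, the helper `extract_links`, and the choice to look only at `href` and `src` attributes are assumptions made for illustration, and the sketch reads an HTML string rather than a WARC record.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href/src attribute values, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name in ("href", "src") and value:
                # urljoin turns relative links into absolute ones.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every href/src link found in html, as absolute URLs."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links
```

For a WARC record, the base URL would presumably be the record's WARC-Target-URI, so that relative links resolve against the page they came from.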
01:56 🔗 tef_ should be able to do hg clone ... (or grab a tarball)
01:56 🔗 tef_ export PYTHONPATH=`pwd`
01:56 🔗 tef_ python warclinks.py warc-files....
01:56 🔗 tef_ handles gzipped, non gzipped files
01:57 🔗 tef_ if you have +6 month old warc files when the wget-warc produced weird files, I can put in a fix in for that, but warc2warc --wget-chunk-fix should sort it
01:57 🔗 tef_ it doesn't keep a set of links
01:57 🔗 tef_ it could product a list of urls found in the links, that aren't in the warc
01:57 🔗 tef_ but you can do warcdump ... | grep WARC-Target | cut ...
01:58 🔗 godane found a error
01:58 🔗 tef_ any questions? I've only tested it a little
01:58 🔗 tef_ ah balls
01:58 🔗 tef_ can you pastebin ?
01:59 🔗 godane http://pastebin.com/NfbFUy2Q
01:59 🔗 tef_ hrm, it shouldn't be raising that
01:59 🔗 tef_ oh i'm a muppet.
01:59 🔗 tef_ hrm, you've got some lovely html there :-)
02:00 🔗 godane i know
02:00 🔗 godane this is the first time that grep -ohP doesn't work to grab/filter all urls
02:01 🔗 godane i'm trying to use that to grab all images from sites like techcrunch and such
02:01 🔗 tef_ I pushed a fix to skip them properly
02:01 🔗 tef_ but I should replace it with something more reliable than python's built in parser
02:01 🔗 tef_ maybe I should use beautiful soup or lxml
02:02 🔗 tef_ but it will get you *some* of the urls, maybe, I hope, :-)
02:05 🔗 tef_ ugh
02:05 🔗 tef_ I am an idiot
02:05 🔗 tef_ anyway, I'm gonna try and put beautiful soup in
02:05 🔗 tef_ should handle everything
02:06 🔗 godane ok
02:06 🔗 tef_ rather than committing typos :3
02:29 🔗 tef_ godane: pushed
02:29 🔗 tef_ should use lxml
02:29 🔗 tef_ well almost pushed
02:29 🔗 tef_ pushed *now*
02:30 🔗 tef_ godane: ping
02:30 🔗 godane hey
02:30 🔗 godane i got it
02:31 🔗 godane there looks be warnings of parse error
02:31 🔗 tef_ hrm
02:32 🔗 tef_ you may need into install lxml, via python-lxml (apt) or easy_install lxml
02:32 🔗 godane you didn't fix my problem
02:32 🔗 godane the lines still break
02:33 🔗 godane but this does look better and has more stuff in it now
02:33 🔗 tef_ just fixing a bug
02:33 🔗 tef_ well, maybe a bug
02:34 🔗 tef_ godane: how are you running it, I get a whole slew of urls from the examples I try
02:36 🔗 underscor tef_: are you the tef that recently visited #hackerfurs?
02:36 🔗 godane python warclinks groklaw.net-articles-2006.warc.gz > log
02:36 🔗 godane *warclinks.py
02:37 🔗 tef_ underscor: yeah, I got dragged in by mithaldu
02:37 🔗 tef_ I heard some furries were trash talking my code :-)
02:37 🔗 tef_ I assume you're the same underscor there
02:38 🔗 tef_ what's the lines still break thing ?
02:38 🔗 tef_ hrm
02:38 🔗 godane like i said
02:39 🔗 tef_ i'm slow :3
02:39 🔗 godane this warc.gz is special
02:39 🔗 tef_ oh so special
02:39 🔗 tef_ i'd ask for a copy but I assume It's huge
02:39 🔗 godane no just ~15mb
02:41 🔗 underscor tef_: haha, yeah
02:41 🔗 tef_ small world, innit
02:41 🔗 tef_ I backed out cos well, I had a clearout of irssi windows
02:42 🔗 underscor Aye
02:46 🔗 godane tef_: you can download it here: http://archive.org/details/groklaw.net-articles-2006-20120827-mirror
02:47 🔗 tef_ godane: fetching now
02:50 🔗 tef_ oh *wow*
02:51 🔗 godane you see what i mean now
02:51 🔗 godane even doing a tr -d '\n' does nothing to it
02:52 🔗 tef_ yeah
02:52 🔗 tef_ that is rather amazing
02:54 🔗 tef_ pushed a fix :3
02:54 🔗 tef_ godane: try now
02:54 🔗 tef_ I can also try stripping fragments too, but I think sed can fix that
02:55 🔗 godane lots of errors now
02:55 🔗 tef_ hrm ? I get a bunch of links out
02:56 🔗 tef_ did python warclinks.py ~/Downloads/groklaw.net-articles-2006.warc.gz |sort|uniq
02:56 🔗 tef_ and without newlines and such
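The fragment stripping and the `|sort|uniq` step mentioned above can also be done in Python. A small sketch, assuming the links are already absolute URL strings; the function name `unique_links` is hypothetical and not part of warc-tools.

```python
from urllib.parse import urldefrag


def unique_links(links):
    """Drop #fragments, remove duplicates, and return the links sorted."""
    # urldefrag splits "page.html#top" into url="page.html", fragment="top".
    return sorted({urldefrag(link).url for link in links})
```

Using a set before sorting gives the same result as piping through sort and uniq, without keeping duplicate entries in memory.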
02:56 🔗 tef_ try repulling incase something weird happened
02:57 🔗 godane file "warclinks.py", line 64, in extract_links_from_warcfh
02:58 🔗 godane there error i have is your fix
02:58 🔗 tef_ hrm
02:58 🔗 tef_ do you have a little bit more of that error ?
02:59 🔗 tef_ it parses on mine, what version of python are you using ?
02:59 🔗 godane yield link.translate(None, '\n\r\t')
02:59 🔗 godane i'm using python2
02:59 🔗 Coderjoe 2.6 or 2.7?
02:59 🔗 tef_ can you paste the entire traceback
02:59 🔗 godane 2.7.3
02:59 🔗 godane i can't right now
02:59 🔗 tef_ baws
02:59 🔗 godane i'm on firefox proxy
02:59 🔗 tef_ can you copy and pase the error message at least?
03:00 🔗 tef_ rather than just the line
03:00 🔗 tef_ which exception
03:00 🔗 tef_ as it works on my machine (tm)
03:02 🔗 tef_ http://secretvolcanobase.org/~tef/warc_links.txt.gz example output
03:03 🔗 godane http://pastebin.com/NnaN79q1
03:04 🔗 tef_ 2.7.3 weeerid
03:04 🔗 tef_ http://docs.python.org/library/stdtypes.html#str.translate
03:04 🔗 tef_ cos it says two arguments here
03:05 🔗 tef_ anyway, the txt.gz file has the links you want, I hope
03:05 🔗 tef_ hrm
03:05 🔗 tef_ aaaaha
03:05 🔗 tef_ for some reason on your machine it is sending in unicode
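Background to the diagnosis above: in Python 2, the two-argument call `link.translate(None, '\n\r\t')` only works on byte strings (`str`); `unicode.translate` takes a single mapping argument, so the same call fails when lxml hands back unicode. The sketch below shows one way to handle both types. It is written for Python 3, where the same split exists between `bytes` and `str`, and the function name `strip_whitespace_chars` is an assumption, not the fix tef_ pushed.

```python
STRIP_CHARS = "\n\r\t"


def strip_whitespace_chars(link):
    """Remove newlines, carriage returns and tabs from a link (bytes or str)."""
    if isinstance(link, bytes):
        # bytes.translate accepts a table (None = identity) plus bytes to delete.
        return link.translate(None, STRIP_CHARS.encode("ascii"))
    # str.translate takes one mapping; mapping a code point to None deletes it.
    return link.translate({ord(c): None for c in STRIP_CHARS})
```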
03:08 🔗 tef_ godane: pull or try the output provided
03:10 🔗 godane thank you
03:11 🔗 tef_ fixed?
03:11 🔗 godane yes
03:11 🔗 tef_ \o/
03:11 🔗 godane i think
03:12 🔗 tef_ well that took longer than 15 minutes :3
03:12 🔗 tef_ what an awful warc file
04:00 🔗 godane looks like that warc had 700+mb of pdfs, mp3, ogg, and images from groklaw.net
04:11 🔗 godane there is a error again
04:12 🔗 godane tef_: ping ^
07:48 🔗 alard underscor: Thanks for the warctozip update. Although the new POST things don't really work: your Nginx config apparently has a very low client_max_body_size. Perhaps you can increase that a bit? (It would be even nicer if it didn't buffer the request at all, but that seems to be impossible with Nginx.)
07:48 🔗 alard Similarly, it might be useful to disable proxy_buffering if it's enabled. That can also be done from the script with an extra HTTP header in the response, if that's easier.
09:22 🔗 Schbirid thanks for the Aktuelles Software Magazine collection!
09:36 🔗 Schbirid does someone have/know a tool to completely download a reddit thread? the increments when you click "more" get tiny, so it is quite annoying to do by hand
09:37 🔗 ersi it's called a scripting language, and it's a very sharp tool
09:37 🔗 ersi ^_^
09:38 🔗 ersi Wonder how they do the comment collapsing, should take a look at that sometime
09:39 🔗 Schbirid same would be handy for facebook, those threads are nearly impossible to get with a browser since they cant keep up rendering thousands of comments
09:40 🔗 alard Wget+Lua!
09:40 🔗 * Schbirid runs away
09:41 🔗 ersi Ooh, should take a looksie at wget+lua sometime as well
10:49 🔗 tef_ godane: ?
13:16 🔗 godane tef_: hey
13:16 🔗 godane i'm back
13:16 🔗 godane it looks like some keys have problems with unicode
13:16 🔗 godane like 0x94
13:16 🔗 godane and 0x31
13:17 🔗 tef_ hrm
13:45 🔗 SketchCow I just asked archive.org a question about scanning.
13:45 🔗 SketchCow Can we have a volunteer corps of people in the SF Bay area who come in and operate a bookscanner assigned to our group, who then scan computer historical documents.
13:46 🔗 SketchCow If they say yes, I'll start harassing people about joining up.
13:51 🔗 tef_ godane: put in a better fix, maybe
14:57 🔗 underscor http://want.archive.org/
14:57 🔗 underscor alard: that will go through the load balancer instead of running on my dev box, if you want to update the demo app
15:39 🔗 SketchCow underscor: Please add a line under "currently only for books/things with ISBNs"
15:39 🔗 SketchCow Experimental: Do not use as a sign-off for large donations of books. Please contact info@archive.org.
15:39 🔗 SketchCow Remove secret mode line
15:50 🔗 godane i got over 8gb of groklaw.org
15:50 🔗 godane :-D
15:51 🔗 godane i do have split some the warc.gz cause downloads stop sometimes
15:52 🔗 godane it maybe closer to 4gb cause i have the mirror .tar.gz and .warc.gz
15:56 🔗 alard underscor: My want-it demo app is asleep, I don't know if I will wake it up again. (I ran the human.io app on my home computer.)
15:56 🔗 alard Also, the want-it api is also visible on http://warctozip.archive.org/ ?
16:15 🔗 tef_ godane: did the most recent fix, well, uh fix
16:15 🔗 godane i don't know
16:15 🔗 tef_ heh
16:16 🔗 godane i see the error again with my groklaw.net 2011 dump
16:16 🔗 tef_ godane: yeah I'm not sure why your lxml is returning unicode
16:17 🔗 godane i think its mostly cause groklaw is special
16:17 🔗 godane i also get some bad urls like this: http://www.groklaw.net/htt[://www.groklaw.net/pdf3/LodsysvCombay-26.pdf
16:18 🔗 godane luckly all bad urls on the top of the list
16:18 🔗 tef_ heh
16:18 🔗 tef_ yeah I can't fix their broken links
16:19 🔗 godane the thing is i checked for that file
16:19 🔗 tef_ pushing a better check for unicode for what it is worth
16:19 🔗 tef_ either way I hope you've got more stuff than you would have had without it
16:19 🔗 tef_ despite it being buggy and crap :-)
16:20 🔗 godane it has that same broke line problem from what i can tell
16:22 🔗 tef_ baws
16:23 🔗 tef_ I'm not going to have a lot of time, if any to keep playing hunt the bug when I'm struggling to recreate some of the weirder errors
16:23 🔗 tef_ sorry :/
16:23 🔗 godane thats ok
16:24 🔗 godane it filters out the bad urls better then before
16:24 🔗 godane and i think it does fix most of the bad urls
16:27 🔗 tef_ yay :D
16:27 🔗 tef_ you might find google refine will be good for cleaning up large data sets like this
16:39 🔗 DFJustin <Zuu_> I have a website that I would like to be archived, how would I do so?
16:39 🔗 DFJustin <Zuu_> it's going down saturday sometime, i'll just leave this here: http://www.therevoltpress.org/
16:39 🔗 DFJustin did anyone do this
16:39 🔗 DFJustin godane was disconnected at the time
16:41 🔗 Patt it looks like the website is still up
16:47 🔗 godane i will try to grab it soon
16:48 🔗 godane my groklaw.net grab is very special so i don't want it to stop downloading
16:48 🔗 Patt godane, let me know when/where you download it when your done please
16:50 🔗 godane good news is it doesn't look like it was updated since last year
16:53 🔗 godane but there boards have been busy
16:53 🔗 Patt yea, it will be until it closes
16:53 🔗 Patt no ETA though
16:53 🔗 SketchCow want.archive.org is apparently going to shift names, so don't get comfy with it. :)
17:12 🔗 godane i have to login with a user name and password
17:13 🔗 godane how do you do that with wget?
17:14 🔗 alard godane: HTTP basic authentication? wget --help | grep user
17:15 🔗 Patt godane, you can login with anonymous / anonymous
17:15 🔗 Patt btw
17:24 🔗 godane i'm get this for cookie:
17:24 🔗 godane therevoltpress.org FALSE / FALSE 1377710618 bblastactivity 0
17:24 🔗 godane therevoltpress.org FALSE / FALSE 1377710618 bblastvisit 1346174618
17:24 🔗 godane its not working
17:24 🔗 godane stupid me
17:24 🔗 godane wrong url
17:25 🔗 godane still doesn't work
17:25 🔗 godane therevoltpress.org FALSE / FALSE 1377710728 bblastactivity 0
17:25 🔗 godane therevoltpress.org FALSE / FALSE 1377710728 bblastvisit 1346174728
17:28 🔗 godane i don't think i can mirror it
17:35 🔗 godane what am i doing wrong here:
17:35 🔗 godane wget "http://therevoltpress.org/boards/" --keep-session-cookies --load-cookies=cookies1.txt --content-disposition --mirror --warc-file=therevoltpress.org-20120828 --warc-cdx
17:43 🔗 godane can anyone help me?
17:43 🔗 godane its driving me nuts
17:44 🔗 godane cause i have no idea on how to add cookies to wget the right way
17:48 🔗 balrog_ godane: do you have a cookies.txt?
17:48 🔗 balrog_ and is it properly formatted?
17:48 🔗 godane yes
17:48 🔗 godane its just like the other ones
17:48 🔗 godane i'm using export cookies addon for firefox to get the cookie
17:49 🔗 godane i may not know where to point it to through
17:49 🔗 godane cause therevoltpress.org/boards/ is not working with wget
17:49 🔗 godane even therevoltpress.org/boards/login.php doesn't work
17:52 🔗 alard -U "Somethingelse." ?
17:52 🔗 alard They may be blocking wget.
17:53 🔗 godane that didn't work
17:56 🔗 godane there using vBulletin 3.8.0 if that helps
17:58 🔗 godane this maybe better for you guys to do it
17:58 🔗 godane i can't do much here
17:58 🔗 godane and even if i could get all of it i maybe more then 10gb
17:59 🔗 godane and i don't think i can get the uploaded on my internet speed
18:08 🔗 balrog_ godane: that's worked for me...
18:08 🔗 balrog_ are you faking the UA?
18:08 🔗 balrog_ I had to for one project
18:15 🔗 godane yes
18:15 🔗 godane show me your code please?
18:15 🔗 godane and send me your cookie
18:15 🔗 godane i getting false / false with my cookies for some reasone
18:19 🔗 godane balrog_: can please sead me the code?
18:19 🔗 godane i'm dieing here
18:30 🔗 godane wget "http://therevoltpress.org/boards/login.php?do=login" --mirror --warc-file=therevoltpress.org-20120828 --warc-cdx -U "ArchiveTeam" --load-cookies=cookies1.txt
18:30 🔗 godane thats my code
18:30 🔗 godane you show my yours?
18:30 🔗 godane *me
18:31 🔗 godane or at least tell me the url your using
18:34 🔗 godane balrog_: where the hell are you?
18:35 🔗 balrog_ busy, stuck at work
18:35 🔗 godane can you please help me?
18:35 🔗 godane i don't know why this site will not download
18:35 🔗 godane and i don't know how the hell to save the cookies through wget anymore
18:36 🔗 balrog_ what's in cookies1.txt before you start?
18:36 🔗 godane therevoltpress.org FALSE / FALSE 1377696171 bblastvisit 1346174589
18:36 🔗 godane therevoltpress.org FALSE / FALSE 1377696451 bblastactivity 0
18:36 🔗 godane www.therevoltpress.org FALSE / FALSE 0 __utmc 1
18:36 🔗 godane www.therevoltpress.org FALSE / FALSE 1346159957 __utmb 1.2.10.1346158150
18:36 🔗 godane www.therevoltpress.org FALSE / FALSE 1361926157 __utmz 1.1346158150.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
18:36 🔗 godane www.therevoltpress.org FALSE / FALSE 1409230157 __utma 1.882311859.1346158150.1346158150.1346158150.1
18:36 🔗 godane therevoltpress.org FALSE / FALSE 0 bbsessionhash a11c86836d5471bdda445db209cb2e5a
18:36 🔗 godane thats all my therevoltpress.org cookies
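On the "FALSE / FALSE" question: each line of a Netscape-format cookies.txt has seven tab-separated fields (domain, include-subdomains flag, path, secure flag, expiry as a Unix timestamp, name, value), so the two FALSE values are ordinary flags rather than an error. The sketch below parses one such line; the names `Cookie` and `parse_cookie_line` are hypothetical, and it assumes the file on disk uses real tabs (the pasted lines above show them collapsed to spaces).

```python
from collections import namedtuple

Cookie = namedtuple(
    "Cookie", "domain include_subdomains path secure expires name value"
)


def parse_cookie_line(line):
    """Parse one tab-separated line of a Netscape-format cookies.txt file."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        raise ValueError("expected 7 tab-separated fields, got %d" % len(fields))
    domain, subdomains, path, secure, expires, name, value = fields
    return Cookie(
        domain,
        subdomains == "TRUE",  # does the cookie also apply to subdomains?
        path,
        secure == "TRUE",      # send over HTTPS only?
        int(expires),          # 0 marks a session cookie
        name,
        value,
    )
```

Read this way, the `bbsessionhash` line is a session cookie (expiry 0) for path `/` that is not restricted to HTTPS.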
18:37 🔗 godane i have no idea why there not working
18:39 🔗 godane Patt: any ideas on how to mirror therevoltpress.org
18:39 🔗 godane Patt: remember you asked for me by name
18:40 🔗 underscor Alard: fixed. Thanks.
20:58 🔗 SketchCow alard: When could we make the memac search public?
21:00 🔗 alard Hadn't you already done that?
21:01 🔗 alard I think it won't get more complete than it is now. The .zip download links work. It's a pity the .warc.gz download links don't work, but I think that's an issue with the archive.org tarviewer.
21:11 🔗 SketchCow Well, I'm about to give it to a press person
21:11 🔗 SketchCow So if it can be set up as ready to go for press, let's do it.
21:13 🔗 chronomex the fixed-width font will scare muggles
21:14 🔗 chronomex I'm all for it
21:17 🔗 SketchCow WHY MUST YOU SELL FEAR
21:17 🔗 SketchCow +1 for "muggles"
21:18 🔗 SketchCow Always amazed how that one goes by
21:18 🔗 SketchCow Treats them like cattle
21:18 🔗 SketchCow Also liked how one book basically had magic dude show up in prime minister's office going "major shit going down lol brb"
21:18 🔗 chronomex heh
21:19 🔗 alard Yeah, so, well, the search page is here: http://archive.org/download/archiveteam-mobileme-index/mobileme-20120817.html
21:20 🔗 alard It may or may not need a lot of text and explanations.
21:20 🔗 alard Why am I not here? Why am I here? How did you hack my account?
21:20 🔗 chronomex hm
21:21 🔗 chronomex how the fuzz does this work anyway
21:21 🔗 alard "Email complaints@archiveteam.org to get your things removed."
21:21 🔗 chronomex ahhh
21:21 🔗 alard It's just a 400MB JSON file sorted alphabetically.
21:21 🔗 chronomex that's tricky
21:22 🔗 chronomex not worthwhile to split it up?
21:22 🔗 soultcer So searching requires me to download a 400 MB file?
21:22 🔗 alard No, you just download small bits of it.
21:22 🔗 chronomex ah, cool
21:22 🔗 soultcer Magic
21:22 🔗 chronomex it does some sort of binary windowing thing?
21:23 🔗 alard https://ia600403.us.archive.org/30/items/archiveteam-mobileme-index/
21:23 🔗 alard There's an index to the large json file, with the locations of where items start.
21:23 🔗 chronomex hot diggity damn
21:24 🔗 alard Because it's sorted, you know that the item X should be in bytes n-m.
21:24 🔗 alard (If that's abstract enough.)
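The lookup alard describes can be sketched as follows. The index layout here is an assumption (a sorted list of `(first_key, byte_offset)` pairs, one per chunk of the big file); the real index format is not shown in the log, and `byte_range_for` and `range_header` are hypothetical names.

```python
import bisect


def byte_range_for(key, index, total_size):
    """Return the (start, end) byte range of the chunk that could hold key.

    index is a sorted list of (first_key, offset) pairs; total_size is the
    size of the large file in bytes. Returns None if key sorts before the
    first chunk.
    """
    keys = [first_key for first_key, _ in index]
    # Find the last chunk whose first key is <= the key we want.
    i = bisect.bisect_right(keys, key) - 1
    if i < 0:
        return None
    start = index[i][1]
    # The chunk ends one byte before the next chunk starts (or at EOF).
    end = index[i + 1][1] - 1 if i + 1 < len(index) else total_size - 1
    return start, end


def range_header(start, end):
    """Build the HTTP header asking the server for just those bytes."""
    return {"Range": "bytes=%d-%d" % (start, end)}
```

The client then fetches only that slice of the file and scans it for the key, which is why the search needs no database but does need an HTTP client that honours Range requests.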
21:24 🔗 chronomex hangs infinitely in opera
21:25 🔗 alard Does it.
21:25 🔗 chronomex yurp
21:25 🔗 alard Any idea why?
21:25 🔗 * chronomex shrugs
21:25 🔗 chronomex opera's weird
21:25 🔗 alard I tried it in Firefox and Chrome.
21:25 🔗 chronomex yeah, works fine in chromei
21:26 🔗 alard It's a bit tricky, so you need a modern browser. But it doesn't need a database.
21:26 🔗 chronomex it's spiffy
21:26 🔗 chronomex I like it
21:27 🔗 chronomex this is the future
21:28 🔗 alard It's the past. It's just a horribly slow search engine that can only search on one key.
21:28 🔗 alard It's fast enough to be usable, though.
21:28 🔗 chronomex yeah
21:29 🔗 chronomex https://ia600403.us.archive.org/30/items/archiveteam-mobileme-index/mobileme-20120817.html#chronomex hah, I suppose I put my own name through the script at some point
21:31 🔗 alard We're flooding the channel. :)
21:33 🔗 ersi take it to #internetarchive, you!
21:33 🔗 ersi or #nowwhat :D or.. -bs
21:34 🔗 ersi endless possibilities
21:34 🔗 alard We should have a hash function where you can enter a topic and it'll tell you to go to #archiveteam-${hash}
21:35 🔗 alard Let's go to #nowwhat
21:35 🔗 ersi or just a stab at random
21:37 🔗 alard We'll just change channels after every second message. That's what real hackers do, I've heard.
21:39 🔗 closure 7 layers of channels
21:57 🔗 alard Installed Opera, found the problem: Opera is stupid, it doesn't do Range: headers in XmlHttpRequest, so it starts downloading the full 400MB.
21:58 🔗 alard (It also opens connections to ebay, booking.com and other sites, without my asking so.)
21:59 🔗 alard SketchCow: Anything else you need to make the search thing ready to go to press?
21:59 🔗 SketchCow http://archive.org/download/archiveteam-mobileme-index/mobileme-20120817.html is what we go with, right?
21:59 🔗 alard Yes. It's possible to put it in an iframe somewhere on archiveteam.org, if that's better.
22:02 🔗 dashcloud want.archive.org sounds great- how do you get books to IA? (is there going to be a blog post somewhere on this? or is it not public-ready yet?)
22:02 🔗 SketchCow Not public ready
22:02 🔗 SketchCow But you basically mail them books. I send mine in crates, media mail.
22:02 🔗 SketchCow 200 went out today
22:03 🔗 SketchCow archive.org wants to take it under consideration before it becomes an official API
22:03 🔗 chronomex I'd love to unload some books, I have way too many for a single man in a city :(
22:04 🔗 chronomex I'll do an inventory eventually
22:05 🔗 soultcer chronomex: Check out bookmooch.com, it allows you to trade books by mail
22:05 🔗 chronomex meh, that sounds like a lot of work
22:06 🔗 chronomex also I have *too many* books
22:06 🔗 chronomex I should scan the rare ones.
22:06 🔗 dashcloud I do as well- I've had to switch to ebooks because I don't really have more room for physical copies
22:07 🔗 chronomex the space under my bed is about 80% books.
22:07 🔗 dashcloud every shelf is full of books, and nearly the entire wall is lined with piles of books
22:09 🔗 dashcloud I'd love to do the book scanning thing, but it takes a more disciplined and dedicated person than me to do that- I'd get distracted by reading parts of the pages as I flipped by, and it's a lot more tedious flipping pages and taking pictures than reading the book
22:09 🔗 DFJustin haha I'm not the only one
22:11 🔗 chronomex I've only scanned one in toto, which is probably the most valuable book I own - http://archive.org/details/TheElectronicSwitchingSystem
22:11 🔗 dashcloud that instructable on the cardboard box bookscanner makes the whole thing look easy, but apart from the aforemention issues, there's the post processing of each page- which is SO much easier if your pictures are uniform in each respect
22:12 🔗 SketchCow This is why we're working on want.archive.org
22:12 🔗 SketchCow Send them to archive.org, they get scanned in
22:12 🔗 DFJustin I used to scan books on a flatbed for distributed proofreaders, you kids and your diy things
22:14 🔗 chronomex DFJustin: gutenberg?
22:14 🔗 DFJustin yeah
22:14 🔗 DFJustin unfortunately the raw scans all ate it in an hdd crash, unless dp still has them
22:15 🔗 chronomex :( ): :(
22:15 🔗 DFJustin the pg guys made some wicked ebook editions though http://www.gutenberg.org/files/16410/16410-h/16410-h.htm
22:15 🔗 dashcloud that's a great idea, except making space is only half the reason I'm scanning a book- the other is to have an ebook version of it (which I'm pretty sure I can't get from archive.org- books are too new)
22:15 🔗 chronomex DFJustin: oh that's sexy
22:16 🔗 chronomex I got 2/3 of the way through TeXifying that book too - http://gir.seattlewireless.net/~chronomex/bellsystem/morris/Morris.html
22:17 🔗 dashcloud if you tell me I can get an electronic copy of every book I mail into IA, I'd crate a large part of books and send them very quickly
22:17 🔗 chronomex yeah.
22:17 🔗 DFJustin it's not legal since they want to lend out the electronic copy
22:18 🔗 chronomex yeah :S
22:19 🔗 dashcloud the other scanning project you proposed to archive.org sounds great as well- the historical computer document one
22:47 🔗 SketchCow I made the formal proposal to archive.org about that
23:26 🔗 Coderjoe I still would like a DIY bookscanner :D
23:45 🔗 DFJustin wasn't SketchCow supposed to get one of those like 6+ months ago and CHANGE COMPUTER HISTORY
23:45 🔗 SketchCow Yes
23:45 🔗 SketchCow I've been needling the guy - little response
23:46 🔗 SketchCow I've got a few "Getting that right to you (six months ago)" so I'm not going to get too het up
