[00:10] Yeah
[00:10] Sorry, Nemo. I fell the fuck ASLEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEP
[00:10] I mean like MAN DOWN
[00:12] hey it happens, I've fallen asleep in front of my computer numerous times...
[00:13] have you ever fallen asleep on the keyboard (or is that not something that ever happens?)
[00:14] I have a laptop so that's tricky
[00:15] AT the keyboard, yes :P
[00:28] It would be a Plustek book scanner.
[00:28] I'm just seeing which Plustek is the best one, there's no obvious way
[00:30] I thought you hated their scanners with a passion?
[00:30] I hated them in 2008
[00:30] (Good callback, though)
[00:31] That's why I'm researching what people say about newer models.
[00:31] well, ok :P
[00:32] If they still suck, great
[00:32] there are scanners, and there are scanners. :/
[00:34] http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&Depa=0&Order=BESTMATCH&Description=plustek%20opticbook&IsRelated=1&cm_sp=KeywordRelated-_-Plustek-_-plustek%20opticbook
[00:34] People help me figure out the difference.
[00:34] Anyway, godane, I'm buying you a scanner.
[00:35] If you're going to acquire magazines and books and scan them, it should be something nice.
[00:35] yes.
[00:35] Your wand will fucking explode your hand
[00:35] And the scans won't be good.
[00:36] and everyone will be sad
[00:36] Yeah.
[00:37] if you do something, don't do it mostly-right if you can do it right - because your mostly-right version will be trotted out to shame the person who goes back and does it right
[00:37] that's my view
[00:38] so one of those $800 scanners
[00:39] Well, either the $800 or the $1800.
[00:39] The $300 one, I've dealt with. It's meh.
[00:39] I'm looking for solid reviews of the other two.
[00:39] my experience with manually-moved scanners is the results are crap
[00:40] http://www.pcmag.com/article2/0,2817,2392769,00.asp
[00:40] that is, the kind where you move a wand or a mouse-like device across the page
[00:43] the kind Coderjoe talks about suck because you can't consistently control them
[00:44] No, you don't want it, that's off the table.
[00:44] So Plustek OpticBook 4800 is the way to go, unless I can find what the A300 has in itself that makes it worth more than twice its worth.
[00:45] what's the size of the 4800? the a300 is a3 (11x17)
[00:46] yeah the a300 is 11x17 but lower dpi
[00:46] so there you go
[00:47] So it costs twice as much but has lower dpi?
[00:47] That makes it easy, huh
[00:47] costs twice as much for a larger surface BUT has a lower dpi
[00:47] That's terrible.
[00:48] <3 my Epson 1640xl
[00:48] though it's not glass-to-edge
[00:48] Yeah, if it was just about quality, we'd do an Epson Perfection or a Canon
[00:48] But it's the glass to edge
[00:49] yup
[00:50] so you mean this one: http://www.newegg.com/Product/Product.aspx?Item=N82E16838122054
[00:50] the a3 1640xl is nice though in that I can lay both pages of a book on the glass since it's that large
[00:50] http://www.newegg.com/Product/Product.aspx?Item=N82E16838122061 godane.
[00:50] not quite as easy as glass to edge but still usable...
[00:51] i buy computers for less than that
[00:52] I'm sure you do.
[00:52] Good thing you're not buying this.
[00:53] Also, I am lending it to you
[00:53] Because I know full well disability gets pissed if you're sent gifts or the like
[00:53] ok
[00:55] Fun Godane Facts: He has uploaded 4,383 items to archive.org, for a total of 2.141 terabytes
[00:58] There, I just hit up the internet.
[00:59] But e-mail jscott@archive.org with a mailing address. I'll get it immediately.
[01:00] Let's see what bounty the internet drops
[01:01] will they take a po box?
[01:01] Is it, like, a Mailboxes Etc. ?
[01:01] Sorry, UPS Store
[01:02] It hasn't been a Mailbox Etc. for years
[01:02] or is it an actual office post office?
[01:02] my po box is at a post office
[01:02] usually ups and FedEx don't deliver to those :(
[01:06] just for this you're getting another episode of the screen savers
[01:07] the computer home club episode
[01:07] Newegg does not ship to standard P.O. Boxes. However, Newegg can ship to residential P.O. Boxes in rural areas.
[01:07] Either we'll have to find another address, or I can just drive it over to you.
[01:08] i will give you my address and you can mail it there
[01:09] so, should donations just be marked as a Gift on PayPal's site, or is there something else that should be used?
[01:10] Gift is good
[01:10] We're already 10% there!
[01:12] OK, off to get my blood pressure meds
[01:12] Back soon!
[01:12] And then, Nemo_bis - we'll get that anonymous edits problem fixed
[01:18] i just emailed you
[01:29] i'm getting this: http://www.youtube.com/watch?v=ROHJmP_TNR0
[01:30] http://www.microchip.com/stellent/idcplg?IdcService=SS_GET_PAGE&nodeId=1406&dDocName=en023073
[01:31] prove that it may go into the g4videos collection: http://www.g4tv.com/thefeed/blog/post/711273/g4-supports-gamers-heart-japan-special-to-raise-donations-for-japan/
[01:31] might be good to mirror that stuff since microchip blocks archiving the main site
[01:32] yeah microchip uses a robots.txt whitelist
[01:32] :/
[01:44] this thing requires manual intervention, but it uses twitter to retrieve pdfs: https://groups.google.com/group/science-liberation-front/t/414247ab61c1112b
[02:06] kanzure: iiiiinteresting
[02:42] Pasting something for amusement:
[02:42] The Password is 77be2da9
[02:42] Password:
[02:42] Welcome to ClassicCMP.org rsyncd
[02:42] receiving incremental file list
[02:42] pdf/borland/borland_C++/
[02:42] pdf/borland/borland_C++/Borland_C++_Version_4.0_Programmers_Guide_Oct93.pdf 13237599 100% 31.85kB/s 0:06:45 (xfer#1, to-check=1021/103449)
[02:42] pdf/borland/borland_C++/Borland_C++_Version_4.0_Users_Guide_Oct93.pdf
[02:42] 20778040 100% 36.92kB/s 0:09:09 (xfer#2, to-check=1020/103449)
[02:42] pdf/borland/borland_C++/Borland_ObjectWindows_for_C++_Version_2.0_Programmers_Guide_Oct93.pdf 19152845 100% 38.40kB/s 0:08:07 (xfer#3, to-check=1012/103449)
[02:42] pdf/borland/borland_C++/Borland_ObjectWindows_for_C++_Version_2.0_Reference_Guide_Sep93.pdf
[02:42] 24558355 100% 38.57kB/s 0:10:21 (xfer#4, to-check=1011/103449)
[02:42] pdf/borland/turbo_assembler/
[02:42] pdf/borland/turbo_assembler/Turbo_Assembler_Version_4.0_Quick_Reference_Mar94.pdf
[02:42] 5680156 100% 35.44kB/s 0:02:36 (xfer#5, to-check=1011/103471)
[02:42] pdf/borland/turbo_assembler/Turbo_Assembler_Version_4.0_Users_Guide_Nov93.pdf 13508562 100% 36.93kB/s 0:05:57 (xfer#6, to-check=1010/103471)
[02:42] pdf/borland/turbo_assembler/Turbo_Debugger_Version_4.0_Users_Guide_Oct93.pdf 10700162 100% 39.07kB/s 0:04:27 (xfer#7, to-check=1005/103471)
[02:42] sent 5196 bytes received 110843159 bytes 37671.49 bytes/sec
[02:42] total size is 187829236613 speedup is 1694.47
[02:42] That's one day of scanning from bitsavers.
[02:42] One day.
[02:42] Those guys are crazy, there must be a pile of people doing this.
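The paste above is ordinary output from an rsync pull against the ClassicCMP.org rsync daemon. A minimal, cron-friendly sketch of that kind of mirror job is below; the login, module name, destination path and password value are placeholders rather than the real ones (rsync reads a daemon password from the RSYNC_PASSWORD environment variable, so nothing has to be typed at the Password: prompt):

    #!/usr/bin/env python3
    # Sketch: mirror a tree from a password-protected rsync daemon, suitable for cron.
    # HOST is real; USER, MODULE, DEST and PASSWORD are placeholders for illustration.
    import os
    import subprocess

    HOST = "classiccmp.org"
    USER = "archiveteam"                # hypothetical daemon login
    MODULE = "pdf"                      # hypothetical module name
    DEST = "/data/classiccmp-mirror/"   # hypothetical local destination
    PASSWORD = "xxxxxxxx"               # the daemon password announced in channel

    env = dict(os.environ, RSYNC_PASSWORD=PASSWORD)  # avoids the interactive prompt
    subprocess.check_call(
        ["rsync", "-av", "--progress",
         "rsync://{}@{}/{}/".format(USER, HOST, MODULE), DEST],
        env=env,
    )

Because rsync only transfers what changed, re-running the same script every night keeps the local copy of the bitsavers/ClassicCMP tree current.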
[02:44] hey SketchCow
[02:45] i'm hoping this scanner will not break you
[02:45] ha ha no.
[03:30] OK, more has come in.
[03:31] $500!
[03:31] So yeah, I'm buying some cheap shit $200 scanner, with the internet discount
[03:44] http://www.commondreams.org/further/2013/01/15-5
[04:16] ruhroh. all the yahooblogs are coming up user-not-found all of a sudden.
[04:19] 999?
[04:20] or did we hit the deadline?
[04:20] this user worked for me yesterday for sure: http://blog.yahoo.com/_LQ63BATA6FOO6A5LZBHEOD47QY
[04:23] hmm
[04:23] the ones my warrior is working on come up fine
[04:31] And now we're at $625.
[04:31] Most inexpensive scanner evar
[04:34] And I just fixed the DNA Lounge Uploads!
[04:34] http://archive.org/details/dnalounge
[04:34] I realized it was using the old credentials. So it was downloading, but then sitting there.
[04:35] whoops...
[04:35] so I set up a script to periodically download the contents of Asimov's incoming section
[04:35] (Asimov being the Apple II FTP server)
[04:36] someone uploaded a couple of zips marked IIgs OS source code there in late December but they didn't last very long
[04:36] of course they might have been fake but I guess we won't know now
[04:52] your script missed them?
[04:58] no, I didn't have a script then
[04:58] I set up the script in case interesting stuff shows up again
[04:58] ah
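For a "script to periodically download the contents of Asimov's incoming section" along the lines described above, one possible sketch using Python's ftplib follows; the host name, incoming path and local directory are assumptions for illustration, not the actual script:

    #!/usr/bin/env python3
    # Sketch: fetch anything new from an FTP "incoming" directory; run it from cron.
    # HOST, INCOMING and LOCAL are assumptions about the Asimov layout.
    import os
    from ftplib import FTP, error_perm

    HOST = "ftp.apple.asimov.net"            # assumed host name
    INCOMING = "/pub/apple_II/incoming"      # assumed incoming directory
    LOCAL = os.path.expanduser("~/asimov-incoming")

    os.makedirs(LOCAL, exist_ok=True)
    ftp = FTP(HOST)
    ftp.login()                              # anonymous login
    ftp.cwd(INCOMING)
    for name in ftp.nlst():
        dest = os.path.join(LOCAL, name)
        if os.path.exists(dest):
            continue                         # already grabbed on an earlier run
        try:
            with open(dest, "wb") as out:
                ftp.retrbinary("RETR " + name, out.write)
        except error_perm:
            os.remove(dest)                  # probably a subdirectory; skip it
    ftp.quit()

Files that were already fetched are skipped, so even if an upload is deleted from the FTP site later (like the IIgs OS zips mentioned above), the local copy survives.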
[07:01] http://archive.org/details/start-magazine
[07:21] so i got the 360p version of Gamers Heart Japan
[07:21] the 720p version i downloaded was incomplete
[08:01] Nemo_bis: Changes you asked for in the permissions are now in effect - apologies for delay.
[08:14] balrog_: I think it is time for me to rejoin discferret
[09:14] aww discferret, lovely
[09:34] Should we stop the yahooblogs? There's occasionally one that exists (perhaps a non-Vietnamese blog), but most seem to have disappeared.
[09:42] need timestamps in warrior
[09:42] * Smiley goes to record this on the github
[09:43] errr where do suggestions go now, warrior-code? warrior-hq?
[09:43] seesaw-kit
[09:43] k
[09:44] That's the Python code, the warrior-code and warrior-preseed etc. is just stuff to get the seesaw-kit to run.
[12:27] https://www.mediawiki.org/wiki/Thread:Talk:MediaWiki_vendors/Let's_talk_about_Data_Dumps.
[12:27] The thread you specified does not exist.
[12:28] Probably you missed the period
[12:28] Smiley: include the dot too
[12:28] at the end
[12:29] https://www.mediawiki.org/?oldid=630008
[12:29] btw i should check on this XMl dump, it might help me allowing gopher tunneling to the MediaWiki engine
[12:29] *XML
[12:31] norbert79: do you mean https://www.mediawiki.org/wiki/Manual:DumpBackup.php ?
[12:31] Or /using/ dumps?
[12:32] Nemo_bis: Not sure right now, I am just trying to find a method I could use to create plain text dumps of MediaWiki pages, making the Wiki browsable through gopher too
[12:32] Nemo_bis: my first idea was curl, but I think that's something ugly
[12:32] so I am looking for a more proper method
[12:33] basically i wish to recreate the Wiki as folders and TXT files for each
[12:33] and images separate
[12:33] and regularly updating it using CRON
[12:34] Ah, plain text.
[12:35] Probably HTML dumps are more useful for that.
[12:35] Not really
[12:35] it would just lose the links
[12:35] Did you try?
[12:35] What's "it"?
[12:35] i am still working out the method first
[12:35] See when you create a page
[12:35] Do you really want to do the parsing?
[12:35] you don't refer to outside pages with <a href> but with a mediawiki link
[12:36] so my plan would be using plain txt files or a gophermap-like structure
[12:36] so links will still stay there
[12:36] but instead of referring back to a HTML page it would use the gopher based method
[12:36] I don't understand why HTML should make the linking harder.
[12:37] My suggestions: 1) http://www.kiwix.org/index.php/Template:ZIMdumps
[12:37] Nemo_bis: Because gopher works differently than an HTML server
[12:37] 2) http://lists.wikimedia.org/pipermail/wiktionary-l/ this is where questions about text conversions happen more frequently
[12:38] thanks
[12:38] norbert79: and doesn't gopher work differently from the MediaWiki parser/wikisyntax?
[12:38] Anyway, as soon as you try you'll immediately agree with me, no need to convince you. :)
[12:38] it does, but I wish to dump the raw content through the API and recreate the pages as directories so i can use "gophermap"
[12:39] i don't really wish to agree with you until you try to host a gopher on your own too :)
[12:39] As I said, I don't need you to agree with me.
[12:40] Sure, but you gave me some starting point though, thanks for that
[12:40] I also hard this guy figured out a way to handle text decently: http://aarddict.org/
[12:40] *heard
[12:40] And you probably don't want to use API.
[12:40] Not sure what I wish to use for now
[12:40] As said I am working on the concept first
[12:41] but the goal is making a MediaWiki-based page browsable through a gopher host too
[12:42] Your concept is wrong if it needs API for mass-crawling, be warned.
[12:42] Dynamically browsable?
[12:42] Yes
[12:42] sort-of
[12:43] read only
[12:43] https://www.mediawiki.org/wiki/Manual:Maxlag_parameter might be the most useful thing around
[12:43] Nemo_bis: but I can assume when i say gopher you know what i am talking about, right? just to get things clear...
[12:43] yes
[12:44] Alright
[12:44] generically
[12:44] And would you host it or not?
[12:44] Sure, and i already host it, I just wish to find a method making the Wiki readable through gopher:// too
[12:45] Then you could do the parsing yourself or use a dump.
[12:45] not really something I wish to use for now
[12:45] i am somehow sure it could go somehow different
[12:45] The easiest option is probably to use the txt/HTML dumps I linked
[12:46] Nope
[12:46] No-go
[12:46] Why?
[12:46] Would lose the interconnection between articles
[12:46] Why should it?
[12:46] You seem not to be getting it
[12:46] Yes, I already asked
[12:46] Gopher doesn't use links how HTML does
[12:46] it uses flags
[12:46] and for each flag an own method
[12:47] now if I wish to link from a page to another using [[]] I can sure analyze that
[12:47] so i can recreate the links within a "gophermap" too
[12:47] in gopher each subdirectory can have a gophermap, but one gophermap can't be serving like a html
[12:47] Yes but links *won't* be nicely ready for you to use in [[]]
[12:48] Sure, but still I could refer back to a section of a text... if you do a html dump you go with addresses
[12:48] You'd need to do some pre-processing if not the whole parsing
[12:48] but I don't wish to re-create the whole thing by using browser like methods
[12:48] section?
[12:48] the whole method should go internal
[12:48] by using one or more files or methods
[12:48] but not creating html dumps
[12:49] I didn't say create, I said use
[12:49] still no-go
[12:49] Anyway, I will figure it out, thanks
[12:49] :)
[12:49] And i suggest you start hosting a gopher on your own too, so you can see why it's better not using html dumps :P
[12:51] Oh, no, I'll just trust the conclusions you'll come to. We'll see if they'll be the same. :)
[12:58] norbert79: if you decide for the parsing, you'll probably need a few dozens of the following extensions: https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/tools/release.git;a=blob;f=make-wmf-branch/default.conf#l7
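One way the plan sketched in the conversation above could look in practice: fetch raw wikitext through the API (with the maxlag parameter Nemo_bis linked, so the crawl backs off when the wiki is lagged), save it as a plain .txt file, and rewrite [[internal links]] as gophermap menu lines. This is only a sketch under assumptions: the wiki API URL, gopher host and selector layout are made up, and the JSON shape shown is the old prop=revisions/rvprop=content response:

    #!/usr/bin/env python3
    # Sketch: dump one wiki page as plain text and build a gophermap from its [[links]].
    # WIKI_API, GOPHER_HOST and the selector layout are assumptions, not a finished design.
    import json
    import re
    from urllib.parse import urlencode
    from urllib.request import urlopen

    WIKI_API = "https://www.mediawiki.org/w/api.php"   # assumed target wiki
    GOPHER_HOST = "gopher.example.org"                 # assumed gopher host
    GOPHER_PORT = 70

    def fetch_wikitext(title):
        # Old-style content query; maxlag=5 makes the request fail politely
        # instead of hammering a lagged wiki (see Manual:Maxlag_parameter).
        params = urlencode({
            "action": "query", "format": "json", "maxlag": "5",
            "prop": "revisions", "rvprop": "content", "titles": title,
        })
        data = json.loads(urlopen(WIKI_API + "?" + params).read().decode("utf-8"))
        page = next(iter(data["query"]["pages"].values()))
        return page["revisions"][0]["*"]

    def gophermap_lines(wikitext):
        # [[Target]] or [[Target|Label]] becomes a type-0 (text file) menu line:
        # "0<label>\t<selector>\t<host>\t<port>"
        for target, label in re.findall(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]", wikitext):
            selector = "/wiki/%s.txt" % target.strip().replace(" ", "_")
            yield "0%s\t%s\t%s\t%d" % (label or target, selector, GOPHER_HOST, GOPHER_PORT)

    text = fetch_wikitext("MediaWiki")
    with open("MediaWiki.txt", "w", encoding="utf-8") as f:
        f.write(text)
    with open("gophermap", "w", encoding="utf-8") as f:
        f.write("\n".join(gophermap_lines(text)) + "\n")

Run per page from cron, this keeps the interconnections norbert79 wants: each [[link]] becomes a gopher menu entry pointing at the corresponding .txt selector, with no HTML involved. For a whole wiki, an XML dump would avoid the mass API crawling Nemo_bis warns about.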
[14:33] http://www.connotea.org/
[14:33] "Connotea will discontinue service on March 12, 2013. Please export your bookmarks before that date."
[14:33] Not sure if there's anything public there.
[14:41] There is: http://www.connotea.org/tag/LIS5433?start=10&num=100
[14:42] hmm
[14:42] alard: I just noticed that a warrior I just set up is spamming out python exceptions to the console
[14:42] Is there a reason the Archive Team's choice for the warroprs is set to URLTeam rather than Punchfork?
[14:42] AttributeError: 'AsyncPopen' object has no attribute 'pipe'
[14:42] *warriors
[14:43] externalprocess.py line 57 in _wait_for_end
[14:44] chazchaz: Yes, it's quite early (it's closing 2013-03-15) and it's not yet read-only, so we should probably wait. There are enough non-auto-warriors to do the few tasks we can already do.
[14:45] Ahh, ok
[14:50] db48x: That's code for 'cannot execute this executable file', if I remember correctly.
[14:51] Is it a normal warrior? Which project?
[14:51] not quite a normal warrior
[14:51] punchfork
[14:52] it's the warrior code running on an ubuntu vps
[14:52] It might be your wget, or one of the other executables in there (export-punchfork.py, extract-users.sh).
[14:53] Perhaps look in the warrior-install.sh script to see if you're missing something.
[14:53] The Punchfork project has a few extra dependencies.
[14:53] hmm
[14:54] punchfork-6d817e3$ ./wget-lua
[14:54] -bash: ./wget-lua: No such file or directory
[14:55] get-wget-lua.sh is recompiling it
[14:56] ah, error: --with-ssl was given, but GNUTLS is not available.
[14:56] lemme fix that
[14:57] lua5.2?
[14:57] no, 5.1
[14:59] that fixed it
[15:00] does it try to compile wget-lua at all when it sets up the project?
[15:00] No. wget-lua is supposed to be in the git repository.
[15:00] ah
[15:01] the version I got from the repository wasn't compatible
[15:01] I naively picked a 64-bit os
[15:09] that's annoying
[15:10] one of the jobs failed, but it didn't stick around long enough for me to see the message
[15:12] export-punchfork.py, line 57
[15:12] 'instancemethod' object has no attribute '__getitem__'
[15:18] hmm
[15:18] I have a newer python than the real warriors
[15:21] same version of the requests module though
[15:22] https://github.com/kennethreitz/requests/commit/1451ba0c6d395c41f86da35036fa361c3a41bc90#L2L528
[15:23] I think it depends on when you installed the requests module. My warrior probably has an older version.
[15:25] they both say version 1.1.0
[15:26] My warrior has 0.13.6.
[15:26] um
[15:27] downloaded this warrior vm image a month or so ago
[15:27] It could be that I've installed the Debian package at some point.
[15:28] Perhaps I should refresh my development warrior.
[15:28] In any case: https://github.com/ArchiveTeam/punchfork-grab/commit/51bbb6dca6ac33e2454856c7d03c7f92db30389e
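The 'instancemethod' object has no attribute '__getitem__' error is consistent with the requests 0.x-to-1.x change linked above: Response.json used to be an attribute holding the decoded body and became a method in 1.0, so code written against 0.13.6 that indexes resp.json[...] breaks under 1.1.0. A small compatibility sketch (the URL and key below are hypothetical, not the actual punchfork-grab code):

    import requests

    resp = requests.get("http://api.example.com/recipes")   # hypothetical URL
    # requests < 1.0 exposed the decoded body as the attribute resp.json;
    # requests >= 1.0 turned it into a method, so only call it when it is callable.
    data = resp.json() if callable(resp.json) else resp.json
    names = data["recipes"]   # "recipes" is an invented key, not Punchfork's real schema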
[15:30] db48x: Did you run ./get-wget-lua.sh?
[15:30] ersi: just now
[15:41] alard: https://github.com/ArchiveTeam/punchfork-grab/pull/1
[15:41] Looks good?
[15:43] Yes. It's even worse: they also need https://github.com/ArchiveTeam/punchfork-grab/blob/master/warrior-install.sh
[15:44] Oh, heh
[15:45] oh, that reminds me. warrior-install.sh doesn't run very well on fedora
[15:45] Yeah, I imagine so - since it depends on dpkg
[15:46] (It's an install script for the warrior.)
[15:46] Yeah, and the Warrior is running Debian. So no surprise :)
[15:46] yea, I noticed
[15:47] but in the absence of documentation, I just picked something and tried it :)
[15:49] I'm also commenting out some junk, like the framebuffer stuff
[15:59] * db48x sighs
[15:59] Permission denied: '/data/data/data'
[15:59] by induction...
[15:59] Why do you even need all those bits?
[15:59] I think this is the most interesting part: https://github.com/ArchiveTeam/warrior-code2/blob/master/warrior-runner.sh#L10-L15
[16:01] * ersi stares at db48x
[16:03] yes?
[16:03] that's where that error came from
[16:06] Perhaps I don't understand what you're trying to do.
[16:07] running the warrior on a cheap vps that doesn't let me simply upload an image and run it
[16:08] there's a script
[16:08] without the vm you can run
[16:09] where?
[16:09] seesaw
[16:11] Do what you'd normally do to run one project (install Python, pip, the seesaw kit). Then instead of run-pipeline, start run-warrior with the appropriate arguments (run-warrior --help).
[16:13] There's no real need for the things in https://github.com/ArchiveTeam/warrior-code2
[16:20] ok
[16:21] for now it's fine, I'll redo it later
[16:21] in the meantime, I must sleep
[16:23] oh, btw. I only went down the path of copying what was in the vm because I couldn't find any documentation :)
[16:23] I should have remembered that it was called seesaw, but it's been a few months and it's not even a shell script any more (kudos on that, btw)
[16:24] That's true, it's very undocumented. (And the documentation for run-warrior should contain a very big warning: it assumes that your system is exactly like the warrior vm. Debian, 32-bit etc.)
[16:25] so yahoo blogs is dead?
[16:25] I think the Vietnamese part is dead, yes. Most of the blogs that worked before have now disappeared.
[16:25] ah :(
[16:26] although I have one that is still going
[16:26] http://blog.yahoo.com/_HSRTJSB6GTZK6W6XH26DUPKM2Y/articles/page/1
[16:26] so it's hard to say what's going on there
[16:28] That doesn't look like Vietnamese.
[16:29] I think that somehow not every Yahoo blog is the same, even though they're on the same URL.
[16:31] http://blog.zing.vn/jb/dt/supportuserzm/13637957
[16:32] yeah that blog is traditional chinese (hk/taiwan)
[16:33] http://english.vietnamnet.vn/fms/science-it/55069/zing-blog-cherishes-ambition-to-replace-yahoo-blog-in-vietnam.html
[16:34] The giant has announced that Yahoo!Blog would shut down in Vietnam from January 17, 2013. The noteworthy thing is that the service would be stopped six months after it made debut on June 20.
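A quick way to put numbers on the "is yahoo blogs dead?" question would be to probe a sample of the discovered blog URLs and count how many now come back as user-not-found. A rough sketch, assuming a marker string that would have to be verified against a real error page:

    #!/usr/bin/env python3
    # Sketch: probe a handful of blog.yahoo.com profiles and report which look gone.
    from urllib.request import urlopen

    # Two profile URLs mentioned in the channel; in practice this would be a
    # sample drawn from the tracker's remaining todo list.
    URLS = [
        "http://blog.yahoo.com/_LQ63BATA6FOO6A5LZBHEOD47QY",
        "http://blog.yahoo.com/_HSRTJSB6GTZK6W6XH26DUPKM2Y/articles/page/1",
    ]
    MARKER = "user not found"   # assumed error-page text; check against a real dead blog

    for url in URLS:
        try:
            body = urlopen(url, timeout=30).read().decode("utf-8", "replace")
            status = "gone" if MARKER in body.lower() else "alive"
        except Exception as exc:
            status = "error: %s" % exc
        print(url, status)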