[00:00] paper is wonderful, but it takes up a lot of space relative to digital
[00:01] you would really want error correction, and a lot of it.
[00:01] agreed
[00:04] *** Fusl has quit IRC (Max SendQ exceeded)
[00:05] *** Fusl has joined #archiveteam-bs
[00:15] *** hook54321 has joined #archiveteam-bs
[00:29] *** JordanJ2 has quit IRC (Read error: Operation timed out)
[00:36] *** lytv has quit IRC (Ping timeout: 258 seconds)
[00:45] *** JesseW has joined #archiveteam-bs
[00:49] *** lytv has joined #archiveteam-bs
[01:11] *** Stiletto has quit IRC (Read error: Operation timed out)
[01:41] *** JesseW1 has joined #archiveteam-bs
[01:44] *** JesseW has quit IRC (Read error: Operation timed out)
[01:53] *** justsome has joined #archiveteam-bs
[01:53] hello
[02:07] *** JesseW1 has quit IRC (Ping timeout: 370 seconds)
[02:10] *** justsome has left
[02:25] *** JesseW has joined #archiveteam-bs
[02:34] *** logchfoo2 starts logging #archiveteam-bs at Fri May 20 02:34:18 2016
[02:34] *** logchfoo2 has joined #archiveteam-bs
[02:35] *** RichardG has quit IRC (Ping timeout: 258 seconds)
[02:36] *** RichardG has joined #archiveteam-bs
[02:43] nice https://www.ssllabs.com/ssltest/analyze.html?d=archive.org
[03:06] nyce
[03:08] *** Stiletto has joined #archiveteam-bs
[03:17] *** tomwsmf-a has quit IRC (Ping timeout: 258 seconds)
[03:30] yeah, I was AFK
[03:50] *** godane has quit IRC (Ping timeout: 244 seconds)
[03:50] *** ndiddy has joined #archiveteam-bs
[03:54] justsome (hopefully you read the logs): I've pulled out the list of URLs that generated an error (some of which may have been picked up on retries). There are 785 of them. I put the list here: http://termbin.com/h1a4
[03:55] I'll look into how to run wpull to retry those again. HCross (or others) with more wpull experience, please suggest command line options.
[04:06] Started the recheck. 51 checked, 1 found. :-/
[04:07] Hm, I think the ones it's not finding are due to spaces in the file names.
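If the failed rechecks really are caused by unencoded spaces in the file names (the log only suspects this), one way to prepare the retry list is to percent-encode each URL before handing it back to the crawler. The sketch below is an illustration only: the function names and the example.com URLs are made up, and nothing here is taken from the actual wpull run.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url_path(url: str) -> str:
    """Percent-encode unsafe characters (such as spaces) in the path and query."""
    parts = urlsplit(url.strip())
    # Keep "/" and "%" as-is so URLs that are already encoded are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%+")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


def encode_url_list(lines):
    """Return cleaned URLs from an iterable of lines, skipping blank lines."""
    return [encode_url_path(line) for line in lines if line.strip()]


if __name__ == "__main__":
    # Hypothetical sample entries standing in for the list of failed URLs.
    sample = [
        "http://example.com/files/my file.zip",
        "http://example.com/a%20b.txt",
    ]
    for url in encode_url_list(sample):
        print(url)
```

The cleaned list could then be written to a file and used as the input for the retry. Treating "%" as safe is a design choice: it avoids double-encoding, at the cost of leaving a literal percent sign in a file name unescaped.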
[04:09] Stopped the recheck; I need to figure out how to get the proper names, as the wpull.log errors don't help.
[04:10] suggestions gratefully appreciated.
[04:23] *** Frogging has quit IRC (Ping timeout: 244 seconds)
[04:25] *** Frogging has joined #archiveteam-bs
[04:37] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[04:38] *** godane has joined #archiveteam-bs
[04:43] *** Sk1d has joined #archiveteam-bs
[04:59] *** vitzli has joined #archiveteam-bs
[05:02] JesseW, I got the copy of ia_census_201604, https://0bin.net/paste/W4rZVPnM0GwayDqs#W+paYsyI5JJ1FYTEQBU2akw-twMTi2tXMIIzB+vYA2g
[05:14] vitzli: good
[05:17] *** Honno has joined #archiveteam-bs
[05:34] justsome: hmm, it was more than that, so what's it done to only upload 16GB?
[05:34] JesseW: we seem to have the timezone issue
[05:38] HCross: which timezone issue is that?
[05:39] being in opposite ones?
[05:39] async communication works for that
[05:40] Yeah. I presume you both are in the US somewhere
[05:42] *** godane has quit IRC (Read error: Operation timed out)
[05:50] JesseW: I remember now. Wpull crashed at the very last moment, so I think that broke things
[05:54] *** Aranje has joined #archiveteam-bs
[05:54] JW_work, any idea how microfiche compares to printing? presumably you can do it all-digitally now. One microfiche is 250 pages, and lasts 300 years.
[05:55] Doesn't microfiche look like absolute fucking crap?
[05:55] HCross: west coast of the US timezone
[05:56] Ah thought so
[05:56] Yes, but that's 250 pages on something the size of an index card instead of a phone book.
[05:56] phillipsj: but micro- anything isn't human readable (without a microscope)
[05:56] Oh, I get it.
[05:56] Like how you get a car for free
[05:57] But it's on fire and it has your parents' bodies in it.
[05:57] It's free
[05:57] So I get an index card of absolute useless shit that lasts for three centuries
[05:57] Neat
[05:57] * SketchCow thumbs up
[05:57] Hm, there probably have been people offering pictures of your parents' bodies, on fire, on microfiche.
[05:58] (yes, I know that isn't what you meant)
[05:58] hey, i work with fiche every week, it's not that bad
[05:58] but i wouldn't put anything new on it
[05:58] IIRC the major use of microfiche in archival is newspapers, simply because there's no way you're storing those for that long.
[05:58] JW was talking about storing 5TB. I can't even image how many filing cabinets that would be.
[05:58] *imagine
[06:02] pikhq, newspapers are on 35mm film strips from what I have seen. (Microfiche is the plastic card)
[06:02] Frustrating to search through.
[06:05] I have been wondering how much it would cost to convert all my papers to microfiche. Local places say "we will do the first box for free!" instead of actually listing prices.
[06:06] * phillipsj will probably try scanning to digital first.
[06:13] * JesseW is trying to get wolfram alpha to convert 5TB to cubic feet of printed pages
[06:14] 1 page is what, about 2K?
[06:15] phillipsj: Ah, right. K
[06:17] it's 2 billion printed pages, if 1 page=2K; however if 1 page is about 40K (djvu/pdf) then it's about 130 million pages
[06:18] it will weigh about 650 metric tons at 80 g/m^2 density, A4 paper
[06:19] about 890m^3, at 0.
[06:20] at 0.11 mm thickness, or 31500 cubic feet
[06:20] nice
[06:21] vitzli: plus error recovery, add 20%.
[06:21] so about half of the cargo capacity of a Boeing 747, according to Wolfram Alpha
[06:22] you can also print on both sides of a piece of paper, if the weight is high enough.
[06:22] oh, I lost some digits in conversion, oops :P - in 'units' program (5 TiB/40KiB) * (0.11 mm * A4paper) = 920 m^3 = 32518 ft^3
[06:23] 80gsm is about on the threshold of being able to do that. if you bumped it up to 100gsm double sided printing is probably about right. given the scale you should also be accounting for the weight of the toner.
[06:24] about 1000 m^3 for 100gsm
[06:24] single-sided
[06:25] now, let's go further and work out how much the consumer toner for this would cost.
[06:26] http://www.amazon.com/Brother-Black-Toner-Cartridge-TN-350/dp/B0007KI6OU says 2500 pages yield based on 5% page coverage. if we assume 100% of the page is covered with perfectly compressed data, we have 50% coverage, so 250 page prints (125 double sided) per cartridge.
[06:27] hm, I have a scale with 0.1 g precision
[06:30] with a $48.69 cartridge (assuming the amazon search result is representative of normal costs) we see a cost of 0.195 USD (19.5 cents) per side. 130 million pages, plus 20% for ECC, gives 30.38 million USD in toner cartridges. the printer might see some wear or jams too, which I've not accounted for.
[06:34] if we look at the printer model that cartridge will fit in, a HL-2040 personal laser printer, we find that it will do a "maximum" of 20 pages per minute (at low quality settings, but there's no given numbers for high quality). assuming a high quality print takes 4 seconds it will take 7222 days for the print queue to clear, or as WA helpfully informs me 7.6 times the half life of sodium-22.
[06:35] during our 7.6 sodium-22 decays, some poor intern will need to refill the standard size paper tray of 250 sheets no less than 600 thousand times.
[06:36] they'll also probably get cancer from the red toner.
[06:39] empty page weighs about 4.94 g (roughly 80 g/m^2)
[06:41] i love you guys
[06:41] 1 page can take ~17300 characters in 6pt text, monospace
[06:42] oops, ~16000 (it was serif, now it's monospace)
[06:43] you can go smaller than that on consumer printers
[06:44] do you really want characters though? if you're only using ASCII, you probably have higher information density if you convert your data into basemoji.
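The back-of-envelope figures in this stretch of the log (page count, stack volume, paper weight, toner cost, print time) can be reproduced with a short script. This is only a sketch of the arithmetic as discussed in the channel: the function names are invented for illustration, and the inputs (40 KiB per page, 0.11 mm A4 sheets at 80 g/m^2, a $48.69 cartridge rated for 2500 pages at 5% coverage, 4 seconds per side, 20% extra for error correction) are the assumptions quoted in the chat, not measured values.

```python
TIB = 2**40
KIB = 2**10
A4_AREA_M2 = 0.210 * 0.297  # area of one A4 sheet in square metres


def page_count(data_bytes, bytes_per_page):
    """Number of printed pages needed to hold the data."""
    return data_bytes / bytes_per_page


def stack_volume_m3(pages, thickness_mm=0.11):
    """Volume of the paper if every sheet is A4 and printed single-sided."""
    return pages * A4_AREA_M2 * thickness_mm / 1000.0


def paper_mass_tonnes(pages, gsm=80.0):
    """Mass of the blank paper in metric tons (gsm = grams per square metre)."""
    return pages * A4_AREA_M2 * gsm / 1e6


def toner_cost_usd(sides, cartridge_price=48.69, rated_yield=2500,
                   rated_coverage=0.05, coverage=0.50):
    """Toner cost, scaling the rated yield down to the denser coverage."""
    sides_per_cartridge = rated_yield * rated_coverage / coverage
    return sides * cartridge_price / sides_per_cartridge


def print_days(sides, seconds_per_side=4.0):
    """Days of continuous printing on a single printer."""
    return sides * seconds_per_side / 86400.0


if __name__ == "__main__":
    pages = page_count(5 * TIB, 40 * KIB)
    sides_with_ecc = 130e6 * 1.2  # the chat's rounded page count plus 20% ECC
    print(f"pages:       {pages:,.0f}")
    print(f"volume:      {stack_volume_m3(pages):,.0f} m^3")
    print(f"paper mass:  {paper_mass_tonnes(pages):,.0f} t")
    print(f"toner cost:  {toner_cost_usd(sides_with_ecc):,.0f} USD")
    print(f"print time:  {print_days(sides_with_ecc):,.0f} days")
```

With these inputs the script gives roughly 134 million pages and about 920 m^3, in line with the 'units' result quoted above, and about 30.4 million USD and 7222 days for the 156 million sides used in the cost estimate.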
[06:47] and it requires a more precise instrument, because pages have the same weight before and after, womp-womp
[06:48] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[06:48] but again, it's a cheap Chinese scale
[06:48] there's probably measurements of new and used printer cartridges.
[06:50] http://nermsrefill.com/files/c74d97b01eae257e44aa9d5bade97baf_renkser_16.pdf
[06:50] "LASER TONER CARTRIDGE POWDER WEIGHT CHART"
[06:54] if we assume the line, 2500 page yield (at 5%) is representative of our cartridge, we would be imparting 0.6 grams of toner on each printed page with 50% coverage. our A4, 80 GSM pages weigh 5 grams so the toner adds an extra 12%. that's not an insignificant amount of extra tonnage.
[06:56] for 130M pages, 70,000 kg of toner. WA tells me this is 82% of the cargo capacity of a Boeing 747-200F or about the weight of a large dinosaur.
[06:56] it's about 0.06 when you divide toner weight by page yield (350/6000, for example)
[06:57] you're right, I dropped a zero somewhere.
[06:58] if it's 5->50 - then yes, it should be 0.6
[06:58] but 50% is grayish mess
[06:59] 50% is about right if we were printing compressed data, think of a QR code: about half the pixels are black and half white in that encoding.
[07:01] *** godane has joined #archiveteam-bs
[07:01] *** metalcamp has joined #archiveteam-bs
[07:01] I'm wrong, it should be 0.6g
[07:02] wonder how high you can stack piles of printed paper before the toner bonds to the piece above it.
[07:04] 500..1000 pages above it - and it glues very well
[07:08] let's say we want to play it safe and stack to a maximum of 500 pages. we'd need 16,000 square meters of shelving to pull that off.
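The toner-weight and shelving estimates follow the same pattern and can be checked the same way. Again this is a hedged sketch rather than anything from the channel: the function names are hypothetical, the 350 g fill and 6000 page yield are just the example pair mentioned in the chat, and the 500-page stack limit is the cautious guess from the discussion.

```python
import math

A4_AREA_M2 = 0.210 * 0.297  # area of one A4 sheet in square metres


def toner_per_page_g(fill_g, rated_yield, rated_coverage=0.05, coverage=0.50):
    """Grams of toner per page, scaled from the rated coverage to the real one."""
    return fill_g / rated_yield * (coverage / rated_coverage)


def total_toner_kg(pages, grams_per_page):
    """Total toner mass in kilograms for the whole print run."""
    return pages * grams_per_page / 1000.0


def shelf_area_m2(pages, max_stack=500):
    """Shelf area needed if no stack may exceed max_stack sheets."""
    stacks = math.ceil(pages / max_stack)
    return stacks * A4_AREA_M2


if __name__ == "__main__":
    per_page = toner_per_page_g(350, 6000)
    print(f"toner per page: {per_page:.2f} g")
    print(f"total toner:    {total_toner_kg(130e6, 0.6):,.0f} kg")
    print(f"shelf area:     {shelf_area_m2(130e6):,.0f} m^2")
```

For 130 million pages at 0.6 g each this comes to about 78,000 kg, somewhat above the 70,000 kg quoted in the log, while the shelving figure of roughly 16,200 m^2 matches the 16,000 m^2 estimate.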
[07:42] *** schbirid has joined #archiveteam-bs
[08:13] *** Honno has quit IRC (Read error: Operation timed out)
[08:19] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[09:07] *** bsmith093 has quit IRC (Ping timeout: 370 seconds)
[09:19] *** Anarhist has joined #archiveteam-bs
[09:19] hi, i would like to save all the family photo albums that my relatives have. the issue is that photos are often glued to the photoalbum and it's very impractical to take a regular scanner with me
[09:20] what would be a reasonably good portable scanner that can scan a page without it being ripped out of some book/album
[09:39] * joepie91 waves at Anarhist
[09:40] * Anarhist takes off one's hat and waves with it to joepie91
[09:40] *** bsmith093 has joined #archiveteam-bs
[10:14] Anarhist: flatbed scanners aren't going to be portable, you're going to struggle to get good results if the books can't be laid completely flat too.
[10:19] Anarhist, maybe you'll have some luck with book scanners like Plustek OptiBook - but they are windows-only devices and could be worse than plain Epson flatbed scanners
[10:19] i have no access to a windows machine at this moment, so that doesn't work
[10:20] i'm thinking, maybe... i can actually do a "muscle" approach.. and find the lightest flatbed scanner and a bag
[10:21] there's some pretty lightweight ones barely bigger than a piece of paper.
[10:22] *** brayden has joined #archiveteam-bs
[10:22] *** swebb sets mode: +o brayden
[10:23] *** brayden has quit IRC (Client Quit)
[10:24] *** brayden has joined #archiveteam-bs
[10:24] *** swebb sets mode: +o brayden
[10:35] you can get a very slim letter-size flatbed, and remove the roof
[10:35] then you can flip pages of the albums and press the scanner upside-down onto each page
[10:38] blahah, i love the idea... i think i've seen some which have a removable roof... that'd be the best approach
[10:39] if you can control the scanner from the terminal, you could script it so you can get into a rhythm :)
[10:40] Anarhist: I had one like this: https://www.amazon.co.uk/Canon-CanoScan-LiDE-Compact-Scanner/dp/B00MWLHV2U
[10:43] +1 for LiDEs if you are okay with flatbed scanning
[10:43] but depending on the spine, it may get tricky
[10:48] (I had one of them, it eventually broke, but the scan quality was very very good)
[10:50] if they'd make them with external (flash) memory and a battery... that'd be awesome... why can't they just combine technologies like in star trek?
[10:50] and auxiliary circuits, so that almost nothing breaks
[10:51] except holosuites
[10:51] you can get scanners that use thumb drives and/or (micro)SD for storage
[10:52] and you could power them using a li-ion power-pack
[10:52] so you could make a makeshift star trek scanner :)
[11:09] *** BlueMaxim has quit IRC (Quit: Leaving)
[11:09] Anarhist: they do make those- but they are super-small, and you roll them over the page to do a scan. A search for portable scanner should turn up some units- if not, tell me, and I'll get you a model if you're interested.
[12:04] *** Honno has joined #archiveteam-bs
[12:08] What the... What is this, that the arto-script gets stuck and then doesn't do anything anymore?
[12:08] Finished WgetDownload for Item 10profiles:567592
[12:08] Starting PrepareStatsForTracker for Item 10profiles:567592
[12:08] Finished PrepareStatsForTracker for Item 10profiles:567592
[12:08] Starting MoveFiles for Item 10profiles:567592
[12:08] Finished MoveFiles for Item 10profiles:567592
[12:09] After that, it simply does nothing anymore. Any ideas?
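The 10:39 suggestion of driving the scanner from the terminal could look something like the sketch below, which scans one album page each time Enter is pressed. It assumes a Linux machine with SANE installed and a scanner that its `scanimage` tool can see; the function names, file naming scheme, and default resolution are made up for illustration, and whether a particular LiDE model accepts the `--resolution` option depends on its SANE backend.

```python
import subprocess
from pathlib import Path


def build_scan_command(resolution_dpi=300, image_format="tiff", device=None):
    """Build the scanimage command line; the image is written to stdout."""
    cmd = ["scanimage", f"--format={image_format}",
           "--resolution", str(resolution_dpi)]
    if device:
        # Only needed when more than one scanner is attached.
        cmd += ["--device-name", device]
    return cmd


def page_filename(out_dir, index, image_format="tiff"):
    """Zero-padded file name so pages sort in scan order."""
    return Path(out_dir) / f"page_{index:04d}.{image_format}"


def scan_album(out_dir="album", start=1):
    """Scan one page per key press until the user types q."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    index = start
    while input("Enter = scan next page, q = quit: ").strip().lower() != "q":
        target = page_filename(out_dir, index)
        with open(target, "wb") as fh:
            subprocess.run(build_scan_command(), stdout=fh, check=True)
        print(f"saved {target}")
        index += 1


# Example use (hypothetical directory name), run from an interactive session:
# scan_album("grandma_album")
```

Keeping the command construction separate from the loop makes it easy to try other resolutions or a specific device without touching the scanning rhythm itself.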
[12:12] Anarhist: LiDE scanners are USB-powered
[12:12] so you can run them off a battery pack
[12:12] pretty easily
[12:12] I'm sure you could hack something together with the "scan to email" thing as well, with an rpi or something
[12:12] :P
[12:12] that pretends to be a mailserver
[12:13] or even have the rpi trigger the scan
[12:24] *** Madthias has quit IRC (Ping timeout: 250 seconds)
[12:25] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[12:25] *** koon has quit IRC (Ping timeout: 250 seconds)
[12:25] *** vitzli has quit IRC (Ping timeout: 250 seconds)
[12:25] *** Fletcher_ has quit IRC (Ping timeout: 250 seconds)
[12:25] *** Kazzy has quit IRC (Ping timeout: 250 seconds)
[12:27] *** Sk1d has joined #archiveteam-bs
[12:28] *** vitzli has joined #archiveteam-bs
[12:31] *** Kazzy has joined #archiveteam-bs
[12:40] *** Madthias has joined #archiveteam-bs
[12:46] *** koon has joined #archiveteam-bs
[13:27] *** Honno has quit IRC (Read error: Operation timed out)
[13:34] *** incog has quit IRC ()
[13:49] *** tomwsmf-a has joined #archiveteam-bs
[14:04] * phillipsj note to self: do not toss 200dpi hand scanner. (the driver uses Win 3.1)
[14:05] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[14:11] *** Honno has joined #archiveteam-bs
[14:12] *** Honno_ has joined #archiveteam-bs
[14:16] Anyone else having issues with the IA S3 around now?
[14:17] *** Fletcher_ has joined #archiveteam-bs
[14:19] *** Honno has quit IRC (Read error: Operation timed out)
[14:24] https://tweakers.net/nieuws/111517/steekproef-gemeenten-hebben-problemen-met-archief-door-opslag-op-floppys.html?nb=2016-05-20&u=1500
[14:25] "Municipalities have trouble with their archives, due to their use of CD-ROMs, floppies, and tapes."
[14:25] "Because of this, information is impossible to search through, or has even disappeared entirely."
[14:28] highly relevant xkcd today: https://xkcd.com/
[14:28] err
[14:28] http://xkcd.com/1683/
[14:29] that is spot on :p
[15:17] *** tomwsmf-a has joined #archiveteam-bs
[15:18] Stackoverflow is down.
[15:20] oh shit how am I going to get work done
[15:22] Seems to be back up already.
[15:25] https://archive.org/details/stackexchange
[15:27] *** vitzli has quit IRC (Quit: Leaving)
[15:31] *** JesseW has joined #archiveteam-bs
[15:49] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[15:52] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[16:19] heh joepie91
[16:19] at least here in Belgium they have old enough pcs running win xp to read those floppies ;)
[16:23] hah
[16:46] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[16:55] *** Aranje has quit IRC (Ping timeout: 260 seconds)
[17:00] *** Aranje has joined #archiveteam-bs
[17:49] so who are the "lucky" (hah) custodians of https://twitter.com/archiveteam ?
[17:50] Should we publicly offer them help? ;) http://www.nltimes.nl/2016/05/20/dutch-cities-struggling-floppy-disc-cd-archives/
[17:52] if there are more specifics, it might be worth asking Jason to send out a request on the ArchiveCorps mailing list
[18:32] *** brayden_ has joined #archiveteam-bs
[18:32] *** swebb sets mode: +o brayden_
[18:34] Muad-Dib: if you are able to help them, you are welcome to reach out
[18:34] I bet
[18:41] *** brayden has quit IRC (Ping timeout: 633 seconds)
[19:17] *** r3c0d3x has joined #archiveteam-bs
[19:32] *** r3c0d3x has quit IRC (Quit: Leaving)
[19:33] *** r3c0d3x has joined #archiveteam-bs
[19:58] SketchCow: who's running https://twitter.com/archiveteam at the moment?
[19:59] Might be nice to keep it well updated with the projects we are doing
[19:59] and the statistics on them when they have finished
[20:13] *** VADemon has joined #archiveteam-bs
[20:32] we should make it an irc-bot, allowing commands from any chanop
[20:39] *** atrocity has quit IRC (Read error: Operation timed out)
[22:20] *** schbirid has quit IRC (Quit: Leaving)
[22:24] *** powerKitt has joined #archiveteam-bs
[22:28] *** tomwsmf-a has joined #archiveteam-bs
[22:42] *** BlueMaxim has joined #archiveteam-bs
[22:49] *** Yoshimura has joined #archiveteam-bs
[22:58] *** Honno_ has quit IRC (Read error: Operation timed out)
[23:00] *** xXx_ndidd has joined #archiveteam-bs
[23:00] *** ndiddy has quit IRC (Read error: Connection reset by peer)
[23:02] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[23:08] *** xXx_ndidd has quit IRC (Ping timeout: 492 seconds)