#archiveteam-bs 2016-05-20,Fri


Time Nickname Message
00:00 🔗 dashcloud paper is wonderful, but it takes up a lot of space relative to digital
00:01 🔗 murk you would really want error correction, and a lot of it.
00:01 🔗 JW_work agreed
00:04 🔗 Fusl has quit IRC (Max SendQ exceeded)
00:05 🔗 Fusl has joined #archiveteam-bs
00:15 🔗 hook54321 has joined #archiveteam-bs
00:29 🔗 JordanJ2 has quit IRC (Read error: Operation timed out)
00:36 🔗 lytv has quit IRC (Ping timeout: 258 seconds)
00:45 🔗 JesseW has joined #archiveteam-bs
00:49 🔗 lytv has joined #archiveteam-bs
01:11 🔗 Stiletto has quit IRC (Read error: Operation timed out)
01:41 🔗 JesseW1 has joined #archiveteam-bs
01:44 🔗 JesseW has quit IRC (Read error: Operation timed out)
01:53 🔗 justsome has joined #archiveteam-bs
01:53 🔗 justsome hello
02:07 🔗 JesseW1 has quit IRC (Ping timeout: 370 seconds)
02:10 🔗 justsome has left
02:25 🔗 JesseW has joined #archiveteam-bs
02:34 🔗 logchfoo2 starts logging #archiveteam-bs at Fri May 20 02:34:18 2016
02:34 🔗 logchfoo2 has joined #archiveteam-bs
02:35 🔗 RichardG has quit IRC (Ping timeout: 258 seconds)
02:36 🔗 RichardG has joined #archiveteam-bs
02:43 🔗 ranma nice https://www.ssllabs.com/ssltest/analyze.html?d=archive.org
03:06 🔗 Frogging nyce
03:08 🔗 Stiletto has joined #archiveteam-bs
03:17 🔗 tomwsmf-a has quit IRC (Ping timeout: 258 seconds)
03:30 🔗 JesseW yeah, I was AFK
03:50 🔗 godane has quit IRC (Ping timeout: 244 seconds)
03:50 🔗 ndiddy has joined #archiveteam-bs
03:54 🔗 JesseW justsome (hopefully you read the logs): I've pulled out the list of URLs that generated an error (some of which may have been picked up on retries). There are 785 of them. I put the list here: http://termbin.com/h1a4
03:55 🔗 JesseW I'll look into how to run wpull to retry those again. HCross (or others) with more wpull experience, please suggest command line options.
04:06 🔗 JesseW Started the recheck. 51 checked, 1 found. :-/
04:07 🔗 JesseW Hm, I think the ones it's not finding are due to spaces in the file names.
04:09 🔗 JesseW Stopped the recheck; I need to figure out how to get the proper names, as the wpull.log errors don't help.
04:10 🔗 JesseW suggestions gratefully appreciated.
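JesseW's guess above is that the failed rechecks involve file names containing spaces. One way to test that guess, as a minimal sketch, is to percent-encode each URL from the error list before handing it back to the downloader. The helper name `encode_url` and the example URL are hypothetical; nothing here is taken from the actual wpull run or its log format.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url(url: str) -> str:
    """Percent-encode unsafe characters (such as spaces) in a URL's path and query.

    '%' is kept as a safe character so URLs that are already encoded
    are not encoded a second time.
    """
    parts = urlsplit(url.strip())
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%+")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


# Hypothetical example: a file name with spaces in it.
print(encode_url("http://example.com/some dir/a file.txt"))
```

If the encoded URLs fetch correctly where the raw ones did not, that would support the spaces theory; if not, the cause is somewhere else.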
04:23 🔗 Frogging has quit IRC (Ping timeout: 244 seconds)
04:25 🔗 Frogging has joined #archiveteam-bs
04:37 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
04:38 🔗 godane has joined #archiveteam-bs
04:43 🔗 Sk1d has joined #archiveteam-bs
04:59 🔗 vitzli has joined #archiveteam-bs
05:02 🔗 vitzli JesseW, I got the copy of ia_census_201604, https://0bin.net/paste/W4rZVPnM0GwayDqs#W+paYsyI5JJ1FYTEQBU2akw-twMTi2tXMIIzB+vYA2g
05:14 🔗 JesseW vitzli: good
05:17 🔗 Honno has joined #archiveteam-bs
05:34 🔗 HCross2 Justsome hmm . it was more than that, so what's it done to only upload 16gb
05:34 🔗 HCross2 JesseW: we seem to have the timezone issue
05:38 🔗 JesseW HCross: which timezone issue is that?
05:39 🔗 JesseW being in opposite ones?
05:39 🔗 JesseW async communication works for that
05:40 🔗 HCross2 Yeah. I presume you both are in the US somewhere
05:42 🔗 godane has quit IRC (Read error: Operation timed out)
05:50 🔗 HCross2 JesseW: I remember now. Wpull crashed at the very last moment, so I think that broke things
05:54 🔗 Aranje has joined #archiveteam-bs
05:54 🔗 phillipsj JW_work, any idea how microfiche compares to printing? presumably you can do it all-digitally now. One microfiche is 250 pages, and lasts 300 years.
05:55 🔗 SketchCow Doesn't microfiche look like absolute fucking crap?
05:55 🔗 JesseW HCross: west coast of the US timezone
05:56 🔗 HCross2 Ah thought so
05:56 🔗 phillipsj Yes, but that's 250 pages on something the size of an index card instead of a phone book.
05:56 🔗 JesseW phillipsj: but micro- anything isn't human readable (without a microscope)
05:56 🔗 SketchCow Oh, I get it.
05:56 🔗 SketchCow Like how you get a car for free
05:57 🔗 SketchCow But it's on fire and it has your parents' bodies in it.
05:57 🔗 SketchCow It's free
05:57 🔗 SketchCow So I get an index card of absolute useless shit that lasts for three centuries
05:57 🔗 SketchCow Neat
05:57 🔗 * SketchCow thumbs up
05:57 🔗 JesseW Hm, there probably have been people offering pictures of your parents' bodies, on fire, on microfiche.
05:58 🔗 JesseW (yes, I know that isn't what you meant)
05:58 🔗 xmc hey, i work with fiche every week, it's not that bad
05:58 🔗 xmc but i wouldn't put anything new on it
05:58 🔗 pikhq IIRC the major use of microfiche in archival is newspapers, simply because there's no way you're storing those for that long.
05:58 🔗 phillipsj JW was talking about storing 5TB. I can't even image how many filing cabinets that would be.
05:58 🔗 phillipsj *imagine
06:02 🔗 phillipsj pikhq, newspapers are on 35mm film strips from what I have seen. (Micro fiche is the plastic card)
06:02 🔗 phillipsj Frustrating to search through.
06:05 🔗 phillipsj I have been wondering how much it would cost to convert all my papers to microfiche. Local places say "we will do the first box for free!" instead of actually listing prices.
06:06 🔗 * phillipsj will probably try scanning to digital first.
06:13 🔗 * JesseW is trying to get wolfram alpha to convert 5TB to cubic feet of printed pages
06:14 🔗 vitzli 1 page is what, about 2K?
06:15 🔗 pikhq phillipsj: Ah, right. K
06:17 🔗 vitzli it's 2 billion printed pages, if 1 page=2K; however if 1 page is about 40K (djvu/pdf) then it's about 130 million pages
06:18 🔗 vitzli it will weigh about 650 metric tons at 80 g/m^2 density, A4 paper
06:19 🔗 vitzli about 890m^3, at 0.
06:20 🔗 vitzli at 0.11 mm thickness, or 31500 cubic feet
06:20 🔗 JesseW nice
06:21 🔗 murk vitzli: plus error recovery, add 20%.
06:21 🔗 JesseW so about half of the cargo capacity of a Boeing 747, according to Wolfram Alpha
06:22 🔗 murk you can also print on both sides of a piece of paper, if the weight is high enough.
06:22 🔗 vitzli oh, I lost some digits in conversion, oops :P - in 'units' program (5 TiB/40KiB) * (0.11 mm * A4paper) = 920 m^3 = 32518 ft^3
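The corrected figure from `units` can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the numbers quoted in the chat (5 TiB of data, 40 KiB per page, A4 sheets 0.11 mm thick at 80 g/m^2); the function and constant names are made up for illustration.

```python
A4_AREA_M2 = 0.210 * 0.297      # area of one A4 sheet in square metres
CUBIC_FEET_PER_M3 = 35.3147     # conversion factor, m^3 to ft^3


def paper_estimate(total_bytes, bytes_per_page, thickness_mm=0.11, gsm=80.0):
    """Return (pages, volume in m^3, mass in metric tons) for single-sided A4."""
    pages = total_bytes / bytes_per_page
    volume_m3 = pages * A4_AREA_M2 * (thickness_mm / 1000.0)
    mass_tonnes = pages * A4_AREA_M2 * gsm / 1_000_000.0  # grams to tonnes
    return pages, volume_m3, mass_tonnes


pages, volume_m3, mass_tonnes = paper_estimate(5 * 2**40, 40 * 2**10)
print(f"{pages:,.0f} pages")
print(f"{volume_m3:,.0f} m^3 = {volume_m3 * CUBIC_FEET_PER_M3:,.0f} ft^3")
print(f"{mass_tonnes:,.0f} metric tons")
```

With binary units this gives about 134 million pages and roughly 920 m^3, in line with vitzli's corrected number; the weight comes out a little above the earlier 650 ton estimate because that one rounded the page count down to 130 million.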
06:23 🔗 murk 80gsm is about on the threshold of being able to do that. if you bumped it up to 100gsm double sided printing is probably about right. given the scale you should also be accounting for the weight of the toner.
06:24 🔗 vitzli about 1000 m^3 for 100gsm
06:24 🔗 vitzli single-sided
06:25 🔗 murk now, let's go further and work out how much the consumer toner for this would cost.
06:26 🔗 murk http://www.amazon.com/Brother-Black-Toner-Cartridge-TN-350/dp/B0007KI6OU says 2500 pages yield based on 5% page coverage. if we assume 100% of the page is covered with perfectly compressed data, we have 50% coverage, so 250 page prints (125 double sided) per cartridge.
06:27 🔗 vitzli hm, I have a scale with 0.1 g precision
06:30 🔗 murk with a $48.69 cartridge (assuming the amazon search result is representative of normal costs) we see a cost of 0.195 USD (19.5 cents) per side. 130 million pages, plus 20% for ECC, gives 30.38 million USD in toner cartridges. the printer might see some wear or jams too, which I've not accounted for.
06:34 🔗 murk if we look at the printer model that cartridge will fit in, a HL-2040 personal laser printer, we find that it will do a "maximum" of 20 pages per minute (at low quality settings, but there's no given numbers for high quality). assuming a high quality print takes 4 seconds it will take 7222 days for the print queue to clear, or as WA helpfully informs me 7.6 times the half life of sodium-22.
06:35 🔗 murk during our 7.6 sodium-22 decays, some poor intern will need to refill the standard size paper tray of 250 sheets no less than 600 thousand times.
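murk's toner cost, print time, and paper tray figures all follow from the same handful of inputs. The sketch below redoes that arithmetic under the assumptions stated in the chat (130 million pages plus 20% error correction, a 2500-page cartridge rated at 5% coverage, 50% actual coverage, $48.69 per cartridge, 4 seconds per page, a 250-sheet tray); the variable names are invented here.

```python
PAGES = 130e6 * 1.2            # data pages plus 20% error correction
RATED_YIELD = 2500             # pages per cartridge at the rated coverage
RATED_COVERAGE = 0.05          # 5% coverage assumed by the yield figure
ACTUAL_COVERAGE = 0.50         # roughly half black for dense encoded data
CARTRIDGE_USD = 48.69
SECONDS_PER_PAGE = 4
TRAY_SHEETS = 250

# Ten times the coverage means one tenth of the rated yield.
sides_per_cartridge = RATED_YIELD * RATED_COVERAGE / ACTUAL_COVERAGE
cost_per_side = CARTRIDGE_USD / sides_per_cartridge
total_cost = PAGES * cost_per_side
print_days = PAGES * SECONDS_PER_PAGE / 86400
tray_refills = PAGES / TRAY_SHEETS   # one sheet per page, i.e. single-sided

print(f"{sides_per_cartridge:.0f} sides per cartridge")
print(f"${cost_per_side:.3f} per side, ${total_cost / 1e6:.2f} million total")
print(f"{print_days:,.0f} days of printing, {tray_refills:,.0f} tray refills")
```

The cost per side works out to about 0.195 USD, which is what makes the 30.38 million USD total consistent. The refill count of roughly 600 thousand assumes one sheet per page; double-sided printing would halve it.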
06:36 🔗 murk they'll also probably get cancer from the red toner.
06:39 🔗 vitzli empty page weighs about 4.94 g (roughly 80 g/m^2)
06:41 🔗 JesseW i love you guys
06:41 🔗 vitzli 1 page can take ~17300 characters in 6pt text, monospace
06:42 🔗 vitzli oops, ~16000 (it was serif, now it's monospace)
06:43 🔗 murk you can go smaller than that on consumer printers
06:44 🔗 murk do you really want characters though? if you're only using ASCII, you probably have higher information density if you convert your data into basemoji.
06:47 🔗 vitzli and it requires more precise instrument, because pages have the same weight before and after, womp-womp
06:48 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
06:48 🔗 vitzli but again, it's a cheap Chinese scale
06:48 🔗 murk there's probably measurements of new and used printer cartridges.
06:50 🔗 murk http://nermsrefill.com/files/c74d97b01eae257e44aa9d5bade97baf_renkser_16.pdf
06:50 🔗 murk "LASER TONER CARTRIDGE POWDER WEIGHT CHART"
06:54 🔗 murk if we assume the line, 2500 page yield (at 5%) is representative of our cartridge, we would be imparting 0.6 grams of toner on each printed page with 50% coverage. our A4, 80 GSM pages weigh 5 grams so the toner adds an extra 12%. that's not an insignificant amount of extra tonnage.
06:56 🔗 murk for 130M pages, 70,000 KG of toner. WA tells me this is 82% of the cargo capacity of a Boeing 747-200F or about the weight of a large dinosaur.
06:56 🔗 vitzli it's about 0.06 when you divide toner weight by page yield (350/6000, for example)
06:57 🔗 murk you're right, I dropped a zero somewhere.
06:58 🔗 vitzli if it's 5->50 - then yes, it should be 0.6
06:58 🔗 vitzli but 50% is grayish mess
06:59 🔗 murk 50% is about right if we were printing compressed data. think of a QR code: you have about half black and half white pixels in that encoding.
07:01 🔗 godane has joined #archiveteam-bs
07:01 🔗 metalcamp has joined #archiveteam-bs
07:01 🔗 vitzli I'm wrong, it should be 0.6g
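The back-and-forth over 0.06 g versus 0.6 g comes down to whether the per-page figure is taken at the rated 5% coverage or scaled up to 50%. A small sketch of that scaling, using vitzli's example of 350 g of toner over a 6000-page yield (an illustrative chart entry, not necessarily the TN-350), with hypothetical names throughout:

```python
def toner_grams_per_page(cartridge_toner_g, rated_yield_pages,
                         rated_coverage=0.05, actual_coverage=0.50):
    """Toner laid down per page, scaled linearly from rated to actual coverage."""
    grams_at_rated = cartridge_toner_g / rated_yield_pages
    return grams_at_rated * (actual_coverage / rated_coverage)


per_page_g = toner_grams_per_page(350, 6000)   # about 0.58 g at 50% coverage

# Rounded to 0.6 g per page, as in the chat, over 130 million pages:
total_toner_kg = 130e6 * 0.6 / 1000
extra_fraction = 0.6 / 5.0                     # toner relative to a 5 g A4 sheet

print(f"{per_page_g:.2f} g per page at 50% coverage")
print(f"{total_toner_kg:,.0f} kg of toner, {extra_fraction:.0%} extra weight")
```

With these inputs the total comes out near 78,000 kg, somewhat above the 70,000 kg quoted earlier; linear scaling with coverage is itself an assumption.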
07:02 🔗 murk wonder how high you can stack piles of printed paper before the toner bonds to the piece above it.
07:04 🔗 vitzli 500..1000 pages above it - and it glues very well
07:08 🔗 murk let's say we want to play it safe and stack to a maximum of 500 pages. we'd need 16,000 square meters of shelving to pull that off.
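The shelving figure is just the number of 500-page stacks times the footprint of one A4 sheet. A quick sketch, assuming 130 million single-sided pages and stacks placed edge to edge with no gaps (the function name is made up):

```python
import math

A4_AREA_M2 = 0.210 * 0.297   # footprint of one A4 stack in square metres


def shelf_area_m2(pages, max_stack=500):
    """Shelf surface needed if no stack may exceed max_stack pages."""
    stacks = math.ceil(pages / max_stack)
    return stacks * A4_AREA_M2


area = shelf_area_m2(130_000_000)
print(f"{area:,.0f} m^2 of shelving")
```

That lands at a little over 16,000 m^2, before allowing any space between stacks or for the error-correction pages.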
07:42 🔗 schbirid has joined #archiveteam-bs
08:13 🔗 Honno has quit IRC (Read error: Operation timed out)
08:19 🔗 r3c0d3x has quit IRC (Ping timeout: 260 seconds)
09:07 🔗 bsmith093 has quit IRC (Ping timeout: 370 seconds)
09:19 🔗 Anarhist has joined #archiveteam-bs
09:19 🔗 Anarhist hi, i would like to save all the family photo albums that my relatives have. the issue is that photos are often glued to the photoalbum and it's very impractical to take a regular scanner with me
09:20 🔗 Anarhist what would be a reasonably good portable scanner that can scan a page without it being ripped out of some book/album
09:39 🔗 * joepie91 waves at Anarhist
09:40 🔗 * Anarhist takes off one's hat and waves with it to joepie91
09:40 🔗 bsmith093 has joined #archiveteam-bs
10:14 🔗 murk Anarhist: flatbed scanners aren't going to be portable, you're going to struggle to get good results if the books can't be laid completely flat too.
10:19 🔗 vitzli Anarhist, maybe you'll get some luck with book scanners like plustek optibook - but they are windows-only devices and could be worse than plain epson flatbed scanners
10:19 🔗 Anarhist i have no access to a windows machine at this moment, so that doesn't work
10:20 🔗 Anarhist i'm thinking, maybe... i can actually do a "muscle" approach.. and find the lightest flatbed scanner and a bag
10:21 🔗 murk there's some pretty lightweight ones barely bigger than a piece of paper.
10:22 🔗 brayden has joined #archiveteam-bs
10:22 🔗 swebb sets mode: +o brayden
10:23 🔗 brayden has quit IRC (Client Quit)
10:24 🔗 brayden has joined #archiveteam-bs
10:24 🔗 swebb sets mode: +o brayden
10:35 🔗 blahah you can get a very slim letter-size flatbed, and remove the roof
10:35 🔗 blahah then you can flip pages of the albums and press the scanner upside-down onto each page
10:38 🔗 Anarhist blahah, i love the idea... i think i've seen some which have a removable roof... that'd be the best approach
10:39 🔗 blahah if you can control the scanner from the terminal, you could script it so you can get into a rhythm :)
10:40 🔗 blahah Anarhist: I had one like this: https://www.amazon.co.uk/Canon-CanoScan-LiDE-Compact-Scanner/dp/B00MWLHV2U
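As a rough illustration of blahah's suggestion to script the scanner so you can get into a rhythm, here is a minimal sketch that assumes a Linux machine with SANE's `scanimage` command installed and a scanner it recognises. The `--format=png` and `--resolution` options depend on the SANE version and backend, so treat them as assumptions; the function names and file naming scheme are invented for this example.

```python
import subprocess
from pathlib import Path


def scan_command(resolution_dpi=300):
    """Build the scanimage invocation; the image is written to stdout."""
    return ["scanimage", "--format=png", f"--resolution={resolution_dpi}"]


def page_path(out_dir, page_number):
    """Numbered output file, e.g. album/page_0007.png."""
    return Path(out_dir) / f"page_{page_number:04d}.png"


def scan_page(out_dir, page_number, resolution_dpi=300):
    """Scan one page into out_dir and return the file it was saved to."""
    target = page_path(out_dir, page_number)
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "wb") as fh:
        subprocess.run(scan_command(resolution_dpi), stdout=fh, check=True)
    return target


def scan_album(out_dir, first_page=1):
    """Press Enter to scan the next page, or type q to stop."""
    page = first_page
    while input(f"Page {page}: Enter to scan, q to quit: ").strip().lower() != "q":
        print("saved", scan_page(out_dir, page))
        page += 1
```

Calling `scan_album("grandma_album")` would then give the flip, press, Enter loop: position the scanner on the page, hit Enter, move to the next page.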
10:43 🔗 joepie91 +1 for LiDEs if you are okay with flatbed scanning
10:43 🔗 joepie91 but depending on the spine, it may get tricky
10:48 🔗 joepie91 (I had one of them, it eventually broke, but the scan quality was very very good)
10:50 🔗 Anarhist if they'd make them with external (flash) memory and a battery... that'd be awesome... why can't they just combine technologies like in star trek?
10:50 🔗 Anarhist and auxiliary circuits, so that almost nothing breaks
10:51 🔗 Anarhist except holosuits
10:51 🔗 blahah you can get scanners that use thumb drives and/or (micro)SD for storage
10:52 🔗 blahah and you could power them using a li-ion power-pack
10:52 🔗 blahah so you could make a makeshift star trek scanner :)
11:09 🔗 BlueMaxim has quit IRC (Quit: Leaving)
11:09 🔗 dashcloud Anarhist: they do make those, but they are super-small, and you roll them over the page to do a scan. A search for portable scanner should turn up some units. if not, tell me, and I'll get you a model if you're interested.
12:04 🔗 Honno has joined #archiveteam-bs
12:08 🔗 Medowar What the... What is this, that the arto-script gets stuck and then doesn't do anything anymore?
12:08 🔗 Medowar Finished WgetDownload for Item 10profiles:567592
12:08 🔗 Medowar Starting PrepareStatsForTracker for Item 10profiles:567592
12:08 🔗 Medowar Finished PrepareStatsForTracker for Item 10profiles:567592
12:08 🔗 Medowar Starting MoveFiles for Item 10profiles:567592
12:08 🔗 Medowar Finished MoveFiles for Item 10profiles:567592
12:09 🔗 Medowar After that, it simply does nothing anymore. Any ideas?
12:12 🔗 joepie91 Anarhist: LiDE scanners are USB-powered
12:12 🔗 joepie91 so you can run them off a battery pack
12:12 🔗 joepie91 pretty easily
12:12 🔗 joepie91 I'm sure you could hack something together with the "scan to email" thing as well, with an rpi or something
12:12 🔗 joepie91 :P
12:12 🔗 joepie91 that pretends to be a mailserver
12:13 🔗 joepie91 or even have the rpi trigger the scan
12:24 🔗 Madthias has quit IRC (Ping timeout: 250 seconds)
12:25 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
12:25 🔗 koon has quit IRC (Ping timeout: 250 seconds)
12:25 🔗 vitzli has quit IRC (Ping timeout: 250 seconds)
12:25 🔗 Fletcher_ has quit IRC (Ping timeout: 250 seconds)
12:25 🔗 Kazzy has quit IRC (Ping timeout: 250 seconds)
12:27 🔗 Sk1d has joined #archiveteam-bs
12:28 🔗 vitzli has joined #archiveteam-bs
12:31 🔗 Kazzy has joined #archiveteam-bs
12:40 🔗 Madthias has joined #archiveteam-bs
12:46 🔗 koon has joined #archiveteam-bs
13:27 🔗 Honno has quit IRC (Read error: Operation timed out)
13:34 🔗 incog has quit IRC ()
13:49 🔗 tomwsmf-a has joined #archiveteam-bs
14:04 🔗 * phillipsj note to self: do not toss 200dpi hand scanner. (the driver uses Win 3.1)
14:05 🔗 tomwsmf-a has quit IRC (Read error: Operation timed out)
14:11 🔗 Honno has joined #archiveteam-bs
14:12 🔗 Honno_ has joined #archiveteam-bs
14:16 🔗 HCross2 Anyone else having issues with the IA S3 around now?
14:17 🔗 Fletcher_ has joined #archiveteam-bs
14:19 🔗 Honno has quit IRC (Read error: Operation timed out)
14:24 🔗 joepie91 https://tweakers.net/nieuws/111517/steekproef-gemeenten-hebben-problemen-met-archief-door-opslag-op-floppys.html?nb=2016-05-20&u=1500
14:25 🔗 joepie91 "Municipalities have trouble with their archives, due to their use of CD-Roms, floppies, and tapes."
14:25 🔗 joepie91 "Because of this, information is impossible to search through, or has even disappeared entirely."
14:28 🔗 joepie91 highly relevant xkcd today: https://xkcd.com/
14:28 🔗 joepie91 err
14:28 🔗 joepie91 http://xkcd.com/1683/
14:29 🔗 Frogging that is spot on :p
15:17 🔗 tomwsmf-a has joined #archiveteam-bs
15:18 🔗 PurpleSym Stackoverflow is down.
15:20 🔗 DFJustin oh shit how am I going to get work done
15:22 🔗 PurpleSym Seems to be back up already.
15:25 🔗 PurpleSym https://archive.org/details/stackexchange
15:27 🔗 vitzli has quit IRC (Quit: Leaving)
15:31 🔗 JesseW has joined #archiveteam-bs
15:49 🔗 tomwsmf-a has quit IRC (Read error: Operation timed out)
15:52 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
16:19 🔗 Madthias heh joepie91
16:19 🔗 Madthias at least here in Belgium they have old enough pcs running win xp to read those floppies; )
16:23 🔗 joepie91 hah
16:46 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
16:55 🔗 Aranje has quit IRC (Ping timeout: 260 seconds)
17:00 🔗 Aranje has joined #archiveteam-bs
17:49 🔗 JW_work so who are the "lucky" (hah) custodians of https://twitter.com/archiveteam ?
17:50 🔗 Muad-Dib Should we publicly offer them help? ;) http://www.nltimes.nl/2016/05/20/dutch-cities-struggling-floppy-disc-cd-archives/
17:52 🔗 JW_work if there are more specifics, it might be worth asking Jason to send out a request on the ArchiveCorps mailing list
18:32 🔗 brayden_ has joined #archiveteam-bs
18:32 🔗 swebb sets mode: +o brayden_
18:34 🔗 Atluxity Muad-Dib: if you are able to help them, you are welcome to reach out
18:34 🔗 Atluxity I bet
18:41 🔗 brayden has quit IRC (Ping timeout: 633 seconds)
19:17 🔗 r3c0d3x has joined #archiveteam-bs
19:32 🔗 r3c0d3x has quit IRC (Quit: Leaving)
19:33 🔗 r3c0d3x has joined #archiveteam-bs
19:58 🔗 arkiver SketchCow: who's running https://twitter.com/archiveteam at the moment?
19:59 🔗 arkiver Might be nice to keep it well updated with the projects we are doing
19:59 🔗 arkiver and the statistics on them when they have finished
20:13 🔗 VADemon has joined #archiveteam-bs
20:32 🔗 Atluxity we should make it an irc-bot, allowing commands from any chanop
20:39 🔗 atrocity has quit IRC (Read error: Operation timed out)
22:20 🔗 schbirid has quit IRC (Quit: Leaving)
22:24 🔗 powerKitt has joined #archiveteam-bs
22:28 🔗 tomwsmf-a has joined #archiveteam-bs
22:42 🔗 BlueMaxim has joined #archiveteam-bs
22:49 🔗 Yoshimura has joined #archiveteam-bs
22:58 🔗 Honno_ has quit IRC (Read error: Operation timed out)
23:00 🔗 xXx_ndidd has joined #archiveteam-bs
23:00 🔗 ndiddy has quit IRC (Read error: Connection reset by peer)
23:02 🔗 tomwsmf-a has quit IRC (Read error: Operation timed out)
23:08 🔗 xXx_ndidd has quit IRC (Ping timeout: 492 seconds)
