[00:09] wow [00:40] *** SN4T14 has joined #archiveteam-bs [00:51] *** mkram is now known as mkram_ [00:52] mkram_: no one really knows what will still be around in 10-15 years, so we can only give you guesses based on what's currently available [00:53] mkram_: Amazon Glacier is an option [00:54] *** Jens has joined #archiveteam-bs [00:55] *** Jens is now known as JensRex [00:55] hey JensRex [00:56] Oi. [01:10] *** nickware has joined #archiveteam-bs [01:10] *** Jordan has quit IRC (Quit: bye ;w;) [01:23] *** n00b908 has joined #archiveteam-bs [01:25] *** nickware has quit IRC (Quit: Leaving) [01:37] *** spiko has quit IRC (Read error: Operation timed out) [02:14] *** pizzaiolo has quit IRC (Remote host closed the connection) [02:35] *** BlueMaxim has quit IRC (Quit: Leaving) [02:55] We have finished all of our Club Penguin grabs. (Hooray!) [03:41] *** BlueMaxim has joined #archiveteam-bs [03:51] *** xmc has joined #archiveteam-bs [03:51] *** swebb sets mode: +o xmc [04:10] *** BlueMaxim has quit IRC (Quit: Leaving) [04:32] *** BlueMaxim has joined #archiveteam-bs [05:06] *** ndiddy has quit IRC (Read error: Connection reset by peer) [05:18] *** Sk1d has quit IRC (Ping timeout: 194 seconds) [05:23] *** Sk1d has joined #archiveteam-bs [06:09] *** nrp3c has quit IRC (Read error: Operation timed out) [06:29] *** BiggieJon has quit IRC (Ping timeout: 268 seconds) [06:41] *** Ctrl-S___ has joined #archiveteam-bs [06:41] *** _desu___ has joined #archiveteam-bs [06:41] *** jiphex has joined #archiveteam-bs [06:41] *** BartoCH has joined #archiveteam-bs [06:41] *** Nons has joined #archiveteam-bs [06:41] *** DFJustin has joined #archiveteam-bs [06:41] *** HCross2 has joined #archiveteam-bs [06:41] *** alembic has joined #archiveteam-bs [06:41] *** JSharp___ has joined #archiveteam-bs [06:41] *** zhongfu has joined #archiveteam-bs [06:41] *** wm_ has joined #archiveteam-bs [06:41] *** Aoede has joined #archiveteam-bs [06:41] *** Famicoman has joined #archiveteam-bs [06:41] 
*** acridAxid has joined #archiveteam-bs [06:41] *** ItsYoda has joined #archiveteam-bs [06:41] *** Sanqui has joined #archiveteam-bs [06:41] *** Muad-Dib has joined #archiveteam-bs [06:41] *** Nyx has joined #archiveteam-bs [06:41] *** balrog has joined #archiveteam-bs [06:41] *** midas has joined #archiveteam-bs [06:41] *** Meroje has joined #archiveteam-bs [06:41] *** Coderjo has joined #archiveteam-bs [06:41] *** alfie has joined #archiveteam-bs [06:41] *** Jonimus has joined #archiveteam-bs [06:41] *** decay has joined #archiveteam-bs [06:41] *** ae_g_i_s has joined #archiveteam-bs [06:41] *** Zebranky has joined #archiveteam-bs [06:41] *** dan- has joined #archiveteam-bs [06:41] *** tephra_ has joined #archiveteam-bs [06:41] *** efnet.port80.se sets mode: +oo balrog Jonimus [06:41] *** swebb sets mode: +o DFJustin [06:41] *** swebb sets mode: +o balrog [06:41] *** swebb sets mode: +o Jonimus [07:04] *** Aranje has quit IRC (Quit: Three sheets to the wind) [07:14] so I'm going after Metro newspaper [07:14] a lot of it is on issuu.com [07:14] so I will be doing it like I did with the computer paper [07:14] for id names [07:14] issue_${userid}_${docid} [07:15] *issuu_${userid}_${docid} [07:15] one of the smaller ones is metro_korea [07:16] with only 604 issues [07:55] *** odemg2 has quit IRC (Remote host closed the connection) [08:01] *** spiko has joined #archiveteam-bs [08:01] *** dashcloud has quit IRC (Read error: Operation timed out) [08:03] *** dashcloud has joined #archiveteam-bs [08:31] *** vitzli has joined #archiveteam-bs [08:38] *** BlueMaxim has quit IRC (Quit: Leaving) [08:42] A round of items are leaving FOS and going into IA just because I'm trying to clean space. [08:43] So a lot of mostly dormant little projects are having One More Item added. 
[08:45] *** BlueMaxim has joined #archiveteam-bs [09:34] *** GE has joined #archiveteam-bs [09:44] looks like infoworld 1989-09-04 and 1989-09-11 are reversed [09:45] only the covers have the right dates [09:52] dashcloud: I was thinking about putting it on LTO tape, storing it in tin cans together with desiccant (and rust protection on the outside), and keeping it in an earth cellar with nearly no air circulation to hold the temperature around the yearly average of 10°C. [09:53] mkram_: what are the drive belts on those tape carts made out of? [09:53] i work with tapes from the 80s that have a urethane composite drive belt, and they tend to snap [09:54] I am more interested in whether such pricing would make some of the bigger archiving efforts realistic. [09:54] Polyethylene naphthalate [09:56] Rated for 30 years under archiving conditions (20% humidity, 15°C). But I cannot keep it exactly that way; I would have it a bit colder and, especially, if possible, a lot drier at around 10% relative humidity, so I would be able to use stronger desiccant. [10:18] *** Nons has quit IRC () [10:37] *** inittux- has joined #archiveteam-bs [10:39] *** SilSte has quit IRC (Read error: Operation timed out) [10:40] *** inittux has quit IRC (Read error: Operation timed out) [10:41] *** SilSte has joined #archiveteam-bs [10:54] Anyone in europe having issues with cloning from heroku? 
[10:59] *** BlueMaxim has quit IRC (Quit: Leaving) [11:04] *** pizzaiolo has joined #archiveteam-bs [11:30] *** fie_ has joined #archiveteam-bs [11:30] *** fie has quit IRC (Read error: Operation timed out) [11:38] I'm starting to upload Metro Korea issues from 2012 [11:38] first one: https://archive.org/details/issuu_metro_korea_20121026_kr_seoul [11:38] I will be uploading all of the issuu ones, but there are also PDFs going back to at least 2014 [12:07] *** kvieta has quit IRC (Ping timeout: 370 seconds) [12:11] *** kvieta has joined #archiveteam-bs [12:33] *** GE has quit IRC (Ping timeout: 255 seconds) [12:34] *** GE has joined #archiveteam-bs [12:47] *** kniffy has quit IRC (Ping timeout: 240 seconds) [12:56] *** kniffy has joined #archiveteam-bs [13:50] *** khu has joined #archiveteam-bs [14:19] *** khu has quit IRC (Ping timeout: 633 seconds) [14:58] *** vitzli has quit IRC (Quit: Leaving) [15:12] *** ndiddy has joined #archiveteam-bs [15:33] *** ndiddy has quit IRC (Read error: Connection reset by peer) [15:33] *** ndizzle has joined #archiveteam-bs [15:50] *** ZexaronS has joined #archiveteam-bs [15:59] *** odemg has joined #archiveteam-bs [16:16] *** Oddy has joined #archiveteam-bs [16:36] *** Aranje has joined #archiveteam-bs [16:58] *** Oddy has quit IRC (Read error: Operation timed out) [17:07] *** schbirid has joined #archiveteam-bs [17:22] Nemo_bis: well, for routine stuff, they should contact IA and ask if there's a reason they're excluded from the Wayback Machine. [17:23] I think [17:23] He isn't in here. [17:23] ah, right [17:24] *** Nemo_bis has joined #archiveteam-bs [17:25] but but but I only wanted to link the wiki page ;) [17:25] Nemo_bis, AO3 and OTW are part of my monthly archival. I currently index and archive all major web fiction sites that I track, which includes RoyalRoadl, Literotica, WebFictionGuide, TopWebFiction, AO3, OTW, FanFiction.net, FictionPress, Writing.com, and a few smaller ones. [17:27] And do you upload that to archive.org? 
[17:27] Yes. [17:27] You could update the wiki page. [17:28] Nemo_bis: the way I'm parsing http://archiveofourown.org/robots.txt, it looks like AO3 is excluding the Wayback Machine from all of its works. Is that correct? [17:28] I guess I could, but technically it is not an ArchiveTeam project. ;) [17:31] I'm not super sure why https://web.archive.org/web/*/http://archiveofourown.org says that it's excluded, but it might be a manual exclusion? I would suggest reaching out to them [17:32] Which line do you mean? [17:32] ./works? [17:43] robots.txt is not supposed to be a regex format; that should match https://archiveofourown.org/works/718790?view_adult=true but not /works/718790 [17:44] Nemo_bis, whatever the case, their robots.txt or their own action blocks them from being viewed on archive.org/web/. This does not mean we don't archive them. [17:44] It just means those archives aren't publicly viewable. [17:46] looks like infoworld 2004-09-13 is incomplete [17:47] also it has November 22 2004 in it [17:47] also November 22 2004 had all pages in its own id [17:49] *** pizzaiolo has quit IRC (Read error: Operation timed out) [17:53] rocode: "we"? [17:54] Does an "Allow" line for User-agent: ia_archiver have any effect nowadays? [17:54] The FAQ no longer mentions the user-agent https://archive.org/about/faq.php?faq_id=2 [17:55] Nemo_bis, ArchiveTeam and the individuals composed therein. :) [17:56] ok [17:56] It helps to make other people aware of what gets archived this way, to avoid needless worries and redundant work. [17:56] I happen to specialize in web fiction, but by no means am I the only one grabbing it. See https://archive.org/details/savefanfiction [17:57] Hmm, are you saying you upload those crawls to https://archive.org/details/savefanfiction? [17:57] Redundant work is okay in this situation, as it is all deduplicated and derived. [17:57] No, you can find my publicly available crawls here: https://archive.org/details/@rocode, as they finish. 
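The robots.txt point made above is that path rules use wildcard matching, not regular expressions. The exact AO3 rule is not quoted in the log, so the `Disallow: /works/*?` pattern below is an assumption for illustration; the matcher itself is a minimal sketch of the Google-style rules (`*` matches any run of characters, trailing `$` anchors the end):

```python
import re

def robots_rule_matches(rule_path: str, url_path: str) -> bool:
    """Match a URL path against a Google-style robots.txt path rule.

    '*' matches any run of characters and a trailing '$' anchors the end;
    everything else is a literal prefix match (NOT regex syntax).
    """
    pattern = re.escape(rule_path).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"  # trailing $ anchors the match
    return re.match(pattern, url_path) is not None

# Hypothetical rule: blocks adult-view query URLs but not plain work pages.
rule = "/works/*?"
print(robots_rule_matches(rule, "/works/718790?view_adult=true"))  # True
print(robots_rule_matches(rule, "/works/718790"))                  # False
```

This reproduces the distinction drawn in the log: the query-string URL matches, the bare work URL does not.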
[17:58] Are you archiving http://www.finfanfun.fi/ [17:58] No. [17:58] Some fun stuff there. [17:58] I focus on English works. If you want, you can get a Scaleway node and archive it yourself, and upload it to archive.org. :) [18:09] *** odemg has quit IRC (Remote host closed the connection) [18:12] *** odemg has joined #archiveteam-bs [18:16] I will ask again due to the time that has passed, hoping to make my point more clear: from what I have looked at so far, I think it is possible to archive bulk data for somewhere between 10 and 40 years for 20$/TB all included. [18:17] Oh? [18:17] The minimum where that price comes into play is about half a petabyte though [18:17] Can you elaborate? [18:17] Considering that to be 10k$, I don't think I could shoulder the kind of economics of scale there [18:18] *** mkram_ is now known as mkram [18:18] I would do Reed-Solomon with 15 data + 5 parity, and use LTO6 tape [18:19] mkram, http://www.archiveteam.org/index.php?title=Valhalla [18:21] The tape is then put into (commercial-size) "tin" cans (actually no longer tin), together with desiccant, and then, after an anti-rust coating (maybe some hot wax or such), put into something strong enough to protect the coating from the earthen cellar in the side of a mountain. [18:21] The earth would be for keeping the temperature stable at very close to the yearly average temperature. [18:23] mkram, the tape reader would need to be 100% open hardware/open source. [18:23] rocode: thanks for getting me back to that page. Do you have any idea how to get a tape reader for that kind? [18:23] Because in 40 years that reader may no longer exist. [18:24] You can still buy tape readers for pretty much exactly 20-year-old tapes. [18:24] mkram, I linked that so you can edit in the details on the tape (commercial) row. ;) [18:24] I do not even know if 40 years are necessary [18:25] How long would it need to be stored anyway? 
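The 15 data + 5 parity layout quoted above implies a fixed storage overhead, and the half-petabyte minimum translates into a concrete tape count. A back-of-the-envelope sketch, assuming LTO6 at 2.5 TB native per tape (the 500 TB figure is taken from the discussion):

```python
import math

DATA_TAPES = 15        # data tapes per Reed-Solomon group
PARITY_TAPES = 5       # parity tapes per group
TAPE_TB = 2.5          # LTO6 native capacity per tape, in TB

GROUP = DATA_TAPES + PARITY_TAPES
overhead = GROUP / DATA_TAPES            # raw tape written per usable TB
set_usable_tb = DATA_TAPES * TAPE_TB     # usable capacity of one 20-tape set

usable_tb = 500                          # the ~half-petabyte minimum mentioned
groups = math.ceil(usable_tb / set_usable_tb)
total_tapes = groups * GROUP

print(f"overhead: {overhead:.2f}x")                 # 1.33x
print(f"usable per set: {set_usable_tb} TB")        # 37.5 TB
print(f"tapes for {usable_tb} TB: {total_tapes}")   # 280
```

The 37.5 TB per set matches the canning sets the speaker offers later in the log.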
[18:25] My only experience with old tape drive recovery is https://www.wikiwand.com/en/Lunar_Orbiter_Image_Recovery_Project [18:26] And that was a case of the tape reader literally no longer being in existence. [18:27] A complete working model was only created via surplus parts gathered over a 10-year period. [18:27] Tapes are useless unless they can be read. [18:27] Oh. Ok. Ofc they are useless in that case. [18:29] From looking at http://www.mkomo.com/cost-per-gigabyte-update I estimate 20 years until the price of storing at IA is about the same (20$/TB) [18:30] *** Oddy has joined #archiveteam-bs [18:30] *** schbirid has quit IRC (Quit: Leaving) [18:35] rocode: As I sincerely hope that it will be possible to make the "canned" data available in the IA sooner than in 40 years, I do not think it would be necessary to be able to read the tape after such a long time. [18:36] rocode: were you involved with that project? [18:37] xmc: only tangentially. I worked at an electronics supply store in Colorado when the search was occurring for drive heads. We had a bunch of college students going through our bins. [18:38] ah nice [18:39] rocode: do you know by chance how many drives ever existed that could read that format? [18:40] mkram, no clue. I know that it was used by several government agencies and universities. I know the one working model they used for that project was recreated out of surplus parts that took about 10 years to find. [18:42] Ok. Then I think it would at least be easier to find old LTO6-compatible drives, if necessary. Do you think 20 years is too long to stay on one media type? [18:44] I think 10 years is probably an acceptable target goal for Valhalla. [18:49] mkram: I'm not sure if that Dropbox For Business section is up to date. Dropbox for Business definitely has an unlimited tier; my last company used it [18:49] That is easy. Do you think the price for about 10-20 years would be acceptable? 
The issue is mostly that a single drive is 1500$ and writes at 160MB/s, so about 120MB/s of raw data ingress, not counting tape changing (every 6 or so hours), speed like in a VCR, automation about the same (maybe a stack of tapes and some sloped feeder and a few Lego rubber wheels to get the tape to and from the point where the drive grabs it from / ejects it to). [18:49] nightpool: do consider that internet egress traffic (upload) is twice as expensive as the 10-20 year storage [18:50] mkram: definitely. It's much more available though, and requires less sysadmin work, and would be much more stable than a homebrew solution [18:50] Thankfully cable internet for business is only about 2$/TB on a big plan, with continuous streaming. [18:50] Not saying it's a great solution, and it would probably be pretty costly, but it's something to consider [18:51] The current Dropbox For Business plan is 20$ per user per month, minimum of 3 users, for "as much space as needed" [18:51] Does Dropbox work without syncing to a local drive though? [18:52] I seriously doubt they do that long term though... [18:52] ... I *think* so. [18:52] I would have to look into it [18:52] I know my last employer kept our database backups there. [18:52] about 60TB total. [18:53] Yes, I looked, they say as much space as you need. But that is like the first-class airline "can I get you *anything at all*" (not actually anything) [18:53] 60TB is not that much though... [18:53] No, it's not a ton. [18:54] It does seem like an option to consider. [18:54] *** GE has quit IRC (Remote host closed the connection) [18:57] I plan to acquire such a drive of my own in about 2-3 months, once I get around to figuring out how to can it, and how dry I can keep it in the cans (otherwise good desiccant keeps the humidity at about 10% relative, too low according to specs, but apart from acclimatizing before reading I don't know why that is bad) [18:59] I could offer to can data at about that rate, in sets of 37.5TB. 
For optimal canning, 20 times as much, to be able to spread the redundancy over as many cans as possible. [18:59] in this channel: archiveteam discusses jams and other preserves [19:00] jams? [19:00] mkram: canning :D [19:02] *** SpaffGarg has joined #archiveteam-bs [19:02] Oh. I just decided on cans as a likely viable solution because aluminium-lined plastic bags (the aroma-saver kind for food, like coffee or such) have a metal layer too thin to block enough water vapor diffusion, and, estimating high humidity in unclimatized natural cold locations, I do not think that they will hold up (after some somewhat rough calculations) [19:03] I would ideally stuff a small cotton bag of calcium oxide and 5-15 tapes (11*11*2cm) into a can. [19:05] >They'll probably get cranky as we near 100TB though. [19:05] that is what I fear. [19:06] In that table the tape ranks at a low of 1-2$/TB/year [19:07] *** Oddy has quit IRC (Read error: Operation timed out) [19:09] nightpool: do you have an idea how to store such cans cold, without notable temperature cycling, for very cheap, and long enough? They are waterproof, though I would not vouch for more than 2m deep water pressure resistance... [19:10] I have no idea. I would say your best bet would be to try something experimentally and see what the results are like though [19:11] for nearly any of these more quote unquote "weird" solutions [19:12] Experiment with what? The water resistance of properly sealed cans has been tested for many decades, and experience with the reliability of Polyethylene Naphthalate and Linear Serpentine Recorded Metal Particulate tape is also long. [19:13] ok, so stop debating it on irc and go do it [19:13] *** flipflop has joined #archiveteam-bs [19:13] right, what xmc said. results are always going to be the most convincing argument. [19:14] Ok, I will be back in 2-3 months then. [19:14] Also, apart from a bill of materials and the readily canned data in a picture, what kind of result do you hope for? 
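The 160 MB/s and ~120 MB/s figures quoted earlier follow directly from the 15+5 parity split: only 15/20 of each write is payload. A quick sketch of the implied ingest time for the half-petabyte dataset, ignoring tape-change pauses (all figures are the ones stated in the discussion):

```python
DRIVE_MB_S = 160        # LTO6 drive write speed
DATA, PARITY = 15, 5    # Reed-Solomon layout from the discussion

# Only DATA/(DATA+PARITY) of each write is payload.
effective_mb_s = DRIVE_MB_S * DATA / (DATA + PARITY)

dataset_tb = 500        # ~half a petabyte
seconds = dataset_tb * 1e6 / effective_mb_s  # 1 TB = 1e6 MB (decimal units)
days = seconds / 86400

print(f"{effective_mb_s:.0f} MB/s of payload")                  # 120 MB/s
print(f"~{days:.0f} days of continuous writing on one drive")   # ~48 days
```

So a single 1500$ drive is the real bottleneck: filling the minimum viable set takes on the order of seven weeks of uninterrupted writing.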
[19:16] There will be obvious leak tests before storage and periodic checks on individual cans in a representative location, to determine the development of tape health over time. But I do not think that will give any meaningful results in less than 3 years. [19:17] *** prokuz has joined #archiveteam-bs [19:18] xmc: so, if I do get that and achieve the expectations, I could hope for it to be adopted as a way to Valhalla for the 10-20 years? [19:22] *** paparus has quit IRC (Read error: Operation timed out) [19:23] *** paparus has joined #archiveteam-bs [19:30] *** flipflop has quit IRC (Read error: Operation timed out) [19:31] *** prokuz has quit IRC (Read error: Operation timed out) [20:20] *** kurt_ has joined #archiveteam-bs [20:20] Those of you with OVH/SYS: they've got IPs in France [20:29] *** gw228 has joined #archiveteam-bs [20:29] *** gw228 has quit IRC (Client Quit) [20:31] *** odemg has quit IRC (Remote host closed the connection) [20:32] *** fie_ has quit IRC (Read error: Connection reset by peer) [20:32] *** odemg has joined #archiveteam-bs [20:33] This backup talk just made me fish out an old CD-R backup I have, made around 2004. Successfully verified it. [20:41] *** GE has joined #archiveteam-bs [20:46] *** fie_ has joined #archiveteam-bs [20:50] *** inittux- has quit IRC (Quit: ZNC 1.6.4 - http://znc.in) [20:53] *** inittux has joined #archiveteam-bs [21:01] *** odemg has quit IRC (Remote host closed the connection) [21:01] *** nick321 has joined #archiveteam-bs [21:02] *** nick321 has quit IRC (Client Quit) [21:10] *** odemg has joined #archiveteam-bs [21:11] *** mistym has joined #archiveteam-bs [21:24] *** REiN^ has quit IRC (Read error: Operation timed out) [21:25] *** nick321 has joined #archiveteam-bs [21:26] *** nick321 has quit IRC (Client Quit) [21:28] *** nick321 has joined #archiveteam-bs [21:35] Hello. I'm trying to access the old Compilers course from Coursera on archive.org. 
Yesterday I tried to go with pywb and downloaded archives, but discovered that http://archiveteam.org/index.php?title=Coursera lists the courses as available on the Wayback Machine, but I can't find a way to access them. Does anybody know what page to look for? [21:36] *** REiN^ has joined #archiveteam-bs [21:37] off-topic rant: I can't believe Windows doesn't have a goddamn "split" command. [21:38] it kind of does [21:38] most gnu utils have windows binaries [21:38] It has copy /b as a poor man's "cat". [21:39] heh http://stackoverflow.com/a/1002749 [21:39] naaah [21:39] "Program it yourself" [21:39] Windows finally adopting the UNIX mindset [21:39] I just booted up a Linux VM and shared the folder... even though shared folder performance is shit. [21:40] nick321: do the pages appear in the Wayback Machine? [21:40] http://gnuwin32.sourceforge.net/ [21:40] JensRex, http://gnuwin32.sourceforge.net/packages/coreutils.htm [21:40] actual cat [21:41] nightpool: Nope, if we are talking about https://class.coursera.org/compilers-004/lecture [21:41] I know various solutions exist. I'm just baffled that they don't have such a trivial thing built in. [21:41] Powershell is literally cancer. [21:42] Random googling says, for example, to get free drive space: "Get-WMIObject Win32_LogicalDisk | ForEach-Object {$_.freespace}" [21:42] there's the new Linux-on-Windows thing [21:42] As opposed to "du -h". [21:42] whatever that's called [21:43] Does the Ubuntu-on-Windows thing allow you to work on the local filesystem, or is it sandboxed? [21:43] local filesystem, as far as I understand [21:43] I haven't been able to try it [21:43] Neat. [21:44] The only reason I run Windows on my main computer is Steam and games. 
[21:44] https://www.howtogeek.com/265900/everything-you-can-do-with-windows-10s-new-bash-shell/ [21:44] likewise [21:44] nightpool: and that's where materials were supposed to be, according to the spreadsheet which is listed at that archiveteam.org link [21:44] especially since Ubuntu 16.04 broke Steam on Linux for me [21:48] nick321, looking at https://web.archive.org/web/20150921170422/https://class.coursera.org/compilers-004 it says it got a redirect to a login page [21:49] might not have been able to grab it if a login was required [21:52] It could be this, or it's not the right page to look at. I suspect the latter, since https://gist.github.com/mihaitodor/b0d8c8dd824ab936c057508edec377ad this gist claims the course materials are in some archive [22:04] *** BlueMaxim has joined #archiveteam-bs [22:05] Yesterday I tried with pywb and webarchiveplayer on virtual machines, but both attempts failed; now I'm just unpacking that 50GB archive, but I'm not sure what to expect inside. [22:07] big json files [22:08] Thank you. Then I'd better stop it now, since I have no idea what to do with them. [22:13] Well, so no luck whatsoever. Could any of you suggest how to approach this archive? [22:15] *** Asparagir has joined #archiveteam-bs [22:20] *** pizzaiolo has joined #archiveteam-bs [22:22] *** nick321 has quit IRC (So long.) [22:25] *** pizzaiolo has quit IRC (Remote host closed the connection) [22:27] *** pizzaiolo has joined #archiveteam-bs [22:28] *** odemg has quit IRC (Remote host closed the connection) [22:39] Goddamnit BackBlaze... Created a file 2^20 bytes long. BackBlaze B2 says it's 1.07GB. Web interface says "1.1GB". Support people directly to useless KB article. [22:39] *2^30 [22:39] This GB vs. GiB nonsense is infuriating. [22:40] Agreed [22:40] s/directly/direct me [22:41] *** GE has quit IRC (Remote host closed the connection) [22:41] My hands and brain disagree about typing sometimes... 
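The 1.07 GB figure complained about above is exactly the decimal/binary unit mismatch: a 2^30-byte (1 GiB) file measured in base-10 gigabytes. A quick check (the helper names are illustrative, not any vendor's API):

```python
def bytes_to_gb(n: int) -> float:
    """Decimal gigabytes (10^9 bytes), as drive vendors and B2 count them."""
    return n / 1000**3

def bytes_to_gib(n: int) -> float:
    """Binary gibibytes (2^30 bytes), as most OS tools count them."""
    return n / 1024**3

size = 2**30  # the test file from the log
print(f"{bytes_to_gb(size):.2f} GB")    # 1.07 GB -- what B2 reports
print(f"{bytes_to_gib(size):.2f} GiB")  # 1.00 GiB
```

The same mismatch explains why a 10*2^30-byte volume slightly exceeds a 10 "GB" (decimal) free quota.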
[22:42] *** khu has joined #archiveteam-bs [22:42] Support person essentially said "we bill in bytes". Great. [22:43] Eh, in that context using base-10 is probably correct, since money is base-10 [22:43] Though they probably advertise their rates in $/"GB", right? [22:44] Exactly. [22:44] And I get 10 "GB" free. [22:45] So I created a 10*2^30 byte VeraCrypt volume, and threw all the shit I'd take with me if my house was on fire in there. [22:46] I split it into 1 GB files, because they only allow you to download 1 GB (whatever that means) free per day. [22:48] *** icedice has joined #archiveteam-bs [22:48] Hmm... that's sneaky. But they're still the cheapest cloud storage by a large margin [22:54] *** BlueMaxim has quit IRC (Quit: Leaving) [23:00] *** odemg has joined #archiveteam-bs [23:01] MrRadar: sneaky, maybe, but it matches the types of costs these providers incur pretty well. [23:01] availability is hard, and expensive. [23:01] offline storage is less so [23:02] Yeah, like I said, it's probably the correct way to price it, but they could be more transparent about how they count their data [23:04] what company are you talking about? [23:04] Backblaze [23:04] thx [23:05] They use base-10 units for counting data quantity [23:05] So 1000 bytes = 1 KB, 1000 KB = 1 MB, etc. [23:06] *** kurt_ has quit IRC (Quit: Connection closed for inactivity) [23:09] * khu shivers [23:10] w h y [23:11] Since they bill by the byte and money is in base-10 [23:11] Also drives are sold by base-10 capacity, and that's one of their biggest costs [23:12] my brain has learned how to handle those numbers. it makes sense with physical media that the general public sees in stores, and the perceived size of numbers matters. but for a data company to add confusion [23:12] but now that you explain it [23:12] yeah it's probably drives [23:12] (my comment earlier was about the 1GB free downloads) [23:13] *** chfoo has quit IRC (Read error: Operation timed out) [23:13] not so bad. 
so does it get confusing [23:13] *** chfoo has joined #archiveteam-bs [23:15] *** khu has quit IRC () [23:18] Meh. I had to use the CLI to create a VeraCrypt volume of 10000000000 bytes. [23:46] Banhammer needed in #cheetoflee. *!*@cpc89770-stok19-2-0-cust1104.1-4.cable.virginm.net
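The log above mentions splitting a VeraCrypt volume into 1 GB pieces and Windows lacking a built-in `split`. A minimal stdlib-only Python equivalent of GNU `split`/`cat` (the part-naming scheme and 10^9-byte default are illustrative, not Backblaze requirements):

```python
def split_file(path: str, chunk_bytes: int = 10**9) -> list[str]:
    """Split `path` into numbered parts of at most `chunk_bytes` each."""
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            chunk = src.read(chunk_bytes)
            if not chunk:
                break
            part = f"{path}.{index:03d}"
            with open(part, "wb") as dst:
                dst.write(chunk)
            parts.append(part)
            index += 1
    return parts

def join_files(parts: list[str], out_path: str) -> None:
    """The `cat` / `copy /b` step: concatenate parts back into one file."""
    with open(out_path, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                dst.write(src.read())
```

For a 10 GiB volume at the default chunk size, this yields eleven parts (ten full 10^9-byte files plus a remainder), again because of the GB/GiB mismatch.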