[00:20] *** BlueMaxim has joined #archiveteam-bs [00:43] hook54321: I think I found it for you: http://anno.onb.ac.at/cgi-content/anno-plus?aid=wmw&datum=1938&page=5&size=45 [00:44] The problem was that the journal name was mis-stated [00:54] nope, that appears to be a book review, and I can't find it anyway [00:58] I found it, but yes, it's a book review: http://anno.onb.ac.at/cgi-content/anno-plus?aid=wmw&datum=1938&page=315&size=45 [01:12] *** phuzion has quit IRC (Remote host closed the connection) [01:16] *** fie has joined #archiveteam-bs [01:23] *** JesseW has joined #archiveteam-bs [01:43] *** wyatt8740 has quit IRC (Read error: Operation timed out) [01:51] *** hook54321 has quit IRC (Quit: Connection closed for inactivity) [02:28] *** phuzion has joined #archiveteam-bs [03:01] *** wyatt8740 has joined #archiveteam-bs [03:33] *** hook54321 has joined #archiveteam-bs [03:36] *** acridAxid has quit IRC (marauder) [03:37] *** acridAxid has joined #archiveteam-bs [03:45] *** RichardG_ has joined #archiveteam-bs [03:46] *** RichardG has quit IRC (Ping timeout: 258 seconds) [03:56] *** RichardG_ has quit IRC (Ping timeout: 260 seconds) [03:59] *** RichardG has joined #archiveteam-bs [04:07] *** RichardG_ has joined #archiveteam-bs [04:07] *** RichardG has quit IRC (Read error: Connection reset by peer) [04:08] *** RichardG_ is now known as RichardG [04:38] *** Sk1d has quit IRC (Ping timeout: 194 seconds) [04:46] *** Sk1d has joined #archiveteam-bs [06:03] JW_work, book review? O_o [06:11] If anyone is interested in discussing responses to the recent takeover of SSRN by Elsevier, feel free to join #ssave_rsn [06:35] *** JesseW has quit IRC (Ping timeout: 370 seconds) [06:38] *** vitzli has joined #archiveteam-bs [06:41] *** tomwsmf-a has quit IRC (Ping timeout: 258 seconds) [06:52] SketchCow, what does it mean for someone to be admin of their collections? 
[07:22] *** schbirid has joined #archiveteam-bs [08:00] *** BlueMaxim has quit IRC (Read error: Operation timed out) [08:02] *** BlueMaxim has joined #archiveteam-bs [08:09] *** metalcamp has joined #archiveteam-bs [09:32] *** BlueMaxim has quit IRC (Quit: Leaving) [10:01] *** hook54321 has quit IRC (Quit: Connection closed for inactivity) [10:52] *** SilSte has quit IRC (Remote host closed the connection) [11:40] *** ndiddy has quit IRC (Read error: Operation timed out) [11:57] *** SilSte has joined #archiveteam-bs [13:01] *** phuzion has quit IRC (Quit: Bye) [13:02] *** phuzion has joined #archiveteam-bs [13:11] *** phuzion has quit IRC (Quit: Bye) [13:13] *** phuzion has joined #archiveteam-bs [14:13] It's like a master of their domain. [14:14] IA has collections. Like "computermagazines" or "doggyepisodes", with items or another collection in them. [14:14] If someone uploads a bunch of stuff, we might make them in charge of a collection. [14:14] And then they can do stuff to it. [14:23] what is ssrn? [14:29] Social Science Research Network [14:51] Hiphop mixtapes went large on reddit/hackernews, so that definitely got attention and I'm seeing the ripples. [14:51] Not done giving the torrent list I have the once over, that's still going on. [14:52] The "console demos" on Internet Archive have also gone well, all working pretty OK and I'm trying to implement more before I head to Japan [15:47] *** tomwsmf-a has joined #archiveteam-bs [16:11] *** JesseW has joined #archiveteam-bs [16:19] *** JesseW has quit IRC (Ping timeout: 370 seconds) [16:28] *** vitzli has quit IRC (Quit: Leaving) [16:30] *** JesseW has joined #archiveteam-bs [16:38] *** Honno has joined #archiveteam-bs [16:38] *** JesseW has quit IRC (Ping timeout: 370 seconds) [16:51] *** Honno_ has quit IRC (Read error: Operation timed out) [17:41] *** tomwsmf-a has quit IRC (Read error: Operation timed out) [18:05] *** hook54321 has joined #archiveteam-bs [18:22] *** godane has quit IRC (Quit: Leaving.) 
[18:31] it's official [18:31] malware authors now have a better EOL policy than most startups [18:31] TeslaCrypt authors are going out of business [18:31] and published the universal master key [18:31] for decrypting everything that's encrypted with TeslaCrypt [18:32] cc SketchCow [18:32] TeslaCrypt is cryptolocker malware, I presume? [18:32] aye [18:32] Was. They shut down their C&C server [18:33] When a security researcher realized what was happening they asked Teslacrypt's online chat support if they would be willing to post the master key, which they did [18:33] Probably the most productive use of online chat support ever [18:35] in terms of dollar value per minute? ..., yeah. [19:10] wow [19:15] wow joepie91 [19:28] ..wow. [19:29] wait, does that mean that cryptolockers no longer pay well enough, or that the author just had enough money to retire for 2 lifetimes? [19:33] luckcolor, PurpleSym: Would that work though? If the IA would dark it unencrypted then why wouldn't they do the same to encrypted files, if they were made aware [19:33] And they might dark it for "spam" if they're not aware of the content (since it would be complete gibberish without the decryption key) [19:33] Also, they probably don’t see any value in encrypted files. [19:33] mmh [19:34] the idea was just temporary i mean [19:34] it [19:34] sorry [19:34] this would work if the data like needs to stay encrypted for a small period of time [19:34] it's a risk though, because if they do that then the data is as good as gone [19:34] yeah [19:36] don't dump terabytes of encrypted data into IA please [19:37] i mean either someone has a big space where stuff can be stored or this project is not really going anywhere [19:42] if I had either 50TB, or a few thousand dollars sitting around, I would. But alas... :p [19:43] I tecjnically have 50T. What do we need to store this time? 
[19:43] tech* [19:43] phillipsj: I haven't the foggiest [19:43] zino: the discussion is about storing a copy of libgen [19:45] Ah. Well, that doesn't sound like a panic offloading. Not doing that then. [19:45] nope, not a panic offloading. [19:45] I thought it was Sci-Hub [19:45] or is that the same thing [19:46] SciHub is the tool for automatically passing requests for DOIs through to donated site accounts. libgen is the store of papers. [19:46] ah, okay [19:47] i don't worry about libgen [19:48] *** blahah has joined #archiveteam-bs [19:48] * blahah waves [19:48] Hi blahah! [19:48] I've been classified as bs -_- [19:48] I'm here to embrace it [19:48] lol it's just because #archiveteam is supposed to be a low-traffic channel [19:48] ah I see, very reasonable [19:48] that's why there's a separate channel for every little thing, this is the catch-all [19:48] yeah welcome here too [19:48] :P [19:49] I wonder how many pieces 50TB would have to be broken up into to be turned into comment field spam... [19:49] lmao [19:49] haha, I like that line of thinking [19:49] it's 50 million papers [19:49] I don't know anyone with access to a massive spamming operation — but that might be a good way to handle it. [19:49] so any level of atomicity between each paper stored separately, right up to all of them in a single archive, is tractable [19:50] well another idea could be to make the output data smaller [19:50] instead of gzipping the warc [19:50] yeah, is the 50TB before or after compression? [19:50] If you were to stuff them into comment boxes, you'd probably need to break them up by paragraph, not paper. [19:50] if we are going to use warc.gz it will be gzip [19:50] if it's "papers" then it should be easily compressible [19:50] use lzma 2 [19:51] It would be really entertaining to throw the entire contents of libgen into random blog comment boxes, though. 
[19:51] I imagine a lot of it would just get filtered :p [19:51] lel [19:51] right, but often filtered stuff remains in various logs, etc. [19:51] after auto-filters, administrator deletions, and fly-by-night websites disappearing, I think you'd not get a good retention :p [19:52] yeah [19:52] I think you'd get a much better retention than without doing it, though. [19:52] well, yeah, you spread out the SPOF at least :p [19:52] It would be highly painful to restore from, though. [19:53] another funny solution i've seen was to split the files in strings which you could attach to bitcoin transactions comments: [19:53] problems [19:53] for that matter, I don't know that this isn't *already being done* — you've seen the random semi-science-paper-sounding stuff inserted into various email spam, haven't you? Maybe that's from libgen [19:53] need loads of money [19:53] and will take years :P [19:54] printing it on t-shirts would probably be infeasible [19:55] 50tb is with moderate compression [19:55] comment spam, bitcoin comments, t-shirts... we're getting progressively more ridiculous :p [19:55] it's productive I think [19:55] for example the tshirt idea is not so crazy [19:55] except instead of t-shirts, some other kind of media [19:55] like dna [19:56] like... paper [19:56] something electronically readable [19:56] it's much easier to print 94 699 040 255 592 155 765 623 877 on t-shirts [19:56] but DNA is self-preserving and self-copying [19:56] and much smaller :) [19:57] ok so I like the comment idea - generalising that, we're talking about putting the data in some place where someone else will archive it for us because they are already doing that [19:57] ..maybe there's a paper on the subject? ;) [19:58] I mean, give it a few years and we can just store it in everyone's homes. Storage scales faster than the writing of papers. My home backup machine is 4u and 84T. And it's 5 year old hardware. [19:59] nice [19:59] how much would it cost to put it on microsds? 
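[Editor's note] The "break 50TB into comment-field pieces" idea above amounts to chunking a blob into self-describing text fragments that can be reassembled later, in any order. A minimal sketch of that (hypothetical illustration only; `split_for_comments`, the header layout, and the chunk size are assumptions, not anything the channel actually built):

```python
import base64
import hashlib

def split_for_comments(data: bytes, chunk_size: int = 4096):
    """Split a blob into self-describing text chunks that could be scattered
    across comment fields and later reassembled from any subset ordering."""
    digest = hashlib.sha256(data).hexdigest()[:12]  # short id tying chunks to one blob
    total = (len(data) + chunk_size - 1) // chunk_size
    for i in range(total):
        part = data[i * chunk_size:(i + 1) * chunk_size]
        # header: blob id, chunk index, chunk count, then base64 payload
        yield f"{digest}:{i}:{total}:{base64.b64encode(part).decode()}"

def reassemble(chunks):
    """Reassemble chunks (collected in any order) into the original blob."""
    parts = {}
    for c in chunks:
        _digest, i, _total, payload = c.split(":", 3)
        parts[int(i)] = base64.b64decode(payload)
    return b"".join(parts[i] for i in sorted(parts))
```

As noted in the log, retrieval is the painful part: a real scheme would also need per-chunk error detection and heavy replication, since most chunks would be deleted as spam.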
[19:59] a lot [20:01] OK, here's another line of thinking [20:01] I'm a scientist and I can apply for funding, and I collaborate with much better scientists than me who can get more funding [20:01] zino: have you read DSHR's blog on the topic of storage costs? [20:02] e.g. http://blog.dshr.org/2016/05/the-future-of-storage.html [20:02] we're looking at what we can collaborate with the IA on, to bring in money for massive-scale storage of science stuff [20:02] do you guys have examples of important things that are currently too big for the IA to handle, that are (a) legal and (b) recognisable to normal people [20:02] guys / gals [20:02] YouTube [20:03] well youtube is just soo much [20:03] JW_work: I'll get on that. He seems to talk about things I spend my days thinking about. [20:03] $pornsites [20:03] 600pb + [20:03] yes [20:03] it's not too much for google [20:03] zino: yes his blog is VERY WORTH READING. [20:03] yeah apparently :P [20:03] and cold storing it is a different thing than hot storing [20:04] well they have you know [20:04] BIG datacenters [20:04] google has pockets deeper than the marianas trench [20:04] ok here's a question - can you store data inside youtube videos? [20:04] think so [20:04] yeah, video frames [20:04] steganography, frame by frame? [20:04] audio frames [20:04] oh you meant something else [20:04] never mind [20:05] no I meant that [20:05] you could put the data in videos and then keep them private [20:05] you can, just as long as it retains integrity through compression [20:05] anyone know of an algorithm to put arbitrary data inside a video? 
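[Editor's note] The frame-by-frame steganography being discussed is usually least-significant-bit (LSB) embedding: overwrite the low bit of each pixel byte with one bit of payload. A toy sketch on a raw frame buffer (illustration only; plain LSB would not survive YouTube's lossy re-encoding, which is exactly the "retains integrity through compression" caveat raised above — practical schemes add redundancy and error correction):

```python
def embed_lsb(frame: bytearray, payload: bytes) -> bytearray:
    """Hide payload bits in the least-significant bit of each pixel byte.
    One payload byte consumes eight pixel bytes; the visual change is +/-1
    per byte, invisible to the eye."""
    if len(payload) * 8 > len(frame):
        raise ValueError("frame too small for payload")
    out = bytearray(frame)
    for i, byte in enumerate(payload):
        for bit in range(8):
            idx = i * 8 + bit
            out[idx] = (out[idx] & 0xFE) | ((byte >> bit) & 1)
    return out

def extract_lsb(frame: bytearray, length: int) -> bytes:
    """Recover `length` bytes hidden by embed_lsb."""
    payload = bytearray()
    for i in range(length):
        byte = 0
        for bit in range(8):
            byte |= (frame[i * 8 + bit] & 1) << bit
        payload.append(byte)
    return bytes(payload)
```

At 1 bit per pixel byte, a 1080p frame (about 6 MB of raw RGB) carries roughly 750 KB, so capacity is not the problem; surviving the codec is.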
[20:05] I didn't realize what this conversation was about so I should realllllly back away from this one [20:05] i think i read about that somewhere [20:05] yipdw: storing libgen somewhere :p [20:05] Frogging: yeah that's the issue in general with steganography [20:05] just download it and put it on disks or tapes [20:05] yipdw: I'm asking in abstract though, not just for that [20:06] I work in genomics where we deal with petabytes of dna sequencing data [20:06] oh wait [20:06] ... [20:06] http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK [20:06] we can encode anything as dna [20:06] relevant : http://web.cse.msstate.edu/~ramkumar/icassp01gang.pdf [20:06] blahah: then do it :p [20:06] blahah XD [20:06] can't believe I didn't think of that until now ;\ [20:08] blahah: Oh, are you one of the scientists filling my disks with PB of bio data and then processing it with Perl scripts? :-) [20:08] zino: no, but I will join you in complaining about those people [20:08] perl is the bane of my life [20:08] blahah: Heh [20:08] but I do fill disks with PB of data [20:09] yum [20:09] perl is about the only language I refuse to use [20:09] this channel is awesome [20:09] lol [20:10] you are all heroes, the world will look back and recognise archiveteam as one of the most forward-thinking movements on the net [20:10] yeah [20:10] or put us all in prison (or both) [20:10] it's kind of crazy how people just take the internet for granted [20:10] To be fair, most of my scientists are well behaved physicists. The fetish for FORTRAN is a bit unsettling, but they try their best to make efficient code. [20:10] or burn [20:10] zino: you work at a sci place? [20:11] blahah: I work at a university in Sweden. [20:11] http://archiveteam.org/images/c/ce/Archive-all-the-things-thumb.jpg [20:11] love this meme [20:11] :P [20:11] zino: cool! lund? [20:12] blahah: Nope, Linköping. National Supercomputer Centre. 
[20:12] ooh [20:13] I did ship a bunch of hardware to Lund recently though, so I feel OK taking credit for anything good Lund does. :) [20:13] haha [20:13] lund has some interesting stuff going on [20:14] I think it's very unlikely you'll all end up in prison btw, or even any of you, for what you do as part of archiveteam [20:15] blahah: They do. We host a lot of their scientists doing post processing from the Max projects. Looking forward to what they will do after Max IV goes online. [20:15] copyright law is under the most scrutiny it has ever been - it can't last [20:16] does anyone here get involved politically in pushing for (c) change? [20:16] I think you all would make a great voice in the EU parliament for example [20:18] blahah: you said you analyze dna. what type of research do you do? I'm curious :P [20:18] if you can say of course [20:19] bukkakademia [20:19] sure, I'm not anonymous [20:20] I study photosynthesis, specifically how a naturally very efficient type of photosynthesis works at the genetic level, so we can make crops more efficient [20:20] cool [20:20] this is me: http://rik.smith-unna.com/ [20:21] I have mostly transitioned from studying plane genomes to getting very annoyed at how science works [20:21] *plant [20:22] you mean you are bored by your research? [20:22] no, more frustrated by the difference between how easy it should be, vs how difficult it is [20:23] ah understandable [20:23] the difficulty almost entirely stems from vast companies trying to make a fat pile of cash [20:23] the actual research is easy (once you've got your head into it) and very fun [20:23] but working inside academic bureaucracy, publishing, navigating the politics, all that stuff is excruciating [20:29] *** zgrant has joined #archiveteam-bs [20:29] well in don't have a lot of experience in that sector but for sure it is [20:29] *i [20:29] *** zgrant has quit IRC (Client Quit) [20:31] blahah: how academics works* I think? 
[20:31] rather than science [20:31] [22:16] does anyone here get involved politically in pushing for (c) change? [20:31] depends how you define 'politically' [20:31] government etc., no [20:32] very actively arguing against it, writing stuff, etc, yes [20:34] *** brayden_ has quit IRC (Read error: Operation timed out) [20:34] joepie91: +1 on academia rather than science [20:37] blahah: well i'll be going now. thanks for the chat hope to see ya some time again on this irc. :P [20:38] luckcolor: thanks to you too - I'm happy to have found this place and will be sticking around :) [20:40] ok cya. [20:41] anyway i recommend, if you haven't used irc before, installing on a machine which is always online some kind of 24/7 irc client [20:41] like quassel [20:41] so you can always receive messages [20:42] reading the logs of a chat can become really tedious really quickly :P [20:43] that's my irc pro-tip (i'm not a pro though :D) [20:45] :) thanks - I used to live on irc as a teenager but that is depressingly long ago, the tools have moved on [20:45] I'm using irccloud, but happy to take recommendations for good things [20:49] I was going to say something about weechat as an irc client, but unless you are comfortable with linux and running a server somewhere, forget it [20:49] sometimes I forget the rest of the world is not always as geeky as me [20:50] no I'm fine with all that :) [20:50] Well quassel is the same as IRCCloud, you just need a server to run it on [20:51] And it's very easy to install and it has an official android client [20:51] oh nice [20:52] so both of those options run a daemon that stays in your channels for you, then sync messages with you when you connecy? 
[20:52] *connect [20:53] Yeah but the awesome thing about quassel is that you configure basically everything using the client [20:53] And you don't have any limit on connections [20:54] Also you can configure it to use SQL as datastore instead of the embedded SQLite db [20:55] If you ever need to do some madness using the logs [20:56] *** godane has joined #archiveteam-bs [20:56] But generally installing it is just like [20:56] On debian [20:56] apt install quassel-core [20:57] And then you connect using the core-client [20:57] And you are good to go [20:57] I will really go offline now lol [20:58] Cya [21:00] blahah, I am the Secretary for the Canadian Pirate Party. We have not been as active as I would have liked the past 3 years or so. [21:01] * phillipsj just has "downloading the Internet" on their to-do list at the moment [21:38] *** Madthias has joined #archiveteam-bs [21:40] *** schbirid has quit IRC (Quit: Leaving) [21:45] have you looked at Backblaze B2? Unless I'm counting it wrong, it appears to be $5/month to store a TB ($0.005/GB/month * 1000 GB) https://www.backblaze.com/b2/cloud-storage-pricing.html redundancy is not in the same ballpark as Amazon, Google or Microsoft, which may matter to you [21:47] *** metalcamp has quit IRC (Ping timeout: 244 seconds) [22:10] *** tomwsmf-a has joined #archiveteam-bs [22:16] *** Honno has quit IRC (Read error: Operation timed out) [22:19] any idea how to "Press Start" on this? https://archive.org/details/demo_badapple_flm_2012 [22:25] !a https://twitter.com/NOS/status/733234910807220225 [22:26] !a https://twitter.com/NOS/status/733171664486203393 [22:26] oh oops [22:26] *** incog has joined #archiveteam-bs [22:26] pressing 1 works for me.. 
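[Editor's note] The Backblaze B2 arithmetic quoted above checks out, and scales straightforwardly to the 50 TB libgen copy discussed earlier. A quick sanity check (the $0.005/GB/month rate is the 2016 figure cited in the log; current pricing may differ):

```python
# B2 rate quoted in the log: $0.005 per GB per month (2016 pricing).
PRICE_PER_GB_MONTH = 0.005

def monthly_cost_usd(terabytes: float) -> float:
    """Monthly storage cost at the quoted B2 rate, using 1 TB = 1000 GB
    as in the log's own calculation."""
    return terabytes * 1000 * PRICE_PER_GB_MONTH

print(f"1 TB:  ${monthly_cost_usd(1):.2f}/month")   # the $5/TB figure above
print(f"50 TB: ${monthly_cost_usd(50):.2f}/month")  # the whole libgen copy
```

So the 50 TB copy would run on the order of $250/month, before download (egress) fees, which B2 bills separately.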
[22:28] http://www.mess.org/mess/howto#console_emulation [22:31] *** hook54321 has quit IRC (Quit: Connection closed for inactivity) [22:37] *** Stiletto has quit IRC (Ping timeout: 244 seconds) [22:40] thanks, yes that works [22:47] *** atrocity has quit IRC (Ping timeout: 246 seconds) [23:08] *** JW_work has quit IRC (Read error: Operation timed out) [23:18] *** JW_work has joined #archiveteam-bs [23:20] regarding storage of 5TB. Actually, I wonder about paper. There *are* a lot of printers out there, many with relatively relaxed cost provisions. [23:20] If it was divided up, printing out 5TB of papers might be quite feasible… [23:20] and if it was printed in an easily OCRable format… [23:23] *** atrocity has joined #archiveteam-bs [23:30] do you guys rescan things every so often? (E.g. that github) [23:30] I know IA does [23:31] we don't generally, no [23:31] but does archive team? [23:31] ah [23:31] it'd be nice to set up a general system for doing that, though [23:36] *** BlueMaxim has joined #archiveteam-bs [23:50] JW_work: here's just such a system for storing data on paper: http://www.ollydbg.de/Paperbak/index.html [23:53] since the data in question is generally already in a printable form, I'd just use that [23:54] but Paperbak is neat [23:57] dashcloud: unfortunately it's windows-only and looks pretty clunky. [23:58] murk: eh, you can probably run it in the browser via the Emularity! [23:58] and it's just a proof of concept, really [23:58] JW_work: that's got to be a horrible comment on the state of software development. [23:58] *** Stiletto has joined #archiveteam-bs [23:59] I prefer to think of it as a testament [23:59] or just make your own version - there's lots of simple compression schemes out there, and you could just print that on the page, and then process it using an existing program probably