[00:01] http://www.wnd.com/2016/05/california-wants-copyrights-on-everything/
[00:07] *** hive-mind has quit IRC (Ping timeout: 260 seconds)
[00:09] *** hive-mind has joined #archiveteam
[00:20] *** BlueMaxim has joined #archiveteam
[00:24] or for those who don't find wnd an appealing source, here's the press release they cribbed from: https://www.eff.org/deeplinks/2016/04/ab-2880
[00:51] asking my question from yesterday is there an irc channel for yuku archive?
[01:12] Fusl: I'm not sure. I've seen discussion of it in this one. Ping: arkiver
[01:12] *** phuzion has quit IRC (Remote host closed the connection)
[01:16] *** fie has joined #archiveteam
[01:23] *** JesseW has joined #archiveteam
[01:27] *** philpem has quit IRC (Ping timeout: 260 seconds)
[01:43] arkiver: copy-pasting what i dumped in here yesterday regarding yuku... https://scr.meo.ws/paste/2016-05-19-03-42-48-jeda5tEL.txt
[01:43] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[01:51] *** hook54321 has quit IRC (Quit: Connection closed for inactivity)
[02:28] *** phuzion has joined #archiveteam
[02:37] *** MMovie1 has joined #archiveteam
[02:38] *** MMovie has quit IRC (Read error: Operation timed out)
[03:01] *** wyatt8740 has joined #archiveteam
[03:33] *** hook54321 has joined #archiveteam
[03:36] *** acridAxid has quit IRC (marauder)
[03:37] *** acridAxid has joined #archiveteam
[03:45] *** RichardG_ has joined #archiveteam
[03:46] *** RichardG has quit IRC (Ping timeout: 258 seconds)
[03:56] *** RichardG_ has quit IRC (Ping timeout: 260 seconds)
[03:59] *** RichardG has joined #archiveteam
[04:07] *** RichardG_ has joined #archiveteam
[04:07] *** RichardG has quit IRC (Read error: Connection reset by peer)
[04:08] *** RichardG_ is now known as RichardG
[04:38] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[04:46] *** Sk1d has joined #archiveteam
[04:47] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[04:54] *** BartoCH has joined #archiveteam
[06:15] *** blahah has joined #archiveteam
[06:35] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[06:38] *** vitzli has joined #archiveteam
[06:41] *** tomwsmf-a has quit IRC (Ping timeout: 258 seconds)
[07:22] *** schbirid has joined #archiveteam
[07:48] *** ariscop has quit IRC (Read error: Operation timed out)
[08:00] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[08:02] *** BlueMaxim has joined #archiveteam
[08:09] *** metalcamp has joined #archiveteam
[08:10] *** no2pencil has quit IRC (Read error: Operation timed out)
[08:11] *** no2pencil has joined #archiveteam
[08:41] *** WinterFox has joined #archiveteam
[08:53] *** ariscop has joined #archiveteam
[09:20] *** atomotic has joined #archiveteam
[09:32] *** BlueMaxim has quit IRC (Quit: Leaving)
[10:01] *** hook54321 has quit IRC (Quit: Connection closed for inactivity)
[10:32] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[10:52] *** SilSte has quit IRC (Remote host closed the connection)
[11:40] *** ndiddy has quit IRC (Read error: Operation timed out)
[11:57] *** SilSte has joined #archiveteam
[12:07] *** Morbus has joined #archiveteam
[12:28] *** WinterFox has quit IRC (Remote host closed the connection)
[13:01] *** phuzion has quit IRC (Quit: Bye)
[13:02] *** phuzion has joined #archiveteam
[13:11] *** phuzion has quit IRC (Quit: Bye)
[13:13] *** phuzion has joined #archiveteam
[13:29] anyone here interested in archiving the BBC recipes site?
[13:29] someone has made a clone, and the code is open
[13:29] but I feel like it would be safer in the archive, and distributed https://github.com/user24/auntiesrecipes
[13:56] We already ran it through Archivebot
[13:57] nice
[14:40] *** khaoohs has quit IRC (Read error: Connection reset by peer)
[15:47] *** tomwsmf-a has joined #archiveteam
[16:11] *** JesseW has joined #archiveteam
[16:19] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[16:22] *** atomotic has joined #archiveteam
[16:28] *** vitzli has quit IRC (Quit: Leaving)
[16:28] SSRN archival https://github.com/paultopia/scholaw/issues/1#issuecomment-220328277
[16:30] *** JesseW has joined #archiveteam
[16:38] *** Honno has joined #archiveteam
[16:38] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[16:51] *** Honno_ has quit IRC (Read error: Operation timed out)
[17:01] https://github.com/user24/auntiesrecipes
[17:06] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[17:16] *** Froggypwn has quit IRC (Quit: ~ Trillian Astra - www.trillian.im ~)
[17:21] *** philpem has joined #archiveteam
[17:22] *** Froggypwn has joined #archiveteam
[17:41] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[17:44] do you guys rescan things every so often? (E.g. that github)
[17:53] The news grabber constantly monitors over 800 news sites for new articles but otherwise everything we do is pretty much one-off
[18:05] *** hook54321 has joined #archiveteam
[18:08] is anyone interested in archiving academic papers?
[18:08] I know about the PDF sweep which will get stuff that is freely available
[18:08] what about stuff that is not?
[18:10] Like scihub?
[18:11] yes, like that
[18:12] or generally any non-public papers
[18:13] Where would you put them? IA would have to take them down pretty quickly.
[18:14] not sure - I guess a distributed archive would be best
[18:15] the problem at the moment is that the torrent speeds are terrible because of the routing from russia to EU / US
[18:17] so currently archiveteam puts everything on the internet archive?
[18:17] Yes.
[18:17] I'm interested in where the line is for copyright
[18:17] basically if someone is upset, they can request a takedown?
[18:18] That’s how it works right now.
[18:18] ok
[18:22] *** godane has quit IRC (Quit: Leaving.)
[18:25] Also, your account might be taken down entirely if too many complaints arrive.
[18:30] I see
[18:31] ArchiveTeam has a few things stored off of the Internet Archive (e.g. Gitorious, the IA.BAK stuff, seeding of the URLteam results) and we are OK with more.
[18:31] But IA provides a very nice host for a lot of the stuff we grab.
[18:31] are there any other giant hosts?
[18:32] what do you mean by "other giant hosts"?
[18:32] other hosts that are high capacity like IA, I assume
[18:33] There aren't any with similar political purposes, AFAIK. Others with similar capacity include Google, Microsoft, Amazon and the NSA. :-)
[18:34] others with similar aims include various national libraries
[18:35] google, microsoft, and amazon don't host shit for us :p
[18:35] I bet they do, we just may not know it.
[18:35] I really really doubt if google doesn't have an impressive fraction of IA's collections quietly sitting on their servers somewhere.
[18:35] It's not like they don't have the space.
[18:45] there are places like CERN that have vast capacity too - they have zenodo for scientific data
[18:46] yeah I did mean places with high capacity, and I was thinking specifically of places that welcome deposits
[18:46] good point
[18:53] ok so the hypothetical scenario I put to you all is this...
[18:53] scihub has copied about 50 million papers that were previously locked behind a paywall
[18:53] it's in the region of 50TB of data
[18:54] if scihub were to be raided or otherwise dismantled in the future, what strategies could they hypothetically use to prevent the loss of all the data
[18:54] ideas so far include to hide all the pdfs inside images using steganography, and archive them on flickr and other photo stores
[18:55] or to disguise them as scientific datasets and archive them on scientific data archives
[18:55] IPFS?
[18:55] to spread them out in tiny archives to lots of free http static hosts around the world
[18:55] both of those seem sensible to me
[18:55] IPFS is also on the table, but it requires people willing to join the swarm
[18:56] That’s always the problem with distributed solutions.
[19:00] any other crazy ideas?
[19:00] or not crazy
[19:01] this seems relevant (though perhaps not useful) http://www.archiveteam.org/index.php?title=Valhalla
[19:02] it's the same question; "where can we put big things, other than the Archive"
[19:02] Universities tend to have lots of storage as well. Might be worth asking them to – silently – host the data.
[19:06] 50TB, at US$100 / TB is $5,000.
[19:07] which isn't cheap, but isn't completely unreasonable either
[19:09] what does Amazon Glacier charge?
[19:11] The basic issue is maintaining the doublethink of "there's this data — I don't know what it is, I can't access it, I certainly don't have any reason to think it is illegal — but if someone happens to want it, sometime in the future, I will keep it for them"
[19:12] glacier for 50T is USD$350/month => USD$4,200/yr
[19:13] glacier is expensive for downloading data
[19:14] And 4300$ for retrieval bandwidth.
[19:17] Backblaze B2 is even cheaper than Amazon Glacier at $0.005/GB/month
[19:21] 1PB would cost just 60k/year, if we just stuff it full :p
[19:21] Dedicated box at OVH: 0.008€/GB/month. (12x4TB/Softraid)
[19:22] is that supposed to be euros?
[19:22] or did you mean dollars
[19:23] Yes, Euro.
[19:23] that'd be $5376.72 USD per year for 50T
[19:24] (for comparison's sake)
[19:24] And dedicated box at Hetzner: 0.003€/GB/month. (15x6TB)
[19:25] (includes 100TB bandwidth)
[19:26] ~$2000 USD/year for 50T.
[19:26] that's not terrible
[19:27] Note that you can’t get “just” 50T though. It’s all or nothing.
[19:28] how about you upload the encrypted archives on archive.org
[19:28] and then when the site closes you can release the decryption key
[19:28] #archiveteam-bs
[19:29] agree
[19:29] WOOP WOOP Off topic
[19:29] ok
[19:31] The offtopic alarm has been triggered
[19:31] :P
[19:33] blahah: join #archiveteam-bs
[19:46] sorry was putting kid to bed. I was trying to calculate S3 costs earlier - seemed silly
[19:46] JW_work: the doublethink is spot one
[19:46] *on
[19:46] no s3 it's probably not worth it
[19:46] there are two basic scenarios: someone knowingly hosts the data, or someone hosts it while being ignorant of the contents
[19:47] * Frogging points at #archiveteam-bs
[19:47] luckcolor: yeah I realised that eventually
[19:47] luckcolor: I was thinking along similar lines for encrypted stuff
[19:47] blahah: please join #archiveteam-bs and discuss it there, not here
[19:47] also works with any place that will archive data
[19:47] ok sorry
[20:22] *** ariscop has quit IRC (Ping timeout: 506 seconds)
[20:30] *** zgrant has joined #archiveteam
[20:31] *** zgrant has quit IRC (Client Quit)
[20:34] *** brayden_ has quit IRC (Read error: Operation timed out)
[20:52] *** ariscop has joined #archiveteam
[20:56] *** godane has joined #archiveteam
[21:16] *** khaoohs has joined #archiveteam
[21:38] *** Madthias has joined #archiveteam
[21:40] *** schbirid has quit IRC (Quit: Leaving)
[21:47] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[22:03] *** zgrant has joined #archiveteam
[22:09] *** incog has joined #archiveteam
[22:09] anybody got a scrape of kuro5hin?
[22:09] im coming up blank with the usual searches
[22:10] *** tomwsmf-a has joined #archiveteam
[22:12] no wayback no cache
[22:13] im looking for the ogg frog zines and a specific article on xanga being a ghetto botnet due to an exploited vuln
[22:13] these were the only places as far as i know they were
[22:16] used to be at http://www.kuro5hin.org/story/2004/12/28/161214/43
[22:16] *** Honno has quit IRC (Read error: Operation timed out)
[22:16] *** zgrant has quit IRC (Quit: http://chat.efnet.org (EOF))
[22:31] *** hook54321 has quit IRC (Quit: Connection closed for inactivity)
[22:37] *** Stiletto has quit IRC (Ping timeout: 244 seconds)
[22:40] http://k5.semantic-db.org/diary-slurp/161942--archive-diaries--html-diaries--nested-format.zip
[22:40] found smth
[22:41] yeah, I remembered there was something, but couldn't remember the details
[22:47] *** atrocity has quit IRC (Ping timeout: 246 seconds)
[22:56] http://archive.is/mtpf oh here it is
[23:06] still no ogg frog, oh well
[23:08] *** JW_work has quit IRC (Read error: Operation timed out)
[23:11] http://atdt.freeshell.org/k5/
[23:18] *** JW_work has joined #archiveteam
[23:23] *** atrocity has joined #archiveteam
[23:36] *** BlueMaxim has joined #archiveteam
[23:58] *** Stiletto has joined #archiveteam
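
Note on the storage figures quoted between 19:06 and 19:26: the yearly costs for 50T can be reproduced with a short calculation. This is only a rough sketch, not anything posted in the channel. The per-GB rates are the ones quoted in the log (May 2016 prices, not current ones); the Glacier rate is back-derived from the quoted $350/month for 50T, and the EUR to USD rate is an assumption back-derived from the quoted $5376.72 figure. The names `RATES_PER_GB_MONTH`, `EUR_TO_USD`, `annual_cost_usd` and `cost_table` are made up for illustration.

```python
# Per-GB monthly storage rates as quoted in the log (May 2016), not current prices.
RATES_PER_GB_MONTH = {
    "Amazon Glacier": (0.007, "USD"),    # implied by the quoted $350/month for 50T
    "Backblaze B2": (0.005, "USD"),
    "OVH dedicated": (0.008, "EUR"),
    "Hetzner dedicated": (0.003, "EUR"),
}

# Assumption: exchange rate back-derived from the log's $5376.72/year OVH figure.
EUR_TO_USD = 1.12015

# Decimal terabytes (1 TB = 1000 GB), which is what the quoted figures imply.
GB_PER_TB = 1000


def annual_cost_usd(rate_per_gb_month, currency, terabytes):
    """Yearly storage-only cost in USD; retrieval and bandwidth are not included."""
    monthly = rate_per_gb_month * terabytes * GB_PER_TB
    if currency == "EUR":
        monthly *= EUR_TO_USD
    return monthly * 12


def cost_table(terabytes):
    """Map each provider name to its yearly cost in USD, rounded to cents."""
    return {
        name: round(annual_cost_usd(rate, currency, terabytes), 2)
        for name, (rate, currency) in RATES_PER_GB_MONTH.items()
    }


if __name__ == "__main__":
    for name, cost in cost_table(50).items():
        print(f"{name}: ${cost:,.2f}/year")
```

The sketch covers storage only. As noted in the log, Glacier retrieval costs extra, and the dedicated boxes cannot be rented in 50T slices, so their real cost is that of the whole machine.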