Time |
Nickname |
Message |
00:01
π
|
Start |
http://www.wnd.com/2016/05/california-wants-copyrights-on-everything/ |
00:07
π
|
|
hive-mind has quit IRC (Ping timeout: 260 seconds) |
00:09
π
|
|
hive-mind has joined #archiveteam |
00:20
π
|
|
BlueMaxim has joined #archiveteam |
00:24
π
|
JW_work |
or for those who don't find wnd an appealing source, here's the press release they cribbed from: https://www.eff.org/deeplinks/2016/04/ab-2880 |
00:51
π
|
Fusl |
asking my question from yesterday <Fusl> is there an irc channel for yuku archive? |
01:12
π
|
MrRadar |
Fusl: I'm not sure. I've seen discussion of it in this one. Ping: arkiver |
01:12
π
|
|
phuzion has quit IRC (Remote host closed the connection) |
01:16
π
|
|
fie has joined #archiveteam |
01:23
π
|
|
JesseW has joined #archiveteam |
01:27
π
|
|
philpem has quit IRC (Ping timeout: 260 seconds) |
01:43
π
|
Fusl |
arkiver: copy-pasting what i dumped in here yesterday regarding yuku... https://scr.meo.ws/paste/2016-05-19-03-42-48-jeda5tEL.txt |
01:43
π
|
|
wyatt8740 has quit IRC (Read error: Operation timed out) |
01:51
π
|
|
hook54321 has quit IRC (Quit: Connection closed for inactivity) |
02:28
π
|
|
phuzion has joined #archiveteam |
02:37
π
|
|
MMovie1 has joined #archiveteam |
02:38
π
|
|
MMovie has quit IRC (Read error: Operation timed out) |
03:01
π
|
|
wyatt8740 has joined #archiveteam |
03:33
π
|
|
hook54321 has joined #archiveteam |
03:36
π
|
|
acridAxid has quit IRC (marauder) |
03:37
π
|
|
acridAxid has joined #archiveteam |
03:45
π
|
|
RichardG_ has joined #archiveteam |
03:46
π
|
|
RichardG has quit IRC (Ping timeout: 258 seconds) |
03:56
π
|
|
RichardG_ has quit IRC (Ping timeout: 260 seconds) |
03:59
π
|
|
RichardG has joined #archiveteam |
04:07
π
|
|
RichardG_ has joined #archiveteam |
04:07
π
|
|
RichardG has quit IRC (Read error: Connection reset by peer) |
04:08
π
|
|
RichardG_ is now known as RichardG |
04:38
π
|
|
Sk1d has quit IRC (Ping timeout: 194 seconds) |
04:46
π
|
|
Sk1d has joined #archiveteam |
04:47
π
|
|
BartoCH has quit IRC (Ping timeout: 260 seconds) |
04:54
π
|
|
BartoCH has joined #archiveteam |
06:15
π
|
|
blahah has joined #archiveteam |
06:35
π
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
06:38
π
|
|
vitzli has joined #archiveteam |
06:41
π
|
|
tomwsmf-a has quit IRC (Ping timeout: 258 seconds) |
07:22
π
|
|
schbirid has joined #archiveteam |
07:48
π
|
|
ariscop has quit IRC (Read error: Operation timed out) |
08:00
π
|
|
BlueMaxim has quit IRC (Read error: Operation timed out) |
08:02
π
|
|
BlueMaxim has joined #archiveteam |
08:09
π
|
|
metalcamp has joined #archiveteam |
08:10
π
|
|
no2pencil has quit IRC (Read error: Operation timed out) |
08:11
π
|
|
no2pencil has joined #archiveteam |
08:41
π
|
|
WinterFox has joined #archiveteam |
08:53
π
|
|
ariscop has joined #archiveteam |
09:20
π
|
|
atomotic has joined #archiveteam |
09:32
π
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
10:01
π
|
|
hook54321 has quit IRC (Quit: Connection closed for inactivity) |
10:32
π
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
10:52
π
|
|
SilSte has quit IRC (Remote host closed the connection) |
11:40
π
|
|
ndiddy has quit IRC (Read error: Operation timed out) |
11:57
π
|
|
SilSte has joined #archiveteam |
12:07
π
|
|
Morbus has joined #archiveteam |
12:28
π
|
|
WinterFox has quit IRC (Remote host closed the connection) |
13:01
π
|
|
phuzion has quit IRC (Quit: Bye) |
13:02
π
|
|
phuzion has joined #archiveteam |
13:11
π
|
|
phuzion has quit IRC (Quit: Bye) |
13:13
π
|
|
phuzion has joined #archiveteam |
13:29
π
|
blahah |
anyone here interested in archiving the BBC recipes site? |
13:29
π
|
blahah |
someone has made a clone, and the code is open |
13:29
π
|
blahah |
but I feel like it would be safer in the archive, and distributed https://github.com/user24/auntiesrecipes |
13:56
π
|
MrRadar |
We already ran it through Archivebot |
13:57
π
|
blahah |
nice |
14:40
π
|
|
khaoohs has quit IRC (Read error: Connection reset by peer) |
15:47
π
|
|
tomwsmf-a has joined #archiveteam |
16:11
π
|
|
JesseW has joined #archiveteam |
16:19
π
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
16:22
π
|
|
atomotic has joined #archiveteam |
16:28
π
|
|
vitzli has quit IRC (Quit: Leaving) |
16:28
π
|
Nemo_bis |
SSRN archival https://github.com/paultopia/scholaw/issues/1#issuecomment-220328277 |
16:30
π
|
|
JesseW has joined #archiveteam |
16:38
π
|
|
Honno has joined #archiveteam |
16:38
π
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
16:51
π
|
|
Honno_ has quit IRC (Read error: Operation timed out) |
17:01
π
|
schbirid |
https://github.com/user24/auntiesrecipes |
17:06
π
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
17:16
π
|
|
Froggypwn has quit IRC (Quit: ~ Trillian Astra - www.trillian.im ~) |
17:21
π
|
|
philpem has joined #archiveteam |
17:22
π
|
|
Froggypwn has joined #archiveteam |
17:41
π
|
|
tomwsmf-a has quit IRC (Read error: Operation timed out) |
17:44
π
|
ranma |
do you guys rescan things every so often? (E.g. that github) |
17:53
π
|
MrRadar |
The news grabber constantly monitors over 800 news sites for new articles but otherwise everything we do is pretty much one-off |
18:05
π
|
|
hook54321 has joined #archiveteam |
18:08
π
|
blahah |
is anyone interested in archiving academic papers? |
18:08
π
|
blahah |
I know about the PDF sweep which will get stuff that is freely available |
18:08
π
|
blahah |
what about stuff that is not? |
18:10
π
|
PurpleSym |
Like scihub? |
18:11
π
|
blahah |
yes, like that |
18:12
π
|
blahah |
or generally any non-public papers |
18:13
π
|
PurpleSym |
Where would you put them? IA would have to take them down pretty quickly. |
18:14
π
|
blahah |
not sure - I guess a distributed archive would be best |
18:15
π
|
blahah |
the problem at the moment is that the torrent speeds are terrible because of the routing from russia to EU / US |
18:17
π
|
blahah |
so currently archiveteam puts everything on the internet archive? |
18:17
π
|
PurpleSym |
Yes. |
18:17
π
|
blahah |
I'm interested in where the line is for copyright |
18:17
π
|
blahah |
basically if someone is upset, they can request a takedown? |
18:18
π
|
PurpleSym |
That's how it works right now. |
18:18
π
|
blahah |
ok |
18:22
π
|
|
godane has quit IRC (Quit: Leaving.) |
18:25
π
|
PurpleSym |
Also, your account might be taken down entirely if too many complaints arrive. |
18:30
π
|
blahah |
I see |
18:31
π
|
JW_work |
ArchiveTeam has a few things stored off of the Internet Archive (e.g. Gitorious, the IA.BAK stuff, seeding of the URLteam results) and we are OK with more. |
18:31
π
|
JW_work |
But IA provides a very nice host for a lot of the stuff we grab. |
18:31
π
|
blahah |
are there any other giant hosts? |
18:32
π
|
JW_work |
what do you mean by "other giant hosts"? |
18:32
π
|
Frogging |
other hosts that are high capacity like IA, I assume |
18:33
π
|
JW_work |
There aren't any with similar political purposes, AFAIK. Others with similar capacity include Google, Microsoft, Amazon and the NSA. :-) |
18:34
π
|
JW_work |
others with similar aims include various national libraries |
18:35
π
|
Frogging |
google, microsoft, and amazon don't host shit for us :p |
18:35
π
|
JW_work |
I bet they do, we just may not know it. |
18:35
π
|
JW_work |
I really really doubt if google doesn't have an impressive fraction of IA's collections quietly sitting on their servers somewhere. |
18:35
π
|
JW_work |
It's not like they don't have the space. |
18:45
π
|
blahah |
there are places like CERN that have vast capacity too - they have zenodo for scientific data |
18:46
π
|
blahah |
yeah I did mean places with high capacity, and I was thinking specifically of places that welcome deposits |
18:46
π
|
JW_work |
good point |
18:53
π
|
blahah |
ok so the hypothetical scenario I put to you all is this... |
18:53
π
|
blahah |
scihub has copied about 50million papers that were previously locked behind a paywall |
18:53
π
|
blahah |
it's in the region of 50TB of data |
18:54
π
|
blahah |
if scihub were to be raided or otherwise dismantled in the future, what strategies could they hypothetically use to prevent the loss of all the data |
18:54
π
|
blahah |
ideas so far include to hide all the pdfs inside images using steganography, and archive them on flickr and other photo stores |
18:55
π
|
blahah |
or to disguise them as scientific datasets and archive them on scientific data archives |
18:55
π
|
PurpleSym |
IPFS? |
18:55
π
|
blahah |
to spread them out in tiny archives to lots of free http static hosts around the world |
18:55
π
|
JW_work |
both of those seem sensible to me |
18:55
π
|
blahah |
IPFS is also on the table, but it requires people willing to join the swarm |
18:56
π
|
PurpleSym |
That's always the problem with distributed solutions. |
19:00
π
|
blahah |
any other crazy ideas? |
19:00
π
|
blahah |
or not crazy |
19:01
π
|
Frogging |
this seems relevant (though perhaps not useful) http://www.archiveteam.org/index.php?title=Valhalla |
19:02
π
|
Frogging |
it's the same question; "where can we put big things, other than the Archive" |
19:02
π
|
PurpleSym |
Universities tend to have lots of storage as well. Might be worth asking them to – silently – host the data. |
19:06
π
|
JW_work |
50TB, at US$100 / TB is $5,000. |
19:07
π
|
JW_work |
which isn't cheap, but isn't completely unreasonable either |
19:09
π
|
JW_work |
what does Amazon Glacier charge? |
19:11
π
|
JW_work |
The basic issue is maintaining the doublethink of "there's this data – I don't know what it is, I can't access it, I certainly don't have any reason to think it is illegal – but if someone happens to want it, sometime in the future, I will keep it for them" |
19:12
π
|
xmc |
glacier for 50T is USD$350/month => USD$4,200/yr |
19:13
π
|
midas |
glacier is expensive for downloading data |
19:14
π
|
PurpleSym |
And 4300$ for retrieval bandwidth. |
19:17
π
|
MrRadar |
Backblaze B2 is even cheaper than Amazon Glacier at $0.005/GB/month |
19:21
π
|
midas |
1PB would cost just 60k/year, if we just stuff it full :p |
19:21
π
|
PurpleSym |
Dedicated box at OVH: 0.008€/GB/month. (12x4TB/Softraid) |
19:22
π
|
Frogging |
is that supposed to be euros? |
19:22
π
|
Frogging |
or did you mean dollars |
19:23
π
|
PurpleSym |
Yes, Euro. |
19:23
π
|
Frogging |
that'd be $5376.72 USD per year for 50T |
19:24
π
|
Frogging |
(for comparison's sake) |
19:24
π
|
PurpleSym |
And dedicated box at Hetzner: 0.003€/GB/month. (15x6TB) |
19:25
π
|
PurpleSym |
(includes 100TB bandwidth) |
19:26
π
|
PurpleSym |
~$2000 USD/year for 50T. |
19:26
π
|
Frogging |
that's not terrible |
19:27
π
|
PurpleSym |
Note that you can't get "just" 50T though. It's all or nothing. |
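The per-year figures quoted above can be cross-checked with a few lines of Python. This is only a sketch: the per-GB monthly rates are the ones quoted in the channel, while the helper name `annual_cost_usd`, the decimal convention of 1 TB = 1000 GB, and the EUR-to-USD rate of 1.12 (roughly what the $5376.72 figure implies) are assumptions made here for illustration.

```python
GB_PER_TB = 1000        # assumption: decimal terabytes
EUR_TO_USD = 1.12       # assumption: approximate rate implied by the chat's $5376.72 figure


def annual_cost_usd(terabytes, rate_per_gb_month, fx_to_usd=1.0):
    """Yearly storage cost in USD for a flat per-GB monthly rate.

    fx_to_usd converts the rate's currency to USD (1.0 if already USD).
    """
    return terabytes * GB_PER_TB * rate_per_gb_month * 12 * fx_to_usd


# Rates as quoted in the discussion (currency noted per entry).
quotes = [
    ("Amazon Glacier, 50 TB", 50, 0.007, 1.0),        # $350/month in chat
    ("Backblaze B2, 1 PB", 1000, 0.005, 1.0),         # "60k/year" in chat
    ("OVH dedicated, 50 TB", 50, 0.008, EUR_TO_USD),  # EUR rate
    ("Hetzner dedicated, 50 TB", 50, 0.003, EUR_TO_USD),  # EUR rate
]

for label, tb, rate, fx in quotes:
    print(f"{label}: ${annual_cost_usd(tb, rate, fx):,.0f} per year")
```

The dedicated-box numbers are prorated to 50 TB; as noted above, the whole box has to be rented, so the real bill would be for its full capacity. Retrieval and bandwidth charges are not modelled.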
19:28
π
|
luckcolor |
how about you upload the encrypted archives on archive.org |
19:28
π
|
luckcolor |
and then when the site closes you can release the decryption key |
19:28
π
|
HCross |
#archiveteam-bs |
19:29
π
|
luckcolor |
agree |
19:29
π
|
HCross |
WOOP WOOP Off topic |
19:29
π
|
Frogging |
ok |
19:31
π
|
luckcolor |
The offtopic alarm has been triggered |
19:31
π
|
luckcolor |
:P |
19:33
π
|
Frogging |
blahah: join #archiveteam-bs |
19:46
π
|
blahah |
sorry was putting kid to bed. I was trying to calculate S3 costs earlier - seemed silly |
19:46
π
|
blahah |
JW_work: the doublethink is spot one |
19:46
π
|
blahah |
*on |
19:46
π
|
luckcolor |
no s3 it's probably not worth it |
19:46
π
|
blahah |
there are two basic scenarios: someone knowingly hosts the data, or someone hosts it while being ignorant of the contents |
19:47
π
|
* |
Frogging points at #archiveteam-bs |
19:47
π
|
blahah |
luckcolor: yeah I realised that eventually |
19:47
π
|
blahah |
luckcolor: I was thinking along similar lines for encrypted stuff |
19:47
π
|
JW_work |
blahah: please join #archiveteam-bs and discuss it there, not here |
19:47
π
|
blahah |
also works with any place that will archive data |
19:47
π
|
blahah |
ok sorry |
20:22
π
|
|
ariscop has quit IRC (Ping timeout: 506 seconds) |
20:30
π
|
|
zgrant has joined #archiveteam |
20:31
π
|
|
zgrant has quit IRC (Client Quit) |
20:34
π
|
|
brayden_ has quit IRC (Read error: Operation timed out) |
20:52
π
|
|
ariscop has joined #archiveteam |
20:56
π
|
|
godane has joined #archiveteam |
21:16
π
|
|
khaoohs has joined #archiveteam |
21:38
π
|
|
Madthias has joined #archiveteam |
21:40
π
|
|
schbirid has quit IRC (Quit: Leaving) |
21:47
π
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
22:03
π
|
|
zgrant has joined #archiveteam |
22:09
π
|
|
incog has joined #archiveteam |
22:09
π
|
incog |
anybody got a scrape of kuro5hin? |
22:09
π
|
incog |
im coming up blank with the usual searches |
22:10
π
|
|
tomwsmf-a has joined #archiveteam |
22:12
π
|
incog |
no wayback no cache |
22:13
π
|
incog |
im looking for the ogg frog zines and a specific article on xanga being a ghetto botnet due to an exploited vuln |
22:13
π
|
incog |
these were the only places as far as i know they were |
22:16
π
|
incog |
used to be at http://www.kuro5hin.org/story/2004/12/28/161214/43 |
22:16
π
|
|
Honno has quit IRC (Read error: Operation timed out) |
22:16
π
|
|
zgrant has quit IRC (Quit: http://chat.efnet.org (EOF)) |
22:31
π
|
|
hook54321 has quit IRC (Quit: Connection closed for inactivity) |
22:37
π
|
|
Stiletto has quit IRC (Ping timeout: 244 seconds) |
22:40
π
|
incog |
http://k5.semantic-db.org/diary-slurp/161942--archive-diaries--html-diaries--nested-format.zip |
22:40
π
|
incog |
found smth |
22:41
π
|
JW_work |
yeah, I remembered there was something, but couldn't remember the details |
22:47
π
|
|
atrocity has quit IRC (Ping timeout: 246 seconds) |
22:56
π
|
incog |
http://archive.is/mtpf oh here it is |
23:06
π
|
incog |
still no ogg frog, oh well |
23:08
π
|
|
JW_work has quit IRC (Read error: Operation timed out) |
23:11
π
|
incog |
http://atdt.freeshell.org/k5/ |
23:18
π
|
|
JW_work has joined #archiveteam |
23:23
π
|
|
atrocity has joined #archiveteam |
23:36
π
|
|
BlueMaxim has joined #archiveteam |
23:58
π
|
|
Stiletto has joined #archiveteam |