Anyone want to download all the issues of Dagens Nyheter and Dagens Industri? I can get all the PDF links and the auth cookie, but they're 80-90 MB apiece and ArchiveBot won't handle an auth cookie
1 per day, 3 different papers total; only the last 3 years would be 100 GB
They're both fairly large newspapers in Sweden
Now that's something I'd happily do. Yet 1 per day and the last 3 years? What do you mean?
They're a newspaper, these are PDF renders of it
There's only 1/day, and Dagens Industri only goes back the last 3 years (DN has longer)
one issue of the newspaper per day, or one download per day (rate limit)?
One issue per day gets published
mundus: Did you convert these ebooks to EPUB or is that the original?
what?
http://dh.mundus.xyz/requests/Stephen%20King%20-%20The%20Dark%20Tower%20Series/
that's what I downloaded from bib
What is bib?
Bibliotik, the largest ebook tracker
Isn't libgen larger?
mundus: and how might I get access to this?
dunno, find an invite thread on other trackers
dd0a13f37, I thought bib was the biggest, not heard of libgen
2 million books (science/technology) + fiction + nearly all scientific journals
Torrents: 296700
spose so
libgen.io
ah, libgen isn't a tracker
What is in this 2015.tar.gz, mundus?
pack of books
VADemon: my.mixtape.moe/fwijcd.txt my.mixtape.moe/eazdrw.txt my.mixtape.moe/oghmyp.txt see query for auth cookie
https://mirror.mundus.xyz/drive/Archives/TrainPacks/
mundus: you don't have any bandwidth limits, do you?
no
hmm
mundus: are there duplicates in these packs?
no
It's not a tracker, but it operates in the same way
ooh boy
mundus: is there metadata with these books?
mundus: mind if I grab?
dd0a13f37, so it follows DMCA takedowns?
I don't know
No, go ahead
thank you
They don't care, their hosting is in the Seychelles and their domain in the Indian Ocean
mundus: what is here? https://mirror.mundus.xyz/drive/Keys/
keys
lol
Many of your dirs 404
I can't read it though...
if they 404, refresh
Keys requires a password
Those should not be in a web-exposed directory, even if it has password auth
meh
mundus: see https://mirror.mundus.xyz/drive/Pictures/
Yeah, that has a password
If you want large amounts of books as torrents you can download libgen's torrents. You can also download their database (only 6 GB), which contains hashes of all the books in the formats that were most common when they built it: MD5, TTH, ED2K, and something more
because I don't need any of you fucks seeing my pics
You can use that to bulk classify unsorted books (a sketch of this follows further down)
mundus: what kind of pics?
of my family
ahh, so you have family, ok
eh it's fine, good to have family
mundus: can you unlock this? https://mirror.mundus.xyz/drive/Keys/
And thanks! :D
no
aww
is there a key to it?
it's like SSH keys
second: why are you asking dumb questions
oooo
mundus: I thought it was software keys and stuff, nvm
Frogging: curious
mundus: thank you very very much for the data
sure
hope there are no timing attacks in the password protection or anything like that; as we all know, vulnerabilities in obscure features added as an afterthought are extremely rare
if someone pwned it I would give zero fucks
mundus: how big is this directory, if it's not going to take you too much work to figure out? https://mirror.mundus.xyz/drive/Manga/
538 GB
thank you, this is cool
mundus, thanks for sharing
np
You should upload it somewhere
like where
what?
it is uploaded "somewhere"
http://libgen.io/comics/index.php
The contents of the drive
...
what?
???
Libgen accepts larger uploads via FTP
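A minimal sketch of the "bulk classify unsorted books" idea mentioned above, assuming you have already exported the MD5 column of the libgen database dump to a plain text file (all file names here are hypothetical):

```bash
# Hash every file in the unsorted directory (md5sum prints lowercase hex).
find ./unsorted-books -type f -exec md5sum {} + > local-hashes.txt

# Normalise the exported libgen hashes to lowercase so they match md5sum output.
tr '[:upper:]' '[:lower:]' < libgen-md5-export.txt > libgen-md5.txt

# Split local files into "already catalogued in libgen" and "not found there";
# the latter are candidates for upload. Note that grep loads the whole pattern
# file into memory, so a multi-million-line hash list needs a few GB of RAM.
grep  -F -f libgen-md5.txt local-hashes.txt > already-in-libgen.txt
grep -vF -f libgen-md5.txt local-hashes.txt > not-in-libgen.txt
```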
You're saying upload the TrainPacks stuff?
Yes, or would they already have it all?
ohh
And also the manga, I think they would accept it under comics
I thought you were saying the whole thing
isn't manga video? or is that comics
Anime is video, manga is comics
uh
I wish there was a better way I could host this stuff
I get too much traffic
Torrents?
where's it hosted now?
Google Drive + caching FS + Scaleway
But I have no money, so this is the result
Is Google Drive the backend?
What?
yes
Isn't that expensive as fuck?
No
Or are you using lots of accounts?
I have unlimited storage through school
ayyyy
is it encrypted on Drive?
I always thought that just meant 20 GB instead of 2
Yes
And I have an eBay account too, which it's all mirrored to
unencrypted, cuz idgaf
eBay? Does eBay have cloud storage?
What about using torrents? You could use the server as a webseed (a sketch follows further down)
No, I bought an unlimited account off eBay
that's a lot of hashing
just what I have shared is 150 TB
So 5 months at 100 Mbit
Could gradually switch it over I guess, some of it might already be hashed if you got it as a whole
I dream of https://www.online.net/en/dedicated-server/dedibox-st48
At $4.5/TB-month you would recoup the costs pretty quickly by buying hard drives and hosting yourself, with some cheap VPS as a reverse proxy
I have a shit connection
And my parents would never let me spend that much
But $200/mo is okay?
>Dream of
no :p
What's taking up the majority of the space?
mundus: I must mirror your stuff quicker then...
A lot of it already is, search for the filenames on btdig.com
dd0a13f37: ind downloaded 791 files, 13G; wee 133 files and 1.9G; ovrigt ("other") 255 files and 3.9G
791 di_all.txt
255 dio_all.txt
133 diw_all.txt
mundus: do you have this book? Cooking for Geeks, 2nd edition http://libgen.io/book/index.php?md5=994D5F0D6F0D2C4F8107FCEF98080698
thanks
have we archived libgen?
No, but they provide repository torrents, are uploaded to Usenet, and are backed up in various other places, maybe the Internet Archive
mundus: I love dmca.mp4
thanks :)
what is the source? I must know :p
I need to make sure that's on all my servers
idk
Question about the !yahoo switch
Is 4 workers the best for performance?
not necessarily
more workers = more load on the site
so I'm looking at Patreon to get money to buy VHS tapes
But if they have a powerful CDN and they won't block you?
sure
After a certain point, it won't do anything - 1 million workers will just kill the pipeline
so where's the "maximum"?
dunno if there is one, but most people just use the default of 3. there's no need to use more than that in most cases
is there a https://github.com/bibanon/tubeup maintainer around here?
refeed: #bibanon on Rizon IRC
thx
mundus: IA most likely has an archive of Library Genesis. But not publicly accessible.
they have non-publicly-accessible stuff?
Yes, tons of it. The Wayback Machine's underlying data isn't publicly accessible, for example - well, everything archived by IA's crawlers etc.
If the copyright holder complains about an item, they'll also block access but not delete it, by the way.
my libgen scimag efforts are pretty dead. the torrents are not well seeded :(
Had a request about "Deez Nutz"
yup
https://www.youtube.com/watch?v=uODUnXf-7qc
Here's the Fat Boys in 1987 with a song called "My Nuts"
Is there a limit to how many URLs curl can take in one command?
And then in 1992, five years later, Dr. Dre released a song called "Deez Nuuts".
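On the webseed suggestion above: a sketch of how one directory of the mirror could be packaged as a torrent with the existing HTTP mirror as a web seed, assuming mktorrent and its -w web-seed option; the tracker URL is a placeholder. The hashing cost is real, though: every byte has to be read once, and 150 TB pulled out of Google Drive at ~100 Mbit/s (~12.5 MB/s) is about 12 million seconds, i.e. the 4-5 months mentioned.

```bash
# Hypothetical example: build a torrent for the Manga directory and point its
# web seed at the existing HTTP mirror, so the server only has to serve pieces
# that no peer is sharing yet. The announce URL is a placeholder tracker.
mktorrent \
    -a udp://tracker.example.invalid:6969/announce \
    -w https://mirror.mundus.xyz/drive/Manga/ \
    -o Manga.torrent \
    ./Manga
```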
http://www.urbandictionary.com/define.php?term=deez%20nutz
I did something like `curl 'something/search.php?page_number='{0..2000}`
And it only got about halfway through
nvm, I miscalculated the offsets
>http://www.urbandictionary.com/define.php?term=deez%20nutz
okay, now I understand ._.
dd0a13f37: The only limit I can think of is the kernel's maximum command length limit.
What's the fastest concurrency you should use if you're interacting with a powerful server and you're not ratelimited?
At some point, it won't do anything.
Right now I'm using 20, but should I crank it up further?
Depends on your machine and the network, I'd say.
For example, would 200 make things faster? What about 2000? 20000?
At some point you're limited by internet, obviously (10 Mbit approx)
I'm using aria2c, so CPU load isn't a problem
I guess you'll just have to test. Check how many responses you get per time for different concurrencies, then pick the point of diminishing returns (a rough benchmark sketch follows below).
That seems like a reasonable idea
Another question: I have a list of links; they have 2 parameters, call them issue and pagenum
each issue is a number of pages long, so if pagenum N exists then pagenum N-1 exists
What's the best way to download all these? Get a random sample of 1000 URLs, increment the page number by 1 until 0 match, then feed it to ArchiveBot and put up with a very high error rate?
I think this would be best solved with wpull and a hook script. You throw page 1 for each issue into wpull. The hook checks whether page N exists and adds page N+1 if so. That wastes one request per issue. It won't work with ArchiveBot though, obviously.
Any idea how large this is?
Data dependencies also
No idea; 43618 issues, X pages per issue, each page is a JPG, 1950x? px
one JPG is maybe 500 KB-1 MB
one issue is 20-50 pages, seems to vary
Hmm, I see. That's a bit too large for me currently.
But sending an HTTP request that misses, is that such a big deal?
I've seen some IP bans after too many 404s per time, but generally probably not. I'd be more worried about potentially missing pages.
So I can do like I said, get the highest page number, then generate a list of pages to get?
Does Akamai ban for 404s?
Yeah, you can try that (a sketch of generating such a page list follows below). We can always go through the logs later and see which issues had no 404s, then look into those in detail and queue any missed pages.
No idea. I don't think I've archived anything from Akamai yet (at least not knowingly). But I doubt it, to be honest. Most of those that I've seen were obscure little websites.
Well, the time-critical part of the archiving is done anyway
So you probably have time, yeah
Bloody hell, there are lots that have >100 pages while the majority are around 40; this will be much harder than I thought
Hm, yeah, then the other way with wpull and a hook script might be better. It'll be a few TB then, I guess.
Or you could autogen 200 requests/issue, which should be plenty, then feed it into ArchiveBot - at an overhead of 1 KB/request and an average of 20 pages/issue, that's 180 KB wasted per issue, which is <2% of the total
Or will it take too much CPU?
hey chfoo, could you remove the password protection on the archivebot logs? Anything that's private can just be requested via PM anyway
Does archivebot deduplicate? If archive.org already has something, will it upload a new copy still?
It does not deduplicate (except inside a job).
It does NOT.
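A rough way to do the concurrency test suggested above: time the same sample of URLs at a few concurrency levels and stop raising the level once the run time stops dropping. This sketch uses xargs and curl; with aria2c you would vary -j instead. urls-sample.txt is a hypothetical file of a few hundred representative URLs.

```bash
# Time a fixed batch of URLs at increasing concurrency levels and look for the
# point of diminishing returns.
for jobs in 5 10 20 50 100; do
    start=$(date +%s)
    xargs -P "$jobs" -n 1 curl -s -o /dev/null < urls-sample.txt
    end=$(date +%s)
    echo "concurrency=$jobs finished in $((end - start))s"
done
```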
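And a sketch of generating the page list discussed above. Rather than picking one global maximum, this variant probes each issue with HEAD requests until the first 404 (the same one-wasted-request-per-issue cost as the wpull hook idea) and writes out a flat URL list. The URL pattern and file names are hypothetical placeholders for the real site; feeding the result from a file also sidesteps the kernel command-length limit mentioned earlier.

```bash
#!/bin/bash
# For each issue, probe ascending page numbers with HEAD requests until the
# first 404 (curl --fail exits non-zero on HTTP errors), emitting every URL
# that exists. Costs exactly one wasted request per issue.
BASE='https://example.invalid/render.php'    # hypothetical URL pattern

while read -r issue; do
    page=1
    while curl -sf --head -o /dev/null "${BASE}?issue=${issue}&page=${page}"; do
        echo "${BASE}?issue=${issue}&page=${page}"
        page=$((page + 1))
    done
done < issues.txt > all-pages.txt

# Then download from the file instead of the command line, e.g.:
#   aria2c -j 20 -i all-pages.txt
```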
If something gets darked on IA, can you still download it if you ask them nicely out of band, or is it kept secret until some arbitrary date?
Not that I know of
dd0a13f37, I can't. The logs were made private for a good reason I can't remember.
chfoo: Can you tell me the password? I've been asking several times, and nobody knew what it was... :-|
sent a PM
SketchCow: You don't know whether you can download it, or whether they keep it secret?
dd0a13f37: I'm not sure he can tell you, but if you're a researcher, you can get access to other things in person at IA; you'd need to email any requests of that nature and get them approved first, though
Okay, I see
So if I upload something that will get darked, it's not "wasted" in the sense that it will take 70+ years for it to become available?
In theory, that is
dd0a13f37: you should realistically plan on never seeing something that was darked again, unless you have research credentials; that way you'll be pleasantly surprised in case it does turn up again
So in other words, you have to archive the archives
What about internetarchive.bak, won't they be getting a full collection?
IA.BAK
IA.BAK, okay
Why did they change the name?
sure, but generally darked items are spam or items that the copyright holder cares enough about to write in about; in which case, you should easily be able to get a copy elsewhere
It's the same. We were writing at the same time.
ah okay
I'm wondering about the newspapers I'm archiving: since they're uploaded directly to IA and nobody downloads and extracts the PDFs, a copyright complaint could make them more or less permanently unavailable
dd0a13f37: if there's an archive that's being sold or actively managed, I'd be wary of uploading; otherwise chances seem pretty good
It's the subscriber-only section of a few Swedish newspapers; some of it is old stuff (e.g. the last 100 years except the last 20), some of it is new stuff (e.g. the last 3 years)
I've already uploaded it, and I'm behind Tor so I don't care, but it would be a shame to have it all disappear into the void
you've uploaded it, hopefully with excellent metadata, so you've done the best you can
No, I could rent a cheap server and get the torrents if it's at risk of darking, but then I would have to fuck around with bitcoins
The metadata is not included since it's all from ArchiveBot
Anything that is darked *may* or *may not* be kept. No guarantees whatsoever, except that it isn't available for any random person to download.
If you want to make sure something remains available, you need to keep a copy, and re-upload it to a new distribution channel if the existing ones decide to cease distributing it.
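On "you need to keep a copy": a sketch of pulling your own uploads back down with the internetarchive command-line client before anything gets darked. The search query is a made-up example; adjust it to however the items are actually identified (uploader, collection, subject, etc.).

```bash
# Install and configure the official archive.org command-line client.
pip install internetarchive
ia configure                     # stores archive.org credentials

# List matching item identifiers, then download each item in full.
ia search 'subject:archivebot' --itemlist > my-items.txt   # hypothetical query
while read -r item; do
    ia download "$item"
done < my-items.txt
```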