Time | Nickname | Message
02:40 | SketchCow | Schbirid: FOS will be available for it soon. I am currently uploading Wretch
07:56 | Schbirid | SketchCow: excellent, i will try to get my dockstar onto the network today! it would probably end up as an rsync daemon to sync from.
09:46 | arkiver | saving warhammeronline.com in just under 60 minutes...
09:46 | arkiver | :)
09:46 | arkiver | new record!!
09:56 | m1das | nice job arkiver
10:03 | nico_32 | how much was it? (in size)
10:16 | arkiver | using a different method
10:16 | arkiver | not adding a website and downloading that whole website
10:16 | arkiver | since it then downloads everything one by one
10:16 | arkiver | but I used a program that quickly discovers all the links from a website
10:17 | arkiver | then I download all those links instead of crawling the website
10:17 | arkiver | that way the whole website downloads much faster
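arkiver never names the tool, but the two-phase idea (discover all URLs first, then fetch them in parallel instead of downloading page by page during the crawl) can be sketched roughly. A minimal illustration in Python, assuming the requests package; a real grab would also write WARCs rather than bare files:

    # Rough sketch of the two-phase approach arkiver describes: phase 1 only
    # discovers URLs, phase 2 fetches them all concurrently, instead of a
    # sequential crawl that downloads everything one by one. Illustration
    # only; not the actual program used.
    import urllib.parse
    from collections import deque
    from concurrent.futures import ThreadPoolExecutor
    from html.parser import HTMLParser

    import requests  # assumed available: pip install requests


    class LinkParser(HTMLParser):
        """Collect href/src attribute values from one HTML page."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            for name, value in attrs:
                if name in ("href", "src") and value:
                    self.links.append(value)


    def discover(start_url, limit=1000):
        """Phase 1: breadth-first link discovery, same host only."""
        host = urllib.parse.urlparse(start_url).netloc
        seen, queue = {start_url}, deque([start_url])
        while queue and len(seen) < limit:
            url = queue.popleft()
            try:
                resp = requests.get(url, timeout=30)
            except requests.RequestException:
                continue
            if "text/html" not in resp.headers.get("Content-Type", ""):
                continue  # only HTML pages yield further links
            parser = LinkParser()
            parser.feed(resp.text)
            for link in parser.links:
                absolute, _ = urllib.parse.urldefrag(urllib.parse.urljoin(url, link))
                if urllib.parse.urlparse(absolute).netloc == host and absolute not in seen:
                    seen.add(absolute)
                    queue.append(absolute)
        return seen


    def fetch(url):
        """Phase 2 worker: download one discovered URL."""
        try:
            return url, requests.get(url, timeout=30).content
        except requests.RequestException:
            return url, None


    urls = discover("http://www.warhammeronline.com/")
    with ThreadPoolExecutor(max_workers=20) as pool:
        for url, body in pool.map(fetch, urls):
            print(url, "ok" if body is not None else "failed")

The speedup comes from the second phase: once the URL list exists, twenty workers can pull files simultaneously instead of one at a time.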
10:37 | arkiver | is http://commons.wikimedia.org/ also saved by the archiveteam already?
10:43 | nico_32 | probably backed up by the wikiteam
10:50 | arkiver | ah ok
11:00 | Nemo_bis | arkiver: what part of it?
11:00 | Nemo_bis | the text is in http://dumps.wikimedia.org/backup-index.html with some mirrors
11:00 | arkiver | yes
11:00 | arkiver | but I mean all the images and videos and so on
11:01 | Nemo_bis | uploads are close to 30 TB, I spent a few months archiving them
11:01 | Nemo_bis | if you find something/someone to seed the torrents, that's appreciated :) there's one per month https://archive.org/details/wikimediacommons-torrents
11:09 | arkiver | are you only uploading them as torrents or also as WARCs?
11:10 | Nemo_bis | O_o
11:10 | Nemo_bis | they're uploaded as ZIP files (which contain the individual media files + XML descriptions), torrents are just a means of distribution
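The monthly torrent items Nemo_bis links can be enumerated programmatically. A short sketch, assuming the internetarchive Python package (pip install internetarchive); the collection identifier comes straight from the URL above:

    # List the monthly Commons torrent items in the archive.org collection.
    from internetarchive import search_items

    for result in search_items("collection:wikimediacommons-torrents"):
        print(result["identifier"])
        # internetarchive.download(result["identifier"], glob_pattern="*.torrent")
        # would fetch just the .torrent file for seeding.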
11:10 | m1das | 30 TB, that's about the storage i have in total.
11:10 | arkiver | no, I mean are they in the wayback machine?
11:11 | Nemo_bis | see https://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Media_tarballs for more info
11:11 | arkiver | brb
11:11 | Nemo_bis | I doubt it, and it wouldn't be very useful anyway, you can't download more than 100 MB per file from wayback
11:11 | Nemo_bis | though the legend says you can from some machines
11:48 | arkiver | ??
11:48 | arkiver | you can download more than 100 MB per file from the wayback machine...
12:27 | Nemo_bis | arkiver: not always https://archive.org/post/1003894/wayback-machine-doesnt-support-the-range-header-aka-wget-continue-doesnt-work
12:28 | arkiver | Nemo_bis: I've never experienced that yet...
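The post Nemo_bis links is about the Wayback Machine not honoring the HTTP Range header, which is what makes resumed downloads (wget --continue) fail on large files. Whether any given server supports ranged requests is easy to check: ask for the first kilobyte and see whether you get 206 Partial Content back. A small probe in Python, assuming requests; the URL is the one from the log below:

    # Probe for Range support: 206 means partial content is honored,
    # anything else means resumable downloads will not work.
    import requests

    url = ("http://web.archive.org/web/20070810113028/"
           "http://www.knams.wikimedia.org/wikimania/highquality/Wikimania05-AP1.avi")
    resp = requests.get(url, headers={"Range": "bytes=0-1023"},
                        stream=True, timeout=60)
    if resp.status_code == 206:
        print("Range honored:", resp.headers.get("Content-Range"))
    else:
        print("no Range support; got", resp.status_code,
              "- wget --continue will not help here")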
12:28 | arkiver | can someone here create good scripts or little programs for windows?
12:29 | Nemo_bis | arkiver: then try downloading http://web.archive.org/web/20070810113028/http://www.knams.wikimedia.org/wikimania/highquality/Wikimania05-AP1.avi and tell me what you get :)
12:31 | arkiver | Nemo_bis: ah yes, I see...
12:31 | arkiver | I did get that sometimes, but never consistently at 100 MB
12:31 | arkiver | it is different every time
12:31 | arkiver | but from what I learned, it just needs to be archived again
12:32 | arkiver | since there was probably some kind of error in the connection at that time
12:34 | Nemo_bis | ouch, that would be terrible because those videos are gone; where did you read this?
12:34 | arkiver | no, it's just from what I tried out
12:34 | arkiver | I tried and tried with other links
12:35 | arkiver | and that is my "conclusion"
12:35 | arkiver | but man
12:35 | arkiver | maybe we should put wikimedia in the wayback machine?
12:36 | Nemo_bis | that's a bit generic :) what part of it?
12:36 | arkiver | hmm
12:37 | arkiver | alright if we talk about this a little later?
12:37 | arkiver | lol
12:37 | arkiver | doing several things atm
12:37 | arkiver | and I want to have a good conversation about it
12:37 | arkiver | ok?
12:39 | arkiver | till when are you online?
15:04 | chfoo | the #btch project is up and running. manual script running: https://github.com/ArchiveTeam/ptch-grab
15:05 | Marcelo | Only manual?
15:08 | chfoo | i need an admin to add it to projects.json please
15:17 | nico_32 | another project ?
15:19 | chfoo | yahoo! is shutting down ptch. ~5 days remain.
15:21 | nico_32 | 74k to do? definitive number?
15:23 | nico_32 | chfoo: how many concurrent connections per ipv4?
15:25 | chfoo | nico_32: 74k should be definitive based on the list deathy gave me. i'm not sure how many concurrent threads are ok.
15:26 | chfoo | if possible, best advice is to use a sacrificial ip address and let us know.
15:27 | deathy | for ptch there was no obvious/visible rate-limiting when I did initial research/API calls.
15:28 | deathy | that being said... 2 concurrent is safe... let's at least see how it goes before trying to break it
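chfoo's "sacrificial IP" advice can be automated: hit the site at a fixed concurrency and count status codes; a burst of 429/403s or sharply rising latency is the usual sign of rate limiting. A rough sketch in Python, assuming requests; the test URL is a placeholder, not a real ptch endpoint:

    # Crude rate-limit probe: N requests at a fixed concurrency,
    # then tally status codes and mean latency.
    import time
    from collections import Counter
    from concurrent.futures import ThreadPoolExecutor

    import requests

    TEST_URL = "http://example.com/"  # placeholder; substitute a real URL
    CONCURRENCY = 2                   # deathy's "safe" starting point
    N_REQUESTS = 50


    def probe(_):
        start = time.time()
        try:
            status = requests.get(TEST_URL, timeout=30).status_code
        except requests.RequestException:
            status = "error"
        return status, time.time() - start


    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        results = list(pool.map(probe, range(N_REQUESTS)))

    print(Counter(status for status, _ in results))
    print("mean latency: %.2fs" % (sum(t for _, t in results) / len(results)))

For the grab itself, going by other ArchiveTeam grab READMEs, concurrency is typically set when launching seesaw's run-pipeline (e.g. a --concurrent 2 option), so nico_32's concurrent=2 below maps directly onto that.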
15:28 | nico_32 | so running concurrent=2 on 4 ipv4s
15:30 | nico_32 | got another dedicated server
15:30 | Marcelo | Can I increase upload slots?
15:34 | Marcelo | Concurrent uploads
15:34 | nico_32 | the upload target is slow
15:35 | nico_32 | ~75 kB/s here
15:39 | Marcelo | 75.98 kB/s here
15:40 | nico_32 | from Schbirid (was got klined from efnet): "hey, could someone test the speed of my jamendo vorbis album server?"
15:40 | nico_32 | from Schbirid (was got klined from efnet): "rsync -avP 151.217.55.80::albums2 ."
15:40 | nico_32 | s/was/who/g
15:41 | nico_32 | from Schbirid (who got klined from efnet): "if it works, maybe someone could sync from/to fos? albums2 is the first hdd with 2TB"
15:41 | nico_32 | from Schbirid (who got klined from efnet): "rsync -avP --dry-run 151.217.55.80::albums2 jamendo-albums/"
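For anyone testing Schbirid's commands: the double colon in 151.217.55.80::albums2 is rsync daemon syntax, naming a module exported by the daemon rather than an SSH path, and --dry-run only lists what would be transferred, so the second command is a safe way to inspect the 2TB module's contents before committing any bandwidth.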
15:41
🔗
|
nico_32 |
poke SketchCow |
15:51
🔗
|
SketchCow |
OK. |
15:55
🔗
|
Nemo_bis |
chfoo: the README doesn't include the instructions added in last revisions of https://github.com/ArchiveTeam/wretch-grab |
15:56
🔗
|
chfoo |
Nemo_bis: noted. i'll fix it now |
15:56
🔗
|
Nemo_bis |
I guess they need to be pushed to the upstream repo? |
15:56
🔗
|
Nemo_bis |
thanks |
15:57
🔗
|
Nemo_bis |
I also noted we still require gnutls-dev[el] and openssl-dev[el], I had to install them on fedora (this used to be the most common problem, with mobileme) |
15:57
🔗
|
Nemo_bis |
so maybe that's to add too |
16:11
🔗
|
nico_32 |
it is openssl-dev or gnutls-dev |
16:11
🔗
|
nico_32 |
one is enough |
16:22
🔗
|
Nemo_bis |
hmmm |
16:24
🔗
|
Nemo_bis |
I can't make sense out of my package manager history, oh well |
16:25
🔗
|
joepie91 |
Nemo_bis: wait, you mean there are people that -can- make sense out of package manager history? |
16:25
🔗
|
joepie91 |
where do I find these mythical creatures? |
16:27
🔗
|
Nemo_bis |
:) apper is rather easy to use |
16:28
🔗
|
Nemo_bis |
but apparently I didn't install the packages I remembered, probably I'm the wrong one ;) |
18:22
🔗
|
wp494 |
!! |
18:22
🔗
|
wp494 |
http://www.theverge.com/2013/12/27/5248286/vdio-shut-down-by-rdio |
18:25
🔗
|
yipdw |
rdio killed the vdio star |
19:09
🔗
|
zenguy_pc |
how does web,archive.org determine what imgur links they cache |
19:09
🔗
|
zenguy_pc |
yipdw: lol |
20:47
🔗
|
DFJustin |
zenguy_pc: I would assume it's just a crapshoot based on what their spiders reach |
20:47
🔗
|
DFJustin |
so popular images linked from multiple external pages are more likely |
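DFJustin's point is that coverage depends on what crawlers happen to reach; individual URLs can also be pushed in explicitly through the Wayback Machine's save endpoint. A minimal sketch in Python, assuming requests; the imgur URL is a hypothetical example, and the exact response details vary:

    # Ask the Wayback Machine to take a fresh snapshot of one URL.
    import requests

    url = "http://imgur.com/gallery/example"  # hypothetical example URL
    resp = requests.get("https://web.archive.org/save/" + url, timeout=120)
    # A 200 here generally means a snapshot was taken; some responses
    # include the archived path in the Content-Location header.
    print(resp.status_code, resp.headers.get("Content-Location", ""))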