00:01 <balrog> nico_: I fed those all to archivebot already
00:01 <balrog> (pretty much right when the news broke)
00:01 <balrog> if you want to do it anyway, feel free
00:02 <odie5533_> What news?
00:02 <nico_> he's dead
00:03 <odie5533_> oh
00:03 <nico_> (French) https://linuxfr.org/news/deces-de-cedric-blancher-chercheur-en-securite-informatique
00:04 <nico_> (English) http://www.theregister.co.uk/2013/11/12/cdric_sid_blancher_dead_at_37/
00:04 <nico_> if you want more, we need to go to -bs
05:46 <godane> SketchCow: I found the Webstock 2013 videos were released
05:47 <godane> going to make a collection for them
07:48 <Nemo_bis> 13.15 <@Nemo_bis> where is odie5533 when one needs him :) https://code.google.com/p/wikiteam/issues/detail?id=78
07:49 <Nemo_bis> uh, he's not here, stupid page up :)
07:52 <Turnip> Page-up has been broken on my laptop for a while; irssi has been stressful
08:12 <odie5533_> Nemo_bis: hmm?
08:13 <odie5533_> I didn't write the wikiteam scripts.
08:14 <odie5533_> Nemo_bis: I think the wikiteam script works by using urllib to get a list of URLs, then wget for the actual download. If I were writing it, I'd write the URL-grabbing part as a Scrapy project that outputs URLs.
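A minimal sketch of the two-phase design odie5533_ describes: collect the URL list with urllib, then hand each URL to wget for the actual download. The index format and all helper names here are hypothetical, not taken from the actual wikiteam scripts.

```python
import subprocess
import urllib.request

def parse_url_list(text):
    """Parse an index page listing one URL per line; ignore blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def fetch_url_list(index_url):
    """Phase 1: use urllib to fetch the page listing the URLs to grab."""
    with urllib.request.urlopen(index_url) as resp:
        return parse_url_list(resp.read().decode("utf-8", errors="replace"))

def download_all(urls, dest_dir="."):
    """Phase 2: let wget do the heavy lifting (retries, resume, rate limits)."""
    for url in urls:
        subprocess.run(["wget", "-P", dest_dir, url], check=False)
```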
08:15 <odie5533_> Nemo_bis: well, it's hitting a redirect loop. Can you give me a URL that this occurs on?
08:16 <odie5533_> And in any case, the grabber should catch the HTTPError and just continue grabbing the other URLs.
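The catch-and-continue behavior odie5533_ asks for might look like the following; the function name and logging are illustrative. Note that urllib surfaces an exceeded-redirect loop as an HTTPError, so this covers that case too.

```python
import urllib.error
import urllib.request

def grab_all(urls):
    """Fetch each URL; skip failures instead of aborting the whole run."""
    results = {}
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                results[url] = resp.read()
        except urllib.error.HTTPError as err:
            # 4xx/5xx responses and redirect loops land here:
            # log and keep going with the remaining URLs
            print("skipping %s: HTTP %d" % (url, err.code))
        except urllib.error.URLError as err:
            # DNS failures, refused connections, etc.
            print("skipping %s: %s" % (url, err.reason))
    return results
```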
08:18 <Nemo_bis> odie5533_: I know. Some days ago you asked me to provide URLs where the problems happen; here you go. :)
08:18 <Nemo_bis> They are in the bug report, just add some dots to the domain names in the dirs
08:18 <odie5533_> eh... can you give me a link?
08:19 <odie5533_> I'm really not sure what you mean, since that won't give me a URL to one of the images which seem to be having the problem
08:20 <odie5533_> None of the domains seem to exist. e.g. http://amzwikkii.com/
08:21 <odie5533_> Nemo_bis: let's talk in #wikiteam
08:44 <godane> Question about Vimeo
08:45 <godane> I'm looking at the Vimeo Webstock archive, and you can't download the original video file even though there is a link
08:47 <godane> what's funny is if you hover over the link you get this message: "This source file has been lovingly stored in our archive."
08:47 <godane> but you freaking can't download it
08:57 <godane> i also found out that more videos from Webstock 2011 were released more recently
08:57 <Nemo_bis> maybe they moved them all to tapes in order to save money :D
09:02 <godane> maybe, but it's very weird
09:03 <godane> cause some videos in that area still have the original links working just fine
09:04 <godane> anyways, the d-addicts.com wiki dump is done downloading
09:04 <godane> making a 7z file of it
09:12 <Nemo_bis> still uploading at 50 KB/s average to s3...
09:16 <odie5533_> That seems really low
10:36 <SketchCow> I'm really pounding s3.
11:45 <Nemo_bis> SketchCow: and derivers too now? :)
11:46 <Nemo_bis> 2,391,921,500 KB so far, I hope we'll have some slice of s3 available for us too soon :P
13:09 <joepie92> AAAAAAND WE'RE OFF: http://tracker.archiveteam.org/hyves/
13:49 <SketchCow> Is FOS getting this fun?
14:16 <SmileyG> SketchCow: I think so
14:16 <SmileyG> S[h]O[r]T has offered space too
15:02 <joepie92> SketchCow: I'm not sure if FOS has been added as a target yet
15:02 <joepie92> the initial target was icefortress, awaiting FOS space to be freed
15:12 <nico_> GLaDOS: VERSION = "20131116.02_" + subprocess.check_output(["git", "rev-parse", "HEAD"]).strip()[:7]
15:12 <GLaDOS> nico_: I saw it (I'm also twrist)
15:12 <nico_> (needs an import subprocess before it)
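Putting nico_'s two lines together, a self-contained version of the snippet might look like this. It is wrapped in a function so the hash formatting is testable without a git checkout; the decode is needed on Python 3, where check_output returns bytes.

```python
import subprocess

def make_version(base="20131116.02_", rev=None):
    """Build a version string: base prefix plus the 7-char short git hash.

    If rev is None, ask git for HEAD (requires running inside a git
    checkout); passing rev explicitly makes the formatting testable.
    """
    if rev is None:
        rev = subprocess.check_output(
            ["git", "rev-parse", "HEAD"]
        ).decode("ascii")
    return base + rev.strip()[:7]
```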
18:14 <w0rp> http://bpaste.net/show/gWTMl6R6j3bdSgAuFHZj/ I used this for ripping a table from Wikipedia as CSV. Maybe someone else here might find it useful.
18:15 <w0rp> It doesn't cope with there being two tables matching the selector on the page, but a minor set of modifications could make it do that.
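The paste has since expired, so here is a from-scratch stdlib sketch of the idea w0rp describes: pull the first <table> out of an HTML page and emit it as CSV. All names are illustrative, and a real version would take a CSS selector and handle colspans.

```python
import csv
import io
from html.parser import HTMLParser

class FirstTableParser(HTMLParser):
    """Collect the cell text of the first <table> in the document."""

    def __init__(self):
        super().__init__()
        self.rows, self.row = [], None
        self.in_cell = False
        self.done = False  # stop after the first table closes

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "tr":
            self.row = []
        elif tag in ("td", "th"):
            self.in_cell = True
            self.row.append("")

    def handle_data(self, data):
        if self.in_cell and not self.done:
            self.row[-1] += data.strip()

    def handle_endtag(self, tag):
        if self.done:
            return
        if tag in ("td", "th"):
            self.in_cell = False
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None
        elif tag == "table":
            self.done = True

def table_to_csv(html):
    """Return the first HTML table as a CSV string."""
    parser = FirstTableParser()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()
```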
18:55 <godane> uploaded: https://archive.org/details/wikid_addictscom_static-20131115-wikidump
22:10 <odie5533_> hmpf, people disabling the ability to submit Issues to stuff on GitHub...
22:23 <odie5533_> Does anyone know what a CDX warcinfo/request entry is supposed to look like when the filename has a space in it?
22:27 <odie5533> The Python program CDX-Writer formats the massaged URL as 'warcinfo:/output file.warc.gz/version'. Should there really be a space in the middle of an otherwise space-separated file? None of the test cases for CDX-Writer have spaces in the file names, so perhaps it was overlooked.
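To see why the space matters: a CDX index line is a single space-delimited record, so an unescaped space inside one field shifts every later column. A toy demonstration of the problem odie5533 describes, with the field layout abbreviated and hypothetical:

```python
# One CDX record per line, fields separated by single spaces.
line_ok = "warcinfo:/output.warc.gz/version 20131115000000 warcinfo ..."
line_bad = "warcinfo:/output file.warc.gz/version 20131115000000 warcinfo ..."

# Field 1 should be the timestamp...
print(line_ok.split(" ")[1])   # prints "20131115000000"
# ...but the embedded space pushes the rest of the filename into it.
print(line_bad.split(" ")[1])  # prints "file.warc.gz/version"
```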
22:31 <odie5533> And the author disabled the Issues section on GitHub and doesn't list an email, so I can't even contact them.
22:37 <xmc> you can usually get an email by cloning the repo and looking at their commits
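xmc's tip can be scripted. A sketch assuming repo_path is an already-cloned local checkout; git's -C flag and the %ae (author email) log format are real, while the helper names here are made up:

```python
import subprocess

def unique_emails(log_output):
    """Deduplicate and sort the one-email-per-line output of git log."""
    return sorted({line for line in log_output.splitlines() if line.strip()})

def author_emails(repo_path):
    """List the distinct author emails in a local git checkout's history."""
    out = subprocess.check_output(
        ["git", "-C", repo_path, "log", "--format=%ae"]
    ).decode("utf-8", errors="replace")
    return unique_emails(out)
```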
22:43 <SketchCow> Archive Team Bot GO! The Third: https://archive.org/details/archiveteam_archivebot_go_003
22:46 <ivan`> SketchCow: did the plugins.jetbrains.com warc get nuked, or are you specially handling it?
22:49 <ivan`> it's from around Oct 23
22:51 <ivan`> it ran into the 40GB limit, so maybe something went wrong with the rsync