#archiveteam 2013-11-16,Sat


Time Nickname Message
00:01 🔗 balrog nico_: I fed those all to archivebot already
00:01 🔗 balrog (pretty much right when the news broke)
00:01 🔗 balrog if you want to do it anyway feel free
00:02 🔗 odie5533_ What news?
00:02 🔗 nico_ he's dead
00:03 🔗 odie5533_ oh
00:03 🔗 nico_ (french) https://linuxfr.org/news/deces-de-cedric-blancher-chercheur-en-securite-informatique
00:04 🔗 nico_ (english) http://www.theregister.co.uk/2013/11/12/cdric_sid_blancher_dead_at_37/
00:04 🔗 nico_ if you want more, we need to go to -bs
05:46 🔗 godane SketchCow: i found webstock 2013 videos were released
05:47 🔗 godane going to make a collection for them
07:48 🔗 Nemo_bis 13.15 <@Nemo_bis> where is odie5533 when one needs him :) https://code.google.com/p/wikiteam/issues/detail?id=78
07:49 🔗 Nemo_bis uh he's not here, stupid page up :)
07:52 🔗 Turnip Page-up has been broken on my laptop for a while, irssi has been stressful
08:12 🔗 odie5533_ Nemo_bis: hmm?
08:13 🔗 odie5533_ I didn't write wikiteam scripts.
08:14 🔗 odie5533_ Nemo_bis: I think the wikiteam script works by using urllib to get a list of urls, then wget for the actual download. If I were writing it, I'd write the url grabbing part as a Scrapy project that outputs urls.
08:15 🔗 odie5533_ Nemo_bis: well, it's hitting a redirect loop. Can you give me a url that this occurs on?
08:16 🔗 odie5533_ And in any case, the grabber should catch the HTTPError and just continue grabbing the other urls.
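The grab-and-continue behaviour odie5533_ is suggesting can be sketched like this; the function name and the injectable `opener` hook are illustrative, not part of the actual wikiteam code:

```python
# Hedged sketch of odie5533_'s suggestion: catch HTTPError per URL
# and keep grabbing the rest instead of aborting the whole run.
# fetch_all and the opener parameter are made up for illustration;
# they are not wikiteam's real API.
from urllib.error import HTTPError
from urllib.request import urlopen

def fetch_all(urls, opener=None):
    # opener lets a test stand in for real network access
    opener = opener or (lambda u: urlopen(u).read())
    results = {}
    failed = []
    for url in urls:
        try:
            results[url] = opener(url)
        except HTTPError:
            failed.append(url)  # record the failure and continue
    return results, failed
```

A redirect loop on one image then costs you that one file, not the whole dump.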
08:18 🔗 Nemo_bis odie5533_: I know. Some days ago you asked me to provide URLs where the problems happen, here you go. :)
08:18 🔗 Nemo_bis They are in the bug report, just add some dots to the domain names in dirs
08:18 🔗 odie5533_ eh... can you give me a link?
08:19 🔗 odie5533_ I'm really not sure what you mean since that won't give me a url to one of the images which seem to be having the problem
08:20 🔗 odie5533_ None of the domains seem to exist. e.g. http://amzwikkii.com/
08:21 🔗 odie5533_ Nemo_bis: let's talk in #wikiteam
08:44 🔗 godane Question about vimeo
08:45 🔗 godane i'm looking at vimeo webstock archive and you can't download the original video file even though there is a link
08:47 🔗 godane what's funny is if you hover over the link you get this message: "This source file has been lovingly stored in our archive."
08:47 🔗 godane but you can freaking download it
08:47 🔗 godane *can't
08:57 🔗 godane i also found out that more videos from webstock 2011 were released more recently
08:57 🔗 Nemo_bis maybe they moved them all to tapes in order to save money :D
09:02 🔗 godane maybe but its very weird
09:03 🔗 godane cause some videos in that area still have the original links working just fine
09:04 🔗 godane anyways d-addicts.com wiki dump is done downloading
09:04 🔗 godane making a 7z file of it
09:12 🔗 Nemo_bis still uploading at 50 KB/s average to s3...
09:16 🔗 odie5533_ That seems really low
10:36 🔗 SketchCow I'm really pounding s3.
11:45 🔗 Nemo_bis SketchCow: and derivers too now? :)
11:46 🔗 Nemo_bis 2,391,921,500 KB so far, I hope we'll have some slice of s3 available for us too soon :P
13:09 🔗 joepie92 AAAAAAND WE'RE OFF: http://tracker.archiveteam.org/hyves/
13:49 🔗 SketchCow Is FOS getting this fun?
14:16 🔗 SmileyG SketchCow: I think so
14:16 🔗 SmileyG S[h]O[r]T has offered space too
15:02 🔗 joepie92 SketchCow: I'm not sure if FOS has been added as a target yet
15:02 🔗 joepie92 the initial target was icefortress, awaiting FOS space to be freed
15:12 🔗 nico_ GLaDOS: VERSION = "20131116.02_" + subprocess.check_output(["git","rev-parse", "HEAD"]).strip()[:7]
15:12 🔗 GLaDOS nico_: I saw it (i'm also twrist)
15:12 🔗 nico_ (need a import subprocess before)
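Putting nico_'s two messages together, a complete version might look like this; his 2013 snippet targeted Python 2, where `check_output` returned `str`, so on Python 3 a `.decode()` is needed before slicing. The `rev_cmd` parameter is only an illustrative testing hook, not part of the original line:

```python
# Hedged sketch of nico_'s VERSION line with the "import subprocess"
# he says is needed. On Python 3 check_output returns bytes, hence
# the .decode(). rev_cmd is a made-up hook so this can be tested
# outside a git checkout.
import subprocess

def make_version(date_tag="20131116.02_",
                 rev_cmd=("git", "rev-parse", "HEAD")):
    # first 7 characters of the current commit hash, as in the original
    commit = subprocess.check_output(list(rev_cmd)).decode().strip()[:7]
    return date_tag + commit
```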
18:14 🔗 w0rp http://bpaste.net/show/gWTMl6R6j3bdSgAuFHZj/ I used this for ripping a table from Wikipedia as CSV. Maybe someone else here might find this useful.
18:15 🔗 w0rp It doesn't cope with there being two tables matching the selector on the table, but a minor set of modifications could make it do that.
18:16 🔗 w0rp *matching the selector on the page
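w0rp's paste has since expired, so here is a stdlib-only sketch of the same idea: pull the first table matching a class selector out of an HTML page and emit it as CSV. The class name `wikitable` and the whole parser are assumptions about what his script did, not a reconstruction of it, and like his version it only handles the first matching table:

```python
# Hedged, stdlib-only sketch of w0rp's table-to-CSV ripper (his
# bpaste link is gone). Grabs only the first table whose class
# contains "wikitable"; the selector and structure are assumptions.
import csv
import io
from html.parser import HTMLParser

class TableToCSV(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_table = False
        self.in_cell = False
        self.done = False
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        classes = dict(attrs).get("class", "")
        if tag == "table" and "wikitable" in classes:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])
        elif self.in_table and tag in ("td", "th"):
            self.in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "table" and self.in_table:
            self.in_table = False
            self.done = True  # stop after the first matching table
        elif tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data.strip()

def table_to_csv(html):
    parser = TableToCSV()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()
```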
18:55 🔗 godane uploaded: https://archive.org/details/wikid_addictscom_static-20131115-wikidump
22:10 🔗 odie5533_ hmpf, people disabling the ability to submit Issues to stuff on github...
22:23 🔗 odie5533_ Does anyone know what a CDX warcinfo/request entry is supposed to look like when the filename has a space in it?
22:27 🔗 odie5533 The Python program CDX-Writer formats the massaged url as 'warcinfo:/output file.warc.gz/version'. Should there really be a space in the middle of an otherwise space-separated file? None of the test cases for CDX-Writer have spaces in the file names, so perhaps it was overlooked.
22:31 🔗 odie5533 And the author disabled the Issues section on Github and doesn't list an email, so I can't even contact them.
22:37 🔗 xmc you can usually get an email by cloning the repo and looking at their commits
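The problem odie5533 describes can be shown with a toy line; note the fields below are invented for the demo and are not CDX-Writer's actual column layout:

```python
# Toy illustration of odie5533's point: CDX lines are space-delimited,
# so a space inside the massaged warcinfo URL shifts every later field.
# The columns here are made up; they are NOT the real CDX-Writer output.
def cdx_fields(line):
    return line.split(" ")

without_space = "warcinfo:/output.warc.gz/1.0 20131116000000 warcinfo 500"
with_space = "warcinfo:/output file.warc.gz/1.0 20131116000000 warcinfo 500"
```

The embedded space produces one phantom extra field, which is why a naive space-split of the CDX breaks.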
22:43 🔗 SketchCow Archive Team Bot GO! The Third: https://archive.org/details/archiveteam_archivebot_go_003
22:46 🔗 ivan` SketchCow: did the plugins.jetbrains.com warc get nuked or are you specially handling it?
22:49 🔗 ivan` it's from around Oct 23
22:51 🔗 ivan` it ran into the 40GB limit so maybe something went wrong with the rsync
