00:01 <balrog> nico_: I fed those all to archivebot already
00:01 <balrog> (pretty much right when the news broke)
00:01 <balrog> if you want to do it anyway, feel free
00:02 <odie5533_> What news?
00:02 <nico_> he's dead
00:03 <odie5533_> oh
00:03 <nico_> (French) https://linuxfr.org/news/deces-de-cedric-blancher-chercheur-en-securite-informatique
00:04 <nico_> (English) http://www.theregister.co.uk/2013/11/12/cdric_sid_blancher_dead_at_37/
00:04 <nico_> if you want more, we need to go to -bs
05:46 <godane> SketchCow: I found the Webstock 2013 videos were released
05:47 <godane> going to make a collection for them
07:48 <Nemo_bis> 13.15 <@Nemo_bis> where is odie5533 when one needs him :) https://code.google.com/p/wikiteam/issues/detail?id=78
07:49 <Nemo_bis> uh, he's not here, stupid page up :)
07:52 <Turnip> Page-up has been broken on my laptop for a while; irssi has been stressful
08:12 <odie5533_> Nemo_bis: hmm?
08:13 <odie5533_> I didn't write the wikiteam scripts.
08:14 <odie5533_> Nemo_bis: I think the wikiteam script works by using urllib to get a list of URLs, then wget for the actual download. If I were writing it, I'd write the URL-grabbing part as a Scrapy project that outputs URLs.
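A minimal sketch of the two-phase design odie5533_ describes: collect the URL list with urllib, then hand each URL to wget for the actual download. The index format and all helper names here are hypothetical, not taken from the actual wikiteam scripts.

```python
import subprocess
import urllib.request

def parse_url_list(text):
    """Parse an index page listing one URL per line; ignore blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def fetch_url_list(index_url):
    """Phase 1: use urllib to fetch the page listing the URLs to grab."""
    with urllib.request.urlopen(index_url) as resp:
        return parse_url_list(resp.read().decode("utf-8", errors="replace"))

def download_all(urls, dest_dir="."):
    """Phase 2: let wget do the heavy lifting (retries, resume, rate limits)."""
    for url in urls:
        subprocess.run(["wget", "-P", dest_dir, url], check=False)
```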
08:15 <odie5533_> Nemo_bis: well, it's hitting a redirect loop. Can you give me a URL that this occurs on?
08:16 <odie5533_> And in any case, the grabber should catch the HTTPError and just continue grabbing the other URLs.
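The catch-and-continue behavior odie5533_ asks for might look like the following; the function name and logging are illustrative. Note that urllib surfaces an exceeded-redirect loop as an HTTPError, so this covers that case too.

```python
import urllib.error
import urllib.request

def grab_all(urls):
    """Fetch each URL; skip failures instead of aborting the whole run."""
    results = {}
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                results[url] = resp.read()
        except urllib.error.HTTPError as err:
            # 4xx/5xx responses and redirect loops land here:
            # log and keep going with the remaining URLs
            print("skipping %s: HTTP %d" % (url, err.code))
        except urllib.error.URLError as err:
            # DNS failures, refused connections, etc.
            print("skipping %s: %s" % (url, err.reason))
    return results
```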
08:18 <Nemo_bis> odie5533_: I know. Some days ago you asked me to provide URLs where the problems happen; here you go. :)
08:18 <Nemo_bis> They are in the bug report, just add some dots to the domain names in the dirs
08:18 <odie5533_> eh... can you give me a link?
08:19 <odie5533_> I'm really not sure what you mean, since that won't give me a URL to one of the images which seem to be having the problem
08:20 <odie5533_> None of the domains seem to exist. e.g. http://amzwikkii.com/
08:21 <odie5533_> Nemo_bis: let's talk in #wikiteam
08:44 <godane> Question about Vimeo
08:45 <godane> I'm looking at the Vimeo Webstock archive, and you can't download the original video file even though there is a link
08:47 <godane> what's funny is if you hover over the link you get this message: "This source file has been lovingly stored in our archive."
08:47 <godane> but you freaking can't download it
08:57 <godane> i also found out that more videos from Webstock 2011 were released more recently
08:57 <Nemo_bis> maybe they moved them all to tapes in order to save money :D
09:02 <godane> maybe, but it's very weird
09:03 <godane> cause some videos in that area still have the original links working just fine
09:04 <godane> anyways, the d-addicts.com wiki dump is done downloading
09:04 <godane> making a 7z file of it
09:12 <Nemo_bis> still uploading at 50 KB/s average to s3...
09:16 <odie5533_> That seems really low
10:36 <SketchCow> I'm really pounding s3.
11:45 <Nemo_bis> SketchCow: and derivers too now? :)
11:46 <Nemo_bis> 2,391,921,500 KB so far, I hope we'll have some slice of s3 available for us too soon :P
13:09 <joepie92> AAAAAAND WE'RE OFF: http://tracker.archiveteam.org/hyves/
13:49 <SketchCow> Is FOS getting this fun?
14:16 <SmileyG> SketchCow: I think so
14:16 <SmileyG> S[h]O[r]T has offered space too
15:02 <joepie92> SketchCow: I'm not sure if FOS has been added as a target yet
15:02 <joepie92> the initial target was icefortress, awaiting FOS space to be freed
15:12 <nico_> GLaDOS: VERSION = "20131116.02_" + subprocess.check_output(["git", "rev-parse", "HEAD"]).strip()[:7]
15:12 <GLaDOS> nico_: I saw it (I'm also twrist)
15:12 <nico_> (needs an import subprocess before it)
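Putting nico_'s two lines together, a self-contained version of the snippet might look like this. It is wrapped in a function so the hash formatting is testable without a git checkout; the decode is needed on Python 3, where check_output returns bytes.

```python
import subprocess

def make_version(base="20131116.02_", rev=None):
    """Build a version string: base prefix plus the 7-char short git hash.

    If rev is None, ask git for HEAD (requires running inside a git
    checkout); passing rev explicitly makes the formatting testable.
    """
    if rev is None:
        rev = subprocess.check_output(
            ["git", "rev-parse", "HEAD"]
        ).decode("ascii")
    return base + rev.strip()[:7]
```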
18:14 <w0rp> http://bpaste.net/show/gWTMl6R6j3bdSgAuFHZj/ I used this for ripping a table from Wikipedia as CSV. Maybe someone else here might find it useful.
18:15 <w0rp> It doesn't cope with there being two tables matching the selector on the page, but a minor set of modifications could make it do that.
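The paste has since expired, so here is a from-scratch stdlib sketch of the idea w0rp describes: pull the first <table> out of an HTML page and emit it as CSV. All names are illustrative, and a real version would take a CSS selector and handle colspans.

```python
import csv
import io
from html.parser import HTMLParser

class FirstTableParser(HTMLParser):
    """Collect the cell text of the first <table> in the document."""

    def __init__(self):
        super().__init__()
        self.rows, self.row = [], None
        self.in_cell = False
        self.done = False  # stop after the first table closes

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "tr":
            self.row = []
        elif tag in ("td", "th"):
            self.in_cell = True
            self.row.append("")

    def handle_data(self, data):
        if self.in_cell and not self.done:
            self.row[-1] += data.strip()

    def handle_endtag(self, tag):
        if self.done:
            return
        if tag in ("td", "th"):
            self.in_cell = False
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None
        elif tag == "table":
            self.done = True

def table_to_csv(html):
    """Return the first HTML table as a CSV string."""
    parser = FirstTableParser()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()
```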
18:55 <godane> uploaded: https://archive.org/details/wikid_addictscom_static-20131115-wikidump
22:10 <odie5533_> hmpf, people disabling the ability to submit Issues to stuff on GitHub...
22:23 <odie5533_> Does anyone know what a CDX warcinfo/request entry is supposed to look like when the filename has a space in it?
22:27 <odie5533> The Python program CDX-Writer formats the massaged URL as 'warcinfo:/output file.warc.gz/version'. Should there really be a space in the middle of an otherwise space-separated file? None of the test cases for CDX-Writer have spaces in the file names, so perhaps it was overlooked.
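To see why the space matters: a CDX index line is a single space-delimited record, so an unescaped space inside one field shifts every later column. A toy demonstration of the problem odie5533 describes, with the field layout abbreviated and hypothetical:

```python
# One CDX record per line, fields separated by single spaces.
line_ok = "warcinfo:/output.warc.gz/version 20131115000000 warcinfo ..."
line_bad = "warcinfo:/output file.warc.gz/version 20131115000000 warcinfo ..."

# Field 1 should be the timestamp...
print(line_ok.split(" ")[1])   # prints "20131115000000"
# ...but the embedded space pushes the rest of the filename into it.
print(line_bad.split(" ")[1])  # prints "file.warc.gz/version"
```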
22:31 <odie5533> And the author disabled the Issues section on GitHub and doesn't list an email, so I can't even contact them.
22:37 <xmc> you can usually get an email by cloning the repo and looking at their commits
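xmc's tip can be scripted. A sketch assuming repo_path is an already-cloned local checkout; git's -C flag and the %ae (author email) log format are real, while the helper names here are made up:

```python
import subprocess

def unique_emails(log_output):
    """Deduplicate and sort the one-email-per-line output of git log."""
    return sorted({line for line in log_output.splitlines() if line.strip()})

def author_emails(repo_path):
    """List the distinct author emails in a local git checkout's history."""
    out = subprocess.check_output(
        ["git", "-C", repo_path, "log", "--format=%ae"]
    ).decode("utf-8", errors="replace")
    return unique_emails(out)
```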
22:43 <SketchCow> Archive Team Bot GO! The Third: https://archive.org/details/archiveteam_archivebot_go_003
22:46 <ivan`> SketchCow: did the plugins.jetbrains.com warc get nuked, or are you specially handling it?
22:49 <ivan`> it's from around Oct 23
22:51 <ivan`> it ran into the 40GB limit, so maybe something went wrong with the rsync