Time |
Nickname |
Message |
10:32
🔗
|
Nemo_bis |
So much hate for bots: 10,580,498,315 Bad Bots Blocked http://www.distilnetworks.com/ |
10:55
🔗
|
midas |
Ia_archiver is blocked |
10:56
🔗
|
midas |
or, could be blocked |
10:56
🔗
|
midas |
but it isn't clear what they do and don't block |
12:57
🔗
|
balrog |
midas: probably is |
12:57
🔗
|
balrog |
Ia_archiver |
12:57
🔗
|
balrog |
Ia_archiver is a web crawler for Alexa, an analytics and web information company. |
12:57
🔗
|
balrog |
they consider it an alexa analytics crawler which it isn't |
13:31
🔗
|
Nemo_bis |
they might be a decade or two out of date |
13:42
🔗
|
midas |
not too shabby |
14:06
🔗
|
balrog |
https://alexa.zendesk.com/hc/en-us/articles/200450194-Alexa-s-Web-and-Site-Audit-Crawlers |
14:06
🔗
|
balrog |
that says nothing about IA |
14:31
🔗
|
DFJustin |
http://archivebot.at.ninjawedding.org:4567/#/histories/http://cryptome.org/ |
14:31
🔗
|
DFJustin |
er |
14:38
🔗
|
Nemo_bis |
300 GB almost? :o http://archivebot.at.ninjawedding.org:4567/#/histories/http://wdl2.winworldpc.com/ |
14:39
🔗
|
DFJustin |
yeah I guess it won't show up on there until the job finishes properly |
14:39
🔗
|
DFJustin |
which will need manual intervention from SketchCow |
14:41
🔗
|
Nemo_bis |
lol what sense does this make http://archivebot.at.ninjawedding.org:4567/#/histories/https://wiki.archlinux.org/ |
14:42
🔗
|
Nemo_bis |
that wiki is hyper-easy to archive and has almost no custom extensions or anything |
14:45
🔗
|
ivan` |
does the hyper-easy archiving thing put pages into wayback? |
14:46
🔗
|
ivan` |
we've been grabbing a lot of wikis with archivebot just for that |
14:50
🔗
|
Nemo_bis |
Sure, that's a benefit |
14:51
🔗
|
Nemo_bis |
But what's the benefit of sending the poor archivebot in Special:WhatLinksHere rabbit holes :) |
14:52
🔗
|
* |
ats imagines aggregating all the grabbed wikis into an enormous meta-wiki, so you can follow links between them easily... |
14:52
🔗
|
Nemo_bis |
ats, that was the point of the interwikis when they were invented :) |
14:52
🔗
|
Nemo_bis |
OTOH in this case they can't blame us, there isn't even a robots.txt AFAICS https://wiki.archlinux.org/robots.txt |
14:53
🔗
|
ats |
yeah, but that requires the authors to be aware of stuff on other wikis... |
14:53
🔗
|
Nemo_bis |
http://meatballwiki.org/wiki/InterWiki |
14:54
🔗
|
Nemo_bis |
Well, not according to some, ats: "InterWikiSearch deals with the obvious problem of not knowing what is where." |
14:54
🔗
|
Nemo_bis |
However that had only been implemented for the c2.com wikis and perhaps communitywiki IIRC |
15:23
🔗
|
balrog |
why not make a bot like archivebot that instead uses DumpGenerator for wikis? |
15:24
🔗
|
exmic |
hmmmm |
15:25
🔗
|
exmic |
yes |
15:26
🔗
|
DFJustin |
I would like that, although I checked the wikis on my to-archive list the other day and wikiteam already hit them; I think their coverage is pretty high at this point |
15:36
🔗
|
Nemo_bis |
Current plan is to buy a list of MediaWikis from one of those web-crawlers |
15:37
🔗
|
Nemo_bis |
But mostly we need to make our code more modern to catch all those wikis which we fail to download for silly urllib2 errors and the like |
15:37
🔗
|
Nemo_bis |
When we've done that we could launch it all over Wikia with Warrior, haha |
15:49
🔗
|
balrog |
Nemo_bis: I thought I got one of those bugs fixed |
15:49
🔗
|
balrog |
because urllib2 wasn't being used right :P |
15:55
🔗
|
Nemo_bis |
true but there's so many |
15:56
🔗
|
balrog |
report them all |
15:56
🔗
|
Nemo_bis |
they're surely all features, not bugs |
15:56
🔗
|
Nemo_bis |
But still they kill download of about 2400 wikis we have, I expect |
15:57
🔗
|
balrog |
they're probably bugs in the downloader :/ |
15:57
🔗
|
balrog |
2400? |
15:57
🔗
|
balrog |
yeah, that certainly has to be looked at |
18:29
🔗
|
wp494 |
RIP appurify |
18:29
🔗
|
wp494 |
just announced on google i/o |
18:33
🔗
|
SketchCow |
Grabbing |
18:40
🔗
|
wp494 |
looks like twitch lives on for another while |
18:58
🔗
|
Jonimus |
If twitch dies, livestream or similar will hopefully take their place, though all that lost archived video will be sad |
19:14
🔗
|
APerti |
Twitch has zillions of dollars from M$, no? |
19:15
🔗
|
midas |
Jonimus: and all we could do is dance around the burning rubble |
19:18
🔗
|
db48x |
nah, we could archive twitch |
19:18
🔗
|
db48x |
we should be archiving twitch |
19:24
🔗
|
midas |
all of twitch might be a tad harder, but sure |
19:24
🔗
|
midas |
go ahead |