Time |
Nickname |
Message |
02:40
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
02:41
🔗
|
|
dashcloud has joined #urlteam |
03:26
🔗
|
|
Aerochron has joined #urlteam |
03:26
🔗
|
Aerochron |
Hi. Can anyone help explain what exactly this project is? |
03:26
🔗
|
Aerochron |
I mean, I get the point of it but I am not sure what the warrior is actually doing when it runs. |
03:39
🔗
|
Somebody2 |
Aerochron: what the warrior does is process "items" sent out by a central tracker server. Each item consists of a small range of URLs to try. |
03:40
🔗
|
Somebody2 |
The warrior tries each one, and (for the ones that redirect) extracts the target URL. When it tries all the ones in the item, |
03:40
🔗
|
Aerochron |
So the VM is more or less wardialing as many shortened URLs as possible? |
03:40
🔗
|
Somebody2 |
it reports back to the tracker with the results, and gets another item to work on. |
03:40
🔗
|
Somebody2 |
Aerochron: yep! |
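A minimal sketch of the item loop described above, not the warrior's actual code. It assumes an item is simply a list of shortcodes; `process_item`, `Resolver`, and the `fake_shortener` table are hypothetical names chosen for illustration, and the resolver is passed in so the example runs without touching a real shortener.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical type: a resolver returns the redirect target for a
# shortcode, or None if the shortcode does not redirect.
Resolver = Callable[[str], Optional[str]]


def process_item(shortcodes: Iterable[str], resolve: Resolver) -> Dict[str, str]:
    """Try every shortcode in one item and collect the ones that redirect."""
    results = {}
    for code in shortcodes:
        target = resolve(code)
        if target is not None:
            results[code] = target
    return results


# Stand-in for a shortening service (assumption, for illustration only).
fake_shortener = {"aaa": "http://example.com/1", "aac": "http://example.com/2"}
found = process_item(["aaa", "aab", "aac"], fake_shortener.get)
print(found)
```

In the real workflow, the collected results would be reported back to the tracker before the next item is requested.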
03:41
🔗
|
Somebody2 |
By default, we work in order -- but for some shortening services, we have custom code that works in a pseudo-random order. |
03:41
🔗
|
Somebody2 |
(But I don't know much about the details of that) |
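The log does not say how the pseudo-random order is produced, so the following is only one generic way to visit every index in a range exactly once in a scrambled order. It is not taken from the project's code; `scrambled_order`, `multiplier`, and `offset` are hypothetical names.

```python
from math import gcd
from typing import Iterator


def scrambled_order(n: int, multiplier: int, offset: int = 0) -> Iterator[int]:
    """Yield every index in range(n) exactly once, in a scrambled order.

    The map i -> (multiplier * i + offset) % n is a bijection on range(n)
    whenever multiplier and n share no common factor.
    """
    if gcd(multiplier, n) != 1:
        raise ValueError("multiplier must be coprime with n")
    for i in range(n):
        yield (multiplier * i + offset) % n


order = list(scrambled_order(10, multiplier=7, offset=3))
print(order)
```

An affine map like this needs no stored state beyond the current index, which is why it is a common choice when a full shuffle of a very large range would not fit in memory.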
03:42
🔗
|
Somebody2 |
If a warrior gets an unexpected response, it reports that back to the tracker -- and if a particular item consistently fails, |
03:42
🔗
|
astrid |
it tries to stay below the ratelimit at which the shortener will ban your ip |
03:42
🔗
|
Somebody2 |
eventually the tracker will alert, and demand manual intervention before it will continue handing out items. |
03:43
🔗
|
Somebody2 |
yes, we have a number of different knobs to adjust to stay below ratelimits |
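The specific knobs are not listed in the log. As a rough illustration of the simplest possible one, here is a sketch of a minimum-delay throttle. The `Throttle` class and its parameters are hypothetical; the clock and sleep functions are injectable only so the behaviour can be checked without actually waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
            if delay > 0:
                self._sleep(delay)
        # Record when this request is considered to have been made.
        self._last = now + delay
        return delay
```

Calling `wait()` before each request keeps the request rate at or below one per `min_interval` seconds; the right interval for a given shortener is something only experience with that service can tell.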
03:44
🔗
|
Somebody2 |
the tracker batches up the results, and approximately once a day or so, it uploads them (in a semi-silly format) to archive.org |
03:44
🔗
|
Aerochron |
Ah. Also, what does PAUSED mean when it is next to a project? Is the project unavailable for a time? |
03:44
🔗
|
Somebody2 |
where they can be downloaded, or torrented (seeding is very welcome!) |
03:44
🔗
|
Somebody2 |
PAUSED means that we had previously scraped that shortening service, but we aren't doing so right now. |
03:45
🔗
|
Somebody2 |
Generally either because the service has died, or we grabbed all of it, or they blocked us, or something like that. |
03:45
🔗
|
Somebody2 |
I assume you've seen the wiki page: http://archiveteam.org/index.php?title=URLTeam |
03:46
🔗
|
Somebody2 |
(help organizing it better is also very welcome) |
08:06
🔗
|
|
Jonison has joined #urlteam |
09:02
🔗
|
|
Jonison has quit IRC (Read error: Connection reset by peer) |
09:39
🔗
|
|
Jonison has joined #urlteam |
11:32
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
11:39
🔗
|
|
dashcloud has joined #urlteam |
11:51
🔗
|
|
mls has quit IRC (Ping timeout: 250 seconds) |
12:03
🔗
|
|
mls has joined #urlteam |
13:01
🔗
|
|
Jonison has quit IRC (Ping timeout: 260 seconds) |
13:09
🔗
|
|
mls has quit IRC (Ping timeout: 250 seconds) |
13:30
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
13:44
🔗
|
|
mls has joined #urlteam |
13:50
🔗
|
|
dashcloud has joined #urlteam |
17:50
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
17:50
🔗
|
|
dashcloud has joined #urlteam |
18:46
🔗
|
|
ix has quit IRC (Quit: oh) |
19:27
🔗
|
JAA |
It looks like X.vu has started redirecting everything to HTTPS. |
19:28
🔗
|
hook54321 |
And to the shutdown notice page |
19:28
🔗
|
hook54321 |
I'm assuming the effort they're referring to is us :P |
19:29
🔗
|
JAA |
I've paused x-vu for now. |
19:34
🔗
|
JAA |
So currently http://x.vu/<shortcode> redirects to https://x.vu/<shortcode> unconditionally, and https://x.vu/<shortcode> always responds with a 301, either to /shutdown/ if the code is invalid or to /shutdown/?url=<target> if the code exists. |
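Given the behaviour described above, the target could in principle be read out of the `Location` header of the 301. This sketch assumes the `url` parameter is percent-encoded like an ordinary query string, which the log does not confirm; `target_from_shutdown_redirect` is a hypothetical helper, not part of terroroftinytown.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def target_from_shutdown_redirect(location: str) -> Optional[str]:
    """Return the target URL carried in a /shutdown/?url=... redirect.

    Returns None when the Location has no `url` parameter, which per the
    description above means the shortcode is invalid.
    """
    parts = urlsplit(location)
    if parts.path != "/shutdown/":
        return None
    values = parse_qs(parts.query).get("url")
    return values[0] if values else None


valid = target_from_shutdown_redirect("/shutdown/?url=http%3A%2F%2Fexample.com%2Fpage")
invalid = target_from_shutdown_redirect("/shutdown/")
print(valid, invalid)
```

If the service turns out to append the target without encoding it, a target containing `&` would be cut short by `parse_qs`, and the raw query string would need to be sliced after `url=` instead.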
19:35
🔗
|
JAA |
I don't know how to adjust the settings to account for that, so I'll let Somebody2 handle that. |
19:35
🔗
|
JAA |
We'll have to redo tons of codes, too. |
19:36
🔗
|
JAA |
hook54321: No, I think they were referring to our archival efforts in general. |
19:38
🔗
|
JAA |
By the way, they also told me that they're trying "to keep the redirects working for at least a couple more years". |
19:39
🔗
|
JAA |
(On the shutdown notice, they just say "as long as possible".) |
19:57
🔗
|
|
ix has joined #urlteam |
20:00
🔗
|
JAA |
Somebody2: Regarding vgd_6: v.gd does some weird user agent magic. When you access it with a browser, you get an interstitial page "The link you followed has been shortened with v.gd. blahblahblah". If you use Curl, you get a 30x. It looks like they detect our user agent as a browser and return the interstitial page. |
20:01
🔗
|
JAA |
It looks like the user agent is hardcoded in terroroftinytown-client-grab, so I guess we'll have to extract the URL from the HTML instead. |
20:03
🔗
|
JAA |
You can also get around it by setting a cookie preview=1, but that doesn't seem to be supported by terroroftinytown either. |
20:04
🔗
|
JAA |
E.g. curl -vA 'ArchiveTeam Warrior/0.9.2' -b 'preview=1' https://v.gd/FFF54S |
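For comparison, a rough Python equivalent of the headers set by the curl command above, using only the standard library. This is a sketch and not how terroroftinytown builds its requests; `build_preview_request` is a hypothetical helper, and the request is only constructed here, not sent.

```python
import urllib.request


def build_preview_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a request that sends the preview=1 cookie with the given user agent."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": user_agent, "Cookie": "preview=1"},
    )


req = build_preview_request("https://v.gd/FFF54S", "ArchiveTeam Warrior/0.9.2")
print(req.header_items())
```

Note that `urllib.request.urlopen` follows redirects by default, so a scraper that wants the 30x itself would need a handler that stops at the first response.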
22:00
🔗
|
|
liam has quit IRC (Read error: Operation timed out) |
22:00
🔗
|
|
JAA has quit IRC (Read error: Operation timed out) |
22:00
🔗
|
|
Aerochrom has joined #urlteam |
22:01
🔗
|
|
rocode has quit IRC (Read error: Operation timed out) |
22:01
🔗
|
|
bobazY has quit IRC (Read error: Operation timed out) |
22:01
🔗
|
|
bobazY has joined #urlteam |
22:02
🔗
|
|
JAA has joined #urlteam |
22:02
🔗
|
|
liam has joined #urlteam |
22:02
🔗
|
|
svchfoo3 sets mode: +o JAA |
22:02
🔗
|
|
svchfoo1 sets mode: +o JAA |
22:04
🔗
|
|
Aerochron has quit IRC (Read error: Operation timed out) |
22:06
🔗
|
|
rocode has joined #urlteam |
22:15
🔗
|
|
wabu has quit IRC (Ping timeout: 246 seconds) |
22:26
🔗
|
|
wabu has joined #urlteam |
22:41
🔗
|
|
svchfoo1 has quit IRC (Remote host closed the connection) |
22:42
🔗
|
|
svchfoo1 has joined #urlteam |
22:43
🔗
|
|
svchfoo3 sets mode: +o svchfoo1 |