#urlteam 2017-09-18,Mon


***dashcloud has quit IRC (Read error: Operation timed out)
dashcloud has joined #urlteam
[02:40]
.......... (idle for 45mn)
Aerochron has joined #urlteam [03:26]
<Aerochron> Hi. Can anyone help explain what exactly this project is?
I mean, I get the point of it but I am not sure what the warrior is actually doing when it runs.
[03:26]
<Somebody2> Aerochron: what the warrior does is process "items" sent out by a central tracker server. Each item consists of a small range of URLs to try.
The warrior tries each one, and (for the ones that redirect) extracts the target URL. When it tries all the ones in the item,
[03:39]
<Aerochron> So the VM is more or less wardialing as many shortened URLs as possible? [03:40]
<Somebody2> it reports back to the tracker with the results, and gets another item to work on.
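The item loop described above can be sketched as follows. This is a minimal illustration, assuming an item is simply a base URL plus a list of shortcodes; `process_item`, `fetch_location` and `NoRedirect` are hypothetical names, and the real warrior client (terroroftinytown) also handles rate limits, error reporting and per-service quirks that are not shown here.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, Optional


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Make urllib hand back 30x responses instead of following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # returning None makes urllib raise HTTPError for the 30x


def fetch_location(url: str, timeout: float = 10.0) -> Optional[str]:
    """Request one short URL and return its redirect target, if any."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url, timeout=timeout):
            return None  # 2xx: not a redirect
    except urllib.error.HTTPError as err:
        if 300 <= err.code < 400:
            return err.headers.get("Location")
        if err.code == 404:
            return None  # assumption: 404 means the code is unused
        raise  # anything else is unexpected and would be reported to the tracker


def process_item(base_url: str, codes: Iterable[str],
                 fetch: Callable[[str], Optional[str]] = fetch_location) -> Dict[str, str]:
    """Try every shortcode in one item and keep only the ones that redirect."""
    results: Dict[str, str] = {}
    for code in codes:
        target = fetch(base_url + code)
        if target is not None:
            results[code] = target
    return results
```

The `fetch` parameter is injectable so the loop can be exercised without touching a real shortener, e.g. `process_item("http://x.test/", ["a", "b"], fetch={"http://x.test/a": "http://example.com/1"}.get)`.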
Aerochron: yep!
By default, we work in order -- but for some shortening services, we have custom code that works in a pseudo-random order.
(But I don't know much about the details of that)
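Since the log does not say how the pseudo-random order is produced, the following is only an illustration of the difference between the two orders, not terroroftinytown's method. It assumes a 62-character alphabet and swaps in a simple technique: multiplying the index by a number coprime to the size of the code space, which permutes the codes while still visiting each one exactly once.

```python
import math
from typing import List

# Assumed alphabet; real shorteners differ (case sensitivity, allowed characters).
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def index_to_code(n: int, alphabet: str = ALPHABET) -> str:
    """Write a non-negative integer in base len(alphabet) using alphabet's digits."""
    base = len(alphabet)
    digits = [alphabet[n % base]]
    n //= base
    while n:
        n, rem = divmod(n, base)
        digits.append(alphabet[rem])
    return "".join(reversed(digits))


def sequential_item(start: int, count: int) -> List[str]:
    """In-order item: consecutive indices map to consecutive codes."""
    return [index_to_code(i) for i in range(start, start + count)]


def scrambled_item(start: int, count: int, length: int, multiplier: int) -> List[str]:
    """Pseudo-random but repeatable order over all codes of a fixed length.

    i -> (i * multiplier) % space is a bijection on range(space) whenever
    gcd(multiplier, space) == 1, so no code is skipped or repeated.
    """
    space = len(ALPHABET) ** length
    if math.gcd(multiplier, space) != 1:
        raise ValueError("multiplier must be coprime with the code space")
    return [index_to_code((i * multiplier) % space).rjust(length, ALPHABET[0])
            for i in range(start, start + count)]
```

For example, `sequential_item(60, 3)` yields `["Y", "Z", "10"]`, while `scrambled_item` spreads consecutive item indices across the whole space of fixed-length codes.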
If a warrior gets an unexpected response, it reports that back to the tracker -- and if a particular item consistently fails,
[03:40]
<astrid> it tries to stay below the rate limit at which the shortener will ban your IP [03:42]
<Somebody2> eventually the tracker will alert, and demand manual intervention before it will continue handing out items.
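One way to picture the "consistently fails, then alert and wait for a human" behaviour is a per-item failure counter. This is a hypothetical sketch: the log does not give the tracker's actual threshold or data structures, so `FailureTracker` and `max_failures` are invented for illustration.

```python
from collections import Counter


class FailureTracker:
    """Tracker-side bookkeeping: count failures per item, stop handing out work on repeats."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures  # hypothetical threshold
        self.failures: Counter = Counter()
        self.paused = False

    def report_failure(self, item_id: str) -> None:
        self.failures[item_id] += 1
        if self.failures[item_id] >= self.max_failures:
            self.paused = True  # alert and wait for a human

    def report_success(self, item_id: str) -> None:
        self.failures.pop(item_id, None)  # a success clears that item's history

    def can_hand_out(self) -> bool:
        return not self.paused

    def resume(self) -> None:
        """The manual intervention: clear the counters and continue."""
        self.failures.clear()
        self.paused = False
```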
yes, we have a number of different knobs to adjust to stay below rate limits
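The simplest such knob is a minimum delay between requests to the same shortener. The sketch below assumes that model; the tracker's real knobs are not listed in the log, and `MinIntervalLimiter` and `interval` are hypothetical names. The clock and sleep functions are injectable so the behaviour can be checked without actually waiting.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Allow at most one request every `interval` seconds."""

    def __init__(self, interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval  # hypothetical knob, in seconds
        self.clock = clock
        self.sleep = sleep
        self._next: Optional[float] = None  # earliest start time for the next request

    def wait(self) -> None:
        """Block until the next request is allowed, then reserve the following slot."""
        now = self.clock()
        if self._next is not None and now < self._next:
            self.sleep(self._next - now)
            now = self._next
        self._next = now + self.interval
```

Calling `limiter.wait()` before every request keeps the request rate at or below `1 / interval` per second; a real client would tune `interval` per service.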
the tracker batches up the results, and approximately once a day, it uploads them (in a semi-silly format) to archive.org
[03:42]
<Aerochron> Ah. Also, what does PAUSED mean when it is next to a project? Is the project unavailable for a time? [03:44]
<Somebody2> where they can be downloaded or torrented (seeding is very welcome!)
PAUSED means that we had previously scraped that shortening service, but we aren't doing so right now.
Generally either because the service has died, or we grabbed all of it, or they blocked us, or something like that.
I assume you've seen the wiki page: http://archiveteam.org/index.php?title=URLTeam
(help better organizing it is also very welcome)
[03:44]
..................................................... (idle for 4h20mn)
***Jonison has joined #urlteam [08:06]
............ (idle for 56mn)
Jonison has quit IRC (Read error: Connection reset by peer) [09:02]
........ (idle for 37mn)
Jonison has joined #urlteam [09:39]
....................... (idle for 1h53mn)
dashcloud has quit IRC (Read error: Operation timed out) [11:32]
dashcloud has joined #urlteam [11:39]
mls has quit IRC (Ping timeout: 250 seconds) [11:51]
mls has joined #urlteam [12:03]
............ (idle for 58mn)
Jonison has quit IRC (Ping timeout: 260 seconds) [13:01]
mls has quit IRC (Ping timeout: 250 seconds) [13:09]
..... (idle for 21mn)
dashcloud has quit IRC (Read error: Operation timed out) [13:30]
mls has joined #urlteam [13:44]
dashcloud has joined #urlteam [13:50]
................................................. (idle for 4h0mn)
dashcloud has quit IRC (Read error: Operation timed out)
dashcloud has joined #urlteam
[17:50]
............ (idle for 56mn)
ix has quit IRC (Quit: oh) [18:46]
......... (idle for 41mn)
<JAA> It looks like X.vu has started redirecting everything to HTTPS. [19:27]
<hook54321> And to the shutdown notice page
I'm assuming the effort they're referring to is us :P
[19:28]
<JAA> I've paused x-vu for now. [19:29]
So currently http://x.vu/<shortcode> redirects to https://x.vu/<shortcode> unconditionally, and https://x.vu/<shortcode> always responds with a 301, either to /shutdown/ if the code is invalid or to /shutdown/?url=<target> if the code exists.
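Given the behaviour JAA describes, a result for x.vu could be recovered from the `Location` header of the HTTPS request. This is a minimal sketch, not the project's actual settings: `xvu_target` is a hypothetical helper, and it assumes the `url` query parameter is percent-encoded (an unencoded target containing `&` would be cut short by `parse_qs`).

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def xvu_target(location: str) -> Optional[str]:
    """Interpret the Location header returned by https://x.vu/<shortcode>.

    /shutdown/ means the code is invalid (returns None);
    /shutdown/?url=<target> means it exists (returns <target>).
    """
    parts = urlsplit(location)  # handles both absolute and relative Location values
    if parts.path.rstrip("/") != "/shutdown":
        # e.g. the http:// -> https:// hop; request https://x.vu/<shortcode> directly instead
        raise ValueError("not an x.vu shutdown redirect: " + location)
    targets = parse_qs(parts.query).get("url")
    return targets[0] if targets else None
```

Raising on anything else mirrors the warrior's habit of reporting unexpected responses rather than silently recording them.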
I don't know how to adjust the settings to account for that, so I'll let Somebody2 handle that.
We'll have to redo tons of codes, too.
hook54321: No, I think they were referring to our archival efforts in general.
By the way, they also told me that they're trying "to keep the redirects working for at least a couple more years".
(On the shutdown notice, they just say "as long as possible".)
[19:34]
.... (idle for 18mn)
***ix has joined #urlteam [19:57]
<JAA> Somebody2: Regarding vgd_6: v.gd does some weird user agent magic. When you access it with a browser, you get an interstitial page "The link you followed has been shortened with v.gd. blahblahblah". If you use curl, you get a 30x. It looks like they detect our user agent as a browser and return the interstitial page.
It looks like the user agent is hardcoded in terroroftinytown-client-grab, so I guess we'll have to extract the URL from the HTML instead.
You can also get around it by setting a cookie preview=1, but that doesn't seem to be supported by terroroftinytown either.
E.g. curl -vA 'ArchiveTeam Warrior/0.9.2' -b 'preview=1' https://v.gd/FFF54S
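For comparison, the same request as JAA's curl example can be built with Python's standard library. Since JAA notes that terroroftinytown does not support the cookie, this is only a sketch of what such a request would need to carry, using plain `urllib` rather than any terroroftinytown API; `vgd_request` is a hypothetical helper. Whether the cookie reliably yields a 30x is based solely on JAA's observation above.

```python
import urllib.request


def vgd_request(shortcode: str,
                user_agent: str = "ArchiveTeam Warrior/0.9.2") -> urllib.request.Request:
    """Build the equivalent of: curl -A '<UA>' -b 'preview=1' https://v.gd/<shortcode>"""
    return urllib.request.Request(
        "https://v.gd/" + shortcode,
        headers={
            "User-Agent": user_agent,
            "Cookie": "preview=1",  # per the log, this skips the interstitial page
        },
        method="GET",
    )
```

To read the redirect target, the request would be sent through an opener that does not follow redirects, so the 30x and its `Location` header stay visible.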
[20:00]
........................ (idle for 1h56mn)
***liam has quit IRC (Read error: Operation timed out)
JAA has quit IRC (Read error: Operation timed out)
Aerochrom has joined #urlteam
rocode has quit IRC (Read error: Operation timed out)
bobazY has quit IRC (Read error: Operation timed out)
bobazY has joined #urlteam
JAA has joined #urlteam
liam has joined #urlteam
svchfoo3 sets mode: +o JAA
svchfoo1 sets mode: +o JAA
Aerochron has quit IRC (Read error: Operation timed out)
rocode has joined #urlteam
[22:00]
wabu has quit IRC (Ping timeout: 246 seconds) [22:15]
wabu has joined #urlteam [22:26]
.... (idle for 15mn)
svchfoo1 has quit IRC (Remote host closed the connection)
svchfoo1 has joined #urlteam
svchfoo3 sets mode: +o svchfoo1
[22:41]
