#urlteam 2017-09-18,Mon


Time Nickname Message
02:40 🔗 dashcloud has quit IRC (Read error: Operation timed out)
02:41 🔗 dashcloud has joined #urlteam
03:26 🔗 Aerochron has joined #urlteam
03:26 🔗 Aerochron Hi. Can anyone help explain what exactly this project is?
03:26 🔗 Aerochron I mean, I get the point of it but I am not sure what the warrior is actually doing when it runs.
03:39 🔗 Somebody2 Aerochron: what the warrior does is process "items" sent out by a central tracker server. Each item consists of a small range of URLs to try.
03:40 🔗 Somebody2 The warrior tries each one, and (for the ones that redirect) extracts the target URL. When it tries all the ones in the item,
03:40 🔗 Aerochron So the VM is more or less wardialing as many shortened URLs as possible?
03:40 🔗 Somebody2 it reports back to the tracker with the results, and gets another item to work on.
03:40 🔗 Somebody2 Aerochron: yep!
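
The item loop Somebody2 describes can be sketched roughly as below. This is a minimal illustration, not the actual warrior code: `process_item`, the `fetch` callable, and the example hosts are all hypothetical names, and the real client's handling of status codes may differ.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical fetcher: takes a short URL and returns
# (HTTP status code, Location header or None). No redirect following.
Fetcher = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = (301, 302, 303, 307, 308)


def process_item(base_url: str, shortcodes: Iterable[str], fetch: Fetcher) -> Dict[str, str]:
    """Try every shortcode in one item and keep the ones that redirect."""
    results = {}
    for code in shortcodes:
        status, location = fetch(base_url + code)
        if status in REDIRECT_CODES and location:
            results[code] = location  # shortcode -> target URL
    return results


def fake_fetch(url: str) -> Tuple[int, Optional[str]]:
    """Stand-in for a real HTTP request so the sketch runs offline."""
    table = {"http://shortener.invalid/a": (301, "http://target.example/page")}
    return table.get(url, (404, None))


found = process_item("http://shortener.invalid/", ["a", "b"], fake_fetch)
print(found)
```

In this sketch the returned mapping is what would be reported back to the tracker before asking for the next item.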
03:41 🔗 Somebody2 By default, we work in order -- but for some shortening services, we have custom code that works in a pseudo-random order.
03:41 🔗 Somebody2 (But I don't know much about the details of that)
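
One simple way to visit a range of shortcode indexes in a pseudo-random but reproducible order is a seeded shuffle. The log does not say how the project's custom code does it, so this is only an assumed illustration; `shuffled_indexes` and the seed value are hypothetical.

```python
import random
from typing import List


def shuffled_indexes(count: int, seed: int) -> List[int]:
    """Return 0..count-1 in an order that is fixed for a given seed."""
    order = list(range(count))
    # A private Random instance keeps the order reproducible and
    # leaves the global random state untouched.
    random.Random(seed).shuffle(order)
    return order


order = shuffled_indexes(10, seed=2017)
print(order)
```

Every index is still visited exactly once; only the order changes.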
03:42 🔗 Somebody2 If a warrior gets an unexpected response, it reports that back to the tracker -- and if a particular item consistently fails,
03:42 🔗 astrid it tries to stay below the ratelimit at which the shortener will ban your ip
03:42 🔗 Somebody2 eventually the tracker will alert, and demand manual intervention before it will continue handing out items.
03:43 🔗 Somebody2 yes, we have a number of different knobs to adjust to stay below ratelimits
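
Staying below a shortener's ratelimit can be sketched as a minimum delay between requests. The class below is an assumption for illustration only: `MinIntervalLimiter` and `max_per_minute` are hypothetical names, and the real "knobs" in the tracker are not described in this log.

```python
import time


class MinIntervalLimiter:
    """Space calls so that at most max_per_minute happen per minute."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / max_per_minute
        self.clock = clock    # injectable so the limiter can be tested
        self.sleep = sleep
        self.next_allowed = None

    def wait(self):
        """Block until the next request is allowed, then reserve a slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


# Usage with a fake clock so the example finishes instantly.
fake_now = [0.0]
slept = []


def fake_sleep(seconds):
    slept.append(seconds)
    fake_now[0] += seconds


limiter = MinIntervalLimiter(60, clock=lambda: fake_now[0], sleep=fake_sleep)
for _ in range(3):
    limiter.wait()  # a request would be sent here
print(slept)
```

With the default `clock` and `sleep`, calling `wait()` before each request is enough to enforce the spacing.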
03:44 🔗 Somebody2 the tracker batches up the results, and approximately once a day or so, it uploads them (in a semi-silly format) to archive.org
03:44 🔗 Aerochron Ah. Also, what does PAUSED mean when it is next to a project? Is the project unavailable for a time?
03:44 🔗 Somebody2 where they can be downloaded, or torrented (seeding is very welcome!)
03:44 🔗 Somebody2 PAUSED means that we had previously scraped that shortening service, but we aren't doing so right now.
03:45 🔗 Somebody2 Generally either because the service has died, or we grabbed all of it, or they blocked us, or something like that.
03:45 🔗 Somebody2 I assume you've seen the wiki page: http://archiveteam.org/index.php?title=URLTeam
03:46 🔗 Somebody2 (help better organizing it is also very welcome)
08:06 🔗 Jonison has joined #urlteam
09:02 🔗 Jonison has quit IRC (Read error: Connection reset by peer)
09:39 🔗 Jonison has joined #urlteam
11:32 🔗 dashcloud has quit IRC (Read error: Operation timed out)
11:39 🔗 dashcloud has joined #urlteam
11:51 🔗 mls has quit IRC (Ping timeout: 250 seconds)
12:03 🔗 mls has joined #urlteam
13:01 🔗 Jonison has quit IRC (Ping timeout: 260 seconds)
13:09 🔗 mls has quit IRC (Ping timeout: 250 seconds)
13:30 🔗 dashcloud has quit IRC (Read error: Operation timed out)
13:44 🔗 mls has joined #urlteam
13:50 🔗 dashcloud has joined #urlteam
17:50 🔗 dashcloud has quit IRC (Read error: Operation timed out)
17:50 🔗 dashcloud has joined #urlteam
18:46 🔗 ix has quit IRC (Quit: oh)
19:27 🔗 JAA It looks like X.vu has started redirecting everything to HTTPS.
19:28 🔗 hook54321 And to the shutdown notice page
19:28 🔗 hook54321 I'm assuming the effort they're referring to is us :P
19:29 🔗 JAA I've paused x-vu for now.
19:34 🔗 JAA So currently http://x.vu/<shortcode> redirects to https://x.vu/<shortcode> unconditionally, and https://x.vu/<shortcode> always responds with a 301, either to /shutdown/ if the code is invalid or to /shutdown/?url=<target> if the code exists.
19:35 🔗 JAA I don't know how to adjust the settings to account for that, so I'll let Somebody2 handle that.
19:35 🔗 JAA We'll have to redo tons of codes, too.
19:36 🔗 JAA hook54321: No, I think they were referring to our archival efforts in general.
19:38 🔗 JAA By the way, they also told me that they're trying "to keep the redirects working for at least a couple more years".
19:39 🔗 JAA (On the shutdown notice, they just say "as long as possible".)
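
Given the behaviour JAA describes, the target of an x.vu shortcode could be recovered from the `Location` header of the HTTPS response roughly as follows. This is a sketch under assumptions: `xvu_target` is a hypothetical helper, and it assumes the target is percent-encoded in the `url` query parameter, which the log does not confirm.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def xvu_target(location: str) -> Optional[str]:
    """Return the target URL from a /shutdown/ redirect, or None if the
    shortcode is invalid (no url= parameter)."""
    parts = urlsplit(location)
    if parts.path != "/shutdown/":
        # Anything else is not the behaviour described in the channel.
        raise ValueError("unexpected redirect: " + location)
    values = parse_qs(parts.query).get("url")
    return values[0] if values else None


print(xvu_target("/shutdown/"))
print(xvu_target("/shutdown/?url=http%3A%2F%2Fexample.com%2Fa"))
```

If the service turns out not to encode the target, a target containing `&` would be cut short by `parse_qs`, so the raw query string would need to be handled instead.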
19:57 🔗 ix has joined #urlteam
20:00 🔗 JAA Somebody2: Regarding vgd_6: v.gd does some weird user agent magic. When you access it with a browser, you get an interstitial page "The link you followed has been shortened with v.gd. blahblahblah". If you use curl, you get a 30x. It looks like they detect our user agent as a browser and return the interstitial page.
20:01 🔗 JAA It looks like the user agent is hardcoded in terroroftinytown-client-grab, so I guess we'll have to extract the URL from the HTML instead.
20:03 🔗 JAA You can also get around it by setting a cookie preview=1, but that doesn't seem to be supported by terroroftinytown either.
20:04 🔗 JAA E.g. curl -vA 'ArchiveTeam Warrior/0.9.2' -b 'preview=1' https://v.gd/FFF54S
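
The same request as the curl command above can be built with Python's standard library. This only constructs the request and does not send it; `build_vgd_request` is a hypothetical helper, and nothing here reflects how terroroftinytown itself issues requests.

```python
import urllib.request


def build_vgd_request(shortcode: str) -> urllib.request.Request:
    """Build a v.gd request with the warrior user agent and preview cookie."""
    return urllib.request.Request(
        "https://v.gd/" + shortcode,
        headers={
            "User-Agent": "ArchiveTeam Warrior/0.9.2",  # same as curl -A
            "Cookie": "preview=1",                       # same as curl -b
        },
    )


request = build_vgd_request("FFF54S")
print(request.full_url)
print(request.header_items())
```

Note that `urllib.request.urlopen` follows redirects by default, so inspecting the 30x response itself would need a custom opener that does not follow them.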
22:00 🔗 liam has quit IRC (Read error: Operation timed out)
22:00 🔗 JAA has quit IRC (Read error: Operation timed out)
22:00 🔗 Aerochrom has joined #urlteam
22:01 🔗 rocode has quit IRC (Read error: Operation timed out)
22:01 🔗 bobazY has quit IRC (Read error: Operation timed out)
22:01 🔗 bobazY has joined #urlteam
22:02 🔗 JAA has joined #urlteam
22:02 🔗 liam has joined #urlteam
22:02 🔗 svchfoo3 sets mode: +o JAA
22:02 🔗 svchfoo1 sets mode: +o JAA
22:04 🔗 Aerochron has quit IRC (Read error: Operation timed out)
22:06 🔗 rocode has joined #urlteam
22:15 🔗 wabu has quit IRC (Ping timeout: 246 seconds)
22:26 🔗 wabu has joined #urlteam
22:41 🔗 svchfoo1 has quit IRC (Remote host closed the connection)
22:42 🔗 svchfoo1 has joined #urlteam
22:43 🔗 svchfoo3 sets mode: +o svchfoo1
