[02:40] *** dashcloud has quit IRC (Read error: Operation timed out)
[02:41] *** dashcloud has joined #urlteam
[03:26] *** Aerochron has joined #urlteam
[03:26] Hi. Can anyone help explain what exactly this project is?
[03:26] I mean, I get the point of it but I am not sure what the warrior is actually doing when it runs.
[03:39] Aerochron: what the warrior does is process "items" sent out by a central tracker server. Each item consists of a small range of URLs to try.
[03:40] The warrior tries each one, and (for the ones that redirect) extracts the target URL. When it tries all the ones in the item,
[03:40] So the VM is more or less wardialing as many shortened URLs as possible?
[03:40] it reports back to the tracker with the results, and gets another item to work on.
[03:40] Aerochron: yep!
[03:41] By default, we work in order -- but for some shortening services, we have custom code that works in a pseudo-random order.
[03:41] (But I don't know much about the details of that)
[03:42] If a warrior gets an unexpected response, it reports that back to the tracker -- and if a particular item consistently fails,
[03:42] it tries to stay below the ratelimit at which the shortener will ban your ip
[03:42] eventually the tracker will alert, and demand manual intervention before it will continue handing out items.
[03:43] yes, we have a number of different knobs to adjust to stay below ratelimits
[03:44] the tracker batches up the results, and approximately once a day or so, it uploads them (in a semi-silly format) to archive.org
[03:44] Ah. Also, what does PAUSED mean when it is next to a project? Is the project unavailable for a time?
[03:44] where they can be downloaded, or torrented (seeding is very welcome!)
[03:44] PAUSED means that we had previously scraped that shortening service, but we aren't doing so right now.
[03:45] Generally either because the service has died, or we grabbed all of it, or they blocked us, or something like that.
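For readers who want the warrior explanation above in concrete form, here is a rough Python sketch of processing one item. It is not the actual terroroftinytown code: the names `process_item` and `fetch`, the `delay` knob, and the assumption that a 404 means "shortcode not in use" are all hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical types, not taken from the real client.
Item = List[str]                                    # a small range of shortcodes
Fetch = Callable[[str], Tuple[int, Optional[str]]]  # code -> (status, Location)


def process_item(
    item: Item, fetch: Fetch, delay: float = 0.0
) -> Tuple[Dict[str, str], List[str]]:
    """Try every shortcode in one item.

    Returns (results, unexpected): results maps shortcode -> redirect
    target, unexpected lists the codes whose response was neither a
    redirect nor a plain "not found". Both would be reported back to
    the tracker.
    """
    results: Dict[str, str] = {}
    unexpected: List[str] = []
    for code in item:
        status, location = fetch(code)
        if status in (301, 302) and location:
            results[code] = location   # the code redirects: keep its target
        elif status == 404:
            pass                       # assumption: 404 means an unused code
        else:
            unexpected.append(code)    # surprise: let the tracker know
        if delay:
            time.sleep(delay)          # crude knob to stay under a ratelimit
    return results, unexpected
```

Passing `fetch` in as a function keeps the loop independent of any one shortener, so the per-service quirks discussed later in the log would live in the fetch function rather than in the loop.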
[03:45] I assume you've seen the wiki page: http://archiveteam.org/index.php?title=URLTeam
[03:46] (help better organizing it is also very welcome)
[08:06] *** Jonison has joined #urlteam
[09:02] *** Jonison has quit IRC (Read error: Connection reset by peer)
[09:39] *** Jonison has joined #urlteam
[11:32] *** dashcloud has quit IRC (Read error: Operation timed out)
[11:39] *** dashcloud has joined #urlteam
[11:51] *** mls has quit IRC (Ping timeout: 250 seconds)
[12:03] *** mls has joined #urlteam
[13:01] *** Jonison has quit IRC (Ping timeout: 260 seconds)
[13:09] *** mls has quit IRC (Ping timeout: 250 seconds)
[13:30] *** dashcloud has quit IRC (Read error: Operation timed out)
[13:44] *** mls has joined #urlteam
[13:50] *** dashcloud has joined #urlteam
[17:50] *** dashcloud has quit IRC (Read error: Operation timed out)
[17:50] *** dashcloud has joined #urlteam
[18:46] *** ix has quit IRC (Quit: oh)
[19:27] It looks like X.vu has started redirecting everything to HTTPS.
[19:28] And to the shutdown notice page
[19:28] I'm assuming the effort they're referring to is us :P
[19:29] I've paused x-vu for now.
[19:34] So currently http://x.vu/ redirects to https://x.vu/ unconditionally, and https://x.vu/ always responds with a 301, either to /shutdown/ if the code is invalid or to /shutdown/?url= if the code exists.
[19:35] I don't know how to adjust the settings to account for that, so I'll let Somebody2 handle that.
[19:35] We'll have to redo tons of codes, too.
[19:36] hook54321: No, I think they were referring to our archival efforts in general.
[19:38] By the way, they also told me that they're trying "to keep the redirects working for at least a couple more years".
[19:39] (On the shutdown notice, they just say "as long as possible".)
[19:57] *** ix has joined #urlteam
[20:00] Somebody2: Regarding vgd_6: v.gd does some weird user agent magic. When you access it with a browser, you get an interstitial page "The link you followed has been shortened with v.gd. blahblahblah". If you use Curl, you get a 30x. It looks like they detect our user agent as a browser and return the interstitial page.
[20:01] It looks like the user agent is hardcoded in terroroftinytown-client-grab, so I guess we'll have to extract the URL from the HTML instead.
[20:03] You can also get around it by setting a cookie preview=1, but that doesn't seem to be supported by terroroftinytown either.
[20:04] E.g. curl -vA 'ArchiveTeam Warrior/0.9.2' -b 'preview=1' https://v.gd/FFF54S
[22:00] *** liam has quit IRC (Read error: Operation timed out)
[22:00] *** JAA has quit IRC (Read error: Operation timed out)
[22:00] *** Aerochrom has joined #urlteam
[22:01] *** rocode has quit IRC (Read error: Operation timed out)
[22:01] *** bobazY has quit IRC (Read error: Operation timed out)
[22:01] *** bobazY has joined #urlteam
[22:02] *** JAA has joined #urlteam
[22:02] *** liam has joined #urlteam
[22:02] *** svchfoo3 sets mode: +o JAA
[22:02] *** svchfoo1 sets mode: +o JAA
[22:04] *** Aerochron has quit IRC (Read error: Operation timed out)
[22:06] *** rocode has joined #urlteam
[22:15] *** wabu has quit IRC (Ping timeout: 246 seconds)
[22:26] *** wabu has joined #urlteam
[22:41] *** svchfoo1 has quit IRC (Remote host closed the connection)
[22:42] *** svchfoo1 has joined #urlteam
[22:43] *** svchfoo3 sets mode: +o svchfoo1
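The two shortener quirks described in the log (the x.vu shutdown redirect at 19:34 and the v.gd cookie workaround at 20:03-20:04) might be handled along the following lines. This is only a sketch and has nothing to do with how terroroftinytown is actually structured. The function names are made up. It is also an assumption that x.vu puts the percent-encoded target URL in the `url` query parameter, since the log only shows `/shutdown/?url=`. The user agent string and the `preview=1` cookie are copied from the curl example.

```python
import http.client
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs, urlsplit

USER_AGENT = "ArchiveTeam Warrior/0.9.2"  # from the curl example in the log


def classify_xvu_location(location: str) -> Optional[str]:
    """Interpret the Location header of an x.vu 301.

    Returns the target URL when the redirect is /shutdown/?url=<target>
    (code exists), or None when it is a bare /shutdown/ (code invalid).
    Assumption: the target is the percent-encoded value of `url`.
    """
    parts = urlsplit(location)
    if parts.path != "/shutdown/":
        raise ValueError("unexpected x.vu redirect: %r" % location)
    values = parse_qs(parts.query).get("url")
    return values[0] if values else None


def vgd_headers() -> Dict[str, str]:
    """Headers mirroring: curl -A 'ArchiveTeam Warrior/0.9.2' -b 'preview=1'."""
    return {"User-Agent": USER_AGENT, "Cookie": "preview=1"}


def fetch_vgd_redirect(code: str) -> Tuple[int, Optional[str]]:
    """Request https://v.gd/<code> without following redirects.

    http.client never follows redirects on its own, so the status and
    the Location header come back untouched.
    """
    conn = http.client.HTTPSConnection("v.gd", timeout=30)
    try:
        conn.request("GET", "/" + code, headers=vgd_headers())
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

`fetch_vgd_redirect` needs network access and depends on v.gd continuing to behave as reported in the log, so treat it as untested against the live service. `classify_xvu_location` is a pure function and can be checked offline.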