#urlteam 2017-10-07,Sat


Who What When
***Odd0002_ has joined #urlteam
Odd0002 has quit IRC (Ping timeout: 600 seconds)
Odd0002_ is now known as Odd0002
[01:21]
.......................................................................................... (idle for 7h28mn)
figpucker has joined #urlteam [08:49]
.... (idle for 18mn)
figpucker has quit IRC (Quit: Leaving)
figpucker has joined #urlteam
[09:07]
.................................. (idle for 2h49mn)
dashcloud has quit IRC (Read error: Connection reset by peer)
dashcloud has joined #urlteam
[11:59]
..................................... (idle for 3h2mn)
dashcloud has quit IRC (Read error: Operation timed out) [15:02]
...... (idle for 29mn)
dashcloud has joined #urlteam [15:31]
............................. (idle for 2h20mn)
Somebody2JAA: grabbing the wordpress blog links sounds like a lovely idea
I'll see about setting that up
[17:51]
JAASomebody2: How about you teach me how to set it up? :-) [17:57]
Somebody2JAA: even better!
So, go to the toplevel admin page, https://tracker.archiveteam.org:1338/projects/overview
and enter the name of the new project in the obvious box. I like to use the name of the shortener, with dots replaced by dashes
(for, afaik, historical raisins)
[18:01]
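The naming convention just described (shortener hostname, dots replaced by dashes) can be sketched in Python. The helper name `project_name` is made up for illustration; it is not part of the tracker.

```python
def project_name(shortener_host: str) -> str:
    """Derive a tracker project name from a shortener hostname.

    Follows the convention from the log: dots become dashes.
    """
    return shortener_host.replace(".", "-")


print(project_name("wp.me"))  # wp-me
```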
JAACheck [18:03]
Somebody2then you get to the shortener settings page
and you need to set the alphabet, if it isn't default
[18:03]
JAALooks like the default settings are fine for this one.
It returns 301 on success and 404 on failure, but I guess leaving the other codes in doesn't hurt?
Or would you remove those?
[18:05]
Somebody2I'd leave them in, at least to start with.
You need to change the URL Template line
JAA: what *is* the format, btw?
[18:08]
JAASo the shortcode in http://wp.me/code can be either the blog ID in base62 or one of the letters [sPpa] plus the encoded blog ID plus a dash plus the post ID in base62.
s = "slug", in that case it isn't the post ID but a custom shortcode, e.g. http://wp.me/sf2B5-shorten
P = page of a post, I think, but I'm not entirely sure about that one.
p is a link directly to a specific post on the blog.
And "a" is for attachments to a post.
[18:11]
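A minimal Python sketch of the shortcode format as JAA describes it above. The function names and the returned dictionary layout are hypothetical, the digit order 0-9a-zA-Z is the one given later in this log, and the meaning of "P" is as uncertain here as it is in the description.

```python
import re

# Digit order 0-9, a-z, A-Z, as stated later in this log.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Prefix letters from the description; the label for "P" is a guess.
KINDS = {"s": "slug", "P": "page", "p": "post", "a": "attachment"}

PREFIXED = re.compile(r"([sPpa])([0-9a-zA-Z]+)-(.+)")
BLOG_ONLY = re.compile(r"[0-9a-zA-Z]+")


def b62decode(text: str) -> int:
    """Decode a base62 string; raises ValueError on a character outside the alphabet."""
    value = 0
    for ch in text:
        value = value * 62 + ALPHABET.index(ch)
    return value


def parse_shortcode(code: str) -> dict:
    """Split a wp.me shortcode into its kind, blog ID and post ID or slug."""
    match = PREFIXED.fullmatch(code)
    if match:
        prefix, blog, rest = match.groups()
        result = {"kind": KINDS[prefix], "blog_id": b62decode(blog)}
        if prefix == "s":
            result["slug"] = rest  # custom shortcode, not a number
        else:
            result["item_id"] = b62decode(rest)
        return result
    if BLOG_ONLY.fullmatch(code):
        # No dash: the whole code is the base62 blog ID.
        return {"kind": "blog", "blog_id": b62decode(code)}
    raise ValueError(f"unrecognised shortcode: {code!r}")


print(parse_shortcode("a92Te1-q"))
print(parse_shortcode("sf2B5-shorten"))
```

A code without a dash is always treated as a bare blog ID here, even if it happens to start with one of the prefix letters.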
Somebody2Hm.
More examples, please?
[18:14]
JAAOne is https://wp.me/a92Te1-q, which appeared in ArchiveBot yesterday and caused me to hunt all of this down. [18:16]
Somebody2Also, does it work with HEAD requests, or does it require GET requests? [18:16]
JAAHEAD works fine. [18:16]
Somebody2Cool -- note that there's an option in the settings for that, which defaults to HEAD [18:17]
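For illustration only, a standalone Python sketch of what such a HEAD check could look like, using the status codes reported earlier (301 on success, 404 on failure). This is not the tracker's or the warriors' actual code; `classify` and `probe` are hypothetical names.

```python
import http.client


def classify(status: int) -> str:
    """Map an HTTP status code to an outcome, per the codes reported in the log."""
    if status == 301:
        return "found"
    if status == 404:
        return "not_found"
    return "unexpected"


def probe(code: str, host: str = "wp.me", timeout: float = 10.0):
    """Send one HEAD request for a shortcode and return (outcome, redirect target).

    http.client does not follow redirects, so the Location header of a 301
    is returned as-is.
    """
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", "/" + code)
        response = conn.getresponse()
        return classify(response.status), response.getheader("Location")
    finally:
        conn.close()
```

Calling `probe("1cu")` would issue a real request, so it is left out of the example.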
JAAYeah, saw that
I don't have more examples right now, but you can find them on any blog hosted at wordpress.com in the HTML in a <link> tag.
[18:17]
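A small Python sketch of pulling such a link out of a page. It assumes the tag is marked with `rel="shortlink"`, which the log does not state; the sample HTML is made up for the example.

```python
from html.parser import HTMLParser


class ShortlinkFinder(HTMLParser):
    """Collect href values of <link> tags whose rel includes "shortlink"."""

    def __init__(self):
        super().__init__()
        self.shortlinks = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "shortlink" in rel and href:
            self.shortlinks.append(href)


def find_shortlinks(html: str) -> list:
    finder = ShortlinkFinder()
    finder.feed(html)
    return finder.shortlinks


# Made-up sample markup, not copied from a real page.
sample = '<head><link rel="stylesheet" href="/s.css"><link rel="shortlink" href="https://wp.me/1cu" /></head>'
print(find_shortlinks(sample))  # ['https://wp.me/1cu']
```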
Somebody2So, to get the blog ID ones, can we just iterate through a-zA-Z0-9? [18:18]
JAAThe WordPress plugin uses the order 0-9a-zA-Z, but yeah. [18:18]
Somebody2Ah, better to change the order, then
You can do that in the alphabet setting
[18:19]
JAADefault order is 0-9a-zA-Z though? [18:19]
Somebody2Oh, is it?
Good. :-)
[18:19]
JAA:-)
It'll result in some duplicates because it's really just a base62-encoding. E.g. http://wp.me/02 == http://wp.me/2
[18:19]
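The duplicate issue can be checked with a short Python sketch. Leading "0" characters are leading zeros in base62, so they do not change the decoded value. The helper names are hypothetical.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def b62decode(text: str) -> int:
    """Decode a base62 string using the 0-9a-zA-Z digit order."""
    value = 0
    for ch in text:
        value = value * 62 + ALPHABET.index(ch)
    return value


def canonical(code: str) -> str:
    """Strip leading zeros, keeping a single "0" for the value zero."""
    return code.lstrip("0") or "0"


def leading_zero_codes(length: int) -> int:
    """Count codes of exactly `length` characters that start with "0".

    Each of them decodes to the same value as a shorter code.
    """
    if length < 2:
        return 0
    return 62 ** (length - 1)


print(b62decode("02") == b62decode("2"))  # True
print(canonical("02"))                    # 2
print(leading_zero_codes(5), "of", 62 ** 5)  # 14776336 of 916132832
```

Only 1 in 62 codes of a given length starts with "0", so the wasted share stays small even though the absolute count grows with the code length.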
Somebody2Hm, that's probably OK.
OK, so update the "URL template" setting on the Shortener settings page.
[18:20]
JAAGuess so. Maybe we can skip 0xxxx later on because that would be quite large and unnecessary.
Yup
[18:21]
Somebody2Yes, we can skip over ranges by adjusting where the auto-queue starts from [18:21]
JAARight [18:21]
Somebody2Now on the Queue Settings page, change the Maximum number of items setting to something like 10, to start with.
You can boost it back up gradually
Make sure to check the AutoQueue checkbox.
Then check the Enabled checkbox, and we're good to go!
[18:21]
JAASweet [18:22]
Somebody2A nice semi-hidden feature is that the time at which the page was generated is listed in the upper corner; this is helpful for comparing to timestamps on the page
to see if things are still running as expected.
A likely nice feature addition would be to enhance all the timestamp displays with relative dates, too
and we've got results for wp-me!!
[18:23]
JAAAh yeah, I saw that.
\o/
I'm not a fan of relative dates. "What is 'about 1 minute ago'? Give me a timestamp!"
[18:24]
***figpucker has quit IRC (Read error: Connection reset by peer) [18:26]
Somebody2Oh, I certainly don't want it to *replace* the timestamps; dear god no. [18:26]
JAAYeah, a setting would be neat. [18:26]
Somebody2But it's nice to have as an addition. [18:26]
JAAAlso, I love that everything's in UTC here. :-) [18:26]
Somebody2Especially if color coded -- "within a minute", "within an hour", "from a previous day"
As that's usually what I'm looking for -- is this project stuck, and how badly?
[18:27]
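A rough Python sketch of the bucketing Somebody2 proposes, meant to sit alongside the absolute timestamp rather than replace it. The function name and the extra "earlier today" bucket (for ages over an hour on the same UTC day) are assumptions added to cover the gap between the three named buckets.

```python
from datetime import datetime, timedelta, timezone


def age_bucket(timestamp: datetime, now: datetime) -> str:
    """Classify how stale a timezone-aware timestamp is, comparing in UTC."""
    timestamp = timestamp.astimezone(timezone.utc)
    now = now.astimezone(timezone.utc)
    age = now - timestamp
    if age <= timedelta(minutes=1):
        return "within a minute"
    if age <= timedelta(hours=1):
        return "within an hour"
    if timestamp.date() < now.date():
        return "from a previous day"
    return "earlier today"  # assumed filler bucket, not named in the log


now = datetime(2017, 10, 7, 18, 27, tzinfo=timezone.utc)
print(age_bucket(datetime(2017, 10, 7, 18, 26, 30, tzinfo=timezone.utc), now))  # within a minute
print(age_bucket(datetime(2017, 10, 6, 23, 45, tzinfo=timezone.utc), now))      # from a previous day
```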
JAAHm yeah, makes sense.
So now you'd slowly increase the queue size until it's either large enough or runs into trouble?
[18:27]
Somebody2yep!
I tend to ramp up in units of at least 10, and usually 20
You can check the Error Reports page to see "trouble"
So we're now grabbing 3 character ones -- they don't *seem* to be base62 encodings of the names...
e.g. 12L maps to aehso
[18:28]
JAANo, it's the blog ID.
I'm not sure if that's exposed anywhere really.
[18:29]
Somebody2Ah, cool
well, it's exposed here :-)
[18:29]
JAAOh yeah, it's in the HTML somewhere.
E.g. "siteID":"4618" on https://kidnicky2801.wordpress.com/ (1cu)
[18:30]
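JAA's example can be reproduced with a small base62 encoder in Python, using the 0-9a-zA-Z digit order discussed earlier. The function name is hypothetical.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def b62encode(number: int) -> str:
    """Encode a non-negative integer in base62 without leading zeros."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


print(b62encode(4618))  # 1cu
```

The arithmetic: 4618 = 1 * 62^2 + 12 * 62 + 30, and digits 1, 12 and 30 map to "1", "c" and "u".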
Queue at 80 now.
100
[18:35]
Somebody2cool, seems fine [18:39]
JAAIs there any way to see how much "capacity" we have?
We have a huge list of shorteners to do, and I'd love to throw additional ones in.
I still see "no items available currently" errors on my machines, so clearly there's still space for more, but I wonder if there's anything to estimate how *much* more.
[18:40]
Somebody2Please create projects for every single shortener you have the relevant information for
Once we've got them in, we can adjust which ones are running at the same time if needed.
Our current bottleneck is researching them.
[18:43]
JAAGuessed so [18:43]
Somebody2Generally we have about 100 warriors, each of which can run about 3 different jobs at once.
So that's a good ballpark figure for how much capacity we can do simultaneously.
But if we can consistently hit that, we can likely recruit more warriors.
And if/when there isn't another job running, suddenly our number of warriors jumps up to 200+
another non-URLTeam job
[18:44]
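The ballpark works out as follows, as a trivial Python sketch. It assumes every warrior really does run the full 3 jobs at once, which the log gives only as an approximation; the function name is made up.

```python
def concurrent_jobs(warriors: int, jobs_per_warrior: int = 3) -> int:
    """Rough upper bound on jobs running at the same time."""
    return warriors * jobs_per_warrior


print(concurrent_jobs(100))  # 300, the usual case
print(concurrent_jobs(200))  # 600, when no non-URLTeam job is running
```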
JAAMakes sense
I need to add the shorteners to the wiki page manually, correct?
[18:46]
Somebody2For now, until someone writes code to do it automatically, yeah.
I'm going AFK for a bit.
[18:47]
.... (idle for 19mn)
JAAI'm cleaning up the wiki page now. Still lists various shorteners as active which were deactivated months ago, e.g. cmplx-it. [19:07]
Somebody2JAA: thank you!!
and we've found over 300,000 wp-me results
[19:11]
JAABy the way, what's the matter with go-usa-gov? Did anything happen since treyo was here? [19:12]
Somebody2JAA: a couple of days later, it seemed to be blocking us, IIRC.
have they posted a dump yet?
[19:12]
JAAJAA shrugs [19:14]
...................... (idle for 1h48mn)
***dashcloud has quit IRC (Remote host closed the connection)
dashcloud has joined #urlteam
[21:02]
................ (idle for 1h18mn)
JAAwp-me: 2M scanned, 1.85M found. :-) [22:21]
.... (idle for 15mn)
Somebody2Yay! [22:36]
.............. (idle for 1h9mn)
***dashcloud has quit IRC (Read error: Operation timed out)
dashcloud has joined #urlteam
[23:45]
