Time |
Nickname |
Message |
01:21
🔗
|
|
Odd0002_ has joined #urlteam |
01:21
🔗
|
|
Odd0002 has quit IRC (Ping timeout: 600 seconds) |
01:21
🔗
|
|
Odd0002_ is now known as Odd0002 |
08:49
🔗
|
|
figpucker has joined #urlteam |
09:07
🔗
|
|
figpucker has quit IRC (Quit: Leaving) |
09:10
🔗
|
|
figpucker has joined #urlteam |
11:59
🔗
|
|
dashcloud has quit IRC (Read error: Connection reset by peer) |
12:00
🔗
|
|
dashcloud has joined #urlteam |
15:02
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
15:31
🔗
|
|
dashcloud has joined #urlteam |
17:51
🔗
|
Somebody2 |
JAA: grabbing the wordpress blog links sounds like a lovely idea |
17:52
🔗
|
Somebody2 |
I'll see about setting that up |
17:57
🔗
|
JAA |
Somebody2: How about you teach me how to set it up? :-) |
18:01
🔗
|
Somebody2 |
JAA: even better! |
18:02
🔗
|
Somebody2 |
So, go to the toplevel admin page, https://tracker.archiveteam.org:1338/projects/overview |
18:02
🔗
|
Somebody2 |
and enter the name of the new project in the obvious box. I like to use the name of the shortener, with dots replaced by dashes |
18:03
🔗
|
Somebody2 |
(for, afaik, historical raisins) |
18:03
🔗
|
JAA |
Check |
18:03
🔗
|
Somebody2 |
then you get to the shortener settings page |
18:03
🔗
|
Somebody2 |
and you need to set the alphabet, if it isn't default |
18:05
🔗
|
JAA |
Looks like the default settings are fine for this one. |
18:05
🔗
|
JAA |
It returns 301 on success and 404 on failure, but I guess leaving the other codes in doesn't hurt? |
18:06
🔗
|
JAA |
Or would you remove those? |
18:08
🔗
|
Somebody2 |
I'd leave them in, at least to start with. |
18:09
🔗
|
Somebody2 |
You need to change the URL Template line |
18:09
🔗
|
Somebody2 |
JAA: what *is* the format, btw? |
18:11
🔗
|
JAA |
So the shortcode in http://wp.me/code can be either the blog ID in base62 or one of the letters [sPpa] plus the encoded blog ID plus a dash plus the post ID in base62. |
18:12
🔗
|
JAA |
s = "slug", in that case it isn't the post ID but a custom shortcode, e.g. http://wp.me/sf2B5-shorten |
18:12
🔗
|
JAA |
P = page of a post, I think, but I'm not entirely sure about that one. |
18:12
🔗
|
JAA |
p is a link directly to a specific post on the blog. |
18:12
🔗
|
JAA |
And "a" is for attachments to a post. |
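Editor's sketch: the shortcode layout JAA describes above can be expressed as a small parser. This is a minimal illustration under stated assumptions, not code from the tracker or from WordPress: the names `WpMeCode`, `base62_decode`, and `parse_shortcode` are hypothetical, and the alphabet order 0-9a-zA-Z is the one mentioned later in this log.

```python
import re
from typing import NamedTuple, Optional

# Assumed alphabet order (0-9, a-z, A-Z), as stated later in the conversation.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


class WpMeCode(NamedTuple):
    kind: str            # "blog", or one of the prefix letters "s", "P", "p", "a"
    blog_id: int         # decoded base62 blog ID
    item: Optional[str]  # slug for "s", base62 post ID otherwise, None for blog links


def base62_decode(text: str) -> int:
    """Decode a base62 string into an integer."""
    value = 0
    for ch in text:
        value = value * 62 + ALPHABET.index(ch)
    return value


# Prefix letter, encoded blog ID, a dash, then the post ID or slug.
_PREFIXED = re.compile(r"^([sPpa])([0-9a-zA-Z]+)-(.+)$")
# A bare base62 blog ID with no dash.
_BLOG = re.compile(r"^[0-9a-zA-Z]+$")


def parse_shortcode(code: str) -> WpMeCode:
    """Split a wp.me shortcode into its kind, blog ID, and item part."""
    match = _PREFIXED.match(code)
    if match:
        kind, blog, item = match.groups()
        return WpMeCode(kind, base62_decode(blog), item)
    if _BLOG.match(code):
        return WpMeCode("blog", base62_decode(code), None)
    raise ValueError(f"unrecognised wp.me shortcode: {code!r}")
```

For example, `parse_shortcode("sf2B5-shorten")` yields kind `"s"` with the item `"shorten"`. The dash is what separates a prefixed code from a bare blog ID that merely happens to start with one of the letters s, P, p, or a.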
18:14
🔗
|
Somebody2 |
Hm. |
18:15
🔗
|
Somebody2 |
More examples, please? |
18:16
🔗
|
JAA |
https://wp.me/a92Te1-q is one which appeared in ArchiveBot yesterday and caused me to hunt all of this down. |
18:16
🔗
|
Somebody2 |
Also, does it work with HEAD requests, or does it require GET requests? |
18:16
🔗
|
JAA |
HEAD works fine. |
18:17
🔗
|
Somebody2 |
Cool -- note that there's an option in the settings for that, which defaults to HEAD |
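Editor's sketch: a single shortcode can be probed with a HEAD request along these lines. It assumes the behaviour reported in this chat (301 on success, 404 on failure), which may have changed since; the function names `classify` and `probe` are hypothetical and are not part of the tracker's code.

```python
import http.client
from typing import Optional, Tuple


def classify(status: int) -> str:
    """Map an HTTP status to an outcome, per the codes reported in the chat."""
    if status == 301:
        return "found"
    if status == 404:
        return "not found"
    return "unexpected"


def probe(code: str, host: str = "wp.me",
          timeout: float = 10.0) -> Tuple[str, Optional[str]]:
    """Send one HEAD request and return (outcome, Location header).

    http.client does not follow redirects, so the 301 itself is visible.
    """
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", "/" + code)
        response = conn.getresponse()
        return classify(response.status), response.getheader("Location")
    finally:
        conn.close()
```

Keeping `classify` separate from the network call makes it easy to treat any other status code as a signal worth inspecting rather than as a hit or a miss.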
18:17
🔗
|
JAA |
Yeah, saw that |
18:18
🔗
|
JAA |
I don't have more examples right now, but you can find them on any blog hosted at wordpress.com in the HTML in a <link> tag. |
18:18
🔗
|
Somebody2 |
So, to get the blog ID ones, can we just iterate through a-zA-Z0-9? |
18:18
🔗
|
JAA |
0-9a-zA-Z is the order how it's used in the Wordpress plugin, but yeah. |
18:19
🔗
|
Somebody2 |
Ah, better to change the order, then |
18:19
🔗
|
Somebody2 |
You can do that in the alphabet setting |
18:19
🔗
|
JAA |
Default order is 0-9a-zA-Z though? |
18:19
🔗
|
Somebody2 |
Oh, is it? |
18:19
🔗
|
Somebody2 |
Good. :-) |
18:19
🔗
|
JAA |
:-) |
18:19
🔗
|
JAA |
It'll result in some duplicates because it's really just a base62-encoding. E.g. http://wp.me/02 == http://wp.me/2 |
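Editor's sketch: the duplicate JAA mentions follows directly from base62 being a positional encoding, where a leading `0` digit adds nothing to the value. The snippet below is a minimal illustration assuming the 0-9a-zA-Z alphabet order discussed above; the helper names are hypothetical.

```python
# Assumed alphabet order: digits, then lowercase, then uppercase.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def base62_decode(text: str) -> int:
    """Decode a base62 string into an integer."""
    value = 0
    for ch in text:
        value = value * 62 + ALPHABET.index(ch)
    return value


def base62_encode(number: int) -> str:
    """Encode a non-negative integer as base62 without leading zeros."""
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def canonical(code: str) -> str:
    """Strip redundant leading zeros by round-tripping through the integer."""
    return base62_encode(base62_decode(code))


# "02" and "2" decode to the same blog ID, so both shortcodes are equivalent.
assert base62_decode("02") == base62_decode("2")
assert canonical("02") == "2"
```

Every code that starts with `0` therefore has a shorter equivalent, which is why scanning the `0xxxx` range would only revisit blog IDs already covered by shorter codes.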
18:20
🔗
|
Somebody2 |
Hm, that's probably OK. |
18:20
🔗
|
Somebody2 |
OK, so update the "URL template" setting on the Shortener settings page. |
18:21
🔗
|
JAA |
Guess so. Maybe we can skip 0xxxx later on because that would be quite large and unnecessary. |
18:21
🔗
|
JAA |
Yup |
18:21
🔗
|
Somebody2 |
Yes, we can skip over ranges by adjusting where the auto-queue starts from |
18:21
🔗
|
JAA |
Right |
18:21
🔗
|
Somebody2 |
Now on the Queue Settings page, change the Maximum number of items setting to something like 10, to start with. |
18:22
🔗
|
Somebody2 |
You can boost it back up gradually |
18:22
🔗
|
Somebody2 |
Make sure to check the AutoQueue checkbox. |
18:22
🔗
|
Somebody2 |
Then check the Enabled checkbox, and we're good to go! |
18:22
🔗
|
JAA |
Sweet |
18:23
🔗
|
Somebody2 |
A nice semi-hidden feature is that the current time the page was generated is listed in the upper corner; this is helpful for comparing to timestamps on the page |
18:23
🔗
|
Somebody2 |
to see if things are still running as expected. |
18:23
🔗
|
Somebody2 |
A likely nice feature addition would be to enhance all the timestamp displays with relative dates, too |
18:24
🔗
|
Somebody2 |
and we've got results for wp-me!! |
18:24
🔗
|
JAA |
Ah yeah, I saw that. |
18:24
🔗
|
JAA |
\o/ |
18:24
🔗
|
JAA |
I'm not a fan of relative dates. "What is 'about 1 minute ago'? Give me a timestamp!" |
18:26
🔗
|
|
figpucker has quit IRC (Read error: Connection reset by peer) |
18:26
🔗
|
Somebody2 |
Oh, I certainly don't want it to *replace* the timestamps; dear god no. |
18:26
🔗
|
JAA |
Yeah, a setting would be neat. |
18:26
🔗
|
Somebody2 |
But it's nice to have as an addition. |
18:26
🔗
|
JAA |
Also, I love that everything's in UTC here. :-) |
18:27
🔗
|
Somebody2 |
Especially if color coded -- "within a minute", "within an hour", "from a previous day" |
18:27
🔗
|
Somebody2 |
As that's usually what I'm looking for -- is this project stuck, and how badly? |
18:27
🔗
|
JAA |
Hm yeah, makes sense. |
18:27
🔗
|
JAA |
So now you'd slowly increase the queue size until it's either large enough or runs into trouble? |
18:28
🔗
|
Somebody2 |
yep! |
18:28
🔗
|
Somebody2 |
I tend to ramp up in units of at least 10, and usually 20 |
18:28
🔗
|
Somebody2 |
You can check the Error Reports page to see "trouble" |
18:29
🔗
|
Somebody2 |
So we're now grabbing 3 character ones -- they don't *seem* to be base62 encodings of the names... |
18:29
🔗
|
Somebody2 |
e.g. 12L maps to aehso |
18:29
🔗
|
JAA |
No, it's the blog ID. |
18:29
🔗
|
JAA |
I'm not sure if that's exposed anywhere really. |
18:29
🔗
|
Somebody2 |
Ah, cool |
18:30
🔗
|
Somebody2 |
well, it's exposed here :-) |
18:30
🔗
|
JAA |
Oh yeah, it's in the HTML somewhere. |
18:30
🔗
|
JAA |
E.g. "siteID":"4618" on https://kidnicky2801.wordpress.com/ (1cu) |
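Editor's sketch: the relationship between the `siteID` in a blog's HTML and its wp.me shortcode can be checked as follows. The regular expression assumes the `"siteID":"4618"` form quoted above, and the function names are hypothetical; real pages may embed the value differently.

```python
import re
from typing import Optional

# Assumed alphabet order: digits, then lowercase, then uppercase.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def base62_encode(number: int) -> str:
    """Encode a non-negative integer as base62 without leading zeros."""
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def site_id_from_html(html: str) -> Optional[int]:
    """Return the first numeric siteID found in the page source, if any."""
    match = re.search(r'"siteID":"(\d+)"', html)
    return int(match.group(1)) if match else None


# The example from the chat: site ID 4618 corresponds to shortcode "1cu".
snippet = '{"siteID":"4618"}'
site_id = site_id_from_html(snippet)
assert site_id == 4618
assert base62_encode(site_id) == "1cu"
```

The arithmetic behind the example is 1 × 62² + 12 × 62 + 30 = 4618, with `c` at position 12 and `u` at position 30 in the assumed alphabet.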
18:35
🔗
|
JAA |
Queue at 80 now. |
18:39
🔗
|
JAA |
100 |
18:39
🔗
|
Somebody2 |
cool, seems fine |
18:40
🔗
|
JAA |
Is there any way to see how much "capacity" we have? |
18:41
🔗
|
JAA |
We have a huge list of shorteners to do, and I'd love to throw additional ones in. |
18:42
🔗
|
JAA |
I still see "no items available currently" errors on my machines, so clearly there's still space for more, but I wonder if there's anything to estimate how *much* more. |
18:43
🔗
|
Somebody2 |
Please create projects for every single shortener you have the relevant information for |
18:43
🔗
|
Somebody2 |
Once we've got them in, we can adjust which ones are running at the same time if needed. |
18:43
🔗
|
Somebody2 |
Our current bottleneck is researching them. |
18:43
🔗
|
JAA |
Guessed so |
18:44
🔗
|
Somebody2 |
Generally we have about 100 warriors, each of which can run about 3 different jobs at once. |
18:44
🔗
|
Somebody2 |
So that's a good ballpark figure for how much capacity we can do simultaneously. |
18:45
🔗
|
Somebody2 |
But if we can consistently hit that, we can likely recruit more warriors. |
18:45
🔗
|
Somebody2 |
And if/when there isn't another job running, suddenly our number of warriors jumps up to 200+ |
18:45
🔗
|
Somebody2 |
another non-URLTeam job |
18:46
🔗
|
JAA |
Makes sense |
18:46
🔗
|
JAA |
I need to add the shorteners to the wiki page manually, correct? |
18:47
🔗
|
Somebody2 |
For now, until someone writes code to do it automatically, yeah. |
18:48
🔗
|
Somebody2 |
I'm going AFK for a bit. |
19:07
🔗
|
JAA |
I'm cleaning up the wiki page now. Still lists various shorteners as active which were deactivated months ago, e.g. cmplx-it. |
19:11
🔗
|
Somebody2 |
JAA: thank you!! |
19:11
🔗
|
Somebody2 |
and we've found over 300,000 wp-me results |
19:12
🔗
|
JAA |
By the way, what's the matter with go-usa-gov? Did anything happen since treyo was here? |
19:12
🔗
|
Somebody2 |
JAA: a couple of days later, it seemed to be blocking us, IIRC. |
19:13
🔗
|
Somebody2 |
have they posted a dump yet? |
19:14
🔗
|
* |
JAA shrugs |
21:02
🔗
|
|
dashcloud has quit IRC (Remote host closed the connection) |
21:03
🔗
|
|
dashcloud has joined #urlteam |
22:21
🔗
|
JAA |
2M wp-me scanned, 1.85M found. :-) |
22:36
🔗
|
Somebody2 |
Yay! |
23:45
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
23:48
🔗
|
|
dashcloud has joined #urlteam |