#urlteam 2017-10-07,Sat


Time Nickname Message
01:21 🔗 Odd0002_ has joined #urlteam
01:21 🔗 Odd0002 has quit IRC (Ping timeout: 600 seconds)
01:21 🔗 Odd0002_ is now known as Odd0002
08:49 🔗 figpucker has joined #urlteam
09:07 🔗 figpucker has quit IRC (Quit: Leaving)
09:10 🔗 figpucker has joined #urlteam
11:59 🔗 dashcloud has quit IRC (Read error: Connection reset by peer)
12:00 🔗 dashcloud has joined #urlteam
15:02 🔗 dashcloud has quit IRC (Read error: Operation timed out)
15:31 🔗 dashcloud has joined #urlteam
17:51 🔗 Somebody2 JAA: grabbing the wordpress blog links sounds like a lovely idea
17:52 🔗 Somebody2 I'll see about setting that up
17:57 🔗 JAA Somebody2: How about you teach me how to set it up? :-)
18:01 🔗 Somebody2 JAA: even better!
18:02 🔗 Somebody2 So, go to the toplevel admin page, https://tracker.archiveteam.org:1338/projects/overview
18:02 🔗 Somebody2 and enter the name of the new project in the obvious box. I like to use the name of the shortener, with dots replaced by dashes
18:03 🔗 Somebody2 (for, afaik, historical raisins)
18:03 🔗 JAA Check
18:03 🔗 Somebody2 then you get to the shortener settings page
18:03 🔗 Somebody2 and you need to set the alphabet, if it isn't default
18:05 🔗 JAA Looks like the default settings are fine for this one.
18:05 🔗 JAA It returns 301 on success and 404 on failure, but I guess leaving the other codes in doesn't hurt?
18:06 🔗 JAA Or would you remove those?
18:08 🔗 Somebody2 I'd leave them in, at least to start with.
18:09 🔗 Somebody2 You need to change the URL Template line
18:09 🔗 Somebody2 JAA: what *is* the format, btw?
18:11 🔗 JAA So the shortcode in http://wp.me/code can be either the blog ID in base62 or one of the letters [sPpa] plus the encoded blog ID plus a dash plus the post ID in base62.
18:12 🔗 JAA s = "slug", in that case it isn't the post ID but a custom shortcode, e.g. http://wp.me/sf2B5-shorten
18:12 🔗 JAA P = page of a post, I think, but I'm not entirely sure about that one.
18:12 🔗 JAA p is a link directly to a specific post on the blog.
18:12 🔗 JAA And "a" is for attachments to a post.
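
The shortcode layout JAA describes above can be sketched as a small parser. This is only an illustration of that description, not code from the tracker or from WordPress: the names `ShortCode`, `KINDS`, and `parse_shortcode` are hypothetical, and the meaning of `P` is uncertain, as JAA notes. A code with no dash is treated as a bare blog ID, which is an assumption made here to resolve codes that happen to start with one of the prefix letters.

```python
import re
from typing import NamedTuple, Optional

# Prefix letters as described in the log; "P" is a guess ("page of a post").
KINDS = {"s": "slug", "P": "page", "p": "post", "a": "attachment"}


class ShortCode(NamedTuple):
    kind: str            # "blog", "slug", "page", "post", or "attachment"
    blog: str            # blog ID, still base62-encoded
    item: Optional[str]  # base62 post ID, or custom slug for "s"; None for a bare blog ID


# prefix letter + encoded blog ID + dash + post ID or slug
_PREFIXED = re.compile(r"([sPpa])([0-9A-Za-z]+)-(.+)")
# bare blog ID in base62
_BLOG = re.compile(r"[0-9A-Za-z]+")


def parse_shortcode(code: str) -> ShortCode:
    """Split the part after http://wp.me/ into its components."""
    m = _PREFIXED.fullmatch(code)
    if m:
        return ShortCode(KINDS[m.group(1)], m.group(2), m.group(3))
    if _BLOG.fullmatch(code):
        # Assumption: no dash means the whole code is a blog ID.
        return ShortCode("blog", code, None)
    raise ValueError(f"unrecognised wp.me shortcode: {code!r}")
```

For the example above, `parse_shortcode("sf2B5-shorten")` yields kind `slug`, blog `f2B5`, and item `shorten`.
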
18:14 🔗 Somebody2 Hm.
18:15 🔗 Somebody2 More examples, please?
18:16 🔗 JAA https://wp.me/a92Te1-q is one which appeared in ArchiveBot yesterday and caused me to hunt all of this down.
18:16 🔗 Somebody2 Also, does it work with HEAD requests, or does it require GET requests?
18:16 🔗 JAA HEAD works fine.
18:17 🔗 Somebody2 Cool -- note that there's an option in the settings for that, which defaults to HEAD
18:17 🔗 JAA Yeah, saw that
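
A rough way to check by hand what the shortener returns for a HEAD request, assuming the behaviour JAA reports (301 on success, 404 on failure). This is a sketch using Python's standard `http.client`, which does not follow redirects, and it is not how the tracker or the warriors implement the check. The helper names `classify` and `head_check` are hypothetical.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def classify(status: int) -> str:
    """Map a status code to an outcome, per the behaviour reported in the log."""
    if status == 301:
        return "found"
    if status == 404:
        return "missing"
    return "unexpected"


def head_check(url: str, timeout: float = 10.0) -> Tuple[str, Optional[str]]:
    """Send one HEAD request and return (outcome, Location header or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        resp = conn.getresponse()
        return classify(resp.status), resp.getheader("Location")
    finally:
        conn.close()
```

Calling `head_check("http://wp.me/a92Te1-q")` would return the outcome together with the redirect target, if the shortener still behaves as described.
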
18:18 🔗 JAA I don't have more examples right now, but you can find them on any blog hosted at wordpress.com in the HTML in a <link> tag.
18:18 🔗 Somebody2 So, to get the blog ID ones, can we just iterate through a-zA-Z0-9?
18:18 🔗 JAA 0-9a-zA-Z is the order used in the WordPress plugin, but yeah.
18:19 🔗 Somebody2 Ah, better to change the order, then
18:19 🔗 Somebody2 You can do that in the alphabet setting
18:19 🔗 JAA Default order is 0-9a-zA-Z though?
18:19 🔗 Somebody2 Oh, is it?
18:19 🔗 Somebody2 Good. :-)
18:19 🔗 JAA :-)
18:19 🔗 JAA It'll result in some duplicates because it's really just a base62-encoding. E.g. http://wp.me/02 == http://wp.me/2
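
The duplicates follow directly from positional notation: a leading `0` adds nothing to the value. A minimal decoder sketch, assuming the `0-9a-zA-Z` digit order mentioned above (the names `ALPHABET`, `INDEX`, and `base62_decode` are made up for this illustration):

```python
# Digit order 0-9, a-z, A-Z, as discussed in the log.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def base62_decode(code: str) -> int:
    """Decode a base62 string, most significant digit first."""
    value = 0
    for ch in code:
        value = value * 62 + INDEX[ch]  # KeyError on characters outside the alphabet
    return value


# "02" and "2" decode to the same number, so both URLs point at the same blog.
print(base62_decode("02") == base62_decode("2"))
```
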
18:20 🔗 Somebody2 Hm, that's probably OK.
18:20 🔗 Somebody2 OK, so update the "URL template" setting on the Shortener settings page.
18:21 🔗 JAA Guess so. Maybe we can skip 0xxxx later on because that would be quite large and unnecessary.
18:21 🔗 JAA Yup
18:21 🔗 Somebody2 Yes, we can skip over ranges by adjusting where the auto-queue starts from
18:21 🔗 JAA Right
18:21 🔗 Somebody2 Now on the Queue Settings page, change the Maximum number of items setting to something like 10, to start with.
18:22 🔗 Somebody2 You can boost it back up gradually
18:22 🔗 Somebody2 Make sure to check the AutoQueue checkbox.
18:22 🔗 Somebody2 Then check the Enabled checkbox, and we're good to go!
18:22 🔗 JAA Sweet
18:23 🔗 Somebody2 A nice semi-hidden feature is that the current time the page was generated is listed in the upper corner; this is helpful for comparing to timestamps on the page
18:23 🔗 Somebody2 to see if things are still running as expected.
18:23 🔗 Somebody2 A likely nice feature addition would be to enhance all the timestamp displays with relative dates, too
18:24 🔗 Somebody2 and we've got results for wp-me!!
18:24 🔗 JAA Ah yeah, I saw that.
18:24 🔗 JAA \o/
18:24 🔗 JAA I'm not a fan of relative dates. "What is 'about 1 minute ago'? Give me a timestamp!"
18:26 🔗 figpucker has quit IRC (Read error: Connection reset by peer)
18:26 🔗 Somebody2 Oh, I certainly don't want it to *replace* the timestamps; dear god no.
18:26 🔗 JAA Yeah, a setting would be neat.
18:26 🔗 Somebody2 But it's nice to have as an addition.
18:26 🔗 JAA Also, I love that everything's in UTC here. :-)
18:27 🔗 Somebody2 Especially if color coded -- "within a minute", "within an hour", "from a previous day"
18:27 🔗 Somebody2 As that's usually what I'm looking for -- is this project stuck, and how badly?
18:27 🔗 JAA Hm yeah, makes sense.
18:27 🔗 JAA So now you'd slowly increase the queue size until it's either large enough or runs into trouble?
18:28 🔗 Somebody2 yep!
18:28 🔗 Somebody2 I tend to ramp up in units of at least 10, and usually 20
18:28 🔗 Somebody2 You can check the Error Reports page to see "trouble"
18:29 🔗 Somebody2 So we're now grabbing 3 character ones -- they don't *seem* to be base62 encodings of the names...
18:29 🔗 Somebody2 e.g. 12L maps to aehso
18:29 🔗 JAA No, it's the blog ID.
18:29 🔗 JAA I'm not sure if that's exposed anywhere really.
18:29 🔗 Somebody2 Ah, cool
18:30 🔗 Somebody2 well, it's exposed here :-)
18:30 🔗 JAA Oh yeah, it's in the HTML somewhere.
18:30 🔗 JAA E.g. "siteID":"4618" on https://kidnicky2801.wordpress.com/ (1cu)
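
The siteID and the shortcode in that example can be checked against each other by encoding the number in base62 with the `0-9a-zA-Z` digit order. A small sketch (the function name `base62_encode` is hypothetical; this is not the WordPress plugin's code):

```python
# Digit order 0-9, a-z, A-Z, as discussed in the log.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def base62_encode(n: int) -> str:
    """Encode a non-negative integer in base62, without leading zeros."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])  # least significant digit first
    return "".join(reversed(digits))


print(base62_encode(4618))  # prints "1cu"
```
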
18:35 🔗 JAA Queue at 80 now.
18:39 🔗 JAA 100
18:39 🔗 Somebody2 cool, seems fine
18:40 🔗 JAA Is there any way to see how much "capacity" we have?
18:41 🔗 JAA We have a huge list of shorteners to do, and I'd love to throw additional ones in.
18:42 🔗 JAA I still see "no items available currently" errors on my machines, so clearly there's still space for more, but I wonder if there's anything to estimate how *much* more.
18:43 🔗 Somebody2 Please create projects for every single shortener you have the relevant information for
18:43 🔗 Somebody2 Once we've got them in, we can adjust which ones are running at the same time if needed.
18:43 🔗 Somebody2 Our current bottleneck is researching them.
18:43 🔗 JAA Guessed so
18:44 🔗 Somebody2 Generally we have about 100 warriors, each of which can run about 3 different jobs at once.
18:44 🔗 Somebody2 So that's a good ballpark figure for how much capacity we can do simultaneously.
18:45 🔗 Somebody2 But if we can consistently hit that, we can likely recruit more warriors.
18:45 🔗 Somebody2 And if/when there isn't another job running, suddenly our number of warriors jumps up to 200+
18:45 🔗 Somebody2 another non-URLTeam job
18:46 🔗 JAA Makes sense
18:46 🔗 JAA I need to add the shorteners to the wiki page manually, correct?
18:47 🔗 Somebody2 For now, until someone writes code to do it automatically, yeah.
18:48 🔗 Somebody2 I'm going AFK for a bit.
19:07 🔗 JAA I'm cleaning up the wiki page now. Still lists various shorteners as active which were deactivated months ago, e.g. cmplx-it.
19:11 🔗 Somebody2 JAA: thank you!!
19:11 🔗 Somebody2 and we've found over 300,000 wp-me results
19:12 🔗 JAA By the way, what's the matter with go-usa-gov? Did anything happen since treyo was here?
19:12 🔗 Somebody2 JAA: a couple of days later, it seemed to be blocking us, IIRC.
19:13 🔗 Somebody2 have they posted a dump yet?
19:14 🔗 * JAA shrugs
21:02 🔗 dashcloud has quit IRC (Remote host closed the connection)
21:03 🔗 dashcloud has joined #urlteam
22:21 🔗 JAA 2M wp-me scanned, 1.85M found. :-)
22:36 🔗 Somebody2 Yay!
23:45 🔗 dashcloud has quit IRC (Read error: Operation timed out)
23:48 🔗 dashcloud has joined #urlteam
