[00:13] *** Start has joined #urlteam
[00:37] *** WinterFox has joined #urlteam
[01:57] Is it worth attempting to map "private" URL shorteners for the project?
[01:58] I just noticed a popular Japanese goods website, JList, uses jli.st as a short URL in all of their public social media posts in order to link to their website, jlist.com
[01:58] but is it worth archiving if the links are all made by jlist for their social media accounts?
[02:01] upon further inspection, it appears to be utilizing bit.ly as a backend
[02:02] for example, http://jli.st/1XU5gSE works (it was a URL generated by them)
[02:02] but if you try something random like http://jli.st/sugoi it redirects to a bit.ly error page
[02:09] *** cechk01 has quit IRC (Read error: Connection reset by peer)
[02:48] *** W1nterFox has joined #urlteam
[02:53] *** WinterFox has quit IRC (Read error: Operation timed out)
[03:29] *** JesseW has joined #urlteam
[03:30] *** svchfoo1 sets mode: +o JesseW
[03:42] *** bwn has quit IRC (Ping timeout: 606 seconds)
[03:45] !igset 1cm1ynbk9mm2wnd0o67e3k74p forums
[03:45] oops, wrong channel
[04:09] cybersec: please do add such private shorteners to the wiki page (under http://archiveteam.org/index.php?title=URLTeam#.22Official.22_shorteners ) and bit.ly aliases in particular under their own section, http://archiveteam.org/index.php?title=URLTeam#bit.ly_aliases
[04:10] I certainly think of them as lower priority than multi-source shorteners -- but if/when we have spare time/energy, I think it's certainly worth grabbing them, as they are links that can break if the site decides to stop supporting the shortener (or if the site as a whole goes away, they can provide a useful way to archive *it*)
[04:21] chfoo -- thanks for merging my PRs!
[04:46] *** aaaaaaaaa has quit IRC (Leaving)
[04:50] no problem
[04:52] this should let me add a number of shorteners that I couldn't before
[05:25] started da-gd at a very low rate (2 queue) per phuzion's request
[05:26] chfoo -- http://tracker.archiveteam.org:1337/status is giving a 500 error
[05:27] could you let me know what's blowing up?
[05:28] the api/live_stats websocket seems to still work...
[05:29] JesseW: _tt_tmp = project.location_regex # status.html:55 (via index.html:32, base.html:13)
[05:29] dammit, how did my regex miss that one... :-(
[05:30] PR coming asap
[05:32] https://github.com/ArchiveTeam/terroroftinytown/pull/52
[05:34] chfoo: ping
[05:35] i also added you to the urlteam team so you can just commit to develop for changes that don't require too much review
[05:35] ah, cool, thanks
[05:35] I think this is one of those. :-)
[05:37] hm, I don't seem to be able to merge that PR...
[05:38] I'm not listed on https://github.com/orgs/ArchiveTeam/people ...
[05:39] you have to list yourself as public
[05:39] oh, you have to accept the invite first
[05:40] ah, that would be it. :-)
[05:41] and merged
[05:42] so are changes to develop (or master) automatically deployed, or?
[05:45] no
[05:47] ok. let me know when you deploy the fix to /status, then. :-)
[05:59] *** dashcloud has quit IRC (Read error: Operation timed out)
[06:02] *** dashcloud has joined #urlteam
[06:02] *** svchfoo1 sets mode: +o dashcloud
[06:03] 2-gp started, and working!
[07:14] da-gd has it's first result! (after only 10,950 searches)
[07:14] er, its first
[07:15] http://da.gd/102J9
[07:22] starting 2.ly, too.
[07:28] ah, I see /status is fixed
[07:39] *** bwn has joined #urlteam
[07:48] *** Infreq has joined #urlteam
[07:53] * JesseW is now going through the catalog of 301works restricted files, and identifying ones for dead shorteners, which, according to the 301works rules, should be made available.
[07:53] Hopefully we can get IA to do so. :-)
[08:02] *** JesseW has quit IRC (Leaving.)
[10:35] Are we archiving links faster than people make new ones?
[10:43] sometimes for some shorteners, yes it has happened
[14:42] *** W1nterFox has quit IRC (Remote host closed the connection)
[16:05] *** Start has quit IRC (Quit: Disconnected.)
[17:13] *** JesseW has joined #urlteam
[17:13] *** svchfoo1 sets mode: +o JesseW
[17:26] *** Start has joined #urlteam
[17:28] WinterFox -- yeah, for the smaller incremental ones, I found it took us a few hours to catch up with all the shorturls added in a year.
[17:29] For the non-incremental ones, it's harder, as we'd have to re-check the whole possibility space, which (assuming 4 or 5 characters) does take a few months.
[17:30] *** JesseW has quit IRC (Leaving.)
[18:19] *** asdf has joined #urlteam
[18:38] *** Start has quit IRC (Quit: Disconnected.)
[18:44] *** Start has joined #urlteam
[18:52] *** Start has quit IRC (Ping timeout: 252 seconds)
[19:08] *** Start has joined #urlteam
[19:15] *** Start has quit IRC (Quit: Disconnected.)
[19:22] *** Start has joined #urlteam
[20:16] *** aaaaaaaaa has joined #urlteam
[20:16] *** swebb sets mode: +o aaaaaaaaa
[20:44] *** Start has quit IRC (Quit: Disconnected.)
[20:47] I've got another url shortener to crawl
[20:47] Oh, it appears to be on the wiki already, never mind
[20:48] *** Start has joined #urlteam
[20:52] *** bwn has quit IRC (Read error: Operation timed out)
[21:08] phuzion: which one?
[21:08] JW_work: nig.gr
[21:09] Oh, yeah, we've grabbed that multiple times.
[21:10] JW_work: When's the last time it was crawled?
[21:11] check the wiki page
[21:12] Ah, right, apparently this month according to the wiki. Ok, cool.
[21:13] btw, can I increase the load on da-gd? say, up to 10qps = 20 item queue?
[21:13] have you heard any complaints from your friend?
[21:16] I asked, but he didn't respond. I'd say speed it up a bit, and I'll let you know if he hollers
[21:17] ok, I'll boost it to 20 item queue tonight
[21:20] ok, if he shouts, I'll let you or someone else know
[21:21] me, arkiver or chfoo
[21:21] got it
[21:26] *** bwn has joined #urlteam
[21:55] Is there a relatively easy method to test that a URL shortener is easily crawlable before submitting it to the wiki?
[21:57] Please add them to the wiki no matter how easy or hard they are to crawl -- I want to make that wiki page as complete a directory of ALL THE SHORTENERS as we can get.
[21:57] Fair enough.
[21:57] That said, here's what I do to research shorteners:
[22:00] 1) Check what the homepage looks like -- is there an easy way to create a shorturl yourself? If so, make one, and include it on the wiki entry. If not, note that.
[22:00] 2) Do a web search for examples of the shortener's short URLs. If you find one, include it.
[22:01] 3) If you've found an example, run `curl -I http://the.short.url.you.found` and look at what HTTP status code it returns (and if it gives the destination in the Location: header).
[22:02] 4) Do `curl -I` on a nonexistent shortcode, and record what that HTTP status code is.
[22:02] Also, when looking at the homepage, if it happens to mention how many total shorturls they have, note that down.
[22:03] That's about it.
[22:03] Awesome. Thanks. Want me to document that on the wiki page?
[22:03] Yes please!
[22:05] another good idea is to check which IP addresses it resolves to. Lets you know if they just have a different domain name for an existing shortener.
[22:06] Ah, good point. Please mention that, too.
[22:07] But even if it's the same IP, it may be a different shortener -- it's good to check a few shortcodes and make sure they are consistent, too.
[22:16] *** Start has quit IRC (Quit: Disconnected.)
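Editorial sketch, not part of the channel log: the research steps listed at 22:00 to 22:07 (a `curl -I` on a known short URL, a `curl -I` on a made-up shortcode, and a look at which IP addresses the domain resolves to) could be scripted roughly as below. This is a minimal sketch using only the Python standard library. The function names `head`, `describe`, `resolve_ips` and `research` are hypothetical, chosen for illustration, and the grouping of status codes in `describe` is an assumption rather than anything the channel agreed on.

```python
import http.client
import socket
from urllib.parse import urlsplit


def head(url, timeout=10):
    """Send one HEAD request (like `curl -I`) without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def describe(status, location):
    """Summarise a response for a wiki note (the grouping is an assumption)."""
    if status in (301, 302, 303, 307, 308) and location:
        return f"{status} redirect to {location}"
    if status in (404, 410):
        return f"{status} not found"
    return f"{status} other (inspect by hand)"


def resolve_ips(hostname):
    """List the IP addresses a shortener's domain resolves to."""
    infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def research(known_url, bogus_url):
    """Run steps 3 and 4 plus the IP check; needs network access."""
    host = urlsplit(known_url).hostname
    return {
        "ips": resolve_ips(host),
        "known": describe(*head(known_url)),
        "bogus": describe(*head(bogus_url)),
    }
```

For the jli.st case discussed earlier in the log, a call would look like `research("http://jli.st/1XU5gSE", "http://jli.st/sugoi")`. Since that log notes the made-up code redirects to a bit.ly error page, a redirect for the bogus URL does not by itself mean the shortcode exists, so the `Location` value still has to be read by a person.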
[22:23] In regards to our work on da.gd: "I haven't noticed anything too crazy"
[22:31] *** cechk01 has joined #urlteam
[22:34] good
[22:37] You haven't cranked up the speed on it yet, have you?
[22:39] nope
[22:41] ok, was just wondering.
[22:46] JW_work: I've documented the basics, if there's anything else on there that I should add, let me know. http://archiveteam.org/index.php?title=URLTeam#Researching_URL_Shorteners
[22:47] a working shorturl is *not* required
[22:48] I'm glad for people to add a domain name they /think/ might at some time in the past have given out shorturls, even if they can't find one.
[22:49] otherwise, it looks great!
[22:49] thanks very much for writing it up onto the wiki
[22:50] you might also mention keeping the Alive list alphabetized, and including the current date
[22:50] (which you can do by putting 5 tildes, ~~~~~)
[22:55] JW_work: http://archiveteam.org/index.php?title=URLTeam&diff=24930&oldid=24929
[22:56] * phuzion heads out
[22:56] looks good, thanks again
[23:22] *** Start has joined #urlteam
[23:22] *** WinterFox has joined #urlteam
[23:37] *** bwn_ has joined #urlteam
[23:42] *** Fusl has quit IRC (Ping timeout: 255 seconds)
[23:42] *** Fusl has joined #urlteam
[23:44] *** bwn has quit IRC (Read error: Operation timed out)
[23:48] *** svchfoo1 has quit IRC (Ping timeout: 369 seconds)
[23:54] *** joepie91 has quit IRC (Ping timeout: 369 seconds)
[23:56] *** W1nterFox has joined #urlteam
[23:56] *** joepie91 has joined #urlteam
[23:56] *** svchfoo3 sets mode: +o joepie91
[23:56] *** WinterFox has quit IRC (Ping timeout: 1221 seconds)