#urlteam 2015-12-15,Tue


Time Nickname Message
00:13 🔗 Start has joined #urlteam
00:37 🔗 WinterFox has joined #urlteam
01:57 🔗 cybersec Is it worth attempting to map "private" URL shorteners for the project?
01:58 🔗 cybersec I just noticed a popular Japanese goods website, JList, uses jli.st as a short URL in all of their public social media posts in order to link to their website, jlist.com
01:58 🔗 cybersec but is it worth archiving if the links are all made by jlist for their social media accounts?
02:01 🔗 cybersec upon further inspection, it appears to be utilizing bit.ly as a backend
02:02 🔗 cybersec for example, http://jli.st/1XU5gSE works (it was a URL generated by them)
02:02 🔗 cybersec but if you try something random like http://jli.st/sugoi it redirects to a bit.ly error page
02:09 🔗 cechk01 has quit IRC (Read error: Connection reset by peer)
02:48 🔗 W1nterFox has joined #urlteam
02:53 🔗 WinterFox has quit IRC (Read error: Operation timed out)
03:29 🔗 JesseW has joined #urlteam
03:30 🔗 svchfoo1 sets mode: +o JesseW
03:42 🔗 bwn has quit IRC (Ping timeout: 606 seconds)
03:45 🔗 JesseW !igset 1cm1ynbk9mm2wnd0o67e3k74p forums
03:45 🔗 JesseW oops, wrong channel
04:09 🔗 JesseW cybersec: please do add such private shorteners to the wiki page (under http://archiveteam.org/index.php?title=URLTeam#.22Official.22_shorteners ) and bit.ly aliases in particular under their own section, http://archiveteam.org/index.php?title=URLTeam#bit.ly_aliases
04:10 🔗 JesseW I certainly think of them as lower priority than multi-source shorteners -- but if/when we have spare time/energy, I think it's certainly worth grabbing them, as they are links that can break if the site decides to stop supporting the shortener (or if the site as a whole goes away, they can provide a useful way to archive *it*)
04:21 🔗 JesseW chfoo -- thanks for merging my PRs!
04:46 🔗 aaaaaaaaa has quit IRC (Leaving)
04:50 🔗 chfoo no problem
04:52 🔗 JesseW this should let me add a number of shorteners that I couldn't before
05:25 🔗 JesseW started da-gd at a very low rate (2 queue) per phuzion's request
05:26 🔗 JesseW chfoo -- http://tracker.archiveteam.org:1337/status is giving a 500 error
05:27 🔗 JesseW could you let me know what's blowing up?
05:28 🔗 JesseW the api/live_stats websocket seems to still work...
05:29 🔗 chfoo JesseW: _tt_tmp = project.location_regex # status.html:55 (via index.html:32, base.html:13)
05:29 🔗 JesseW dammit, how did my regex miss that one... :-(
05:30 🔗 JesseW PR coming asap
05:32 🔗 JesseW https://github.com/ArchiveTeam/terroroftinytown/pull/52
05:34 🔗 JesseW chfoo: ping
05:35 🔗 chfoo i also added you to the urlteam team so you can just commit to develop for changes that don't require too much review
05:35 🔗 JesseW ah, cool, thanks
05:35 🔗 JesseW I think this is one of those. :-)
05:37 🔗 JesseW hm, I don't seem to be able to merge that PR...
05:38 🔗 JesseW I'm not listed on https://github.com/orgs/ArchiveTeam/people ...
05:39 🔗 chfoo you have to list yourself as public
05:39 🔗 chfoo oh, you have to accept the invite first
05:40 🔗 JesseW ah, that would be it. :-)
05:41 🔗 JesseW and merged
05:42 🔗 JesseW so are changes to develop (or master) automatically deployed, or?
05:45 🔗 chfoo no
05:47 🔗 JesseW ok. let me know when you deploy the fix to /status, then. :-)
05:59 🔗 dashcloud has quit IRC (Read error: Operation timed out)
06:02 🔗 dashcloud has joined #urlteam
06:02 🔗 svchfoo1 sets mode: +o dashcloud
06:03 🔗 JesseW 2-gp started, and working!
07:14 🔗 JesseW da-gd has it's first result! (after only 10,950 searches)
07:14 🔗 JesseW er, its first
07:15 🔗 JesseW http://da.gd/102J9
07:22 🔗 JesseW starting 2.ly, too.
07:28 🔗 JesseW ah, I see /status is fixed
07:39 🔗 bwn has joined #urlteam
07:48 🔗 Infreq has joined #urlteam
07:53 🔗 * JesseW is now going through the catalog of 301works restricted files, and identifying ones for dead shorteners, which, according to the 301works rules, should be made available.
07:53 🔗 JesseW Hopefully we can get IA to do so. :-)
08:02 🔗 JesseW has quit IRC (Leaving.)
10:35 🔗 W1nterFox Are we archiving links faster than people make new ones?
10:43 🔗 ersi sometimes for some shorteners, yes it has happened
14:42 🔗 W1nterFox has quit IRC (Remote host closed the connection)
16:05 🔗 Start has quit IRC (Quit: Disconnected.)
17:13 🔗 JesseW has joined #urlteam
17:13 🔗 svchfoo1 sets mode: +o JesseW
17:26 🔗 Start has joined #urlteam
17:28 🔗 JesseW WinterFox -- yeah, for the smaller incremental ones, I found it took us a few hours to catch up with all the shorturls added in a year.
17:29 🔗 JesseW For the non-incremental ones, it's harder, as we'd have to re-check the whole possibility space, which (assuming 4 or 5 characters) does take a few months.
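The "whole possibility space" for non-incremental shorteners is easy to size. The sketch below assumes shortcodes drawn from the 62 characters [0-9A-Za-z] (an assumption; some shorteners use smaller or larger alphabets), and the request rates are hypothetical aggregate figures, not the tracker's actual settings.

```python
def keyspace_size(alphabet_size: int, min_len: int, max_len: int) -> int:
    """Count every possible shortcode of length min_len..max_len."""
    return sum(alphabet_size ** n for n in range(min_len, max_len + 1))


def days_to_scan(total_codes: int, requests_per_second: float) -> float:
    """Days needed to probe every code once at a steady aggregate rate."""
    return total_codes / requests_per_second / 86400  # 86400 seconds per day


# Assumption: codes use [0-9A-Za-z], i.e. 62 symbols, 4 or 5 characters long.
ALPHABET = 62
total = keyspace_size(ALPHABET, 4, 5)
for rate in (10, 100):  # hypothetical aggregate request rates
    print(f"{total:,} codes at {rate} req/s: {days_to_scan(total, rate):.0f} days")
```

Under these assumptions there are 930,909,168 codes, so an aggregate of around 100 requests per second puts one full pass at roughly 108 days, which is in line with the "few months" estimate above.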
17:30 🔗 JesseW has quit IRC (Leaving.)
18:19 🔗 asdf has joined #urlteam
18:38 🔗 Start has quit IRC (Quit: Disconnected.)
18:44 🔗 Start has joined #urlteam
18:52 🔗 Start has quit IRC (Ping timeout: 252 seconds)
19:08 🔗 Start has joined #urlteam
19:15 🔗 Start has quit IRC (Quit: Disconnected.)
19:22 🔗 Start has joined #urlteam
20:16 🔗 aaaaaaaaa has joined #urlteam
20:16 🔗 swebb sets mode: +o aaaaaaaaa
20:44 🔗 Start has quit IRC (Quit: Disconnected.)
20:47 🔗 phuzion I've got another url shortener to crawl
20:47 🔗 phuzion Oh, it appears to be on the wiki already, nevermind
20:48 🔗 Start has joined #urlteam
20:52 🔗 bwn has quit IRC (Read error: Operation timed out)
21:08 🔗 JW_work phuzion: which one?
21:08 🔗 phuzion JW_work: nig.gr
21:09 🔗 JW_work Oh, yeah, we've grabbed that multiple times.
21:10 🔗 phuzion JW_work: When's the last time it was crawled?
21:11 🔗 JW_work check the wiki page
21:12 🔗 phuzion Ah, right, apparently this month according to the wiki. Ok, cool.
21:13 🔗 JW_work btw, can I increase the load on da-gd? say, up to 10qps = 20 item queue?
21:13 🔗 JW_work have you heard any complaints from your friend?
21:16 🔗 phuzion I asked, but he didn't respond. I'd say speed it up a bit, and I'll let you know if he hollers
21:17 🔗 JW_work ok, I'll boost it to 20 item queue tonight
21:20 🔗 phuzion ok, if he shouts, I'll let you or someone else know
21:21 🔗 JW_work me, arkiver or chfoo
21:21 🔗 phuzion got it
21:26 🔗 bwn has joined #urlteam
21:55 🔗 phuzion Is there a relatively easy method to test that a URL shortener is easily crawlable before submitting it to the wiki?
21:57 🔗 JW_work Please add them to the wiki no matter how easy or hard they are to crawl — I want to make that wiki page as complete a directory of ALL THE SHORTENERS as we can get.
21:57 🔗 phuzion Fair enough.
21:57 🔗 JW_work That said, here's what I do to research shorteners:
22:00 🔗 JW_work 1) Check what the homepage looks like — is there an easy way to create a shorturl yourself? If so, make one, and include it on the wiki entry. If not, note that.
22:00 🔗 JW_work 2) Do a web search for examples of the shortener's short URLs. If you find one, include it.
22:01 🔗 JW_work 3) If you've found an example, run `curl -I http://the.short.url.you.found` and look at what HTTP status code it returns (and if it gives the destination in the Location: header).
22:02 🔗 JW_work 4) Do `curl -I` on a non-existent shortcode, and record what HTTP status code it returns.
22:02 🔗 JW_work Also, when looking at the homepage, if it happens to mention how many total shorturls they have, note that down.
22:03 🔗 JW_work That's about it.
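Steps 3 and 4 can be scripted. This is a minimal sketch using only the Python standard library: `head_no_redirect` sends one HEAD request, much like `curl -I`, and since `http.client` never follows redirects, the status code and any Location: header are reported as-is. The helper names and the `wiki_note` output format are hypothetical, not an existing URLTeam tool.

```python
import http.client
from urllib.parse import urlsplit


def head_no_redirect(url: str, timeout: float = 10.0):
    """Send a single HEAD request and return (status, Location header or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        # A redirect is not followed, so its destination stays visible here.
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def wiki_note(label: str, status: int, location) -> str:
    """Format one probe result as a line for a wiki entry (hypothetical format)."""
    target = f" -> {location}" if location else " (no Location header)"
    return f"{label}: HTTP {status}{target}"


# Hypothetical usage (makes real network requests):
# for label, url in [("known", "http://da.gd/102J9"),
#                    ("missing", "http://da.gd/thiscodeshouldnotexist")]:
#     print(wiki_note(label, *head_no_redirect(url)))
```

Probing a known-good code and a made-up one side by side shows whether the shortener answers missing codes with a 404, a redirect to an error page, or something else.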
22:03 🔗 phuzion Awesome. Thanks. Want me to document that on the wiki page?
22:03 🔗 JW_work Yes please!
22:05 🔗 aaaaaaaaa another good idea is to check which IP addresses it resolves to. It lets you know if they just have a different domain name for an existing shortener.
22:06 🔗 JW_work Ah, good point. Please mention that, too.
22:07 🔗 JW_work But even if it's the same IP, it may be a different shortener — it's good to check a few shortcodes and make sure they are consistent, too.
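A minimal sketch of the IP check, using the standard library's `socket.getaddrinfo`. The function names are hypothetical. As noted above, a shared address is only a hint: many unrelated sites sit behind the same hosting or CDN IPs, so the result should be followed up by comparing how both domains handle a few of the same shortcodes.

```python
import socket


def resolve_ips(host: str) -> set:
    """Return every IPv4/IPv6 address the host name currently resolves to."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return {sockaddr[0] for _family, _type, _proto, _canon, sockaddr in infos}


def shared_ips(host_a: str, host_b: str) -> set:
    """Addresses two shortener domains have in common (a hint, not proof)."""
    return resolve_ips(host_a) & resolve_ips(host_b)


# Hypothetical usage (performs DNS lookups):
# print(shared_ips("jli.st", "bit.ly"))
```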
22:16 🔗 Start has quit IRC (Quit: Disconnected.)
22:23 🔗 phuzion In regards to our work on da.gd: "I haven't noticed anything too crazy"
22:31 🔗 cechk01 has joined #urlteam
22:34 🔗 JW_work good
22:37 🔗 phuzion You didn't crank up the speed on it yet, have you?
22:39 🔗 JW_work nope
22:41 🔗 phuzion ok, was just wondering.
22:46 🔗 phuzion JW_work: I've documented the basics, if there's anything else on there that I should add, let me know. http://archiveteam.org/index.php?title=URLTeam#Researching_URL_Shorteners
22:47 🔗 JW_work a working shorturl is *not* required
22:48 🔗 JW_work I'm glad for people to add a domain name they /think/ might at some time in the past have given out shorturls, even if they can't find one.
22:49 🔗 JW_work otherwise, it looks great!
22:49 🔗 JW_work thanks very much for writing it up onto the wiki
22:50 🔗 JW_work you might also mention keeping the Alive list alphabetized, and including the current date
22:50 🔗 JW_work (which you can do by putting 5 tildes, ~~~~~)
22:55 🔗 phuzion JW_work: http://archiveteam.org/index.php?title=URLTeam&diff=24930&oldid=24929
22:56 🔗 * phuzion heads out
22:56 🔗 JW_work looks good, thanks again
23:22 🔗 Start has joined #urlteam
23:22 🔗 WinterFox has joined #urlteam
23:37 🔗 bwn_ has joined #urlteam
23:42 🔗 Fusl has quit IRC (Ping timeout: 255 seconds)
23:42 🔗 Fusl has joined #urlteam
23:44 🔗 bwn has quit IRC (Read error: Operation timed out)
23:48 🔗 svchfoo1 has quit IRC (Ping timeout: 369 seconds)
23:54 🔗 joepie91 has quit IRC (Ping timeout: 369 seconds)
23:56 🔗 W1nterFox has joined #urlteam
23:56 🔗 joepie91 has joined #urlteam
23:56 🔗 svchfoo3 sets mode: +o joepie91
23:56 🔗 WinterFox has quit IRC (Ping timeout: 1221 seconds)