| Time |
Nickname |
Message |
|
00:13
🔗
|
|
Start has joined #urlteam |
|
00:37
🔗
|
|
WinterFox has joined #urlteam |
|
01:57
🔗
|
cybersec |
Is it worth attempting to map "private" URL shorteners for the project? |
|
01:58
🔗
|
cybersec |
I just noticed a popular Japanese goods website, JList, uses jli.st as a short URL in all of their public social media posts in order to link to their website, jlist.com |
|
01:58
🔗
|
cybersec |
but is it worth archiving if the links are all made by jlist for their social media accounts? |
|
02:01
🔗
|
cybersec |
upon further inspection, it appears to be utilizing bit.ly as a backend |
|
02:02
🔗
|
cybersec |
for example, http://jli.st/1XU5gSE works (it was a URL generated by them) |
|
02:02
🔗
|
cybersec |
but if you try something random like http://jli.st/sugoi it redirects to a bit.ly error page |
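A rough sketch of the check described above: given the Location header returned for a made-up shortcode, see whether it points at a known backend shortener. The function name `redirect_backend`, the `KNOWN_BACKENDS` tuple, and the sample Location value are illustrative assumptions, not part of any URLTeam tooling.

```python
from urllib.parse import urlsplit

# Hypothetical list of backend shorteners to recognise; extend as needed.
KNOWN_BACKENDS = ("bit.ly",)


def redirect_backend(location, backends=KNOWN_BACKENDS):
    """Return the backend whose host the Location header points at, or None."""
    host = (urlsplit(location).hostname or "").lower()
    for backend in backends:
        # Match the backend itself or any subdomain of it.
        if host == backend or host.endswith("." + backend):
            return backend
    return None


if __name__ == "__main__":
    # Hypothetical Location value seen when requesting an unknown shortcode.
    print(redirect_backend("https://bit.ly/some-error-page"))
```

A match on a made-up shortcode only hints that the domain is an alias; valid shortcodes should be checked as well.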
|
02:09
🔗
|
|
cechk01 has quit IRC (Read error: Connection reset by peer) |
|
02:48
🔗
|
|
W1nterFox has joined #urlteam |
|
02:53
🔗
|
|
WinterFox has quit IRC (Read error: Operation timed out) |
|
03:29
🔗
|
|
JesseW has joined #urlteam |
|
03:30
🔗
|
|
svchfoo1 sets mode: +o JesseW |
|
03:42
🔗
|
|
bwn has quit IRC (Ping timeout: 606 seconds) |
|
03:45
🔗
|
JesseW |
!igset 1cm1ynbk9mm2wnd0o67e3k74p forums |
|
03:45
🔗
|
JesseW |
oops, wrong channel |
|
04:09
🔗
|
JesseW |
cybersec: please do add such private shorteners to the wiki page (under http://archiveteam.org/index.php?title=URLTeam#.22Official.22_shorteners ) and bit.ly aliases in particular under their own section, http://archiveteam.org/index.php?title=URLTeam#bit.ly_aliases |
|
04:10
🔗
|
JesseW |
I certainly think of them as lower priority than multi-source shorteners -- but if/when we have spare time/energy, I think it's certainly worth grabbing them, as they are links that can break if the site decides to stop supporting the shortener (or if the site as a whole goes away, they can provide a useful way to archive *it*) |
|
04:21
🔗
|
JesseW |
chfoo -- thanks for merging my PRs! |
|
04:46
🔗
|
|
aaaaaaaaa has quit IRC (Leaving) |
|
04:50
🔗
|
chfoo |
no problem |
|
04:52
🔗
|
JesseW |
this should let me add a number of shorteners that I couldn't before |
|
05:25
🔗
|
JesseW |
started da-gd at a very low rate (2 queue) per phuzion's request |
|
05:26
🔗
|
JesseW |
chfoo -- http://tracker.archiveteam.org:1337/status is giving a 500 error |
|
05:27
🔗
|
JesseW |
could you let me know what's blowing up? |
|
05:28
🔗
|
JesseW |
the api/live_stats websocket seems to still work... |
|
05:29
🔗
|
chfoo |
JesseW: _tt_tmp = project.location_regex # status.html:55 (via index.html:32, base.html:13) |
|
05:29
🔗
|
JesseW |
dammit, how did my regex miss that one... :-( |
|
05:30
🔗
|
JesseW |
PR coming asap |
|
05:32
🔗
|
JesseW |
https://github.com/ArchiveTeam/terroroftinytown/pull/52 |
|
05:34
🔗
|
JesseW |
chfoo: ping |
|
05:35
🔗
|
chfoo |
i also added you to the urlteam team so you can just commit to develop for changes that don't require too much review |
|
05:35
🔗
|
JesseW |
ah, cool, thanks |
|
05:35
🔗
|
JesseW |
I think this is one of those. :-) |
|
05:37
🔗
|
JesseW |
hm, I don't seem to be able to merge that PR... |
|
05:38
🔗
|
JesseW |
I'm not listed on https://github.com/orgs/ArchiveTeam/people ... |
|
05:39
🔗
|
chfoo |
you have to list yourself as public |
|
05:39
🔗
|
chfoo |
oh, you have to accept the invite first |
|
05:40
🔗
|
JesseW |
ah, that would be it. :-) |
|
05:41
🔗
|
JesseW |
and merged |
|
05:42
🔗
|
JesseW |
so are changes to develop (or master) automatically deployed, or? |
|
05:45
🔗
|
chfoo |
no |
|
05:47
🔗
|
JesseW |
ok. let me know when you deploy the fix to /status, then. :-) |
|
05:59
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
06:02
🔗
|
|
dashcloud has joined #urlteam |
|
06:02
🔗
|
|
svchfoo1 sets mode: +o dashcloud |
|
06:03
🔗
|
JesseW |
2-gp started, and working! |
|
07:14
🔗
|
JesseW |
da-gd has it's first result! (after only 10,950 searches) |
|
07:14
🔗
|
JesseW |
er, its first |
|
07:15
🔗
|
JesseW |
http://da.gd/102J9 |
|
07:22
🔗
|
JesseW |
starting 2.ly, too. |
|
07:28
🔗
|
JesseW |
ah, I see /status is fixed |
|
07:39
🔗
|
|
bwn has joined #urlteam |
|
07:48
🔗
|
|
Infreq has joined #urlteam |
|
07:53
🔗
|
* |
JesseW is now going through the catalog of 301works restricted files, and identifying ones for dead shorteners, which, according to the 301works rules, should be made available. |
|
07:53
🔗
|
JesseW |
Hopefully we can get IA to do so. :-) |
|
08:02
🔗
|
|
JesseW has quit IRC (Leaving.) |
|
10:35
🔗
|
W1nterFox |
Are we archiving links faster than people make new ones? |
|
10:43
🔗
|
ersi |
sometimes for some shorteners, yes it has happened |
|
14:42
🔗
|
|
W1nterFox has quit IRC (Remote host closed the connection) |
|
16:05
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
|
17:13
🔗
|
|
JesseW has joined #urlteam |
|
17:13
🔗
|
|
svchfoo1 sets mode: +o JesseW |
|
17:26
🔗
|
|
Start has joined #urlteam |
|
17:28
🔗
|
JesseW |
WinterFox -- yeah, for the smaller incremental ones, I found it took us a few hours to catch up with all the shorturls added in a year. |
|
17:29
🔗
|
JesseW |
For the non-incremental ones, it's harder, as we'd have to re-check the whole possibility space, which (assuming 4 or 5 characters) does take a few months. |
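A back-of-the-envelope sketch of the estimate above. The 36-character alphabet (lowercase letters plus digits) and the rate of 10 queries per second are assumptions chosen for illustration; real shorteners use different alphabets and tolerate different rates.

```python
SECONDS_PER_DAY = 86400


def keyspace_size(alphabet_size, lengths):
    """Count every possible shortcode for the given code lengths."""
    return sum(alphabet_size ** n for n in lengths)


def days_to_scan(total_codes, queries_per_second):
    """Days needed to request every shortcode once at a steady rate."""
    return total_codes / queries_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # Assumed: 36 symbols, codes of length 4 and 5, 10 queries per second.
    total = keyspace_size(36, (4, 5))
    print(total, "codes,", round(days_to_scan(total, 10), 1), "days")
```

Because the keyspace grows exponentially with code length, each extra character multiplies the scan time by the alphabet size.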
|
17:30
🔗
|
|
JesseW has quit IRC (Leaving.) |
|
18:19
🔗
|
|
asdf has joined #urlteam |
|
18:38
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
|
18:44
🔗
|
|
Start has joined #urlteam |
|
18:52
🔗
|
|
Start has quit IRC (Ping timeout: 252 seconds) |
|
19:08
🔗
|
|
Start has joined #urlteam |
|
19:15
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
|
19:22
🔗
|
|
Start has joined #urlteam |
|
20:16
🔗
|
|
aaaaaaaaa has joined #urlteam |
|
20:16
🔗
|
|
swebb sets mode: +o aaaaaaaaa |
|
20:44
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
|
20:47
🔗
|
phuzion |
I've got another url shortener to crawl |
|
20:47
🔗
|
phuzion |
Oh, it appears to be on the wiki already, never mind |
|
20:48
🔗
|
|
Start has joined #urlteam |
|
20:52
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
|
21:08
🔗
|
JW_work |
phuzion: which one? |
|
21:08
🔗
|
phuzion |
JW_work: nig.gr |
|
21:09
🔗
|
JW_work |
Oh, yeah, we've grabbed that multiple times. |
|
21:10
🔗
|
phuzion |
JW_work: When's the last time it was crawled? |
|
21:11
🔗
|
JW_work |
check the wiki page |
|
21:12
🔗
|
phuzion |
Ah, right, apparently this month according to the wiki. Ok, cool. |
|
21:13
🔗
|
JW_work |
btw, can I increase the load on da-gd? say, up to 10qps = 20 item queue? |
|
21:13
🔗
|
JW_work |
have you heard any complaints from your friend? |
|
21:16
🔗
|
phuzion |
I asked, but he didn't respond. I'd say speed it up a bit, and I'll let you know if he hollers |
|
21:17
🔗
|
JW_work |
ok, I'll boost it to 20 item queue tonight |
|
21:20
🔗
|
phuzion |
ok, if he shouts, I'll let you or someone else know |
|
21:21
🔗
|
JW_work |
me, arkiver or chfoo |
|
21:21
🔗
|
phuzion |
got it |
|
21:26
🔗
|
|
bwn has joined #urlteam |
|
21:55
🔗
|
phuzion |
Is there a relatively easy method to test that a URL shortener is easily crawlable before submitting it to the wiki? |
|
21:57
🔗
|
JW_work |
Please add them to the wiki no matter how easy or hard they are to crawl — I want to make that wiki page as complete a directory of ALL THE SHORTENERS as we can get. |
|
21:57
🔗
|
phuzion |
Fair enough. |
|
21:57
🔗
|
JW_work |
That said, here's what I do to research shorteners: |
|
22:00
🔗
|
JW_work |
1) Check what the homepage looks like — is there an easy way to create a shorturl yourself? If so, make one, and include it on the wiki entry. If not, note that. |
|
22:00
🔗
|
JW_work |
2) Do a web search for examples of the shortener's short URLs. If you find one, include it. |
|
22:01
🔗
|
JW_work |
3) If you've found an example, run `curl -I http://the.short.url.you.found` and look at what HTTP status code it returns (and if it gives the destination in the Location: header). |
|
22:02
🔗
|
JW_work |
4) Do `curl -I` on a nonexistent shortcode, and record what that HTTP status code is. |
|
22:02
🔗
|
JW_work |
Also, when looking at the homepage, if it happens to mention how many total shorturls they have, note that down. |
|
22:03
🔗
|
JW_work |
That's about it. |
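Steps 3 and 4 above can be sketched in Python as an alternative to `curl -I`. This is a minimal sketch, not URLTeam's tooling: the helper names `head` and `describe` are made up for illustration, and the URLs to probe are passed on the command line.

```python
import http.client
import sys
from urllib.parse import urlsplit


def head(url, timeout=10):
    """Send one HEAD request without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def describe(status, location):
    """Format a result line suitable for pasting into a wiki entry."""
    if location:
        return f"{status}, Location: {location}"
    return f"{status}, no Location header"


if __name__ == "__main__":
    # Usage: pass one known shorturl and one made-up shortcode URL.
    for url in sys.argv[1:]:
        print(url, "->", describe(*head(url)))
```

Running it against one known-good shorturl and one made-up shortcode gives the two status codes worth recording.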
|
22:03
🔗
|
phuzion |
Awesome. Thanks. Want me to document that on the wiki page? |
|
22:03
🔗
|
JW_work |
Yes please! |
|
22:05
🔗
|
aaaaaaaaa |
another good idea is to check which IP addresses it resolves to. Lets you know if they just have a different domain name for an existing shortener. |
|
22:06
🔗
|
JW_work |
Ah, good point. Please mention that, too. |
|
22:07
🔗
|
JW_work |
But even if it's the same IP, it may be a different shortener — it's good to check a few shortcodes and make sure they are consistent, too. |
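A small sketch of the IP comparison suggested above, using only the standard library. The helper names `resolve_ips` and `shared_ips` are illustrative assumptions, and the hostnames are supplied on the command line.

```python
import socket
import sys


def resolve_ips(hostname):
    """Return the set of IP addresses a hostname currently resolves to."""
    infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the address string is the first element of sockaddr.
    return {info[4][0] for info in infos}


def shared_ips(ips_a, ips_b):
    """Return the addresses two hosts have in common."""
    return set(ips_a) & set(ips_b)


if __name__ == "__main__":
    # Usage: pass two hostnames, e.g. a new shortener and a suspected backend.
    host_a, host_b = sys.argv[1], sys.argv[2]
    common = shared_ips(resolve_ips(host_a), resolve_ips(host_b))
    print(sorted(common) if common else "no shared addresses")
```

As noted above, a shared address is only a hint, since unrelated shorteners can sit behind the same host or CDN, so a few shortcodes should still be compared.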
|
22:16
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
|
22:23
🔗
|
phuzion |
In regards to our work on da.gd: "I haven't noticed anything too crazy" |
|
22:31
🔗
|
|
cechk01 has joined #urlteam |
|
22:34
🔗
|
JW_work |
good |
|
22:37
🔗
|
phuzion |
You haven't cranked up the speed on it yet, have you? |
|
22:39
🔗
|
JW_work |
nope |
|
22:41
🔗
|
phuzion |
ok, was just wondering. |
|
22:46
🔗
|
phuzion |
JW_work: I've documented the basics, if there's anything else on there that I should add, let me know. http://archiveteam.org/index.php?title=URLTeam#Researching_URL_Shorteners |
|
22:47
🔗
|
JW_work |
a working shorturl is *not* required |
|
22:48
🔗
|
JW_work |
I'm glad for people to add a domain name they /think/ might at some time in the past have given out shorturls, even if they can't find one. |
|
22:49
🔗
|
JW_work |
otherwise, it looks great! |
|
22:49
🔗
|
JW_work |
thanks very much for writing it up onto the wiki |
|
22:50
🔗
|
JW_work |
you might also mention keeping the Alive list alphabetized, and including the current date |
|
22:50
🔗
|
JW_work |
(which you can do by putting 5 tildes, ~~~~~) |
|
22:55
🔗
|
phuzion |
JW_work: http://archiveteam.org/index.php?title=URLTeam&diff=24930&oldid=24929 |
|
22:56
🔗
|
* |
phuzion heads out |
|
22:56
🔗
|
JW_work |
looks good, thanks again |
|
23:22
🔗
|
|
Start has joined #urlteam |
|
23:22
🔗
|
|
WinterFox has joined #urlteam |
|
23:37
🔗
|
|
bwn_ has joined #urlteam |
|
23:42
🔗
|
|
Fusl has quit IRC (Ping timeout: 255 seconds) |
|
23:42
🔗
|
|
Fusl has joined #urlteam |
|
23:44
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
|
23:48
🔗
|
|
svchfoo1 has quit IRC (Ping timeout: 369 seconds) |
|
23:54
🔗
|
|
joepie91 has quit IRC (Ping timeout: 369 seconds) |
|
23:56
🔗
|
|
W1nterFox has joined #urlteam |
|
23:56
🔗
|
|
joepie91 has joined #urlteam |
|
23:56
🔗
|
|
svchfoo3 sets mode: +o joepie91 |
|
23:56
🔗
|
|
WinterFox has quit IRC (Ping timeout: 1221 seconds) |