Time |
Nickname |
Message |
00:13
🔗
|
|
Start has joined #urlteam |
00:37
🔗
|
|
WinterFox has joined #urlteam |
01:57
🔗
|
cybersec |
Is it worth attempting to map "private" URL shorteners for the project? |
01:58
🔗
|
cybersec |
I just noticed a popular Japanese goods website, JList, uses jli.st as a short URL in all of their public social media posts in order to link to their website, jlist.com |
01:58
🔗
|
cybersec |
but is it worth archiving if the links are all made by jlist for their social media accounts? |
02:01
🔗
|
cybersec |
upon further inspection, it appears to be utilizing bit.ly as a backend |
02:02
🔗
|
cybersec |
for example, http://jli.st/1XU5gSE works (it was a URL generated by them) |
02:02
🔗
|
cybersec |
but if you try something random like http://jli.st/sugoi it redirects to a bit.ly error page |
02:09
🔗
|
|
cechk01 has quit IRC (Read error: Connection reset by peer) |
02:48
🔗
|
|
W1nterFox has joined #urlteam |
02:53
🔗
|
|
WinterFox has quit IRC (Read error: Operation timed out) |
03:29
🔗
|
|
JesseW has joined #urlteam |
03:30
🔗
|
|
svchfoo1 sets mode: +o JesseW |
03:42
🔗
|
|
bwn has quit IRC (Ping timeout: 606 seconds) |
03:45
🔗
|
JesseW |
!igset 1cm1ynbk9mm2wnd0o67e3k74p forums |
03:45
🔗
|
JesseW |
oops, wrong channel |
04:09
🔗
|
JesseW |
cybersec: please do add such private shorteners to the wiki page (under http://archiveteam.org/index.php?title=URLTeam#.22Official.22_shorteners ) and bit.ly aliases in particular under their own section, http://archiveteam.org/index.php?title=URLTeam#bit.ly_aliases |
04:10
🔗
|
JesseW |
I certainly think of them as lower priority than multi-source shorteners -- but if/when we have spare time/energy, I think it's certainly worth grabbing them, as they are links that can break if the site decides to stop supporting the shortener (or if the site as a whole goes away, they can provide a useful way to archive *it*) |
04:21
🔗
|
JesseW |
chfoo -- thanks for merging my PRs! |
04:46
🔗
|
|
aaaaaaaaa has quit IRC (Leaving) |
04:50
🔗
|
chfoo |
no problem |
04:52
🔗
|
JesseW |
this should let me add a number of shorteners that I couldn't before |
05:25
🔗
|
JesseW |
started da-gd at a very low rate (2 queue) per phuzion's request |
05:26
🔗
|
JesseW |
chfoo -- http://tracker.archiveteam.org:1337/status is giving a 500 error |
05:27
🔗
|
JesseW |
could you let me know what's blowing up? |
05:28
🔗
|
JesseW |
the api/live_stats websocket seems to still work... |
05:29
🔗
|
chfoo |
JesseW: _tt_tmp = project.location_regex # status.html:55 (via index.html:32, base.html:13) |
05:29
🔗
|
JesseW |
dammit, how did my regex miss that one... :-( |
05:30
🔗
|
JesseW |
PR coming asap |
05:32
🔗
|
JesseW |
https://github.com/ArchiveTeam/terroroftinytown/pull/52 |
05:34
🔗
|
JesseW |
chfoo: ping |
05:35
🔗
|
chfoo |
i also added you to the urlteam team so you can just commit to develop for changes that don't require too much review |
05:35
🔗
|
JesseW |
ah, cool, thanks |
05:35
🔗
|
JesseW |
I think this is one of those. :-) |
05:37
🔗
|
JesseW |
hm, I don't seem to be able to merge that PR... |
05:38
🔗
|
JesseW |
I'm not listed on https://github.com/orgs/ArchiveTeam/people ... |
05:39
🔗
|
chfoo |
you have to list yourself as public |
05:39
🔗
|
chfoo |
oh, you have to accept the invite first |
05:40
🔗
|
JesseW |
ah, that would be it. :-) |
05:41
🔗
|
JesseW |
and merged |
05:42
🔗
|
JesseW |
so are changes to develop (or master) automatically deployed, or? |
05:45
🔗
|
chfoo |
no |
05:47
🔗
|
JesseW |
ok. let me know when you deploy the fix to /status, then. :-) |
05:59
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
06:02
🔗
|
|
dashcloud has joined #urlteam |
06:02
🔗
|
|
svchfoo1 sets mode: +o dashcloud |
06:03
🔗
|
JesseW |
2-gp started, and working! |
07:14
🔗
|
JesseW |
da-gd has it's first result! (after only 10,950 searches) |
07:14
🔗
|
JesseW |
er, its first |
07:15
🔗
|
JesseW |
http://da.gd/102J9 |
07:22
🔗
|
JesseW |
starting 2.ly, too. |
07:28
🔗
|
JesseW |
ah, I see /status is fixed |
07:39
🔗
|
|
bwn has joined #urlteam |
07:48
🔗
|
|
Infreq has joined #urlteam |
07:53
🔗
|
* |
JesseW is now going through the catalog of 301works restricted files, and identifying ones for dead shorteners, which, according to the 301works rules, should be made available. |
07:53
🔗
|
JesseW |
Hopefully we can get IA to do so. :-) |
08:02
🔗
|
|
JesseW has quit IRC (Leaving.) |
10:35
🔗
|
W1nterFox |
Are we archiving links faster than people make new ones? |
10:43
🔗
|
ersi |
sometimes for some shorteners, yes it has happened |
14:42
🔗
|
|
W1nterFox has quit IRC (Remote host closed the connection) |
16:05
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
17:13
🔗
|
|
JesseW has joined #urlteam |
17:13
🔗
|
|
svchfoo1 sets mode: +o JesseW |
17:26
🔗
|
|
Start has joined #urlteam |
17:28
🔗
|
JesseW |
WinterFox -- yeah, for the smaller incremental ones, I found it took us a few hours to catch up with all the shorturls added in a year. |
17:29
🔗
|
JesseW |
For the non-incremental ones, it's harder, as we'd have to re-check the whole possibility space, which (assuming 4 or 5 characters) does take a few months. |
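The "few months" estimate above can be reproduced with back-of-the-envelope arithmetic. The sketch below is only an illustration: the 36-character alphabet (lowercase letters plus digits) and the 10 queries-per-second rate are assumptions, not figures confirmed for any particular shortener, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 86400


def keyspace_size(alphabet_size, min_len, max_len):
    """Count every possible shortcode with a length in [min_len, max_len]."""
    return sum(alphabet_size ** n for n in range(min_len, max_len + 1))


def days_to_scan(total_codes, queries_per_second):
    """Days needed to request every code once at a constant rate."""
    return total_codes / queries_per_second / SECONDS_PER_DAY


# Assumed values: 36 symbols (a-z, 0-9), shortcodes of 4 or 5 characters,
# and a steady 10 requests per second.
total = keyspace_size(36, 4, 5)
print(f"{total:,} codes, about {days_to_scan(total, 10):.0f} days at 10 qps")
```

Under those assumptions the full space is about 62 million codes, or roughly 72 days of continuous requests. A case-sensitive 62-character alphabet would make the same scan many times longer, which is why re-checking non-incremental shorteners is so much more expensive than catching up on incremental ones.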
17:30
🔗
|
|
JesseW has quit IRC (Leaving.) |
18:19
🔗
|
|
asdf has joined #urlteam |
18:38
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
18:44
🔗
|
|
Start has joined #urlteam |
18:52
🔗
|
|
Start has quit IRC (Ping timeout: 252 seconds) |
19:08
🔗
|
|
Start has joined #urlteam |
19:15
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
19:22
🔗
|
|
Start has joined #urlteam |
20:16
🔗
|
|
aaaaaaaaa has joined #urlteam |
20:16
🔗
|
|
swebb sets mode: +o aaaaaaaaa |
20:44
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
20:47
🔗
|
phuzion |
I've got another url shortener to crawl |
20:47
🔗
|
phuzion |
Oh, it appears to be on the wiki already, never mind |
20:48
🔗
|
|
Start has joined #urlteam |
20:52
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
21:08
🔗
|
JW_work |
phuzion: which one? |
21:08
🔗
|
phuzion |
JW_work: nig.gr |
21:09
🔗
|
JW_work |
Oh, yeah, we've grabbed that multiple times. |
21:10
🔗
|
phuzion |
JW_work: When's the last time it was crawled? |
21:11
🔗
|
JW_work |
check the wiki page |
21:12
🔗
|
phuzion |
Ah, right, apparently this month according to the wiki. Ok, cool. |
21:13
🔗
|
JW_work |
btw, can I increase the load on da-gd? say, up to 10qps = 20 item queue? |
21:13
🔗
|
JW_work |
have you heard any complaints from your friend? |
21:16
🔗
|
phuzion |
I asked, but he didn't respond. I'd say speed it up a bit, and I'll let you know if he hollers |
21:17
🔗
|
JW_work |
ok, I'll boost it to 20 item queue tonight |
21:20
🔗
|
phuzion |
ok, if he shouts, I'll let you or someone else know |
21:21
🔗
|
JW_work |
me, arkiver or chfoo |
21:21
🔗
|
phuzion |
got it |
21:26
🔗
|
|
bwn has joined #urlteam |
21:55
🔗
|
phuzion |
Is there a relatively easy method to test that a URL shortener is easily crawlable before submitting it to the wiki? |
21:57
🔗
|
JW_work |
Please add them to the wiki no matter how easy or hard they are to crawl — I want to make that wiki page as complete a directory of ALL THE SHORTENERS as we can get. |
21:57
🔗
|
phuzion |
Fair enough. |
21:57
🔗
|
JW_work |
That said, here's what I do to research shorteners: |
22:00
🔗
|
JW_work |
1) Check what the homepage looks like — is there an easy way to create a shorturl yourself? If so, make one, and include it on the wiki entry. If not, note that. |
22:00
🔗
|
JW_work |
2) Do a web search for examples of the shortener's short URLs. If you find one, include it. |
22:01
🔗
|
JW_work |
3) If you've found an example, run `curl -I http://the.short.url.you.found` and look at what HTTP status code it returns (and if it gives the destination in the Location: header). |
22:02
🔗
|
JW_work |
4) Do curl -I on a non-existing shortcode, and record what that HTTP status code is. |
22:02
🔗
|
JW_work |
Also, when looking at the homepage, if it happens to mention how many total shorturls they have, note that down. |
22:03
🔗
|
JW_work |
That's about it. |
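Steps 3 and 4 above can also be scripted. The following is a minimal sketch, assuming plain HTTP or HTTPS shorteners that answer HEAD requests; `head` and `describe` are hypothetical helper names, and the code is a rough stand-in for `curl -I`, not the project's own tooling.

```python
import http.client
from urllib.parse import urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def head(url, timeout=10):
    """Send one HEAD request, like `curl -I`, without following redirects.

    Returns (status_code, value of the Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def describe(status, location):
    """Summarize a response in the form worth recording on the wiki."""
    if status in REDIRECT_CODES and location:
        return f"redirect ({status}) to {location}"
    if status in REDIRECT_CODES:
        return f"redirect ({status}) without a Location header"
    return f"no redirect (HTTP {status})"
```

To use it, call `describe(*head(url))` once with a known-good short URL and once with a shortcode that should not exist, then note both results. Some servers treat HEAD differently from GET, so a surprising answer is worth re-checking with a normal request.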
22:03
🔗
|
phuzion |
Awesome. Thanks. Want me to document that on the wiki page? |
22:03
🔗
|
JW_work |
Yes please! |
22:05
🔗
|
aaaaaaaaa |
another good idea is to check which IP addresses it resolves to. Lets you know if they just have a different domain name for an existing shortener. |
22:06
🔗
|
JW_work |
Ah, good point. Please mention that, too. |
22:07
🔗
|
JW_work |
But even if it's the same IP, it may be a different shortener — it's good to check a few shortcodes and make sure they are consistent, too. |
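The IP comparison suggested above might look like the sketch below. It assumes ordinary DNS lookups through the local resolver; `resolve_ips` and `shared_ips` are hypothetical helper names. As noted in the discussion, an overlap is only a hint that two domains may share a backend, so a few shortcodes should still be compared.

```python
import socket


def resolve_ips(hostname):
    """Return the set of IP addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return {info[4][0] for info in infos}


def shared_ips(ips_a, ips_b):
    """Return the addresses that appear in both collections."""
    return set(ips_a) & set(ips_b)
```

For example, `shared_ips(resolve_ips("jli.st"), resolve_ips("bit.ly"))` would show whether the two names currently point at any of the same addresses. Results can vary by location and over time, so an empty set does not rule out a shared backend.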
22:16
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
22:23
🔗
|
phuzion |
In regards to our work on da.gd: "I haven't noticed anything too crazy" |
22:31
🔗
|
|
cechk01 has joined #urlteam |
22:34
🔗
|
JW_work |
good |
22:37
🔗
|
phuzion |
You haven't cranked up the speed on it yet, have you? |
22:39
🔗
|
JW_work |
nope |
22:41
🔗
|
phuzion |
ok, was just wondering. |
22:46
🔗
|
phuzion |
JW_work: I've documented the basics, if there's anything else on there that I should add, let me know. http://archiveteam.org/index.php?title=URLTeam#Researching_URL_Shorteners |
22:47
🔗
|
JW_work |
a working shorturl is *not* required |
22:48
🔗
|
JW_work |
I'm glad for people to add a domain name they /think/ might at some time in the past have given out shorturls, even if they can't find one. |
22:49
🔗
|
JW_work |
otherwise, it looks great! |
22:49
🔗
|
JW_work |
thanks very much for writing it up onto the wiki |
22:50
🔗
|
JW_work |
you might also mention keeping the Alive list alphabetized, and including the current date |
22:50
🔗
|
JW_work |
(which you can do by putting 5 tildes, ~~~~~) |
22:55
🔗
|
phuzion |
JW_work: http://archiveteam.org/index.php?title=URLTeam&diff=24930&oldid=24929 |
22:56
🔗
|
* |
phuzion heads out |
22:56
🔗
|
JW_work |
looks good, thanks again |
23:22
🔗
|
|
Start has joined #urlteam |
23:22
🔗
|
|
WinterFox has joined #urlteam |
23:37
🔗
|
|
bwn_ has joined #urlteam |
23:42
🔗
|
|
Fusl has quit IRC (Ping timeout: 255 seconds) |
23:42
🔗
|
|
Fusl has joined #urlteam |
23:44
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
23:48
🔗
|
|
svchfoo1 has quit IRC (Ping timeout: 369 seconds) |
23:54
🔗
|
|
joepie91 has quit IRC (Ping timeout: 369 seconds) |
23:56
🔗
|
|
W1nterFox has joined #urlteam |
23:56
🔗
|
|
joepie91 has joined #urlteam |
23:56
🔗
|
|
svchfoo3 sets mode: +o joepie91 |
23:56
🔗
|
|
WinterFox has quit IRC (Ping timeout: 1221 seconds) |