Time |
Nickname |
Message |
00:48
🔗
|
|
aaaaaaaa_ has joined #urlteam |
00:48
🔗
|
|
aaaaaaaaa has quit IRC (Read error: Connection reset by peer) |
00:48
🔗
|
|
swebb sets mode: +o aaaaaaaa_ |
00:49
🔗
|
|
aaaaaaaa_ is now known as aaaaaaaaa |
01:13
🔗
|
|
dashcloud has quit IRC (Remote host closed the connection) |
01:14
🔗
|
|
aaaaaaaa_ has joined #urlteam |
01:14
🔗
|
|
aaaaaaaaa has quit IRC (Read error: Connection reset by peer) |
01:14
🔗
|
|
swebb sets mode: +o aaaaaaaa_ |
01:18
🔗
|
|
aaaaaaaa_ has quit IRC (Read error: Connection reset by peer) |
01:18
🔗
|
|
aaaaaaaa_ has joined #urlteam |
01:18
🔗
|
|
swebb sets mode: +o aaaaaaaa_ |
01:30
🔗
|
|
aaaaaaaaa has joined #urlteam |
01:30
🔗
|
|
aaaaaaaa_ has quit IRC (Read error: Connection reset by peer) |
01:30
🔗
|
|
swebb sets mode: +o aaaaaaaaa |
01:57
🔗
|
|
Fletcher has joined #urlteam |
02:35
🔗
|
|
svchfoo3 has joined #urlteam |
02:35
🔗
|
|
svchfoo1 sets mode: +o svchfoo3 |
03:42
🔗
|
|
VADemon has quit IRC (left4dead) |
04:15
🔗
|
|
aaaaaaaaa has quit IRC (Leaving) |
05:00
🔗
|
|
Ctrl-S has joined #urlteam |
07:11
🔗
|
|
JesseW has joined #urlteam |
07:24
🔗
|
|
JesseW has quit IRC (Leaving.) |
08:38
🔗
|
|
zhongfu_ is now known as zhongfu |
09:51
🔗
|
|
chazchaz has joined #urlteam |
12:32
🔗
|
|
Smiley has quit IRC (Read error: Operation timed out) |
12:32
🔗
|
|
Coderjoe_ has quit IRC (Read error: Operation timed out) |
12:32
🔗
|
|
Coderjoe has joined #urlteam |
12:34
🔗
|
|
HCross has quit IRC (Read error: Operation timed out) |
12:41
🔗
|
|
Smiley has joined #urlteam |
12:42
🔗
|
|
Silvan has joined #urlteam |
12:43
🔗
|
|
dashcloud has joined #urlteam |
12:43
🔗
|
|
svchfoo3 sets mode: +o dashcloud |
12:45
🔗
|
|
SilSte has quit IRC (Ping timeout: 606 seconds) |
12:49
🔗
|
|
xmc has quit IRC (Ping timeout: 606 seconds) |
12:52
🔗
|
|
xmc has joined #urlteam |
12:52
🔗
|
|
swebb sets mode: +o xmc |
13:01
🔗
|
|
HCross has joined #urlteam |
16:10
🔗
|
|
SimpBrain has quit IRC (Ping timeout: 615 seconds) |
16:22
🔗
|
|
SimpBrain has joined #urlteam |
17:03
🔗
|
|
zerkalo has quit IRC (Write error: Broken pipe) |
17:04
🔗
|
|
phuzion has quit IRC (Read error: Operation timed out) |
17:04
🔗
|
|
phuzion has joined #urlteam |
17:04
🔗
|
|
Domin- has joined #urlteam |
17:05
🔗
|
|
Domin_ has quit IRC (Read error: Operation timed out) |
17:06
🔗
|
|
atlogbot has quit IRC (Read error: Operation timed out) |
17:06
🔗
|
|
chazchaz has quit IRC (Read error: Operation timed out) |
17:07
🔗
|
|
Coderjoe has quit IRC (Read error: Operation timed out) |
17:07
🔗
|
|
svchfoo1 has quit IRC (Read error: Operation timed out) |
17:08
🔗
|
|
zerkalo has joined #urlteam |
17:11
🔗
|
|
SimpBrain has quit IRC (Read error: Operation timed out) |
17:12
🔗
|
|
SimpBrain has joined #urlteam |
17:18
🔗
|
|
Coderjoe has joined #urlteam |
17:33
🔗
|
|
atlogbot has joined #urlteam |
17:33
🔗
|
|
svchfoo1 has joined #urlteam |
17:33
🔗
|
|
svchfoo3 sets mode: +o svchfoo1 |
17:45
🔗
|
|
chazchaz has joined #urlteam |
18:43
🔗
|
|
aaaaaaaaa has joined #urlteam |
18:43
🔗
|
|
swebb sets mode: +o aaaaaaaaa |
18:59
🔗
|
|
JesseW has joined #urlteam |
19:33
🔗
|
|
bzc6p has joined #urlteam |
19:33
🔗
|
|
swebb sets mode: +o bzc6p |
19:34
🔗
|
bzc6p |
So, in response to JesseW |
19:35
🔗
|
bzc6p |
As far as I know, the URLTeam v2 project is handled by chfoo. He keeps track of what has been done, and I think he is the one who regularly exports, or if it is automated, then he is the one who takes care of its regular operation. |
19:36
🔗
|
bzc6p |
AFAIK again, the URL shortener services are scraped regularly, once in a while. That is, if we scraped e.g. 1r.hu in Dec 2014, it will be scraped again in a year or two. |
19:36
🔗
|
JesseW |
ah, I didn't realize they were intermittently turned on |
19:37
🔗
|
JesseW |
I knew chfoo was the maintainer |
19:37
🔗
|
bzc6p |
Regarding adding new shorteners: I think chfoo regularly selects some new ones from the wiki list and inputs them into the tracker when there is not enough stuff to do. |
19:38
🔗
|
bzc6p |
I bet, because 1r.hu was added to the wiki by me, and I did nothing to make that be scraped by URLTeam. |
19:38
🔗
|
JesseW |
ah, that makes sense |
19:38
🔗
|
JesseW |
hm, I'll look in the wiki to see when you added it. |
19:38
🔗
|
bzc6p |
So if one wants to add new shorteners, they should just add it to the wiki, and they will be seen and added one day, I guess. |
19:39
🔗
|
JesseW |
I'll also add a column to the table specifying when various ones were scraped, which should make it easier to see that there is progress |
19:39
🔗
|
JesseW |
thanks for the answers, bzc6p! |
19:40
🔗
|
bzc6p |
Regarding urlte.am, it is probably held by SketchCow. It does indeed need an update. |
19:43
🔗
|
JesseW |
It seems like it would be better to just redirect it to the wiki |
19:45
🔗
|
bzc6p |
One should be very careful when poking SketchCow with "Please update this". |
19:46
🔗
|
bzc6p |
So I think I answered all the questions. If I was wrong, I'll gladly be corrected. |
19:46
🔗
|
bzc6p |
It seems I added 1r.hu to the wiki in August 2014. |
19:51
🔗
|
JesseW |
hm |
19:51
🔗
|
JesseW |
yep, I think you answered all my questions. :-) |
19:57
🔗
|
|
Smiley has quit IRC (Remote host closed the connection) |
19:58
🔗
|
JesseW |
here's the distribution of dumps by project: http://0bin.net/paste/Zq93JzmZhY0YRAZ9#wtNYGOcDE+1rj5X29+pivT5XFCAO4quGKap1fW9pPQG |
20:00
🔗
|
JesseW |
i.e. how many dumps contain data from each project. there are 10 that are only in one dump, 385 total dumps (I don't have a few downloaded, because the IA's torrent-generator is borken) |
20:00
🔗
|
JesseW |
isgd_6 is in 367 of the 385 dumps. |
20:04
🔗
|
|
bzc6p_ has joined #urlteam |
20:04
🔗
|
|
swebb sets mode: +o bzc6p_ |
20:05
🔗
|
|
bzc6p has quit IRC (Read error: Operation timed out) |
20:08
🔗
|
|
bzc6p_ has left |
20:10
🔗
|
|
Smiley has joined #urlteam |
20:40
🔗
|
|
VADemon has joined #urlteam |
20:42
🔗
|
|
Start has quit IRC (Read error: Connection reset by peer) |
20:43
🔗
|
|
Start has joined #urlteam |
21:41
🔗
|
|
JesseW has quit IRC (Leaving.) |
22:44
🔗
|
|
Atluxity has joined #urlteam |
22:44
🔗
|
Atluxity |
is the urlteam throttled to ~380 scans per second? |
23:04
🔗
|
|
JesseW has joined #urlteam |
23:42
🔗
|
Atluxity |
JesseW: do you know? |
23:43
🔗
|
JesseW |
what? |
23:43
🔗
|
Atluxity |
is the urlteam throttled to ~380 scans per second? |
23:43
🔗
|
JesseW |
I don't know, sorry |
23:43
🔗
|
Atluxity |
ok, ty |