Time |
Nickname |
Message |
01:13
|
astrid |
i deeply regret the format "code|http://url..." |
02:26
|
Somebody2 |
I don't see what's so terrible about it, personally. |
02:28
|
astrid |
the only design consideration at play was "looks good when scrolling up my screen at high speed" |
02:35
|
Somebody2 |
sure, but I'm not aware of any particular horrors that have come up afterward... |
02:35
|
astrid |
fair enough |
02:36
|
Somebody2 |
I mean, it might as well have been tab-separated, but nearly anything that processes csv can specify an arbitrary delimiter |
02:36
|
Somebody2 |
and aside from that, it seems pretty adequate |
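A minimal sketch of the point about delimiters, assuming the dump lines really are `code|http://url...` with no quoting. The sample data and the `read_pairs` helper are made up for illustration; they are not part of terroroftinytown. The one wrinkle it shows is that a literal `|` inside a URL would split the row, so the remaining fields are re-joined.

```python
import csv
import io

# Hypothetical sample lines in the "code|http://url..." layout.
sample = "abc|http://example.com/page\nxyz|http://example.org/a|b\n"

def read_pairs(stream):
    """Yield (shortcode, url) pairs from a pipe-delimited text stream."""
    # QUOTE_NONE: treat quote characters in URLs as ordinary data.
    reader = csv.reader(stream, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Re-join everything after the first field so a "|" in the URL survives.
        yield row[0], "|".join(row[1:])

pairs = list(read_pairs(io.StringIO(sample)))
for code, url in pairs:
    print(code, url)
```

Splitting each line once on the first `|` would work just as well; the csv module is used here only because the conversation mentions it.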
02:36
|
astrid |
i suppoooose |
02:36
|
Somebody2 |
the nested xz/zip is silly looking, but not actually a big hassle to work with |
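For reference, reading xz-compressed members out of a zip takes only the standard library. This is a sketch under assumptions: the archive below is a tiny stand-in built in memory, and the member name, contents, and `iter_lines` helper are invented, not taken from a real URLTeam release.

```python
import io
import lzma
import zipfile

# Build a stand-in archive in memory: a zip holding one xz-compressed text file.
# The member name and contents are made up for illustration.
payload = "abc|http://example.com/\n".encode("utf-8")
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("example.txt.xz", lzma.compress(payload))

def iter_lines(zip_bytes):
    """Yield (member_name, line) for every .xz member inside the zip."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if not name.endswith(".xz"):
                continue
            # Stream-decompress the member instead of extracting it to disk.
            with zf.open(name) as member, lzma.open(member, "rt", encoding="utf-8") as text:
                for line in text:
                    yield name, line.rstrip("\n")

lines = list(iter_lines(buf.getvalue()))
print(lines)
```

Swapping `lzma` for `bz2` would give the same shape of code if the inner compression were changed.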
03:10
|
astrid |
i think we should consider deprecating xz |
03:11
|
astrid |
https://www.nongnu.org/lzip/xz_inadequate.html |
03:11
|
astrid |
perhaps in favor of bzip or just plain textfiles-in-zip |
03:11
|
astrid |
the 20% extra data storage required is irritating but perhaps not worth it? |
03:15
|
astrid |
idk, w/e |
04:42
|
|
odemg has quit IRC (Ping timeout: 265 seconds) |
04:54
|
|
odemg has joined #urlteam |
04:55
|
Somebody2 |
I'd prefer to encourage people to hack up something to make the dead shorteners available again (ideally in the Wayback Machine). |
04:55
|
Somebody2 |
But if/when someone writes a replacement for terroroftinytown, sure, we should probably just use bzip (rather than zip+xz). |
05:00
|
Somebody2 |
I'd read the xz_inadequate note before, but re-reading it now, yeah, I can see the point to patching terroroftinytown to stop using it. |
05:00
|
Somebody2 |
I may look into that. |
05:13
|
|
hook54321 has quit IRC (Quit: Connection closed for inactivity) |
05:52
|
astrid |
Somebody2: yeah someday we might want to look into creating warc's instead of our weird format |
05:52
|
astrid |
it'd be a lot bigger but the wayback could ingest it :) |
05:52
|
astrid |
(i've been vaguely suggesting this at anyone who will listen for a Long Time) |
06:01
|
Somebody2 |
so have I, yes |
06:01
|
Somebody2 |
but we can at least stop compressing with xz easily enough |
06:42
|
|
odemg has quit IRC (Ping timeout: 265 seconds) |
06:44
|
|
odemg has joined #urlteam |
07:30
|
|
hook54321 has joined #urlteam |
12:50
|
|
VADemon has joined #urlteam |
12:56
|
JAA |
URLTeam data is anyway quite small compared to many of our other projects. We probably grabbed more Facebook JS crap through ArchiveBot than the total URLTeam size. |
12:57
|
JAA |
WARCs would be great. I've been wondering why that wasn't used from the beginning, to be honest. Seems like the obvious choice. Maybe size was a concern there? |
13:17
|
VADemon |
hm guys, there's a problem with the u-to shortener: it clearly allows - (dash) in shortened URLs, there's even an example on the wiki page, but the project settings don't have that symbol in the alphabet |
13:17
|
VADemon |
https://tracker.archiveteam.org:1338/api/project_settings?name=u-to |
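A quick way to confirm this kind of gap is to compare a known-good shortcode against the configured alphabet. The sketch below is illustrative only: the alphabet is a guess (letters and digits), not the actual u-to project settings, and both the shortcode and the `unreachable_chars` helper are invented.

```python
import string

# Hypothetical alphabet: the real u-to project settings are not reproduced here.
alphabet = string.ascii_letters + string.digits

def unreachable_chars(shortcode, alphabet):
    """Return the characters in shortcode that the alphabet can never generate."""
    return sorted(set(shortcode) - set(alphabet))

# Made-up shortcode containing a dash, like the wiki example described above.
missing = unreachable_chars("ab-cd", alphabet)
print(missing)

# Adding the dash to the alphabet closes the gap.
fixed = unreachable_chars("ab-cd", alphabet + "-")
print(fixed)
```

Any character reported as missing means the enumeration can never reach shortcodes that contain it.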
14:13
|
|
hook54321 has quit IRC (Quit: Connection closed for inactivity) |
14:51
|
|
hook54321 has joined #urlteam |
15:05
|
astrid |
lol |
15:06
|
astrid |
the reason we didn't grab warcs is we didn't think of it until after we'd started |
15:06
|
astrid |
and inertia |
15:10
|
phuzion |
are we talking about warcs of just the redirect, or the page we're redirected to as well? |
15:12
|
Kaz |
¿porque no los dos? (why not both?) |
15:15
|
JAA |
I would like that, archiving the redirect targets as well. Perhaps that should be separate from the shortener crawling though. |
15:16
|
astrid |
urlteam 2, electric boogaloo |
15:23
|
JAA |
:-) |
15:24
|
JAA |
This is URLTeam 2 already though, apparently. :-P |
15:24
|
astrid |
? |
15:25
|
astrid |
news to me |
15:25
|
astrid |
well i mean if you mean using the tracker, then, yes |
15:25
|
astrid |
hey |
15:25
|
astrid |
why don't we get rid of the tracker and replace it with a blockchain |
15:28
|
JAA |
Oh, sounds great. We'll have to work AI and Big Data into it though, otherwise it can't be good. |
15:34
|
astrid |
naturally |
15:47
|
psi |
urlteam 2.1 |
16:00
|
phuzion |
Since we seem to be brainstorming some future ideas, what's everyone's thoughts on a browser plugin that, for specified URL shorteners, it captures any visited shorturl, and sends that off to a list that we can then ingest into terroroftinytown to be checked? |
16:01
|
phuzion |
Idk if Mozilla or Chrome would allow that, since it's essentially a narrowly-scoped browser history capture plugin, but it might be worth looking into. Could help us capture more URLs, more quickly. |
16:03
|
phuzion |
Looks like if we are super upfront about it ("THIS EXTENSION WILL SEND CERTAIN URLS YOU VISIT TO ARCHIVETEAM") Mozilla might be OK with it. |
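The narrow scoping described here comes down to an allowlist check on the hostname. A rough sketch of that filter, written in Python rather than as an actual browser extension: the host names, the `should_report` helper, and the sample URLs are all hypothetical, and nothing here reflects an existing ArchiveTeam tool.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of shortener hosts the user has agreed to share.
SHORTENER_HOSTS = {"short.example", "tiny.example"}

def should_report(url, hosts=SHORTENER_HOSTS):
    """Return True only for http(s) URLs on an allowlisted host that carry a shortcode."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    # A bare domain with an empty path has no shortcode worth submitting.
    return host in hosts and parts.path.strip("/") != ""

# Made-up browsing history: only the first entry should leave the browser.
visited = [
    "https://short.example/aB3",
    "https://example.com/private/page",
    "http://tiny.example/",
    "ftp://short.example/x",
]
to_report = [u for u in visited if should_report(u)]
print(to_report)
```

Keeping the check this strict means everything outside the listed shorteners is dropped before anything is sent, which is the property the upfront disclosure would be promising.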
16:18
|
astrid |
hm |
16:36
|
|
TheTrueBr has joined #urlteam |
16:43
|
|
TheTrueBr has quit IRC (Quit: AndroIRC - Android IRC Client ( http://www.androirc.com )) |
17:03
|
|
hook54321 has quit IRC (Quit: Connection closed for inactivity) |
17:27
|
|
hook54321 has joined #urlteam |
18:53
|
|
odemg has quit IRC (Ping timeout: 265 seconds) |
18:56
|
|
mtntmnky_ is now known as mtntmnky |
18:58
|
|
odemg has joined #urlteam |
21:45
|
t3 |
Maybe it's time for URLTeam3 with WARC support. |
23:01
|
|
odemg has quit IRC (Ping timeout: 265 seconds) |
23:05
|
|
odemg has joined #urlteam |
23:55
|
arkiver |
t3: yes! |
23:56
|
JAA |
I don't think anyone disagrees with that. Who will implement it though? :-) |