[01:13] i deeply regret the format "code|http://url..."
[02:26] I don't see what's so terrible about it, personally.
[02:28] the only design consideration at play was "looks good when scrolling up my screen at high speed"
[02:35] sure, but I'm not aware of any particular horrors that have come up afterward...
[02:35] fair enough
[02:36] I mean, it might as well have been tab-separated, but nearly anything that processes csv can specify an arbitrary delimiter
[02:36] and aside from that, it seems pretty adequate
[02:36] i suppoooose
[02:36] the nested xz/zip is silly looking, but not actually a big hassle to work with
[03:10] i think we should consider deprecating xz
[03:11] https://www.nongnu.org/lzip/xz_inadequate.html
[03:11] perhaps in favor of bzip or just plain textfiles-in-zip
[03:11] the 20% extra data storage required is irritating but perhaps not worth it?
[03:15] idk, w/e
[04:42] *** odemg has quit IRC (Ping timeout: 265 seconds)
[04:54] *** odemg has joined #urlteam
[04:55] I'd prefer to encourage people to hack up something to make the dead shorteners available again (ideally in the Wayback Machine).
[04:55] But if/when someone writes a replacement for terroroftinytown, sure, we should probably just use bzip (rather than zip+xz).
[05:00] I'd read the xz_inadequate note before, but re-reading it now, yeah, I can see the point to patching terroroftinytown to stop using it.
[05:00] I may look into that.
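Editor's sketch for the "code|http://url..." format discussed at 01:13 to 02:36: a minimal example of reading such lines with a CSV reader and a custom delimiter, as suggested at 02:36. It assumes the text has already been extracted from the zip/xz container; the function name `parse_shortcode_lines` and the sample data are hypothetical and not taken from terroroftinytown.

```python
import csv
import io


def parse_shortcode_lines(text):
    """Yield (code, url) pairs from lines shaped like 'code|url'."""
    # QUOTE_NONE: quote characters inside URLs are treated as plain data.
    reader = csv.reader(io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        # A URL may itself contain '|', so rejoin everything after the code.
        yield row[0], "|".join(row[1:])


# Hypothetical sample data, for illustration only.
sample = "abc|http://example.com/page\nxyz|http://example.com/?q=a|b\n"
pairs = list(parse_shortcode_lines(sample))
```

Only the first field is treated as the shortcode here, on the assumption that shortcodes never contain the delimiter.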
[05:13] *** hook54321 has quit IRC (Quit: Connection closed for inactivity)
[05:52] Somebody2: yeah someday we might want to look into creating warcs instead of our weird format
[05:52] it'd be a lot bigger but the wayback could ingest it :)
[05:52] (i've been vaguely suggesting this at anyone who will listen for a Long Time)
[06:01] so have I, yes
[06:01] but we can at least stop compressing with xz easily enough
[06:42] *** odemg has quit IRC (Ping timeout: 265 seconds)
[06:44] *** odemg has joined #urlteam
[07:30] *** hook54321 has joined #urlteam
[12:50] *** VADemon has joined #urlteam
[12:56] URLTeam data is anyway quite small compared to many of our other projects. We probably grabbed more Facebook JS crap through ArchiveBot than the total URLTeam size.
[12:57] WARCs would be great. I've been wondering why that wasn't used from the beginning, to be honest. Seems like the obvious choice. Maybe size was a concern there?
[13:17] hm guys, there's a problem with u-to shortener: it clearly allows for - (dash) in shortened URLs, there's even an example on the wiki page, but the project settings don't have that symbol in the alphabet
[13:17] https://tracker.archiveteam.org:1338/api/project_settings?name=u-to
[14:13] *** hook54321 has quit IRC (Quit: Connection closed for inactivity)
[14:51] *** hook54321 has joined #urlteam
[15:05] lol
[15:06] the reason we didn't grab warcs is we didn't think of it until after we'd started
[15:06] and inertia
[15:10] are we talking about warcs of just the redirect, or the page we're redirected to as well?
[15:12] why not both?
[15:15] I would like that, archiving the redirect targets as well. Perhaps that should be separate from the shortener crawling though.
[15:16] urlteam 2, electric boogaloo
[15:23] :-)
[15:24] This is URLTeam 2 already though, apparently. :-P
[15:24] ?
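Editor's sketch for the u-to report at 13:17: one way to check whether a known shortcode uses characters missing from a project's configured alphabet. The alphabet string and the shortcode below are made-up placeholders; the real values would come from the project settings and the wiki example mentioned in the log, neither of which is reproduced here.

```python
def uncovered_chars(shortcode, alphabet):
    """Return the characters in shortcode that the alphabet does not cover."""
    return sorted(set(shortcode) - set(alphabet))


# Placeholder alphabet (assumption): digits plus ASCII letters, no dash.
alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Placeholder shortcode containing a dash, like the case described at 13:17.
missing = uncovered_chars("ab-cd", alphabet)
```

A non-empty result means the scraper would never generate that shortcode with the configured alphabet.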
[15:25] news to me
[15:25] well i mean if you mean using the tracker, then, yes
[15:25] hey
[15:25] why don't we get rid of the tracker and replace it with a blockchain
[15:28] Oh, sounds great. We'll have to work AI and Big Data into it though, otherwise it can't be good.
[15:34] naturally
[15:47] urlteam 2.1
[16:00] Since we seem to be brainstorming some future ideas, what's everyone's thoughts on a browser plugin that, for specified URL shorteners, captures any visited shorturl and sends that off to a list that we can then ingest into terroroftinytown to be checked?
[16:01] Idk if Mozilla or Chrome would allow that, since it's essentially a narrowly-scoped browser history capture plugin, but it might be worth looking into. Could help us capture more URLs, more quickly.
[16:03] Looks like if we are super upfront about it ("THIS EXTENSION WILL SEND CERTAIN URLS YOU VISIT TO ARCHIVETEAM") Mozilla might be OK with it.
[16:18] hm
[16:36] *** TheTrueBr has joined #urlteam
[16:43] *** TheTrueBr has quit IRC (Quit: AndroIRC - Android IRC Client ( http://www.androirc.com ))
[17:03] *** hook54321 has quit IRC (Quit: Connection closed for inactivity)
[17:27] *** hook54321 has joined #urlteam
[18:53] *** odemg has quit IRC (Ping timeout: 265 seconds)
[18:56] *** mtntmnky_ is now known as mtntmnky
[18:58] *** odemg has joined #urlteam
[21:45] Maybe it's time for URLTeam3 with WARC support.
[23:01] *** odemg has quit IRC (Ping timeout: 265 seconds)
[23:05] *** odemg has joined #urlteam
[23:55] t3: yes!
[23:56] I don't think anyone disagrees with that. Who will implement it though? :-)