#urlteam 2019-01-07,Mon

Time Nickname Message
01:13 astrid i deeply regret the format "code|http://url..."
02:26 Somebody2 I don't see what's so terrible about it, personally.
02:28 astrid the only design consideration at play was "looks good when scrolling up my screen at high speed"
02:35 Somebody2 sure, but I'm not aware of any particular horrors that have come up afterward...
02:35 astrid fair enough
02:36 Somebody2 I mean, it might as well have been tab-separated, but nearly anything that processes csv can specify an arbitrary delimiter
02:36 Somebody2 and aside from that, it seems pretty adequate
02:36 astrid i suppoooose
02:36 Somebody2 the nested xz/zip is silly looking, but not actually a big hassle to work with
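
A minimal sketch of reading the "code|url" format out of a nested xz-in-zip archive, as discussed above. The archive layout is an assumption (a zip whose members are xz-compressed UTF-8 text files, one pair per line), and the function names are hypothetical. Splitting at the first "|" only keeps any "|" inside the URL intact.

```python
import lzma
import zipfile


def parse_line(line):
    """Split one "code|url" line at the first "|" only."""
    code, sep, url = line.rstrip("\r\n").partition("|")
    if not sep:
        raise ValueError(f"no '|' delimiter in line: {line!r}")
    return code, url


def iter_mappings(zip_file):
    """Yield (code, url) pairs from every .xz member of a zip archive.

    Assumed layout (not confirmed by the log): each member is an
    xz-compressed UTF-8 text file with one "code|url" pair per line.
    """
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if not name.endswith(".xz"):
                continue  # skip members that are not xz-compressed
            text = lzma.decompress(archive.read(name)).decode("utf-8")
            for line in text.split("\n"):
                if line:  # ignore blank lines, e.g. after the final newline
                    yield parse_line(line)
```

`iter_mappings` accepts a path or any file-like object that `zipfile.ZipFile` can open. It decompresses each member fully in memory, which is only reasonable for small files.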
03:10 astrid i think we should consider deprecating xz
03:11 astrid https://www.nongnu.org/lzip/xz_inadequate.html
03:11 astrid perhaps in favor of bzip or just plain textfiles-in-zip
03:11 astrid the 20% extra data storage required is irritating but perhaps not worth it?
03:15 astrid idk, w/e
04:42 odemg has quit IRC (Ping timeout: 265 seconds)
04:54 odemg has joined #urlteam
04:55 Somebody2 I'd prefer to encourage people to hack up something to make the dead shorteners available again (ideally in the Wayback Machine).
04:55 Somebody2 But if/when someone writes a replacement for terroroftinytown, sure, we should probably just use bzip (rather than zip+xz).
05:00 Somebody2 I'd read the xz_inadequate note before, but re-reading it now, yeah, I can see the point to patching terroroftinytown to stop using it.
05:00 Somebody2 I may look into that.
05:13 hook54321 has quit IRC (Quit: Connection closed for inactivity)
05:52 astrid Somebody2: yeah someday we might want to look into creating warc's instead of our weird format
05:52 astrid it'd be a lot bigger but the wayback could ingest it :)
05:52 astrid (i've been vaguely suggesting this at anyone who will listen for a Long Time)
06:01 Somebody2 so have I, yes
06:01 Somebody2 but we can at least stop compressing with xz easily enough
06:42 odemg has quit IRC (Ping timeout: 265 seconds)
06:44 odemg has joined #urlteam
07:30 hook54321 has joined #urlteam
12:50 VADemon has joined #urlteam
12:56 JAA URLTeam data is anyway quite small compared to many of our other projects. We probably grabbed more Facebook JS crap through ArchiveBot than the total URLTeam size.
12:57 JAA WARCs would be great. I've been wondering why that wasn't used from the beginning, to be honest. Seems like the obvious choice. Maybe size was a concern there?
13:17 VADemon hm guys, there's a problem with u-to shortener: it clearly allows for - (dash) in shortened URLs, there's even an example on the wiki page, but the project settings don't have that symbol in the alphabet
13:17 VADemon https://tracker.archiveteam.org:1338/api/project_settings?name=u-to
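
A small sketch of the check VADemon describes: given a shortcode known to exist and a project's configured alphabet, list the characters the alphabet cannot generate. The alphabet and shortcode below are hypothetical; the log does not quote the actual u-to settings or the format of the tracker's response.

```python
import string


def chars_outside_alphabet(shortcode, alphabet):
    """Return the sorted characters of shortcode that are not in alphabet."""
    return sorted(set(shortcode) - set(alphabet))


# Hypothetical alphabet for illustration only; the real u-to project
# settings are not quoted in the log.
ASSUMED_ALPHABET = string.ascii_lowercase + string.digits

# Hypothetical shortcode containing a dash.
missing = chars_outside_alphabet("my-link1", ASSUMED_ALPHABET)
```

Any character reported here belongs to shortcodes that a crawl driven by that alphabet would never try.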
14:13 hook54321 has quit IRC (Quit: Connection closed for inactivity)
14:51 hook54321 has joined #urlteam
15:05 astrid lol
15:06 astrid the reason we didn't grab warcs is we didn't think of it until after we'd started
15:06 astrid and inertia
15:10 phuzion are we talking about warcs of just the redirect, or the page we're redirected to as well?
15:12 Kaz ¿porque no los dos? [why not both?]
15:15 JAA I would like that, archiving the redirect targets as well. Perhaps that should be separate from the shortener crawling though.
15:16 astrid urlteam 2, electric boogaloo
15:23 JAA :-)
15:24 JAA This is URLTeam 2 already though, apparently. :-P
15:24 astrid ?
15:25 astrid news to me
15:25 astrid well i mean if you mean using the tracker, then, yes
15:25 astrid hey
15:25 astrid why don't we get rid of the tracker and replace it with a blockchain
15:28 JAA Oh, sounds great. We'll have to work AI and Big Data into it though, otherwise it can't be good.
15:34 astrid naturally
15:47 psi urlteam 2.1
16:00 phuzion Since we seem to be brainstorming some future ideas, what's everyone's thoughts on a browser plugin that, for specified URL shorteners, captures any visited shorturl and sends that off to a list that we can then ingest into terroroftinytown to be checked?
16:01 phuzion Idk if Mozilla or Chrome would allow that, since it's essentially a narrowly-scoped browser history capture plugin, but it might be worth looking into. Could help us capture more URLs, more quickly.
16:03 phuzion Looks like if we are super upfront about it ("THIS EXTENSION WILL SEND CERTAIN URLS YOU VISIT TO ARCHIVETEAM") Mozilla might be OK with it.
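
A rough sketch of the narrow filtering rule phuzion's plugin idea implies: only URLs on an explicit list of shortener hosts are kept, and everything else is dropped before anything is reported. This is Python for illustration only. A real extension would be written against the browser's extension APIs, and the host list and function name here are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list of shortener hosts; not taken from the log.
WATCHED_HOSTS = {"short.example", "tiny.example"}


def extract_shortcode(url, watched_hosts=WATCHED_HOSTS):
    """Return (host, shortcode) for a watched shortener URL, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme not in ("http", "https") or host not in watched_hosts:
        return None  # not a watched shortener: never reported
    code = parts.path.strip("/")
    if not code or "/" in code:
        return None  # front page or a deeper path, not a plain shortcode
    return host, code
```

Returning only the host and shortcode, rather than the full URL, drops the query string and fragment before anything is sent on.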
16:18 astrid hm
16:36 TheTrueBr has joined #urlteam
16:43 TheTrueBr has quit IRC (Quit: AndroIRC - Android IRC Client ( http://www.androirc.com ))
17:03 hook54321 has quit IRC (Quit: Connection closed for inactivity)
17:27 hook54321 has joined #urlteam
18:53 odemg has quit IRC (Ping timeout: 265 seconds)
18:56 mtntmnky_ is now known as mtntmnky
18:58 odemg has joined #urlteam
21:45 t3 Maybe it's time for URLTeam3 with WARC support.
23:01 odemg has quit IRC (Ping timeout: 265 seconds)
23:05 odemg has joined #urlteam
23:55 arkiver t3: yes!
23:56 JAA I don't think anyone disagrees with that. Who will implement it though? :-)
