[00:01] Atluxity: JesseW: I can add you as admin tomorrow if you'd like
[00:01] arkiver: sounds good. no hurry
[00:01] ok!
[00:02] JesseW: will also have a look tomorrow at adding more shorteners
[00:02] excellent! it'd be good to write up some more documentation on how to identify the structure of a shortener
[00:02] i.e. is it incremental or not, what's the alphabet, etc.
[00:09] JesseW: I agree, will see about doing that
[00:11] *** ersi has quit IRC (Read error: Operation timed out)
[00:11] *** ersi_ has joined #urlteam
[01:38] *** Coderjoe has quit IRC (Ping timeout: 255 seconds)
[01:46] *** Coderjoe has joined #urlteam
[02:29] *** FriarGius has quit IRC (Leaving)
[02:29] *** FriarGius has joined #urlteam
[02:37] *** JesseW has quit IRC (Leaving.)
[03:00] *** Ctrl-S__ has joined #urlteam
[03:08] *** JesseW has joined #urlteam
[03:16] *** FriarGius has quit IRC (Leaving)
[03:26] *** Start has quit IRC (Quit: Disconnected.)
[03:29] *** Start has joined #urlteam
[04:09] arkiver: Ok
[04:23] *** aaaaaaaaa has quit IRC (Leaving)
[05:15] *** svchfoo1 has quit IRC (Ping timeout: 369 seconds)
[05:15] *** chazchaz has quit IRC (Ping timeout: 369 seconds)
[05:16] *** atlogbot has quit IRC (Ping timeout: 369 seconds)
[05:34] *** FriarGius has joined #urlteam
[05:39] *** svchfoo1 has joined #urlteam
[05:39] *** atlogbot has joined #urlteam
[05:40] *** chazchaz has joined #urlteam
[05:40] *** svchfoo3 sets mode: +o svchfoo1
[05:55] *** atlogbot has quit IRC (Read error: Operation timed out)
[06:00] *** svchfoo1 has quit IRC (Ping timeout: 369 seconds)
[06:00] *** chazchaz has quit IRC (Ping timeout: 369 seconds)
[06:05] *** Deewiant has joined #urlteam
[06:06] *** svchfoo1 has joined #urlteam
[06:06] *** svchfoo3 sets mode: +o svchfoo1
[06:06] *** atlogbot has joined #urlteam
[06:06] *** chazchaz has joined #urlteam
[06:41] *** chazchaz has quit IRC (Read error: Operation timed out)
[06:41] *** atlogbot has quit IRC (Ping timeout: 369 seconds)
[06:49] *** svchfoo1 has quit IRC (Ping timeout: 369 seconds)
[06:54] *** chazchaz has joined #urlteam
[07:07] *** svchfoo1 has joined #urlteam
[07:07] *** atlogbot has joined #urlteam
[07:07] *** svchfoo3 sets mode: +o svchfoo1
[07:28] *** svchfoo1 has quit IRC (Ping timeout: 369 seconds)
[07:35] *** atlogbot has quit IRC (Ping timeout: 369 seconds)
[07:41] *** atlogbot has joined #urlteam
[07:41] *** svchfoo1 has joined #urlteam
[07:42] *** svchfoo3 sets mode: +o svchfoo1
[08:11] *** JesseW has quit IRC (Leaving.)
[09:05] *** ersi_ is now known as ersi
[09:05] *** svchfoo3 sets mode: +o ersi
[10:54] *** Muad-Dib has joined #urlteam
[14:00] *** FriarGius has quit IRC (Leaving)
[14:09] *** jornane has joined #urlteam
[14:14] hei, i've stumbled upon this project on the internets
[14:14] I would like to help scraping, but I was wondering how alive this project is… The last torrent is from July 20th, 2013 and the next release is planned around January 2014
[14:15] jornane: we're always scraping URL shorteners.
[14:16] If you want to get started with this, follow the instructions on this page to download a Warrior, and select URLTeam as your project: http://archiveteam.org/index.php?title=Warrior
[14:17] I've read up on the possibilities, I was planning on running this on an ESXi box on a clean shared gigabit line
[14:17] Alternatively, if you have a *nix system available, you can run the urlteam pipeline directly, the instructions are here on the git repo: https://github.com/ArchiveTeam/terroroftinytown-client-grab
[14:17] ah that might be easier
[14:18] Depends on what your definition of easy is.
[14:18] If you've already got the ESX box, it might be easier to just throw the .OVA file at it and boot it.
[14:19] but i'm still wondering what happened to the release-cycle ;)
[14:19] I'm not exactly sure when the torrents get released, to be honest.
[14:19] But I can assure you this project is still active
[14:22] so is it possible to see the progress somewhere else? I checked http://www.archiveteam.org/index.php?title=URLTeam, but it refers to the torrents, and some url shorteners are published externally on IA
[14:30] jornane: If you want realtime data, check the tracker in the topic (last link)
[15:17] *** Start has quit IRC (Quit: Disconnected.)
[15:17] URLTeam releases also end up in IA https://archive.org/search.php?query=URLTeam+Release&sort=date
[15:42] *** Start has joined #urlteam
[17:01] *** JesseW has joined #urlteam
[17:02] *** marvinw_ has quit IRC (Read error: Operation timed out)
[17:10] *** Start has quit IRC (Read error: Operation timed out)
[17:12] *** Start has joined #urlteam
[17:15] *** JesseW has quit IRC (Leaving.)
[18:36] *** Start has quit IRC (Quit: Disconnected.)
[18:42] *** joepie91 has quit IRC (Read error: Operation timed out)
[18:45] *** joepie91 has joined #urlteam
[18:45] *** svchfoo1 sets mode: +o joepie91
[18:48] *** aaaaaaaaa has joined #urlteam
[18:48] *** swebb sets mode: +o aaaaaaaaa
[18:51] *** aaaaaaaaa has quit IRC (Client Quit)
[19:11] *** Start has joined #urlteam
[19:19] *** Start has quit IRC (Quit: Disconnected.)
[19:52] *** SimpBrain has quit IRC (Leaving)
[19:54] *** aaaaaaaaa has joined #urlteam
[19:54] *** swebb sets mode: +o aaaaaaaaa
[20:02] More threads thrown at urlteam :)
[20:25] *** slang has joined #urlteam
[20:33] *** aaaaaaaa_ has joined #urlteam
[20:33] *** aaaaaaaaa has quit IRC (Read error: Connection reset by peer)
[20:33] *** swebb sets mode: +o aaaaaaaa_
[20:34] *** aaaaaaaa_ is now known as aaaaaaaaa
[20:34] *** JW_work has joined #urlteam
[20:35] jornane: http://urlte.am is out of date.
[20:36] Since last November, new data is released directly to Internet Archive items, usually about once a day. There are currently over 300 such items.
[20:36] They can be downloaded via BitTorrent, or directly from IA.
[20:37] Each one contains multiple .zip files (one per url shortener scraped during that day). The zip files contain .xz files, which when decompressed, are plain text in BEACON format, i.e. shortcode, vertical bar, longURL (shortcode|longURL)
[20:45] *** aaaaaaaaa has quit IRC (Read error: Connection reset by peer)
[20:46] *** aaaaaaaaa has joined #urlteam
[20:46] *** swebb sets mode: +o aaaaaaaaa
[20:47] *** aaaaaaaaa has quit IRC (Client Quit)
[20:55] *** Start has joined #urlteam
[21:17] *** aaaaaaaaa has joined #urlteam
[21:17] *** swebb sets mode: +o aaaaaaaaa
[21:58] *** SimpBrain has joined #urlteam
[22:09] *** aaaaaaaa_ has joined #urlteam
[22:09] *** aaaaaaaaa has quit IRC (Read error: Connection reset by peer)
[22:09] *** swebb sets mode: +o aaaaaaaa_
[22:09] *** aaaaaaaa_ is now known as aaaaaaaaa
[22:16] *** Start has quit IRC (Quit: Disconnected.)
[22:35] *** marvinw has joined #urlteam
[23:21] *** Start has joined #urlteam
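
Editor's note on the [20:37] message: the release layout described there (a .zip holding .xz files, each decompressing to text lines of the form shortcode|longURL) can be read with the Python standard library alone. The following is a minimal sketch, not the project's own tooling. The function names `parse_beacon_line` and `iter_mappings` are hypothetical, and skipping blank lines and lines starting with `#` is an assumption about header lines that the log does not confirm.

```python
import io
import lzma
import zipfile


def parse_beacon_line(line):
    """Split one 'shortcode|longURL' line into a (shortcode, long_url) tuple.

    Returns None for blank lines, for lines starting with '#' (assumed to be
    header/comment lines), and for lines without a vertical bar.
    """
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    # Split on the first bar only, because a long URL may itself contain '|'.
    shortcode, sep, long_url = line.partition("|")
    if not sep:
        return None
    return shortcode, long_url


def iter_mappings(zip_file):
    """Yield (shortcode, long_url) pairs from every .xz member of a zip.

    zip_file may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if not name.endswith(".xz"):
                continue
            # Decompress the member as a stream instead of extracting to disk.
            with archive.open(name) as raw:
                with lzma.open(raw, "rt", encoding="utf-8", errors="replace") as text:
                    for line in text:
                        parsed = parse_beacon_line(line)
                        if parsed is not None:
                            yield parsed


if __name__ == "__main__":
    # Build a tiny in-memory archive with made-up data to show the round trip.
    sample = io.BytesIO()
    with zipfile.ZipFile(sample, "w") as archive:
        archive.writestr("example.txt.xz", lzma.compress(b"abc|http://example.com/\n"))
    sample.seek(0)
    for shortcode, long_url in iter_mappings(sample):
        print(shortcode, long_url)
```

Streaming each member through `lzma.open` keeps memory use flat, which matters if the decompressed files are large; the file names inside real releases may differ from the `.xz` suffix check assumed here.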