[00:31] *** dashcloud has quit IRC (Read error: Operation timed out)
[00:36] *** dashcloud has joined #urlteam
[00:37] *** svchfoo3 sets mode: +o dashcloud
[00:40] *** JesseW has joined #urlteam
[00:41] *** svchfoo3 sets mode: +o JesseW
[00:57] JesseW, I got all the torrent files downloaded but I can't find an easy way to add them to transmission.
[00:58] cool; you can add them to transmission using transmission-remote; let me paste the command I used.
[00:58] So if you have the paths to the torrents (either local paths or URLs) in a file called urlteam_torrents.txt ...
[00:59] xargs -t -L 1 transmission-remote --auth=transmission:transmission -a < urlteam_torrents.txt
[00:59] should work
[01:00] -t prints the command line, -L 1 says to run each one separately (it just makes the output easier to read; you can leave it out)
[01:01] --auth should be changed to whatever your transmission username and password are
[01:01] -a means add this torrent
[01:13] That seems to be working
[01:16] cool
[01:16] which ones are you currently downloading?
[01:20] The whole lot
[01:33] I'm just surprised I don't see you in any of the swarms
[01:37] It's only downloading a few at a time
[01:37] I need to raise the queue limit
[01:38] which one is downloading right now?
[01:38] urlteam_xrlus_20150113
[01:39] I think it's only downloading from the webseeds
[01:39] ah, that would be why. I haven't downloaded xrlus
[01:49] *** VADemon has joined #urlteam
[02:02] WinterFox: I think I see you in the swarm now. :-)
[02:13] *** dashcloud has quit IRC (Read error: Operation timed out)
[02:18] *** dashcloud has joined #urlteam
[02:18] *** svchfoo1 sets mode: +o dashcloud
[03:16] *** bwn has quit IRC (Read error: Operation timed out)
[03:31] It's going to take forever to get these torrents :L
[03:33] eh, not forever -- but it took me about a day, I think.
[03:36] I might have to pause it if I get close to my data cap
[03:36] ah, makes sense. I'm really lucky with my ISP.
[03:37] Internet in Australia sucks
[03:38] it's about 91 GB for the incremental ones, plus the 88 GB big old one.
[03:38] how does that compare to your data cap?
[03:44] I get 400 GB/month
[03:45] I've used 97 GB, 7 days into the month
[03:46] What are you planning to do with the files once you get them?
[03:46] If you're not in a hurry, just wait until the last day of the month, and use up the *rest* of your limit on them, then repeat next month.
[03:51] I'm going to write a script to search them quickly. Then probably make a website to search them online
[03:59] Ah, cool.
[03:59] Well, you should be able to write the script with just a few of them.
[03:59] Then just extend it with new ones as you get around to downloading them
[04:00] The trick will be what to do with the old dump, as it has a (somewhat) different format.
[04:01] You should probably focus on downloading that, then. https://archive.org/download/URLTeamTorrentRelease2013July
[04:12] *** VADemon has quit IRC (Read error: Connection reset by peer)
[04:14] I actually had the old dump but I removed it a while ago :L
[04:25] ha.
[04:26] *** aaaaaaaaa has quit IRC (Leaving)
[04:27] Regarding the bit.ly aliases section on http://archiveteam.org/index.php?title=URLTeam#bit.ly_aliases -- it seems like for some of them (like nyti.ms) any shortcode that works on them also works on bit.ly, so it will get picked up (eventually) in our existing bit.ly project. But is that true for all of them?
[04:46] Are the meta files in the torrent just from the Internet Archive?
[04:52] Yep; they aren't particularly relevant.
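
A minimal Python sketch of what the 00:59 xargs one-liner above does, assuming the same urlteam_torrents.txt layout (one torrent path or URL per line) and the placeholder transmission:transmission credentials from the log:

    import subprocess

    # Hand each listed torrent to transmission-remote, one invocation
    # per line, exactly as xargs -L 1 does.
    with open("urlteam_torrents.txt") as f:
        for line in f:
            torrent = line.strip()
            if not torrent:
                continue
            # Equivalent to:
            #   transmission-remote --auth=transmission:transmission -a <torrent>
            subprocess.check_call([
                "transmission-remote",
                "--auth=transmission:transmission",  # swap in your own user:password
                "-a", torrent,
            ])
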
[05:17] I'm thinking of writing a script that will extract the URLs text file and append it to the end of a file for each shortener
[05:18] if you have the space, I suppose that could speed up searches.
[05:18] It'd be quite a bit of space, though.
[05:23] How much do you think it would take?
[05:26] well, this random file I checked had a compression ratio of 0.293
[05:27] I might just extract it all, write/test the scripts, then remove the extracted data and publish the scripts
[05:28] so that would give about 310 GB for the incremental ones (91 GB / 0.293)
[05:28] and 300 GB for the big dump (88 GB / 0.293)
[05:28] or about 600 GB total. Not too bad.
[06:11] How does the naming scheme of the dumps work? Is bitly_6 the 6th run of bitly?
[06:15] bitly_6 is the name of a urlteam project -- I think the 6 refers to 6 characters, i.e. the length of the shortcode being looked through. But I'm not sure.
[06:16] Ah ok. So that should stay the same then
[06:17] Yep, everything up to the timestamp is the name of the project. If you want to extract the URL, you'll need to pull that either out of the header of the txt files (where it's called #PREFIX) or the meta.json file (where it's called url_template)
[06:17] the actual URL of the shortener, I mean
[07:03] So far my script can extract all the zips and remove the meta files
[07:03] :-)
[07:05] https://bpaste.net/show/4e19075dfa9b
[07:09] you should probably just avoid extracting the ones you are going to throw away.
[07:10] e.g. use namelist() first, filter them out there, then pass the resulting list to extractall
[07:11] https://docs.python.org/2/library/zipfile.html#zipfile.ZipFile.extractall
[07:11] The files inside the zip are still compressed, so there would be very little performance impact, but I could do that.
[07:11] Once I get it working I will make it work better
[07:12] makes sense
[07:34] *** SmileyG has joined #urlteam
[07:35] chfoo, arkiver, (maybe other people) -- so, if I made a PR adding some features to the admin side of the tracker, would I be able to get it merged soonish?
[07:35] I want to add the ability to filter the results display by project.
[07:36] That way I can check that a project is working without having to wait until the next export (and without it getting swamped by the constantly running projects)
[07:37] merged and deployed
[07:38] *** swebb has quit IRC (Ping timeout: 253 seconds)
[07:38] *** swebb has joined #urlteam
[07:38] *** Smiley has quit IRC (Read error: Operation timed out)
[07:39] *** svchfoo3 sets mode: +o swebb
[07:54] *** Start has quit IRC (Quit: Disconnected.)
[07:56] *** Start has joined #urlteam
[07:58] Well, for what (not much) use it is, we are now gathering short URLs from a random recruiter's vanity shortener, bull.hn...
[08:03] JesseW: i don't see any pull request in github?
[08:03] I haven't made it yet.
[08:04] I wanted to check what the process/delay there might be on getting it deployed before I started.
[08:05] *** bwn has joined #urlteam
[08:06] it depends on whether i'm around
[08:06] if the pull request works i just merge it and pull it on the server
[08:07] ok, sounds good. I'll make it, hopefully by tomorrow.
[08:15] chfoo: is there an example config file in the repo?
[08:15] ah, found it
[08:41] OK, here's a tiny PR: https://github.com/ArchiveTeam/terroroftinytown/pull/37
[08:41] or, you know, GitHub108 will tell you...
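
A hedged sketch of the 06:17 advice on recovering a shortener's URL from a dump; the meta.json path and the exact "#PREFIX <url>" header layout are assumptions, not confirmed formats:

    import json

    # Preferred source: the dump's meta.json, where the log says the
    # shortener URL is stored under "url_template".
    with open("meta.json") as f:
        url_template = json.load(f)["url_template"]

    # Fallback: scan a txt file's header for its "#PREFIX" line.
    # The "#PREFIX <url>" layout here is a guess at the header format.
    def prefix_from_txt(path):
        with open(path) as f:
            for line in f:
                parts = line.split(None, 1)
                if line.startswith("#PREFIX") and len(parts) > 1:
                    return parts[1].strip()
        return None
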
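And a minimal sketch of the namelist()/extractall() filtering suggested at 07:10, so the meta files are never written to disk at all; the name test for spotting them is an assumption, not the real script's rule:

    import zipfile

    def extract_without_meta(zip_path, dest_dir):
        with zipfile.ZipFile(zip_path) as zf:
            # namelist() returns every path in the archive; keep only
            # the entries we want and pass them to extractall().
            wanted = [name for name in zf.namelist()
                      if "meta" not in name.lower()]
            zf.extractall(dest_dir, members=wanted)

    # Illustrative file name, taken from the torrent mentioned above.
    extract_without_meta("urlteam_xrlus_20150113.zip", "extracted")
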
[08:56] *** JesseW has quit IRC (Read error: Operation timed out)
[10:15] *** bwn has quit IRC (Read error: Connection reset by peer)
[10:15] *** bwn has joined #urlteam
[11:04] *** bwn has quit IRC (Read error: Operation timed out)
[11:45] *** bwn has joined #urlteam
[12:06] *** WinterFox has quit IRC (Remote host closed the connection)
[13:00] *** VADemon has joined #urlteam
[13:08] *** bwn has quit IRC (Read error: Operation timed out)
[13:19] *** SimpBrain has quit IRC (Leaving)
[18:01] *** JesseW has joined #urlteam
[18:01] *** svchfoo3 sets mode: +o JesseW
[18:06] chfoo: Any chance I could get a copy of the production database to test with? While I could populate one with fake data, it would be easier to just use the real one...
[18:06] For testing things like performance of queries (e.g. results display)
[18:40] *** SimpBrain has joined #urlteam
[18:54] *** aaaaaaaaa has joined #urlteam
[18:54] *** swebb sets mode: +o aaaaaaaaa
[19:05] *** dashcloud has quit IRC (Read error: Operation timed out)
[19:13] *** dashcloud has joined #urlteam
[19:13] *** svchfoo3 sets mode: +o dashcloud
[20:03] *** Boppen has quit IRC (Read error: Connection reset by peer)
[20:04] *** Boppen has joined #urlteam
[20:07] *** dashcloud has quit IRC (Read error: Operation timed out)
[20:10] *** dashcloud has joined #urlteam
[20:11] *** svchfoo1 sets mode: +o dashcloud
[21:04] *** dashcloud has quit IRC (Read error: Operation timed out)
[21:08] *** dashcloud has joined #urlteam
[21:08] *** svchfoo3 sets mode: +o dashcloud
[21:22] *** VADemon has quit IRC (left4dead)
[21:53] *** JesseW has quit IRC (Leaving.)
[21:58] *** dashcloud has quit IRC (Read error: Operation timed out)
[22:01] *** dashcloud has joined #urlteam
[22:01] *** svchfoo1 sets mode: +o dashcloud
[23:02] *** JesseW has joined #urlteam
[23:02] *** svchfoo3 sets mode: +o JesseW
[23:08] *** Boppen has quit IRC (Read error: Connection reset by peer)
[23:09] *** Boppen has joined #urlteam
[23:51] *** bwn has joined #urlteam