#urlteam 2015-11-15,Sun


Time Nickname Message
00:31 🔗 dashcloud has quit IRC (Read error: Operation timed out)
00:36 🔗 dashcloud has joined #urlteam
00:37 🔗 svchfoo3 sets mode: +o dashcloud
00:40 🔗 JesseW has joined #urlteam
00:41 🔗 svchfoo3 sets mode: +o JesseW
00:57 🔗 WinterFox JesseW, I got all the torrent files downloaded but I can't find an easy way to add them to transmission.
00:58 🔗 JesseW cool; you can add them to transmission using transmission-remote; let me paste the command I used.
00:58 🔗 JesseW So if you have the paths to the torrents (either local paths or URLs) in a file called urlteam_torrents.txt ...
00:59 🔗 JesseW xargs -t -L 1 transmission-remote --auth=transmission:transmission -a < urlteam_torrents.txt
00:59 🔗 JesseW should work
01:00 🔗 JesseW -t prints the command line, -L 1 says to run each one separately (just makes it easier to view, you can leave it out)
01:01 🔗 JesseW --auth should be changed to whatever your transmission username and password are
01:01 🔗 JesseW -a means add this torrent
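(Editor's note: JesseW's xargs one-liner above can be sketched in Python as well. This is a rough equivalent, not part of the original log; the torrent filename in the example is made up, and `transmission:transmission` is just the example user:password from the chat.)

```python
# Rough Python equivalent of:
#   xargs -t -L 1 transmission-remote --auth=transmission:transmission -a < urlteam_torrents.txt
# xargs -L 1 runs the command once per input line; here we build one
# argv list per torrent path instead, and echo each command (-t behaviour).
def build_commands(paths, auth="transmission:transmission"):
    """One transmission-remote invocation per torrent path or URL."""
    return [["transmission-remote", "--auth=" + auth, "-a", p] for p in paths]

# Example; the paths would normally be read from urlteam_torrents.txt.
for cmd in build_commands(["urlteam_bitly_6_20150101.torrent"]):
    print(" ".join(cmd))
```

Each argv list could then be handed to `subprocess.run(cmd, check=True)` to actually add the torrent.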
01:13 🔗 WinterFox That seems to be working
01:16 🔗 JesseW cool
01:16 🔗 JesseW which ones are you currently downloading?
01:20 🔗 WinterFox The whole lot
01:33 🔗 JesseW I'm just surprised I don't see you in any of the swarms
01:37 🔗 WinterFox It's only downloading a few at a time
01:37 🔗 WinterFox I need to raise the queue limit
01:38 🔗 JesseW which one is downloading right now?
01:38 🔗 WinterFox urlteam_xrlus_20150113
01:39 🔗 WinterFox I think it's only downloading from the webseeds
01:39 🔗 JesseW ah, that would be why. I haven't downloaded xrlus
01:49 🔗 VADemon has joined #urlteam
02:02 🔗 JesseW WinterFox: I think I see you in the swarm now. :-)
02:13 🔗 dashcloud has quit IRC (Read error: Operation timed out)
02:18 🔗 dashcloud has joined #urlteam
02:18 🔗 svchfoo1 sets mode: +o dashcloud
03:16 🔗 bwn has quit IRC (Read error: Operation timed out)
03:31 🔗 WinterFox It's going to take forever to get these torrents :L
03:33 🔗 JesseW eh, not forever -- but it took me about a day, I think.
03:36 🔗 WinterFox I might have to pause it if I get close to my data cap
03:36 🔗 JesseW ah, makes sense. I'm really lucky with my ISP.
03:37 🔗 WinterFox Internet in Australia sucks
03:38 🔗 JesseW it's about 91 GB for the incremental ones, plus the 88 GB big old one.
03:38 🔗 JesseW how does that compare to your data cap?
03:44 🔗 WinterFox I get 400GB/month
03:45 🔗 WinterFox I'm 97GB used, 7 days into the month
03:46 🔗 JesseW What are you planning to do with the files once you get them?
03:46 🔗 JesseW If you're not in a hurry, just wait until the last day of the month, and use up the *rest* of your limit on them, then repeat next month.
03:51 🔗 WinterFox I'm going to write a script to search them quickly, then probably make a website to search it online
03:59 🔗 JesseW Ah, cool.
03:59 🔗 JesseW Well, you should be able to write the script with just a few of them.
03:59 🔗 JesseW Then just extend it with new ones as you get around to downloading them
04:00 🔗 JesseW The trick will be what to do with the old dump, as it has a (somewhat) different format.
04:01 🔗 JesseW You should probably focus on downloading that, then. https://archive.org/download/URLTeamTorrentRelease2013July
04:12 🔗 VADemon has quit IRC (Read error: Connection reset by peer)
04:14 🔗 WinterFox I actually had the old dump but I removed it a while ago :L
04:25 🔗 JesseW ha.
04:26 🔗 aaaaaaaaa has quit IRC (Leaving)
04:27 🔗 JesseW Regarding the bit.ly aliases section on http://archiveteam.org/index.php?title=URLTeam#bit.ly_aliases -- it seems like for some of them (like nyti.ms ) any shortcode that works on them also works on bit.ly, so will get picked up (eventually) in our existing bit.ly project. But is that true for all of them?
04:46 🔗 WinterFox Are the meta files in the torrent just from the internet archive?
04:52 🔗 JesseW Yep; they aren't particularly relevant.
05:17 🔗 WinterFox I'm thinking of writing a script that will extract the URLs text file and append it to the end of a file for each shortener
05:18 🔗 JesseW if you have the space, I suppose that could speed up searches.
05:18 🔗 JesseW It'd be quite a bit of space, though.
05:23 🔗 WinterFox How much do you think it would take?
05:26 🔗 JesseW well, this random file I checked had a compression ratio of 0.293
05:27 🔗 WinterFox I might just extract it all, write/test the scripts, then remove the extracted data and publish the scripts
05:28 🔗 JesseW so that would give about 310 GB for the incremental ones
05:28 🔗 JesseW and 300 GB for the big dump
05:28 🔗 JesseW or about 600 GB. Not too bad.
06:11 🔗 WinterFox How does the naming scheme of the dumps work? Is bitly_6 the 6th run of bitly?
06:15 🔗 JesseW bitly_6 is the name of a urlteam project -- I think the 6 refers to 6 characters, i.e. the length of the shortcode being looked through. But I'm not sure.
06:16 🔗 WinterFox Ah ok. So that should stay the same then
06:17 🔗 JesseW Yep, everything up to the timestamp is the name of the project. If you want to extract the URL, you'll need to pull that either out of the header of the txt files (where it's called #PREFIX) or the meta.json file (where it's called url_template)
06:17 🔗 JesseW the actual URL of the shortener, I mean
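(Editor's note: a sketch of pulling the shortener URL out of a dump, per the two places JesseW mentions. The exact header layout — `#PREFIX: <url>` — and the meta.json key spelling are assumed here, not verified against a real dump.)

```python
import json

def url_from_txt_header(lines):
    """Find the shortener URL in a dump txt header, assuming '#PREFIX: <url>' lines."""
    for line in lines:
        if line.startswith("#PREFIX"):
            # Split only on the first colon so 'http://...' survives intact.
            return line.split(":", 1)[1].strip()
    return None

def url_from_meta(meta_json_text):
    """Read the url_template field from a meta.json file's contents."""
    return json.loads(meta_json_text).get("url_template")
```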
07:03 🔗 WinterFox So far my script can extract all the zips and remove the meta files
07:03 🔗 JesseW :-)
07:05 🔗 WinterFox https://bpaste.net/show/4e19075dfa9b
07:09 🔗 JesseW you should probably just avoid extracting the ones you are going to throw away.
07:10 🔗 JesseW e.g. use namelist() first, filter them out there, then pass the resulting list to extractall
07:11 🔗 JesseW https://docs.python.org/2/library/zipfile.html#zipfile.ZipFile.extractall
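(Editor's note: JesseW's suggestion as a sketch — list the archive members first, drop the meta files, and only extract what's left. The `meta.json` filename filter is a guess at the naming; the real dumps may use other meta filenames.)

```python
import zipfile

def extract_without_meta(zip_path, dest="."):
    """Extract a dump zip, skipping meta files (filter pattern is assumed)."""
    with zipfile.ZipFile(zip_path) as zf:
        # namelist() first, filter, then hand the survivors to extractall().
        wanted = [n for n in zf.namelist() if not n.endswith("meta.json")]
        zf.extractall(dest, members=wanted)
        return wanted
```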
07:11 🔗 WinterFox The files inside the zip are still compressed so there would be very little performance impact but I could do that.
07:11 🔗 WinterFox Once I get it working I will make it work better
07:12 🔗 JesseW makes sense
07:34 🔗 SmileyG has joined #urlteam
07:35 🔗 JesseW chfoo, arkiver, (maybe other people) -- so, if I made a PR adding some features to the admin side of the tracker, would I be able to get that merged soonish?
07:35 🔗 JesseW I want to add the ability to filter the results display by project.
07:36 🔗 JesseW That way I can check that a project is working without having to wait until the next export (and without it getting swamped by the constantly running projects)
07:37 🔗 JesseW merged and deployed
07:38 🔗 swebb has quit IRC (Ping timeout: 253 seconds)
07:38 🔗 swebb has joined #urlteam
07:38 🔗 Smiley has quit IRC (Read error: Operation timed out)
07:39 🔗 svchfoo3 sets mode: +o swebb
07:54 🔗 Start has quit IRC (Quit: Disconnected.)
07:56 🔗 Start has joined #urlteam
07:58 🔗 JesseW Well, for what (not much) use it is, we are now gathering short URLs from a random recruiter's vanity shortener, bull.hn...
08:03 🔗 chfoo JesseW: i don't see any pull request in github?
08:03 🔗 JesseW I haven't made it yet.
08:04 🔗 JesseW I wanted to check what the process/delay there might be on getting it deployed before I started.
08:05 🔗 bwn has joined #urlteam
08:06 🔗 chfoo it depends on whether i'm around
08:06 🔗 chfoo if the pull request works i just merge it and pull it on the server
08:07 🔗 JesseW ok, sounds good. I'll make it, hopefully by tomorrow.
08:15 🔗 JesseW chfoo is there an example config file in the repo?
08:15 🔗 JesseW ah, found it
08:41 🔗 JesseW OK, here's a tiny PR: https://github.com/ArchiveTeam/terroroftinytown/pull/37
08:41 🔗 JesseW or, you know, GitHub will tell you...
08:56 🔗 JesseW has quit IRC (Read error: Operation timed out)
10:15 🔗 bwn has quit IRC (Read error: Connection reset by peer)
10:15 🔗 bwn has joined #urlteam
11:04 🔗 bwn has quit IRC (Read error: Operation timed out)
11:45 🔗 bwn has joined #urlteam
12:06 🔗 WinterFox has quit IRC (Remote host closed the connection)
13:00 🔗 VADemon has joined #urlteam
13:08 🔗 bwn has quit IRC (Read error: Operation timed out)
13:19 🔗 SimpBrain has quit IRC (Leaving)
18:01 🔗 JesseW has joined #urlteam
18:01 🔗 svchfoo3 sets mode: +o JesseW
18:06 🔗 JesseW chfoo: Any chance I could get a copy of the production database to test with? While I could populate one with fake data, it would be easier to just use the real one...
18:06 🔗 JesseW For testing things like performance of queries (e.g. results display)
18:40 🔗 SimpBrain has joined #urlteam
18:54 🔗 aaaaaaaaa has joined #urlteam
18:54 🔗 swebb sets mode: +o aaaaaaaaa
19:05 🔗 dashcloud has quit IRC (Read error: Operation timed out)
19:13 🔗 dashcloud has joined #urlteam
19:13 🔗 svchfoo3 sets mode: +o dashcloud
20:03 🔗 Boppen has quit IRC (Read error: Connection reset by peer)
20:04 🔗 Boppen has joined #urlteam
20:07 🔗 dashcloud has quit IRC (Read error: Operation timed out)
20:10 🔗 dashcloud has joined #urlteam
20:11 🔗 svchfoo1 sets mode: +o dashcloud
21:04 🔗 dashcloud has quit IRC (Read error: Operation timed out)
21:08 🔗 dashcloud has joined #urlteam
21:08 🔗 svchfoo3 sets mode: +o dashcloud
21:22 🔗 VADemon has quit IRC (left4dead)
21:53 🔗 JesseW has quit IRC (Leaving.)
21:58 🔗 dashcloud has quit IRC (Read error: Operation timed out)
22:01 🔗 dashcloud has joined #urlteam
22:01 🔗 svchfoo1 sets mode: +o dashcloud
23:02 🔗 JesseW has joined #urlteam
23:02 🔗 svchfoo3 sets mode: +o JesseW
23:08 🔗 Boppen has quit IRC (Read error: Connection reset by peer)
23:09 🔗 Boppen has joined #urlteam
23:51 🔗 bwn has joined #urlteam
