#archiveteam 2013-10-17,Thu

Time Nickname Message
08:07 Nemo_bis https://archive.org/post/1002569/requested-download-is-not-authorized-for-use-with-this-tracker
10:17 Nemo_bis Ah. How stupid I am. SketchCow needs to get book_op to create the torrents on Wikimediacommons* (uppercase W) items too, just that. Ideally they should be renamed (it was Calc messing up with uppercase in csv... when calligra is better).
17:58 ivan` http://variety.com/2013/biz/news/isohunt-to-shut-down-as-part-of-settlement-with-studios-1200734509/
18:11 balrog time to scrape all their ids?
18:11 balrog I think it's shutting down today though
18:27 godane i'm grabbing isohunt.com forums
18:30 RedType_ isohunt is shutting down?
18:30 RedType_ what the fuck
18:30 ivan` it's been troubled for a long time
18:30 godane i know it linked to tons of archive.org torrents
18:33 RedType_ i know but it's sudden
19:56 lemonkey had an older bookmark open in my browser and refreshed it and saw this.. not sure when cue died.. http://cueup.com/
19:56 lemonkey looks like early this month http://techcrunch.com/2013/10/02/cue-greplin/
19:58 lemonkey ah apple bought it
19:59 RedType_ they took down cue up adventure :(
21:21 lysobit http://torrentfreak.com/isohunt-shuts-down-after-110-million-settlement-with-the-mpaa-131017/
21:23 lysobit Sites: 555 • Trackers: 235,842 • Active Torrents: 13,737,689 • Files: 285.58M • Size: 17,371.74 TB • Peers: 52.83M
21:23 lysobit :(
21:26 Nemo_bis I'd also love if http://publicbt.com/all.txt.bz2 worked again
21:27 lysobit There are 13,000,000 torrents. If each .torrent file is 50kb on average, then the total storage required to store all torrents would be < 700mb
21:27 lysobit actually disregard that
21:27 lysobit I mean 700gb
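
[Editor's sketch] The corrected estimate above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope calculation: the 50 kB average is lysobit's guess, not a measured value, and the function name is made up for illustration.

```python
# Back-of-the-envelope check of the storage estimate from the chat.
AVG_TORRENT_BYTES = 50 * 1000  # assumed average .torrent size (50 kB, a guess)


def total_storage_gb(count: int, avg_bytes: int = AVG_TORRENT_BYTES) -> float:
    """Return the total size in decimal gigabytes for `count` files."""
    return count * avg_bytes / 1000**3


print(total_storage_gb(13_000_000))  # 650.0
print(total_storage_gb(13_737_689))  # exact "Active Torrents" figure, still under 700
```

Either count lands below 700 GB, which matches the corrected figure rather than the original "700mb".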
21:33 Nemo_bis well it's not particularly useful to store torrents anyway
21:34 Nemo_bis publicbt.com gave you all you needed; but now it doesn't work
21:43 DFJustin well it's not just the torrent files, they also have uploader comments (i.e. metadata)
21:44 omf_ The metadata is interesting
21:59 joepie91 Nemo_bis, omf_, DFJustin, etc: http://pastie.org/private/cbryvcdrxpf7dod4vfla
21:59 joepie91 that will at least grab all the torrents
21:59 joepie91 or nearly all anyway
22:01 joepie91 their JSON search API is really restrictive :(
22:01 joepie91 max 1000 results per query
22:01 joepie91 I mean, I *could* write another bruteforce searcher again...
22:02 joepie91 but their forum thread suggested that they monitor search request rate
22:02 joepie91 useful: the numeric IDs for their torrents are the same as for the detail pagse
22:02 joepie91 pages *
22:02 * joepie91 feels like this would be a good Warrior project
22:04 DFJustin yeah just iterate through https://isohunt.com/torrent_details/xxxxxx/
22:04 joepie91 DFJustin: I'm intentionally iterating through the .torrents actually
22:04 joepie91 instead of the details pages
22:04 joepie91 I feel like a static torrent serving backend would be faster
22:04 lysobit note that they have already settled in court; who knows when they're going to shut the site down
22:04 joepie91 thus you can get 404d before it does any db queries
22:05 DFJustin the .torrent files are mirrored everywhere already though, the unique stuff is all the "what the hell is this" text
22:05 joepie91 you'd be downloading the .torrents anyway
22:05 joepie91 so might as well start with those, and for the non-404s then fetch the /details/ pages
22:05 DFJustin that makes sense I guess
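
[Editor's sketch] The order of operations joepie91 describes (request the .torrent first, and only fetch the details page when that is not a 404) might look roughly like the following. This is not the script from the pastie links, which is not reproduced in the log. The details URL pattern is the one DFJustin quotes; `TORRENT_URL` is a hypothetical placeholder, since the real download URL is never given in the chat, and `http_fetch` and `grab` are illustrative names.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple

DETAILS_URL = "https://isohunt.com/torrent_details/{id}/"  # pattern quoted in the chat
TORRENT_URL = "https://isohunt.com/download/{id}/"  # HYPOTHETICAL: not given in the chat

Fetch = Callable[[str], Tuple[int, bytes]]


def http_fetch(url: str) -> Tuple[int, bytes]:
    """Return (status, body); HTTP errors become a status code, not an exception."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, b""


def grab(torrent_id: int, fetch: Fetch = http_fetch) -> Optional[dict]:
    """Probe the .torrent first; fetch the details page only for IDs that exist."""
    status, torrent = fetch(TORRENT_URL.format(id=torrent_id))
    if status == 404:
        return None  # unused ID: skip the (presumably costlier) details request
    if status != 200:
        raise RuntimeError(f"unexpected status {status} for id {torrent_id}")
    status, details = fetch(DETAILS_URL.format(id=torrent_id))
    return {
        "id": torrent_id,
        "torrent": torrent,
        "details": details if status == 200 else None,
    }
```

Passing `fetch` in as a parameter keeps the logic testable without touching the network; whether the .torrent endpoint really is cheaper for the server is joepie91's hunch, not something the log confirms.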
22:05 joepie91 DFJustin: many isohunt torrents are no longer in their original location
22:05 joepie91 in my experience
22:05 joepie91 :P
22:05 joepie91 lysobit: that's why I'm optimizing for speed
22:06 joepie91 do we have any awake warrior devs?
22:06 lysobit make it multithreaded
22:06 joepie91 that are familiar with the pipeline stuff etc.
22:06 lysobit using python threads
22:06 joepie91 lysobit: meh, might as well
22:07 lysobit stick on a dedi with a 1gbit pipe; done
22:07 lysobit and a 1tb hd
22:07 lysobit though, easier said than done :P
22:07 DFJustin fwiw even if the torrent is no longer in its original location the details page has the info hash which is all you really need
22:08 lysobit metadata would be nice to have
22:08 lysobit as infohash doesn't contain what files are in the torrent
22:09 lysobit and would be even better if you can store the name of the torrent too
22:10 DFJustin yeah but there are other projects mass-downloading torrent files for infohashes and every torrent site under the sun will have the torrent file as well
22:11 DFJustin so that stuff is not really at risk
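
[Editor's sketch] To illustrate the point about info hashes versus metadata: in the BitTorrent metainfo format the info hash is the SHA-1 digest of the bencoded `info` dictionary, so the hash identifies a torrent but carries no file names, whereas the .torrent file itself does. The decoder below is a minimal illustration with no validation, and the function names are made up for this example.

```python
import hashlib


def _decode(data: bytes, pos: int):
    """Decode one bencoded value starting at `pos`; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":  # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":  # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = _decode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":  # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = _decode(data, pos)
            result[key], pos = _decode(data, pos)
        return result, pos + 1
    colon = data.index(b":", pos)  # byte string: <length>:<bytes>
    start = colon + 1
    length = int(data[pos:colon])
    return data[start:start + length], start + length


def info_hash(torrent: bytes) -> str:
    """SHA-1 (hex) of the raw bencoded `info` value inside a .torrent file."""
    if torrent[:1] != b"d":
        raise ValueError("not a bencoded dictionary")
    pos = 1
    while torrent[pos:pos + 1] != b"e":
        key, pos = _decode(torrent, pos)
        start = pos
        _, pos = _decode(torrent, pos)
        if key == b"info":
            return hashlib.sha1(torrent[start:pos]).hexdigest()
    raise ValueError("no info dictionary found")


def file_names(torrent: bytes) -> list:
    """List the file paths named in the torrent (single- or multi-file)."""
    meta, _ = _decode(torrent, 0)
    info = meta[b"info"]
    if b"files" in info:
        return [b"/".join(f[b"path"]).decode("utf-8", "replace") for f in info[b"files"]]
    return [info[b"name"].decode("utf-8", "replace")]
```

Given only the 40-character output of `info_hash`, there is no way to recover what `file_names` returns, which is lysobit's argument for keeping the metadata as well.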
22:16 joepie91 hmm this multithreaded version actually works pretty well it seems
22:16 joepie91 :P
22:17 joepie91 http://pastie.org/private/agybnuru8digavvhdagt1w
22:19 joepie91 not blocked yet
22:19 joepie91 running 5 threads
22:19 joepie91 roughly 10-15 torrents checked per second
22:20 joepie91 400 days
22:20 joepie91 not fast enough
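
[Editor's sketch] The "400 days" figure can be related to the quoted check rate with simple arithmetic. The size of the ID range being scanned is not stated in the log, so the snippet only works backwards from the two numbers that are given; the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def ids_covered(rate_per_second: int, days: int) -> int:
    """How many IDs can be checked in `days` at a steady rate."""
    return rate_per_second * days * SECONDS_PER_DAY


def days_needed(id_count: int, rate_per_second: float) -> float:
    """How many days a scan of `id_count` IDs takes at a steady rate."""
    return id_count / (rate_per_second * SECONDS_PER_DAY)


print(ids_covered(10, 400))  # 345600000
print(ids_covered(15, 400))  # 518400000
```

Taken at face value, 400 days at 10-15 checks per second corresponds to an ID space in the hundreds of millions, which would be far larger than the roughly 13.7 million active torrents quoted earlier. That gap is an inference from the two figures, not something anyone in the channel states.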
22:23 joepie91 hrm
22:27 joepie91 We know you love isoHunt, but you shouldn't hit us this fast. You are banned for 1200 seconds.
22:27 joepie91 :(
22:30 joepie91 well at least we now know that they have rate limiting in place lol
22:41 joepie91 right, script has throttling now..
22:50 joepie91 sooooooooo
22:50 joepie91 5 threads was apparently also too much
22:51 joepie91 http://pastie.org/private/jlaqklfwjkhznbx4bdrnrw
22:51 joepie91 feel free to change the range to a subset of the current range (newest and oldest vars) and run
22:52 joepie91 :)
22:52 joepie91 (given absence of a warrior project)
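
[Editor's sketch] Splitting the ID range so that several people can each run a subset could be done along these lines. The `oldest` and `newest` names echo the variables joepie91 mentions; `split_range` is a hypothetical helper, not part of his script, and the example numbers are arbitrary.

```python
from typing import List, Tuple


def split_range(oldest: int, newest: int, parts: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [oldest, newest] into up to `parts` contiguous chunks."""
    if parts < 1 or newest < oldest:
        raise ValueError("need parts >= 1 and newest >= oldest")
    total = newest - oldest + 1
    base, extra = divmod(total, parts)
    chunks = []
    start = oldest
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        if size == 0:
            break  # more parts than IDs: stop once everything is assigned
        chunks.append((start, start + size - 1))
        start += size
    return chunks


print(split_range(1, 10, 3))  # [(1, 4), (5, 7), (8, 10)]
```

Each volunteer would then set their own oldest and newest values to one of the returned pairs, so no ID is fetched twice.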
23:38 joepie91 right
23:38 joepie91 3 threads seems the max
23:38 joepie91 to not get throttled
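
[Editor's sketch] A throttled, three-thread checker that backs off after a ban could be structured as below. The thread count and the 1200-second ban come from the chat; everything else is an assumption: that a ban can be detected by the quoted text appearing in the response body, the 0.5-second request spacing, and the `fetch` callable, which stands in for whatever function actually downloads one ID.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

MAX_THREADS = 3                  # "3 threads seems the max" per the chat
BAN_SECONDS = 1200               # ban length from the quoted message
BAN_MARKER = b"You are banned"   # ASSUMPTION: the ban text appears in the response body


class Throttle:
    """Space out requests so that all threads together start at most one per `interval`."""

    def __init__(self, interval: float, sleep: Callable[[float], None] = time.sleep):
        self._interval = interval
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = 0.0

    def wait(self) -> None:
        with self._lock:  # reserve the next free slot atomically
            now = time.monotonic()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self._interval
        if slot > now:
            self._sleep(slot - now)


def check(torrent_id: int, fetch: Callable[[int], bytes], throttle: Throttle,
          sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Fetch one ID, waiting out the ban and retrying if the ban message comes back."""
    while True:
        throttle.wait()
        body = fetch(torrent_id)
        if BAN_MARKER not in body:
            return body
        sleep(BAN_SECONDS)


def run(ids: Iterable[int], fetch: Callable[[int], bytes], interval: float = 0.5,
        sleep: Callable[[float], None] = time.sleep) -> List[bytes]:
    """Check all IDs with a small thread pool; results come back in input order."""
    throttle = Throttle(interval, sleep)
    with ThreadPoolExecutor(max_workers=MAX_THREADS) as pool:
        return list(pool.map(lambda i: check(i, fetch, throttle, sleep), ids))
```

One simplification to note: only the thread that sees the ban message sleeps, so the other workers keep sending requests during the ban. A more careful version would pause the whole pool.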
