| Time |
Nickname |
Message |
|
08:07
|
Nemo_bis |
https://archive.org/post/1002569/requested-download-is-not-authorized-for-use-with-this-tracker |
|
10:17
|
Nemo_bis |
Ah. How stupid I am. SketchCow needs to get book_op to create the torrents on Wikimediacommons* (uppercase W) items too, just that. Ideally they should be renamed (it was Calc messing up the uppercase in the csv... when calligra is better). |
|
17:58
|
ivan` |
http://variety.com/2013/biz/news/isohunt-to-shut-down-as-part-of-settlement-with-studios-1200734509/ |
|
18:11
|
balrog |
time to scrape all their ids? |
|
18:11
|
balrog |
I think it's shutting down today though |
|
18:27
|
godane |
i'm grabbing isohunt.com forums |
|
18:30
|
RedType_ |
isohunt is shutting down? |
|
18:30
|
RedType_ |
what the fuck |
|
18:30
|
ivan` |
it's been troubled for a long time |
|
18:30
|
godane |
i know it linked to tons of archive.org torrents |
|
18:33
|
RedType_ |
i know but it's sudden |
|
19:56
|
lemonkey |
had an older bookmark open in my browser and refreshed it and saw this.. not sure when cue died.. http://cueup.com/ |
|
19:56
|
lemonkey |
looks like early this month http://techcrunch.com/2013/10/02/cue-greplin/ |
|
19:58
|
lemonkey |
ah apple bought it |
|
19:59
|
RedType_ |
they took down cue up adventure :( |
|
21:21
|
lysobit |
http://torrentfreak.com/isohunt-shuts-down-after-110-million-settlement-with-the-mpaa-131017/ |
|
21:23
|
lysobit |
Sites: 555 • Trackers: 235,842 • Active Torrents: 13,737,689 • Files: 285.58M • Size: 17,371.74 TB • Peers: 52.83M |
|
21:23
|
lysobit |
:( |
|
21:26
|
Nemo_bis |
I'd also love if http://publicbt.com/all.txt.bz2 worked again |
|
21:27
|
lysobit |
There are 13,000,000 torrents. If each .torrent file is 50kb on average, then the total storage required to store all torrents would be < 700mb |
|
21:27
|
lysobit |
actually disregard that |
|
21:27
|
lysobit |
I mean 700gb |
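A quick check of the corrected estimate, as a rough sketch. It takes lysobit's figures (13,000,000 torrents at 50 kB each) at face value and assumes decimal units (1 GB = 1,000,000 kB); the function name is made up for illustration.

```python
def total_storage_gb(torrent_count: int, avg_size_kb: float) -> float:
    """Return the total size in decimal gigabytes (1 GB = 1,000,000 kB)."""
    return torrent_count * avg_size_kb / 1_000_000


# Figures quoted in the chat; the 50 kB average is lysobit's guess.
print(total_storage_gb(13_000_000, 50))  # 650.0
```

That lands a little under the 700 GB figure, so the corrected order of magnitude holds.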
|
21:33
|
Nemo_bis |
well it's not particularly useful to store torrents anyway |
|
21:34
|
Nemo_bis |
publicbt.com gave you all you needed; but now it doesn't work |
|
21:43
|
DFJustin |
well it's not just the torrent files, they also have uploader comments (i.e. metadata) |
|
21:44
|
omf_ |
The metadata is interesting |
|
21:59
|
joepie91 |
Nemo_bis, omf_, DFJustin, etc: http://pastie.org/private/cbryvcdrxpf7dod4vfla |
|
21:59
|
joepie91 |
that will at least grab all the torrents |
|
21:59
|
joepie91 |
or nearly all anyway |
|
22:01
|
joepie91 |
their JSON search API is really restrictive :( |
|
22:01
|
joepie91 |
max 1000 results per query |
|
22:01
|
joepie91 |
I mean, I *could* write another bruteforce searcher again... |
|
22:02
|
joepie91 |
but their forum thread suggested that they monitor search request rate |
|
22:02
|
joepie91 |
useful: the numeric IDs for their torrents are the same as for the detail pagse |
|
22:02
|
joepie91 |
pages * |
|
22:02
|
* |
joepie91 feels like this would be a good Warrior project |
|
22:04
|
DFJustin |
yeah just iterate through https://isohunt.com/torrent_details/xxxxxx/ |
|
22:04
|
joepie91 |
DFJustin: I'm intentionally iterating through the .torrents actually |
|
22:04
|
joepie91 |
instead of the details pages |
|
22:04
|
joepie91 |
I feel like a static torrent serving backend would be faster |
|
22:04
|
lysobit |
note that they have already settled in court; who knows when they're going to shut the site down |
|
22:04
|
joepie91 |
thus you can get 404d before it does any db queries |
|
22:05
|
DFJustin |
the .torrent files are mirrored everywhere already though, the unique stuff is all the "what the hell is this" text |
|
22:05
|
joepie91 |
you'd be downloading the .torrents anyway |
|
22:05
|
joepie91 |
so might as well start with those, and for the non-404s then fetch the /details/ pages |
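The order joepie91 describes (probe the .torrent first, fetch the details page only for IDs that are not 404) could be sketched roughly as follows. This is not the pastie script, which is not reproduced in the log. The details path is the one DFJustin quotes; the `.torrent` download path and the `status_of` callback are assumptions made for illustration.

```python
from typing import Callable, Iterable, Iterator

# Only the details path appears in the chat. The download path below is a
# hypothetical placeholder, not a confirmed isoHunt URL.
TORRENT_URL = "https://isohunt.com/download/{id}/"
DETAILS_URL = "https://isohunt.com/torrent_details/{id}/"


def urls_to_archive(ids: Iterable[int],
                    status_of: Callable[[str], int]) -> Iterator[str]:
    """Yield URLs worth saving: the .torrent if present, then its details page.

    `status_of` is any callable returning the HTTP status code for a URL, so
    the control flow can be exercised without touching the network.
    """
    for torrent_id in ids:
        torrent_url = TORRENT_URL.format(id=torrent_id)
        if status_of(torrent_url) == 404:
            continue  # cheap miss: skip the more expensive details page
        yield torrent_url
        yield DETAILS_URL.format(id=torrent_id)


# Example with a fake status function: ID 2 does not exist.
fake_status = lambda url: 404 if "/2/" in url else 200
for url in urls_to_archive([1, 2, 3], fake_status):
    print(url)
```

Passing the status check in as a function keeps the ID-walking logic separate from whatever HTTP client ends up doing the fetching.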
|
22:05
|
DFJustin |
that makes sense I guess |
|
22:05
|
joepie91 |
DFJustin: many isohunt torrents are no longer in their original location |
|
22:05
|
joepie91 |
in my experience |
|
22:05
|
joepie91 |
:P |
|
22:05
|
joepie91 |
lysobit: that's why I'm optimizing for speed |
|
22:06
|
joepie91 |
do we have any awake warrior devs? |
|
22:06
|
lysobit |
make it multithreaded |
|
22:06
|
joepie91 |
that are familiar with the pipeline stuff etc. |
|
22:06
|
lysobit |
using python threads |
|
22:06
|
joepie91 |
lysobit: meh, might as well |
|
22:07
|
lysobit |
stick on a dedi with a 1gbit pipe; done |
|
22:07
|
lysobit |
and a 1tb hd |
|
22:07
|
lysobit |
though, easier said than done :P |
|
22:07
|
DFJustin |
fwiw even if the torrent is no longer in its original location the details page has the info hash which is all you really need |
|
22:08
|
lysobit |
metadata would be nice to have |
|
22:08
|
lysobit |
as infohash doesn't contain what files are in the torrent |
|
22:09
|
lysobit |
and would be even better if you can store the name of the torrent too |
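To illustrate lysobit's point: a BitTorrent info hash is the SHA-1 digest of the bencoded `info` dictionary from the .torrent file, so the file list and name go into the hash but cannot be read back out of it. A minimal sketch, with a hand-rolled bencoder and made-up example values:

```python
import hashlib


def bencode(value) -> bytes:
    """Encode ints, byte/text strings, lists and dicts in bencoding."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must be written in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash(info: dict) -> str:
    """Return the hex SHA-1 of the bencoded info dictionary."""
    return hashlib.sha1(bencode(info)).hexdigest()


# Made-up single-file torrent; real 'pieces' data holds one hash per piece.
example_info = {
    "name": "example.iso",
    "length": 1048576,
    "piece length": 262144,
    "pieces": b"\x00" * 20 * 4,
}
print(info_hash(example_info))  # 40 hex characters, nothing more
```

Changing the name or any file entry changes the hash, but the 40 hex characters alone say nothing about what the torrent contains.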
|
22:10
|
DFJustin |
yeah but there are other projects mass-downloading torrent files for infohashes and every torrent site under the sun will have the torrent file as well |
|
22:11
|
DFJustin |
so that stuff is not really at risk |
|
22:16
|
joepie91 |
hmm this multithreaded version actually works pretty well it seems |
|
22:16
|
joepie91 |
:P |
|
22:17
|
joepie91 |
http://pastie.org/private/agybnuru8digavvhdagt1w |
|
22:19
|
joepie91 |
not blocked yet |
|
22:19
|
joepie91 |
running 5 threads |
|
22:19
|
joepie91 |
roughly 10-15 torrents checked per second |
|
22:20
|
joepie91 |
400 days |
|
22:20
|
joepie91 |
not fast enough |
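The ID range being scanned lives in the pastie script and is not shown in the log, so this sketch works backwards instead: it estimates how many IDs a 400-day run would cover at the quoted 10-15 checks per second. The function names are illustrative only.

```python
SECONDS_PER_DAY = 86_400


def ids_scanned(days: int, checks_per_second: int) -> int:
    """Number of IDs checked in `days` at a steady rate."""
    return days * SECONDS_PER_DAY * checks_per_second


def days_to_scan(id_count: int, checks_per_second: float) -> float:
    """Days needed to check `id_count` IDs at a steady rate."""
    return id_count / checks_per_second / SECONDS_PER_DAY


print(ids_scanned(400, 10))  # 345600000
print(ids_scanned(400, 15))  # 518400000
```

So the 400-day figure suggests an ID space of a few hundred million, far larger than the roughly 13.7 million active torrents quoted earlier, which is presumably why most probes are expected to come back as 404.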
|
22:23
|
joepie91 |
hrm |
|
22:27
|
joepie91 |
We know you love isoHunt, but you shouldn't hit us this fast. You are banned for 1200 seconds. |
|
22:27
|
joepie91 |
:( |
|
22:30
|
joepie91 |
well at least we now know that they have rate limiting in place lol |
|
22:41
|
joepie91 |
right, script has throttling now.. |
|
22:50
|
joepie91 |
sooooooooo |
|
22:50
|
joepie91 |
5 threads was apparently also too much |
|
22:51
|
joepie91 |
http://pastie.org/private/jlaqklfwjkhznbx4bdrnrw |
|
22:51
|
joepie91 |
feel free to change the range to a subset of the current range (newest and oldest vars) and run |
|
22:52
|
joepie91 |
:) |
|
22:52
|
joepie91 |
(given absence of a warrior project) |
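In the absence of a Warrior project, volunteers would each need a distinct slice of the ID range. One way to carve it up, as a sketch: `oldest` and `newest` mirror the variable names joepie91 mentions, while the helper itself and the example numbers are hypothetical.

```python
def split_range(oldest: int, newest: int, parts: int) -> list[tuple[int, int]]:
    """Split the inclusive range [oldest, newest] into contiguous chunks.

    Returns up to `parts` (start, end) pairs, both ends inclusive, whose
    sizes differ by at most one ID.
    """
    if parts < 1 or newest < oldest:
        raise ValueError("need parts >= 1 and newest >= oldest")
    total = newest - oldest + 1
    parts = min(parts, total)  # never hand out empty chunks
    base, extra = divmod(total, parts)
    chunks = []
    start = oldest
    for index in range(parts):
        size = base + (1 if index < extra else 0)
        chunks.append((start, start + size - 1))
        start += size
    return chunks


# Example numbers only; the real bounds are in the script.
print(split_range(1, 10, 3))  # [(1, 4), (5, 7), (8, 10)]
```

Each volunteer would then set their own `oldest` and `newest` to one of the returned pairs before running the script.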
|
23:38
|
joepie91 |
right |
|
23:38
|
joepie91 |
3 threads seems the max |
|
23:38
|
joepie91 |
to not get throttled |
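A throttled, multithreaded checker along the lines discussed above might look like this. It is a sketch under assumptions, not the script from the pastie links: the three workers come from the chat, while the 0.1 second spacing between requests and the `check` callback are placeholders.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class Throttle:
    """Space calls at least `interval` seconds apart across all threads."""

    def __init__(self, interval: float):
        self.interval = interval
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self) -> None:
        # Reserve the next free time slot under the lock, then sleep outside
        # it so other threads can reserve their own slots in the meantime.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.interval
        time.sleep(slot - now)


def check_all(ids, check, workers: int = 3, interval: float = 0.1) -> dict:
    """Run `check(torrent_id)` for every ID and return {id: result}."""
    throttle = Throttle(interval)

    def task(torrent_id):
        throttle.wait()
        return torrent_id, check(torrent_id)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(task, ids))


# Stand-in check: pretend even IDs exist. A real check would do the HTTP probe.
print(check_all(range(6), lambda torrent_id: torrent_id % 2 == 0))
```

A shared throttle caps the overall request rate regardless of thread count, which matters when a ban costs 1200 seconds.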