Time |
Nickname |
Message |
08:07
|
Nemo_bis |
https://archive.org/post/1002569/requested-download-is-not-authorized-for-use-with-this-tracker |
10:17
|
Nemo_bis |
Ah. How stupid I am. SketchCow needs to get book_op to create the torrents on Wikimediacommons* (uppercase W) items too, just that. Ideally they should be renamed (it was Calc messing up with uppercase in csv... when calligra is better). |
17:58
|
ivan` |
http://variety.com/2013/biz/news/isohunt-to-shut-down-as-part-of-settlement-with-studios-1200734509/ |
18:11
|
balrog |
time to scrape all their ids? |
18:11
|
balrog |
I think it's shutting down today though |
18:27
|
godane |
i'm grabbing isohunt.com forums |
18:30
|
RedType_ |
isohunt is shutting down? |
18:30
|
RedType_ |
what the fuck |
18:30
|
ivan` |
it's been troubled for a long time |
18:30
|
godane |
i know it linked to tons of archive.org torrents |
18:33
|
RedType_ |
i know but it's sudden |
19:56
|
lemonkey |
had an older bookmark open in my browser and refreshed it and saw this.. not sure when cue died.. http://cueup.com/ |
19:56
|
lemonkey |
looks like early this month http://techcrunch.com/2013/10/02/cue-greplin/ |
19:58
|
lemonkey |
ah apple bought it |
19:59
|
RedType_ |
they took down cue up adventure :( |
21:21
|
lysobit |
http://torrentfreak.com/isohunt-shuts-down-after-110-million-settlement-with-the-mpaa-131017/ |
21:23
|
lysobit |
Sites: 555 • Trackers: 235,842 • Active Torrents: 13,737,689 • Files: 285.58M • Size: 17,371.74 TB • Peers: 52.83M |
21:23
|
lysobit |
:( |
21:26
|
Nemo_bis |
I'd also love if http://publicbt.com/all.txt.bz2 worked again |
21:27
|
lysobit |
There are 13,000,000 torrents. If each .torrent file is 50kb on average, then the total storage required to store all torrents would be < 700mb |
21:27
|
lysobit |
actually disregard that |
21:27
|
lysobit |
I mean 700gb |
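A quick check of the estimate above, as a minimal sketch. It takes the rounded 13,000,000 torrents and the guessed 50 kB average from the chat and assumes decimal units (1 GB = 10^9 bytes). The helper name `total_storage_gb` is made up for illustration.

```python
# Back-of-the-envelope storage estimate using the figures quoted in the chat.
TORRENT_COUNT = 13_000_000       # rounded count from the chat
AVG_TORRENT_BYTES = 50 * 1000    # 50 kB average (a guess from the chat; decimal units assumed)


def total_storage_gb(count: int, avg_bytes: int) -> float:
    """Return the total size in decimal gigabytes (10**9 bytes)."""
    return count * avg_bytes / 10**9


estimate_gb = total_storage_gb(TORRENT_COUNT, AVG_TORRENT_BYTES)
print(f"{estimate_gb:.0f} GB")  # prints "650 GB"
```

The result is 650 GB, which matches the corrected "< 700gb" figure rather than the first "< 700mb".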
21:33
|
Nemo_bis |
well it's not particularly useful to store torrents anyway |
21:34
|
Nemo_bis |
publicbt.com gave you all you needed; but now it doesn't work |
21:43
|
DFJustin |
well it's not just the torrent files, they also have uploader comments (i.e. metadata) |
21:44
|
omf_ |
The metadata is interesting |
21:59
|
joepie91 |
Nemo_bis, omf_, DFJustin, etc: http://pastie.org/private/cbryvcdrxpf7dod4vfla |
21:59
|
joepie91 |
that will at least grab all the torrents |
21:59
|
joepie91 |
or nearly all anyway |
22:01
|
joepie91 |
their JSON search API is really restrictive :( |
22:01
|
joepie91 |
max 1000 results per query |
22:01
|
joepie91 |
I mean, I *could* write another bruteforce searcher again... |
22:02
|
joepie91 |
but their forum thread suggested that they monitor search request rate |
22:02
|
joepie91 |
useful: the numeric IDs for their torrents are the same as for the detail pagse |
22:02
|
joepie91 |
pages * |
22:02
|
* |
joepie91 feels like this would be a good Warrior project |
22:04
|
DFJustin |
yeah just iterate through https://isohunt.com/torrent_details/xxxxxx/ |
22:04
|
joepie91 |
DFJustin: I'm intentionally iterating through the .torrents actually |
22:04
|
joepie91 |
instead of the details pages |
22:04
|
joepie91 |
I feel like a static torrent serving backend would be faster |
22:04
|
lysobit |
note that they have already settled in court; who knows when they're going to shut the site down |
22:04
|
joepie91 |
thus you can get 404d before it does any db queries |
22:05
|
DFJustin |
the .torrent files are mirrored everywhere already though, the unique stuff is all the "what the hell is this" text |
22:05
|
joepie91 |
you'd be downloading the .torrents anyway |
22:05
|
joepie91 |
so might as well start with those, and for the non-404s then fetch the /details/ pages |
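The "torrent first, details page only for non-404s" order described above can be sketched as follows. This is not the script from the pastie links, only an illustration under stated assumptions: the `fetch` callable is injected by the caller and returns `None` for a 404, `TORRENT_URL` is a placeholder because the real .torrent path is not given in the chat, and only the `torrent_details` path comes from the conversation.

```python
from typing import Callable, Dict, Iterable, Optional

# The details path is quoted in the chat; the .torrent path is a PLACEHOLDER (assumption).
DETAILS_URL = "https://isohunt.com/torrent_details/{id}/"
TORRENT_URL = "https://example.invalid/torrents/{id}.torrent"  # hypothetical

# A fetcher returns the response body, or None when the server answers 404.
Fetch = Callable[[str], Optional[bytes]]


def crawl(ids: Iterable[int], fetch: Fetch) -> Dict[int, Dict[str, Optional[bytes]]]:
    """Request the .torrent for each ID and fetch the details page only on a hit."""
    results: Dict[int, Dict[str, Optional[bytes]]] = {}
    for torrent_id in ids:
        torrent = fetch(TORRENT_URL.format(id=torrent_id))
        if torrent is None:
            # Missing ID: skip the details page and save a request.
            continue
        details = fetch(DETAILS_URL.format(id=torrent_id))
        results[torrent_id] = {"torrent": torrent, "details": details}
    return results
```

With this ordering a missing ID costs one request instead of two, which matters if most of the ID range turns out to be empty.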
22:05
|
DFJustin |
that makes sense I guess |
22:05
|
joepie91 |
DFJustin: many isohunt torrents are no longer in their original location |
22:05
|
joepie91 |
in my experience |
22:05
|
joepie91 |
:P |
22:05
|
joepie91 |
lysobit: that's why I'm optimizing for speed |
22:06
|
joepie91 |
do we have any awake warrior devs? |
22:06
|
lysobit |
make it multithreaded |
22:06
|
joepie91 |
that are familiar with the pipeline stuff etc. |
22:06
|
lysobit |
using python threads |
22:06
|
joepie91 |
lysobit: meh, might as well |
22:07
|
lysobit |
stick on a dedi with a 1gbit pipe; done |
22:07
|
lysobit |
and a 1tb hd |
22:07
|
lysobit |
though, easier said than done :P |
22:07
|
DFJustin |
fwiw even if the torrent is no longer in its original location the details page has the info hash which is all you really need |
22:08
|
lysobit |
metadata would be nice to have |
22:08
|
lysobit |
as infohash doesn't contain what files are in the torrent |
22:09
|
lysobit |
and would be even better if you can store the name of the torrent too |
22:10
|
DFJustin |
yeah but there are other projects mass-downloading torrent files for infohashes and every torrent site under the sun will have the torrent file as well |
22:11
|
DFJustin |
so that stuff is not really at risk |
22:16
|
joepie91 |
hmm this multithreaded version actually works pretty well it seems |
22:16
|
joepie91 |
:P |
22:17
|
joepie91 |
http://pastie.org/private/agybnuru8digavvhdagt1w |
22:19
|
joepie91 |
not blocked yet |
22:19
|
joepie91 |
running 5 threads |
22:19
|
joepie91 |
roughly 10-15 torrents checked per second |
22:20
|
joepie91 |
400 days |
22:20
|
joepie91 |
not fast enough |
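The 400-day figure can be sanity-checked with a short sketch. The size of the ID range being scanned is not stated in the chat, so the code below works backwards from the quoted numbers (10-15 checks per second, 400 days) to the range they would imply. The function names are invented for illustration.

```python
SECONDS_PER_DAY = 86_400


def days_to_scan(id_count: float, checks_per_second: float) -> float:
    """Days needed to check `id_count` IDs at a steady rate."""
    return id_count / checks_per_second / SECONDS_PER_DAY


def ids_covered(days: float, checks_per_second: float) -> float:
    """Number of IDs that can be checked in `days` at a steady rate."""
    return days * SECONDS_PER_DAY * checks_per_second


# Range implied by "400 days" at the quoted 10-15 checks per second.
low = ids_covered(400, 10)    # 345,600,000 IDs
high = ids_covered(400, 15)   # 518,400,000 IDs
print(f"implied ID range: {low:,.0f} to {high:,.0f}")
```

Under these assumptions the scan covers several hundred million numeric IDs, far more than the roughly 13.7 million active torrents quoted earlier, which is why a single throttled client is too slow.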
22:23
|
joepie91 |
hrm |
22:27
|
joepie91 |
We know you love isoHunt, but you shouldn't hit us this fast. You are banned for 1200 seconds. |
22:27
|
joepie91 |
:( |
22:30
|
joepie91 |
well at least we now know that they have rate limiting in place lol |
22:41
|
joepie91 |
right, script has throttling now.. |
22:50
|
joepie91 |
sooooooooo |
22:50
|
joepie91 |
5 threads was apparently also too much |
22:51
|
joepie91 |
http://pastie.org/private/jlaqklfwjkhznbx4bdrnrw |
22:51
|
joepie91 |
feel free to change the range to a subset of the current range (newest and oldest vars) and run |
22:52
|
joepie91 |
:) |
22:52
|
joepie91 |
(given absence of a warrior project) |
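For anyone dividing the work by hand, here is one possible way to cut the full ID range into non-overlapping subsets, one per volunteer. The `oldest` and `newest` names mirror the variables mentioned above; `split_range` itself and the example numbers are hypothetical and not taken from the script.

```python
from typing import List, Tuple


def split_range(oldest: int, newest: int, parts: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [oldest, newest] into up to `parts` contiguous,
    non-overlapping inclusive sub-ranges of nearly equal size."""
    if parts < 1 or newest < oldest:
        raise ValueError("need parts >= 1 and newest >= oldest")
    total = newest - oldest + 1
    base, extra = divmod(total, parts)
    ranges: List[Tuple[int, int]] = []
    start = oldest
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more ID
        if size == 0:
            continue  # more parts than IDs: skip empty chunks
        ranges.append((start, start + size - 1))
        start += size
    return ranges


print(split_range(1, 10, 3))  # [(1, 4), (5, 7), (8, 10)]
```

Each volunteer would then set `oldest` and `newest` to one of the returned pairs before running the script.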
23:38
|
joepie91 |
right |
23:38
|
joepie91 |
3 threads seems the max |
23:38
|
joepie91 |
to not get throttled |
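A minimal sketch of the "few threads plus throttling" approach, assuming a shared limiter that spaces out request starts across all workers. The default of 3 workers comes from the chat. The `min_interval` value, the `Throttle` class and the `check` callable are assumptions for illustration and not the contents of the pastie script.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


class Throttle:
    """Thread-safe limiter: request starts are at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float) -> None:
        self._min_interval = min_interval
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self) -> None:
        # Reserve the next free time slot under the lock, then sleep outside it.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self._min_interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)


def check_all(ids: Iterable[int], check: Callable[[int], bool],
              workers: int = 3, min_interval: float = 0.1) -> Dict[int, bool]:
    """Run `check` for every ID on a small thread pool, throttled globally."""
    throttle = Throttle(min_interval)

    def task(torrent_id: int):
        throttle.wait()
        return torrent_id, check(torrent_id)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(task, ids))
```

A safe `min_interval` would have to be found by experiment, since the chat only records that 5 threads triggered the 1200-second ban and 3 did not.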