That looks good, nice and simple!
There are some minor issues with the way you balance it currently though, which means that some of the threads will still finish a lot earlier than others. For example, in your code, the 0th and 2nd threads will both not have any prime numbers (except 0), since they're only checking even numbers. So you can just remove those threads, and have it run a lot faster (though you need to add 2 as a prime number in after).
The optimum way to split the threads (for small numbers of threads) is as follows:
1 2 [1]
2 6 [1, 5]
4 12 [1, 5, 7, 11]
6 18 [1, 5, 7, 11, 13, 17]
8 30 [1, 7, 11, 13, 17, 19, 23, 29]
12 42 [1, 5, 11, 13, 17, 19, 23, 25, 29, 31, 37, 41]
The first column is the number of threads, the second column is the increment in the loop, and the last column is the starting offset (I used a starting value instead of an ending value like you do). So for example, if you're using 4 threads, the optimum way is to check all the numbers of the form 1 + 12*i, 5 + 12*i, 7 + 12*i and 11 + 12*i (each in a different thread). Some of the numbers in the first column are missing, and that's because it's actually faster to use a smaller number of threads in that case.
Anyways, this is all a lot more complicated to implement, especially for a variable number of threads, but if you need speed it would definitely help a lot (eg. for 4 threads it should run about 3 times faster). Good luck!