How are you calculating using the GPU, via CUDA or something else? (Also, wouldn't that be classed as an external lib?)
Indeed. Given the speeds that people have this running at, it's unlikely to be faster (a GPU needs far more maths to make it viable). The appeal is the sheer stupidity of how parallel you can make it.
For example, you can kick-start the prime number processing whilst doing other stages in parallel. It also allows the numbers to be built up and then summed for the later stage. Each of the N stream processors then calculates a prime number. (Note this is completely sub-optimal, as the GPU's stream processors are tied to their group's program counter, so looping etc. is not good, but I have just thought of another way to do it with ping-ponging that may go faster.)
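To make the idea concrete, here is a rough CPU-side sketch in Python, not the actual OpenCL kernel. It assumes "calculates a prime number" means each work item tests one candidate, and it gives every work item the same fixed loop bound with no early exit, to mimic lanes that share a program counter. The names `is_prime_lockstep` and `sum_primes_below` are made up for illustration.

```python
import math

def is_prime_lockstep(n, max_divisor):
    """Test one candidate the way a single work item might.

    Every call loops over the same divisor range with no early exit,
    mimicking lanes that share one program counter (an assumption
    about the kernel, not the original code).
    """
    prime = n >= 2
    for d in range(2, max_divisor + 1):
        # Only divisors up to sqrt(n) matter; larger ones are skipped
        # by the condition rather than by leaving the loop.
        if d * d <= n and n % d == 0:
            prime = False
    return prime

def sum_primes_below(limit):
    """Map: one primality flag per candidate. Reduce: sum the primes."""
    max_divisor = math.isqrt(max(limit - 1, 0))
    flags = [is_prime_lockstep(n, max_divisor) for n in range(limit)]
    return sum(n for n, is_p in enumerate(flags) if is_p)

if __name__ == "__main__":
    print(sum_primes_below(100))
```

On a real GPU the list comprehension would be the kernel launch and the final `sum` would be a separate reduction pass, which is roughly the "built up and then summed" step described above.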
(note there was no rule about timings only being per stage originally - scope creep!)
Yes, it wouldn't be eligible for submission as it's OpenCL, but I've given up on submitting something within the rules.. this is just for the sake of doing something different because you can.
Oh.. it makes the machine freeze completely until it's finished
If I had Apple servers I could also use grid net with the Mac mini and the MBP running.. but at least OpenCL is available on Linux.