Lol @ Musical Mooning
It's possible. A simple implementation could use optic flow analysis to track shifts in the scene. If you move your hands, the movement shows up in the flow field, so you could map up/down to volume and left/right to pitch, for example. Violently waving your hand (as you do with the Kinect) would let the system pick out a fast-changing area of the image. Since the arm has a fixed length, it could then work out where a joint is from that (hence you could get two hands working).
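A rough sketch of that idea in plain Python, assuming grayscale frames stored as 2D lists. It isn't real optic flow: `estimate_shift` swaps in brute-force block matching to find one global (dx, dy) shift, and `motion_centroid` uses simple frame differencing to find the fast-changing area. The function names, threshold and gain values are all made up for illustration.

```python
# Hypothetical sketch: frames are 2D lists of grayscale ints (0-255).

def estimate_shift(prev, curr, max_shift=2):
    """Brute-force block matching: the (dx, dy) that best maps prev onto curr.

    Minimises the mean absolute difference over the overlapping region.
    A crude global stand-in for dense optic flow.
    """
    h, w = len(prev), len(prev[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0, 0
            for y in range(h):
                yy = y + dy
                if not 0 <= yy < h:
                    continue
                for x in range(w):
                    xx = x + dx
                    if 0 <= xx < w:
                        total += abs(prev[y][x] - curr[yy][xx])
                        count += 1
            if count and total / count < best_cost:
                best_cost, best = total / count, (dx, dy)
    return best


def motion_centroid(prev, curr, threshold=30):
    """Centroid (x, y) of pixels that changed by more than `threshold`, or None."""
    xs, ys = [], []
    for y, (row_p, row_c) in enumerate(zip(prev, curr)):
        for x, (a, b) in enumerate(zip(row_p, row_c)):
            if abs(a - b) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return sum(xs) / len(xs), sum(ys) / len(ys)


def shift_to_controls(dx, dy, volume, pitch, gain=0.05):
    """Up/down -> volume (clamped to 0..1), left/right -> pitch (arbitrary units)."""
    # Image y grows downwards, so an upward hand movement (dy < 0) raises the volume.
    volume = min(1.0, max(0.0, volume - gain * dy))
    pitch = pitch + gain * dx
    return volume, pitch


def make_frame(w, h, bx, by, size=2):
    """Test helper: black frame with a bright size x size square at (bx, by)."""
    return [[255 if bx <= x < bx + size and by <= y < by + size else 0
             for x in range(w)] for y in range(h)]


if __name__ == "__main__":
    prev = make_frame(8, 8, 2, 3)
    curr = make_frame(8, 8, 3, 2)  # "hand" moved one pixel right and one up
    dx, dy = estimate_shift(prev, curr)
    print((dx, dy), motion_centroid(prev, curr), shift_to_controls(dx, dy, 0.5, 0.0))
```

Moving the square up and to the right nudges both volume and pitch up. A real version would use a proper dense flow method and work per region rather than on one global shift.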
The setup routine the Kinect makes you go through is probably doing just that: looking for changes in movement so it can build up a skeleton from the flow analysis. Once it knows your relative bone lengths, it can be more selective about which movements it pays attention to.
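Here's a guess at how the "be more selective" part could work. Calibration measures a limb length (the median over several frames, so a few bad detections don't skew it). Later, any motion blob the arm couldn't physically reach from the shoulder gets thrown out. The joint positions, names and 10% tolerance are hypothetical. Only an upper bound is checked, because a 2D projection of an arm can look shorter than the real thing but never longer.

```python
import math
from statistics import median


def calibrate_length(samples):
    """Estimate a limb length from calibration frames.

    `samples` is a list of ((x1, y1), (x2, y2)) joint-position pairs, e.g.
    shoulder and hand positions seen while the user holds a calibration pose.
    The median ignores the occasional outlier detection.
    """
    return median(math.dist(a, b) for a, b in samples)


def plausible_hands(shoulder, candidates, arm_length, tolerance=0.1):
    """Keep only candidate hand positions within reach of the shoulder.

    A projected arm can appear shorter than its true length, never longer,
    so only the upper bound (plus some slack) is enforced.
    """
    limit = arm_length * (1 + tolerance)
    return [c for c in candidates if math.dist(shoulder, c) <= limit]


if __name__ == "__main__":
    # Third sample is a bad detection; the median ignores it.
    samples = [((0, 0), (3, 4)), ((1, 1), (4, 5)), ((0, 0), (6, 8))]
    arm = calibrate_length(samples)
    print(arm, plausible_hands((0, 0), [(3, 4), (10, 10), (0, 5.4)], arm))
```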
More complex analysis of skeletal position (I believe the Xbox does this step) is probably in the works by some cunning Linux boffins... cue air finger pianos.