I recently went to a presentation at the local Atlanta .NET Users’ Group on the Microsoft Kinect. The device has been one of the fastest-selling game devices in history (it holds a Guinness World Record for selling 8 million units in the first 60 days). The technology behind the device is based on machine learning algorithms.
The word “hacking” is actually inaccurate, since Microsoft has been encouraging people to develop drivers. Kinect has a USB connection, which allows the device to be used with a regular computer. Nevertheless, there’s something edgy and dangerous about the word “hacking”, and even though Kinect “development” (a comparatively boring and commercial word) is the accurate description, the “hacking” term is sticking. I believe the gaming community likes the word “hacking”.
I see that the slides for the presentation I saw in March 2011 are not posted online. Thus, in this blog post I will have some comments on the device and some links to video. I will not be talking about the Kinect as a consumer game device, but I will have a link where you can explore that topic on your own.
See this video first:
The machine learning algorithm behind the Kinect is the decision tree. An article on this topic discusses randomized decision trees and forests. A forest is a group of decision trees, each of which may or may not be randomly generated. In this case, I was entertained that the article describing the technology merged the terms together into the header Randomized Decision Forest. Both training and classification are required to make the system work, and in the presentation I heard, the speaker confirmed what I suspected: Kinect guesses where people are.
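To make the forest idea concrete, here is a minimal sketch in Python. It is a toy, not Kinect's actual algorithm: each "tree" is a single split on one randomly chosen feature, and the forest's answer is a majority vote along with the share of trees that agreed. The two depth-like features, the "hand" and "torso" labels, and every function name are hypothetical, made up for illustration.

```python
import random
from collections import Counter

# Hypothetical training data: (two depth-like features, body-part label).
SAMPLES = [
    ((1.0, 1.2), "hand"), ((1.1, 1.0), "hand"), ((0.9, 1.1), "hand"),
    ((2.0, 2.2), "torso"), ((2.1, 1.9), "torso"), ((1.9, 2.0), "torso"),
]

def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def train_stump(samples, rng):
    """One-split tree: the feature is picked at random, the threshold
    is the training value that misclassifies the fewest samples."""
    feature = rng.randrange(len(samples[0][0]))
    best = None
    for x, _ in samples:
        threshold = x[feature]
        left = [lab for f, lab in samples if f[feature] <= threshold]
        right = [lab for f, lab in samples if f[feature] > threshold]
        left_label = majority(left)
        right_label = majority(right) if right else left_label
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]

def train_forest(samples, n_trees=25, seed=0):
    """A forest is simply a list of independently randomized trees."""
    rng = random.Random(seed)
    return [train_stump(samples, rng) for _ in range(n_trees)]

def predict(forest, x):
    """Majority vote, plus the fraction of trees that agreed."""
    votes = Counter(
        left if x[feature] <= threshold else right
        for feature, threshold, left, right in forest
    )
    label, count = votes.most_common(1)[0]
    return label, count / len(forest)

forest = train_forest(SAMPLES)
print(predict(forest, (1.05, 1.15)))  # close to the "hand" samples
print(predict(forest, (2.05, 2.10)))  # close to the "torso" samples
```

The second value returned by predict is what makes this a guess rather than a verdict. For a point such as (1.05, 2.10), where the two features disagree, the trees split their votes and the agreement share drops below 1.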
Some might think that a guess could never deliver a reliable system. And, even in the first reviews of Kinect as a gaming device, some critics have been quick to point out the lack of responsiveness. I have to measure their criticism against gaming community standards, which often involve high-end local processing, overclocking, and fast refresh rates for video. Yes, this audience is tough to please when it comes to responsiveness, especially when the Kinect does not require wearing special gloves or clothing (which I would expect some are developing to improve responsiveness for future add-on gaming or commercial software).
Let me speak to the mathematics of guessing. A completely random guess has a uniform distribution over all possible values, with equal probability of picking any of those values. In the case of body location, the possible results lie on a continuous range (not a discrete one) and therefore there are an infinite number of possible locations. Technically, a human body at rest is actually moving, since the atoms move around. And by the Heisenberg Uncertainty Principle, we cannot know both the position and the momentum of these particles with arbitrary precision. Realistically, gaming devices only capture approximate location. That statement holds true for every game controller that has ever existed, and even the more common business controllers: keyboards and mice (though I prefer the trackball).
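To put a number on a uniform guess, here is a small Python sketch. If the guess is uniform on an interval [low, high] and the true position is p, the average miss works out to ((p - low)^2 + (high - p)^2) / (2 * (high - low)). The 4-metre room and the 1-metre narrowed interval are made-up values for illustration, not Kinect measurements.

```python
def expected_abs_error(low, high, true_pos):
    """Mean |guess - true_pos| when the guess is uniform on [low, high]."""
    if not low <= true_pos <= high:
        raise ValueError("true_pos must lie inside [low, high]")
    width = high - low
    return ((true_pos - low) ** 2 + (high - true_pos) ** 2) / (2 * width)

# Hypothetical setup: a person standing at the 2 m mark of a 4 m wide room.
whole_room = expected_abs_error(0.0, 4.0, 2.0)  # guess anywhere in the room
narrowed = expected_abs_error(1.5, 2.5, 2.0)    # guess within a 1 m interval
print(whole_room, narrowed)  # prints: 1.0 0.25
```

Both intervals still contain infinitely many points, yet the guess over the narrower one misses by a quarter as much on average.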
Training the Kinect amounts to allowing the system to learn your shape. Training narrows the distribution from all possible points to a subset of those points. Even a subset of infinity can still be infinite. However, the more limited range of possible human motion is enough to communicate with Kinect. The current “hacking” interest (the more boring word “development” is actually the accurate one) will extend the use of the Kinect beyond the gaming world into commercial applications. I liken this business model to the Ferrari automobile, a brand built on the racing community, and later licensed into other cars (sometimes similar to the racing vehicles, sometimes not) for consumer purchase. In the computer world, the gaming community is like the racing community: a place where individuals spend as much time, and much more money, than users of commercial applications do. Commercial applications, however, have a comparatively lower cost and more basic function, but because these applications reach a much larger audience, total expected revenue is much higher than for gaming.
A popular website which is keeping this “hacking” term alive: http://kinecthacks.net/ You can see other videos on this website of what people have done with the Microsoft Kinect, and don’t miss Kinect hack videos on Bing. In the presentation I saw, there was a reference to the science-fiction movie Minority Report (starring Tom Cruise), where some scenes showed gesture-based interfaces. Those scenes were informed by the latest science, and during this development presentation, we had a number of videos to inspire the audience on gesture-based technology. Microsoft Research has announced that they will be releasing a Microsoft Kinect SDK in Spring 2011 (could Microsoft announce this SDK during TechEd NA 2011, where I will be presenting on data mining?).
Kinect has definitely gone mainstream, and you can see many examples by searching on YouTube.