I LOVE BEING ON THE CUTTING EDGE of human-computer interaction: the FIN, a ring-based interaction device; the MYO, an arm-based one; the EPOC, a brain-based one; full-body sensor suits like the PRIOVR; or more mainstream novelties like GLASS.
I wish I had the time to take these wildly crazy inventions into new user experiences. They help solve one of the biggest challenges of this digital millennium: how to make sense of unbounded, unstructured Big Data.
How? They let us interact with data in natural ways, using gestures, speech and even thoughts. Just as interestingly, they turn the body and its states into strategic data points in the process. It's called "the quantified self", a term to remember. Another such tool is the Kinect, a 3D camera.
The first-generation Kinect was so revolutionary, bringing the price point of 3D cameras down from over €10,000 to around €200, that one can only wonder what kept the technology from consumer introduction for so long. The Kinect sparked a new age for gamers and interaction researchers.
But that was 2010. This year, Microsoft will release the Kinect V2, a huge upgrade in almost every respect: HD resolution, better depth-sensing technology (Time-of-Flight instead of Structured Light) and feature streams that are beyond thrilling, such as heart rate detection and physics models.
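The Time-of-Flight principle behind the V2's depth sensor is simple at heart: depth is derived from how long emitted light takes to bounce back. A minimal sketch of that idea, with my own illustrative numbers (not the sensor's actual internals, which modulate light and measure phase shift):

```python
# Time-of-Flight in a nutshell: distance = speed_of_light * round_trip_time / 2.
# Illustrative only; the real Kinect V2 measures phase shift of modulated IR light.

C = 299_792_458.0  # speed of light in m/s

def depth_from_round_trip(t_seconds: float) -> float:
    """Distance to the reflecting surface, given the measured round-trip time."""
    return C * t_seconds / 2.0

# A surface 2 m away reflects the pulse back after roughly 13.3 nanoseconds:
round_trip = 2 * 2.0 / C
print(f"{depth_from_round_trip(round_trip):.3f} m")  # → 2.000 m
```

The tiny timescales involved (nanoseconds per metre) are exactly why ToF sensing needed serious silicon before it could reach a consumer price point.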
Compared to the new PS4 Eye, which uses a different technique to determine depth, it is superior in every way but one: framerate. The Kinect remains at 30fps, while the Eye can do up to 320x192 @ 240fps. In Computer Vision research, the general complexity progression can be viewed as Detection, then Recognition, then Tracking, with Tracking being the hardest task. So for tracking objects, faster is better: at 30fps, fast motions suffer from motion blur and thus require some sort of post-processing (a research field in itself). Ah well, there has to be room left for a V3, right? ;)
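To put some rough numbers on why framerate matters for tracking, here is a back-of-the-envelope sketch. It assumes exposure time is approximately 1/framerate (an upper bound I picked for illustration; real cameras usually expose for less):

```python
# Back-of-the-envelope motion blur: how far does a moving object travel
# during one frame's exposure? Assumption: exposure time ~ 1/fps.

def blur_mm(speed_m_per_s: float, fps: float) -> float:
    """Distance an object smears across during a single frame, in millimetres."""
    return speed_m_per_s / fps * 1000.0

# A hand gesturing at 2 m/s:
print(f"{blur_mm(2.0, 30):.1f} mm")   # → 66.7 mm smeared per frame at 30fps
print(f"{blur_mm(2.0, 240):.1f} mm")  # → 8.3 mm per frame at 240fps
```

Nearly 7 cm of smear per frame at 30fps versus under 1 cm at 240fps makes it obvious why fast motion either needs a faster sensor or deblurring in post.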
To be honest, I haven't played much with the V2 up to this point. The Speech Recognition API is not out yet, though it might arrive with the next Developer Preview SDK update, and I'm mostly interested in the whole picture: the multimodal bells and whistles. The moment it hits my computer, I will start integrating the V2 into my RTC Framework (see my other post) and show off the results here.
I'll keep you posted!