Canesta, Inc. is the inventor of revolutionary, low-cost electronic perception technology that enables ordinary electronic devices in consumer, security, industrial, medical, automotive, factory automation, gaming, military, and many other applications to perceive and react to objects or individuals in real time.
In Fall 2008, Canesta approached Kicker Studio to create a demonstration of their latest camera technology for the Consumer Electronics Show 2009. The prototype was to be of an entertainment center controlled by gestures alone, and powered, of course, by a Canesta camera.
Discovery
Before we began design we needed to understand the technology: its limitations, possibilities, and how it would link up with the interface. The Canesta camera offered a very specific set of benefits and limitations. Within those parameters, we established a safe zone for accurate gesture recognition. In the meantime, we worked with engineers to ensure the interface would seamlessly link into the “gestural library,” which examined data coming in from the camera and recognized it as specific gestures.
We also set out to understand the activity and context of watching TV. We recorded subjects while watching video on their TV or computer. We looked for the types of casual gestures a user would make in order to limit the number of accidental triggers caused by non-deliberate gestures. We also noted the types of commands needed specifically for video playback. We looked for similar patterns of control to reduce the size of the vocabulary for easy retention. For example: changing the volume and changing the channel.
From these investigations, we quickly established a list of metrics to measure success. We wanted to clearly “beat the remote” while creating a fun, engaging experience.
Marketing and Product Strategy
Technology wouldn’t tell the story by itself. We developed scenarios that would help illustrate the ways in which the product itself could not only redefine the way we relate to our media, but to our environments as well. We did this by understanding the ways that the product could be situated within current markets as well as how it would fit into the product landscape of the next 5 or 10 years.
Defining the Gestural Language
From a practical standpoint (because it required the most development time), our first design task was to define the gestures that would control the entertainment center. This was a combination of brainstorming (making a lot of crazy gestures) and then comparing them to three things: the technical constraints of the camera, the off-limits casual gestures we found in research, and our design principles.
On our first day of brainstorming, we came up with a list of guiding principles. One of them was that users should be able to feel comfortable doing these gestures while on a date. That is: nothing difficult, nothing embarrassing, and nothing that was too cartoon-like. You often see a lot of “Minority Report”-style gestural interfaces, but those are far too tiring, too challenging, and often far too dramatic for a task like watching videos in your living room. You’d soon be begging for your remote back.
We spent several days in a small room with a whiteboard and lots of post-it notes and hand waving, which led us to a couple of distinct gesture sets that we wanted to test with an audience.
Prototyping/Testing/Iterating
With a set of research subjects, we did scenario-based prototyping, with paper and simulated screens. After watching people attempt our gestural set, we quickly added a new item to our list of principles: no emphatic gestures. We found that the more elaborate gestures made some users feel like they were “angry” at their TV. We also eliminated a number of gestures that seemed comfortable in our small room, but when put to the test seemed overly tiring.

During a rapid prototyping session we went through several iterations and were able to build on prototypes until we hit upon a successful solution that centered around circling (“Wax on, Wax off”) gestures, simple waves, and a small number of specific gestures for important actions you don’t want to do accidentally (like turn off the TV).
Interaction and Interface Design
In the absence of direct contact (as with a mouse or touchscreen), freeform gestures rely heavily on visual and audio cues to help guide the user. Once the gesture set was established, we created a unique user interface that strengthened the mental connection between the user’s actions and the response of the interface. We focused on developing visual cues to help reinforce the types of movements that would be clear and natural. For example, our gestures relied heavily on circular hand movements, so, rather than have items scroll top to bottom in a list, we created “dial”-like lists.
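To make the link between a circling hand and a “dial”-like list concrete, here is a minimal sketch of one way such a mapping could work. It is not the demo’s actual code: the class name DialList, the degrees_per_step parameter, and the idea of receiving the hand position as (x, y) relative to the centre of the circle are all assumptions made for illustration.

```python
import math


def wrapped_delta(prev_deg, curr_deg):
    """Smallest signed change between two angles, in the range [-180, 180)."""
    return (curr_deg - prev_deg + 180.0) % 360.0 - 180.0


class DialList:
    """Hypothetical dial-style list driven by a circling hand.

    Assumes hand positions arrive as (x, y) relative to the centre of the
    circle being traced, with y pointing up; flip the sign of y for image
    coordinates where y points down.
    """

    def __init__(self, items, degrees_per_step=45.0):
        self.items = list(items)
        self.degrees_per_step = degrees_per_step
        self.index = 0
        self._prev_angle = None
        self._accumulated = 0.0

    def update(self, x, y):
        """Feed one hand position and return the currently selected item."""
        angle = math.degrees(math.atan2(y, x))
        if self._prev_angle is not None:
            # Accumulate rotation so small movements add up to a step.
            self._accumulated += wrapped_delta(self._prev_angle, angle)
            # int() truncates toward zero, so both directions behave alike.
            steps = int(self._accumulated / self.degrees_per_step)
            if steps:
                self.index = (self.index + steps) % len(self.items)
                self._accumulated -= steps * self.degrees_per_step
        self._prev_angle = angle
        return self.items[self.index]
```

Accumulating the angle, rather than reading it directly, means the user can start circling anywhere and the list only advances once enough rotation has built up, which is one way to keep small accidental movements from changing the selection.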
Implementation and Development
Now we were ready to put all the pieces together. We worked closely with Canesta’s engineering team to ensure our interface synced with all the cues detected by the camera and pattern matched via the gestural library. Then, once the front- and back-end were cleanly married, we again did a series of user tests to ensure the proper events occurred when expected. After polishing up a few small areas, we were ready to present the demo to the public, first at CES and then at the 2009 TV of Tomorrow conference.
Physical Device
While the Canesta camera can (and will) be embedded into new televisions (as well as appliances and other consumer electronics), we could easily imagine customers who might want to supplement their existing televisions with stand-alone cameras so that they too could control their sets with gestures. We sketched and modeled dozens of different camera configurations before settling on one we liked.
This camera can perch on top of the TV and rotate to get a better view of the room. Soft white and blue LEDs indicate whether the camera is on, when it is observing a user, and when it is accepting gestural commands from the user.
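The three conditions the LEDs report (on, observing a user, accepting commands) can be read as a small state machine. The sketch below is only an illustration under stated assumptions: the event names and transitions are hypothetical, apart from the idea that a wave hands control to a user, which a reply in the comments describes. It deliberately does not guess which LED color goes with which state.

```python
from enum import Enum


class CameraState(Enum):
    ON = "on"
    OBSERVING = "observing a user"
    ACCEPTING = "accepting gestural commands"


# Hypothetical events and transitions, chosen for illustration only.
TRANSITIONS = {
    (CameraState.ON, "user_detected"): CameraState.OBSERVING,
    (CameraState.OBSERVING, "wave"): CameraState.ACCEPTING,
    (CameraState.OBSERVING, "user_lost"): CameraState.ON,
    (CameraState.ACCEPTING, "user_lost"): CameraState.ON,
}


def next_state(state, event):
    """Return the state after an event; unknown events leave it unchanged."""
    return TRANSITIONS.get((state, event), state)
```

An indicator driver would then only need to look at the current CameraState to decide what to show.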
Read the Canesta press release.
Learn more about what we think about gestural interfaces or how Kicker can help your next project.
14 Comments
Thanx for sharing this great case study. Must reading for anyone thinking in this space. Kudos and can’t wait to hear about what else is coming! – -dave
Great case study guys. What was most challenging to figure out throughout this process?
What’s most impressive is that the way you made your physical prototypes out of cheese. It’s an underused prototyping tool.
Love those gloves, Jenn!
Definitely the challenging portion was figuring out the correct (for users and context) and technically-feasible gesture set. Gestures from 10′ away are tricky, and it is kudos to the Canesta team and the camera technology to be able to detect and recognize what they can.
This is perhaps a true test of a concept that has been in labs and the minds of interaction designers for a long time. I wonder if the market will accept it. My first guess is that it requires too much work for the average TV viewer.
How would this scale to the oncoming wave of highly interactive, service-heavy TVs? It seems like a low-bandwidth input method. If I must augment this with another input device, does it become a gimmick?
Jason
Very interesting case study.
I wonder what the relationship is between the camera resolution and the complexity of gestures. If the camera had a very high resolution, and the subject was in optimal lighting conditions, perhaps the camera could even pick up finger gestures.
Interesting case study – however upon seeing your youtube demonstration, I’m really doubting people are going to remember all the gestures you demonstrated – did you test these in a focus group? What was the response? As a designer I would keep those gestures literally to NO MORE than 2 types: select and activate. That’s it – you can’t expect people to be memorizing a whole set of gestures like you showed.
The camera resolution is very important, as is the pattern matching algorithm. But we also didn’t design for optimum lighting conditions, even though the camera supports being used in the dark (see Colin’s comment below).
The number of gestures is relatively small and see my post on How Many Gestures Can Users Remember?
http://www.kickerstudio.com/blog/2009/01/how-many-gestures-can-users-remember/
The gesture set is about five gestures: waves, circles, two hand press (pause), two hands down (mute), two hands “close curtains” (off). It’s really not that many to learn.
To remark on the camera resolution and room lighting requirements: The Canesta camera is a unique technology which will operate in any lighting conditions, even a completely dark room. The camera system includes the Canesta 3D sensor chip and an infrared light source. Since the chip is only sensitive to light emitted from the system light source, the ambient room lighting is not a factor.
The pixel resolution is a factor, however. In order to recognize small hand gestures such as individual fingers from a distance of 10′ or more, a relatively high number of pixels is needed. The current system has too few pixels for individual finger recognition, but the Canesta 3D chip is built with standard CMOS processes (like the digital camera in your cell phone or web cam), so it is very scalable.
I’ve seen this technology in labs for a long time. It’s nice to see it at this level of refinement. One thing it makes me wonder though, is if this kind of technology gets some market penetration, how would a user deal with more than one device in the same environment. Multiple sensors to gestures and possibly overlapping gestures across products could really become a challenge. Curious to see how the market develops.
WOW!!! This was very interesting and I feel it would be great. Yet I wonder how it will be for elderly people and will they be able to remember the signs? What if more than one person creates a sign??
It might work well for the elderly, who often have trouble with the small buttons on a remote.
The camera can only track one person at once, but anyone can gain control of the system by waving at it.