Gesture recognition is a novel approach to
human-computer interaction that lets you use natural
body movements to interact with computers.
Because gestures are a natural and expressive form of
human communication, they allow you to concentrate
on the task itself, using what you already do rather
than having to learn new ways to interact. Our goal is to
enable unmanned vehicles to recognize the aircraft
handling gestures already used by deck crews. These
gestures involve both body posture and hand shape,
so our system must capture both kinds of
information. My research concentrates on developing
a vision-based system that recognizes body and hand
gestures from a continuous input stream. My system uses
a single stereo camera to track body motion and hand
shapes simultaneously and combines this information
to recognize body-and-hand gestures. We use
machine learning to train the system on many examples,
allowing it to learn how to recognize each gesture.
There are four steps that our system takes to recognize
gestures. First, from the input image obtained from a stereo
camera, we compute depth images and remove the
background. Second, our system estimates 3D body
posture by fitting a skeletal body model to the input image.
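To illustrate the first step, the depth-based background removal could be sketched as follows. The depth values and the 2.5 m cutoff here are hypothetical; a real system would apply this over full stereo depth maps.

```python
def remove_background(depth_map, max_depth_m=2.5):
    """Suppress pixels with no stereo match (depth 0) or beyond
    max_depth_m, keeping only the near-field person.
    The 2.5 m cutoff is illustrative, not the system's real value."""
    return [[d if 0.0 < d < max_depth_m else 0.0 for d in row]
            for row in depth_map]

# Tiny synthetic depth map in meters: one valid foreground pixel,
# one distant background pixel, one unmatched pixel, one near pixel.
depth = [[1.2, 3.0],
         [0.0, 2.0]]
foreground = remove_background(depth)
```

The same threshold-and-mask idea carries over directly to array libraries for real images.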
We extract various visual features, including the 3D point
cloud, contour lines, and motion history. These features are
computed both from the image and the skeletal model.
Then, the two sets of features are compared, allowing our
program to select the most probable posture.
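A minimal sketch of that comparison step, assuming each candidate posture rendered from the skeletal model has been reduced to a small feature vector (the vectors and posture names below are made up for illustration; the real features are point clouds, contours, and motion history):

```python
def feature_distance(f_image, f_model):
    """Sum of squared differences between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(f_image, f_model))

def most_probable_posture(image_features, candidate_postures):
    """candidate_postures maps a posture name to the feature vector
    rendered from the skeletal model in that pose. Returns the
    candidate whose features best match the observed image features."""
    return min(candidate_postures,
               key=lambda name: feature_distance(image_features,
                                                 candidate_postures[name]))

# Hypothetical observed features and two candidate model poses.
observed = [0.9, 0.1, 0.4]
candidates = {
    "arms_raised":  [1.0, 0.0, 0.5],
    "arms_lowered": [0.0, 1.0, 0.2],
}
best = most_probable_posture(observed, candidates)
```

In practice the distance would be replaced by a likelihood under the fitted model, but the argmin-over-candidates structure is the same.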
Third, once we know the body posture, we know
approximately where the hands are located. We search
around each of the estimated wrist positions, compute
visual features in that region and estimate the probability
that what we see there is one of the known hand shapes
used in aircraft handling: for example, palm open, palm
closed, thumb up, and thumb down. As the last step, we
combine the estimated body posture and hand shape to
determine gestures. We collected twenty-four aircraft
handling gestures from twenty people, giving us four
hundred eighty sample gestures with which to teach the system to
recognize the gestures. We use a probabilistic graphical
model called a Latent Dynamic Conditional Random Field.
This model learns the distribution of the patterns of each
gesture as well as the transition between gestures. We use
this with a sliding window to recognize gestures continuously
and apply the multi-layered filtering technique we developed
to make the recognition more robust. There is still a
considerable amount of work to be done in the field of
gesture recognition. Things we continue to work on include
improving reliability, adapting to new gestures, and
developing appropriate feedback mechanisms; for example,
the system could say, "I get it" or "I don't get it."
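As a rough sketch of the sliding-window stage described above: the window classifier below is only a stand-in for the LDCRF (it takes a majority vote over frames that are already labeled), and the persistence check is a much-simplified stand-in for the multi-layered filtering.

```python
from collections import Counter, deque

def classify_window(window):
    """Stand-in for the sequence model: label a window by its most
    common frame label. (Hypothetical; the real LDCRF scores whole
    feature sequences, not pre-labeled frames.)"""
    return Counter(window).most_common(1)[0][0]

def recognize_stream(frames, window=3, persistence=2):
    """Slide a fixed-size window over the stream and report a gesture
    only once it has won `persistence` consecutive windows, so brief
    misclassifications between gestures are filtered out."""
    recent = deque(maxlen=window)
    streak_label, streak, output = None, 0, []
    for frame in frames:
        recent.append(frame)
        if len(recent) < window:
            continue  # wait until the first full window
        label = classify_window(recent)
        streak = streak + 1 if label == streak_label else 1
        streak_label = label
        if streak == persistence:
            output.append(label)
    return output

# Hypothetical pre-labeled frame stream: one gesture, then another.
stream = ["wave", "wave", "wave", "wave", "stop", "stop", "stop", "stop"]
result = recognize_stream(stream)
```

Each gesture is reported exactly once, even though the windows straddling the transition contain a mix of both labels.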