HUMANOID TECHNOLOGY


Our current efforts are mainly devoted to developing humanoid technologies. The concrete projects are as follows:

(1) Spoken language and dialogue understanding
(2) Recognition and learning for mixed-media information


Multi-modal Dialogue Processing

Spoken conversation is a natural medium for everyday human-to-human communication. We are building a spoken dialog system that lets people exchange information with computers with little effort.

Several research projects contribute to this goal. Dialog coordination principles derived from observations of human dialog make the spoken dialog system adaptable to the user. The system estimates the user's knowledge level (novice or expert) from the user's responses and selects an appropriate explanation strategy.
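
As a rough illustration of the user-adaptation idea above, the following Python sketch keeps a running estimate of the user's knowledge level from their responses and picks an explanation style accordingly. The cue phrases, the scoring rule, the threshold, and all names (`UserModel`, `observe`, `explain`) are assumptions made for this example and are not details of the actual system.

```python
from dataclasses import dataclass

# Hypothetical cue phrases; a real system would draw such evidence
# from observations of human dialog rather than a fixed list.
NOVICE_CUES = ("what is", "what does", "i don't understand", "pardon")
EXPERT_CUES = ("skip", "i know", "go on", "next")


@dataclass
class UserModel:
    """Running estimate of the user's knowledge level (assumed scheme)."""
    score: int = 0       # higher values lean toward "expert"
    threshold: int = 2   # evidence required before treating the user as expert

    def observe(self, response: str) -> None:
        """Update the estimate from one user response."""
        text = response.lower()
        if any(cue in text for cue in NOVICE_CUES):
            self.score -= 1
        elif any(cue in text for cue in EXPERT_CUES):
            self.score += 1

    @property
    def level(self) -> str:
        """Return 'expert' once enough evidence has accumulated, else 'novice'."""
        return "expert" if self.score >= self.threshold else "novice"


def explain(model: UserModel, short: str, detailed: str) -> str:
    """Choose the explanation strategy that matches the estimated level."""
    return short if model.level == "expert" else detailed


if __name__ == "__main__":
    model = UserModel()
    for reply in ["I know, skip that", "go on"]:
        model.observe(reply)
    print(model.level)
    print(explain(model, "Transfer at the next stop.",
                  "A transfer means changing to another line. Do so at the next stop."))
```

Starting from "novice" and requiring repeated evidence before switching is one conservative design choice: an over-detailed explanation costs an expert little, while a terse one may lose a novice.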

Incremental speech recognition, understanding, and production technologies allow the spoken dialog system to respond in real time. The system can respond to, and interrupt, the user's utterances, as happens in real-world spoken communication.
Spoken dialog system DUG-1


Knowledge, Cognition, Understanding

We study human understanding processes using protocol analysis and an eye camera (Cognitive Research with Eyemark Recorder). Our task domains include cognitive processes in keyboard typing (Cognitive Model of Typing), understanding of the board in the game of Go (Cognitive Study of the Game of Go), the structure of the cognitive maps people use when moving around a city (Cognitive Factors in Human Navigation), and understanding of other people as information sources in a social context (Communication in a Society as Database).
Eyemark Recorder: EMR-NC
New Computer Interface: Eyepointer


Recognition and Learning for Mixed-media Information

Our objective is to research new recognition algorithms for mixed-media information and to develop surveillance and information-retrieval systems. The media include video data, images, speech, and non-speech sounds. Such data are very difficult for computers to process because of factors such as data volume, noise, and media combinations. We aim to develop a general framework to address these issues.

The following topics are currently being investigated:
(1) Fast visual/audio search and retrieval
(2) Recognition of written characters in a scene
(3) Segregation of mixed speech data
(4) Music recognition
Activesearch
Time-series Active Search Method
