June 26, 2001


Japan Science and Technology Corporation (JST)
Core Research for Evolutional Science and Technology (CREST)
"Creating the Brain"
Research Project (1998-2002)

Task Planning Mechanism of
Speech Motor Control


     Introduction
     Research Project Plan
     Research Theme
     Project Members

      CREST workshop on Speech Motor Control and Modeling

Introduction

Articulatory motions are not only physical motions of the vocal organs but also movements peculiar to human beings: they generate acoustic signals that listeners receive by ear and perceive as linguistic information. The aim of this research project is to construct a computational model of motor planning in speech production, organized according to a hierarchical task structure, in order to clarify the speech motor control mechanism in the brain. The motor control signals sent to the articulatory organs are specified according to this computational model. Furthermore, feedback control based on proprioceptive information arising from dynamical interactions, such as contact between vocal organs, and on auditory information will be investigated, with the goal of creating a motor planning model that incorporates a control mechanism. The ultimate goal of this research is to develop a mechanical model of speech production, i.e., a speaking machine that mimics the articulatory behavior of human beings. In the future, we will construct a new speech information processing system that incorporates a computational model of speech production based on the speech information generation mechanism in the brain.




Research Project Plan

We have studied a computational model of speech production, which generates acoustic signals from articulatory motions. Regarding the trajectory formation of articulatory motions, we found the motor control criteria for articulatory movements when a speech system executes sequential motion tasks. A movement trajectory formation model for the configuration of a phoneme-specific vocal tract, which is regarded as a motion task of articulatory motions, and a similar model that expresses the motion task in terms of a probability have been developed. Additionally, a speech production model that generates speech has been constructed, in which the acoustic characteristics of the vocal tract are calculated from the articulatory motions by referring to an articulatory-acoustic codebook. An understanding of the articulatory motor planning mechanism in the brain from a computational viewpoint requires knowledge of the location of the movement goal of articulatory motions. By focusing on the hierarchical task structure that covers actions from articulatory motions to the configuration of acoustic events, a physio-dynamical model can be constructed that represents the three-dimensional, aero-dynamical, and acoustic phenomena of both the vocal cords and the vocal tract and that incorporates a dynamic motor control model with a feedback mechanism; on this basis, a motor planning model can be constructed to determine the motor control of the speech organs. The movement goal of articulatory motions, which has been arbitrarily assumed until now, shall be located in terms of the physical quantity to be achieved during articulatory motions at the sub-task level, i.e., the level that accounts for motion, configuration of the vocal tract, and acoustics.
It is also necessary to find the acceptable range for each sub-task required to achieve the movement goal, from the viewpoint that transmitting linguistic information is the final goal of speech. Regarding the control mechanism of the speech dynamic system, we should clarify the motor control mechanism and construct a relevant model by carefully examining the idea that the extremely precise articulatory motions required to realize acoustic events are achieved by feedback control of the proprioceptive system under dynamic interactions such as contact between vocal organs. Regarding the physio-dynamical model and the aero-dynamical and acoustic model, mechanical models, which run in real time and can account for phenomena that are difficult to simulate, should be developed in addition to computerized simulation models. The speaking machine should operate in an environment in which it can be controlled by the motor commands generated using the motor planning model and the feedback control mechanism.
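The articulatory-acoustic codebook mentioned in this section can be pictured, very roughly, as a lookup table from articulatory configurations to acoustic parameters. The sketch below is only an illustration under stated assumptions: the two articulatory parameters, the formant values, and the function names are hypothetical placeholders, and nearest-neighbor search is one simple way to query such a table, not necessarily the method used in the project.

```python
import math

# Hypothetical codebook: (jaw opening, tongue advancement) -> (F1, F2) in Hz.
# All numbers are illustrative placeholders, not measured data.
CODEBOOK = [
    ((0.9, 0.5), (750.0, 1200.0)),  # open, central configuration
    ((0.2, 0.9), (300.0, 2300.0)),  # close, front configuration
    ((0.2, 0.1), (320.0, 800.0)),   # close, back configuration
]


def lookup_acoustics(articulation):
    """Return the acoustic entry whose articulatory key is nearest (Euclidean)."""
    _, acoustics = min(
        CODEBOOK, key=lambda entry: math.dist(entry[0], articulation)
    )
    return acoustics


def trajectory_to_acoustics(trajectory):
    """Map each articulatory frame of a trajectory to its codebook acoustics."""
    return [lookup_acoustics(frame) for frame in trajectory]


if __name__ == "__main__":
    # A short made-up articulatory trajectory moving from open to close-front.
    frames = [(0.85, 0.45), (0.5, 0.7), (0.25, 0.8)]
    for frame, acoustics in zip(frames, trajectory_to_acoustics(frames)):
        print(frame, "->", acoustics)
```

A real codebook would hold many more entries and would typically interpolate between neighbors rather than return a single stored entry.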


 



Research Theme

Current Research Circumstances

In research on trajectory-formation models of articulatory movements for hierarchical motion tasks, Saltzman (1989) proposed a task dynamic model that determines the movement trajectory of articulatory motions by treating a specific configuration of the vocal tract as the motion task, on the assumption that the motion task of articulatory motions is a vocal tract configuration to be achieved by coordinating the motions of the vocal organs. Bailly (1996) proposed another model that determines the movement trajectory of articulatory motions by paying special attention to the redundant degrees of freedom that the articulatory system has with respect to formants, on the assumption that the motion task is to generate the formant tracks that characterize the speech spectrum. In addition, in research related to motion tasks, Abbs (1984) discussed the compensatory motions and muscle activity of the lips when movement of the lower jaw is forcibly stopped during the articulatory motions for lip-closure consonants. Physiological models of the articulatory organs include the tongue models of Kiritani (1976), Hirai (1995), and Wilhelms (1995), and the jaw model of Laboissiere (1996).

Research Goals

Motion task of articulatory movements

The hierarchical task structure of articulatory movements shall be clarified at each task level: movement, vocal tract configuration, aero-dynamics and acoustics, and linguistic symbols. In this context, we pay attention to the behavior of articulatory motions when their environment is changed. By varying the speech environment using an external control, we shall clarify the compensatory effects produced at each sub-task level in response to the change and the physical quantity that these compensatory effects maintain. Whether the movement goal of articulatory motions exists not only as a static target value but also as a dynamical pattern should also be clarified through experiments. Furthermore, from the viewpoint that the final movement goal of articulatory motions is to transmit linguistic information, the acceptable range of the physical quantity at each task level that still yields categorical phonemic perception shall be clarified, based on phonemic perception experiments on speech produced by articulatory motions in various environments. In order to carry out these experiments, a three-dimensional motion measuring apparatus will be developed by improving the conventional two-dimensional apparatus; alternating-current magnetic measurement will be used to construct a new system for simultaneously observing the motions, aero-dynamics, and acoustic signals of speech. In addition, to control the speech environment, new devices will be developed, including an air-controlled artificial palate that is capable of dynamically changing its shape, and units that apply dynamic perturbations to jaw, lip, and tongue motions by means of a magnetic force.

Constructing a computational motor planning model

Based on a computational model of motor planning for articulatory motions, the motor control with which the articulatory system generates acoustic signals that transmit the desired linguistic information shall be specified.
This process starts with mimicking a given voice, and then shifts to the problem of how to determine, when all of the acoustic signals during articulatory motions are given, both the movement trajectory of articulatory motions that generates the same acoustic signals and the motor control of the articulatory system that generates such motions. As the level of motor control with regard to acoustic events goes from vocal tract shaping to the motions of the respective vocal organs and then to the muscles that move the vocal organs, the system's redundant degrees of freedom increase. Therefore, the relationship between acoustic events and each task level is such that a single acoustic event generally corresponds to more than one state at each level. In this context, the constraints that resolve the indeterminacy in the inverse problem, which specifies the state of the articulatory system for each acoustic event, shall be clarified in terms of the dynamic and acoustic structures of the articulatory system and the time structure of articulatory motions.
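The indeterminacy described above can be illustrated with a toy linear example. The sketch below assumes a hypothetical forward map `A` from three articulatory parameters to two acoustic parameters; the matrix values and the minimum-norm criterion are assumptions chosen only for illustration and stand in for the dynamic and acoustic constraints the project intends to identify.

```python
# Hypothetical linear forward map: 3 articulatory parameters -> 2 acoustic ones.
# The values are arbitrary; a real articulatory-to-acoustic map is nonlinear.
A = [
    [1.0, 0.5, 0.0],
    [0.0, 1.0, 1.0],
]

# A direction in articulatory space that the map cannot "hear": A @ NULL == 0.
NULL = [0.5, -1.0, 1.0]


def forward(x):
    """Acoustic parameters produced by articulatory parameters x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def min_norm_inverse(y):
    """Smallest-norm x with forward(x) == y, computed as A^T (A A^T)^-1 y."""
    # Gram matrix G = A A^T (2 x 2), inverted in closed form.
    g = [[sum(a * b for a, b in zip(r1, r2)) for r2 in A] for r1 in A]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    w0 = (g[1][1] * y[0] - g[0][1] * y[1]) / det
    w1 = (-g[1][0] * y[0] + g[0][0] * y[1]) / det
    return [A[0][j] * w0 + A[1][j] * w1 for j in range(3)]


if __name__ == "__main__":
    target = [1.0, 2.0]
    x = min_norm_inverse(target)
    # Moving along NULL changes the articulation but not the acoustics.
    x_alt = [xi + 0.3 * ni for xi, ni in zip(x, NULL)]
    print("minimum-norm articulation:", x, "->", forward(x))
    print("alternative articulation: ", x_alt, "->", forward(x_alt))
```

Both articulations yield the same acoustic output, so an extra criterion is needed to choose among them; here the criterion is simply the smallest parameter norm.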



A computational motor planning model should be developed if the transmission of linguistic information is regarded as the movement goal of articulatory motions. In our research, the task at the acoustic level corresponding to the linguistic information is expressed over local domains in time and task space. For articulatory motions in which these motion tasks are executed sequentially, a computational model will be constructed that arranges acoustic events in time according to a probabilistic description of the acoustic task and that selects the most appropriate articulatory motions in terms of both how well the sequence of acoustic events is realized and the energy cost that the speech dynamic system incurs in realizing it.
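One way to picture the trade-off described above is a small quadratic optimization. The sketch below is a minimal illustration, assuming a single one-dimensional task variable, Gaussian task descriptions given as (mean, variance) at selected time steps, and a sum of squared step-to-step changes as a stand-in for the energy cost. The function name `plan_trajectory` and all parameters are hypothetical, and the project's actual criteria may differ.

```python
def plan_trajectory(n_steps, tasks, smooth=1.0):
    """Minimize sum_k (x[t_k] - mean_k)^2 / var_k + smooth * sum_t (x[t+1] - x[t])^2.

    tasks maps a time index to a (mean, variance) pair; at least one task is
    required. The optimum solves a symmetric tridiagonal linear system, which
    is handled here with the Thomas algorithm.
    """
    diag = [0.0] * n_steps
    rhs = [0.0] * n_steps
    for t in range(n_steps):
        neighbors = (t > 0) + (t < n_steps - 1)
        diag[t] = smooth * neighbors
    for t, (mean, var) in tasks.items():
        diag[t] += 1.0 / var  # small variance = tight task = large weight
        rhs[t] += mean / var

    # Forward sweep (all off-diagonal entries equal -smooth).
    c = [0.0] * n_steps
    d = [0.0] * n_steps
    c[0] = -smooth / diag[0]
    d[0] = rhs[0] / diag[0]
    for t in range(1, n_steps):
        denom = diag[t] + smooth * c[t - 1]
        c[t] = -smooth / denom
        d[t] = (rhs[t] + smooth * d[t - 1]) / denom

    # Back substitution.
    x = [0.0] * n_steps
    x[-1] = d[-1]
    for t in range(n_steps - 2, -1, -1):
        x[t] = d[t] - c[t] * x[t + 1]
    return x


if __name__ == "__main__":
    # Tight tasks at both ends, a loosely specified task in the middle.
    tasks = {0: (0.0, 1e-6), 5: (1.0, 0.5), 10: (0.0, 1e-6)}
    print([round(v, 3) for v in plan_trajectory(11, tasks)])
```

A task with a small variance is met almost exactly, while a task with a large variance is only approached as far as the smoothness cost allows, which is the kind of compromise a probabilistic task description makes possible.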


Constructing a speech motor control model

Skilled motions such as articulatory motions are considered to be carried out principally by feedforward control based on intracerebral motor planning. However, in order to instantaneously achieve the proper configuration of a narrow vocal tract constriction, which requires extremely high acoustic precision, articulatory movement is also controlled by feedback from the proprioceptive system in response to dynamic interactions at the vocal organs, such as contact of the vocal organs with the palate and contact of the upper and lower lips. Acoustic feedback via perception should also be employed at the monitoring level of articulatory motions. In our research, the dynamic characteristics of feedback control for articulatory motions are clarified through two experiments. The first applies an instantaneous perturbing external force to the speech dynamic system; the second feeds the speaker's own speech back to the ears after modifying the acoustic signal via signal processing. Moreover, a model of the proprioceptive system, which reflects dynamic interactions with external forces, and a model of the feedback control system through perception will be clarified in order to construct a control model of the speech dynamic system composed of feedforward and feedback controls.
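The combination of feedforward and feedback control under a perturbing force can be sketched with a toy simulation. The example below is only an illustration under stated assumptions: the articulator is modeled as a unit mass with viscous damping, the planned movement is a smooth step, the feedback is a simple proportional-derivative law, and all names, gains, and units are hypothetical rather than taken from the project.

```python
def reference(t, move_time=0.5):
    """Planned movement from 0 to 1: position, velocity, acceleration."""
    tau = min(max(t / move_time, 0.0), 1.0)
    pos = 3 * tau**2 - 2 * tau**3
    vel = (6 * tau - 6 * tau**2) / move_time
    acc = (6 - 12 * tau) / move_time**2 if tau < 1.0 else 0.0
    return pos, vel, acc


def simulate(use_feedback, dt=0.001, duration=1.0,
             mass=1.0, damping=5.0, kp=400.0, kd=40.0):
    """Return the final position of a damped mass driven toward the plan.

    A brief external force pulse (hypothetical perturbation) acts between
    0.20 s and 0.25 s. Units are arbitrary.
    """
    x = v = 0.0
    for i in range(int(round(duration / dt))):
        t = i * dt
        x_ref, v_ref, a_ref = reference(t)
        # Feedforward: inverse dynamics of the planned movement.
        u_ff = mass * a_ref + damping * v_ref
        # Feedback: proportional-derivative correction of the tracking error.
        u_fb = kp * (x_ref - x) + kd * (v_ref - v) if use_feedback else 0.0
        f_ext = -20.0 if 0.2 <= t < 0.25 else 0.0
        a = (u_ff + u_fb + f_ext - damping * v) / mass
        v += a * dt  # semi-implicit Euler step
        x += v * dt
    return x


if __name__ == "__main__":
    print("final position, feedforward only:      ", simulate(False))
    print("final position, feedforward + feedback:", simulate(True))
```

With feedforward control alone, the perturbation leaves a lasting offset from the target, whereas adding feedback brings the movement back to the planned endpoint.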

Constructing physiological and physical models of the speech mechanism

A physiological model of the speech dynamic system and an aero-acoustic model of speech formation will be constructed in order to elucidate the intracerebral model that performs motor planning for articulatory motions. Regarding the former, a three-dimensional dynamic model of the vocal organs that simulates control by muscle tension shall be constructed in terms of elastic body models of the tongue and lips, which are both formed of soft tissue. This model should reflect the dynamic interactions that arise in articulatory motions, such as contact of the tongue and palate and contact of the upper and lower lips. Regarding the aero-acoustic model, our research will mainly focus on a three-dimensional acoustic model of the vocal tract and a model of the sound source generation mechanism for consonants. These speech production models will be constructed as computerized simulation models and also as mechanical models operable by mechanical or aero-dynamical control. Constructing mechanical models will help clarify the physical phenomena of the articulatory system. Real-time mechanical models may also be able to account for physical phenomena that are difficult to simulate by computer. Furthermore, our plan is to construct a speaking machine that is operable in a real environment by integrating the motor planning model of articulatory motions with the motor control model.

Generating articulatory motions based on para-linguistic information

The motor planning of articulatory motions is influenced by para-linguistic information such as stressed articulation, emotional state, and the intention of the speaker. In our research, the effects of such para-linguistic information on articulatory motions will be clarified at each level from motion level to acoustic level. The internal parameters of the computational model for motor planning controlled by para-linguistic information will be clarified in order to construct a model that generates articulatory motions in response to para-linguistic information.

Future Research Plans, Creation of Intellectual Property, and Contribution to Society

The aim of this research project is to investigate the intracerebral information processing mechanism. To achieve this goal, it will use computational models for motor planning of the articulatory motions that are peculiar to human beings. We can expect to acquire intellectual property such as the computational model of motor planning of articulatory motions, as well as the computerized and mechanical models of its components, namely the speech dynamic system and the aero-dynamical and acoustic system. The computational motor planning model that simulates voices via an articulatory system gives us a way to estimate oral motions from the voice when speech is uttered. Such a method would be very helpful for training and mastering good pronunciation in foreign languages. Clarifying the motor planning of articulatory motions will also contribute considerably to elucidating the mechanism of language learning. Furthermore, once the nature of the speech information source is well clarified with regard to the structure of the human speech production system, it will become possible to rebuild speech information processing systems, which until now have relied on statistical processing at the acoustic level, into new systems that are better matched to human functions.



Project Members

NTT Communication Science Laboratories, Human & Infor Lab.

 Masaaki Honda

 Hiroaki Gomi

 Takemi Mochida

 Akinori Fujino

 Naoki Saijo

 Sadao Hiroya

 Takayuki Ito (CREST)

 Mayumi Ikeda (CREST-Secretary)

 Emi Zuiki Murano (CREST/University of Tokyo, Graduate School of Medicine)

The National Institute for Japanese Language

 Kikuo Maekawa

Tokyo Metropolitan Institute of Gerontology

 Itaru Tatsumi

 Takao Fushimi

 Yoichi Kureta

Advanced Telecommunications Research Institute International, ISD

 Kiyoshi Honda

 Eric V. Bateson

 Shinobu Masaki

 Dang Jianwu (ATR-I ISD/JAIST)

 Takeshi Okadome

 Masahiko Wakumoto (ATR-I ISD/Showa University)

 Soyoko Takano (CREST)

The Future University of Hakodate

 Nobuhiro Miki

 Eiichi Yoshikawa

Hokkai Gakuen University

 Kunitoshi Motoki

 Hiroki Matsuzaki

Waseda University, School of Science & Engineering

 Atsuo Takanishi

 Kazufumi Nishikawa

 Koki Hayashi

 Akihiro Imai

 Takayuki Ogawara

 Shunji Kuwae

 Kunihiro Tanahashi

Kogakuin University

 Hideaki Takanobu

Kyushu Institute of Design

 Tokihiko Kaburagi

Asahikawa Medical College

 Takashi Sakamoto

Kyoto University, Graduate School of Informatics

 Koichi Osuka