Inohara, Ken, Sumita, Yuka I., Ohbayashi, Naoto, Ino, Shuichi, Kurabayashi, Tohru, Ifukube, Tohru, and Taniguchi, Hisashi
Journal of Voice. Jul2010, Vol. 24 Issue 4, p503-509. 7p.
Vocal tract, Voice disorders, Tomography, Rapid prototyping, and Head & neck cancer patients
Summary: Postoperative head and neck cancer patients suffer from speech disorders that result from changes in their vocal tracts. Making a solid vocal tract model and measuring its transmission characteristics would provide one of the most useful tools for resolving this problem. In the binary conversion of X-ray computed tomographic (CT) images for vocal tract reconstruction, many researchers have used nonobjective methods. We hypothesized that a standardized vocal tract model could be reconstructed by adopting the Hounsfield number of fat tissue as the thresholding criterion for binary conversion, because fat has the Hounsfield number nearest to that of air in the human body. The purpose of this study was to establish a new standardized method of binary conversion for reconstructing three-dimensional (3-D) vocal tract models. CT images taken for postoperative diagnosis were obtained secondarily from a CT scanner. Each patient's minimum Hounsfield number setting for the buccal fat-pad regions was measured. Thresholds were set every 50 Hounsfield units (HU) from the bottom line of the buccal fat-pad region down to −1024 HU; the images were converted into binary values and evaluated according to a three-grade system based on anatomically defined criteria. The optimal threshold between tissue and air was determined by nonlinear multiple regression analyses. Each patient's minimum settings for the buccal fat-pad regions were obtained, and the optimal threshold was determined to be −165 HU from each patient's minimum Hounsfield number setting for the buccal fat-pad regions. In conclusion, a method of standardized 3-D vocal tract modeling was established. [Copyright © Elsevier]
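The threshold sweep and binary conversion described in this abstract can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: it reads "−165 HU from each patient's minimum settings" as an offset below that patient's buccal fat-pad minimum, it treats voxels strictly below the threshold as air, and the helper names `candidate_thresholds` and `binarize_vocal_tract` and the fat-pad value of −124 HU are hypothetical. If −165 HU is instead meant as an absolute threshold, passing `fat_pad_min_hu=0` gives that behavior.

```python
import numpy as np


def candidate_thresholds(fat_pad_min_hu: int, floor_hu: int = -1024, step_hu: int = 50) -> list[int]:
    """Thresholds every `step_hu` HU from the fat-pad minimum down to `floor_hu` (inclusive if hit exactly)."""
    return list(range(fat_pad_min_hu, floor_hu - 1, -step_hu))


def binarize_vocal_tract(ct_hu: np.ndarray, fat_pad_min_hu: float, offset_hu: float = -165.0) -> np.ndarray:
    """Return a boolean air mask (True = air, False = tissue).

    ASSUMPTION: the threshold is the patient's buccal fat-pad minimum plus `offset_hu`;
    pass fat_pad_min_hu=0 to use -165 HU as an absolute threshold instead.
    """
    threshold_hu = fat_pad_min_hu + offset_hu
    # Voxels strictly below the threshold count as air; boundary handling is assumed.
    return np.asarray(ct_hu) < threshold_hu


if __name__ == "__main__":
    fat_pad_min = -124  # hypothetical per-patient measurement, in HU
    print(candidate_thresholds(fat_pad_min)[:3])  # → [-124, -174, -224]
    volume = np.array([[-1000, -300], [-250, 40]])  # toy 2x2 "slice" in HU
    print(binarize_vocal_tract(volume, fat_pad_min))
```

In a real pipeline the CT voxel values would first have to be converted to HU from the stored pixel data, and each binarized candidate would then be graded against the anatomical criteria before any regression; neither step is shown here.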
Access to information, Oral communication, Communication models, DIALOG (Information retrieval system), Speech research, and Information storage & retrieval systems -- Technology
This paper proposes a generic dialog modeling framework for a multi-domain dialog system that simultaneously manages goal-oriented and chat dialogs for both information access and entertainment. We developed a dialog modeling technique using an example-based approach to implement multiple applications such as car navigation, weather information, TV program guidance, and a chatbot. Example-based dialog modeling (EBDM) is a simple and effective method for prototyping and deploying various dialog systems. This paper also introduces the system architecture of multi-domain dialog systems using the EBDM framework and the domain spotting technique. In our experiments, we evaluated our system with both simulated and real users. We expect that our approach can support flexible management of multi-domain dialogs within the same framework. [Copyright © Elsevier]
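The core idea of example-based dialog modeling, responding with the system action attached to the most similar stored example, combined with a domain-spotting step, can be illustrated with a short Python sketch. Everything here is assumed for illustration: the abstract does not specify the similarity measure or the domain-spotting method, so word-overlap (Jaccard) similarity stands in for both, and the `Example` type, the `respond` function, the utterances, and the action labels are hypothetical (only the domain names echo the applications listed above).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One hypothetical example-database entry: a user utterance and its paired system action."""
    user_utterance: str
    system_action: str


def tokenize(text: str) -> set[str]:
    return set(text.lower().replace("?", "").split())


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of word sets (a stand-in for the paper's unspecified matching)."""
    ta, tb = tokenize(a), tokenize(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def spot_domain(utterance: str, db: dict[str, list[Example]]) -> str:
    """Hypothetical domain spotting: choose the domain that holds the closest example."""
    return max(db, key=lambda d: max(similarity(utterance, ex.user_utterance) for ex in db[d]))


def respond(utterance: str, db: dict[str, list[Example]]) -> tuple[str, str]:
    """Spot the domain, then return the action of the most similar example within it."""
    domain = spot_domain(utterance, db)
    best = max(db[domain], key=lambda ex: similarity(utterance, ex.user_utterance))
    return domain, best.system_action


# Hypothetical example database; every domain list must be non-empty.
EXAMPLE_DB = {
    "weather": [
        Example("what is the weather in seoul", "inform(weather, city=seoul)"),
        Example("will it rain tomorrow", "inform(rain_forecast, date=tomorrow)"),
    ],
    "navigation": [Example("find a route to the airport", "search_route(dest=airport)")],
    "chat": [Example("how are you", "chat_reply(greeting)")],
}

if __name__ == "__main__":
    print(respond("will it rain in seoul tomorrow", EXAMPLE_DB))
    # → ('weather', 'inform(rain_forecast, date=tomorrow)')
```

For brevity, domain spotting and example retrieval reuse the same similarity function here; a deployed multi-domain system would more plausibly use a dedicated domain classifier and richer features than bag-of-words overlap when searching the example database.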
Journal of Voice. May2017, Vol. 31 Issue 3, p389.e1-389.e8. 1p.
Summary: Objective: To determine the impact of jitter and shimmer on the perceived naturalness of synthesized vowels produced by acoustic simulation with glottal pulses (GP) and a solid model of the vocal tract (SMVT). Study Design: Prospective study. Methods: Synthesized vowels were produced in three steps: (1) eighty GP were developed (20 with jitter, 20 with shimmer, 20 with jitter+shimmer, and 20 without perturbation); (2) an SMVT was produced with rapid prototyping technology, based on magnetic resonance imaging (MRI) of a woman phonating /ε/; (3) acoustic simulations were performed to obtain eighty synthesized /ε/ vowels. Two experiments were performed. First experiment: three judges rated 120 vowels (20 human + 80 synthesized + 20% repetition) as “human” or “synthesized”. Second experiment: twenty PowerPoint slide sequences were created. Each slide contained four synthesized vowels produced with the four perturbation conditions, and evaluators were asked to rank the vowels from the most natural to the most artificial. Results: First experiment: all the human vowels were classified as human; 27 of the eighty synthesized vowels were rated as human, of which 15 were produced with jitter+shimmer, 10 with jitter, 2 without perturbation, and none with shimmer. Second experiment: vowels produced with jitter+shimmer were judged the most natural, whereas vowels with shimmer and vowels without perturbation were judged the most artificial. Conclusions: The combination of jitter and shimmer increased the naturalness of synthesized vowels. Acoustic simulations performed with GP and an SMVT demonstrate a possible method for testing the effect of perturbation measures on synthesized voices. [ABSTRACT FROM AUTHOR]
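The four perturbation conditions in the Methods can be illustrated by generating glottal pulses cycle by cycle with random period variation (jitter) and amplitude variation (shimmer), then measuring local jitter and shimmer as the mean absolute difference between consecutive cycles divided by the mean, in percent. This Python/NumPy sketch is hypothetical: the abstract gives neither the pulse model nor the perturbation sizes, so the Rosenberg-style pulse shape, the 44.1 kHz sample rate, the 220 Hz fundamental, the 1% period and 5% amplitude perturbations, and all function names are assumptions. The measures are computed from the known per-cycle values rather than extracted from the waveform.

```python
import numpy as np

FS = 44_100  # sample rate in Hz (assumed)


def rosenberg_pulse(n_samples: int, open_quotient: float = 0.6, speed_quotient: float = 2.0) -> np.ndarray:
    """One cycle of a Rosenberg-style glottal flow pulse with peak value 1 (assumed pulse shape)."""
    n_open = int(round(open_quotient * n_samples))
    n_rise = int(round(n_open * speed_quotient / (1 + speed_quotient)))
    n_fall = n_open - n_rise
    pulse = np.zeros(n_samples)
    # Opening phase: raised-cosine rise from 0 toward 1.
    pulse[:n_rise] = 0.5 * (1 - np.cos(np.pi * np.arange(n_rise) / n_rise))
    # Closing phase: quarter-cosine fall from 1 toward 0; the closed phase stays at 0.
    pulse[n_rise:n_open] = np.cos(0.5 * np.pi * np.arange(n_fall) / n_fall)
    return pulse


def pulse_train(f0: float, n_cycles: int, jitter_pct: float, shimmer_pct: float, rng: np.random.Generator):
    """Concatenate pulses whose periods and peak amplitudes are randomly perturbed cycle by cycle.

    jitter_pct and shimmer_pct are standard deviations of the relative perturbation, in percent.
    """
    periods = np.round(FS / f0 * (1 + rng.normal(0.0, jitter_pct / 100, n_cycles))).astype(int)
    amps = 1 + rng.normal(0.0, shimmer_pct / 100, n_cycles)
    signal = np.concatenate([a * rosenberg_pulse(p) for p, a in zip(periods, amps)])
    return signal, periods, amps


def local_jitter_pct(periods) -> float:
    """Mean absolute difference of consecutive periods over the mean period, in percent."""
    periods = np.asarray(periods, dtype=float)
    return float(100 * np.mean(np.abs(np.diff(periods))) / np.mean(periods))


def local_shimmer_pct(amps) -> float:
    """Mean absolute difference of consecutive peak amplitudes over the mean amplitude, in percent."""
    amps = np.asarray(amps, dtype=float)
    return float(100 * np.mean(np.abs(np.diff(amps))) / np.mean(amps))


# Hypothetical (jitter %, shimmer %) settings for the four conditions in the study.
CONDITIONS = {
    "none": (0.0, 0.0),
    "jitter": (1.0, 0.0),
    "shimmer": (0.0, 5.0),
    "jitter+shimmer": (1.0, 5.0),
}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for name, (j, s) in CONDITIONS.items():
        _, periods, amps = pulse_train(220.0, 50, j, s, rng)
        print(f"{name:15s} jitter={local_jitter_pct(periods):.2f}%  shimmer={local_shimmer_pct(amps):.2f}%")
```

Because the perturbations are random draws, the measured local jitter and shimmer only approximate the requested percentages; in the study the resulting pulses would then drive the acoustic simulation through the SMVT, which is outside the scope of this sketch.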