Search results for: Near-Field Acoustical Holography (NAH)

3 Automatic Distance Compensation for Robust Voice-based Human-Computer Interaction

Authors: Randy Gomez, Keisuke Nakamura, Kazuhiro Nakadai

Abstract:

Distant-talking voice-based HCI systems suffer from performance degradation due to mismatch between the acoustic speech (runtime) and the acoustic model (training). The mismatch is caused by the change in the power of the speech signal as observed at the microphones. This change is greatly influenced by the change in distance, which affects speech dynamics inside the room before the signal reaches the microphones. Moreover, as the speech signal is reflected, its acoustical characteristics are also altered by the room properties. In general, power mismatch due to distance is a complex problem. This paper presents a novel approach to dealing with distance-induced mismatch by intelligently sensing instantaneous voice power variation and compensating model parameters. First, the distant-talking speech signal is processed through microphone array processing, and the corresponding distance information is extracted. Distance-sensitive Gaussian Mixture Models (GMMs), pre-trained to capture both speech power and room properties, are used to predict the optimal distance of the speech source. Consequently, pre-computed statistic priors corresponding to the optimal distance are selected to correct the statistics of the generic model, which was frozen during training. Thus, model combinatorics are post-conditioned to match the power of instantaneous speech acoustics at runtime. This results in an improved likelihood of predicting the correct speech command at farther distances. We experiment using real data recorded inside two rooms. Experimental evaluation shows that voice recognition performance using our method is more robust to changes in distance than the conventional approach. In our experiment, under the most acoustically challenging environment (i.e., Room 2: 2.5 meters), our method achieved a 24.2% improvement in recognition performance over the best-performing conventional method.
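
The distance-selection step described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes one diagonal-covariance GMM per candidate distance and picks the distance whose model gives the highest total log-likelihood for the observed feature frames. The function names, the layout of the `models` dictionary, and the feature representation are all hypothetical.

```python
import numpy as np


def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of frames (T, D) under a diagonal-covariance GMM.

    weights: (K,), means: (K, D), variances: (K, D). Hypothetical layout.
    """
    x = np.asarray(frames, dtype=float)[:, None, :]  # (T, 1, D)
    # Per-component Gaussian normalisation term, shape (K,)
    log_norm = -0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)
    # Per-frame, per-component exponent term, shape (T, K)
    log_exp = -0.5 * (((x - means) ** 2) / variances).sum(axis=2)
    log_comp = np.log(weights) + log_norm + log_exp  # (T, K)
    # Log-sum-exp over components for numerical stability
    peak = log_comp.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))
    return float(per_frame.sum())


def select_distance(frames, models):
    """Return the distance label whose GMM scores the frames highest.

    models maps a distance label to a (weights, means, variances) tuple.
    """
    return max(models, key=lambda d: gmm_log_likelihood(frames, *models[d]))
```

In a pipeline like the one the abstract outlines, the selected label would then index the pre-computed priors used to correct the generic acoustic model; that correction step is not shown here because the abstract does not specify it.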

Keywords: Human Machine Interaction, Human Computer Interaction, Voice Recognition, Acoustic Model Compensation, Acoustic Speech Enhancement.


2 Ultrasonic Investigation of Molecular Interaction in Binary Liquid Mixture of Polyethylene Glycol with Ethanol

Authors: S. Grace Sahaya Sheba, R. Omegala Priakumari

Abstract:

Polyethylene glycol (PEG) is a condensation polymer of ethylene oxide and water. It is soluble in water and in many organic solvents. PEG is used to make emulsifying agents, detergents, soaps, plasticizers, ointments, etc. Ethanol (C₂H₅OH), also known as ethyl alcohol, is a well-known organic compound with wide applications in the chemical industry: it is used as a solvent for paint and varnish, in preserving biological specimens, as a fuel mixed with petrol, etc. Although their chemical and physical properties have already been studied, their everyday uses motivated the authors to study further physical properties such as ultrasonic velocity and, from it, adiabatic compressibility, free length, etc. A detailed study of such properties and of excess parameters such as excess adiabatic compressibility, excess free volume, and a few more in the liquid mixtures of these two compounds, with PEG as a solute and ethanol as a solvent at various mole fractions, may shed light on the molecular interaction between the solute and the solvent, supported by NMR, IR, etc. Hence the present research work concerns ultrasonic and allied studies on these two liquid mixtures. Ultrasonic velocity (U), density (ρ) and viscosity (η) were measured experimentally by the authors at room temperature and at different mole fractions, from 0 to 0.055, of ethanol in PEG. Acoustical parameters such as adiabatic compressibility (β), free volume (V_f), acoustic impedance (Z), internal pressure (π_i), intermolecular free length (L_f) and relaxation time (τ) were calculated from the experimental data. Excess parameters such as excess adiabatic compressibility (β^E), excess internal pressure (π_i^E), excess free length (L_f^E) and excess acoustic impedance (Z^E) were also calculated for these two chosen liquid mixtures.
The excess compressibility is positive, with a maximum around a mole fraction of 0.007, and the excess internal pressure is negative, with its maximum at the same mole fraction and a longer free length. The results are analyzed, and it may be concluded that the molecular interaction between the solute and the solvent is weak rather than strong. Appropriate graphs are drawn.
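
The abstract does not state the formulas used, so the following is only a sketch based on relations commonly used in ultrasonic studies of liquid mixtures: β = 1/(ρU²), Z = ρU and τ = (4/3)ηβ, with an excess parameter taken as the measured mixture value minus a mole-fraction-weighted average of the pure-component values. Whether the authors used exactly these forms (for example, mole-fraction rather than volume-fraction weighting) is an assumption, and the function names are hypothetical.

```python
def adiabatic_compressibility(velocity, density):
    """beta = 1 / (rho * U^2), in Pa^-1 for SI inputs (m/s, kg/m^3)."""
    return 1.0 / (density * velocity ** 2)


def acoustic_impedance(velocity, density):
    """Z = rho * U, in kg m^-2 s^-1 for SI inputs."""
    return density * velocity


def relaxation_time(viscosity, beta):
    """tau = (4/3) * eta * beta, in seconds for SI inputs (Pa s, Pa^-1)."""
    return (4.0 / 3.0) * viscosity * beta


def excess_value(mixture_value, x1, pure1, pure2):
    """Excess parameter: measured mixture value minus the ideal value.

    The ideal value is assumed here to be the mole-fraction-weighted
    average of the pure-component values, with x1 the mole fraction of
    component 1 and (1 - x1) that of component 2.
    """
    ideal = x1 * pure1 + (1.0 - x1) * pure2
    return mixture_value - ideal
```

A positive result from `excess_value` for compressibility would correspond to the kind of positive excess compressibility the abstract reports. Free volume, internal pressure and free length are left out because they need additional constants that the abstract does not give.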

Keywords: Adiabatic Compressibility, Binary mixture, Induced dipole, Polarizability, Ultrasonic.


1 Mikrophonie I (1964) by Karlheinz Stockhausen - Between Idea and Auditory Image

Authors: Justyna Humięcka-Jakubowska

Abstract:

Background in music analysis: Traditionally, when we think about a composer’s sketches, the chances are that we are thinking in terms of the working out of detail, rather than the evolution of an overall concept. Since music is a “time art,” it follows that questions of form cannot be entirely detached from considerations of time. One could say that composers tend either to regard time as a place to be filled gradually and partly intuitively, or to look for a specific strategy to occupy it. It seems that one thing that sheds light on Stockhausen’s compositional thinking is his frequent use of “form schemas,” each often a single-page representation of the entire structure of a piece. Background in music technology: Sonic Visualiser (SV) is a program used to study a musical recording. It is an open source application for viewing, analyzing, and annotating music audio files. It contains a number of visualisation tools, which are designed with useful default parameters for musical analysis. Additionally, the Vamp plugin format supported by SV provides analyses such as structural segmentation. Aims: The aim of the paper is to show how SV may be used to obtain a better understanding of a specific musical work, and how the compositional strategy affects musical structures and musical surfaces. It is known that “traditional” music-analytic methods do not reveal the interrelationships between the musical surface (which is perceived) and the underlying musical/acoustical structure. Main Contribution: Stockhausen dealt with the most diverse musical problems by the most varied methods. One characteristic that never ceased to be at the center of his thought and works was the quest for a new balance founded upon an acute connection between speculation and intuition.
In the case of Mikrophonie I (1964) for tam-tam and 6 players, Stockhausen makes a distinction between the “connection scheme,” which indicates the ground rules underlying all versions, and the “form scheme,” which is associated with a particular version. The preface to the published score includes both the connection scheme and a single instance of a “form scheme,” which is what one can hear on the CD recording. In the current study, the insight into the compositional strategy chosen by Stockhausen was compared with the auditory image, that is, with the perceived musical surface. Stockhausen’s musical work is analyzed in terms of both melodic/voice and timbre evolution. Implications: The current study shows how musical structures determine the musical surface. The general assumption is that while listening to music we can extract basic kinds of musical information from musical surfaces. It is shown that interactive strategies of musical structure analysis can offer a very fruitful way of looking directly into certain structural features of music.

Keywords: Automated analysis, composer's strategy, Mikrophonie I, musical surface, Stockhausen.
