Search results for: Direct voice input
2300 Recognition by Online Modeling – a New Approach of Recognizing Voice Signals in Linear Time
Authors: Jyh-Da Wei, Hsin-Chen Tsai
Abstract:
This work presents a novel means of extracting fixedlength parameters from voice signals, such that words can be recognized in linear time. The power and the zero crossing rate are first calculated segment by segment from a voice signal; by doing so, two feature sequences are generated. We then construct an FIR system across these two sequences. The parameters of this FIR system, used as the input of a multilayer proceptron recognizer, can be derived by recursive LSE (least-square estimation), implying that the complexity of overall process is linear to the signal size. In the second part of this work, we introduce a weighting factor λ to emphasize recent input; therefore, we can further recognize continuous speech signals. Experiments employ the voice signals of numbers, from zero to nine, spoken in Mandarin Chinese. The proposed method is verified to recognize voice signals efficiently and accurately.Keywords: Speech Recognition, FIR system, Recursive LSE, Multilayer Perceptron
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14172299 Trusting Smart Speakers: Analysing the Different Levels of Trust between Technologies
Authors: Alec Wells, Aminu Bello Usman, Justin McKeown
Abstract:
The growing usage of smart speakers raises many privacy and trust concerns compared to other technologies such as smart phones and computers. In this study, a proxy measure of trust is used to gauge users’ opinions on three different technologies based on an empirical study, and to understand which technology most people are most likely to trust. The collected data were analysed using the Kruskal-Wallis H test to determine the statistical differences between the users’ trust level of the three technologies: smart speaker, computer and smart phone. The findings of the study revealed that despite the wide acceptance, ease of use and reputation of smart speakers, people find it difficult to trust smart speakers with their sensitive information via the Direct Voice Input (DVI) and would prefer to use a keyboard or touchscreen offered by computers and smart phones. Findings from this study can inform future work on users’ trust in technology based on perceived ease of use, reputation, perceived credibility and risk of using technologies via DVI.
Keywords: Direct voice input, risk, security, technology and trust.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5912298 Speaker Recognition Using LIRA Neural Networks
Authors: Nestor A. Garcia Fragoso, Tetyana Baydyk, Ernst Kussul
Abstract:
This article contains information from our investigation in the field of voice recognition. For this purpose, we created a voice database that contains different phrases in two languages, English and Spanish, for men and women. As a classifier, the LIRA (Limited Receptive Area) grayscale neural classifier was selected. The LIRA grayscale neural classifier was developed for image recognition tasks and demonstrated good results. Therefore, we decided to develop a recognition system using this classifier for voice recognition. From a specific set of speakers, we can recognize the speaker’s voice. For this purpose, the system uses spectrograms of the voice signals as input to the system, extracts the characteristics and identifies the speaker. The results are described and analyzed in this article. The classifier can be used for speaker identification in security system or smart buildings for different types of intelligent devices.
Keywords: Extreme learning, LIRA neural classifier, speaker identification, voice recognition.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7642297 Speech Activated Automation
Authors: Rui Antunes
Abstract:
This article presents a simple way to perform programmed voice commands for the interface with commercial Digital and Analogue Input/Output PCI cards, used in Robotics and Automation applications. Robots and Automation equipment can "listen" to voice commands and perform several different tasks, approaching to the human behavior, and improving the human- machine interfaces for the Automation Industry. Since most PCI Digital and Analogue Input/Output cards are sold with several DLLs included (for use with different programming languages), it is possible to add speech recognition capability, using a standard speech recognition engine, compatible with the programming languages used. It was created in this work a Visual Basic 6 (the world's most popular language) application, that listens to several voice commands, and is capable to communicate directly with several standard 128 Digital I/O PCI Cards, used to control complete Automation Systems, with up to (number of boards used) x 128 Sensors and/or Actuators.
Keywords: Speech Recognition, Automation, Robotics.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18352296 The Effect of the Hemispheres of the Brain and the Tone of Voice on Persuasion
Authors: Rica Jell de Laza, Jose Alberto Fernandez, Andrea Marie Mendoza, Qristin Jeuel Regalado
Abstract:
This study investigates whether participants experience different levels of persuasion depending on the hemisphere of the brain and the tone of voice. The experiment was performed on 96 volunteer undergraduate students taking an introductory course in psychology. The participants took part in a 2 x 3 (Hemisphere: left, right x Tone of Voice: positive, neutral, negative) Mixed Factorial Design to measure how much a person was persuaded. Results showed that the hemisphere of the brain and the tone of voice used did not significantly affect the results individually. Furthermore, there was no interaction effect. Therefore, the hemispheres of the brain and the tone of voice employed play insignificant roles in persuading a person.
Keywords: Dichotic listening, brain hemisphere, tone of voice, persuasion.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14122295 Transformation of Vocal Characteristics: A Review of Literature
Authors: Dong-Yan Huang, Ee Ping Ong, Susanto Rahardja, Minghui Dong, Haizhou Li
Abstract:
The transformation of vocal characteristics aims at modifying voice such that the intelligibility of aphonic voice is increased or the voice characteristics of a speaker (source speaker) to be perceived as if another speaker (target speaker) had uttered it. In this paper, the current state-of-the-art voice characteristics transformation methodology is reviewed. Special emphasis is placed on voice transformation methodology and issues for improving the transformed speech quality in intelligibility and naturalness are discussed. In particular, it is suggested to use the modulation theory of speech as a base for research on high quality voice transformation. This approach allows one to separate linguistic, expressive, organic and perspective information of speech, based on an analysis of how they are fused when speech is produced. Therefore, this theory provides the fundamentals not only for manipulating non-linguistic, extra-/paralinguistic and intra-linguistic variables for voice transformation, but also for paving the way for easily transposing the existing voice transformation methods to emotion-related voice quality transformation and speaking style transformation. From the perspectives of human speech production and perception, the popular voice transformation techniques are described and classified them based on the underlying principles either from the speech production or perception mechanisms or from both. In addition, the advantages and limitations of voice transformation techniques and the experimental manipulation of vocal cues are discussed through examples from past and present research. Finally, a conclusion and road map are pointed out for more natural voice transformation algorithms in the future.Keywords: Voice transformation, Voice Quality, Emotion, Individuality, Speaking Style, Speech Production, Speech Perception.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20432294 The Functions of the Student Voice and Student-Centered Teaching Practices in Classroom-Based Music Education
Authors: Sofia Douklia
Abstract:
The present context paper aims to present the important role of ‘student voice’ and the music teacher in the classroom, which contributes to more student-centered music education. The aim is to focus on the functions of the student voice through the music spectrum, which has been born in the music classroom, and the teacher’s methodologies and techniques used in the music classroom. The music curriculum, the principles of student-centered music education, and the role of students and teachers as music ambassadors have been considered the major music parameters of student voice. The student- voice is a worth-mentioning aspect of a student-centered education, and all teachers should consider and promote its existence in their classroom.
Keywords: Student’s voice, student-centered education, music ambassadors, music teachers.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2102293 The Performance of an 802.11g/Wi-Fi Network Whilst Streaming Voice Content
Authors: P. O. Umenne, Odhiambo Marcel O.
Abstract:
A simple network model is developed in OPNET to study the performance of the Wi-Fi protocol. The model is simulated in OPNET and performance factors such as load, throughput and delay are analysed from the model. Four applications such as oracle, http, ftp and voice are applied over the Wireless LAN network to determine the throughput. The voice application utilises a considerable amount of bandwidth of up to 5Mbps, as a result the 802.11g standard of the Wi-Fi protocol was chosen which can support a data rate of up to 54Mbps. Results indicate that when the load in the Wi-Fi network is increased the queuing delay on the point-to-point links in the Wi-Fi network significantly reduces until it is comparable to that of WiMAX. In conclusion, the queuing delay of the Wi-Fi protocol for the network model simulated was about 0.00001secs comparable to WiMAX network values.Keywords: WLAN-Wireless Local Area Network, MIMO-Multiple Input Multiple Output, Queuing delay, Throughput, AP-Access Point, IP-Internet protocol, TOS-Type of Service.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21312292 Automatic Voice Classification System Based on Traditional Korean Medicine
Authors: Jaehwan Kang, Haejung Lee
Abstract:
This paper introduces an automatic voice classification system for the diagnosis of individual constitution based on Sasang Constitutional Medicine (SCM) in Traditional Korean Medicine (TKM). For the developing of this algorithm, we used the voices of 309 female speakers and extracted a total of 134 speech features from the voice data consisting of 5 sustained vowels and one sentence. The classification system, based on a rule-based algorithm that is derived from a non parametric statistical method, presents 3 types of decisions: reserved, positive and negative decisions. In conclusion, 71.5% of the voice data were diagnosed by this system, of which 47.7% were correct positive decisions and 69.7% were correct negative decisions.Keywords: Voice Classifier, Sasang Constitution Medicine, Traditional Korean Medicine, SCM, TKM.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13892291 Minimum Data of a Speech Signal as Special Indicators of Identification in Phonoscopy
Authors: Nazaket Gazieva
Abstract:
Voice biometric data associated with physiological, psychological and other factors are widely used in forensic phonoscopy. There are various methods for identifying and verifying a person by voice. This article explores the minimum speech signal data as individual parameters of a speech signal. Monozygotic twins are believed to be genetically identical. Using the minimum data of the speech signal, we came to the conclusion that the voice imprint of monozygotic twins is individual. According to the conclusion of the experiment, we can conclude that the minimum indicators of the speech signal are more stable and reliable for phonoscopic examinations.
Keywords: Biometric voice prints, fundamental frequency, phonogram, speech signal, temporal characteristics.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5742290 A Survey on Voice over IP over Wireless LANs
Authors: Haniyeh Kazemitabar, Sameha Ahmed, Kashif Nisar, Abas B Said, Halabi B Hasbullah
Abstract:
Voice over Internet Protocol (VoIP) is a form of voice communication that uses audio data to transmit voice signals to the end user. VoIP is one of the most important technologies in the World of communication. Around, 20 years of research on VoIP, some problems of VoIP are still remaining. During the past decade and with growing of wireless technologies, we have seen that many papers turn their concentration from Wired-LAN to Wireless-LAN. VoIP over Wireless LAN (WLAN) faces many challenges due to the loose nature of wireless network. Issues like providing Quality of Service (QoS) at a good level, dedicating capacity for calls and having secure calls is more difficult rather than wired LAN. Therefore VoIP over WLAN (VoWLAN) remains a challenging research topic. In this paper we consolidate and address major VoWLAN issues. This research is helpful for those researchers wants to do research in Voice over IP technology over WLAN network.Keywords: Capacity, QoS, Security, VoIP Issues, WLAN.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 22452289 VoIP Source Model based on the Hyperexponential Distribution
Authors: Arkadiusz Biernacki
Abstract:
In this paper we present a statistical analysis of Voice over IP (VoIP) packet streams produced by the G.711 voice coder with voice activity detection (VAD). During telephone conversation, depending whether the interlocutor speaks (ON) or remains silent (OFF), packets are produced or not by a voice coder. As index of dispersion for both ON and OFF times distribution was greater than one, we used hyperexponential distribution for approximation of streams duration. For each stage of the hyperexponential distribution, we tested goodness of our fits using graphical methods, we calculated estimation errors, and performed Kolmogorov-Smirnov test. Obtained results showed that the precise VoIP source model can be based on the five-state Markov process.Keywords: VoIP source modelling, distribution approximation, hyperexponential distribution.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17102288 Comparing Emotion Recognition from Voice and Facial Data Using Time Invariant Features
Authors: Vesna Kirandziska, Nevena Ackovska, Ana Madevska Bogdanova
Abstract:
The problem of emotion recognition is a challenging problem. It is still an open problem from the aspect of both intelligent systems and psychology. In this paper, both voice features and facial features are used for building an emotion recognition system. A Support Vector Machine classifiers are built by using raw data from video recordings. In this paper, the results obtained for the emotion recognition are given, and a discussion about the validity and the expressiveness of different emotions is presented. A comparison between the classifiers build from facial data only, voice data only and from the combination of both data is made here. The need for a better combination of the information from facial expression and voice data is argued.
Keywords: Emotion recognition, facial recognition, signal processing, machine learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20182287 Non-Isolated Direct AC-DC Converter Design with BCM-PFC Circuit
Authors: Y. Kobori, L. Xing, H. Gao, N.Onozawa, S. Wu, S. N. Mohyar, Z. Nosker, H. Kobayashi, N. Takai, K. Niitsu
Abstract:
This paper proposes two types of non-isolated direct AC-DC converters. First, it shows a buck-boost converter with an H-bridge, which requires few components (three switches, two diodes, one inductor and one capacitor) to convert AC input to DC output directly. This circuit can handle a wide range of output voltage. Second, a direct AC-DC buck converter is proposed for lower output voltage applications. This circuit is analyzed with output voltage of 12V. We describe circuit topologies, operation principles and simulation results for both circuits.Keywords: AC-DC converter, Buck-boost converter, Buck converter, PFC, BCM PFC circuit.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 47922286 Secure peerTalk Using PEERT System
Authors: Nebu Tom John, N. Dhinakaran
Abstract:
Multiparty voice over IP (MVoIP) systems allows a group of people to freely communicate each other via the internet, which have many applications such as online gaming, teleconferencing, online stock trading etc. Peertalk is a peer to peer multiparty voice over IP system (MVoIP) which is more feasible than existing approaches such as p2p overlay multicast and coupled distributed processing. Since the stream mixing and distribution are done by the peers, it is vulnerable to major security threats like nodes misbehavior, eavesdropping, Sybil attacks, Denial of Service (DoS), call tampering, Man in the Middle attacks etc. To thwart the security threats, a security framework called PEERTS (PEEred Reputed Trustworthy System for peertalk) is implemented so that efficient and secure communication can be carried out between peers.
Keywords: Key management system, peer-to-peer voice streaming, reputed trust management system, voice-over-IP.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18822285 Search Engine Module in Voice Recognition Browser to Facilitate the Visually Impaired in Virtual Learning (MGSYS VISI-VL)
Authors: Nurulisma Ismail, Halimah Badioze Zaman
Abstract:
Nowadays, web-based technologies influence in people-s daily life such as in education, business and others. Therefore, many web developers are too eager to develop their web applications with fully animation graphics and forgetting its accessibility to its users. Their purpose is to make their web applications look impressive. Thus, this paper would highlight on the usability and accessibility of a voice recognition browser as a tool to facilitate the visually impaired and blind learners in accessing virtual learning environment. More specifically, the objectives of the study are (i) to explore the challenges faced by the visually impaired learners in accessing virtual learning environment (ii) to determine the suitable guidelines for developing a voice recognition browser that is accessible to the visually impaired. Furthermore, this study was prepared based on an observation conducted with the Malaysian visually impaired learners. Finally, the result of this study would underline on the development of an accessible voice recognition browser for the visually impaired.Keywords: Accessibility, Usability, Virtual Learning, Visually Impaired, Voice Recognition.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20402284 Computationally Efficient Signal Quality Improvement Method for VoIP System
Authors: H. P. Singh, S. Singh
Abstract:
The voice signal in Voice over Internet protocol (VoIP) system is processed through the best effort policy based IP network, which leads to the network degradations including delay, packet loss jitter. The work in this paper presents the implementation of finite impulse response (FIR) filter for voice quality improvement in the VoIP system through distributed arithmetic (DA) algorithm. The VoIP simulations are conducted with AMR-NB 6.70 kbps and G.729a speech coders at different packet loss rates and the performance of the enhanced VoIP signal is evaluated using the perceptual evaluation of speech quality (PESQ) measurement for narrowband signal. The results show reduction in the computational complexity in the system and significant improvement in the quality of the VoIP voice signal.
Keywords: VoIP, Signal Quality, Distributed Arithmetic, Packet Loss, Speech Coder.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18302283 Analysis of Vocal Fold Vibrations from High-Speed Digital Images Based On Dynamic Time Warping
Authors: A. I. A. Rahman, Sh-Hussain Salleh, K. Ahmad, K. Anuar
Abstract:
Analysis of vocal fold vibration is essential for understanding the mechanism of voice production and for improving clinical assessment of voice disorders. This paper presents a Dynamic Time Warping (DTW) based approach to analyze and objectively classify vocal fold vibration patterns. The proposed technique was designed and implemented on a Glottal Area Waveform (GAW) extracted from high-speed laryngeal images by delineating the glottal edges for each image frame. Feature extraction from the GAW was performed using Linear Predictive Coding (LPC). Several types of voice reference templates from simulations of clear, breathy, fry, pressed and hyperfunctional voice productions were used. The patterns of the reference templates were first verified using the analytical signal generated through Hilbert transformation of the GAW. Samples from normal speakers’ voice recordings were then used to evaluate and test the effectiveness of this approach. The classification of the voice patterns using the technique of LPC and DTW gave the accuracy of 81%.
Keywords: Dynamic Time Warping, Glottal Area Waveform, Linear Predictive Coding, High-Speed Laryngeal Images, Hilbert Transform.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23342282 An Observer-Based Direct Adaptive Fuzzy Sliding Control with Adjustable Membership Functions
Authors: Alireza Gholami, Amir H. D. Markazi
Abstract:
In this paper, an observer-based direct adaptive fuzzy sliding mode (OAFSM) algorithm is proposed. In the proposed algorithm, the zero-input dynamics of the plant could be unknown. The input connection matrix is used to combine the sliding surfaces of individual subsystems, and an adaptive fuzzy algorithm is used to estimate an equivalent sliding mode control input directly. The fuzzy membership functions, which were determined by time consuming try and error processes in previous works, are adjusted by adaptive algorithms. The other advantage of the proposed controller is that the input gain matrix is not limited to be diagonal, i.e. the plant could be over/under actuated provided that controllability and observability are preserved. An observer is constructed to directly estimate the state tracking error, and the nonlinear part of the observer is constructed by an adaptive fuzzy algorithm. The main advantage of the proposed observer is that, the measured outputs is not limited to the first entry of a canonical-form state vector. The closed-loop stability of the proposed method is proved using a Lyapunov-based approach. The proposed method is applied numerically on a multi-link robot manipulator, which verifies the performance of the closed-loop control. Moreover, the performance of the proposed algorithm is compared with some conventional control algorithms.
Keywords: Adaptive algorithm, fuzzy systems, membership functions, observer.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7792281 Voice Command Recognition System Based on MFCC and VQ Algorithms
Authors: Mahdi Shaneh, Azizollah Taheri
Abstract:
The goal of this project is to design a system to recognition voice commands. Most of voice recognition systems contain two main modules as follow “feature extraction" and “feature matching". In this project, MFCC algorithm is used to simulate feature extraction module. Using this algorithm, the cepstral coefficients are calculated on mel frequency scale. VQ (vector quantization) method will be used for reduction of amount of data to decrease computation time. In the feature matching stage Euclidean distance is applied as similarity criterion. Because of high accuracy of used algorithms, the accuracy of this voice command system is high. Using these algorithms, by at least 5 times repetition for each command, in a single training session, and then twice in each testing session zero error rate in recognition of commands is achieved.Keywords: MFCC, Vector quantization, Vocal tract, Voicecommand.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 31572280 A Classification Scheme for Game Input and Output
Authors: P. Prema, B. Ramadoss
Abstract:
Computer game industry has experienced exponential growth in recent years. A game is a recreational activity involving one or more players. Game input is information such as data, commands, etc., which is passed to the game system at run time from an external source. Conversely, game outputs are information which are generated by the game system and passed to an external target, but which is not used internally by the game. This paper identifies a new classification scheme for game input and output, which is based on player-s input and output. Using this, relationship table for game input classifier and output classifier is developed.Keywords: Game Classification, Game Input, Game Output, Game Testing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19832279 Vocal Training and Practice Methods: A Glimpse on the South Indian Carnatic Music
Authors: Raghavi Janaswamy, Saraswathi K. Vasudev
Abstract:
Music is one of the supreme arts of expressions, next to the speech itself. Its evolution over centuries has paved the way with a variety of training protocols and performing methods. Indian classical music is one of the most elaborate and refined systems with immense emphasis on the voice culture related to range, breath control, quality of the tone, flexibility and diction. Several exercises namely saraliswaram, jantaswaram, dhatuswaram, upper stayi swaram, alamkaras and varnams lay the required foundation to gain the voice culture and deeper understanding on the voice development and further on to the intricacies of the raga system. This article narrates a few of the Carnatic music training methods with an emphasis on the advanced practice methods for articulating the vocal skills, continuity in the voice, ability to produce gamakams, command in the multiple speeds of rendering with reasonable volume. The creativity on these exercises and their impact on the voice production are discussed. The articulation of the outlined conscious practice methods and vocal exercises bestow the optimum use of the natural human vocal system to not only enhance the signing quality but also to gain health benefits.Keywords: Carnatic music, Saraliswaram, Varnam, Vocal training.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7852278 Independent Encryption Technique for Mobile Voice Calls
Authors: Nael Hirzalla
Abstract:
The legality of some countries or agencies’ acts to spy on personal phone calls of the public became a hot topic to many social groups’ talks. It is believed that this act is considered an invasion to someone’s privacy. Such act may be justified if it is singling out specific cases but to spy without limits is very unacceptable. This paper discusses the needs for not only a simple and light weight technique to secure mobile voice calls but also a technique that is independent from any encryption standard or library. It then presents and tests one encrypting algorithm that is based of Frequency scrambling technique to show fair and delay-free process that can be used to protect phone calls from such spying acts.Keywords: Frequency Scrambling, Mobile Applications, Real- Time Voice Encryption, Spying on Calls.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 25572277 A Security Model of Voice Eavesdropping Protection over Digital Networks
Authors: Supachai Tangwongsan, Sathaporn Kassuvan
Abstract:
The purpose of this research is to develop a security model for voice eavesdropping protection over digital networks. The proposed model provides an encryption scheme and a personal secret key exchange between communicating parties, a so-called voice data transformation system, resulting in a real-privacy conversation. The operation of this system comprises two main steps as follows: The first one is the personal secret key exchange for using the keys in the data encryption process during conversation. The key owner could freely make his/her choice in key selection, so it is recommended that one should exchange a different key for a different conversational party, and record the key for each case into the memory provided in the client device. The next step is to set and record another personal option of encryption, either taking all frames or just partial frames, so-called the figure of 1:M. Using different personal secret keys and different sets of 1:M to different parties without the intervention of the service operator, would result in posing quite a big problem for any eavesdroppers who attempt to discover the key used during the conversation, especially in a short period of time. Thus, it is quite safe and effective to protect the case of voice eavesdropping. The results of the implementation indicate that the system can perform its function accurately as designed. In this regard, the proposed system is suitable for effective use in voice eavesdropping protection over digital networks, without any requirements to change presently existing network systems, mobile phone network and VoIP, for instance.
Keywords: Computer Security, Encryption, Key Exchange, Security Model, Voice Eavesdropping.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15812276 High-Individuality Voice Conversion Based on Concatenative Speech Synthesis
Authors: Kei Fujii, Jun Okawa, Kaori Suigetsu
Abstract:
Concatenative speech synthesis is a method that can make speech sound which has naturalness and high-individuality of a speaker by introducing a large speech corpus. Based on this method, in this paper, we propose a voice conversion method whose conversion speech has high-individuality and naturalness. The authors also have two subjective evaluation experiments for evaluating individuality and sound quality of conversion speech. From the results, following three facts have be confirmed: (a) the proposal method can convert the individuality of speakers well, (b) employing the framework of unit selection (especially join cost) of concatenative speech synthesis into conventional voice conversion improves the sound quality of conversion speech, and (c) the proposal method is robust against the difference of genders between a source speaker and a target speaker.Keywords: concatenative speech synthesis, join cost, speaker individuality, unit selection, voice conversion
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19392275 Automotive 3-Microphone Noise Canceller in a Frequently Moving Noise Source Environment
Authors: Z. Qi, T. J. Moir
Abstract:
A combined three-microphone voice activity detector (VAD) and noise-canceling system is studied to enhance speech recognition in an automobile environment. A previous experiment clearly shows the ability of the composite system to cancel a single noise source outside of a defined zone. This paper investigates the performance of the composite system when there are frequently moving noise sources (noise sources are coming from different locations but are not always presented at the same time) e.g. there is other passenger speech or speech from a radio when a desired speech is presented. To work in a frequently moving noise sources environment, whilst a three-microphone voice activity detector (VAD) detects voice from a “VAD valid zone", the 3-microphone noise canceller uses a “noise canceller valid zone" defined in freespace around the users head. Therefore, a desired voice should be in the intersection of the noise canceller valid zone and VAD valid zone. Thus all noise is suppressed outside this intersection of area. Experiments are shown for a real environment e.g. all results were recorded in a car by omni-directional electret condenser microphones.
Keywords: Signal processing, voice activity detection, noise canceller, microphone array beam forming.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16112274 Voice Driven Applications in Non-stationary and Chaotic Environment
Authors: C. Kwan, X. Li, D. Lao, Y. Deng, Z. Ren, B. Raj, R. Singh, R. Stern
Abstract:
Automated operations based on voice commands will become more and more important in many applications, including robotics, maintenance operations, etc. However, voice command recognition rates drop quite a lot under non-stationary and chaotic noise environments. In this paper, we tried to significantly improve the speech recognition rates under non-stationary noise environments. First, 298 Navy acronyms have been selected for automatic speech recognition. Data sets were collected under 4 types of noisy environments: factory, buccaneer jet, babble noise in a canteen, and destroyer. Within each noisy environment, 4 levels (5 dB, 15 dB, 25 dB, and clean) of Signal-to-Noise Ratio (SNR) were introduced to corrupt the speech. Second, a new algorithm to estimate speech or no speech regions has been developed, implemented, and evaluated. Third, extensive simulations were carried out. It was found that the combination of the new algorithm, the proper selection of language model and a customized training of the speech recognizer based on clean speech yielded very high recognition rates, which are between 80% and 90% for the four different noisy conditions. Fourth, extensive comparative studies have also been carried out.
Keywords: Non-stationary, speech recognition, voice commands.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15332273 Through Biometric Card in Romania: Person Identification by Face, Fingerprint and Voice Recognition
Authors: Hariton N. Costin, Iulian Ciocoiu, Tudor Barbu, Cristian Rotariu
Abstract:
In this paper three different approaches for person verification and identification, i.e. by means of fingerprints, face and voice recognition, are studied. Face recognition uses parts-based representation methods and a manifold learning approach. The assessment criterion is recognition accuracy. The techniques under investigation are: a) Local Non-negative Matrix Factorization (LNMF); b) Independent Components Analysis (ICA); c) NMF with sparse constraints (NMFsc); d) Locality Preserving Projections (Laplacianfaces). Fingerprint detection was approached by classical minutiae (small graphical patterns) matching through image segmentation by using a structural approach and a neural network as decision block. As to voice / speaker recognition, melodic cepstral and delta delta mel cepstral analysis were used as main methods, in order to construct a supervised speaker-dependent voice recognition system. The final decision (e.g. “accept-reject" for a verification task) is taken by using a majority voting technique applied to the three biometrics. The preliminary results, obtained for medium databases of fingerprints, faces and voice recordings, indicate the feasibility of our study and an overall recognition precision (about 92%) permitting the utilization of our system for a future complex biometric card.Keywords: Biometry, image processing, pattern recognition, speech analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19442272 Measurement of CES Production Functions Considering Energy as an Input
Authors: Donglan Zha, Jiansong Si
Abstract:
Because of its flexibility, CES attracts much interest in economic growth and programming models, and the macroeconomics or micro-macro models. This paper focuses on the development, estimating methods of CES production function considering energy as an input. We leave for future research work of relaxing the assumption of constant returns to scale, the introduction of potential input factors, and the generalization method of the optimal nested form of multi-factor production functions.
Keywords: Bias of technical change, CES production function, elasticity of substitution, energy input.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9602271 Symbolic Analysis of Input Impedance of CMOS Floating Active Inductors with Application in Fully Differential Bandpass Amplifier
Authors: Kittipong Tripetch
Abstract:
This paper proposes a study of input impedance of 2 types of CMOS active inductors. It derives 2 input impedance formulas. The first formula is the input impedance of the grounded active inductor. The second formula is the input impedance of the floating active inductor. After that, these formulas can be used to simulate magnitude and phase response of input impedance as a function of current consumption with MATLAB. Common mode rejection ratio (CMRR) of the fully differential bandpass amplifier is derived based on superposition principle. CMRR as a function of input frequency is plotted as a function of current consumption.
Keywords: Grounded active inductor, floating active inductor, Fully differential bandpass amplifier.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1688