Search results for: Vivek V. Ranade

3 Audio-Visual Co-Data Processing Pipeline

Authors: Rita Chattopadhyay, Vivek Anand Thoutam

Abstract:

Speech is the most widely accepted means of communication, allowing us to exchange feelings and thoughts quickly. Quite often, people who can communicate orally cannot interact or work with computers or devices. Giving a speech command is easier and quicker than typing a command, and listening to audio played from a device is easier than extracting output from a computer or device. With robotics being an emerging market with applications in warehouses, the hospitality industry, consumer electronics, assistive technology, etc., speech-based human-machine interaction is emerging as a lucrative feature for robot manufacturers. Considering this factor, the objective of this paper is to design the “Audio-Visual Co-Data Processing Pipeline.” This pipeline integrates automatic speech recognition, a natural language model for text understanding, object detection, and text-to-speech modules. There are many deep learning models for each of these modules, but OpenVINO Model Zoo models are used because the OpenVINO toolkit covers both computer vision and non-computer vision workloads across Intel hardware, maximizes performance, and accelerates application development. A speech command is given as input; it specifies the target objects to be detected and the start and end times of the interval to extract from the video. Speech is converted to text using the QuartzNet automatic speech recognition model. A summary is extracted from the text using the natural language model Generative Pre-Trained Transformer-3 (GPT-3). Based on the summary, the essential frames are extracted from the video, and the You Only Look Once (YOLO) object detection model detects objects in these extracted frames. The numbers of the frames that contain the target objects (the objects specified in the speech command) are saved as text. 
Finally, this text (the frame numbers) is converted to speech using a text-to-speech model and played from the device. This project supports the 80 YOLO labels, and the user can extract frames based on one or two target labels. The pipeline can easily be extended to more than two target labels by making appropriate changes in the object detection module. The project supports four different speech command formats, which are defined by including sample examples in the prompt used by the GPT-3 model. Depending on preference, a user can introduce a new speech command format by including some examples of that format in the prompt. This pipeline can be used in many projects, such as human-machine interfaces, human-robot interaction, and surveillance through speech commands. Any object detection project can be upgraded with this pipeline so that the user gives speech commands and the output is played from the device.
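
The frame-selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the frame rate, and the rule that every target label must appear in the same frame are assumptions, and the per-frame detections are supplied as plain label sets rather than produced by an actual YOLO model.

```python
from typing import Dict, List, Set


def interval_to_frames(start_s: float, end_s: float, fps: float) -> range:
    """Map a [start, end] interval in seconds to frame indices (assumed constant fps)."""
    if end_s < start_s:
        raise ValueError("end time must not precede start time")
    return range(int(start_s * fps), int(end_s * fps) + 1)


def frames_with_targets(
    detections: Dict[int, Set[str]], frames: range, targets: Set[str]
) -> List[int]:
    """Return the frame numbers whose detected labels include every target label.

    `detections` maps a frame number to the set of labels detected in it
    (hypothetical stand-in for object detector output).
    """
    return [f for f in frames if targets <= detections.get(f, set())]


# Hypothetical detections for a 30 fps video, interval 1.0 s to 2.0 s.
detections = {
    30: {"person", "dog"},
    31: {"person"},
    60: {"dog", "person", "car"},
    90: {"person", "dog"},  # outside the requested interval
}
frames = interval_to_frames(1.0, 2.0, fps=30.0)
selected = frames_with_targets(detections, frames, {"person", "dog"})
print(selected)
```

The resulting list of frame numbers is what would be passed, as text, to the text-to-speech stage.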

Keywords: OpenVINO, automatic speech recognition, natural language processing, object detection, text to speech

Procedia PDF Downloads 53

2 Seismic Perimeter Surveillance System (Virtual Fence) for Threat Detection and Characterization Using Multiple ML Based Trained Models in Weighted Ensemble Voting

Authors: Vivek Mahadev, Manoj Kumar, Neelu Mathur, Brahm Dutt Pandey

Abstract:

Perimeter guarding and protection of critical installations require prompt intrusion detection and assessment so that effective countermeasures can be taken. Currently, visual and electronic surveillance are the primary methods used for perimeter guarding. These methods can be costly and complicated, requiring careful planning according to the location and terrain. Moreover, they often struggle to detect stealthy and camouflaged insurgents. The objective of the present work is to devise a surveillance technique using seismic sensors that overcomes the limitations of existing systems and improves intrusion detection, assessment, and characterization. Most similar systems can distinguish only two types of intrusion, namely human or vehicle. Our system categorizes intrusions further, identifying types of activity such as walking, running, group walking, fence jumping, tunnel digging, and vehicular movement. A virtual fence of 60 meters at GCNEP, Bahadurgarh, Haryana, India, was created by installing four underground geophones at a distance of 15 meters each. The signals received from these geophones are processed to find unique seismic signatures, called features. Various feature optimization and selection methodologies, such as LightGBM, Boruta, Random Forest, Logistics, Recursive Feature Elimination, Chi-2, and Pearson Ratio, were used to identify the best features for training the machine learning models. The models were trained using algorithms such as a supervised support vector machine (SVM) classifier, kNN, Decision Tree, Logistic Regression, Naïve Bayes, and Artificial Neural Networks. These models were then used to predict the category of events, with weighted ensemble voting employed to analyze and combine their results. The models were trained with 1940 training events, and the results were evaluated with 831 test events. 
It was observed that weighted ensemble voting increased the efficiency of predictions. In this study, we successfully developed and deployed the virtual fence using geophones. Since these sensors are passive, do not radiate any energy, and are installed underground, it is impossible for intruders to locate and nullify them. Their flexibility, quick and easy installation, low cost, hidden deployment, and unattended surveillance make such systems especially suitable for critical installations and remote facilities with difficult terrain. This work demonstrates the potential of seismic sensors for creating better perimeter guarding and protection systems using multiple machine learning models in weighted ensemble voting. In this study, the virtual fence achieved an intruder detection efficiency of over 97%.
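
Weighted ensemble voting, as named in the abstract, can be illustrated with a short sketch. The abstract does not say how the weights were chosen or how ties are resolved, so the model names, weights, and tie-breaking behavior below are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict


def weighted_vote(predictions: Dict[str, str], weights: Dict[str, float]) -> str:
    """Combine per-model class predictions by summing each model's weight per class.

    `predictions` maps a model name to its predicted event category;
    `weights` maps the same model names to non-negative weights
    (hypothetical values, e.g. each model's validation accuracy).
    Ties go to the class that first reached the top score.
    """
    scores: Dict[str, float] = defaultdict(float)
    for model, label in predictions.items():
        scores[label] += weights[model]
    return max(scores, key=scores.get)


# Hypothetical example: two weaker models outvote one stronger model by count,
# but the stronger model wins once the votes are weighted.
predictions = {"svm": "walking", "knn": "running", "tree": "running"}
weights = {"svm": 0.9, "knn": 0.4, "tree": 0.4}
print(weighted_vote(predictions, weights))
```

With equal weights the same function reduces to plain majority voting, which makes it easy to compare the two schemes on the same test events.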

Keywords: geophone, seismic perimeter surveillance, machine learning, weighted ensemble method

Procedia PDF Downloads 44

1 An Investigation on the Suitability of Dual Ion Beam Sputtered GMZO Thin Films: For All Sputtered Buffer-Less Solar Cells

Authors: Vivek Garg, Brajendra S. Sengar, Gaurav Siddharth, Nisheka Anadkat, Amitesh Kumar, Shailendra Kumar, Shaibal Mukherjee

Abstract:

CuInGaSe (CIGSe) is the dominant thin film solar cell technology. The band alignment of the buffer/CIGSe interface is one of the most crucial parameters for solar cell performance. In this article, the valence band offset (VBOff) and conduction band offset (CBOff) values of the Cu(In0.70Ga0.30)Se/1 at.% Ga:Mg0.25Zn0.75O (GMZO) heterojunction, grown by a dual ion beam sputtering (DIBS) system, are calculated to understand the carrier transport mechanism at the heterojunction for the realization of all-sputtered buffer-less solar cells. To determine the valence band offset, ∆E_V, at the GMZO/CIGSe heterojunction interface, the standard method based on core-level photoemission is used. The value of ∆E_V can be evaluated by considering common core-level peaks. In our study, the valence band onset (VBOn) values, obtained by the linear extrapolation method, are 2.86 eV for the GMZO film and 0.76 eV for the CIGSe film. In the UPS spectra, the Se 3d peak is observed at 54.82 and 54.7 eV for the CIGSe film and the GMZO/CIGSe interface, respectively, while the Mg 2p peak is observed at 50.09 and 50.12 eV for GMZO and the GMZO/CIGSe interface, respectively. The optical band gaps of CIGSe and GMZO, obtained from absorption spectra measured by spectroscopic ellipsometry, are 1.26 and 3.84 eV, respectively. The calculated average values of ∆E_V and ∆E_C are estimated to be 2.37 and 0.21 eV, respectively, at room temperature. The calculated positive conduction band offset, termed a spike at the absorber junction, is the required criterion for high-efficiency solar cells because it allows efficient charge extraction from the junction. We therefore conclude that GMZO thin films grown by the dual ion beam sputtering system are suitable candidates for ultra-thin buffer-less solar cells based on CIGSe thin films. 
We investigated the band-offset properties at the GMZO/CIGSe heterojunction to verify the suitability of GMZO for the realization of buffer-less solar cells. Acknowledgment: We are thankful for the DIBS, EDX, and XRD facilities at the Sophisticated Instrument Centre (SIC) at IIT Indore. The authors B.S.S. and A.K. acknowledge CSIR, and V.G. acknowledges UGC, India, for their fellowships. B.S.S. is thankful to DST and IUSSTF for the BASE Internship Award. Prof. Shaibal Mukherjee is thankful to DST and IUSSTF for the BASE Fellowship and for the MEITY YFRF award. This work is partially supported by DAE BRNS, DST CERI, and the DST-RFBR Project under the India-Russia Programme of Cooperation in Science and Technology. We are thankful to Mukul Gupta for the SIMS facility at UGC-DAE Indore.
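
The band-offset arithmetic in the abstract can be checked with a small sketch. It assumes the commonly used core-level photoemission relation, ∆E_V = (E_CL,A − E_VBM,A) − (E_CL,B − E_VBM,B) − (E_CL,A,int − E_CL,B,int), and ∆E_C = E_g,GMZO − E_g,CIGSe − ∆E_V; the function names and the sign convention are assumptions, not taken from the paper. Note that the single Se 3d/Mg 2p pair quoted in the abstract gives a slightly different ∆E_V than the reported average of 2.37 eV, which presumably averages over more than one core-level pair.

```python
def valence_band_offset(cl_a, vbm_a, cl_b, vbm_b, cl_a_int, cl_b_int):
    """Core-level estimate of the valence band offset (all energies in eV).

    cl_*/vbm_*: core-level peak and valence band onset of each bulk film;
    cl_*_int: the same core levels measured at the interface.
    """
    return (cl_a - vbm_a) - (cl_b - vbm_b) - (cl_a_int - cl_b_int)


def conduction_band_offset(eg_wide, eg_narrow, delta_ev):
    """Conduction band offset from the two band gaps and the valence band offset (eV)."""
    return eg_wide - eg_narrow - delta_ev


# Values quoted in the abstract: A = CIGSe (Se 3d), B = GMZO (Mg 2p).
dev_single_pair = valence_band_offset(
    cl_a=54.82, vbm_a=0.76, cl_b=50.09, vbm_b=2.86, cl_a_int=54.7, cl_b_int=50.12
)
# Using the reported average valence band offset of 2.37 eV.
dec_reported = conduction_band_offset(eg_wide=3.84, eg_narrow=1.26, delta_ev=2.37)

print(round(dev_single_pair, 2))  # single-pair estimate, eV
print(round(dec_reported, 2))     # matches the reported 0.21 eV
```

The second result reproduces the reported ∆E_C of 0.21 eV from the two band gaps and the reported average ∆E_V.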

Keywords: CIGSe, DIBS, GMZO, solar cells, UPS

Procedia PDF Downloads 250