Search results for: data processing
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 27207

Search results for: data processing

26757 Analysis of Big Data

Authors: Sandeep Sharma, Sarabjit Singh

Abstract:

As per the user demand and growth trends of large free data the storage solutions are now becoming more challenge-able to protect, store and to retrieve data. The days are not so far when the storage companies and organizations are start saying 'no' to store our valuable data or they will start charging a huge amount for its storage and protection. On the other hand as per the environmental conditions it becomes challenge-able to maintain and establish new data warehouses and data centers to protect global warming threats. A challenge of small data is over now, the challenges are big that how to manage the exponential growth of data. In this paper we have analyzed the growth trend of big data and its future implications. We have also focused on the impact of the unstructured data on various concerns and we have also suggested some possible remedies to streamline big data.

Keywords: big data, unstructured data, volume, variety, velocity

Procedia PDF Downloads 542
26756 Mobile Augmented Reality for Collaboration in Operation

Authors: Chong-Yang Qiao

Abstract:

Mobile augmented reality (MAR) tracking targets from the surroundings and aids operators for interactive data and procedures visualization, potential equipment and system understandably. Operators remotely communicate and coordinate with each other for the continuous tasks, information and data exchange between control room and work-site. In the routine work, distributed control system (DCS) monitoring and work-site manipulation require operators interact in real-time manners. The critical question is the improvement of user experience in cooperative works through applying Augmented Reality in the traditional industrial field. The purpose of this exploratory study is to find the cognitive model for the multiple task performance by MAR. In particular, the focus will be on the comparison between different tasks and environment factors which influence information processing. Three experiments use interface and interaction design, the content of start-up, maintenance and stop embedded in the mobile application. With the evaluation criteria of time demands and human errors, and analysis of the mental process and the behavior action during the multiple tasks, heuristic evaluation was used to find the operators performance with different situation factors, and record the information processing in recognition, interpretation, judgment and reasoning. The research will find the functional properties of MAR and constrain the development of the cognitive model. Conclusions can be drawn that suggest MAR is easy to use and useful for operators in the remote collaborative works.

Keywords: mobile augmented reality, remote collaboration, user experience, cognition model

Procedia PDF Downloads 195
26755 Contextual Toxicity Detection with Data Augmentation

Authors: Julia Ive, Lucia Specia

Abstract:

Understanding and detecting toxicity is an important problem to support safer human interactions online. Our work focuses on the important problem of contextual toxicity detection, where automated classifiers are tasked with determining whether a short textual segment (usually a sentence) is toxic within its conversational context. We use “toxicity” as an umbrella term to denote a number of variants commonly named in the literature, including hate, abuse, offence, among others. Detecting toxicity in context is a non-trivial problem and has been addressed by very few previous studies. These previous studies have analysed the influence of conversational context in human perception of toxicity in controlled experiments and concluded that humans rarely change their judgements in the presence of context. They have also evaluated contextual detection models based on state-of-the-art Deep Learning and Natural Language Processing (NLP) techniques. Counterintuitively, they reached the general conclusion that computational models tend to suffer performance degradation in the presence of context. We challenge these empirical observations by devising better contextual predictive models that also rely on NLP data augmentation techniques to create larger and better data. In our study, we start by further analysing the human perception of toxicity in conversational data (i.e., tweets), in the absence versus presence of context, in this case, previous tweets in the same conversational thread. We observed that the conclusions of previous work on human perception are mainly due to data issues: The contextual data available does not provide sufficient evidence that context is indeed important (even for humans). The data problem is common in current toxicity datasets: cases labelled as toxic are either obviously toxic (i.e., overt toxicity with swear, racist, etc. words), and thus context does is not needed for a decision, or are ambiguous, vague or unclear even in the presence of context; in addition, the data contains labeling inconsistencies. To address this problem, we propose to automatically generate contextual samples where toxicity is not obvious (i.e., covert cases) without context or where different contexts can lead to different toxicity judgements for the same tweet. We generate toxic and non-toxic utterances conditioned on the context or on target tweets using a range of techniques for controlled text generation(e.g., Generative Adversarial Networks and steering techniques). On the contextual detection models, we posit that their poor performance is due to limitations on both of the data they are trained on (same problems stated above) and the architectures they use, which are not able to leverage context in effective ways. To improve on that, we propose text classification architectures that take the hierarchy of conversational utterances into account. In experiments benchmarking ours against previous models on existing and automatically generated data, we show that both data and architectural choices are very important. Our model achieves substantial performance improvements as compared to the baselines that are non-contextual or contextual but agnostic of the conversation structure.

Keywords: contextual toxicity detection, data augmentation, hierarchical text classification models, natural language processing

Procedia PDF Downloads 167
26754 SQL Generator Based on MVC Pattern

Authors: Chanchai Supaartagorn

Abstract:

Structured Query Language (SQL) is the standard de facto language to access and manipulate data in a relational database. Although SQL is a language that is simple and powerful, most novice users will have trouble with SQL syntax. Thus, we are presenting SQL generator tool which is capable of translating actions and displaying SQL commands and data sets simultaneously. The tool was developed based on Model-View-Controller (MVC) pattern. The MVC pattern is a widely used software design pattern that enforces the separation between the input, processing, and output of an application. Developers take full advantage of it to reduce the complexity in architectural design and to increase flexibility and reuse of code. In addition, we use White-Box testing for the code verification in the Model module.

Keywords: MVC, relational database, SQL, White-Box testing

Procedia PDF Downloads 414
26753 Effects of Temperature and the Use of Bacteriocins on Cross-Contamination from Animal Source Food Processing: A Mathematical Model

Authors: Benjamin Castillo, Luis Pastenes, Fernando Cerdova

Abstract:

The contamination of food by microbial agents is a common problem in the industry, especially regarding the elaboration of animal source products. Incorrect manipulation of the machinery or on the raw materials can cause a decrease in production or an epidemiological outbreak due to intoxication. In order to improve food product quality, different methods have been used to reduce or, at least, to slow down the growth of the pathogens, especially deteriorated, infectious or toxigenic bacteria. These methods are usually carried out under low temperatures and short processing time (abiotic agents), along with the application of antibacterial substances, such as bacteriocins (biotic agents). This, in a controlled and efficient way that fulfills the purpose of bacterial control without damaging the final product. Therefore, the objective of the present study is to design a secondary mathematical model that allows the prediction of both the biotic and abiotic factor impact associated with animal source food processing. In order to accomplish this objective, the authors propose a three-dimensional differential equation model, whose components are: bacterial growth, release, production and artificial incorporation of bacteriocins and changes in pH levels of the medium. These three dimensions are constantly being influenced by the temperature of the medium. Secondly, this model adapts to an idealized situation of cross-contamination animal source food processing, with the study agents being both the animal product and the contact surface. Thirdly, the stochastic simulations and the parametric sensibility analysis are compared with referential data. The main results obtained from the analysis and simulations of the mathematical model were to discover that, although bacterial growth can be stopped in lower temperatures, even lower ones are needed to eradicate it. However, this can be not only expensive, but counterproductive as well in terms of the quality of the raw materials and, on the other hand, higher temperatures accelerate bacterial growth. In other aspects, the use and efficiency of bacteriocins are an effective alternative in the short and medium terms. Moreover, an indicator of bacterial growth is a low-level pH, since lots of deteriorating bacteria are lactic acids. Lastly, the processing times are a secondary agent of concern when the rest of the aforementioned agents are under control. Our main conclusion is that when acclimating a mathematical model within the context of the industrial process, it can generate new tools that predict bacterial contamination, the impact of bacterial inhibition, and processing method times. In addition, the mathematical modeling proposed logistic input of broad application, which can be replicated on non-meat food products, other pathogens or even on contamination by crossed contact of allergen foods.

Keywords: bacteriocins, cross-contamination, mathematical model, temperature

Procedia PDF Downloads 141
26752 Automatic Tagging and Accuracy in Assamese Text Data

Authors: Chayanika Hazarika Bordoloi

Abstract:

This paper is an attempt to work on a highly inflectional language called Assamese. This is also one of the national languages of India and very little has been achieved in terms of computational research. Building a language processing tool for a natural language is not very smooth as the standard and language representation change at various levels. This paper presents inflectional suffixes of Assamese verbs and how the statistical tools, along with linguistic features, can improve the tagging accuracy. Conditional random fields (CRF tool) was used to automatically tag and train the text data; however, accuracy was improved after linguistic featured were fed into the training data. Assamese is a highly inflectional language; hence, it is challenging to standardizing its morphology. Inflectional suffixes are used as a feature of the text data. In order to analyze the inflections of Assamese word forms, a list of suffixes is prepared. This list comprises suffixes, comprising of all possible suffixes that various categories can take is prepared. Assamese words can be classified into inflected classes (noun, pronoun, adjective and verb) and un-inflected classes (adverb and particle). The corpus used for this morphological analysis has huge tokens. The corpus is a mixed corpus and it has given satisfactory accuracy. The accuracy rate of the tagger has gradually improved with the modified training data.

Keywords: CRF, morphology, tagging, tagset

Procedia PDF Downloads 189
26751 A Systematic Review of Sensory Processing Patterns of Children with Autism Spectrum Disorders

Authors: Ala’a F. Jaber, Bara’ah A. Bsharat, Noor T. Ismael

Abstract:

Background: Sensory processing is a fundamental skill needed for the successful performance of daily living activities. These skills are impaired as parts of the neurodevelopmental process issues among children with autism spectrum disorder (ASD). This systematic review aimed to summarize the evidence on the differences in sensory processing and motor characteristic between children with ASD and children with TD. Method: This systematic review followed the guidelines of the preferred reporting items for systematic reviews and meta-analysis. The search terms included sensory, motor, condition, and child-related terms or phrases. The electronic search utilized Academic Search Ultimate, CINAHL Plus with Full Text, ERIC, MEDLINE, MEDLINE Complete, Psychology, and Behavioral Sciences Collection, and SocINDEX with full-text databases. The hand search included looking for potential studies in the references of related studies. The inclusion criteria included studies published in English between years 2009-2020 that included children aged 3-18 years with a confirmed ASD diagnosis, according to the DSM-V criteria, included a control group of typical children, included outcome measures related to the sensory processing and/or motor functions, and studies available in full-text. The review of included studies followed the Oxford Centre for Evidence-Based Medicine guidelines, and the Guidelines for Critical Review Form of Quantitative Studies, and the guidelines for conducting systematic reviews by the American Occupational Therapy Association. Results: Eighty-eight full-text studies related to the differences between children with ASD and children with TD in terms of sensory processing and motor characteristics were reviewed, of which eighteen articles were included in the quantitative synthesis. The results reveal that children with ASD had more extreme sensory processing patterns than children with TD, like hyper-responsiveness and hypo-responsiveness to sensory stimuli. Also, children with ASD had limited gross and fine motor abilities and lower strength, endurance, balance, eye-hand coordination, movement velocity, cadence, dexterity with a higher rate of gait abnormalities than children with TD. Conclusion: This systematic review provided preliminary evidence suggesting that motor functioning should be addressed in the evaluation and intervention for children with ASD, and sensory processing should be supported among children with TD. More future research should investigate whether how the performance and engagement in daily life activities are affected by sensory processing and motor skills.

Keywords: sensory processing, occupational therapy, children, motor skills

Procedia PDF Downloads 127
26750 Robust and Real-Time Traffic Counting System

Authors: Hossam M. Moftah, Aboul Ella Hassanien

Abstract:

In the recent years the importance of automatic traffic control has increased due to the traffic jams problem especially in big cities for signal control and efficient traffic management. Traffic counting as a kind of traffic control is important to know the road traffic density in real time. This paper presents a fast and robust traffic counting system using different image processing techniques. The proposed system is composed of the following four fundamental building phases: image acquisition, pre-processing, object detection, and finally counting the connected objects. The object detection phase is comprised of the following five steps: subtracting the background, converting the image to binary, closing gaps and connecting nearby blobs, image smoothing to remove noises and very small objects, and detecting the connected objects. Experimental results show the great success of the proposed approach.

Keywords: traffic counting, traffic management, image processing, object detection, computer vision

Procedia PDF Downloads 288
26749 Development of an EEG-Based Real-Time Emotion Recognition System on Edge AI

Authors: James Rigor Camacho, Wansu Lim

Abstract:

Over the last few years, the development of new wearable and processing technologies has accelerated in order to harness physiological data such as electroencephalograms (EEGs) for EEG-based applications. EEG has been demonstrated to be a source of emotion recognition signals with the highest classification accuracy among physiological signals. However, when emotion recognition systems are used for real-time classification, the training unit is frequently left to run offline or in the cloud rather than working locally on the edge. That strategy has hampered research, and the full potential of using an edge AI device has yet to be realized. Edge AI devices are computers with high performance that can process complex algorithms. It is capable of collecting, processing, and storing data on its own. It can also analyze and apply complicated algorithms like localization, detection, and recognition on a real-time application, making it a powerful embedded device. The NVIDIA Jetson series, specifically the Jetson Nano device, was used in the implementation. The cEEGrid, which is integrated to the open-source brain computer-interface platform (OpenBCI), is used to collect EEG signals. An EEG-based real-time emotion recognition system on Edge AI is proposed in this paper. To perform graphical spectrogram categorization of EEG signals and to predict emotional states based on input data properties, machine learning-based classifiers were used. Until the emotional state was identified, the EEG signals were analyzed using the K-Nearest Neighbor (KNN) technique, which is a supervised learning system. In EEG signal processing, after each EEG signal has been received in real-time and translated from time to frequency domain, the Fast Fourier Transform (FFT) technique is utilized to observe the frequency bands in each EEG signal. To appropriately show the variance of each EEG frequency band, power density, standard deviation, and mean are calculated and employed. The next stage is to identify the features that have been chosen to predict emotion in EEG data using the K-Nearest Neighbors (KNN) technique. Arousal and valence datasets are used to train the parameters defined by the KNN technique.Because classification and recognition of specific classes, as well as emotion prediction, are conducted both online and locally on the edge, the KNN technique increased the performance of the emotion recognition system on the NVIDIA Jetson Nano. Finally, this implementation aims to bridge the research gap on cost-effective and efficient real-time emotion recognition using a resource constrained hardware device, like the NVIDIA Jetson Nano. On the cutting edge of AI, EEG-based emotion identification can be employed in applications that can rapidly expand the research and implementation industry's use.

Keywords: edge AI device, EEG, emotion recognition system, supervised learning algorithm, sensors

Procedia PDF Downloads 103
26748 [Keynote]: No-Trust-Zone Architecture for Securing Supervisory Control and Data Acquisition

Authors: Michael Okeke, Andrew Blyth

Abstract:

Supervisory Control And Data Acquisition (SCADA) as the state of the art Industrial Control Systems (ICS) are used in many different critical infrastructures, from smart home to energy systems and from locomotives train system to planes. Security of SCADA systems is vital since many lives depend on it for daily activities and deviation from normal operation could be disastrous to the environment as well as lives. This paper describes how No-Trust-Zone (NTZ) architecture could be incorporated into SCADA Systems in order to reduce the chances of malicious intent. The architecture is made up of two distinctive parts which are; the field devices such as; sensors, PLCs pumps, and actuators. The second part of the architecture is designed following lambda architecture, which is made up of a detection algorithm based on Particle Swarm Optimization (PSO) and Hadoop framework for data processing and storage. Apache Spark will be a part of the lambda architecture for real-time analysis of packets for anomalies detection.

Keywords: industrial control system (ics, no-trust-zone (ntz), particle swarm optimisation (pso), supervisory control and data acquisition (scada), swarm intelligence (SI)

Procedia PDF Downloads 339
26747 Predicting Polyethylene Processing Properties Based on Reaction Conditions via a Coupled Kinetic, Stochastic and Rheological Modelling Approach

Authors: Kristina Pflug, Markus Busch

Abstract:

Being able to predict polymer properties and processing behavior based on the applied operating reaction conditions in one of the key challenges in modern polymer reaction engineering. Especially, for cost-intensive processes such as the high-pressure polymerization of low-density polyethylene (LDPE) with high safety-requirements, the need for simulation-based process optimization and product design is high. A multi-scale modelling approach was set-up and validated via a series of high-pressure mini-plant autoclave reactor experiments. The approach starts with the numerical modelling of the complex reaction network of the LDPE polymerization taking into consideration the actual reaction conditions. While this gives average product properties, the complex polymeric microstructure including random short- and long-chain branching is calculated via a hybrid Monte Carlo-approach. Finally, the processing behavior of LDPE -its melt flow behavior- is determined in dependence of the previously determined polymeric microstructure using the branch on branch algorithm for randomly branched polymer systems. All three steps of the multi-scale modelling approach can be independently validated against analytical data. A triple-detector GPC containing an IR, viscosimetry and multi-angle light scattering detector is applied. It serves to determine molecular weight distributions as well as chain-length dependent short- and long-chain branching frequencies. 13C-NMR measurements give average branching frequencies, and rheological measurements in shear and extension serve to characterize the polymeric flow behavior. The accordance of experimental and modelled results was found to be extraordinary, especially taking into consideration that the applied multi-scale modelling approach does not contain parameter fitting of the data. This validates the suggested approach and proves its universality at the same time. In the next step, the modelling approach can be applied to other reactor types, such as tubular reactors or industrial scale. Moreover, sensitivity analysis for systematically varying process conditions is easily feasible. The developed multi-scale modelling approach finally gives the opportunity to predict and design LDPE processing behavior simply based on process conditions such as feed streams and inlet temperatures and pressures.

Keywords: low-density polyethylene, multi-scale modelling, polymer properties, reaction engineering, rheology

Procedia PDF Downloads 122
26746 Crater Detection Using PCA from Captured CMOS Camera Data

Authors: Tatsuya Takino, Izuru Nomura, Yuji Kageyama, Shin Nagata, Hiroyuki Kamata

Abstract:

We propose a method of detecting the craters from the image of the lunar surface. This proposal assumes that it is applied to SLIM (Smart Lander for Investigating Moon) working group aiming at the pinpoint landing on the lunar surface and investigating scientific research. It is difficult to equip and use high-performance computers for the small space probe. So, it is necessary to use a small computer with an exclusive hardware such as FPGA. We have studied the crater detection using principal component analysis (PCA), In this paper, We implement detection algorithm into the FPGA, and the detection is performed on the data that was captured from the CMOS camera.

Keywords: crater detection, PCA, FPGA, image processing

Procedia PDF Downloads 544
26745 Natural Language Processing for the Classification of Social Media Posts in Post-Disaster Management

Authors: Ezgi Şendil

Abstract:

Information extracted from social media has received great attention since it has become an effective alternative for collecting people’s opinions and emotions based on specific experiences in a faster and easier way. The paper aims to put data in a meaningful way to analyze users’ posts and get a result in terms of the experiences and opinions of the users during and after natural disasters. The posts collected from Reddit are classified into nine different categories, including injured/dead people, infrastructure and utility damage, missing/found people, donation needs/offers, caution/advice, and emotional support, identified by using labelled Twitter data and four different machine learning (ML) classifiers.

Keywords: disaster, NLP, postdisaster management, sentiment analysis

Procedia PDF Downloads 72
26744 Development of a Software System for Management and Genetic Analysis of Biological Samples for Forensic Laboratories

Authors: Mariana Lima, Rodrigo Silva, Victor Stange, Teodiano Bastos

Abstract:

Due to the high reliability reached by DNA tests, since the 1980s this kind of test has allowed the identification of a growing number of criminal cases, including old cases that were unsolved, now having a chance to be solved with this technology. Currently, the use of genetic profiling databases is a typical method to increase the scope of genetic comparison. Forensic laboratories must process, analyze, and generate genetic profiles of a growing number of samples, which require time and great storage capacity. Therefore, it is essential to develop methodologies capable to organize and minimize the spent time for both biological sample processing and analysis of genetic profiles, using software tools. Thus, the present work aims the development of a software system solution for laboratories of forensics genetics, which allows sample, criminal case and local database management, minimizing the time spent in the workflow and helps to compare genetic profiles. For the development of this software system, all data related to the storage and processing of samples, workflows and requirements that incorporate the system have been considered. The system uses the following software languages: HTML, CSS, and JavaScript in Web technology, with NodeJS platform as server, which has great efficiency in the input and output of data. In addition, the data are stored in a relational database (MySQL), which is free, allowing a better acceptance for users. The software system here developed allows more agility to the workflow and analysis of samples, contributing to the rapid insertion of the genetic profiles in the national database and to increase resolution of crimes. The next step of this research is its validation, in order to operate in accordance with current Brazilian national legislation.

Keywords: database, forensic genetics, genetic analysis, sample management, software solution

Procedia PDF Downloads 367
26743 A Novel Machine Learning Approach to Aid Agrammatism in Non-fluent Aphasia

Authors: Rohan Bhasin

Abstract:

Agrammatism in non-fluent Aphasia Cases can be defined as a language disorder wherein a patient can only use content words ( nouns, verbs and adjectives ) for communication and their speech is devoid of functional word types like conjunctions and articles, generating speech of with extremely rudimentary grammar . Past approaches involve Speech Therapy of some order with conversation analysis used to analyse pre-therapy speech patterns and qualitative changes in conversational behaviour after therapy. We describe this approach as a novel method to generate functional words (prepositions, articles, ) around content words ( nouns, verbs and adjectives ) using a combination of Natural Language Processing and Deep Learning algorithms. The applications of this approach can be used to assist communication. The approach the paper investigates is : LSTMs or Seq2Seq: A sequence2sequence approach (seq2seq) or LSTM would take in a sequence of inputs and output sequence. This approach needs a significant amount of training data, with each training data containing pairs such as (content words, complete sentence). We generate such data by starting with complete sentences from a text source, removing functional words to get just the content words. However, this approach would require a lot of training data to get a coherent input. The assumptions of this approach is that the content words received in the inputs of both text models are to be preserved, i.e, won't alter after the functional grammar is slotted in. This is a potential limit to cases of severe Agrammatism where such order might not be inherently correct. The applications of this approach can be used to assist communication mild Agrammatism in non-fluent Aphasia Cases. Thus by generating these function words around the content words, we can provide meaningful sentence options to the patient for articulate conversations. Thus our project translates the use case of generating sentences from content-specific words into an assistive technology for non-Fluent Aphasia Patients.

Keywords: aphasia, expressive aphasia, assistive algorithms, neurology, machine learning, natural language processing, language disorder, behaviour disorder, sequence to sequence, LSTM

Procedia PDF Downloads 160
26742 Application of Signature Verification Models for Document Recognition

Authors: Boris M. Fedorov, Liudmila P. Goncharenko, Sergey A. Sybachin, Natalia A. Mamedova, Ekaterina V. Makarenkova, Saule Rakhimova

Abstract:

In modern economic conditions, the question of the possibility of correct recognition of a signature on digital documents in order to verify the expression of will or confirm a certain operation is relevant. The additional complexity of processing lies in the dynamic variability of the signature for each individual, as well as in the way information is processed because the signature refers to biometric data. The article discusses the issues of using artificial intelligence models in order to improve the quality of signature confirmation in document recognition. The analysis of several possible options for using the model is carried out. The results of the study are given, in which it is possible to correctly determine the authenticity of the signature on small samples.

Keywords: signature recognition, biometric data, artificial intelligence, neural networks

Procedia PDF Downloads 145
26741 Application Methodology for the Generation of 3D Thermal Models Using UAV Photogrammety and Dual Sensors for Mining/Industrial Facilities Inspection

Authors: Javier Sedano-Cibrián, Julio Manuel de Luis-Ruiz, Rubén Pérez-Álvarez, Raúl Pereda-García, Beatriz Malagón-Picón

Abstract:

Structural inspection activities are necessary to ensure the correct functioning of infrastructures. Unmanned Aerial Vehicle (UAV) techniques have become more popular than traditional techniques. Specifically, UAV Photogrammetry allows time and cost savings. The development of this technology has permitted the use of low-cost thermal sensors in UAVs. The representation of 3D thermal models with this type of equipment is in continuous evolution. The direct processing of thermal images usually leads to errors and inaccurate results. A methodology is proposed for the generation of 3D thermal models using dual sensors, which involves the application of visible Red-Blue-Green (RGB) and thermal images in parallel. Hence, the RGB images are used as the basis for the generation of the model geometry, and the thermal images are the source of the surface temperature information that is projected onto the model. Mining/industrial facilities representations that are obtained can be used for inspection activities.

Keywords: aerial thermography, data processing, drone, low-cost, point cloud

Procedia PDF Downloads 139
26740 Predictive Maintenance: Machine Condition Real-Time Monitoring and Failure Prediction

Authors: Yan Zhang

Abstract:

Predictive maintenance is a technique to predict when an in-service machine will fail so that maintenance can be planned in advance. Analytics-driven predictive maintenance is gaining increasing attention in many industries such as manufacturing, utilities, aerospace, etc., along with the emerging demand of Internet of Things (IoT) applications and the maturity of technologies that support Big Data storage and processing. This study aims to build an end-to-end analytics solution that includes both real-time machine condition monitoring and machine learning based predictive analytics capabilities. The goal is to showcase a general predictive maintenance solution architecture, which suggests how the data generated from field machines can be collected, transmitted, stored, and analyzed. We use a publicly available aircraft engine run-to-failure dataset to illustrate the streaming analytics component and the batch failure prediction component. We outline the contributions of this study from four aspects. First, we compare the predictive maintenance problems from the view of the traditional reliability centered maintenance field, and from the view of the IoT applications. When evolving to the IoT era, predictive maintenance has shifted its focus from ensuring reliable machine operations to improve production/maintenance efficiency via any maintenance related tasks. It covers a variety of topics, including but not limited to: failure prediction, fault forecasting, failure detection and diagnosis, and recommendation of maintenance actions after failure. Second, we review the state-of-art technologies that enable a machine/device to transmit data all the way through the Cloud for storage and advanced analytics. These technologies vary drastically mainly based on the power source and functionality of the devices. For example, a consumer machine such as an elevator uses completely different data transmission protocols comparing to the sensor units in an environmental sensor network. The former may transfer data into the Cloud via WiFi directly. The latter usually uses radio communication inherent the network, and the data is stored in a staging data node before it can be transmitted into the Cloud when necessary. Third, we illustrate show to formulate a machine learning problem to predict machine fault/failures. By showing a step-by-step process of data labeling, feature engineering, model construction and evaluation, we share following experiences: (1) what are the specific data quality issues that have crucial impact on predictive maintenance use cases; (2) how to train and evaluate a model when training data contains inter-dependent records. Four, we review the tools available to build such a data pipeline that digests the data and produce insights. We show the tools we use including data injection, streaming data processing, machine learning model training, and the tool that coordinates/schedules different jobs. In addition, we show the visualization tool that creates rich data visualizations for both real-time insights and prediction results. To conclude, there are two key takeaways from this study. (1) It summarizes the landscape and challenges of predictive maintenance applications. (2) It takes an example in aerospace with publicly available data to illustrate each component in the proposed data pipeline and showcases how the solution can be deployed as a live demo.

Keywords: Internet of Things, machine learning, predictive maintenance, streaming data

Procedia PDF Downloads 382
26739 Signal Processing Techniques for Adaptive Beamforming with Robustness

Authors: Ju-Hong Lee, Ching-Wei Liao

Abstract:

Adaptive beamforming using antenna array of sensors is useful in the process of adaptively detecting and preserving the presence of the desired signal while suppressing the interference and the background noise. For conventional adaptive array beamforming, we require a prior information of either the impinging direction or the waveform of the desired signal to adapt the weights. The adaptive weights of an antenna array beamformer under a steered-beam constraint are calculated by minimizing the output power of the beamformer subject to the constraint that forces the beamformer to make a constant response in the steering direction. Hence, the performance of the beamformer is very sensitive to the accuracy of the steering operation. In the literature, it is well known that the performance of an adaptive beamformer will be deteriorated by any steering angle error encountered in many practical applications, e.g., the wireless communication systems with massive antennas deployed at the base station and user equipment. Hence, developing effective signal processing techniques to deal with the problem due to steering angle error for array beamforming systems has become an important research work. In this paper, we present an effective signal processing technique for constructing an adaptive beamformer against the steering angle error. The proposed array beamformer adaptively estimates the actual direction of the desired signal by using the presumed steering vector and the received array data snapshots. Based on the presumed steering vector and a preset angle range for steering mismatch tolerance, we first create a matrix related to the direction vector of signal sources. Two projection matrices are generated from the matrix. The projection matrix associated with the desired signal information and the received array data are utilized to iteratively estimate the actual direction vector of the desired signal. The estimated direction vector of the desired signal is then used for appropriately finding the quiescent weight vector. The other projection matrix is set to be the signal blocking matrix required for performing adaptive beamforming. Accordingly, the proposed beamformer consists of adaptive quiescent weights and partially adaptive weights. Several computer simulation examples are provided for evaluating and comparing the proposed technique with the existing robust techniques.

Keywords: adaptive beamforming, robustness, signal blocking, steering angle error

Procedia PDF Downloads 120
26738 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, WangQun Lin

Abstract:

This paper introduces the concept and principle of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of typical cleaning process, puts forward a well scalability and versatility data cleaning framework, in view of data with attribute dependency relation, designs several of violation data discovery algorithms by formal formula, which can obtain inconsistent data to all target columns with condition attribute dependent no matter data is structured (SQL) or unstructured (NoSQL), and gives 6 data cleaning methods based on these algorithms.

Keywords: data cleaning, dependency rules, violation data discovery, data repair

Procedia PDF Downloads 560
26737 Post Pandemic Mobility Analysis through Indexing and Sharding in MongoDB: Performance Optimization and Insights

Authors: Karan Vishavjit, Aakash Lakra, Shafaq Khan

Abstract:

The COVID-19 pandemic has pushed healthcare professionals to use big data analytics as a vital tool for tracking and evaluating the effects of contagious viruses. To effectively analyze huge datasets, efficient NoSQL databases are needed. The analysis of post-COVID-19 health and well-being outcomes and the evaluation of the effectiveness of government efforts during the pandemic is made possible by this research’s integration of several datasets, which cuts down on query processing time and creates predictive visual artifacts. We recommend applying sharding and indexing technologies to improve query effectiveness and scalability as the dataset expands. Effective data retrieval and analysis are made possible by spreading the datasets into a sharded database and doing indexing on individual shards. Analysis of connections between governmental activities, poverty levels, and post-pandemic well being is the key goal. We want to evaluate the effectiveness of governmental initiatives to improve health and lower poverty levels. We will do this by utilising advanced data analysis and visualisations. The findings provide relevant data that supports the advancement of UN sustainable objectives, future pandemic preparation, and evidence-based decision-making. This study shows how Big Data and NoSQL databases may be used to address problems with global health.

Keywords: big data, COVID-19, health, indexing, NoSQL, sharding, scalability, well being

Procedia PDF Downloads 66
26736 Theorizing Optimal Use of Numbers and Anecdotes: The Science of Storytelling in Newsrooms

Authors: Hai L. Tran

Abstract:

When covering events and issues, the news media often employ both personal accounts as well as facts and figures. However, the process of using numbers and narratives in the newsroom is mostly operated through trial and error. There is a demonstrated need for the news industry to better understand the specific effects of storytelling and data-driven reporting on the audience as well as explanatory factors driving such effects. In the academic world, anecdotal evidence and statistical evidence have been studied in a mutually exclusive manner. Existing research tends to treat pertinent effects as though the use of one form precludes the other and as if a tradeoff is required. Meanwhile, narratives and statistical facts are often combined in various communication contexts, especially in news presentations. There is value in reconceptualizing and theorizing about both relative and collective impacts of numbers and narratives as well as the mechanism underlying such effects. The current undertaking seeks to link theory to practice by providing a complete picture of how and why people are influenced by information conveyed through quantitative and qualitative accounts. Specifically, the cognitive-experiential theory is invoked to argue that humans employ two distinct systems to process information. The rational system requires the processing of logical evidence effortful analytical cognitions, which are affect-free. Meanwhile, the experiential system is intuitive, rapid, automatic, and holistic, thereby demanding minimum cognitive resources and relating to the experience of affect. In certain situations, one system might dominate the other, but rational and experiential modes of processing operations in parallel and at the same time. As such, anecdotes and quantified facts impact audience response differently and a combination of data and narratives is more effective than either form of evidence. In addition, the present study identifies several media variables and human factors driving the effects of statistics and anecdotes. An integrative model is proposed to explain how message characteristics (modality, vividness, salience, congruency, position) and individual differences (involvement, numeracy skills, cognitive resources, cultural orientation) impact selective exposure, which in turn activates pertinent modes of processing, and thereby induces corresponding responses. The present study represents a step toward bridging theoretical frameworks from various disciplines to better understand the specific effects and the conditions under which the use of anecdotal evidence and/or statistical evidence enhances or undermines information processing. In addition to theoretical contributions, this research helps inform news professionals about the benefits and pitfalls of incorporating quantitative and qualitative accounts in reporting. It proposes a typology of possible scenarios and appropriate strategies for journalists to use when presenting news with anecdotes and numbers.

Keywords: data, narrative, number, anecdote, storytelling, news

Procedia PDF Downloads 76
26735 Native Language Identification with Cross-Corpus Evaluation Using Social Media Data: ’Reddit’

Authors: Yasmeen Bassas, Sandra Kuebler, Allen Riddell

Abstract:

Native language identification is one of the growing subfields in natural language processing (NLP). The task of native language identification (NLI) is mainly concerned with predicting the native language of an author’s writing in a second language. In this paper, we investigate the performance of two types of features; content-based features vs. content independent features, when they are evaluated on a different corpus (using social media data “Reddit”). In this NLI task, the predefined models are trained on one corpus (TOEFL), and then the trained models are evaluated on different data using an external corpus (Reddit). Three classifiers are used in this task; the baseline, linear SVM, and logistic regression. Results show that content-based features are more accurate and robust than content independent ones when tested within the corpus and across corpus.

Keywords: NLI, NLP, content-based features, content independent features, social media corpus, ML

Procedia PDF Downloads 131
26734 Yoghurt Kepel Stelechocarpus burahol as an Effort of Functional Food Diversification from Region of Yogyakarta

Authors: Dian Nur Amalia, Rifqi Dhiemas Aji, Tri Septa Wahyuningsih, Endang Wahyuni

Abstract:

Kepel fruit (Stelechocarpus burahol) is a scarce fruit that belongs as a logogram of Daerah Istimewa Yogyakarta. Kepel fruit can be used as substance of beauty treatment product, such as deodorant and good for skin health, and also contains antioxidant compound. Otherwise, this fruit is scarcely cultivated by people because of its image as a palace fruit and also the flesh percentage just a little, so it has low economic value. The flesh of kepel fruit is about 49% of its whole fruit. This little part as supporting point why kepel fruit has to be extracted and processed with the other product. Yoghurt is milk processing product that also have a role as functional food. Economically, the price of yoghurt is higher than whole milk or other milk processing product. Yoghurt is usually added with flavor of dye from plant or from chemical substance. Kepel fruit has a role as flavor in yoghurt, besides as product that good for digestion, yoghurt with kepel also has function as “beauty” food. Writing method that used is literature study by looking for the potential of kepel fruit as a local fruit of Yogyakarta and yoghurt as milk processing product. The process just like making common yoghurt because kepel fruit just have a role as flavor substance, so it does not affect to the other processing of yoghurt. Food diversification can be done as an effort to increase the value of local resources that proper to compete in Asean Economic Community (AEC), one of the way is producing kepel yoghurt.

Keywords: kepel, yoghurt, Daerah Istimewa Yogyakarta, functional food

Procedia PDF Downloads 314
26733 Advances in Mathematical Sciences: Unveiling the Power of Data Analytics

Authors: Zahid Ullah, Atlas Khan

Abstract:

The rapid advancements in data collection, storage, and processing capabilities have led to an explosion of data in various domains. In this era of big data, mathematical sciences play a crucial role in uncovering valuable insights and driving informed decision-making through data analytics. The purpose of this abstract is to present the latest advances in mathematical sciences and their application in harnessing the power of data analytics. This abstract highlights the interdisciplinary nature of data analytics, showcasing how mathematics intersects with statistics, computer science, and other related fields to develop cutting-edge methodologies. It explores key mathematical techniques such as optimization, mathematical modeling, network analysis, and computational algorithms that underpin effective data analysis and interpretation. The abstract emphasizes the role of mathematical sciences in addressing real-world challenges across different sectors, including finance, healthcare, engineering, social sciences, and beyond. It showcases how mathematical models and statistical methods extract meaningful insights from complex datasets, facilitating evidence-based decision-making and driving innovation. Furthermore, the abstract emphasizes the importance of collaboration and knowledge exchange among researchers, practitioners, and industry professionals. It recognizes the value of interdisciplinary collaborations and the need to bridge the gap between academia and industry to ensure the practical application of mathematical advancements in data analytics. The abstract highlights the significance of ongoing research in mathematical sciences and its impact on data analytics. It emphasizes the need for continued exploration and innovation in mathematical methodologies to tackle emerging challenges in the era of big data and digital transformation. In summary, this abstract sheds light on the advances in mathematical sciences and their pivotal role in unveiling the power of data analytics. It calls for interdisciplinary collaboration, knowledge exchange, and ongoing research to further unlock the potential of mathematical methodologies in addressing complex problems and driving data-driven decision-making in various domains.

Keywords: mathematical sciences, data analytics, advances, unveiling

Procedia PDF Downloads 86
26732 Effects of Safety Intervention Program towards Behaviors among Rubber Wood Processing Workers Using Theory of Planned Behavior

Authors: Junjira Mahaboon, Anongnard Boonpak, Nattakarn Worrasan, Busma Kama, Mujalin Saikliang, Siripor Dankachatarn

Abstract:

Rubber wood processing is one of the most important industries in southern Thailand. The process has several safety hazards for example unsafe wood cutting machine guarding, wood dust, noise, and heavy lifting. However, workers’ occupational health and safety measures to promote their behaviors are still limited. This quasi-experimental research was to determine factors affecting workers’ safety behaviors using theory of planned behavior after implementing job safety intervention program. The purposes were to (1) determine factors affecting workers’ behaviors and (2) to evaluate effectiveness of the intervention program. The sample of study was 66 workers from a rubber wood processing factory. Factors in the Theory of Planned Behavior model (TPB) were measured before and after the intervention. The factors of TPB included attitude towards behavior, subjective norm, perceived behavioral control, intention, and behavior. Firstly, Job Safety Analysis (JSA) was conducted and Safety Standard Operation Procedures (SSOP) were established. The questionnaire was also used to collect workers’ characteristics and TPB factors. Then, job safety intervention program to promote workers’ behavior according to SSOP were implemented for a four month period. The program included SSOP training, personal protective equipment use, and safety promotional campaign. After that, the TPB factors were again collected. Paired sample t-test and independent t-test were used to analyze the data. The result revealed that attitude towards behavior and intention increased significantly after the intervention at p<0.05. These factors also significantly determined the workers’ safety behavior according to SSOP at p<0.05. However, subjective norm, and perceived behavioral control were not significantly changed nor related to safety behaviors. In conclusion, attitude towards behavior and workers’ intention should be promoted to encourage workers’ safety behaviors. SSOP intervention program e.g. short meeting, safety training, and promotional campaign should be continuously implemented in a routine basis to improve workers’ behavior.

Keywords: job safety analysis, rubber wood processing workers, safety standard operation procedure, theory of planned behavior

Procedia PDF Downloads 189
26731 Knowledge Graph Development to Connect Earth Metadata and Standard English Queries

Authors: Gabriel Montague, Max Vilgalys, Catherine H. Crawford, Jorge Ortiz, Dava Newman

Abstract:

There has never been so much publicly accessible atmospheric and environmental data. The possibilities of these data are exciting, but the sheer volume of available datasets represents a new challenge for researchers. The task of identifying and working with a new dataset has become more difficult with the amount and variety of available data. Datasets are often documented in ways that differ substantially from the common English used to describe the same topics. This presents a barrier not only for new scientists, but for researchers looking to find comparisons across multiple datasets or specialists from other disciplines hoping to collaborate. This paper proposes a method for addressing this obstacle: creating a knowledge graph to bridge the gap between everyday English language and the technical language surrounding these datasets. Knowledge graph generation is already a well-established field, although there are some unique challenges posed by working with Earth data. One is the sheer size of the databases – it would be infeasible to replicate or analyze all the data stored by an organization like The National Aeronautics and Space Administration (NASA) or the European Space Agency. Instead, this approach identifies topics from metadata available for datasets in NASA’s Earthdata database, which can then be used to directly request and access the raw data from NASA. By starting with a single metadata standard, this paper establishes an approach that can be generalized to different databases, but leaves the challenge of metadata harmonization for future work. Topics generated from the metadata are then linked to topics from a collection of English queries through a variety of standard and custom natural language processing (NLP) methods. The results from this method are then compared to a baseline of elastic search applied to the metadata. This comparison shows the benefits of the proposed knowledge graph system over existing methods, particularly in interpreting natural language queries and interpreting topics in metadata. For the research community, this work introduces an application of NLP to the ecological and environmental sciences, expanding the possibilities of how machine learning can be applied in this discipline. But perhaps more importantly, it establishes the foundation for a platform that can enable common English to access knowledge that previously required considerable effort and experience. By making this public data accessible to the full public, this work has the potential to transform environmental understanding, engagement, and action.

Keywords: earth metadata, knowledge graphs, natural language processing, question-answer systems

Procedia PDF Downloads 144
26730 Towards a Distributed Computation Platform Tailored for Educational Process Discovery and Analysis

Authors: Awatef Hicheur Cairns, Billel Gueni, Hind Hafdi, Christian Joubert, Nasser Khelifa

Abstract:

Given the ever changing needs of the job markets, education and training centers are increasingly held accountable for student success. Therefore, education and training centers have to focus on ways to streamline their offers and educational processes in order to achieve the highest level of quality in curriculum contents and managerial decisions. Educational process mining is an emerging field in the educational data mining (EDM) discipline, concerned with developing methods to discover, analyze and provide a visual representation of complete educational processes. In this paper, we present our distributed computation platform which allows different education centers and institutions to load their data and access to advanced data mining and process mining services. To achieve this, we present also a comparative study of the different clustering techniques developed in the context of process mining to partition efficiently educational traces. Our goal is to find the best strategy for distributing heavy analysis computations on many processing nodes of our platform.

Keywords: educational process mining, distributed process mining, clustering, distributed platform, educational data mining, ProM

Procedia PDF Downloads 450
26729 Geological Structure Identification in Semilir Formation: An Correlated Geological and Geophysical (Very Low Frequency) Data for Zonation Disaster with Current Density Parameters and Geological Surface Information

Authors: E. M. Rifqi Wilda Pradana, Bagus Bayu Prabowo, Meida Riski Pujiyati, Efraim Maykhel Hagana Ginting, Virgiawan Arya Hangga Reksa

Abstract:

The VLF (Very Low Frequency) method is an electromagnetic method that uses low frequencies between 10-30 KHz which results in a fairly deep penetration. In this study, the VLF method was used for zonation of disaster-prone areas by identifying geological structures in the form of faults. Data acquisition was carried out in Trimulyo Region, Jetis District, Bantul Regency, Special Region of Yogyakarta, Indonesia with 8 measurement paths. This study uses wave transmitters from Japan and Australia to obtain Tilt and Elipt values that can be used to create RAE (Rapat Arus Ekuivalen or Current Density) sections that can be used to identify areas that are easily crossed by electric current. This section will indicate the existence of a geological structure in the form of faults in the study area which is characterized by a high RAE value. In data processing of VLF method, it is obtained Tilt vs Elliptical graph and Moving Average (MA) Tilt vs Moving Average (MA) Elipt graph of each path that shows a fluctuating pattern and does not show any intersection at all. Data processing uses Matlab software and obtained areas with low RAE values that are 0%-6% which shows medium with low conductivity and high resistivity and can be interpreted as sandstone, claystone, and tuff lithology which is part of the Semilir Formation. Whereas a high RAE value of 10% -16% which shows a medium with high conductivity and low resistivity can be interpreted as a fault zone filled with fluid. The existence of the fault zone is strengthened by the discovery of a normal fault on the surface with strike N550W and dip 630E at coordinates X= 433256 and Y= 9127722 so that the activities of residents in the zone such as housing, mining activities and other activities can be avoided to reduce the risk of natural disasters.

Keywords: current density, faults, very low frequency, zonation

Procedia PDF Downloads 169
26728 A Review on Cloud Computing and Internet of Things

Authors: Sahar S. Tabrizi, Dogan Ibrahim

Abstract:

Cloud Computing is a convenient model for on-demand networks that uses shared pools of virtual configurable computing resources, such as servers, networks, storage devices, applications, etc. The cloud serves as an environment for companies and organizations to use infrastructure resources without making any purchases and they can access such resources wherever and whenever they need. Cloud computing is useful to overcome a number of problems in various Information Technology (IT) domains such as Geographical Information Systems (GIS), Scientific Research, e-Governance Systems, Decision Support Systems, ERP, Web Application Development, Mobile Technology, etc. Companies can use Cloud Computing services to store large amounts of data that can be accessed from anywhere on Earth and also at any time. Such services are rented by the client companies where the actual rent depends upon the amount of data stored on the cloud and also the amount of processing power used in a given time period. The resources offered by the cloud service companies are flexible in the sense that the user companies can increase or decrease their storage requirements or the processing power requirements at any time, thus minimizing the overall rental cost of the service they receive. In addition, the Cloud Computing service providers offer fast processors and applications software that can be shared by their clients. This is especially important for small companies with limited budgets which cannot afford to purchase their own expensive hardware and software. This paper is an overview of the Cloud Computing, giving its types, principles, advantages, and disadvantages. In addition, the paper gives some example engineering applications of Cloud Computing and makes suggestions for possible future applications in the field of engineering.

Keywords: cloud computing, cloud systems, cloud services, IaaS, PaaS, SaaS

Procedia PDF Downloads 231