Search results for: data filtering
7002 An Efficient 3D Animation Data Reduction Using Frame Removal
Authors: Jinsuk Yang, Choongjae Joo, Kyoungsu Oh
Abstract:
Existing methods in which the animation data of all frames are stored and reproduced as with vertex animation cannot be used in mobile device environments because these methods use large amounts of the memory. So 3D animation data reduction methods aimed at solving this problem have been extensively studied thus far and we propose a new method as follows. First, we find and remove frames in which motion changes are small out of all animation frames and store only the animation data of remaining frames (involving large motion changes). When playing the animation, the removed frame areas are reconstructed using the interpolation of the remaining frames. Our key contribution is to calculate the accelerations of the joints of individual frames and the standard deviations of the accelerations using the information of joint locations in the relevant 3D model in order to find and delete frames in which motion changes are small. Our methods can reduce data sizes by approximately 50% or more while providing quality which is not much lower compared to original animations. Therefore, our method is expected to be usefully used in mobile device environments or other environments in which memory sizes are limited.
Keywords: Data Reduction, Interpolation, Vertex Animation, 3D Animation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16617001 Diagnosis of the Heart Rhythm Disorders by Using Hybrid Classifiers
Authors: Sule Yucelbas, Gulay Tezel, Cuneyt Yucelbas, Seral Ozsen
Abstract:
In this study, it was tried to identify some heart rhythm disorders by electrocardiography (ECG) data that is taken from MIT-BIH arrhythmia database by subtracting the required features, presenting to artificial neural networks (ANN), artificial immune systems (AIS), artificial neural network based on artificial immune system (AIS-ANN) and particle swarm optimization based artificial neural network (PSO-NN) classifier systems. The main purpose of this study is to evaluate the performance of hybrid AIS-ANN and PSO-ANN classifiers with regard to the ANN and AIS. For this purpose, the normal sinus rhythm (NSR), atrial premature contraction (APC), sinus arrhythmia (SA), ventricular trigeminy (VTI), ventricular tachycardia (VTK) and atrial fibrillation (AF) data for each of the RR intervals were found. Then these data in the form of pairs (NSR-APC, NSR-SA, NSR-VTI, NSR-VTK and NSR-AF) is created by combining discrete wavelet transform which is applied to each of these two groups of data and two different data sets with 9 and 27 features were obtained from each of them after data reduction. Afterwards, the data randomly was firstly mixed within themselves, and then 4-fold cross validation method was applied to create the training and testing data. The training and testing accuracy rates and training time are compared with each other.
As a result, performances of the hybrid classification systems, AIS-ANN and PSO-ANN were seen to be close to the performance of the ANN system. Also, the results of the hybrid systems were much better than AIS, too. However, ANN had much shorter period of training time than other systems. In terms of training times, ANN was followed by PSO-ANN, AIS-ANN and AIS systems respectively. Also, the features that extracted from the data affected the classification results significantly.
Keywords: AIS, ANN, ECG, hybrid classifiers, PSO.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19167000 Validation of Visibility Data from Road Weather Information Systems by Comparing Three Data Resources: Case Study in Ohio
Authors: Fan Ye
Abstract:
Adverse weather conditions, particularly those with low visibility, are critical to the driving tasks. However, the direct relationship between visibility distances and traffic flow/roadway safety is uncertain due to the limitation of visibility data availability. The recent growth of deployment of Road Weather Information Systems (RWIS) makes segment-specific visibility information available which can be integrated with other Intelligent Transportation System, such as automated warning system and variable speed limit, to improve mobility and safety. Before applying the RWIS visibility measurements in traffic study and operations, it is critical to validate the data. Therefore, an attempt was made in the paper to examine the validity and viability of RWIS visibility data by comparing visibility measurements among RWIS, airport weather stations, and weather information recorded by police in crash reports, based on Ohio data. The results indicated that RWIS visibility measurements were significantly different from airport visibility data in Ohio, but no conclusion regarding the reliability of RWIS visibility could be drawn in the consideration of no verified ground truth in the comparisons. It was suggested that more objective methods are needed to validate the RWIS visibility measurements, such as continuous in-field measurements associated with various weather events using calibrated visibility sensors.
Keywords: Low visibility, RWIS, traffic safety, visibility.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13346999 A Comparative Analysis of Different Web Content Mining Tools
Authors: T. Suresh Kumar, M. Arthanari, N. Shanthi
Abstract:
Nowadays, the Web has become one of the most pervasive platforms for information change and retrieval. It collects the suitable and perfectly fitting information from websites that one requires. Data mining is the form of extracting data’s available in the internet. Web mining is one of the elements of data mining Technique, which relates to various research communities such as information recovery, folder managing system and simulated intellects. In this Paper we have discussed the concepts of Web mining. We contain generally focused on one of the categories of Web mining, specifically the Web Content Mining and its various farm duties. The mining tools are imperative to scanning the many images, text, and HTML documents and then, the result is used by the various search engines. We conclude by presenting a comparative table of these tools based on some pertinent criteria.
Keywords: Data Mining, Web Mining, Web Content Mining, Mining Tools, Information retrieval.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 35526998 Implementation of Neural Network Based Electricity Load Forecasting
Authors: Myint Myint Yi, Khin Sandar Linn, Marlar Kyaw
Abstract:
This paper proposed a novel model for short term load forecast (STLF) in the electricity market. The prior electricity demand data are treated as time series. The model is composed of several neural networks whose data are processed using a wavelet technique. The model is created in the form of a simulation program written with MATLAB. The load data are treated as time series data. They are decomposed into several wavelet coefficient series using the wavelet transform technique known as Non-decimated Wavelet Transform (NWT). The reason for using this technique is the belief in the possibility of extracting hidden patterns from the time series data. The wavelet coefficient series are used to train the neural networks (NNs) and used as the inputs to the NNs for electricity load prediction. The Scale Conjugate Gradient (SCG) algorithm is used as the learning algorithm for the NNs. To get the final forecast data, the outputs from the NNs are recombined using the same wavelet technique. The model was evaluated with the electricity load data of Electronic Engineering Department in Mandalay Technological University in Myanmar. The simulation results showed that the model was capable of producing a reasonable forecasting accuracy in STLF.Keywords: Neural network, Load forecast, Time series, wavelettransform.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 24936997 Integration of Big Data to Predict Transportation for Smart Cities
Authors: Sun-Young Jang, Sung-Ah Kim, Dongyoun Shin
Abstract:
The Intelligent transportation system is essential to build smarter cities. Machine learning based transportation prediction could be highly promising approach by delivering invisible aspect visible. In this context, this research aims to make a prototype model that predicts transportation network by using big data and machine learning technology. In detail, among urban transportation systems this research chooses bus system. The research problem that existing headway model cannot response dynamic transportation conditions. Thus, bus delay problem is often occurred. To overcome this problem, a prediction model is presented to fine patterns of bus delay by using a machine learning implementing the following data sets; traffics, weathers, and bus statues. This research presents a flexible headway model to predict bus delay and analyze the result. The prototyping model is composed by real-time data of buses. The data are gathered through public data portals and real time Application Program Interface (API) by the government. These data are fundamental resources to organize interval pattern models of bus operations as traffic environment factors (road speeds, station conditions, weathers, and bus information of operating in real-time). The prototyping model is designed by the machine learning tool (RapidMiner Studio) and conducted tests for bus delays prediction. This research presents experiments to increase prediction accuracy for bus headway by analyzing the urban big data. The big data analysis is important to predict the future and to find correlations by processing huge amount of data. Therefore, based on the analysis method, this research represents an effective use of the machine learning and urban big data to understand urban dynamics.
Keywords: Big data, bus headway prediction, machine learning, public transportation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15626996 Clustering Approach to Unveiling Relationships between Gene Regulatory Networks
Authors: Hiba Hasan, Khalid Raza
Abstract:
Reverse engineering of genetic regulatory network involves the modeling of the given gene expression data into a form of the network. Computationally it is possible to have the relationships between genes, so called gene regulatory networks (GRNs), that can help to find the genomics and proteomics based diagnostic approach for any disease. In this paper, clustering based method has been used to reconstruct genetic regulatory network from time series gene expression data. Supercoiled data set from Escherichia coli has been taken to demonstrate the proposed method.
Keywords: Gene expression, gene regulatory networks (GRNs), clustering, data preprocessing, network visualization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21526995 Estimating Bridge Deterioration for Small Data Sets Using Regression and Markov Models
Authors: Yina F. Muñoz, Alexander Paz, Hanns De La Fuente-Mella, Joaquin V. Fariña, Guilherme M. Sales
Abstract:
The primary approach for estimating bridge deterioration uses Markov-chain models and regression analysis. Traditional Markov models have problems in estimating the required transition probabilities when a small sample size is used. Often, reliable bridge data have not been taken over large periods, thus large data sets may not be available. This study presents an important change to the traditional approach by using the Small Data Method to estimate transition probabilities. The results illustrate that the Small Data Method and traditional approach both provide similar estimates; however, the former method provides results that are more conservative. That is, Small Data Method provided slightly lower than expected bridge condition ratings compared with the traditional approach. Considering that bridges are critical infrastructures, the Small Data Method, which uses more information and provides more conservative estimates, may be more appropriate when the available sample size is small. In addition, regression analysis was used to calculate bridge deterioration. Condition ratings were determined for bridge groups, and the best regression model was selected for each group. The results obtained were very similar to those obtained when using Markov chains; however, it is desirable to use more data for better results.
Keywords: Concrete bridges, deterioration, Markov chains, probability matrix.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14406994 A Conceptual Query-Driven Design Framework for Data Warehouse
Authors: Resmi Nair, Campbell Wilson, Bala Srinivasan
Abstract:
Data warehouse is a dedicated database used for querying and reporting. Queries in this environment show special characteristics such as multidimensionality and aggregation. Exploiting the nature of queries, in this paper we propose a query driven design framework. The proposed framework is general and allows a designer to generate a schema based on a set of queries.Keywords: Conceptual schema, data warehouse, queries, requirements.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20066993 A Prototype of Augmented Reality for Visualising Large Sensors’ Datasets
Authors: Folorunso Olufemi Ayinde, Mohd Shahrizal Sunar, Sarudin Kari, Dzulkifli Mohamad
Abstract:
In this paper we discuss the development of an Augmented Reality (AR) - based scientific visualization system prototype that supports identification, localisation, and 3D visualisation of oil leakages sensors datasets. Sensors generates significant amount of multivariate datasets during normal and leak situations. Therefore we have developed a data model to effectively manage such data and enhance the computational support needed for the effective data explorations. A challenge of this approach is to reduce the data inefficiency powered by the disparate, repeated, inconsistent and missing attributes of most available sensors datasets. To handle this challenge, this paper aim to develop an AR-based scientific visualization interface which automatically identifies, localise and visualizes all necessary data relevant to a particularly selected region of interest (ROI) along the virtual pipeline network. Necessary system architectural supports needed as well as the interface requirements for such visualizations are also discussed in this paper.
Keywords: Sensor Leakages Datasets, Augmented Reality, Sensor Data-Model, Scientific Visualization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16806992 Personalized Applications for Advanced Healthcare through AI-ML and Blockchain
Authors: Anuja Vyas, Aikel Indurkhya, Hari Krishna Garg
Abstract:
Nearly 25 years have passed since the landmark publication of the Human Genome Project, yet scientists have only begun to scratch the surface of its potential benefits. To bridge this gap, a personalized genomic application has been envisioned as a transformative tool accessible to people worldwide. This innovative solution proposes an integrated framework combining blockchain technology, genome-specific applications, and data compression techniques, ensuring operations to be swift, secure, transparent, and space-efficient. The software harnesses advanced Artificial Intelligence and Machine Learning methodologies, such as neural networks, evaluation matrices, fuzzy logic, and expert systems, to analyze individual genomic data. It generates personalized reports by comparing a user's genome with a reference genome, highlighting significant differences. Blockchain technology, with its inherent security, encryption, and immutability features, is leveraged for robust data transport and storage. In addition, a 'Data Abbreviation' technique ensures that genetic data and reports occupy minimal space. This integrated approach promises to be a significant leap forward, potentially transforming human health and well-being on a global scale.
Keywords: Artificial intelligence in genomics, blockchain technology, data abbreviation, data compression, data security in genomics, data storage, expert systems, fuzzy logic, genome applications, genomic data analysis, human genome project, neural networks, personalized genomics.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 406991 Secure Cryptographic Operations on SIM Card for Mobile Financial Services
Authors: Kerem Ok, Serafettin Senturk, Serdar Aktas, Cem Cevikbas
Abstract:
Mobile technology is very popular nowadays and it provides a digital world where users can experience many value-added services. Service Providers are also eager to offer diverse value-added services to users such as digital identity, mobile financial services and so on. In this context, the security of data storage in smartphones and the security of communication between the smartphone and service provider are critical for the success of these services. In order to provide the required security functions, the SIM card is one acceptable alternative. Since SIM cards include a Secure Element, they are able to store sensitive data, create cryptographically secure keys, encrypt and decrypt data. In this paper, we design and implement a SIM and a smartphone framework that uses a SIM card for secure key generation, key storage, data encryption, data decryption and digital signing for mobile financial services. Our frameworks show that the SIM card can be used as a controlled Secure Element to provide required security functions for popular e-services such as mobile financial services.Keywords: SIM Card, mobile financial services, cryptography, secure data storage.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20656990 A Soft Systems Methodology Perspective on Data Warehousing Education Improvement
Abstract:
This paper demonstrates how the soft systems methodology can be used to improve the delivery of a module in data warehousing for fourth year information technology students. Graduates in information technology needs to have academic skills but also needs to have good practical skills to meet the skills requirements of the information technology industry. In developing and improving current data warehousing education modules one has to find a balance in meeting the expectations of various role players such as the students themselves, industry and academia. The soft systems methodology, developed by Peter Checkland, provides a methodology for facilitating problem understanding from different world views. In this paper it is demonstrated how the soft systems methodology can be used to plan the improvement of data warehousing education for fourth year information technology students.Keywords: Data warehousing, education, soft systems methodology, stakeholders, systems thinking.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17076989 Security Architecture for At-Home Medical Care Using Sensor Network
Authors: S.S.Mohanavalli, Sheila Anand
Abstract:
This paper proposes a novel architecture for At- Home medical care which enables senior citizens, patients with chronic ailments and patients requiring post- operative care to be remotely monitored in the comfort of their homes. This architecture is implemented using sensors and wireless networking for transmitting patient data to the hospitals, health- care centers for monitoring by medical professionals. Patients are equipped with sensors to measure their physiological parameters, like blood pressure, pulse rate etc. and a Wearable Data Acquisition Unit is used to transmit the patient sensor data. Medical professionals can be alerted to any abnormal variations in these values for diagnosis and suitable treatment. Security threats and challenges inherent to wireless communication and sensor network have been discussed and a security mechanism to ensure data confidentiality and source authentication has been proposed. Symmetric key algorithm AES has been used for encrypting the data and a patent-free, two-pass block cipher mode CCFB has been used for implementing semantic security.Keywords: data confidentiality, integrity, remotemonitoring, source authentication
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17426988 Machine Learning Facing Behavioral Noise Problem in an Imbalanced Data Using One Side Behavioral Noise Reduction: Application to a Fraud Detection
Authors: Salma El Hajjami, Jamal Malki, Alain Bouju, Mohammed Berrada
Abstract:
With the expansion of machine learning and data mining in the context of Big Data analytics, the common problem that affects data is class imbalance. It refers to an imbalanced distribution of instances belonging to each class. This problem is present in many real world applications such as fraud detection, network intrusion detection, medical diagnostics, etc. In these cases, data instances labeled negatively are significantly more numerous than the instances labeled positively. When this difference is too large, the learning system may face difficulty when tackling this problem, since it is initially designed to work in relatively balanced class distribution scenarios. Another important problem, which usually accompanies these imbalanced data, is the overlapping instances between the two classes. It is commonly referred to as noise or overlapping data. In this article, we propose an approach called: One Side Behavioral Noise Reduction (OSBNR). This approach presents a way to deal with the problem of class imbalance in the presence of a high noise level. OSBNR is based on two steps. Firstly, a cluster analysis is applied to groups similar instances from the minority class into several behavior clusters. Secondly, we select and eliminate the instances of the majority class, considered as behavioral noise, which overlap with behavior clusters of the minority class. The results of experiments carried out on a representative public dataset confirm that the proposed approach is efficient for the treatment of class imbalances in the presence of noise.Keywords: Machine learning, Imbalanced data, Data mining, Big data.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 11376987 Content Based Sampling over Transactional Data Streams
Authors: Mansour Tarafdar, Mohammad Saniee Abade
Abstract:
This paper investigates the problem of sampling from transactional data streams. We introduce CFISDS as a content based sampling algorithm that works on a landmark window model of data streams and preserve more informed sample in sample space. This algorithm that work based on closed frequent itemset mining tasks, first initiate a concept lattice using initial data, then update lattice structure using an incremental mechanism.Incremental mechanism insert, update and delete nodes in/from concept lattice in batch manner. Presented algorithm extracts the final samples on demand of user. Experimental results show the accuracy of CFISDS on synthetic and real datasets, despite on CFISDS algorithm is not faster than exist sampling algorithms such as Z and DSS.
Keywords: Sampling, data streams, closed frequent item set mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17096986 An Automatic Tool for Checking Consistency between Data Flow Diagrams (DFDs)
Authors: Rosziati Ibrahim, Siow Yen Yen
Abstract:
System development life cycle (SDLC) is a process uses during the development of any system. SDLC consists of four main phases: analysis, design, implement and testing. During analysis phase, context diagram and data flow diagrams are used to produce the process model of a system. A consistency of the context diagram to lower-level data flow diagrams is very important in smoothing up developing process of a system. However, manual consistency check from context diagram to lower-level data flow diagrams by using a checklist is time-consuming process. At the same time, the limitation of human ability to validate the errors is one of the factors that influence the correctness and balancing of the diagrams. This paper presents a tool that automates the consistency check between Data Flow Diagrams (DFDs) based on the rules of DFDs. The tool serves two purposes: as an editor to draw the diagrams and as a checker to check the correctness of the diagrams drawn. The consistency check from context diagram to lower-level data flow diagrams is embedded inside the tool to overcome the manual checking problem.Keywords: Data Flow Diagram, Context Diagram, ConsistencyCheck, Syntax and Semantic Rules
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 34396985 Real-Time Implementation of STANAG 4539 High-Speed HF Modem
Authors: S. Saraç, F. Kara, C.Vural
Abstract:
High-frequency (HF) communications have been used by military organizations for more than 90 years. The opportunity of very long range communications without the need for advanced equipment makes HF a convenient and inexpensive alternative of satellite communications. Besides the advantages, voice and data transmission over HF is a challenging task, because the HF channel generally suffers from Doppler shift and spread, multi-path, cochannel interference, and many other sources of noise. In constructing an HF data modem, all these effects must be taken into account. STANAG 4539 is a NATO standard for high-speed data transmission over HF. It allows data rates up to 12800 bps over an HF channel of 3 kHz. In this work, an efficient implementation of STANAG 4539 on a single Texas Instruments- TMS320C6747 DSP chip is described. The state-of-the-art algorithms used in the receiver and the efficiency of the implementation enables real-time high-speed data / digitized voice transmission over poor HF channels.
Keywords: High frequency, modem, STANAG 4539.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 53416984 Detection Efficient Enterprises via Data Envelopment Analysis
Authors: S. Turkan
Abstract:
In this paper, the Turkey’s Top 500 Industrial Enterprises data in 2014 were analyzed by data envelopment analysis. Data envelopment analysis is used to detect efficient decision-making units such as universities, hospitals, schools etc. by using inputs and outputs. The decision-making units in this study are enterprises. To detect efficient enterprises, some financial ratios are determined as inputs and outputs. For this reason, financial indicators related to productivity of enterprises are considered. The efficient foreign weighted owned capital enterprises are detected via super efficiency model. According to the results, it is said that Mercedes-Benz is the most efficient foreign weighted owned capital enterprise in Turkey.Keywords: Data envelopment analysis, super efficiency, financial ratios, BCC model.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8766983 Fusion of ETM+ Multispectral and Panchromatic Texture for Remote Sensing Classification
Authors: Mahesh Pal
Abstract:
This paper proposes to use ETM+ multispectral data and panchromatic band as well as texture features derived from the panchromatic band for land cover classification. Four texture features including one 'internal texture' and three GLCM based textures namely correlation, entropy, and inverse different moment were used in combination with ETM+ multispectral data. Two data sets involving combination of multispectral, panchromatic band and its texture were used and results were compared with those obtained by using multispectral data alone. A decision tree classifier with and without boosting were used to classify different datasets. Results from this study suggest that the dataset consisting of panchromatic band, four of its texture features and multispectral data was able to increase the classification accuracy by about 2%. In comparison, a boosted decision tree was able to increase the classification accuracy by about 3% with the same dataset.Keywords: Internal texture; GLCM; decision tree; boosting; classification accuracy.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17366982 A Formal Approach for Instructional Design Integrated with Data Visualization for Learning Analytics
Authors: Douglas A. Menezes, Isabel D. Nunes, Ulrich Schiel
Abstract:
Most Virtual Learning Environments do not provide support mechanisms for the integrated planning, construction and follow-up of Instructional Design supported by Learning Analytic results. The present work aims to present an authoring tool that will be responsible for constructing the structure of an Instructional Design (ID), without the data being altered during the execution of the course. The visual interface aims to present the critical situations present in this ID, serving as a support tool for the course follow-up and possible improvements, which can be made during its execution or in the planning of a new edition of this course. The model for the ID is based on High-Level Petri Nets and the visualization forms are determined by the specific kind of the data generated by an e-course, a population of students generating sequentially dependent data.
Keywords: Educational data visualization, high-level petri nets, instructional design, learning analytics.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8486981 Visual Text Analytics Technologies for Real-Time Big Data: Chronological Evolution and Issues
Authors: Siti Azrina B. A. Aziz, Siti Hafizah A. Hamid
Abstract:
New approaches to analyze and visualize data stream in real-time basis is important in making a prompt decision by the decision maker. Financial market trading and surveillance, large-scale emergency response and crowd control are some example scenarios that require real-time analytic and data visualization. This situation has led to the development of techniques and tools that support humans in analyzing the source data. With the emergence of Big Data and social media, new techniques and tools are required in order to process the streaming data. Today, ranges of tools which implement some of these functionalities are available. In this paper, we present chronological evolution evaluation of technologies for supporting of real-time analytic and visualization of the data stream. Based on the past research papers published from 2002 to 2014, we gathered the general information, main techniques, challenges and open issues. The techniques for streaming text visualization are identified based on Text Visualization Browser in chronological order. This paper aims to review the evolution of streaming text visualization techniques and tools, as well as to discuss the problems and challenges for each of identified tools.Keywords: Information visualization, visual analytics, text mining, visual text analytics tools, big data visualization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 10026980 Churn Prediction for Telecommunication Industry Using Artificial Neural Networks
Authors: Ulas Vural, M. Ergun Okay, E. Mesut Yildiz
Abstract:
Telecommunication service providers demand accurate and precise prediction of customer churn probabilities to increase the effectiveness of their customer relation services. The large amount of customer data owned by the service providers is suitable for analysis by machine learning methods. In this study, expenditure data of customers are analyzed by using an artificial neural network (ANN). The ANN model is applied to the data of customers with different billing duration. The proposed model successfully predicts the churn probabilities at 83% accuracy for only three months expenditure data and the prediction accuracy increases up to 89% when the nine month data is used. The experiments also show that the accuracy of ANN model increases on an extended feature set with information of the changes on the bill amounts.Keywords: Customer relationship management, churn prediction, telecom industry, deep learning, Artificial Neural Networks, ANN.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7606979 A Technical Perspective on Roadway Safety in Eastern Province: Data Evaluation and Spatial Analysis
Authors: Muhammad Farhan, Sayed Faruque, Amr Mohammed, Sami Osman, Omar Al-Jabari, Abdul Almojil
Abstract:
Saudi Arabia in recent years has seen drastic increase in traffic related crashes. With population of over 29 million, Saudi Arabia is considered as a fast growing and emerging economy. The rapid population increase and economic growth has resulted in rapid expansion of transportation infrastructure, which has led to increase in road crashes. Saudi Ministry of Interior reported more than 7,000 people killed and 68,000 injured in 2011 ranking Saudi Arabia to be one of the worst worldwide in traffic safety. The traffic safety issues in the country also result in distress to road users and cause and economic loss exceeding 3.7 billion Euros annually. Keeping this in view, the researchers in Saudi Arabia are investigating ways to improve traffic safety conditions in the country. This paper presents a multilevel approach to collect traffic safety related data required to do traffic safety studies in the region. Two highway corridors including King Fahd Highway 39 kilometre and Gulf Cooperation Council Highway 42 kilometre long connecting the cities of Dammam and Khobar were selected as a study area. Traffic data collected included traffic counts, crash data, travel time data, and speed data. The collected data was analysed using geographic information system to evaluate any correlation. Further research is needed to investigate the effectiveness of traffic safety related data when collected in a concerted effort.
Keywords: Crash Data, Data Collection, Traffic Safety.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23516978 Machine Scoring Model Using Data Mining Techniques
Authors: Wimalin S. Laosiritaworn, Pongsak Holimchayachotikul
Abstract:
this article proposed a methodology for computer numerical control (CNC) machine scoring. The case study company is a manufacturer of hard disk drive parts in Thailand. In this company, sample of parts manufactured from CNC machine are usually taken randomly for quality inspection. These inspection data were used to make a decision to shut down the machine if it has tendency to produce parts that are out of specification. Large amount of data are produced in this process and data mining could be very useful technique in analyzing them. In this research, data mining techniques were used to construct a machine scoring model called 'machine priority assessment model (MPAM)'. This model helps to ensure that the machine with higher risk of producing defective parts be inspected before those with lower risk. If the defective prone machine is identified sooner, defective part and rework could be reduced hence improving the overall productivity. The results showed that the proposed method can be successfully implemented and approximately 351,000 baht of opportunity cost could have saved in the case study company.Keywords: Computer Numerical Control, Data Mining, HardDisk Drive.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13956977 The Impact of Seasonality on Rainfall Patterns: A Case Study
Authors: Priti Kaushik, Randhir Singh Baghel, Somil Khandelwal
Abstract:
This study uses whole-year data from Rajasthan, India, at the meteorological divisional level to analyze and evaluate long-term spatiotemporal trends in rainfall and looked at the data from each of the thirteen tehsils in the Jaipur district to see how the rainfall pattern has altered over the last 10 years. Data on daily rainfall from the Indian Meteorological Department (IMD) in Jaipur are available for the years 2012 through 2021. We mainly focus on comparing data of tehsil wise in the Jaipur district, Rajasthan, India. Also analyzed is the fact that July and August always see higher rainfall than any other month. Rainfall usually starts to rise around week 25th and peaks in weeks 32nd or 33rd. They showed that on several occasions, 2017 saw the least amount of rainfall during a long span of 10 years. The greatest rain fell between 2012 and 2021 in 2013, 2019, and 2020.
Keywords: Data analysis, extreme events, rainfall, descriptive case studies, precipitation temperature.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1906976 Enhance the Power of Sentiment Analysis
Authors: Yu Zhang, Pedro Desouza
Abstract:
Since big data has become substantially more accessible and manageable due to the development of powerful tools for dealing with unstructured data, people are eager to mine information from social media resources that could not be handled in the past. Sentiment analysis, as a novel branch of text mining, has in the last decade become increasingly important in marketing analysis, customer risk prediction and other fields. Scientists and researchers have undertaken significant work in creating and improving their sentiment models. In this paper, we present a concept of selecting appropriate classifiers based on the features and qualities of data sources by comparing the performances of five classifiers with three popular social media data sources: Twitter, Amazon Customer Reviews, and Movie Reviews. We introduced a couple of innovative models that outperform traditional sentiment classifiers for these data sources, and provide insights on how to further improve the predictive power of sentiment analysis. The modeling and testing work was done in R and Greenplum in-database analytic tools.
Keywords: Sentiment Analysis, Social Media, Twitter, Amazon, Data Mining, Machine Learning, Text Mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 35186975 Calibration of 2D and 3D Optical Measuring Instruments in Industrial Environments at Submillimeter Range
Authors: A. Mínguez-Martínez, J. de Vicente
Abstract:
Modern manufacturing processes have led to the miniaturization of systems and, as a result, parts at the micro and nanoscale are produced. This trend seems to become increasingly important in the near future. Besides, as a requirement of Industry 4.0, the digitalization of the models of production and processes makes it very important to ensure that the dimensions of newly manufactured parts meet the specifications of the models. Therefore, it is possible to reduce the scrap and the cost of non-conformities, ensuring the stability of the production at the same time. To ensure the quality of manufactured parts, it becomes necessary to carry out traceable measurements at scales lower than one millimeter. Providing adequate traceability to the SI unit of length (the meter) to 2D and 3D measurements at this scale is a problem that does not have a unique solution in industrial environments. Researchers in the field of dimensional metrology all around the world are working on this issue. A solution for industrial environments, even if it is not complete, will enable working with some traceability. At this point, we believe that the study of the surfaces could provide us with a first approximation to a solution. In this paper, we propose a calibration procedure for the scales of optical measuring instruments, particularizing for a confocal microscope, using material standards easy to find and calibrate in metrology and quality laboratories in industrial environments. Confocal microscopes are measuring instruments capable of filtering the out-of-focus reflected light so that when it reaches the detector, it is possible to take pictures of the part of the surface that is focused. Varying and taking pictures at different Z levels of the focus, a specialized software interpolates between the different planes, and it could reconstruct the surface geometry into a 3D model. As it is easy to deduce, it is necessary to give traceability to each axis. As a complementary result, the roughness Ra parameter will be traced to the reference. Although the solution is designed for a confocal microscope, it may be used for the calibration of other optical measuring instruments, by applying minor changes.
Keywords: Industrial environment, confocal microscope, optical measuring instrument, traceability.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4106974 Lineup Optimization Model of Basketball Players Based on the Prediction of Recursive Neural Networks
Authors: Wang Yichen, Haruka Yamashita
Abstract:
In recent years, in the field of sports, decision making such as member in the game and strategy of the game based on then analysis of the accumulated sports data are widely attempted. In fact, in the NBA basketball league where the world's highest level players gather, to win the games, teams analyze the data using various statistical techniques. However, it is difficult to analyze the game data for each play such as the ball tracking or motion of the players in the game, because the situation of the game changes rapidly, and the structure of the data should be complicated. Therefore, it is considered that the analysis method for real time game play data is proposed. In this research, we propose an analytical model for "determining the optimal lineup composition" using the real time play data, which is considered to be difficult for all coaches. In this study, because replacing the entire lineup is too complicated, and the actual question for the replacement of players is "whether or not the lineup should be changed", and “whether or not Small Ball lineup is adopted”. Therefore, we propose an analytical model for the optimal player selection problem based on Small Ball lineups. In basketball, we can accumulate scoring data for each play, which indicates a player's contribution to the game, and the scoring data can be considered as a time series data. In order to compare the importance of players in different situations and lineups, we combine RNN (Recurrent Neural Network) model, which can analyze time series data, and NN (Neural Network) model, which can analyze the situation on the field, to build the prediction model of score. This model is capable to identify the current optimal lineup for different situations. In this research, we collected all the data of accumulated data of NBA from 2019-2020. Then we apply the method to the actual basketball play data to verify the reliability of the proposed model.Keywords: Recurrent Neural Network, players lineup, basketball data, decision making model.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8286973 New Multisensor Data Fusion Method Based on Probabilistic Grids Representation
Authors: Zhichao Zhao, Yi Liu, Shunping Xiao
Abstract:
A new data fusion method called joint probability density matrix (JPDM) is proposed, which can associate and fuse measurements from spatially distributed heterogeneous sensors to identify the real target in a surveillance region. Using the probabilistic grids representation, we numerically combine the uncertainty regions of all the measurements in a general framework. The NP-hard multisensor data fusion problem has been converted to a peak picking problem in the grids map. Unlike most of the existing data fusion method, the JPDM method dose not need association processing, and will not lead to combinatorial explosion. Its convergence to the CRLB with a diminishing grid size has been proved. Simulation results are presented to illustrate the effectiveness of the proposed technique.
Keywords: Cramer-Rao lower bound (CRLB), data fusion, probabilistic grids, joint probability density matrix, localization, sensor network.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1803