Search results for: Data mining andInformation Extraction
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8066

Search results for: Data mining andInformation Extraction

7256 Missing Link Data Estimation with Recurrent Neural Network: An Application Using Speed Data of Daegu Metropolitan Area

Authors: JaeHwan Yang, Da-Woon Jeong, Seung-Young Kho, Dong-Kyu Kim

Abstract:

In terms of ITS, information on link characteristic is an essential factor for plan or operation. But in practical cases, not every link has installed sensors on it. The link that does not have data on it is called “Missing Link”. The purpose of this study is to impute data of these missing links. To get these data, this study applies the machine learning method. With the machine learning process, especially for the deep learning process, missing link data can be estimated from present link data. For deep learning process, this study uses “Recurrent Neural Network” to take time-series data of road. As input data, Dedicated Short-range Communications (DSRC) data of Dalgubul-daero of Daegu Metropolitan Area had been fed into the learning process. Neural Network structure has 17 links with present data as input, 2 hidden layers, for 1 missing link data. As a result, forecasted data of target link show about 94% of accuracy compared with actual data.

Keywords: Data Estimation, link data, machine learning, road network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1475
7255 CNet Module Design of IMCS

Authors: Youkyung Park, SeungYup Kang, SungHo Kim, SimKyun Yook

Abstract:

IMCS is Integrated Monitoring and Control System for thermal power plant. This system consists of mainly two parts; controllers and OIS (Operator Interface System). These two parts are connected by Ethernet-based communication. The controller side of communication is managed by CNet module and OIS side is managed by data server of OIS. CNet module sends the data of controller to data server and receives commend data from data server. To minimizes or balance the load of data server, this module buffers data created by controller at every cycle and send buffered data to data server on request of data server. For multiple data server, this module manages the connection line with each data server and response for each request from multiple data server. CNet module is included in each controller of redundant system. When controller fail-over happens on redundant system, this module can provide data of controller to data sever without loss. This paper presents three main features – separation of get task, usage of ring buffer and monitoring communication status –of CNet module to carry out these functions.

Keywords: Ethernet communication, DCS, power plant, ring buffer, data integrity

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1545
7254 Effects of Process Parameters on the Yield of Oil from Coconut Fruit

Authors: Ndidi F. Amulu, Godian O. Mbah, Maxwel I. Onyiah, Callistus N. Ude

Abstract:

Analysis of the properties of coconut (Cocos nucifera) and its oil was evaluated in this work using standard analytical techniques. The analyses carried out include proximate composition of the fruit, extraction of oil from the fruit using different process parameters and physicochemical analysis of the extracted oil. The results showed the percentage (%) moisture, crude lipid, crude protein, ash and carbohydrate content of the coconut as 7.59, 55.15, 5.65, 7.35 and 19.51 respectively. The oil from the coconut fruit was odourless and yellowish liquid at room temperature (30oC). The treatment combinations used (leaching time, leaching temperature and solute: solvent ratio) showed significant differences (P<0.05) in the yield of oil from coconut flour. The oil yield ranged between 36.25%-49.83%. Lipid indices of the coconut oil indicated the acid value (AV) as 10.05Na0H/g of oil, free fatty acid (FFA) as 5.03%, saponification values (SV) as 183.26mgKOH-1g of oil, iodine value (IV) as 81.00 I2/g of oil, peroxide value (PV) as 5.00 ml/ g of oil and viscosity (V) as 0.002. A standard statistical package minitab version 16.0 program was used in the regression analysis and analysis of variance (ANOVA). The statistical software mentioned above was also used to generate various plots such as single effect plot, interactions effect plot and contour plot. The response or yield of oil from the coconut flour was used to develop a mathematical model that correlates the yield to the process variables studied. The maximum conditions obtained that gave the highest yield of coconut oil were leaching time of 2hrs, leaching temperature of 50oC and solute/solvent ratio of 0.05g/ml.

Keywords: Coconut, oil-extraction, optimization, physicochemical, proximate.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2630
7253 Case Studies of CSAMT Method Applied to Study of Complex Rock Mass Structure and Hidden Tectonic

Authors: Yuxin Chen, Qingyun Di, C. Dinis da Gama

Abstract:

In projects like waterpower, transportation and mining, etc., proving up the rock-mass structure and hidden tectonic to estimate the geological body-s activity is very important. Integrating the seismic results, drilling and trenching data, CSAMT method was carried out at a planning dame site in southwest China to evaluate the stability of a deformation. 2D and imitated 3D inversion resistivity results of CSAMT method were analyzed. The results indicated that CSAMT was an effective method for defining an outline of deformation body to several hundred meters deep; the Lung Pan Deformation was stable in natural conditions; but uncertain after the future reservoir was impounded. This research presents a good case study of the fine surveying and research on complex geological structure and hidden tectonic in engineering project.

Keywords: CSAMT Surveying, Deformation Stability.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2420
7252 Big Data: Concepts, Technologies and Applications in the Public Sector

Authors: A. Alexandru, C. A. Alexandru, D. Coardos, E. Tudora

Abstract:

Big Data (BD) is associated with a new generation of technologies and architectures which can harness the value of extremely large volumes of very varied data through real time processing and analysis. It involves changes in (1) data types, (2) accumulation speed, and (3) data volume. This paper presents the main concepts related to the BD paradigm, and introduces architectures and technologies for BD and BD sets. The integration of BD with the Hadoop Framework is also underlined. BD has attracted a lot of attention in the public sector due to the newly emerging technologies that allow the availability of network access. The volume of different types of data has exponentially increased. Some applications of BD in the public sector in Romania are briefly presented.

Keywords: Big data, big data Analytics, Hadoop framework, cloud computing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2291
7251 Strengthening Legal Protection of Personal Data through Technical Protection Regulation in Line with Human Rights

Authors: Tomy Prihananto, Damar Apri Sudarmadi

Abstract:

Indonesia recognizes the right to privacy as a human right. Indonesia provides legal protection against data management activities because the protection of personal data is a part of human rights. This paper aims to describe the arrangement of data management and data management in Indonesia. This paper is a descriptive research with qualitative approach and collecting data from literature study. Results of this paper are comprehensive arrangement of data that have been set up as a technical requirement of data protection by encryption methods. Arrangements on encryption and protection of personal data are mutually reinforcing arrangements in the protection of personal data. Indonesia has two important and immediately enacted laws that provide protection for the privacy of information that is part of human rights.

Keywords: Indonesia, protection, personal data, privacy, human rights, encryption.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 958
7250 Pattern Discovery from Student Feedback: Identifying Factors to Improve Student Emotions in Learning

Authors: Angelina A. Tzacheva, Jaishree Ranganathan

Abstract:

Interest in (STEM) Science Technology Engineering Mathematics education especially Computer Science education has seen a drastic increase across the country. This fuels effort towards recruiting and admitting a diverse population of students. Thus the changing conditions in terms of the student population, diversity and the expected teaching and learning outcomes give the platform for use of Innovative Teaching models and technologies. It is necessary that these methods adapted should also concentrate on raising quality of such innovations and have positive impact on student learning. Light-Weight Team is an Active Learning Pedagogy, which is considered to be low-stake activity and has very little or no direct impact on student grades. Emotion plays a major role in student’s motivation to learning. In this work we use the student feedback data with emotion classification using surveys at a public research institution in the United States. We use Actionable Pattern Discovery method for this purpose. Actionable patterns are patterns that provide suggestions in the form of rules to help the user achieve better outcomes. The proposed method provides meaningful insight in terms of changes that can be incorporated in the Light-Weight team activities, resources utilized in the course. The results suggest how to enhance student emotions to a more positive state, in particular focuses on the emotions ‘Trust’ and ‘Joy’.

Keywords: Actionable pattern discovery, education, emotion, data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 493
7249 Discriminant Analysis as a Function of Predictive Learning to Select Evolutionary Algorithms in Intelligent Transportation System

Authors: Jorge A. Ruiz-Vanoye, Ocotlán Díaz-Parra, Alejandro Fuentes-Penna, Daniel Vélez-Díaz, Edith Olaco García

Abstract:

In this paper, we present the use of the discriminant analysis to select evolutionary algorithms that better solve instances of the vehicle routing problem with time windows. We use indicators as independent variables to obtain the classification criteria, and the best algorithm from the generic genetic algorithm (GA), random search (RS), steady-state genetic algorithm (SSGA), and sexual genetic algorithm (SXGA) as the dependent variable for the classification. The discriminant classification was trained with classic instances of the vehicle routing problem with time windows obtained from the Solomon benchmark. We obtained a classification of the discriminant analysis of 66.7%.

Keywords: Intelligent transportation systems, data-mining techniques, evolutionary algorithms, discriminant analysis, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1529
7248 Big Brain: A Single Database System for a Federated Data Warehouse Architecture

Authors: X. Gumara Rigol, I. Martínez de Apellaniz Anzuola, A. Garcia Serrano, A. Franzi Cros, O. Vidal Calbet, A. Al Maruf

Abstract:

Traditional federated architectures for data warehousing work well when corporations have existing regional data warehouses and there is a need to aggregate data at a global level. Schibsted Media Group has been maturing from a decentralised organisation into a more globalised one and needed to build both some of the regional data warehouses for some brands at the same time as the global one. In this paper, we present the architectural alternatives studied and why a custom federated approach was the notable recommendation to go further with the implementation. Although the data warehouses are logically federated, the implementation uses a single database system which presented many advantages like: cost reduction and improved data access to global users allowing consumers of the data to have a common data model for detailed analysis across different geographies and a flexible layer for local specific needs in the same place.

Keywords: Data integration, data warehousing, federated architecture, online analytical processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 688
7247 An In-Depth Analysis of Open Data Portals as an Emerging Public E-Service

Authors: Martin Lnenicka

Abstract:

Governments collect and produce large amounts of data. Increasingly, governments worldwide have started to implement open data initiatives and also launch open data portals to enable the release of these data in open and reusable formats. Therefore, a large number of open data repositories, catalogues and portals have been emerging in the world. The greater availability of interoperable and linkable open government data catalyzes secondary use of such data, so they can be used for building useful applications which leverage their value, allow insight, provide access to government services, and support transparency. The efficient development of successful open data portals makes it necessary to evaluate them systematic, in order to understand them better and assess the various types of value they generate, and identify the required improvements for increasing this value. Thus, the attention of this paper is directed particularly to the field of open data portals. The main aim of this paper is to compare the selected open data portals on the national level using content analysis and propose a new evaluation framework, which further improves the quality of these portals. It also establishes a set of considerations for involving businesses and citizens to create eservices and applications that leverage on the datasets available from these portals.

Keywords: Big data, content analysis, criteria comparison, data quality, open data, open data portals, public sector.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3048
7246 Analysis of Electrocardiograph (ECG) Signal for the Detection of Abnormalities Using MATLAB

Authors: Durgesh Kumar Ojha, Monica Subashini

Abstract:

The proposed method is to study and analyze Electrocardiograph (ECG) waveform to detect abnormalities present with reference to P, Q, R and S peaks. The first phase includes the acquisition of real time ECG data. In the next phase, generation of signals followed by pre-processing. Thirdly, the procured ECG signal is subjected to feature extraction. The extracted features detect abnormal peaks present in the waveform Thus the normal and abnormal ECG signal could be differentiated based on the features extracted. The work is implemented in the most familiar multipurpose tool, MATLAB. This software efficiently uses algorithms and techniques for detection of any abnormalities present in the ECG signal. Proper utilization of MATLAB functions (both built-in and user defined) can lead us to work with ECG signals for processing and analysis in real time applications. The simulation would help in improving the accuracy and the hardware could be built conveniently.

Keywords: ECG Waveform, Peak Detection, Arrhythmia, Matlab.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 11968
7245 Highlighting Document's Structure

Authors: Sylvie Ratté, Wilfried Njomgue, Pierre-André Ménard

Abstract:

In this paper, we present symbolic recognition models to extract knowledge characterized by document structures. Focussing on the extraction and the meticulous exploitation of the semantic structure of documents, we obtain a meaningful contextual tagging corresponding to different unit types (title, chapter, section, enumeration, etc.).

Keywords: Information retrieval, document structures, symbolic grammars.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1209
7244 Multi-View Neural Network Based Gait Recognition

Authors: Saeid Fazli, Hadis Askarifar, Maryam Sheikh Shoaie

Abstract:

Human identification at a distance has recently gained growing interest from computer vision researchers. Gait recognition aims essentially to address this problem by identifying people based on the way they walk [1]. Gait recognition has 3 steps. The first step is preprocessing, the second step is feature extraction and the third one is classification. This paper focuses on the classification step that is essential to increase the CCR (Correct Classification Rate). Multilayer Perceptron (MLP) is used in this work. Neural Networks imitate the human brain to perform intelligent tasks [3].They can represent complicated relationships between input and output and acquire knowledge about these relationships directly from the data [2]. In this paper we apply MLP NN for 11 views in our database and compare the CCR values for these views. Experiments are performed with the NLPR databases, and the effectiveness of the proposed method for gait recognition is demonstrated.

Keywords: Human motion analysis, biometrics, gait recognition, principal component analysis, MLP neural network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2082
7243 File System-Based Data Protection Approach

Authors: Jaechun No

Abstract:

As data to be stored in storage subsystems tremendously increases, data protection techniques have become more important than ever, to provide data availability and reliability. In this paper, we present the file system-based data protection (WOWSnap) that has been implemented using WORM (Write-Once-Read-Many) scheme. In the WOWSnap, once WORM files have been created, only the privileged read requests to them are allowed to protect data against any intentional/accidental intrusions. Furthermore, all WORM files are related to their protection cycle that is a time period during which WORM files should securely be protected. Once their protection cycle is expired, the WORM files are automatically moved to the general-purpose data section without any user interference. This prevents the WORM data section from being consumed by unnecessary files. We evaluated the performance of WOWSnap on Linux cluster.

Keywords: Data protection, Protection cycle, WORM

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1644
7242 Piezoelectric Bimorph Harvester Based on Different Lead Zirconate Titanate Materials to Enhance Energy Collection

Authors: Irene Perez-Alfaro, Nieves Murillo, Carlos Bernal, Daniel Gil-Hernandez

Abstract:

Nowadays, the increasing applicability of internet of things (IoT) systems has changed the way that the world around is perceived. The massive interconnection of systems by means of sensing, processing and communication, allows multitude of data to be at our fingertips. In this way, countless advances have been made in different fields such as personal care, predictive maintenance in industry, quality control in production processes, security, and in everything imaginable. However, all these electronic systems have in common the need to be electrically powered. In this context, batteries and wires are the most commonly used solutions, but they are not a definitive solution in some applications, because of the attainability, the serviceability, or the performance requirements. Therefore, the need arises to look for other types of solutions based on energy harvesting and long-life electronics. Energy Harvesting can be defined as the action of capturing energy from the environment and store it for an instantaneous use or later use. Among the materials capable of harvesting energy from the environment, such as thermoelectrics, electromagnetics, photovoltaics or triboelectrics, the most suitable is the piezoelectric material. The phenomenon of piezoelectricity is one of the most powerful sources for energy harvesting, ranging from a few micro wats to hundreds of wats, depending on certain factors such as material type, geometry, excitation frequency, mechanical and electrical configurations, among others. In this research work, an exhaustive study is carried out on how different types of piezoelectric materials and electrical configurations influence the maximum power that a bimorph harvester is able to extract from mechanical vibrations. A series of experiments has been carried out in which the manufactured bimorph specimens are excited under fixed inertial vibrational conditions. In addition, in order to evaluate the dependence of the maximum transferred power, different load resistors are tested. In this way, the pure active power that achieves the maximum power transfer can be approximated. In this paper, we present the design of low-cost energy harvesting solutions based on piezoelectric smart materials with tunable frequency. The results obtained show the differences in energy extraction between the PZT materials studied and their electrical configurations. The aim of this work is to gain a better understanding of the behavior of piezoelectric materials, and the design process of bimorph PZT harvesters to optimize environmental energy extraction.

Keywords: Bimorph harvesters, electrical impedance, energy harvesting, piezoelectric, smart material.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 420
7241 An Efficient Technique for Extracting Fuzzy Rulesfrom Neural Networks

Authors: Besa Muslimi, Miriam A. M. Capretz, Jagath Samarabandu

Abstract:

Artificial neural networks (ANN) have the ability to model input-output relationships from processing raw data. This characteristic makes them invaluable in industry domains where such knowledge is scarce at best. In the recent decades, in order to overcome the black-box characteristic of ANNs, researchers have attempted to extract the knowledge embedded within ANNs in the form of rules that can be used in inference systems. This paper presents a new technique that is able to extract a small set of rules from a two-layer ANN. The extracted rules yield high classification accuracy when implemented within a fuzzy inference system. The technique targets industry domains that possess less complex problems for which no expert knowledge exists and for which a simpler solution is preferred to a complex one. The proposed technique is more efficient, simple, and applicable than most of the previously proposed techniques.

Keywords: fuzzy rule extraction, fuzzy systems, knowledgeacquisition, pattern recognition, artificial neural networks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1557
7240 Landscape Data Transformation: Categorical Descriptions to Numerical Descriptors

Authors: Dennis A. Apuan

Abstract:

Categorical data based on description of the agricultural landscape imposed some mathematical and analytical limitations. This problem however can be overcome by data transformation through coding scheme and the use of non-parametric multivariate approach. The present study describes data transformation from qualitative to numerical descriptors. In a collection of 103 random soil samples over a 60 hectare field, categorical data were obtained from the following variables: levels of nitrogen, phosphorus, potassium, pH, hue, chroma, value and data on topography, vegetation type, and the presence of rocks. Categorical data were coded, and Spearman-s rho correlation was then calculated using PAST software ver. 1.78 in which Principal Component Analysis was based. Results revealed successful data transformation, generating 1030 quantitative descriptors. Visualization based on the new set of descriptors showed clear differences among sites, and amount of variation was successfully measured. Possible applications of data transformation are discussed.

Keywords: data transformation, numerical descriptors, principalcomponent analysis

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1487
7239 Effective Keyword and Similarity Thresholds for the Discovery of Themes from the User Web Access Patterns

Authors: Haider A Ramadhan, Khalil Shihab

Abstract:

Clustering techniques have been used by many intelligent software agents to group similar access patterns of the Web users into high level themes which express users intentions and interests. However, such techniques have been mostly focusing on one salient feature of the Web document visited by the user, namely the extracted keywords. The major aim of these techniques is to come up with an optimal threshold for the number of keywords needed to produce more focused themes. In this paper we focus on both keyword and similarity thresholds to generate themes with concentrated themes, and hence build a more sound model of the user behavior. The purpose of this paper is two fold: use distance based clustering methods to recognize overall themes from the Proxy log file, and suggest an efficient cut off levels for the keyword and similarity thresholds which tend to produce more optimal clusters with better focus and efficient size.

Keywords: Data mining, knowledge discovery, clustering, dataanalysis, Web log analysis, theme based searching.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1433
7238 A Survey of Semantic Integration Approaches in Bioinformatics

Authors: Chaimaa Messaoudi, Rachida Fissoune, Hassan Badir

Abstract:

Technological advances of computer science and data analysis are helping to provide continuously huge volumes of biological data, which are available on the web. Such advances involve and require powerful techniques for data integration to extract pertinent knowledge and information for a specific question. Biomedical exploration of these big data often requires the use of complex queries across multiple autonomous, heterogeneous and distributed data sources. Semantic integration is an active area of research in several disciplines, such as databases, information-integration, and ontology. We provide a survey of some approaches and techniques for integrating biological data, we focus on those developed in the ontology community.

Keywords: Semantic data integration, biological ontology, linked data, semantic web, OWL, RDF.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1799
7237 Performance Analysis of Proprietary and Non-Proprietary Tools for Regression Testing Using Genetic Algorithm

Authors: K. Hema Shankari, R. Thirumalaiselvi, N. V. Balasubramanian

Abstract:

The present paper addresses to the research in the area of regression testing with emphasis on automated tools as well as prioritization of test cases. The uniqueness of regression testing and its cyclic nature is pointed out. The difference in approach between industry, with business model as basis, and academia, with focus on data mining, is highlighted. Test Metrics are discussed as a prelude to our formula for prioritization; a case study is further discussed to illustrate this methodology. An industrial case study is also described in the paper, where the number of test cases is so large that they have to be grouped as Test Suites. In such situations, a genetic algorithm proposed by us can be used to reconfigure these Test Suites in each cycle of regression testing. The comparison is made between a proprietary tool and an open source tool using the above-mentioned metrics. Our approach is clarified through several tables.

Keywords: APFD metric, genetic algorithm, regression testing, RFT tool, test case prioritization, selenium tool.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 895
7236 Annotations of Gene Pathways Images in Biomedical Publications Using Siamese Network

Authors: Micheal Olaolu Arowolo, Muhammad Azam, Fei He, Mihail Popescu, Dong Xu

Abstract:

As the quantity of biological articles rises, so does the number of biological route figures. Each route figure shows gene names and relationships. Manually annotating pathway diagrams is time-consuming. Advanced image understanding models could speed up curation, but they must be more precise. There is rich information in biological pathway figures. The first step to performing image understanding of these figures is to recognize gene names automatically. Classical optical character recognition methods have been employed for gene name recognition, but they are not optimized for literature mining data. This study devised a method to recognize an image bounding box of gene name as a photo using deep Siamese neural network models to outperform the existing methods using ResNet, DenseNet and Inception architectures, the results obtained about 84% accuracy.

Keywords: Biological pathway, gene identification, object detection, Siamese network, ResNet.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 202
7235 Comparison between Associative Classification and Decision Tree for HCV Treatment Response Prediction

Authors: Enas M. F. El Houby, Marwa S. Hassan

Abstract:

Combined therapy using Interferon and Ribavirin is the standard treatment in patients with chronic hepatitis C. However, the number of responders to this treatment is low, whereas its cost and side effects are high. Therefore, there is a clear need to predict patient’s response to the treatment based on clinical information to protect the patients from the bad drawbacks, Intolerable side effects and waste of money. Different machine learning techniques have been developed to fulfill this purpose. From these techniques are Associative Classification (AC) and Decision Tree (DT). The aim of this research is to compare the performance of these two techniques in the prediction of virological response to the standard treatment of HCV from clinical information. 200 patients treated with Interferon and Ribavirin; were analyzed using AC and DT. 150 cases had been used to train the classifiers and 50 cases had been used to test the classifiers. The experiment results showed that the two techniques had given acceptable results however the best accuracy for the AC reached 92% whereas for DT reached 80%.

Keywords: Associative Classification, Data mining, Decision tree, HCV, interferon.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1864
7234 Comparative Analysis of Different Page Ranking Algorithms

Authors: S. Prabha, K. Duraiswamy, J. Indhumathi

Abstract:

Search engine plays an important role in internet, to retrieve the relevant documents among the huge number of web pages. However, it retrieves more number of documents, which are all relevant to your search topics. To retrieve the most meaningful documents related to search topics, ranking algorithm is used in information retrieval technique. One of the issues in data miming is ranking the retrieved document. In information retrieval the ranking is one of the practical problems. This paper includes various Page Ranking algorithms, page segmentation algorithms and compares those algorithms used for Information Retrieval. Diverse Page Rank based algorithms like Page Rank (PR), Weighted Page Rank (WPR), Weight Page Content Rank (WPCR), Hyperlink Induced Topic Selection (HITS), Distance Rank, Eigen Rumor, Distance Rank Time Rank, Tag Rank, Relational Based Page Rank and Query Dependent Ranking algorithms are discussed and compared.

Keywords: Information Retrieval, Web Page Ranking, search engine, web mining, page segmentations.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4263
7233 Automatic Clustering of Gene Ontology by Genetic Algorithm

Authors: Razib M. Othman, Safaai Deris, Rosli M. Illias, Zalmiyah Zakaria, Saberi M. Mohamad

Abstract:

Nowadays, Gene Ontology has been used widely by many researchers for biological data mining and information retrieval, integration of biological databases, finding genes, and incorporating knowledge in the Gene Ontology for gene clustering. However, the increase in size of the Gene Ontology has caused problems in maintaining and processing them. One way to obtain their accessibility is by clustering them into fragmented groups. Clustering the Gene Ontology is a difficult combinatorial problem and can be modeled as a graph partitioning problem. Additionally, deciding the number k of clusters to use is not easily perceived and is a hard algorithmic problem. Therefore, an approach for solving the automatic clustering of the Gene Ontology is proposed by incorporating cohesion-and-coupling metric into a hybrid algorithm consisting of a genetic algorithm and a split-and-merge algorithm. Experimental results and an example of modularized Gene Ontology in RDF/XML format are given to illustrate the effectiveness of the algorithm.

Keywords: Automatic clustering, cohesion-and-coupling metric, gene ontology; genetic algorithm, split-and-merge algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1934
7232 The Status Info Processing and Keeping System for Production Equipment

Authors: So Jeong Nam, Seung Woo Lee, Jai-Kyung Lee

Abstract:

With the globalized production and logistics environment, the need for reducing the product development interval and lead time, having a faster response to orders, conforming to quality standards, fair tracking, and boosting information exchanging activities with customers and partners, and coping with changes in the management environment, manufacturers are in dire need of an information management system in their manufacturing environments. There are lots of information systems that have been designed to manage the condition or operation of equipment in the field but existing systems have a decentralized architecture, which is not unified. Also, these systems cannot effectively handle the status data extraction process upon encountering a problem related to protocols or changes in the equipment or the setting. In this regard, this paper will introduce a system for processing and saving the status info of production equipment, which uses standard representation formats, to enable flexible responses to and support for variables in the field equipment. This system can be used for a variety of manufacturing and equipment settings and is capable of interacting with higher-tier systems such as MES.

Keywords: DAS, Equipment Status, Regular Expression

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1524
7231 Automatic Classification of Lung Diseases from CT Images

Authors: Abobaker Mohammed Qasem Farhan, Shangming Yang, Mohammed Al-Nehari

Abstract:

Pneumonia is a kind of lung disease that creates congestion in the chest. Such pneumonic conditions lead to loss of life due to the severity of high congestion. Pneumonic lung disease is caused by viral pneumonia, bacterial pneumonia, or COVID-19 induced pneumonia. The early prediction and classification of such lung diseases help reduce the mortality rate. We propose the automatic Computer-Aided Diagnosis (CAD) system in this paper using the deep learning approach. The proposed CAD system takes input from raw computerized tomography (CT) scans of the patient's chest and automatically predicts disease classification. We designed the Hybrid Deep Learning Algorithm (HDLA) to improve accuracy and reduce processing requirements. The raw CT scans are pre-processed first to enhance their quality for further analysis. We then applied a hybrid model that consists of automatic feature extraction and classification. We propose the robust 2D Convolutional Neural Network (CNN) model to extract the automatic features from the pre-processed CT image. This CNN model assures feature learning with extremely effective 1D feature extraction for each input CT image. The outcome of the 2D CNN model is then normalized using the Min-Max technique. The second step of the proposed hybrid model is related to training and classification using different classifiers. The simulation outcomes using the publicly available dataset prove the robustness and efficiency of the proposed model compared to state-of-art algorithms.

Keywords: CT scans, COVID-19, deep learning, image processing, pneumonia, lung disease.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 536
7230 Efficient Lossless Compression of Weather Radar Data

Authors: Wei-hua Ai, Wei Yan, Xiang Li

Abstract:

Data compression is used operationally to reduce bandwidth and storage requirements. An efficient method for achieving lossless weather radar data compression is presented. The characteristics of the data are taken into account and the optical linear prediction is used for the PPI images in the weather radar data in the proposed method. The next PPI image is identical to the current one and a dramatic reduction in source entropy is achieved by using the prediction algorithm. Some lossless compression methods are used to compress the predicted data. Experimental results show that for the weather radar data, the method proposed in this paper outperforms the other methods.

Keywords: Lossless compression, weather radar data, optical linear prediction, PPI image

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2229
7229 Conceptualizing the Knowledge to Manage and Utilize Data Assets in the Context of Digitization: Case Studies of Multinational Industrial Enterprises

Authors: Martin Böhmer, Agatha Dabrowski, Boris Otto

Abstract:

The trend of digitization significantly changes the role of data for enterprises. Data turn from an enabler to an intangible organizational asset that requires management and qualifies as a tradeable good. The idea of a networked economy has gained momentum in the data domain as collaborative approaches for data management emerge. Traditional organizational knowledge consequently needs to be extended by comprehensive knowledge about data. The knowledge about data is vital for organizations to ensure that data quality requirements are met and data can be effectively utilized and sovereignly governed. As this specific knowledge has been paid little attention to so far by academics, the aim of the research presented in this paper is to conceptualize it by proposing a “data knowledge model”. Relevant model entities have been identified based on a design science research (DSR) approach that iteratively integrates insights of various industry case studies and literature research.

Keywords: Data management, digitization, Industry 4.0, knowledge engineering, metamodel.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1421
7228 A Methodology for Data Migration between Different Database Management Systems

Authors: Bogdan Walek, Cyril Klimes

Abstract:

In present days the area of data migration is very topical. Current tools for data migration in the area of relational database have several disadvantages that are presented in this paper. We propose a methodology for data migration of the database tables and their data between various types of relational database systems (RDBMS). The proposed methodology contains an expert system. The expert system contains a knowledge base that is composed of IFTHEN rules and based on the input data suggests appropriate data types of columns of database tables. The proposed tool, which contains an expert system, also includes the possibility of optimizing the data types in the target RDBMS database tables based on processed data of the source RDBMS database tables. The proposed expert system is shown on data migration of selected database of the source RDBMS to the target RDBMS.

Keywords: Expert system, fuzzy, data migration, database, relational database, data type, relational database management system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3452
7227 Identification of Conserved Domains and Motifs for GRF Gene Family

Authors: Jafar Ahmadi, Nafiseh Noormohammadi, Sedigheh Fabriki Ourang

Abstract:

GRF, Growth regulating factor, genes encode a novel class of plant-specific transcription factors. The GRF proteins play a role in the regulation of cell numbers in young and growing tissues and may act as transcription activations in growth and development of plants. Identification of GRF genes and their expression are important in plants to performance of the growth and development of various organs. In this study, to better understanding the structural and functional differences of GRFs family, 45 GRF proteins sequences in A. thaliana, Z. mays, O. sativa, B. napus, B. rapa, H. vulgare and S. bicolor, have been collected and analyzed through bioinformatics data mining. As a result, in secondary structure of GRFs, the number of alpha helices was more than beta sheets and in all of them QLQ domains were completely in the biggest alpha helix. In all GRFs, QLQ and WRC domains were completely protected except in AtGRF9. These proteins have no trans-membrane domain and due to have nuclear localization signals act in nuclear and they are component of unstable proteins in the test tube.

Keywords: Domain, Gene Family, GRF, Motif.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2295