Search results for: Large Data
8727 Performance Evaluation of Neural Network Prediction for Data Prefetching in Embedded Applications
Authors: Sofien Chtourou, Mohamed Chtourou, Omar Hammami
Abstract:
Embedded systems need to respect stringent real time constraints. Various hardware components included in such systems such as cache memories exhibit variability and therefore affect execution time. Indeed, a cache memory access from an embedded microprocessor might result in a cache hit where the data is available or a cache miss and the data need to be fetched with an additional delay from an external memory. It is therefore highly desirable to predict future memory accesses during execution in order to appropriately prefetch data without incurring delays. In this paper, we evaluate the potential of several artificial neural networks for the prediction of instruction memory addresses. Neural network have the potential to tackle the nonlinear behavior observed in memory accesses during program execution and their demonstrated numerous hardware implementation emphasize this choice over traditional forecasting techniques for their inclusion in embedded systems. However, embedded applications execute millions of instructions and therefore millions of addresses to be predicted. This very challenging problem of neural network based prediction of large time series is approached in this paper by evaluating various neural network architectures based on the recurrent neural network paradigm with pre-processing based on the Self Organizing Map (SOM) classification technique.Keywords: Address, data set, memory, prediction, recurrentneural network.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16758726 Energy Consumption Forecast Procedure for an Industrial Facility
Authors: Tatyana Aleksandrovna Barbasova, Lev Sergeevich Kazarinov, Olga Valerevna Kolesnikova, Aleksandra Aleksandrovna Filimonova
Abstract:
We regard forecasting of energy consumption by private production areas of a large industrial facility as well as by the facility itself. As for production areas, the forecast is made based on empirical dependencies of the specific energy consumption and the production output. As for the facility itself, implementation of the task to minimize the energy consumption forecasting error is based on adjustment of the facility’s actual energy consumption values evaluated with the metering device and the total design energy consumption of separate production areas of the facility. The suggested procedure of optimal energy consumption was tested based on the actual data of core product output and energy consumption by a group of workshops and power plants of the large iron and steel facility. Test results show that implementation of this procedure gives the mean accuracy of energy consumption forecasting for winter 2014 of 0.11% for the group of workshops and 0.137% for the power plants.Keywords: Energy consumption, energy consumption forecasting error, energy efficiency, forecasting accuracy, forecasting.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17208725 A Hybrid Approach for Quantification of Novelty in Rule Discovery
Authors: Vasudha Bhatnagar, Ahmed Sultan Al-Hegami, Naveen Kumar
Abstract:
Rule Discovery is an important technique for mining knowledge from large databases. Use of objective measures for discovering interesting rules lead to another data mining problem, although of reduced complexity. Data mining researchers have studied subjective measures of interestingness to reduce the volume of discovered rules to ultimately improve the overall efficiency of KDD process. In this paper we study novelty of the discovered rules as a subjective measure of interestingness. We propose a hybrid approach that uses objective and subjective measures to quantify novelty of the discovered rules in terms of their deviations from the known rules. We analyze the types of deviation that can arise between two rules and categorize the discovered rules according to the user specified threshold. We implement the proposed framework and experiment with some public datasets. The experimental results are quite promising.
Keywords: Knowledge Discovery in Databases (KDD), Data Mining, Rule Discovery, Interestingness, Subjective Measures, Novelty Measure.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13548724 Parallel Direct Integration Variable Step Block Method for Solving Large System of Higher Order Ordinary Differential Equations
Authors: Zanariah Abdul Majid, Mohamed Suleiman
Abstract:
The aim of this paper is to investigate the performance of the developed two point block method designed for two processors for solving directly non stiff large systems of higher order ordinary differential equations (ODEs). The method calculates the numerical solution at two points simultaneously and produces two new equally spaced solution values within a block and it is possible to assign the computational tasks at each time step to a single processor. The algorithm of the method was developed in C language and the parallel computation was done on a parallel shared memory environment. Numerical results are given to compare the efficiency of the developed method to the sequential timing. For large problems, the parallel implementation produced 1.95 speed-up and 98% efficiency for the two processors.Keywords: Numerical methods, parallel method, block method, higher order ODEs.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13828723 The Most Secure Smartphone Operating System: A Survey
Authors: Sundus Ayyaz, Saad Rehman
Abstract:
In the recent years, a fundamental revolution in the Mobile Phone technology from just being able to provide voice and short message services to becoming the most essential part of our lives by connecting to network and various app stores for downloading software apps of almost every activity related to our life from finding location to banking from getting news updates to downloading HD videos and so on. This progress in Smart Phone industry has modernized and transformed our way of living into a trouble-free world. The smart phone has become our personal computers with the addition of significant features such as multi core processors, multi-tasking, large storage space, bluetooth, WiFi, including large screen and cameras. With this evolution, the rise in the security threats have also been amplified. In Literature, different threats related to smart phones have been highlighted and various precautions and solutions have been proposed to keep the smart phone safe which carries all the private data of a user. In this paper, a survey has been carried out to find out the most secure and the most unsecure smart phone operating system among the most popular smart phones in use today.Keywords: Smart phone, operating system, security threats, Android, iOS, Balckberry, Windows.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 41808722 Compressible Lattice Boltzmann Method for Turbulent Jet Flow Simulations
Authors: K. Noah, F.-S. Lien
Abstract:
In Computational Fluid Dynamics (CFD), there are a variety of numerical methods, of which some depend on macroscopic model representatives. These models can be solved by finite-volume, finite-element or finite-difference methods on a microscopic description. However, the lattice Boltzmann method (LBM) is considered to be a mesoscopic particle method, with its scale lying between the macroscopic and microscopic scales. The LBM works well for solving incompressible flow problems, but certain limitations arise from solving compressible flows, particularly at high Mach numbers. An improved lattice Boltzmann model for compressible flow problems is presented in this research study. A higher-order Taylor series expansion of the Maxwell equilibrium distribution function is used to overcome limitations in LBM when solving high-Mach-number flows. Large eddy simulation (LES) is implemented in LBM to simulate turbulent jet flows. The results have been validated with available experimental data for turbulent compressible free jet flow at subsonic speeds.
Keywords: Compressible lattice Boltzmann metho-, large eddy simulation, turbulent jet flows.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9538721 Mining Correlated Bicluster from Web Usage Data Using Discrete Firefly Algorithm Based Biclustering Approach
Authors: K. Thangavel, R. Rathipriya
Abstract:
For the past one decade, biclustering has become popular data mining technique not only in the field of biological data analysis but also in other applications like text mining, market data analysis with high-dimensional two-way datasets. Biclustering clusters both rows and columns of a dataset simultaneously, as opposed to traditional clustering which clusters either rows or columns of a dataset. It retrieves subgroups of objects that are similar in one subgroup of variables and different in the remaining variables. Firefly Algorithm (FA) is a recently-proposed metaheuristic inspired by the collective behavior of fireflies. This paper provides a preliminary assessment of discrete version of FA (DFA) while coping with the task of mining coherent and large volume bicluster from web usage dataset. The experiments were conducted on two web usage datasets from public dataset repository whereby the performance of FA was compared with that exhibited by other population-based metaheuristic called binary Particle Swarm Optimization (PSO). The results achieved demonstrate the usefulness of DFA while tackling the biclustering problem.
Keywords: Biclustering, Binary Particle Swarm Optimization, Discrete Firefly Algorithm, Firefly Algorithm, Usage profile Web usage mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21338720 Context Generation with Image Based Sensors: An Interdisciplinary Enquiry on Technical and Social Issues and their Implications for System Design
Authors: Julia Moehrmann, Gunter Heidemann, Oliver Siemoneit, Christoph Hubig, Uwe-Philipp Kaeppeler, Paul Levi
Abstract:
Image data holds a large amount of different context information. However, as of today, these resources remain largely untouched. It is thus the aim of this paper to present a basic technical framework which allows for a quick and easy exploitation of context information from image data especially by non-expert users. Furthermore, the proposed framework is discussed in detail concerning important social and ethical issues which demand special requirements in system design. Finally, a first sensor prototype is presented which meets the identified requirements. Additionally, necessary implications for the software and hardware design of the system are discussed, rendering a sensor system which could be regarded as a good, acceptable and justifiable technical and thereby enabling the extraction of context information from image data.Keywords: Context-aware computing, ethical and social issues, image recognition, requirements in system design.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16678719 A Generic Middleware to Instantly Sync Intensive Writes of Heterogeneous Massive Data via Internet
Authors: Haitao Yang, Zhenjiang Ruan, Fei Xu, Lanting Xia
Abstract:
Industry data centers often need to sync data changes reliably and instantly from a large-scale of heterogeneous autonomous relational databases accessed via the not-so-reliable Internet, for which a practical generic sync middleware of low maintenance and operation costs is most wanted. To this demand, this paper presented a generic sync middleware system (GSMS), which has been developed, applied and optimized since 2006, holding the principles or advantages that it must be SyncML-compliant and transparent to data application layer logic without referring to implementation details of databases synced, does not rely on host computer operating systems deployed, and its construction is light weighted and hence of low cost. Regarding these hard commitments of developing GSMS, in this paper we stressed the significant optimization breakthrough of GSMS sync delay being well below a fraction of millisecond per record sync. A series of ultimate tests with GSMS sync performance were conducted for a persuasive example, in which the source relational database underwent a broad range of write loads (from one thousand to one million intensive writes within a few minutes). All these tests showed that the performance of GSMS is competent and smooth even under ultimate write loads.
Keywords: Heterogeneous massive data, instantly sync intensive writes, Internet generic middleware design, optimization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4548718 Generating Frequent Patterns through Intersection between Transactions
Authors: M. Jamali, F. Taghiyareh
Abstract:
The problem of frequent itemset mining is considered in this paper. One new technique proposed to generate frequent patterns in large databases without time-consuming candidate generation. This technique is based on focusing on transaction instead of concentrating on itemset. This algorithm based on take intersection between one transaction and others transaction and the maximum shared items between transactions computed instead of creating itemset and computing their frequency. With applying real life transactions and some consumption is taken from real life data, the significant efficiency acquire from databases in generation association rules mining.Keywords: Association rules, data mining, frequent patterns, shared itemset.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14038717 Automatic Visualization Pipeline Formation for Medical Datasets on Grid Computing Environment
Authors: Aboamama Atahar Ahmed, Muhammad Shafie Abd Latiff, Kamalrulnizam Abu Bakar, Zainul AhmadRajion
Abstract:
Distance visualization of large datasets often takes the direction of remote viewing and zooming techniques of stored static images. However, the continuous increase in the size of datasets and visualization operation causes insufficient performance with traditional desktop computers. Additionally, the visualization techniques such as Isosurface depend on the available resources of the running machine and the size of datasets. Moreover, the continuous demand for powerful computing powers and continuous increase in the size of datasets results an urgent need for a grid computing infrastructure. However, some issues arise in current grid such as resources availability at the client machines which are not sufficient enough to process large datasets. On top of that, different output devices and different network bandwidth between the visualization pipeline components often result output suitable for one machine and not suitable for another. In this paper we investigate how the grid services could be used to support remote visualization of large datasets and to break the constraint of physical co-location of the resources by applying the grid computing technologies. We show our grid enabled architecture to visualize large medical datasets (circa 5 million polygons) for remote interactive visualization on modest resources clients.
Keywords: Visualization, Grid computing, Medical datasets, visualization techniques, thin clients, Globus toolkit, VTK.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17518716 A Reduced-Bit Multiplication Algorithm for Digital Arithmetic
Authors: Harpreet Singh Dhillon, Abhijit Mitra
Abstract:
A reduced-bit multiplication algorithm based on the ancient Vedic multiplication formulae is proposed in this paper. Both the Vedic multiplication formulae, Urdhva tiryakbhyam and Nikhilam, are first discussed in detail. Urdhva tiryakbhyam, being a general multiplication formula, is equally applicable to all cases of multiplication. It is applied to the digital arithmetic and is shown to yield a multiplier architecture which is very similar to the popular array multiplier. Due to its structure, it leads to a high carry propagation delay in case of multiplication of large numbers. Nikhilam Sutra, on the other hand, is more efficient in the multiplication of large numbers as it reduces the multiplication of two large numbers to that of two smaller numbers. The framework of the proposed algorithm is taken from this Sutra and is further optimized by use of some general arithmetic operations such as expansion and bit-shifting to take advantage of bit-reduction in multiplication. We illustrate the proposed algorithm by reducing a general 4x4-bit multiplication to a single 2 x 2-bit multiplication operation.
Keywords: Multiplication, algorithm, Vedic mathematics, digital arithmetic, reduced-bit.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 34548715 Effects of Synthetic Jet in Suppressing Cavity Oscillations
Abstract:
The three-dimensional incompressible flow past a rectangular open cavity is investigated, where the aspect ratio of the cavity is considered as 4. The principle objective is to use large-eddy simulation to resolve and control the large-scale structures, which are largely responsible for flow oscillations in a cavity. The flow past an open cavity is very common in aerospace applications and can be a cause of acoustic source due to hydrodynamic instability of the shear layer and its interactions with the downstream edge. The unsteady Navier-stokes equations have been solved on a staggered mesh using a symmetry-preserving central difference scheme. Synthetic jet has been used as an active control to suppress the cavity oscillations in wake mode for a Reynolds number of ReD = 3360. The effect of synthetic jet has been studied by varying the jet amplitude and frequency, which is placed at the upstream wall of the cavity. The study indicates that there exits a frequency band, which is larger than a critical value, is effective in attenuating cavity oscillations when blowing ratio is more than 1.0.Keywords: Cavity oscillation, Large Eddy Simulation, Synthetic Jet, Flow Control, Turbulence
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18158714 Study of Proton-9,11Li Elastic Scattering at 60~75 MeV/Nucleon
Authors: Arafa A. Alholaisi, Jamal H. Madani, M. A. Alvi
Abstract:
The radial form of nuclear matter distribution, charge and the shape of nuclei are essential properties of nuclei, and hence, are of great attention for several areas of research in nuclear physics. More than last three decades have witnessed a range of experimental means employing leptonic probes (such as muons, electrons etc.) for exploring nuclear charge distributions, whereas the hadronic probes (for example alpha particles, protons, etc.) have been used to investigate the nuclear matter distributions. In this paper, p-9,11Li elastic scattering differential cross sections in the energy range to MeV have been studied by means of Coulomb modified Glauber scattering formalism. By applying the semi-phenomenological Bhagwat-Gambhir-Patil [BGP] nuclear density for loosely bound neutron rich 11Li nucleus, the estimated matter radius is found to be 3.446 fm which is quite large as compared to so known experimental value 3.12 fm. The results of microscopic optical model based calculation by applying Bethe-Brueckner–Hartree–Fock formalism (BHF) have also been compared. It should be noted that in most of phenomenological density model used to reproduce the p-11Li differential elastic scattering cross sections data, the calculated matter radius lies between 2.964 and 3.55 fm. The calculated results with phenomenological BGP model density and with nucleon density calculated in the relativistic mean-field (RMF) reproduces p-9Li and p-11Li experimental data quite nicely as compared to Gaussian- Gaussian or Gaussian-Oscillator densities at all energies under consideration. In the approach described here, no free/adjustable parameter has been employed to reproduce the elastic scattering data as against the well-known optical model based studies that involve at least four to six adjustable parameters to match the experimental data. Calculated reaction cross sections σR for p-11Li at these energies are quite large as compared to estimated values reported by earlier works though so far no experimental studies have been performed to measure it.
Keywords: Bhagwat-Gambhir-Patil density, coulomb modified Glauber model, halo nucleus, optical limit approximation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7258713 Straight Line Defect Detection with Feed Forward Neural Network
Authors: S. Liangwongsan, A. Oonsivilai
Abstract:
Nowadays, hard disk is one of the most popular storage components. In hard disk industry, the hard disk drive must pass various complex processes and tested systems. In each step, there are some failures. To reduce waste from these failures, we must find the root cause of those failures. Conventionall data analysis method is not effective enough to analyze the large capacity of data. In this paper, we proposed the Hough method for straight line detection that helps to detect straight line defect patterns that occurs in hard disk drive. The proposed method will help to increase more speed and accuracy in failure analysis.
Keywords: Hough Transform, Failure Analysis, Media, Hard Disk Drive
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20948712 Implementation of an IoT Sensor Data Collection and Analysis Library
Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee
Abstract:
Due to the development of information technology and wireless Internet technology, various data are being generated in various fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, more various information can be extracted. In addition, development and dissemination of boards such as Arduino and Raspberry Pie have made it possible to easily test various sensors, and it is possible to collect sensor data directly by using database application tools such as MySQL. These directly collected data can be used for various research and can be useful as data for data mining. However, there are many difficulties in using the board to collect data, and there are many difficulties in using it when the user is not a computer programmer, or when using it for the first time. Even if data are collected, lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.
Keywords: Clustering, data mining, DBSCAN, k-means, k-medoids, sensor data.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20108711 Knowledge Representation and Inconsistency Reasoning of Class Diagram Maintenance in Big Data
Authors: Chi-Lun Liu
Abstract:
Requirements modeling and analysis are important in successful information systems' maintenance. Unified Modeling Language (UML) class diagrams are useful standards for modeling information systems. To our best knowledge, there is a lack of a systems development methodology described by the organism metaphor. The core concept of this metaphor is adaptation. Using the knowledge representation and reasoning approach and ontologies to adopt new requirements are emergent in recent years. This paper proposes an organic methodology which is based on constructivism theory. This methodology is a knowledge representation and reasoning approach to analyze new requirements in the class diagrams maintenance. The process and rules in the proposed methodology automatically analyze inconsistencies in the class diagram. In the big data era, developing an automatic tool based on the proposed methodology to analyze large amounts of class diagram data is an important research topic in the future.
Keywords: Knowledge representation, reasoning, ontology, class diagram, software engineering.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 10428710 Sleep Scheduling Schemes Based on Location of Mobile User in Sensor-Cloud
Authors: N. Mahendran, R. Priya
Abstract:
The mobile cloud computing (MCC) with wireless sensor networks (WSNs) technology gets more attraction by research scholars because its combines the sensors data gathering ability with the cloud data processing capacity. This approach overcomes the limitation of data storage capacity and computational ability of sensor nodes. Finally, the stored data are sent to the mobile users when the user sends the request. The most of the integrated sensor-cloud schemes fail to observe the following criteria: 1) The mobile users request the specific data to the cloud based on their present location. 2) Power consumption since most of them are equipped with non-rechargeable batteries. Mostly, the sensors are deployed in hazardous and remote areas. This paper focuses on above observations and introduces an approach known as collaborative location-based sleep scheduling (CLSS) scheme. Both awake and asleep status of each sensor node is dynamically devised by schedulers and the scheduling is done purely based on the of mobile users’ current location; in this manner, large amount of energy consumption is minimized at WSN. CLSS work depends on two different methods; CLSS1 scheme provides lower energy consumption and CLSS2 provides the scalability and robustness of the integrated WSN.
Keywords: Sleep scheduling, mobile cloud computing, wireless sensor network, integration, location, network lifetime.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9768709 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles
Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis
Abstract:
Organizations, including governments, generate (big) data that are high in volume, velocity, veracity, and come from a variety of sources. Public Administrations are using (big) data, implementing base registries, and enforcing data sharing within the entire government to deliver (big) data related integrated services, provision of insights to users, and for good governance. Government (Big) data ecosystem actors represent distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services like data storage, hosting services to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of government (big) data ecosystem and a classification of government (big) data ecosystem actors and their roles. We showcase a graphical view of actors, roles, and their relationship in the government (big) data ecosystem. We also discuss our research findings. We did not find too much published research articles about the government (big) data ecosystem, including its definition and classification of actors and their roles. Therefore, we lent ideas for the government (big) data ecosystem from numerous areas that include scientific research data, humanitarian data, open government data, industry data, in the literature.
Keywords: Big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21438708 Large Eddy Simulation of Compartment Fire with Gas Combustible
Authors: Mliki Bouchmel, Abbassi Mohamed Ammar, Kamel Geudri, Chrigui Mouldi, Omri Ahmed
Abstract:
The objective of this work is to use the Fire Dynamics Simulator (FDS) to investigate the behavior of a kerosene small-scale fire. FDS is a Computational Fluid Dynamics (CFD) tool developed specifically for fire applications. Throughout its development, FDS is used for the resolution of practical problems in fire protection engineering. At the same time FDS is used to study fundamental fire dynamics and combustion. Predictions are based on Large Eddy Simulation (LES) with a Smagorinsky turbulence model. LES directly computes the large-scale eddies and the sub-grid scale dissipative processes are modeled. This technique is the default turbulence model which was used in this study. The validation of the numerical prediction is done using a direct comparison of combustion output variables to experimental measurements. Effect of the mesh size on the temperature evolutions is investigated and optimum grid size is suggested. Effect of width openings is investigated. Temperature distribution and species flow are presented for different operating conditions. The effect of the composition of the used fuel on atmospheric pollution is also a focus point within this work. Good predictions are obtained where the size of the computational cells within the fire compartment is less than 1/10th of the characteristic fire diameter.
Keywords: Large eddy simulation, Radiation, Turbulence, combustion, pollution.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21778707 Correspondence between Function and Interaction in Protein Interaction Network of Saccaromyces cerevisiae
Authors: Nurcan Tuncbag, Turkan Haliloglu, Ozlem Keskin
Abstract:
Understanding the cell's large-scale organization is an interesting task in computational biology. Thus, protein-protein interactions can reveal important organization and function of the cell. Here, we investigated the correspondence between protein interactions and function for the yeast. We obtained the correlations among the set of proteins. Then these correlations are clustered using both the hierarchical and biclustering methods. The detailed analyses of proteins in each cluster were carried out by making use of their functional annotations. As a result, we found that some functional classes appear together in almost all biclusters. On the other hand, in hierarchical clustering, the dominancy of one functional class is observed. In the light of the clustering data, we have verified some interactions which were not identified as core interactions in DIP and also, we have characterized some functionally unknown proteins according to the interaction data and functional correlation. In brief, from interaction data to function, some correlated results are noticed about the relationship between interaction and function which might give clues about the organization of the proteins, also to predict new interactions and to characterize functions of unknown proteins.Keywords: Pair-wise protein interactions, DIP database, functional correlations, biclustering.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15908706 Oil Debris Signal Detection Based on Integral Transform and Empirical Mode Decomposition
Authors: Chuan Li, Ming Liang
Abstract:
Oil debris signal generated from the inductive oil debris monitor (ODM) is useful information for machine condition monitoring but is often spoiled by background noise. To improve the reliability in machine condition monitoring, the high-fidelity signal has to be recovered from the noisy raw data. Considering that the noise components with large amplitude often have higher frequency than that of the oil debris signal, the integral transform is proposed to enhance the detectability of the oil debris signal. To cancel out the baseline wander resulting from the integral transform, the empirical mode decomposition (EMD) method is employed to identify the trend components. An optimal reconstruction strategy including both de-trending and de-noising is presented to detect the oil debris signal with less distortion. The proposed approach is applied to detect the oil debris signal in the raw data collected from an experimental setup. The result demonstrates that this approach is able to detect the weak oil debris signal with acceptable distortion from noisy raw data.Keywords: Integral transform, empirical mode decomposition, oil debris, signal processing, detection.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17168705 Using Data Mining Techniques for Finding Cardiac Outlier Patients
Authors: Farhan Ismaeel Dakheel, Raoof Smko, K. Negrat, Abdelsalam Almarimi
Abstract:
In this paper we used data mining techniques to identify outlier patients who are using large amount of drugs over a long period of time. Any healthcare or health insurance system should deal with the quantities of drugs utilized by chronic diseases patients. In Kingdom of Bahrain, about 20% of health budget is spent on medications. For the managers of healthcare systems, there is no enough information about the ways of drug utilization by chronic diseases patients, is there any misuse or is there outliers patients. In this work, which has been done in cooperation with information department in the Bahrain Defence Force hospital; we select the data for Cardiac patients in the period starting from 1/1/2008 to December 31/12/2008 to be the data for the model in this paper. We used three techniques for finding the drug utilization for cardiac patients. First we applied a clustering technique, followed by measuring of clustering validity, and finally we applied a decision tree as classification algorithm. The clustering results is divided into three clusters according to the drug utilization, for 1603 patients, who received 15,806 prescriptions during this period can be partitioned into three groups, where 23 patients (2.59%) who received 1316 prescriptions (8.32%) are classified to be outliers. The classification algorithm shows that the use of average drug utilization and the age, and the gender of the patient can be considered to be the main predictive factors in the induced model.Keywords: Data Mining, Clustering, Classification, Drug Utilization..
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18988704 Imputation Technique for Feature Selection in Microarray Data Set
Authors: Younies Mahmoud, Mai Mabrouk, Elsayed Sallam
Abstract:
Analyzing DNA microarray data sets is a great challenge, which faces the bioinformaticians due to the complication of using statistical and machine learning techniques. The challenge will be doubled if the microarray data sets contain missing data, which happens regularly because these techniques cannot deal with missing data. One of the most important data analysis process on the microarray data set is feature selection. This process finds the most important genes that affect certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.
Keywords: DNA microarray, feature selection, missing data, bioinformatics.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 27918703 Quality Classification and Monitoring Using Adaptive Metric Distance and Neural Networks: Application in Pickling Process
Authors: S. Bouhouche, M. Lahreche, S. Ziani, J. Bast
Abstract:
Modern manufacturing facilities are large scale, highly complex, and operate with large number of variables under closed loop control. Early and accurate fault detection and diagnosis for these plants can minimise down time, increase the safety of plant operations, and reduce manufacturing costs. Fault detection and isolation is more complex particularly in the case of the faulty analog control systems. Analog control systems are not equipped with monitoring function where the process parameters are continually visualised. In this situation, It is very difficult to find the relationship between the fault importance and its consequences on the product failure. We consider in this paper an approach to fault detection and analysis of its effect on the production quality using an adaptive centring and scaling in the pickling process in cold rolling. The fault appeared on one of the power unit driving a rotary machine, this machine can not track a reference speed given by another machine. The length of metal loop is then in continuous oscillation, this affects the product quality. Using a computerised data acquisition system, the main machine parameters have been monitored. The fault has been detected and isolated on basis of analysis of monitored data. Normal and faulty situation have been obtained by an artificial neural network (ANN) model which is implemented to simulate the normal and faulty status of rotary machine. Correlation between the product quality defined by an index and the residual is used to quality classification.Keywords: Modeling, fault detection and diagnosis, parameters estimation, neural networks, Fault Detection and Diagnosis (FDD), pickling process.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15778702 Automatic Real-Patient Medical Data De-Identification for Research Purposes
Authors: Petr Vcelak, Jana Kleckova
Abstract:
Our Medicine-oriented research is based on a medical data set of real patients. It is a security problem to share patient private data with peoples other than clinician or hospital staff. We have to remove person identification information from medical data. The medical data without private data are available after a de-identification process for any research purposes. In this paper, we introduce an universal automatic rule-based de-identification application to do all this stuff on an heterogeneous medical data. A patient private identification is replaced by an unique identification number, even in burnedin annotation in pixel data. The identical identification is used for all patient medical data, so it keeps relationships in a data. Hospital can take an advantage of a research feedback based on results.Keywords: DASTA, De-identification, DICOM, Health Level Seven, Medical data, OCR, Personal data
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16428701 Web Log Mining by an Improved AprioriAll Algorithm
Authors: Wang Tong, He Pi-lian
Abstract:
This paper sets forth the possibility and importance about applying Data Mining in Web logs mining and shows some problems in the conventional searching engines. Then it offers an improved algorithm based on the original AprioriAll algorithm which has been used in Web logs mining widely. The new algorithm adds the property of the User ID during the every step of producing the candidate set and every step of scanning the database by which to decide whether an item in the candidate set should be put into the large set which will be used to produce next candidate set. At the meantime, in order to reduce the number of the database scanning, the new algorithm, by using the property of the Apriori algorithm, limits the size of the candidate set in time whenever it is produced. Test results show the improved algorithm has a more lower complexity of time and space, better restrain noise and fit the capacity of memory.
Keywords: Candidate Sets Pruning, Data Mining, ImprovedAlgorithm, Noise Restrain, Web Log
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 22808700 Analyzing Multi-Labeled Data Based on the Roll of a Concept against a Semantic Range
Authors: Masahiro Kuzunishi, Tetsuya Furukawa, Ke Lu
Abstract:
Classifying data hierarchically is an efficient approach to analyze data. Data is usually classified into multiple categories, or annotated with a set of labels. To analyze multi-labeled data, such data must be specified by giving a set of labels as a semantic range. There are some certain purposes to analyze data. This paper shows which multi-labeled data should be the target to be analyzed for those purposes, and discusses the role of a label against a set of labels by investigating the change when a label is added to the set of labels. These discussions give the methods for the advanced analysis of multi-labeled data, which are based on the role of a label against a semantic range.Keywords: Classification Hierarchies, Data Analysis, Multilabeled Data, Orders of Sets of Labels
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12088699 A Rule-based Approach for Anomaly Detection in Subscriber Usage Pattern
Authors: Rupesh K. Gopal, Saroj K. Meher
Abstract:
In this report we present a rule-based approach to detect anomalous telephone calls. The method described here uses subscriber usage CDR (call detail record) data sampled over two observation periods: study period and test period. The study period contains call records of customers- non-anomalous behaviour. Customers are first grouped according to their similar usage behaviour (like, average number of local calls per week, etc). For customers in each group, we develop a probabilistic model to describe their usage. Next, we use maximum likelihood estimation (MLE) to estimate the parameters of the calling behaviour. Then we determine thresholds by calculating acceptable change within a group. MLE is used on the data in the test period to estimate the parameters of the calling behaviour. These parameters are compared against thresholds. Any deviation beyond the threshold is used to raise an alarm. This method has the advantage of identifying local anomalies as compared to techniques which identify global anomalies. The method is tested for 90 days of study data and 10 days of test data of telecom customers. For medium to large deviations in the data in test window, the method is able to identify 90% of anomalous usage with less than 1% false alarm rate.Keywords: Subscription fraud, fraud detection, anomalydetection, maximum likelihood estimation, rule based systems.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 28138698 An EWMA p Chart Based On Improved Square Root Transformation
Authors: S. Sukparungsee
Abstract:
Generally, the traditional Shewhart p chart has been developed by for charting the binomial data. This chart has been developed using the normal approximation with condition as low defect level and the small to moderate sample size. In real applications, however, are away from these assumptions due to skewness in the exact distribution. In this paper, a modified Exponentially Weighted Moving Average (EWMA) control chat for detecting a change in binomial data by improving square root transformations, namely ISRT p EWMA control chart. The numerical results show that ISRT p EWMA chart is superior to ISRT p chart for small to moderate shifts, otherwise, the latter is better for large shifts.
Keywords: Number of defects, Exponentially Weighted Moving Average, Average Run Length, Square root transformations.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2485