Search results for: Spatial data
6467 Using Data Mining Techniques for Finding Cardiac Outlier Patients
Authors: Farhan Ismaeel Dakheel, Raoof Smko, K. Negrat, Abdelsalam Almarimi
Abstract:
In this paper, we use data mining techniques to identify outlier patients who use large amounts of drugs over a long period of time. Any healthcare or health insurance system must deal with the quantities of drugs used by patients with chronic diseases. In the Kingdom of Bahrain, about 20% of the health budget is spent on medications. Managers of healthcare systems do not have enough information about how patients with chronic diseases use drugs, whether there is any misuse, or whether there are outlier patients. In this work, carried out in cooperation with the information department of the Bahrain Defence Force hospital, we selected the data for cardiac patients in the period from 1/1/2008 to 31/12/2008 as the data for the model. We used three techniques to analyze drug utilization for cardiac patients: first we applied a clustering technique, then we measured clustering validity, and finally we applied a decision tree as the classification algorithm. The clustering partitions the 1603 patients, who received 15,806 prescriptions during this period, into three groups according to drug utilization; 23 patients (2.59%), who received 1316 prescriptions (8.32%), are classified as outliers. The classification algorithm shows that average drug utilization, age, and gender of the patient can be considered the main predictive factors in the induced model.
Keywords: Data Mining, Clustering, Classification, Drug Utilization.
Downloads 1899
6466 Retail Strategy to Reduce Waste Keeping High Profit Utilizing Taylor's Law in Point-of-Sales Data
Authors: Gen Sakoda, Hideki Takayasu, Misako Takayasu
Abstract:
Waste reduction is a fundamental problem for sustainability. Methods for waste reduction with point-of-sales (POS) data are proposed, utilizing the knowledge of a recent econophysics study on a statistical property of POS data. Concretely, the non-stationary time series analysis method based on the Particle Filter is developed, which considers abnormal fluctuation scaling known as Taylor's law. This method is extended for handling incomplete sales data because of stock-outs by introducing maximum likelihood estimation for censored data. The way for optimal stock determination with pricing the cost of waste reduction is also proposed. This study focuses on the examination of the methods for large sales numbers where Taylor's law is obvious. Numerical analysis using aggregated POS data shows the effectiveness of the methods to reduce food waste maintaining a high profit for large sales numbers. Moreover, the way of pricing the cost of waste reduction reveals that a small profit loss realizes substantial waste reduction, especially in the case that the proportionality constant of Taylor’s law is small. Specifically, around 1% profit loss realizes half disposal at =0.12, which is the actual value of processed food items used in this research. The methods provide practical and effective solutions for waste reduction keeping a high profit, especially with large sales numbers.
Keywords: Food waste reduction, particle filter, point of sales, sustainable development goals, Taylor's Law, time series analysis.
Downloads 871
6465 Experimental teaching, Perceived usefulness, Ease of use, Learning Interest and Science Achievement of Taiwan 8th Graders in TIMSS 2007 Database
Authors: Pei Wen Liao, Tsung Hau Jen
Abstract:
Data from Taiwanese 8th graders in the 4th cycle of the Trends in International Mathematics and Science Study (TIMSS) are analyzed to examine the influence of science teachers' preference for experimental teaching on the relationships between the affective variables (the perceived usefulness of science, ease of using science, and science learning interest) and academic achievement in science. After handling the missing data, data from 3711 students and 145 science teachers were analyzed using Hierarchical Linear Modeling. The major objective of this study was to determine whether experimental teaching moderates the relationship between perceived usefulness and achievement.
Keywords: TIMSS database, Science achievement, Experimental teaching, Perceived Usefulness, Perceived Ease of Use
Downloads 1657
6464 Topographic Arrangement of 3D Design Components on 2D Maps by Unsupervised Feature Extraction
Authors: Stefan Menzel
Abstract:
The daily workflow in companies' design development departments generates databases containing huge numbers of 3D geometric models. For a given problem, engineers create CAD drawings based on their design ideas and evaluate the performance of the resulting design, e.g. by computational simulations. Usually, new geometries are built either by utilizing and modifying sets of existing components or by adding single newly designed parts to a more complex design. The present paper addresses two facets: acquiring components from large design databases automatically, and providing the engineer with a reasonable overview of the parts. A unified framework based on topographic non-negative matrix factorization (TNMF) is proposed which solves both aspects simultaneously. First, meaningful components are extracted from a given database into a parts-based representation in an unsupervised manner. Second, the extracted components are organized and visualized on square-lattice 2D maps. Using turbine-like geometries as an example, it is shown that these maps efficiently provide a well-structured overview of the database content and, at the same time, define a measure of spatial similarity that allows easy access to and reuse of components during design development.
Keywords: Design decomposition, topographic non-negative matrix factorization, parts-based representation, self-organization, unsupervised feature extraction.
Downloads 1379
6463 Landscape Pattern Evolution and Optimization Strategy in Wuhan Urban Development Zone, China
Abstract:
With the rapid development of urbanization process in China, its environmental protection pressure is severely tested. So, analyzing and optimizing the landscape pattern is an important measure to ease the pressure on the ecological environment. This paper takes Wuhan Urban Development Zone as the research object, and studies its landscape pattern evolution and quantitative optimization strategy. First, remote sensing image data from 1990 to 2015 were interpreted by using Erdas software. Next, the landscape pattern index of landscape level, class level, and patch level was studied based on Fragstats. Then five indicators of ecological environment based on National Environmental Protection Standard of China were selected to evaluate the impact of landscape pattern evolution on the ecological environment. Besides, the cost distance analysis of ArcGIS was applied to simulate wildlife migration thus indirectly measuring the improvement of ecological environment quality. The result shows that the area of land for construction increased 491%. But the bare land, sparse grassland, forest, farmland, water decreased 82%, 47%, 36%, 25% and 11% respectively. They were mainly converted into construction land. On landscape level, the change of landscape index all showed a downward trend. Number of patches (NP), Landscape shape index (LSI), Connection index (CONNECT), Shannon's diversity index (SHDI), Aggregation index (AI) separately decreased by 2778, 25.7, 0.042, 0.6, 29.2%, all of which indicated that the NP, the degree of aggregation and the landscape connectivity declined. On class level, the construction land and forest, CPLAND, TCA, AI and LSI ascended, but the Distribution Statistics Core Area (CORE_AM) decreased. As for farmland, water, sparse grassland, bare land, CPLAND, TCA and DIVISION, the Patch Density (PD) and LSI descended, yet the patch fragmentation and CORE_AM increased. 
On patch level, patch area, Patch perimeter, Shape index of water, farmland and bare land continued to decline. The three indexes of forest patches increased overall, sparse grassland decreased as a whole, and construction land increased. It is obvious that the urbanization greatly influenced the landscape evolution. Ecological diversity and landscape heterogeneity of ecological patches clearly dropped. The Habitat Quality Index continuously declined by 14%. Therefore, optimization strategy based on greenway network planning is raised for discussion. This paper contributes to the study of landscape pattern evolution in planning and design and to the research on spatial layout of urbanization.
Keywords: Landscape pattern, optimization strategy, ArcGIS, Erdas, landscape metrics, landscape architecture.
Downloads 854
6462 A Cumulative Learning Approach to Data Mining Employing Censored Production Rules (CPRs)
Authors: Rekha Kandwal, Kamal K.Bharadwaj
Abstract:
Knowledge is indispensable but voluminous knowledge becomes a bottleneck for efficient processing. A great challenge for data mining activity is the generation of large number of potential rules as a result of mining process. In fact sometimes result size is comparable to the original data. Traditional data mining pruning activities such as support do not sufficiently reduce the huge rule space. Moreover, many practical applications are characterized by continual change of data and knowledge, thereby making knowledge voluminous with each change. The most predominant representation of the discovered knowledge is the standard Production Rules (PRs) in the form If P Then D. Michalski & Winston proposed Censored Production Rules (CPRs), as an extension of production rules, that exhibit variable precision and supports an efficient mechanism for handling exceptions. A CPR is an augmented production rule of the form: If P Then D Unless C, where C (Censor) is an exception to the rule. Such rules are employed in situations in which the conditional statement 'If P Then D' holds frequently and the assertion C holds rarely. By using a rule of this type we are free to ignore the exception conditions, when the resources needed to establish its presence, are tight or there is simply no information available as to whether it holds or not. Thus the 'If P Then D' part of the CPR expresses important information while the Unless C part acts only as a switch changes the polarity of D to ~D. In this paper a scheme based on Dempster-Shafer Theory (DST) interpretation of a CPR is suggested for discovering CPRs from the discovered flat PRs. The discovery of CPRs from flat rules would result in considerable reduction of the already discovered rules. The proposed scheme incrementally incorporates new knowledge and also reduces the size of knowledge base considerably with each episode. Examples are given to demonstrate the behaviour of the proposed scheme. 
The suggested cumulative learning scheme would be useful in mining data streams.
Keywords: Censored production rules, cumulative learning, data mining, machine learning.
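The rule form described in the abstract above, If P Then D Unless C, can be illustrated with a minimal Python sketch. The class name `CensoredRule`, the dictionary-based fact store, and the `check_censor` flag are hypothetical names chosen for this illustration; the sketch shows only how a single CPR might be evaluated, not the Dempster-Shafer-based discovery scheme the paper proposes.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CensoredRule:
    """If `premise` Then `decision` Unless `censor` (all names are fact keys)."""
    premise: str
    decision: str
    censor: str

    def fire(self, facts: Dict[str, bool],
             check_censor: bool = True) -> Optional[Tuple[str, bool]]:
        """Return (decision, truth value), or None if the premise does not hold.

        With check_censor=False the censor is ignored, which models the case
        where resources are too tight to establish whether C holds.
        """
        if not facts.get(self.premise, False):
            return None
        # The censor acts as a switch: if it is known to hold, D becomes ~D.
        if check_censor and facts.get(self.censor, False):
            return (self.decision, False)
        return (self.decision, True)


rule = CensoredRule(premise="bird", decision="flies", censor="penguin")
print(rule.fire({"bird": True}))
print(rule.fire({"bird": True, "penguin": True}))
print(rule.fire({"bird": True, "penguin": True}, check_censor=False))
```

Treating an unknown censor as "not holding" is a design choice of this sketch; it matches the abstract's remark that the exception may be ignored when no information about it is available.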
Downloads 1485
6461 Pattern Classification of Back-Propagation Algorithm Using Exclusive Connecting Network
Authors: Insung Jung, Gi-Nam Wang
Abstract:
The objective of this paper is to design a pattern classification model based on the back-propagation (BP) algorithm for decision support systems. The standard BP model fully connects each node across the layers from input to output. It therefore requires considerable computing time and many iterations to reach good performance and an acceptable error rate when generating patterns or training the network. The proposed model instead uses exclusive connections between hidden-layer nodes and output nodes. Its advantage is fewer iterations and better performance compared with the standard back-propagation model. We simulated several classification cases with different network settings (e.g. number of hidden layers and nodes, number of classes, and number of iterations). In most simulated cases, the BP model with the exclusive connecting network gave satisfactory results compared with standard BP. We expect that this algorithm can be applied to user face identification, data analysis, and mapping between environmental data and information.
Keywords: Neural network, Back-propagation, classification.
Downloads 1656
6460 Quick Sequential Search Algorithm Used to Decode High-Frequency Matrices
Authors: Mohammed M. Siddeq, Mohammed H. Rasheed, Omar M. Salih, Marcos A. Rodrigues
Abstract:
This research proposes a data encoding and decoding method based on the Matrix Minimization algorithm. This algorithm is applied to high-frequency coefficients for compression/encoding. The algorithm starts by converting every three coefficients to a single value; this is accomplished based on three different keys. The decoding/decompression uses a search method called QSS (Quick Sequential Search) Decoding Algorithm presented in this research based on the sequential search to recover the exact coefficients. In the next step, the decoded data are saved in an auxiliary array. The basic idea behind the auxiliary array is to save all possible decoded coefficients; this is because another algorithm, such as conventional sequential search, could retrieve encoded/compressed data independently from the proposed algorithm. The experimental results showed that our proposed decoding algorithm retrieves original data faster than conventional sequential search algorithms.
Keywords: Matrix Minimization Algorithm, Decoding Sequential Search Algorithm, image compression, Discrete Cosine Transform, Discrete Wavelet Transform.
Downloads 247
6459 CFD Analysis of Multi-Phase Reacting Transport Phenomena in Discharge Process of Non-Aqueous Lithium-Air Battery
Authors: Jinliang Yuan, Jong-Sung Yu, Bengt Sundén
Abstract:
A computational fluid dynamics (CFD) model is developed for rechargeable non-aqueous electrolyte lithium-air batteries with a partial opening for oxygen supply to the cathode. Multi-phase transport phenomena occurred in the battery are considered, including dissolved lithium ions and oxygen gas in the liquid electrolyte, solid-phase electron transfer in the porous functional materials and liquid-phase charge transport in the electrolyte. These transport processes are coupled with the electrochemical reactions at the active surfaces, and effects of discharge reaction-generated solid Li2O2 on the transport properties and the electrochemical reaction rate are evaluated and implemented in the model. The predicted results are discussed and analyzed in terms of the spatial and transient distribution of various parameters, such as local oxygen concentration, reaction rate, variable solid Li2O2 volume fraction and porosity, as well as the effective diffusion coefficients. It is found that the effect of the solid Li2O2 product deposited at the solid active surfaces is significant on the transport phenomena and the overall battery performance.
Keywords: Computational Fluid Dynamics (CFD), Modeling, Multi-phase, Transport Phenomena, Lithium-air battery.
Downloads 2745
6458 Design and Performance of Adaptive Polarized MIMO MC-SS-CDMA System for Downlink Mobile Communications
Authors: Joseph V. M. Halim, Hesham El-Badawy, Hadia M. El-Hennawy
Abstract:
In this paper, an adaptive polarized Multiple-Input Multiple-Output (MIMO) Multicarrier Spread Spectrum Code Division Multiple Access (MC-SS-CDMA) system is designed for downlink mobile communications. The proposed system is examined in Frequency Division Duplex (FDD) mode for both macro urban and suburban environments. For the same transmission bandwidth, a performance comparison between the nonoverlapped and orthogonal Frequency Division Multiplexing (FDM) schemes is presented. The proposed system is also compared with both the closed loop vertical MIMO MC-SS-CDMA system and the synchronous vertical STBC-MIMO MC-SS-CDMA system. As will be shown, the proposed system introduces a significant performance gain while reducing the spatial dimensions of the MIMO system and simplifying the receiver implementation. The effect of the polarization diversity characteristics on the BER performance is discussed, and the impact of excluding the cross-polarization MC-SS-CDMA blocks in the base station is investigated. In addition, the system performance is evaluated under different Feedback Information (FBI) rates for slowly-varying channels. Finally, a performance comparison for vehicular and pedestrian environments is presented.
Keywords: Closed loop technique, MC-SS-CDMA, Polarized MIMO systems, Transmit diversity.
Downloads 1622
6457 A Case Study of Applying Virtual Prototyping in Construction
Authors: Stephen C. W. Kong
Abstract:
The use of 3D computer-aided design (CAD) models to support construction project planning has been increasing in recent years. 3D CAD models reveal more planning ideas by visually showing the construction site environment at different stages of the construction process. Using 3D CAD models together with scheduling software to prepare a construction plan can identify errors in process sequence and spatial arrangement, which is vital to the success of a construction project. A number of 4D (3D plus time) CAD tools have been developed and used in different construction projects as awareness of their importance has grown. Virtual prototyping extends the idea of 4D CAD by integrating more features for simulating the real construction process. Virtual prototyping originates from the manufacturing industry, where the production of products such as cars and airplanes is simulated virtually on a computer before they are built in the factory. Virtual prototyping integrates 3D CAD, a simulation engine, analysis tools (such as structural analysis and collision detection), and a knowledge base to streamline the whole product design and production process. In this paper, we present the application of virtual prototyping software that has been used in a few construction projects in Hong Kong to support construction project planning. Specifically, the paper presents an implementation of virtual prototyping in a residential building project in Hong Kong. The applicability, difficulties, and benefits of construction virtual prototyping are examined based on this project.
Keywords: construction project planning, prefabrication, simulation, virtual prototyping.
Downloads 2826
6456 A Survey in Techniques for Imbalanced Intrusion Detection System Datasets
Authors: Najmeh Abedzadeh, Matthew Jacobs
Abstract:
An intrusion detection system (IDS) is a software application that monitors malicious activities and generates alerts if any are detected. However, most network activities in IDS datasets are normal, and the relatively few numbers of attacks make the available data imbalanced. Consequently, cyber-attacks can hide inside a large number of normal activities, and machine learning algorithms have difficulty learning and classifying the data correctly. In this paper, a comprehensive literature review is conducted on different types of algorithms for both implementing the IDS and methods in correcting the imbalanced IDS dataset. The most famous algorithms are machine learning (ML), deep learning (DL), synthetic minority over-sampling technique (SMOTE), and reinforcement learning (RL). Most of the research use the CSE-CIC-IDS2017, CSE-CIC-IDS2018, and NSL-KDD datasets for evaluating their algorithms.
Keywords: IDS, intrusion detection system, imbalanced datasets, sampling algorithms, big data.
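As an illustration of the synthetic minority over-sampling technique (SMOTE) mentioned in the abstract above, the following is a minimal pure-Python sketch of its core interpolation step: each synthetic point is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. The function name `smote_sketch`, its parameters, and the toy data are assumptions made for this example and are not taken from the surveyed papers.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def smote_sketch(minority: Sequence[Point], n_synthetic: int,
                 k: int = 2, seed: int = 0) -> List[Point]:
    """Generate n_synthetic points by interpolating between minority samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)

    def squared_distance(a: Point, b: Point) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest minority neighbours of the base sample (excluding itself).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


attacks = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # toy minority-class features
new_points = smote_sketch(attacks, n_synthetic=5)
print(len(new_points))
```

Because every synthetic point lies between two existing minority samples, the new points stay inside the region already occupied by the minority class.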
Downloads 1125
6455 Sonic Localization Cues for Classrooms: A Structural Model Proposal
Authors: Abhijit Mitra, C. Ardil
Abstract:
We investigate sonic cues for binaural sound localization within classrooms and present a structural model for the same. Two of the primary cues for localization, interaural time difference (ITD) and interaural level difference (ILD) created between the two ears by sounds from a particular point in space, are used. Although these cues do not lend any information about the elevation of a sound source, the torso, head, and outer ear carry out elevation dependent spectral filtering of sounds before they reach the inner ear. This effect is commonly captured in head related transfer function (HRTF) which aids in resolving the ambiguity from the ITDs and ILDs alone and helps localize sounds in free space. The proposed structural model of HRTF produces well controlled horizontal as well as vertical effects. The implemented HRTF is a signal processing model which tries to mimic the physical effects of the sounds interacting with different parts of the body. The effectiveness of the method is tested by synthesizing spatial audio, in MATLAB, for use in listening tests with human subjects and is found to yield satisfactory results in comparison with existing models.
Keywords: Auditory localization, Binaural sound, Head related impulse response, Head related transfer function, Interaural level difference, Interaural time difference, Localization cues.
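To give a sense of scale for the interaural time difference (ITD) cue described in the abstract above, the sketch below evaluates the classical Woodworth spherical-head approximation, ITD = (a / c)(θ + sin θ), for a far-field source at azimuth θ. This is a standard textbook approximation used here for illustration only; it is not necessarily the structural model the authors implemented, and the head radius and speed of sound are assumed typical values.

```python
import math

HEAD_RADIUS_M = 0.0875        # assumed average head radius, in metres
SPEED_OF_SOUND_M_S = 343.0    # assumed speed of sound in air, in m/s


def woodworth_itd(azimuth_rad: float) -> float:
    """ITD in seconds for a far-field source.

    azimuth_rad is measured from straight ahead (0) to directly
    opposite one ear (pi / 2).
    """
    if not 0.0 <= azimuth_rad <= math.pi / 2:
        raise ValueError("azimuth must lie in [0, pi/2]")
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (azimuth_rad + math.sin(azimuth_rad))


for degrees in (0, 30, 60, 90):
    itd_us = woodworth_itd(math.radians(degrees)) * 1e6
    print(f"{degrees:>2} deg: {itd_us:.0f} microseconds")
```

Under these assumptions the ITD grows from zero for a frontal source to roughly two thirds of a millisecond for a source directly to one side, and it is the same for every elevation at a given azimuth, which is why the abstract turns to HRTF-based spectral filtering for elevation cues.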
Downloads 1729
6454 Disidentification of Historical City Centers: A Comparative Study of the Old and New Settlements of Mardin, Turkey
Authors: Fatma Kürüm Varolgüneş, Fatih Canan
Abstract:
Mardin is one of the unique cities in Turkey with its rich cultural and historical heritage. Mardin’s traditional dwellings have been affected both by natural data such as climate and topography and by cultural data like lifestyle and belief. However, in the new settlements, housing is formed with modern approaches and unsuitable forms clashing with Mardin’s culture and environment. While the city is expanding, traditional textures are ignored. Thus, traditional settlements are losing their identity and are vanishing because of the rapid change and transformation. The main aim of this paper is to determine the physical and social data needed to define the characteristic features of Mardin’s old and new settlements. In this context, based on social and cultural data, old and new settlement formations of Mardin have been investigated from various aspects. During this research, the following methods have been utilized: observations, interviews, public surveys, literature review, as well as site examination via maps, photographs and questionnaire methodology. In conclusion, this paper focuses on how changes in the physical forms of cities affect the typology and the identity of cities, as in the case of Mardin.
Keywords: Urban and local identity, historical city center, traditional settlements, Mardin, Turkey.
Downloads 1042
6453 Data Mining Techniques in Computer-Aided Diagnosis: Non-Invasive Cancer Detection
Authors: Florin Gorunescu
Abstract:
Diagnosis can be achieved by building a model of a certain organ under surveillance and comparing it with the real-time physiological measurements taken from the patient. This paper presents the benefits of using data mining techniques in computer-aided diagnosis (CAD), focusing on cancer detection, in order to help doctors make optimal decisions quickly and accurately. Among noninvasive diagnosis techniques, endoscopic ultrasound elastography (EUSE) is a recent elasticity imaging technique that allows the difference between malignant and benign tumors to be characterized. Exploratory data analysis (EDA) is used to digitize and summarize the main features of the EUSE sample movies in vector form. Neural networks are then trained on the corresponding EUSE sample movie vectors in such a way that these intelligent systems are able to offer a very precise and objective diagnosis, discriminating between benign and malignant tumors. A concrete application of these data mining techniques illustrates the suitability and reliability of this methodology in CAD.
Keywords: Endoscopic ultrasound elastography, exploratory data analysis, neural networks, non-invasive cancer detection.
Downloads 1867
6452 From Electroencephalogram to Epileptic Seizures Detection by Using Artificial Neural Networks
Authors: Gaetano Zazzaro, Angelo Martone, Roberto V. Montaquila, Luigi Pavone
Abstract:
Seizure is the main factor that affects the quality of life of epileptic patients. The diagnosis of epilepsy, and hence the identification of epileptogenic zone, is commonly made by using continuous Electroencephalogram (EEG) signal monitoring. Seizure identification on EEG signals is made manually by epileptologists and this process is usually very long and error prone. The aim of this paper is to describe an automated method able to detect seizures in EEG signals, using knowledge discovery in database process and data mining methods and algorithms, which can support physicians during the seizure detection process. Our detection method is based on Artificial Neural Network classifier, trained by applying the multilayer perceptron algorithm, and by using a software application, called Training Builder that has been developed for the massive extraction of features from EEG signals. This tool is able to cover all the data preparation steps ranging from signal processing to data analysis techniques, including the sliding window paradigm, the dimensionality reduction algorithms, information theory, and feature selection measures. The final model shows excellent performances, reaching an accuracy of over 99% during tests on data of a single patient retrieved from a publicly available EEG dataset.
Keywords: Artificial Neural Network, Data Mining, Electroencephalogram, Epilepsy, Feature Extraction, Seizure Detection, Signal Processing.
Downloads 1314
6451 TELUM Land Use Model: An Investigation of Data Requirements and Calibration Results for Chittenden County MPO, U.S.A.
Authors: Georgia Pozoukidou
Abstract:
TELUM software is a land use model designed specifically to help metropolitan planning organizations (MPOs) prepare their transportation improvement programs and fulfill their numerous planning responsibilities. In this context obtaining, preparing, and validating socioeconomic forecasts are becoming fundamental tasks for an MPO in order to ensure that consistent population and employment data are provided to travel demand models. Chittenden County Metropolitan Planning Organization of Vermont State was used as a case study to test the applicability of TELUM land use model. The technical insights and lessons learned from the land use model application have transferable value for all MPOs faced with land use forecasting development and transportation modeling.
Keywords: Calibration data requirements, land use models, land use planning, Metropolitan Planning Organizations.
Downloads 2100
6450 Study of Functional Relevant Conformational Mobility of β-2 Adrenoreceptor by Means of Molecular Dynamics Simulation
Authors: G. V. Novikov, V. S. Sivozhelezov, S. S. Kolesnikov, K. V. Shaitan
Abstract:
The study reports about the influence of binding of orthosteric ligands as well as point mutations on the conformational dynamics of β-2-adrenoreceptor. Using molecular dynamics simulation we found that there was a little fraction of active states of the receptor in its apo (ligand free) ensemble corresponded to its constitutive activity. Analysis of MD trajectories indicated that such spontaneous activation of the receptor is accompanied by the motion in intracellular part of its alpha-helices. Thus receptor’s constitutive activity directly results from its conformational dynamics. On the other hand the binding of a full agonist resulted in a significant shift of the initial equilibrium towards its active state. Finally, the binding of the inverse agonist stabilized the receptor in its inactive state. It is likely that the binding of inverse agonists might be a universal way of constitutive activity inhibition in vivo. Our results indicate that ligand binding redistribute pre-existing conformational degrees of freedom (in accordance to the Monod-Wyman-Changeux-Model) of the receptor rather than cause induced fit in it. Therefore, the ensemble of biologically relevant receptor conformations is encoded in its spatial structure, and individual conformations from that ensemble might be used by the cell in conformity with the physiological behavior.
Keywords: Seven-transmembrane receptors, constitutive activity, activation, x-ray crystallography, principal component analysis, molecular dynamics simulation.
Downloads 3957
6449 Road Traffic Accidents Analysis in Mexico City through Crowdsourcing Data and Data Mining Techniques
Authors: Gabriela V. Angeles Perez, Jose Castillejos Lopez, Araceli L. Reyes Cabello, Emilio Bravo Grajales, Adriana Perez Espinosa, Jose L. Quiroz Fabian
Abstract:
Road traffic accidents are among the principal causes of traffic congestion, causing human losses, damage to health and the environment, economic losses, and material damage. Traditional studies of road traffic accidents in urban zones require a very high investment of time and money; additionally, their results are not current. Nowadays, however, crowdsourced GPS-based traffic and navigation apps have emerged in many countries as an important low-cost source of information for studying road traffic accidents and the urban congestion they cause. In this article, we identify the zones, roads, and specific times in Mexico City (CDMX) in which the largest number of road traffic accidents were concentrated during 2016. We built a database compiling information obtained from the social network known as Waze. The methodology employed was Knowledge Discovery in Databases (KDD) for discovering patterns in the accident reports, using data mining techniques with the help of Weka. The selected algorithms were Expectation-Maximization (EM), to obtain the ideal number of clusters for the data, and k-means as the grouping method. Finally, the results were visualized with the Geographic Information System QGIS.
Keywords: Data mining, K-means, road traffic accidents, Waze, Weka.
Downloads 1215
6448 Efficient Tuning Parameter Selection by Cross-Validated Score in High Dimensional Models
Authors: Yoonsuh Jung
Abstract:
As DNA microarray data contain relatively small sample size compared to the number of genes, high dimensional models are often employed. In high dimensional models, the selection of tuning parameter (or, penalty parameter) is often one of the crucial parts of the modeling. Cross-validation is one of the most common methods for the tuning parameter selection, which selects a parameter value with the smallest cross-validated score. However, selecting a single value as an ‘optimal’ value for the parameter can be very unstable due to the sampling variation since the sample sizes of microarray data are often small. Our approach is to choose multiple candidates of tuning parameter first, then average the candidates with different weights depending on their performance. The additional step of estimating the weights and averaging the candidates rarely increase the computational cost, while it can considerably improve the traditional cross-validation. We show that the selected value from the suggested methods often lead to stable parameter selection as well as improved detection of significant genetic variables compared to the tradition cross-validation via real data and simulated data sets.Keywords: Cross Validation, Parameter Averaging, Parameter Selection, Regularization Parameter Search.
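The idea of averaging several candidate tuning parameters, rather than keeping only the single best one, can be sketched as follows. The abstract does not say how the weights are estimated, so the exponential weighting of cross-validated scores used here is purely an assumption for illustration, as are the function name `averaged_tuning_parameter` and the `top_k` parameter.

```python
import math
from typing import Dict


def averaged_tuning_parameter(cv_scores: Dict[float, float], top_k: int = 3) -> float:
    """Weighted average of the top_k candidates with the lowest CV score.

    cv_scores maps each candidate parameter value to its cross-validated
    score (lower is better). The weights exp(-(score - best_score)) are an
    assumed choice, not the scheme from the paper.
    """
    if not cv_scores:
        raise ValueError("cv_scores must not be empty")
    best = sorted(cv_scores.items(), key=lambda item: item[1])[:top_k]
    best_score = best[0][1]
    weights = [math.exp(-(score - best_score)) for _, score in best]
    weighted_sum = sum(w * value for w, (value, _) in zip(weights, best))
    return weighted_sum / sum(weights)


# Hypothetical cross-validated scores for four candidate penalty values.
scores = {0.01: 1.30, 0.05: 1.10, 0.10: 1.12, 0.50: 1.90}
print(averaged_tuning_parameter(scores, top_k=3))
```

With `top_k=1` the function reduces to ordinary cross-validation, which returns the single candidate with the smallest score.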
Downloads 1572
6447 Using ALOHA Code to Evaluate CO2 Concentration for Maanshan Nuclear Power Plant
Authors: W. S. Hsu, S. W. Chen, Y. T. Ku, Y. Chiang, J. R. Wang, J. H. Yang, C. Shih
Abstract:
In this study, the ALOHA code was used to calculate the CO2 concentration under the CO2 storage burst condition for the Maanshan nuclear power plant (NPP). Five main types of data are input into the ALOHA code: location, building, chemical, atmospheric, and source data. Data from the Final Safety Analysis Report (FSAR) and other reports were used. The ALOHA results are compared with the failure criteria of R.G. 1.78 to confirm the habitability of the control room. The comparison shows that the ALOHA result is below the R.G. 1.78 criteria, which implies that the habitability of the control room can be maintained in this case. A sensitivity study for atmospheric parameters was also performed. The results show that the wind speed has the larger effect on the concentration calculation.
Keywords: PWR, ALOHA, habitability, Maanshan.
Downloads 742
6446 Review and Comparison of Associative Classification Data Mining Approaches
Authors: Suzan Wedyan
Abstract:
Associative classification (AC) is a data mining approach that combines association rules and classification to build classification models (classifiers). AC has attracted significant attention from researchers, mainly because it derives accurate classifiers that contain simple yet effective rules. In the last decade, a number of associative classification algorithms have been proposed, such as Classification based Association (CBA), Classification based on Multiple Association Rules (CMAR), Class based Associative Classification (CACA), and Classification based on Predicted Association Rule (CPAR). This paper surveys the major AC algorithms and compares the steps and methods performed in each algorithm, including rule learning, rule sorting, rule pruning, classifier building, and class prediction.
Keywords: Associative Classification, Classification, Data Mining, Learning, Rule Ranking, Rule Pruning, Prediction.
Downloads 6634
6445 Distributed Splay Suffix Arrays: A New Structure for Distributed String Search
Authors: Tu Kun, Gu Nai-jie, Bi Kun, Liu Gang, Dong Wan-li
Abstract:
As a structure for processing string problems, the suffix array is widely known and extensively studied. But if the string access pattern follows the “90/10” rule, the suffix array cannot take advantage of the fact that we often look for something that we have just found. Although the splay tree is an efficient data structure for small documents when the access pattern follows the “90/10” rule, it requires many structures and an excessive amount of pointer manipulation to process and search large documents efficiently. In this paper, we propose a new and conceptually powerful data structure, called splay suffix arrays (SSA), for string search. This data structure combines the features of the splay tree and suffix arrays into a new approach that is suitable for implementation on both conventional and clustered computers.
Keywords: suffix arrays, splay tree, string search, distributed algorithm
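For readers unfamiliar with the baseline structure, the following is a minimal sketch of an ordinary suffix array with binary search. It is not the splay suffix array proposed in the paper; the function names are hypothetical, and the naive construction is chosen for brevity, not speed.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted suffix order.

    Naive construction by sorting the suffixes directly; fine for short texts.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions at which pattern occurs in text."""
    m = len(pattern)

    def prefix(k):
        # First m characters of the k-th smallest suffix.
        start = suffix_array[k]
        return text[start:start + m]

    # Lower bound: first suffix whose prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Upper bound: first suffix whose prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[first:lo])
```

Every query here costs the same two binary searches regardless of what was searched before, which is the limitation under a “90/10” access pattern that the abstract points out.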
Downloads 1777
6444 Dimension Free Rigid Point Set Registration in Linear Time
Authors: Jianqin Qu
Abstract:
This paper proposes a rigid point set matching algorithm in arbitrary dimensions based on the idea of a symmetric covariant function. A group of functions of the points in the set is formulated using rigid invariants. Each of these functions computes a pair of correspondences from the given point set. The computed correspondences are then used to recover the unknown rigid transform parameters. Each computed point can be geometrically interpreted as the weighted mean center of the point set. The algorithm is compact, fast, and dimension free, without any optimization process. It either computes the desired transform for noiseless data in linear time, or fails quickly in exceptional cases. Experimental results for synthetic data and 2D/3D real data are provided, which demonstrate potential applications of the algorithm to a wide range of problems.
Keywords: Covariant point, point matching, dimension free, rigid registration.
Downloads 683
6443 Functional and Efficient Query Interpreters: Principle, Application and Performances’ Comparison
Authors: Laurent Thiry, Michel Hassenforder
Abstract:
This paper presents a general approach to implementing efficient query interpreters in a functional programming language. Most of the standard tools currently available use an imperative and/or object-oriented language for the implementation (e.g. Java for Jena-Fuseki), but other paradigms are possible, perhaps with better performance. The paper first explains how to model data structures and queries from a functional point of view. It then proposes a general methodology to measure performance (i.e., the number of computation steps needed to answer a query) and explains how to integrate some optimization techniques (short-cut fusion and, more importantly, data transformations). Finally, it compares the proposed functional server to a standard tool (Fuseki), demonstrating that the former can be two to ten times faster at answering queries.
Keywords: Data transformation, functional programming, information server, optimization.
Downloads 753
6442 Evidence Theory Enabled Quickest Change Detection Using Big Time-Series Data from Internet of Things
Authors: Hossein Jafari, Xiangfang Li, Lijun Qian, Alexander Aved, Timothy Kroecker
Abstract:
Traditionally in sensor networks, and recently in the Internet of Things, numerous heterogeneous sensors are deployed in a distributed manner to monitor a phenomenon that can often be modeled by an underlying stochastic process. The big time-series data collected by the sensors must be analyzed to detect a change in the stochastic process as quickly as possible with a tolerable false alarm rate. However, sensors may have different accuracy and sensitivity ranges, and they degrade over time. As a result, the big time-series data collected by the sensors will contain uncertainties, and sometimes they are conflicting. In this study, we present a framework that takes advantage of the capabilities of Evidence Theory (a.k.a. Dempster-Shafer and Dezert-Smarandache Theories) for representing and managing uncertainty and conflict, in order to detect changes quickly and deal effectively with complementary hypotheses. Specifically, Kullback-Leibler divergence is used as the similarity metric to calculate the distances between the estimated current distribution and the pre- and post-change distributions. Mass functions are then calculated, and related combination rules are applied to combine the mass values among all sensors. Furthermore, we applied the method to estimate the minimum number of sensors that need to be combined, so that computational efficiency can be improved. A cumulative sum test is then applied to the ratio of pignistic probabilities to detect and declare the change for decision-making purposes. Simulation results using both synthetic data and real data from an experimental setup demonstrate the effectiveness of the presented schemes.
Keywords: CUSUM, evidence theory, KL divergence, quickest change detection, time series data.
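Two building blocks named in this abstract, the Kullback-Leibler divergence and the cumulative sum (CUSUM) test, can be sketched on their own. The sketch below leaves out the evidence-theory steps (mass functions, combination rules, pignistic probabilities) entirely; the function names, the discrete-distribution setting, and the generic per-sample increments fed to CUSUM are assumptions for illustration.

```python
from math import log


def kl_divergence(p, q):
    """KL divergence D(p || q) between two discrete distributions.

    p and q are equal-length sequences of probabilities. This sketch assumes
    q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute nothing.
    """
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cusum_alarm(increments, threshold):
    """Return the index of the first sample at which CUSUM exceeds threshold.

    increments are per-sample scores that tend to be negative before a change
    and positive after it (for example, log-likelihood ratios). Returns None
    if the statistic never crosses the threshold.
    """
    statistic = 0.0
    for t, x in enumerate(increments):
        # Reset at zero so evidence against a change does not accumulate.
        statistic = max(0.0, statistic + x)
        if statistic > threshold:
            return t
    return None
```

A higher `threshold` lowers the false alarm rate at the cost of a longer detection delay, which is the trade-off the abstract refers to when it asks for detection "as quickly as possible with a tolerable false alarm rate".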
Downloads 994
6441 A Secure Proxy Signature Scheme with Fault Tolerance Based on RSA System
Authors: H. El-Kamchouchi, Heba Gaber, Fatma Ahmed, Dalia H. El-Kamchouchi
Abstract:
Due to the rapid growth of modern communication systems, fault tolerance and data security are two important issues in a secure transaction. Errors may occur frequently during the transmission of data between the sender and the receiver. The sender must then re-transmit the data to the receiver in order to correct these errors, which makes the system very feeble. To improve the scalability of the scheme, we present a secure proxy signature scheme with fault tolerance over an efficient and secure authenticated key agreement protocol based on the RSA system. Authenticated key agreement protocols play an important role in building a secure communications network between the two parties.
Keywords: Proxy signature, fault tolerance, RSA, key agreement protocol.
Downloads 1485
6440 Hybrid Intelligent Intrusion Detection System
Authors: Norbik Bashah, Idris Bharanidharan Shanmugam, Abdul Manan Ahmed
Abstract:
Intrusion Detection Systems are increasingly a key part of systems defense. Various approaches to intrusion detection are currently being used, but they are relatively ineffective. Artificial Intelligence plays a driving role in security services. This paper proposes a dynamic model, the Intelligent Intrusion Detection System, based on a specific AI approach to intrusion detection. The techniques being investigated include neural networks and fuzzy logic with network profiling, which uses simple data mining techniques to process the network data. The proposed system is a hybrid system that combines anomaly, misuse and host based detection. Simple fuzzy rules allow us to construct if-then rules that reflect common ways of describing security attacks. For host based intrusion detection we use neural networks along with self organizing maps. Suspicious intrusions can be traced back to their original source path, and any traffic from that particular source will be redirected back to it in future. Both network traffic and system audit data are used as inputs for both.
Keywords: Intrusion Detection, Network Security, Data mining, Fuzzy Logic.
Downloads 2131
6439 Investigation of Learning Challenges in Building Measurement Unit
Authors: Argaw T. Gurmu, Muhammad N. Mahmood
Abstract:
The objective of this research is to identify the learning challenges that architecture and construction management students face in building measurement. The research used survey data collected from students who had completed the building measurement unit. NVivo qualitative data analysis software was used to identify relevant themes. The analysis of the qualitative data revealed the major learning difficulties, such as inadequacy of practice questions for the examination, inability to work as a team, lack of detailed understanding of the prerequisite units, insufficiency of the time allocated for tutorials, and incompatibility of lecture and tutorial schedules. The output of this research can be used as a basis for improving the teaching and learning activities in construction measurement units.
Keywords: Building measurement, construction management, learning challenges, evaluate survey.
Downloads 1105
6438 An Efficient and Generic Hybrid Framework for High Dimensional Data Clustering
Authors: Dharmveer Singh Rajput, P. K. Singh, Mahua Bhattacharya
Abstract:
Clustering in high dimensional space is a difficult problem that recurs in many fields of science and engineering, e.g., bioinformatics, image processing, pattern recognition and data mining. In high dimensional space some of the dimensions are likely to be irrelevant, thus hiding the possible clustering. In very high dimensions it is common for all the objects in a dataset to be nearly equidistant from each other, completely masking the clusters. Hence, the performance of the clustering algorithm decreases. In this paper, we propose an algorithmic framework that combines the reduct concept of rough set theory with the k-means algorithm to remove the irrelevant dimensions in a high dimensional space and obtain appropriate clusters. Our experiment on test data shows that this framework increases the efficiency of the clustering process and the accuracy of the results.
Keywords: High dimensional clustering, sub-space, k-means, rough set, discernibility matrix.
Downloads 1950