Search results for: Data Structure

8071 Clustering Categorical Data Using Hierarchies (CLUCDUH)

Abstract:

Clustering large populations is an important problem when the data contain noise and different shapes. A good clustering algorithm or approach should be efficient enough to detect clusters sensitively. Besides space complexity, time complexity also gains importance as the size grows. Using hierarchies we developed a new algorithm to split attributes according to the values they have and choosing the dimension for splitting so as to divide the database roughly into equal parts as much as possible. At each node we calculate some certain descriptive statistical features of the data which reside and by pruning we generate the natural clusters with a complexity of O(n).

Keywords: Clustering, tree, split, pruning, entropy, gini.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1545

8070 Analysis of Users’ Behavior on Book Loan Log Based On Association Rule Mining

Authors: Kanyarat Bussaban, Kunyanuth Kularbphettong

Abstract:

This research aims to create a model for analysis of student behavior using Library resources based on data mining technique in case of Suan Sunandha Rajabhat University. The model was created under association rules, Apriori algorithm. The results were found 14 rules and the rules were tested with testing data set and it showed that the ability of classify data was 79.24percent and the MSE was 22.91. The results showed that the user’s behavior model by using association rule technique can use to manage the library resources.

Keywords: Behavior, data mining technique, Apriori algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2299

8069 The Role of Paraphrase in Interpreting Students’ Writing

Authors: Maya Lisa Aryanti, S. S. M. Hum

Abstract:

To improve students’ skill, writing is the most challenging skill to be developed. The reason is that besides helping the students to develop their skill, this activity also helps them to express themselves. This paper depicts how paraphrasing is very helpful to interpret students’ writing. Syntactic units, used tenses and meanings will indeed change once the writings were paraphrased. The objectives of this research are to reveal the inappropriate structure of syntactic units, to show what types of sentences the students often make, and to show how paraphrasing can help to infer the message. The methodology of this research is descriptive qualitative research. In addition, theories of linguistics are also included. This includes theory of Syntax to describe syntactic units and tenses and theory of Semantics to describe theories of meaning and how paraphrasing works. The theories of general linguistics, grammar and writing are also provided to support the theories of Syntax and Semantics. The results of this research are concerned with how the message is received in the end. The message written in the students’ essay is not clear because of the improper structure of syntactic units and use of incorrect of tenses. The students tend to use simple sentences, compound sentences and complex sentences with a few mistakes in their writing. In addition, they tend to create unnecessary phrases. The last point is that this research shows how paraphrase works to attain complete meaning of a sentence.

Keywords: Paraphrase, meanings, syntactic units and tenses.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1010

8068 Data Integrity: Challenges in Health Information Systems in South Africa

Authors: T. Thulare, M. Herselman, A. Botha

Abstract:

Poor system use, including inappropriate design of health information systems, causes difficulties in communication with patients and increased time spent by healthcare professionals in recording the necessary health information for medical records. System features like pop-up reminders, complex menus, and poor user interfaces can make medical records far more time consuming than paper cards as well as affect decision-making processes. Although errors associated with health information and their real and likely effect on the quality of care and patient safety have been documented for many years, more research is needed to measure the occurrence of these errors and determine the causes to implement solutions. Therefore, the purpose of this paper is to identify data integrity challenges in hospital information systems through a scoping review and based on the results provide recommendations on how to manage these. Only 34 papers were found to be most suitable out of 297 publications initially identified in the field. The results indicated that human and computerized systems are the most common challenges associated with data integrity and factors such as policy, environment, health workforce, and lack of awareness attribute to these challenges but if measures are taken the data integrity challenges can be managed.

Keywords: Data integrity, data integrity challenges, hospital information systems, South Africa.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1347

8067 Geophysical Investigation of Abnormal Seepages in Goronyo Dam Sokoto, North Western Nigeria Using Self-Potential Method

Authors: A. I. Augie, M. Saleh, A. A. Gado

Abstract:

In this research, Self-Potential (SP) method was employed to locate anomalous electrical conductivity located in Goronyo area and also to determine the condition of the embankment of the dam. SP data were plotted against distance along with the profile and spacing of electrode using surfer software (version 12). High and low zones of SP values were identified along the right and left abutments of the dam reservoir. The regions with high SP values were described to be high tendency of fluid flow associate with wet sandy soil. These zones have the SP values ranging from 200 mV and above. High SP values were due to the high moisture content that may lead to the seepage of water leaking through this zone. The zones with high SP values occupied Profiles S1, S2, S3, S4 and S5 indicating the presence of potential seepage paths within the subsurface of the embankment. These regions of seepage were identified as weak zones and potential pathways through which water could be lost from the dam reservoir. The SP values for the regions range from 250 m to 400 m (S1), 306 m to 400 m (S2), 192 m to 400 m (S3), 48 m to 200 m (S4) and 7 m to 170 m (S5) with their corresponding maximum depths of 30 m, 28 m, 28 m, 30 m and 26 m respectively. However, zones of low SP values in the overburden were observed which shows the presence of intact regions, which may be due to the compactness and dryness around the dam. The weak zones were considered as geological features (such as fractures, joints, and faults) that have undermined the integrity of the dam structure, which has led to the abnormal seepage.

Keywords: Self-potential, subsurface, seepage, condition and dam.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 680

8066 The Applicability of the Zipper Strut to Seismic Rehabilitation of Steel Structures

Authors: G. R. Nouri, H. Imani Kalesar, Zahra Ameli

Abstract:

Chevron frames (Inverted-V-braced frames or Vbraced frames) have seismic disadvantages, such as not good exhibit force redistribution capability and compression brace buckles immediately. Researchers developed new design provisions on increasing both the ductility and lateral resistance of these structures in seismic areas. One of these new methods is adding zipper columns, as proposed by Khatib et al. (1988) [2]. Zipper columns are vertical members connecting the intersection points of the braces above the first floor. In this paper applicability of the suspended zipper system to Seismic Rehabilitation of Steel Structures is investigated. The models are 3-, 6-, 9-, and 12-story Inverted-V-braced frames. In this case, it is assumed that the structures must be rehabilitated. For rehabilitation of structures, zipper column is used. The result of researches showed that the suspended zipper system is effective in case of 3-, 6-, and 9-story Inverted-V-braced frames and it would increase lateral resistance of structure up to life safety level. But in case of high-rise buildings (such as 12 story frame), it doesn-t show good performance. For solving this problem, the braced bay can consist of small “units" over the height of the entire structure, which each of them is a zipper-braced bay with a few stories. By using this method the lateral resistance of 12 story Inverted-V-braced frames is increased up to safety life level.

Keywords: chevron-braced frames, suspended zipper frames, zipper frames, zipper columns

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2215

8065 Self-Supervised Pretraining on Paired Sequences of fMRI Data for Transfer Learning to Brain Decoding Tasks

Authors: Sean Paulsen, Michael Casey

Abstract:

In this work, we present a self-supervised pretraining framework for transformers on functional Magnetic Resonance Imaging (fMRI) data. First, we pretrain our architecture on two self-supervised tasks simultaneously to teach the model a general understanding of the temporal and spatial dynamics of human auditory cortex during music listening. Our pretraining results are the first to suggest a synergistic effect of multitask training on fMRI data. Second, we finetune the pretrained models and train additional fresh models on a supervised fMRI classification task. We observe significantly improved accuracy on held-out runs with the finetuned models, which demonstrates the ability of our pretraining tasks to facilitate transfer learning. This work contributes to the growing body of literature on transformer architectures for pretraining and transfer learning with fMRI data, and serves as a proof of concept for our pretraining tasks and multitask pretraining on fMRI data.

Keywords: Transfer learning, fMRI, self-supervised, brain decoding, transformer, multitask training.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 131

8064 On Speeding Up Support Vector Machines: Proximity Graphs Versus Random Sampling for Pre-Selection Condensation

Authors: Xiaohua Liu, Juan F. Beltran, Nishant Mohanchandra, Godfried T. Toussaint

Abstract:

Support vector machines (SVMs) are considered to be the best machine learning algorithms for minimizing the predictive probability of misclassification. However, their drawback is that for large data sets the computation of the optimal decision boundary is a time consuming function of the size of the training set. Hence several methods have been proposed to speed up the SVM algorithm. Here three methods used to speed up the computation of the SVM classifiers are compared experimentally using a musical genre classification problem. The simplest method pre-selects a random sample of the data before the application of the SVM algorithm. Two additional methods use proximity graphs to pre-select data that are near the decision boundary. One uses k-Nearest Neighbor graphs and the other Relative Neighborhood Graphs to accomplish the task.

Keywords: Machine learning, data mining, support vector machines, proximity graphs, relative-neighborhood graphs, k-nearestneighbor graphs, random sampling, training data condensation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1912

8063 Dynamic Correlations and Portfolio Optimization between Islamic and Conventional Equity Indexes: A Vine Copula-Based Approach

Authors: Imen Dhaou

Abstract:

This study examines conditional Value at Risk by applying the GJR-EVT-Copula model, and finds the optimal portfolio for eight Dow Jones Islamic-conventional pairs. Our methodology consists of modeling the data by a bivariate GJR-GARCH model in which we extract the filtered residuals and then apply the Peak over threshold model (POT) to fit the residual tails in order to model marginal distributions. After that, we use pair-copula to find the optimal portfolio risk dependence structure. Finally, with Monte Carlo simulations, we estimate the Value at Risk (VaR) and the conditional Value at Risk (CVaR). The empirical results show the VaR and CVaR values for an equally weighted portfolio of Dow Jones Islamic-conventional pairs. In sum, we found that the optimal investment focuses on Islamic-conventional US Market index pairs because of high investment proportion; however, all other index pairs have low investment proportion. These results deliver some real repercussions for portfolio managers and policymakers concerning to optimal asset allocations, portfolio risk management and the diversification advantages of these markets.

Keywords: CVaR, Dow Jones Islamic index, GJR-GARCH-EVT-pair copula, portfolio optimization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 988

8062 Wavelet and K-L Seperability Based Feature Extraction Method for Functional Data Classification

Authors: Jun Wan, Zehua Chen, Yingwu Chen, Zhidong Bai

Abstract:

This paper proposes a novel feature extraction method, based on Discrete Wavelet Transform (DWT) and K-L Seperability (KLS), for the classification of Functional Data (FD). This method combines the decorrelation and reduction property of DWT and the additive independence property of KLS, which is helpful to extraction classification features of FD. It is an advanced approach of the popular wavelet based shrinkage method for functional data reduction and classification. A theory analysis is given in the paper to prove the consistent convergence property, and a simulation study is also done to compare the proposed method with the former shrinkage ones. The experiment results show that this method has advantages in improving classification efficiency, precision and robustness.

Keywords: classification, functional data, feature extraction, K-Lseperability, wavelet.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1456

8061 A Cuckoo Search with Differential Evolution for Clustering Microarray Gene Expression Data

Authors: M. Pandi, K. Premalatha

Abstract:

A DNA microarray technology is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. It is handled by clustering which reveals the natural structures and identifying the interesting patterns in the underlying data. In this paper, gene based clustering in gene expression data is proposed using Cuckoo Search with Differential Evolution (CS-DE). The experiment results are analyzed with gene expression benchmark datasets. The results show that CS-DE outperforms CS in benchmark datasets. To find the validation of the clustering results, this work is tested with one internal and one external cluster validation indexes.

Keywords: DNA, Microarray, genomics, Cuckoo Search, Differential Evolution, Gene expression data, Clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1476

8060 Physiological Action of Anthraquinone-Containing Preparations

Authors: Dmitry Yu. Korulkin, Raissa A. Muzychkina, Evgenii N. Kojaev

Abstract:

In review the generalized data about biological activity of anthraquinone-containing plants and specimens on their basis is presented. Data of traditional medicine, results of bioscreening and clinical researches of specimens are analyzed.

Keywords: Anthraquinones, physiologically active substances, phytopreparation, Ramon.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2064

8059 Dynamical Analysis of Circadian Gene Expression

Authors: Carla Layana Luis Diambra

Abstract:

Microarrays technique allows the simultaneous measurements of the expression levels of thousands of mRNAs. By mining this data one can identify the dynamics of the gene expression time series. By recourse of principal component analysis, we uncover the circadian rhythmic patterns underlying the gene expression profiles from Cyanobacterium Synechocystis. We applied PCA to reduce the dimensionality of the data set. Examination of the components also provides insight into the underlying factors measured in the experiments. Our results suggest that all rhythmic content of data can be reduced to three main components.

Keywords: circadian rhythms, clustering, gene expression, PCA.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1582

8058 Modified Data Mining Approach for Defective Diagnosis in Hard Disk Drive Industry

Authors: S. Soommat, S. Patamatamkul, T. Prempridi, M. Sritulyachot, P. Ineure, S. Yimman

Abstract:

Currently, slider process of Hard Disk Drive Industry become more complex, defective diagnosis for yield improvement becomes more complicated and time-consumed. Manufacturing data analysis with data mining approach is widely used for solving that problem. The existing mining approach from combining of the KMean clustering, the machine oriented Kruskal-Wallis test and the multivariate chart were applied for defective diagnosis but it is still be a semiautomatic diagnosis system. This article aims to modify an algorithm to support an automatic decision for the existing approach. Based on the research framework, the new approach can do an automatic diagnosis and help engineer to find out the defective factors faster than the existing approach about 50%.

Keywords: Slider process, Defective diagnosis and Data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1192

8057 Authentication and Data Hiding Using a Reversible ROI-based Watermarking Scheme for DICOM Images

Authors: Osamah M. Al-Qershi, Khoo Bee Ee

Abstract:

In recent years image watermarking has become an important research area in data security, confidentiality and image integrity. Many watermarking techniques were proposed for medical images. However, medical images, unlike most of images, require extreme care when embedding additional data within them because the additional information must not affect the image quality and readability. Also the medical records, electronic or not, are linked to the medical secrecy, for that reason, the records must be confidential. To fulfill those requirements, this paper presents a lossless watermarking scheme for DICOM images. The proposed a fragile scheme combines two reversible techniques based on difference expansion for patient's data hiding and protecting the region of interest (ROI) with tamper detection and recovery capability. Patient's data are embedded into ROI, while recovery data are embedded into region of non-interest (RONI). The experimental results show that the original image can be exactly extracted from the watermarked one in case of no tampering. In case of tampered ROI, tampered area can be localized and recovered with a high quality version of the original area.

Keywords: DICOM, reversible, ROI-based, watermarking.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1708

8056 Stego Machine – Video Steganography using Modified LSB Algorithm

Authors: Mritha Ramalingam

Abstract:

Computer technology and the Internet have made a breakthrough in the existence of data communication. This has opened a whole new way of implementing steganography to ensure secure data transfer. Steganography is the fine art of hiding the information. Hiding the message in the carrier file enables the deniability of the existence of any message at all. This paper designs a stego machine to develop a steganographic application to hide data containing text in a computer video file and to retrieve the hidden information. This can be designed by embedding text file in a video file in such away that the video does not loose its functionality using Least Significant Bit (LSB) modification method. This method applies imperceptible modifications. This proposed method strives for high security to an eavesdropper-s inability to detect hidden information.

Keywords: Data hiding, LSB, Stego machine, VideoSteganography

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4260

8055 Data Projects for “Social Good”: Challenges and Opportunities

Authors: Mikel Niño, Roberto V. Zicari, Todor Ivanov, Kim Hee, Naveed Mushtaq, Marten Rosselli, Concha Sánchez-Ocaña, Karsten Tolle, José Miguel Blanco, Arantza Illarramendi, Jörg Besier, Harry Underwood

Abstract:

One of the application fields for data analysis techniques and technologies gaining momentum is the area of social good or “common good”, covering cases related to humanitarian crises, global health care, or ecology and environmental issues, among others. The promotion of data-driven projects in this field aims at increasing the efficacy and efficiency of social initiatives, improving the way these actions help humanity in general and people in need in particular. This application field, however, poses its own barriers and challenges when developing data-driven projects, lagging behind in comparison with other scenarios. These challenges derive from aspects such as the scope and scale of the social issue to solve, cultural and political barriers, the skills of main stakeholders and the technological resources available, the motivation to be engaged in such projects, or the ethical and legal issues related to sensitive data. This paper analyzes the application of data projects in the field of social good, reviewing its current state and noteworthy initiatives, and presenting a framework covering the key aspects to analyze in such projects. The goal is to provide guidelines to understand the main challenges and opportunities for this type of data project, as well as identifying the main differential issues compared to “classical” data projects in general. A case study is presented on the initial steps and stakeholder analysis of a data project for the inclusion of refugees in the city of Frankfurt, Germany, in order to empirically confront the framework with a real example.

Keywords: Data-Driven projects, humanitarian operations, personal and sensitive data, social good, stakeholders analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1769

8054 An Intelligent Combined Method Based on Power Spectral Density, Decision Trees and Fuzzy Logic for Hydraulic Pumps Fault Diagnosis

Authors: Kaveh Mollazade, Hojat Ahmadi, Mahmoud Omid, Reza Alimardani

Abstract:

Recently, the issue of machine condition monitoring and fault diagnosis as a part of maintenance system became global due to the potential advantages to be gained from reduced maintenance costs, improved productivity and increased machine availability. The aim of this work is to investigate the effectiveness of a new fault diagnosis method based on power spectral density (PSD) of vibration signals in combination with decision trees and fuzzy inference system (FIS). To this end, a series of studies was conducted on an external gear hydraulic pump. After a test under normal condition, a number of different machine defect conditions were introduced for three working levels of pump speed (1000, 1500, and 2000 rpm), corresponding to (i) Journal-bearing with inner face wear (BIFW), (ii) Gear with tooth face wear (GTFW), and (iii) Journal-bearing with inner face wear plus Gear with tooth face wear (B&GW). The features of PSD values of vibration signal were extracted using descriptive statistical parameters. J48 algorithm is used as a feature selection procedure to select pertinent features from data set. The output of J48 algorithm was employed to produce the crisp if-then rule and membership function sets. The structure of FIS classifier was then defined based on the crisp sets. In order to evaluate the proposed PSD-J48-FIS model, the data sets obtained from vibration signals of the pump were used. Results showed that the total classification accuracy for 1000, 1500, and 2000 rpm conditions were 96.42%, 100%, and 96.42% respectively. The results indicate that the combined PSD-J48-FIS model has the potential for fault diagnosis of hydraulic pumps.

Keywords: Power Spectral Density, Machine ConditionMonitoring, Hydraulic Pump, Fuzzy Logic.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2698

8053 A Multi-Feature Deep Learning Algorithm for Urban Traffic Classification with Limited Labeled Data

Authors: Rohan Putatunda, Aryya Gangopadhyay

Abstract:

Acoustic sensors, if embedded in smart street lights, can help in capturing the activities (car honking, sirens, events, traffic, etc.) in cities. Needless to say, the acoustic data from such scenarios are complex due to multiple audio streams originating from different events, and when decomposed to independent signals, the amount of retrieved data volume is small in quantity which is inadequate to train deep neural networks. So, in this paper, we address the two challenges: a) separating the mixed signals, and b) developing an efficient acoustic classifier under data paucity. So, to address these challenges, we propose an architecture with supervised deep learning, where the initial captured mixed acoustics data are analyzed with Fast Fourier Transformation (FFT), followed by filtering the noise from the signal, and then decomposed to independent signals by fast independent component analysis (Fast ICA). To address the challenge of data paucity, we propose a multi feature-based deep neural network with high performance that is reflected in our experiments when compared to the conventional convolutional neural network (CNN) and multi-layer perceptron (MLP).

Keywords: FFT, ICA, vehicle classification, multi-feature DNN, CNN, MLP.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 417

8052 Effectiveness of ICT Training Workshop for Tutors of Allama Iqbal Open University, Pakistan

Authors: Muhammad Javid Qadir, Abdul Hameed

Abstract:

The purpose of the study was to investigate the effectiveness of ICT training workshop of tutors of Allama Iqbal Open University Pakistan. The study was delimited to tutors of Multan region. The total sample comprised of 100 tutors. All the tutors who participated in ICT training workshop in Multan region were taken as sample in the study. A questionnaire having two parts, based on five point rating scale was developed by the researcher. Part one was about the competency level of computer skills while Part two was based on items related to training delivery, structure and content. Part One of questionnaire had five levels of competency about computer skills. The questionnaire was personally administered and collected back by the researcher himself on the last day of workshop. The collected data were analyzed by using descriptive statistics. Through this study it was found that majority of the tutors strongly agreed that training enhanced their computer skills. Majority of the respondents consider themselves to be generally competent in the use of computer. They also agreed that there was appropriate infrastructure and technical support in lab during training workshop. Moreover, it was found that the training imparted the knowledge of pedagogy of using computers for distance education.

Keywords: ICT, Tutors, AIOU.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2180

8051 Statistical Analysis of Parameters Effects on Maximum Strain and Torsion Angle of FRP Honeycomb Sandwich Panels Subjected to Torsion

Authors: Mehdi Modabberifar, Milad Roodi, Ehsan Souri

Abstract:

In recent years, honeycomb fiber reinforced plastic (FRP) sandwich panels have been increasingly used in various industries. Low weight, low price and high mechanical strength are the benefits of these structures. However, their mechanical properties and behavior have not been fully explored. The objective of this study is to conduct a combined numerical-statistical investigation of honeycomb FRP sandwich beams subject to torsion load. In this paper, the effect of geometric parameters of sandwich panel on maximum shear strain in both face and core and angle of torsion in a honeycomb FRP sandwich structures in torsion is investigated. The effect of Parameters including core thickness, face skin thickness, cell shape, cell size, and cell thickness on mechanical behavior of the structure were numerically investigated. Main effects of factors were considered in this paper and regression equations were derived. Taguchi method was employed as experimental design and an optimum parameter combination for the maximum structure stiffness has been obtained. The results showed that cell size and face skin thickness have the most significant impacts on torsion angle, maximum shear strain in face and core.

Keywords: Finite element, honeycomb FRP sandwich panel, torsion, civil engineering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2610

8050 An Educational Data Mining System for Advising Higher Education Students

Authors: Heba Mohammed Nagy, Walid Mohamed Aly, Osama Fathy Hegazy

Abstract:

Educational data mining is a specific data mining field applied to data originating from educational environments, it relies on different approaches to discover hidden knowledge from the available data. Among these approaches are machine learning techniques which are used to build a system that acquires learning from previous data. Machine learning can be applied to solve different regression, classification, clustering and optimization problems.

In our research, we propose a “Student Advisory Framework” that utilizes classification and clustering to build an intelligent system. This system can be used to provide pieces of consultations to a first year university student to pursue a certain education track where he/she will likely succeed in, aiming to decrease the high rate of academic failure among these students. A real case study in Cairo Higher Institute for Engineering, Computer Science and Management is presented using real dataset collected from 2000−2012.The dataset has two main components: pre-higher education dataset and first year courses results dataset. Results have proved the efficiency of the suggested framework.

Keywords: Classification, Clustering, Educational Data Mining (EDM), Machine Learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5203

8049 Auto Classification for Search Intelligence

Authors: Lilac A. E. Al-Safadi

Abstract:

This paper proposes an auto-classification algorithm of Web pages using Data mining techniques. We consider the problem of discovering association rules between terms in a set of Web pages belonging to a category in a search engine database, and present an auto-classification algorithm for solving this problem that are fundamentally based on Apriori algorithm. The proposed technique has two phases. The first phase is a training phase where human experts determines the categories of different Web pages, and the supervised Data mining algorithm will combine these categories with appropriate weighted index terms according to the highest supported rules among the most frequent words. The second phase is the categorization phase where a web crawler will crawl through the World Wide Web to build a database categorized according to the result of the data mining approach. This database contains URLs and their categories.

Keywords: Information Processing on the Web, Data Mining, Document Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1614

8048 Considering the Effect of Semi-Rigid Connection in Steel Frame Structures for Progressive Collapse

Authors: Fooad Karimi Ghaleh Jough, Mohsen Soori

Abstract:

Today, the occurrence of progressive failure in structures has become a challenging issue, requiring the presentation of suitable solutions for structural resistance to this phenomenon. It is also necessary to evaluate the vulnerability of existing and under-construction buildings to progressive failure. The kind of lateral load-resisting system the building and its connections have is one of the most significant and influential variables in structural resistance to the risk of progressing failure. Using the "Alternative Path" approach suggested by the GSA2003 and UFC2013 recommendations, different configurations of semi-rigid connections against progressive failure are offered in this study. In order to do this, the Opensees program was used to model nine distinct semi-rigid connection configurations on a three-story Special Area of Conservation (SAC) structure, accounting for the impact of connection stiffness. Then, using nonlinear dynamic analysis, the effects of column removal were explored in two scenarios: corner column removal and middle column removal on the first level. Nonlinear static analysis results showed that when a column is removed, structures with semi-rigid connections experience larger displacements, which result in the construction of a plastic hinge. Furthermore, it was clear from the findings of the nonlinear static analysis that the possibility of progressive failure increased with the number of semi-rigid connections in the structure.

Keywords: Semi-rigid, nonlinear static analysis, progressive collapse, alternative path.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 76

8047 Retrieval of Relevant Visual Data in Selected Machine Vision Tasks: Examples of Hardware-based and Software-based Solutions

Authors: Andrzej Śluzek

Abstract:

To illustrate diversity of methods used to extract relevant (where the concept of relevance can be differently defined for different applications) visual data, the paper discusses three groups of such methods. They have been selected from a range of alternatives to highlight how hardware and software tools can be complementarily used in order to achieve various functionalities in case of different specifications of “relevant data". First, principles of gated imaging are presented (where relevance is determined by the range). The second methodology is intended for intelligent intrusion detection, while the last one is used for content-based image matching and retrieval. All methods have been developed within projects supervised by the author.

Keywords: Relevant visual data, gated imaging, intrusion detection, image matching.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1385

8046 Analytical Study of Sedimentation Formation in Lined Canals using the SHARC Software- A Case Study of the Western Intake Structure in Dez Diversion Weir in Dezful, Iran

Authors: A.H. Sajedipoor, N. Hedayat, M. Mashal

Abstract:

Sedimentation is a hydraulic phenomenon that is emerging as a serious challenge in river engineering. When the flow reaches a certain state that gather potential energy, it shifts the sediment load along channel bed. The transport of such materials can be in the form of suspended and bed loads. The movement of these along the river course and channels and the ways in which this could influence the water intakes is considered as the major challenges for sustainable O&M of hydraulic structures. This could be very serious in arid and semi-arid regions like Iran, where inappropriate watershed management could lead to shifting a great deal of sediments into the reservoirs and irrigation systems. This paper aims to investigate sedimentation in the Western Canal of Dez Diversion Weir in Iran, identifying factors which influence the process and provide ways in which to mitigate its detrimental effects by using the SHARC Software. For the purpose of this paper, data from the Dezful water authority and Dezful Hydrometric Station pertinent to a river course of about 6 Km were used. Results estimated sand and silt bed loads concentrations to be 193 ppm and 827ppm respectively. Given the available data on average annual bed loads and average suspended sediment loads of 165ppm and 837ppm, there was a significant statistical difference (16%) between the sand grains, whereas no significant difference (1.2%) was find in the silt grain sizes. One explanation for such finding being that along the 6 Km river course there was considerable meandering effects which explains recent shift in the hydraulic behavior along the stream course under investigation. The sand concentration in downstream relative to present state of the canal showed a steep descending curve. Sediment trapping on the other hand indicated a steep ascending curve. These occurred because the diversion weir was not considered in the simulation model.

Keywords: SHARC model, sedimentation, Western canal, Dezdiversion weir

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1637

8045 SOA-Based Mobile Application for Crime Control in Thailand

Authors: Jintana Khemprasit, Vatcharaporn Esichaikul

Abstract:

Crime is a major societal problem for most of the world's nations. Consequently, the police need to develop new methods to improve their efficiency in dealing with these ever increasing crime rates. Two of the common difficulties that the police face in crime control are crime investigation and the provision of crime information to the general public to help them protect themselves. Crime control in police operations involves the use of spatial data, crime data and the related crime data from different organizations (depending on the nature of the analysis to be made). These types of data are collected from several heterogeneous sources in different formats and from different platforms, resulting in a lack of standardization. Moreover, there is no standard framework for crime data collection, integration and dissemination through mobile devices. An investigation into the current situation in crime control was carried out to identify the needs to resolve these issues. This paper proposes and investigates the use of service oriented architecture (SOA) and the mobile spatial information service in crime control. SOA plays an important role in crime control as an appropriate way to support data exchange and model sharing from heterogeneous sources. Crime control also needs to facilitate mobile spatial information services in order to exchange, receive, share and release information based on location to mobile users anytime and anywhere.

Keywords: Crime Control, Geographic Information System (GIS), Mobile GIS, Service Oriented Architecture (SOA).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2521

8044 Multidimensional and Data Mining Analysis for Property Investment Risk Analysis

Authors: Nur Atiqah Rochin Demong, Jie Lu, Farookh Khadeer Hussain

Abstract:

Property investment in the real estate industry has a high risk due to the uncertainty factors that will affect the decisions made and high cost. Analytic hierarchy process has existed for some time in which referred to an expert-s opinion to measure the uncertainty of the risk factors for the risk analysis. Therefore, different level of experts- experiences will create different opinion and lead to the conflict among the experts in the field. The objective of this paper is to propose a new technique to measure the uncertainty of the risk factors based on multidimensional data model and data mining techniques as deterministic approach. The propose technique consist of a basic framework which includes four modules: user, technology, end-user access tools and applications. The property investment risk analysis defines as a micro level analysis as the features of the property will be considered in the analysis in this paper.

Keywords: Uncertainty factors, data mining, multidimensional data model, risk analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2910

8043 Computational Aspects of Regression Analysis of Interval Data

Authors: Michal Cerny

Abstract:

We consider linear regression models where both input data (the values of independent variables) and output data (the observations of the dependent variable) are interval-censored. We introduce a possibilistic generalization of the least squares estimator, so called OLS-set for the interval model. This set captures the impact of the loss of information on the OLS estimator caused by interval censoring and provides a tool for quantification of this effect. We study complexity-theoretic properties of the OLS-set. We also deal with restricted versions of the general interval linear regression model, in particular the crisp input – interval output model. We give an argument that natural descriptions of the OLS-set in the crisp input – interval output cannot be computed in polynomial time. Then we derive easily computable approximations for the OLS-set which can be used instead of the exact description. We illustrate the approach by an example.

Keywords: Linear regression, interval-censored data, computational complexity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1464

8042 Time-Derivative Estimation of Noisy Movie Data using Adaptive Control Theory

Authors: Soon-Hyun Park, Takami Matsuo

Abstract:

This paper presents an adaptive differentiator of sequential data based on the adaptive control theory. The algorithm is applied to detect moving objects by estimating a temporal gradient of sequential data at a specified pixel. We adopt two nonlinear intensity functions to reduce the influence of noises. The derivatives of the nonlinear intensity functions are estimated by an adaptive observer with σ-modification update law.

Keywords: Adaptive estimation, parameter adjustmentlaw, motion detection, temporal gradient, differential filter.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1867