Search results for: mining software repositories
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2627

1517 High Securing Cover-File of Hidden Data Using Statistical Technique and AES Encryption Algorithm

Authors: A. A. Zaidan, Anas Majeed, B. B. Zaidan

Abstract:

Nowadays, the rapid development of multimedia and the internet allows wide distribution of digital media data. It has become much easier to edit, modify and duplicate digital information. Digital documents are also easy to copy and distribute, so they face many threats. With the large flood of information and the development of digital formats, this is a major security and privacy issue, and it has become necessary to find appropriate protection because of the significance, accuracy and sensitivity of the information. Protection systems today are classified more specifically as information hiding, information encryption, and a combination of hiding and encryption to increase information security. The strength of information hiding science is due to the non-existence of standard algorithms for hiding secret messages. There is also randomness in hiding methods, such as combining several media (covers) with different methods to pass a secret message. In addition, there are no formal methods to follow to discover the hidden data. For this reason, the task of this research is difficult. In this paper, a new system of information hiding is presented. The proposed system aims to hide information (a data file) in any executable file (EXE) and to detect the hidden file; we present an implementation of a steganography system that embeds information in an executable file. EXE files have been investigated. The system tries to find a solution to the size of the cover file and to make it undetectable by anti-virus software.
The system includes two main functions. The first is the hiding of the information in a Portable Executable File (EXE) through the execution of four processes (specify the cover file, specify the information file, encrypt the information, and hide the information). The second is the extraction of the hidden information through three processes (specify the stego file, extract the information, and decrypt the information). The system has achieved its main goals: the size of the cover file is independent of the size of the information, and the resulting file does not conflict with anti-virus software.

Keywords: Cryptography, Steganography, Portable Executable File.

Downloads: 1802
1516 Identification of Conserved Domains and Motifs for GRF Gene Family

Authors: Jafar Ahmadi, Nafiseh Noormohammadi, Sedigheh Fabriki Ourang

Abstract:

Growth regulating factor (GRF) genes encode a novel class of plant-specific transcription factors. The GRF proteins play a role in the regulation of cell numbers in young and growing tissues and may act as transcription activators in the growth and development of plants. Identification of GRF genes and their expression is important in plants for the growth and development of various organs. In this study, to better understand the structural and functional differences of the GRF family, 45 GRF protein sequences from A. thaliana, Z. mays, O. sativa, B. napus, B. rapa, H. vulgare and S. bicolor were collected and analyzed through bioinformatics data mining. As a result, in the secondary structure of GRFs, the number of alpha helices was greater than the number of beta sheets, and in all of them the QLQ domains were completely within the largest alpha helix. In all GRFs, the QLQ and WRC domains were completely conserved except in AtGRF9. These proteins have no trans-membrane domain and, because they have nuclear localization signals, act in the nucleus; they are classed as unstable proteins in the test tube.

Keywords: Domain, Gene Family, GRF, Motif.

Downloads: 2330
1515 Photo Mosaic Smartphone Application in Client-Server Based Large-Scale Image Databases

Authors: Sang-Hun Lee, Bum-Soo Kim, Yang-Sae Moon, Jinho Kim

Abstract:

In this paper we present a photo mosaic smartphone application in client-server based large-scale image databases. Photo mosaic is not a new concept, but there are very few smartphone applications especially for a huge number of images in the client-server environment. To support large-scale image databases, we first propose an overall framework working as a client-server model. We then present a concept of image-PAA features to efficiently handle a huge number of images and discuss its lower bounding property. We also present a best-match algorithm that exploits the lower bounding property of image-PAA. We finally implement an efficient Android-based application and demonstrate its feasibility.
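
The lower bounding idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes each image (or tile) has already been flattened into a numeric feature vector of equal length; the abstract does not describe the authors' image-PAA feature itself. The sketch uses the standard Piecewise Aggregate Approximation (PAA) bound for Euclidean distance, and the function names `paa`, `paa_lower_bound` and `best_match` are hypothetical.

```python
import math


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width segment."""
    n = len(series)
    if n % segments != 0:
        raise ValueError("series length must be divisible by the number of segments")
    width = n // segments
    return [sum(series[i * width:(i + 1) * width]) / width for i in range(segments)]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa_lower_bound(a, b, segments):
    """Distance in PAA space, scaled so it never exceeds the true Euclidean distance."""
    width = len(a) // segments
    return math.sqrt(width) * euclidean(paa(a, segments), paa(b, segments))


def best_match(query, candidates, segments):
    """Find the nearest candidate, skipping any whose lower bound cannot beat the best so far."""
    best_idx, best_dist = -1, math.inf
    for idx, cand in enumerate(candidates):
        if paa_lower_bound(query, cand, segments) >= best_dist:
            continue  # pruned: the true distance can only be larger than the bound
        dist = euclidean(query, cand)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, best_dist
```

Because the bound never overestimates the true distance, pruning on it cannot discard the actual nearest candidate; it only avoids some full-distance computations.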

Keywords: smartphone applications; photo mosaic; similarity search; data mining; large-scale image databases.

Downloads: 1671
1514 On Speeding Up Support Vector Machines: Proximity Graphs Versus Random Sampling for Pre-Selection Condensation

Authors: Xiaohua Liu, Juan F. Beltran, Nishant Mohanchandra, Godfried T. Toussaint

Abstract:

Support vector machines (SVMs) are considered to be the best machine learning algorithms for minimizing the predictive probability of misclassification. However, their drawback is that for large data sets the computation of the optimal decision boundary is a time consuming function of the size of the training set. Hence several methods have been proposed to speed up the SVM algorithm. Here three methods used to speed up the computation of the SVM classifiers are compared experimentally using a musical genre classification problem. The simplest method pre-selects a random sample of the data before the application of the SVM algorithm. Two additional methods use proximity graphs to pre-select data that are near the decision boundary. One uses k-Nearest Neighbor graphs and the other Relative Neighborhood Graphs to accomplish the task.
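
As a rough illustration of the two pre-selection styles compared in this abstract, the sketch below shows random sampling next to a simple k-nearest-neighbor rule that keeps only points with at least one differently labeled neighbor. The keep rule, the function names and the parameters are assumptions made for illustration; the abstract does not specify the exact criteria the authors used, and the SVM training step itself is omitted.

```python
import math
import random


def random_preselect(n_points, sample_size, seed=0):
    """Baseline: pick a random subset of training indices."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_points), sample_size))


def knn_boundary_preselect(points, labels, k=3):
    """Keep indices of points with at least one of their k nearest neighbors in another class.

    Such points are likely to lie near the decision boundary, which is where
    SVM support vectors tend to be found.
    """
    keep = []
    for i, p in enumerate(points):
        # Sort all other points by distance to p (ties broken by index).
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors = [j for _, j in dists[:k]]
        if any(labels[j] != labels[i] for j in neighbors):
            keep.append(i)
    return keep
```

The brute-force neighbor search here is quadratic in the number of points; a spatial index would be needed for large training sets.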

Keywords: Machine learning, data mining, support vector machines, proximity graphs, relative-neighborhood graphs, k-nearestneighbor graphs, random sampling, training data condensation.

Downloads: 1919
1513 Study on the Integration Schemes and Performance Comparisons of Different Integrated Solar Combined Cycle-Direct Steam Generation Systems

Authors: Liqiang Duan, Ma Jingkai, Lv Zhipeng, Haifan Cai

Abstract:

The integrated solar combined cycle (ISCC) system has a series of advantages, such as increasing the system power generation, reducing the cost of solar power generation, and lowering pollutant and CO2 emissions. In this paper, parabolic trough collectors with direct steam generation (DSG) technology are considered to replace the heat load of heating surfaces in the heat recovery steam generator (HRSG) of a conventional natural gas combined cycle (NGCC) system containing a PG9351FA gas turbine and a triple pressure HRSG with reheat. The detailed model of the NGCC system is built in ASPEN PLUS software and the parabolic trough collectors with DSG technology are modeled in EBSILON software. ISCC-DSG systems with the replacement of single, two, three and four heating surfaces are studied in this paper. Results show that: (1) the ISCC-DSG systems with the replacement heat load of HPB, HPB+LPE, HPE2+HPB+HPS, and HPE1+HPE2+HPB+HPS are the best integration schemes when single, two, three and four stages of heating surfaces are partly replaced by the parabolic trough solar energy collectors with DSG technology. (2) Both the changes of feed water flow and the heat load of the heating surfaces in ISCC-DSG systems with the replacement of multi-stage heating surfaces are smaller than those in ISCC-DSG systems with the replacement of a single heating surface. (3) ISCC-DSG systems with the replacement of HPB+LPE heating surfaces can increase the solar power output significantly. (4) The ISCC-DSG system with the replacement of HPB heating surfaces has the highest solar-thermal-to-electricity efficiency (47.45%) and solar radiation energy-to-electricity efficiency (30.37%), as well as the highest exergy efficiency of the solar field (33.61%).

Keywords: HRSG, integration scheme, parabolic trough collectors with DSG technology, solar power generation.

Downloads: 892
1512 Belt Conveyor Dynamics in Transient Operation for Speed Control

Authors: D. He, Y. Pang, G. Lodewijks

Abstract:

Belt conveyors play an important role in continuous dry bulk material transport, especially in the mining industry. Speed control is expected to reduce the energy consumption of belt conveyors. Transient operation is the operation of increasing or decreasing conveyor speed for speed control. According to the literature review, current research rarely takes the conveyor dynamics in transient operation into account. However, in belt conveyor speed control, the conveyor's dynamic behavior is highly important, since poor dynamics might result in risks. In this paper, the potential risks in transient operation will be analyzed. An existing finite element model will be applied to build a conveyor model, and simulations will be carried out to analyze the conveyor dynamics. In order to realize soft speed regulation, Harrison’s sinusoid acceleration profile will be applied, and the Lodewijks estimator will be built to approximate the required acceleration time. A long inclined belt conveyor will be studied with two major simulations. The conveyor dynamics will be given.
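
For readers unfamiliar with a sinusoidal acceleration profile, the following is a minimal sketch of the form commonly attributed to Harrison, in which acceleration follows half a sine wave over the ramp so that it starts and ends at zero. The abstract gives no formulas, so the expressions, function names and parameters below are assumptions, and the Lodewijks estimator for the ramp time is not reproduced.

```python
import math


def harrison_speed(t, v_start, v_end, ramp_time):
    """Belt speed at time t for a half-sine acceleration ramp from v_start to v_end."""
    if t <= 0:
        return v_start
    if t >= ramp_time:
        return v_end
    return v_start + 0.5 * (v_end - v_start) * (1 - math.cos(math.pi * t / ramp_time))


def harrison_acceleration(t, v_start, v_end, ramp_time):
    """Acceleration at time t: zero at both ends of the ramp, peaking at the midpoint."""
    if t < 0 or t > ramp_time:
        return 0.0
    return 0.5 * math.pi * (v_end - v_start) / ramp_time * math.sin(math.pi * t / ramp_time)
```

With this form the peak acceleration is pi/2 times the average acceleration (v_end - v_start) / ramp_time, which is why the required ramp time matters for limiting belt tension.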

Keywords: Belt conveyor, speed control, transient operation, dynamics

Downloads: 2331
1511 Yield Prediction Using Support Vectors Based Under-Sampling in Semiconductor Process

Authors: Sae-Rom Pak, Seung Hwan Park, Jeong Ho Cho, Daewoong An, Cheong-Sool Park, Jun Seok Kim, Jun-Geol Baek

Abstract:

It is important to predict yield in the semiconductor test process in order to increase yield. In this study, yield prediction means effectively finding defective dies, wafers or lots. The semiconductor test process consists of several test steps, and each test includes various test items. In other words, the test data are large and complex. They are also disproportionately distributed, as the number of data points belonging to the FAIL class is extremely low. For yield prediction, general data mining techniques are limited without data preprocessing because of the inherent properties of test data. Therefore, this study proposes an under-sampling method using a support vector machine (SVM) to eliminate the imbalance. To evaluate performance, a random under-sampling method is compared with the proposed method using actual semiconductor test data. As a result, the sampling method using SVM is effective in generating a robust model for yield prediction.

Keywords: Yield Prediction, Semiconductor Test Process, Support Vector Machine, Under Sampling

Downloads: 2397
1510 In situ Real-Time Multivariate Analysis of Methanolysis Monitoring of Sunflower Oil Using FTIR

Authors: Pascal Mwenge, Tumisang Seodigeng

Abstract:

The combination of world population and the third industrial revolution has led to high demand for fuels. On the other hand, the decrease in global fossil fuel deposits and the air pollution caused by these fuels have compounded the challenges the world faces due to its need for energy. Therefore, new forms of environmentally friendly and renewable fuels such as biodiesel are needed. The primary analytical techniques for methanolysis yield monitoring have been chromatography and spectroscopy; these methods have been proven reliable but are more demanding and costly, and do not provide real-time monitoring. In this work, the in situ monitoring of biodiesel from sunflower oil using FTIR (Fourier Transform Infrared) has been studied; the study was performed using an EasyMax Mettler Toledo reactor equipped with a DiComp (Diamond) probe. The quantitative monitoring of methanolysis was performed by building a quantitative model with multivariate calibration using the iC Quant module from iC IR 7.0 software. Fifteen samples of known concentrations, taken in duplicate, were used for model calibration and cross-validation. Data were pre-processed using mean centering and variance scaling, spectrum math square root and solvent subtraction. These pre-processing methods improved the performance indexes RMSEC, RMSECV, RMSEP and R2Cum from 7.98 to 0.0096, 11.2 to 3.41, 6.32 to 2.72 and 0.9416 to 0.9999, respectively. The R2 values of 1 (training), 0.9918 (test) and 0.9946 (cross-validation) indicated the fitness of the model built. The model was tested against a univariate model; small discrepancies were observed at low concentrations due to unmodelled intermediates, but the two were quite close at concentrations above 18%. The software eliminated the complexity of the Partial Least Squares (PLS) chemometrics. It was concluded that the model obtained could be used to monitor methanolysis of sunflower oil at industrial and lab scale.

Keywords: Biodiesel, calibration, chemometrics, FTIR, methanolysis, multivariate analysis, transesterification.

Downloads: 934
1509 A Comparison and Analysis of Name Matching Algorithms

Authors: Chakkrit Snae

Abstract:

Names are important in many societies, even in technologically oriented ones that use, for example, ID systems to identify individual people. Names such as surnames are the most important, as they are used in many processes, such as identifying people and genealogical research. On the other hand, variation of names can be a major problem for the identification of and search for people, e.g. in web search or for security reasons. Name matching presumes a priori that the recorded name written in one alphabet reflects the phonetic identity of two samples or some transcription error in copying a previously recorded name. We add to this the lode that the two names imply the same person. This paper describes name variations and gives a basic description of various name matching algorithms developed to overcome name variation and to find reasonable variants of names which can be used to further increasing mismatches for record linkage and name search. The implementation contains algorithms for computing a range of fuzzy matches based on different types of algorithms, e.g. composite and hybrid methods, allowing us to test and measure the algorithms for accuracy. NYSIIS, LIG2 and Phonex have been shown to perform well and provide sufficient flexibility to be included in the linkage/matching process for optimising name searching.
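
To give a concrete feel for fuzzy name matching, here is a minimal sketch based on Levenshtein edit distance. This is a generic string-similarity technique swapped in for illustration; it is not NYSIIS, LIG2 or Phonex, whose rules the abstract does not describe. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def name_similarity(a, b):
    """Case-insensitive similarity in [0, 1]; 1.0 means identical names."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest
```

A record-linkage step would typically accept a pair as a candidate match when the similarity exceeds a chosen threshold; the threshold is a tuning choice and is not taken from the paper.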

Keywords: Data mining, name matching algorithm, nominal data, searching system.

Downloads: 11090
1508 Unsupervised Text Mining Approach to Early Warning System

Authors: Ichihan Tai, Bill Olson, Paul Blessner

Abstract:

Traditional early warning systems that alarm against crisis are generally based on structured or numerical data; therefore, a system that can make predictions based on unstructured textual data, an uncorrelated data source, is a great complement to the traditional early warning systems. The Chicago Board Options Exchange (CBOE) Volatility Index (VIX), commonly referred to as the fear index, measures the cost of insurance against market crash, and spikes in the event of crisis. In this study, news data is consumed for prediction of whether there will be a market-wide crisis by predicting the movement of the fear index, and the historical references to similar events are presented in an unsupervised manner. Topic modeling-based prediction and representation are made based on daily news data between 1990 and 2015 from The Wall Street Journal against VIX index data from CBOE.

Keywords: Early Warning System, Knowledge Management, Topic Modeling, Market Prediction.

Downloads: 1920
1507 FCNN-MR: A Parallel Instance Selection Method Based on Fast Condensed Nearest Neighbor Rule

Authors: Lu Si, Jie Yu, Shasha Li, Jun Ma, Lei Luo, Qingbo Wu, Yongqi Ma, Zhengji Liu

Abstract:

The instance selection (IS) technique is used to reduce data size and improve the performance of data mining methods. Recently, to process very large data sets, several proposed methods divide the training set into disjoint subsets and apply IS algorithms independently to each subset. In this paper, we analyze the limitations of these methods and give our viewpoint on how to divide and conquer in the IS procedure. Then, based on the fast condensed nearest neighbor (FCNN) rule, we propose an instance selection method for large data sets using the MapReduce framework. Besides ensuring prediction accuracy and reduction rate, it has two desirable properties: first, it reduces the workload on the aggregation node; second, and most important, it produces the same result as the sequential version, which other parallel methods cannot achieve. We evaluate the performance of FCNN-MR on one small data set and two large data sets. The experimental results show that it is effective and practical.

Keywords: Instance selection, data reduction, MapReduce, kNN.

Downloads: 1017
1506 Unsupervised Clustering Methods for Identifying Rare Events in Anomaly Detection

Authors: Witcha Chimphlee, Abdul Hanan Abdullah, Mohd Noor Md Sap, Siriporn Chimphlee, Surat Srinoy

Abstract:

Increasing detection rates and reducing false positive rates is an important problem in Intrusion Detection Systems (IDS). Although preventative techniques such as access control and authentication attempt to prevent intruders, these can fail, and intrusion detection has been introduced as a second line of defence. Rare events are events that occur very infrequently, and detection of rare events is a common problem in many domains. In this paper we propose an intrusion detection method that combines Rough set and Fuzzy Clustering. Rough set is used to decrease the amount of data and get rid of redundancy. Fuzzy c-means clustering allows objects to belong to several clusters simultaneously, with different degrees of membership. Our approach allows us to recognize not only known attacks but also suspicious activity that may be the result of a new, unknown attack. The experimental results on the Knowledge Discovery and Data Mining (KDDCup 1999) dataset show that the method is efficient and practical for intrusion detection systems.
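
The idea of partial membership in several clusters can be made concrete with the standard fuzzy c-means membership formula, sketched below for a single data point. The fuzzifier `m` and the function name are assumptions for illustration; the abstract does not state the parameters used, and the rough-set reduction step is not shown.

```python
def fcm_memberships(distances, m=2.0):
    """Fuzzy c-means membership degrees of one point, given its distance to each cluster center.

    Uses the standard update u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)),
    so the degrees are in [0, 1] and sum to 1.
    """
    if m <= 1:
        raise ValueError("the fuzzifier m must be greater than 1")
    # A point sitting exactly on one or more centers belongs fully to those centers.
    zero_hits = [d == 0 for d in distances]
    if any(zero_hits):
        share = 1.0 / sum(zero_hits)
        return [share if hit else 0.0 for hit in zero_hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in distances) for d_i in distances]
```

A point far from every center ends up with nearly equal, low-contrast memberships, which is one way such a method can flag activity that fits no known cluster well.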

Keywords: Network and security, intrusion detection, fuzzy c-means, rough set.

Downloads: 2861
1505 Development of Innovative Islamic Web Applications

Authors: Farrukh Shahzad

Abstract:

The rich Islamic resources related to religious text, Islamic sciences, and history are widely available in print and in electronic format online. However, most of these works are only available in Arabic. In this research, an attempt is made to utilize these resources to create interactive web applications in Arabic, English and other languages. The system utilizes pattern recognition, knowledge management, data mining, information retrieval and management, indexing, storage and data-analysis techniques to parse, store, convert and manage the information from authentic Arabic resources. These interactive web apps provide smart multi-lingual search, tree-based search, and on-demand information matching and linking. In this paper, we provide details of the application architecture, design, implementation and technologies employed. We also present a summary of the web applications already developed and include some screenshots from the corresponding web sites. These web applications provide innovative online learning systems (eLearning and computer-based education).

Keywords: Islamic resources, Muslim scholars, hadith, narrators, history, fiqh.

Downloads: 1302
1504 Semi-Automatic Method to Assist Expert for Association Rules Validation

Authors: Amdouni Hamida, Gammoudi Mohamed Mohsen

Abstract:

In order to help the expert validate association rules extracted from data, some quality measures have been proposed in the literature. We distinguish two categories: objective and subjective measures. The first depends on a fixed threshold and on the quality of the data from which the rules are extracted. The second consists of providing the expert with tools to explore and visualize rules during the evaluation step. However, the number of extracted rules to validate remains high, so mining rules manually is very hard. To solve this problem, we propose, in this paper, a semi-automatic method to assist the expert during association rule validation. Our method uses rule-based classification as follows: (i) we transform association rules into classification rules (classifiers); (ii) we use the generated classifiers for data classification; (iii) we visualize association rules with their classification quality to give the expert an idea and to assist them during the validation process.
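
Support and confidence are the usual examples of objective, threshold-based measures for association rules. The minimal sketch below computes both over a list of transactions; it is an illustration of the general notion only, since the abstract does not say which objective measures the authors use, and the function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never fires, so its confidence is treated as zero
    return support(transactions, set(antecedent) | set(consequent)) / base


def passes_thresholds(transactions, antecedent, consequent, min_support, min_confidence):
    """Objective filter: keep a rule only if both measures reach fixed thresholds."""
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint >= min_support and confidence(transactions, antecedent, consequent) >= min_confidence
```

Even with such thresholds applied, many rules typically survive, which is the validation burden the abstract sets out to reduce.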

Keywords: Association rules, Rule-based classification, Classification quality, Validation.

Downloads: 1791
1503 Radioactivity Assessment of Sediments in Negombo Lagoon Sri Lanka

Authors: H. M. N. L. Handagiripathira

Abstract:

The distributions of naturally occurring and anthropogenic radioactive materials were determined in surface sediments taken at 27 different locations along the bank of Negombo Lagoon in Sri Lanka. Hydrographic parameters of the lagoon water and grain size analyses of the sediment samples were also measured for this study. The conductivity of the adjacent water varied from 13.6 mS/cm near the southern end to 55.4 mS/cm near the northern end of the lagoon, and salinity levels similarly varied from 7.2 psu to 32.1 psu. The average pH of the water was 7.6 and the average water temperature was 28.7 °C. The grain size analysis gave the mass fractions of the samples as sand (60.9%), fine sand (30.6%) and fine silt+clay (1.3%) in the sampling locations. The surface sediment samples, 1 kg wet weight each from the upper 5-10 cm layer, were oven dried at 105 °C for 24 hours to reach a constant weight, homogenized and sieved through a 2 mm sieve (IAEA technical series no. 295). The radioactivity concentrations were determined using the gamma spectrometry technique. An Ultra Low Background Broad Energy High Purity Ge Detector, BEGe (Model BE5030, Canberra), was used for radioactivity measurement with Canberra Industries' Laboratory Source-less Calibration Software (LabSOCS) mathematical efficiency calibration approach and Geometry Composer software. The mean activity concentrations were found to be 24 ± 4, 67 ± 9, 181 ± 10, 59 ± 8, 3.5 ± 0.4 and 0.47 ± 0.08 Bq/kg for 238U, 232Th, 40K, 210Pb, 235U and 137Cs, respectively. The mean absorbed dose rate in air, radium equivalent activity, external hazard index, annual gonadal dose equivalent and annual effective dose equivalent were 60.8 nGy/h, 137.3 Bq/kg, 0.4, 425.3 mSv/year and 74.6 mSv/year, respectively. The results of this study will provide baseline information on the natural and artificial radioactive isotopes and environmental pollution, together with information on radiological risk.

Keywords: Gamma spectrometry, lagoon, radioactivity, sediments.

Downloads: 524
1502 Translator Design to Model Cpp Files

Authors: Er. Satwinder Singh, Dr. K.S. Kahlon, Rakesh Kumar, Er. Gurjeet Singh

Abstract:

The most reliable and accurate description of the actual behavior of a software system is its source code. However, not all questions about the system can be answered directly by resorting to this repository of information. The reverse engineering methodology aims to extract abstract, goal-oriented “views” of the system that summarize relevant properties of the computation performed by the program. Concentrating on reverse engineering, we modeled C++ files by designing a translator.

Keywords: Translator, Modeling, UML, DYNO, ISVis, TED.

Downloads: 1534
1501 The Effect of Fine Aggregate Properties on the Fatigue Behavior of the Conventional and Polymer Modified Bituminous Mixtures Using Two Types of Sand as Fine Aggregate

Authors: S. G. Yasreen, N. B. Madzlan, K. Ibrahim

Abstract:

Fatigue cracking continues to be one of the main challenges in improving the performance of bituminous mixture pavements. The purpose of this paper is to look at some aspects of the effects of fine aggregate properties on the fatigue behaviour of hot mix asphalt. Two types of sand (quarry and mining sand) with two conventional bitumens (PEN 50/60 and PEN 80/100) and four polymer-modified bitumens, PMB (PM1_82, PM1_76, PM2_82 and PM2_76), were used. Physical, chemical and mechanical tests were performed on the sands to determine their effect when incorporated in a bituminous mixture. According to the beam fatigue results, quarry sand, which has more angularity, a rougher texture, higher shear strength and a higher percentage of aluminium oxide, presented higher resistance to fatigue. A PMB mixture also gives better fatigue results than conventional mixtures, because PMB has better viscosity properties than conventional bitumen.

Keywords: Beam fatigue test, chemical property, mechanical property, physical property

Downloads: 2813
1500 Experimental and Finite Element Study of Bending Fatigue Failure: A Case Study on Main Shaft of a Gyrator Crusher

Authors: Rahim Sotoudeh Bahreini, Alireza Foroughi Nematollahi, Akbar Jafari

Abstract:

This study investigates the mechanism of a gyratory crusher located at Golgohar Mining and Industrial Co., with a specific focus on the stress distribution and fatigue failure of its main shaft. As a first step, the cross section of the fractured shaft is studied, and the crack growth is analyzed. Then, the rotational motion of the shaft and the oil temperature in the oil circuit of the equipment are monitored. Condition monitoring is used to help find a better modification. Based on the results of this study, the main causes of shaft failure are identified, and a corrective solution is offered to increase crusher performance, especially the life of its main shaft. To predict the efficiency of the proposed modification, a finite element simulation is performed, and its results are compared with similar modified cases. The comparison and interpretation of the simulation results confirm the efficiency of the proposed corrective method.

Keywords: Fatigue failure, finite element method, gyratory crusher, condition monitoring.

Downloads: 1635
1499 Dynamic Coupling Metrics for Service-Oriented Software

Authors: Pham Thi Quynh, Huynh Quyet Thang

Abstract:

Service-oriented systems have become popular and offer many advantages in the development and maintenance process. Coupling is the most important attribute of services when they are integrated into a system. In this paper, we propose a suite of metrics to evaluate a service's quality according to its coupling. We use the coupling metrics to measure the maintainability, reliability, testability, and reusability of services. Our proposed metrics operate at run-time, which yields more exact results.

Keywords: Dynamic coupling metric, SOA, web service, SOAP Extension.

Downloads: 1586
1498 Feature Selection with Kohonen Self Organizing Classification Algorithm

Authors: Francesco Maiorana

Abstract:

In this paper, a one-dimensional Self Organizing Map (SOM) algorithm to perform feature selection is presented. The algorithm is based on a first classification of the input dataset in a similarity space. From this classification, a set of positive and negative features is computed for each class. This set of features is selected as the result of the procedure. The procedure is evaluated on an in-house dataset from a Knowledge Discovery from Text (KDT) application and on a set of publicly available datasets used in international feature selection competitions. These datasets come from KDT applications, drug discovery and other applications. The knowledge of the correct classification available for the training and validation datasets is used to optimize the parameters for positive and negative feature extraction. The process becomes feasible for large and sparse datasets, such as the ones obtained in KDT applications, by using both compression techniques to store the similarity matrix and speed-up techniques for the Kohonen algorithm that take advantage of the sparsity of the input matrix. These improvements make it feasible to apply the methodology to massive datasets by using the grid.

Keywords: Clustering algorithm, Data mining, Feature selection, Grid, Kohonen Self Organizing Map.

Downloads: 3052
1497 Analytical Investigation of Sediment Formation and Transport in the Vicinity of the Water Intake Structures - A Case Study of the Dez Diversion Weir in Greater Dezful

Authors: M. karavanmasjedi, N. Hedayat, A. Rohani, H. Shirin

Abstract:

Sedimentation results from soil erosion in the water basin, especially in arid and semi-arid regions, where poor vegetation cover on the mountain slopes upstream can contribute to sediment formation. Sedimentation not only changes the morphology of the river and its hydraulic characteristics considerably but also poses a major challenge for the operation and maintenance of the canal network, which depends on water flow to meet the stakeholders' requirements. For this reason, mathematical modeling can be used to simulate the factors that affect scouring, sediment transport and settling along the waterways. This is particularly important behind reservoirs, as it enables the operators to estimate the useful life of these hydraulic structures. The aim of this paper is to simulate sedimentation and erosion in the eastern and western water intake structures of the Dez Diversion weir using GSTARS-3 software. This is done to estimate the sedimentation and to investigate ways to optimize the process and minimize operational problems. Results indicated that at the furthest point upstream of the diversion weir, the coarser sediment grains tended to settle. The reason for this is the construction of the phantom bridge and the outstanding rocks just upstream of the structure. The construction of these along the river course has reduced the momentum energy required to push the sediment loads and made it possible for them to settle wherever the river regime allows. Results further indicated a trend in sediment size: as the focus of study shifts downstream, the grains get smaller, and vice versa. It was also found that the GSTARS-3 findings were close to the observed data sets. This suggests that the software is a powerful analytical tool which can be applied in river engineering projects at minimal cost and with relatively accurate results.

Keywords: Erosion, sedimentation, Dez Diversion weir, GSTARS-3

Downloads 1618
1496 Automatic Extraction of Features and Opinion-Oriented Sentences from Customer Reviews

Authors: Khairullah Khan, Baharum B. Baharudin, Aurangzeb Khan, Fazal_e_Malik

Abstract:

Opinion extraction about products from customer reviews is becoming an interesting area of research. Customer reviews about products are nowadays available from blogs and review sites. Tools are also being developed to extract opinions from these reviews, helping users as well as merchants to track the most suitable choice of product. Efficient methods and techniques are therefore needed to extract opinions from reviews and blogs. Because product reviews mostly contain discussion of features, functions and services, efficient techniques are required to extract user comments about the desired features, functions and services. In this paper we propose a novel idea for finding product features in user reviews efficiently. Our focus is on obtaining the features and opinion-oriented words about products from text through auxiliary verbs (AV) {is, was, are, were, has, have, had}. From the results of our experiments we found that 82% of features and 85% of opinion-oriented sentences include AVs. Thus these AVs are good indicators of features and opinion orientation in customer reviews.
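As a rough illustration of the auxiliary-verb idea described in this abstract, the following minimal Python sketch keeps only the review sentences that contain one of the listed AVs. The sentence splitter, the tokenizer, and the function names are assumptions made for illustration; the abstract does not describe the authors' actual pipeline.

```python
import re

# Auxiliary verbs listed in the abstract.
AUX_VERBS = {"is", "was", "are", "were", "has", "have", "had"}


def split_sentences(text):
    """Naive split on '.', '!' and '?' (assumed; not the paper's method)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def contains_aux_verb(sentence):
    """Return True if any whole-word token of the sentence is a listed AV."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return any(token in AUX_VERBS for token in tokens)


def candidate_sentences(review):
    """Keep sentences containing an AV as feature/opinion candidates."""
    return [s for s in split_sentences(review) if contains_aux_verb(s)]


if __name__ == "__main__":
    review = ("The battery life is excellent. I bought it last week. "
              "The screen was too dim!")
    for sentence in candidate_sentences(review):
        print(sentence)
```

Matching on whole tokens rather than substrings avoids false hits such as "his" or "this" being counted as "is".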

Keywords: Classification, Customer Reviews, Helping Verbs, Opinion Mining.

Downloads 2096
1495 Topic Modeling Using Latent Dirichlet Allocation and Latent Semantic Indexing on South African Telco Twitter Data

Authors: Phumelele P. Kubheka, Pius A. Owolawi, Gbolahan Aiyetoro

Abstract:

Twitter is one of the most popular social media platforms where users share their opinions on different subjects. Twitter can be considered a great source for text mining due to the high volumes of data generated through the platform daily. Many industries, such as telecommunications, can leverage the availability of Twitter data to better understand their markets and make appropriate business decisions. This study performs topic modeling on Twitter data using Latent Dirichlet Allocation (LDA). The obtained results are benchmarked against another topic modeling technique, Latent Semantic Indexing (LSI). The study aims to retrieve topics from a Twitter dataset containing user tweets about South African Telcos. Results from this study show that LSI is much faster than LDA. However, LDA yields better results, with topic coherence 8% higher for the best-performing model in this experiment. A higher topic coherence score indicates better performance of the model.

Keywords: Big data, latent Dirichlet allocation, latent semantic indexing, Telco, topic modeling, Twitter.

Downloads 459
1494 An Intelligent System for Phish Detection, using Dynamic Analysis and Template Matching

Authors: Chinmay Soman, Hrishikesh Pathak, Vishal Shah, Aniket Padhye, Amey Inamdar

Abstract:

Phishing, or stealing of sensitive information on the web, has dealt a major blow to Internet security in recent times. Most existing anti-phishing solutions fail to handle the fuzziness involved in phish detection, leading to a large number of false positives. This fuzziness is attributed to the use of HTML, which is highly flexible and, at the same time, highly ambiguous. We introduce a new perspective on phishing that tries to systematically prove whether a given page is phished or not, using the corresponding original page as the basis of the comparison. It analyzes the layout of the pages under consideration to determine the percentage distortion between them, which is indicative of any form of malicious alteration. The design represents an intelligent system employing dynamic assessment, which accurately identifies brand new phishing attacks and will prove effective in reducing the number of false positives. This framework could potentially be used as a knowledge base for educating internet users about phishing.

Keywords: World Wide Web, Phishing, Internet security, data mining.

Downloads 1832
1493 A Distance Function for Data with Missing Values and Its Application

Authors: Loai AbdAllah, Ilan Shimshoni

Abstract:

Missing values in data are common in real-world applications. Since the performance of many data mining algorithms depends critically on being given a good metric over the input space, we decided in this paper to define a distance function for unlabeled datasets with missing values. We use the Bhattacharyya distance, which measures the similarity of two probability distributions, to define our new distance function. According to this distance, the distance between two points without missing attribute values is simply the Mahalanobis distance. When, on the other hand, one of the coordinates has a missing value, the distance is computed according to the distribution of the missing coordinate. Our distance is general and can be used as part of any algorithm that computes the distance between data points. Because its performance depends strongly on the chosen distance measure, we opted for the k nearest neighbor (kNN) classifier to evaluate the distance's ability to accurately reflect object similarity. We experimented on standard numerical datasets from different fields in the UCI repository. On these datasets we simulated missing values and compared the performance of the kNN classifier using our distance to three other basic methods. Our experiments show that kNN using our distance function outperforms kNN using the other methods. Moreover, the runtime of our method is only slightly higher than that of the other methods.

Keywords: Missing values, Distance metric, Bhattacharyya distance.

Downloads 2751
1492 Intelligent Process and Model Applied for E-Learning Systems

Authors: Mafawez Alharbi, Mahdi Jemmali

Abstract:

E-learning is a developing area, especially in education, and can provide several benefits to learners. An intelligent system that collects all components satisfying user preferences is therefore important. This research presents an approach that is capable of personalizing e-information and giving users what they need according to their preferences. The proposed system can build up knowledge as the user makes more evaluations. In addition, it can learn from the user's habits. Finally, we show a walk-through to demonstrate how the intelligent process works.

Keywords: Artificial intelligence, architecture, e-learning, software engineering, processing.

Downloads 1093
1491 The Effectiveness of Synthesizing A-Pillar Structures in Passenger Cars

Authors: Chris Phan, Yong Seok Park

Abstract:

The Toyota Camry is one of the best-selling cars in America. It is economical, reliable, and most importantly, safe. These attributes have made the Camry a trustworthy choice for buyers seeking a dependable vehicle. However, a new finding called the Camry's safety into question. Since 1997, the Camry has received a “good” rating on the moderate overlap front crash test from the Insurance Institute for Highway Safety (IIHS). In 2012, the IIHS introduced a frontal small overlap crash test into its overall evaluation of vehicle occupant safety. The 2012 Camry received a “poor” rating on this new test, while the 2015 Camry redeemed itself with a “good” rating once again. This study aims to find a possible solution that Toyota implemented during a mid-cycle update to reduce the severity of a frontal small overlap crash in the Camry. The purpose of this study is to analyze and evaluate the performance of various A-pillar shapes as energy-absorbing structures for improving passenger safety in a frontal crash. First, the A-pillar structures of the 2012 and 2015 Camry were modeled using the CAD software SolidWorks. Then, a crash test simulation in ANSYS was applied to the A-pillars to analyze the behavior of the structures under similar conditions. Finally, the results were compared to safety values for cabin intrusion to determine the crashworthiness of both A-pillar structures by measuring total deformation. This study highlights that Toyota may have improved the shape of the A-pillar in the 2015 Camry in order to receive a “good” rating from the IIHS safety evaluation once again. These findings could be used to increase safety performance in future vehicles and reduce passenger injuries and fatalities.

Keywords: A-pillar, crashworthiness, design synthesis, finite element analysis.

Downloads 776
1490 Application of Artificial Neural Network to Classification Surface Water Quality

Authors: S. Wechmongkhonkon, N. Poomtong, S. Areerachakul

Abstract:

Water quality is a subject of ongoing concern. Deterioration of water quality has initiated serious management efforts in many countries. This study endeavors to classify water quality automatically. The water quality classes are evaluated using six factor indices: pH value (pH), Dissolved Oxygen (DO), Biochemical Oxygen Demand (BOD), Nitrate Nitrogen (NO3N), Ammonia Nitrogen (NH3N) and Total Coliform (TColiform). The methodology involves applying data mining techniques using multilayer perceptron (MLP) neural network models. The data cover 11 canal sites in Dusit district in Bangkok, Thailand, and were obtained from the Department of Drainage and Sewerage, Bangkok Metropolitan Administration, for 2007-2011. The multilayer perceptron neural network achieves a high accuracy rate of 96.52% in classifying the water quality of the Dusit district canals in Bangkok. This encouraging result could be applied to the planning and management of water quality.

Keywords: artificial neural network, classification, surface water quality

Downloads 3209
1489 Adaptive Network Intrusion Detection Learning: Attribute Selection and Classification

Authors: Dewan Md. Farid, Jerome Darmont, Nouria Harbi, Nguyen Huu Hoa, Mohammad Zahidur Rahman

Abstract:

In this paper, a new learning approach for network intrusion detection using a naïve Bayesian classifier and the ID3 algorithm is presented, which identifies effective attributes from the training dataset, calculates the conditional probabilities for the best attribute values, and then correctly classifies all the examples of the training and testing datasets. Most current intrusion detection datasets are dynamic and complex and contain a large number of attributes. Some of the attributes may be redundant or contribute little to detection. Testing has confirmed that selecting significant attributes is important for designing real-world intrusion detection systems (IDS). The purpose of this study is to identify effective attributes from the training dataset to build a classifier for network intrusion detection using data mining algorithms. The experimental results on the KDD99 benchmark intrusion detection dataset demonstrate that this new approach achieves high classification rates and reduces false positives using limited computational resources.
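The attribute selection mentioned in this abstract can be illustrated with information gain, the criterion ID3 uses to rank attributes. The Python sketch below is a generic illustration under stated assumptions, not the authors' implementation: the toy records, the attribute names `protocol` and `flag`, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr, label_key="label"):
    """Entropy reduction obtained by splitting rows on attribute `attr`."""
    base = entropy([row[label_key] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[label_key])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical toy connection records (not taken from KDD99).
ROWS = [
    {"protocol": "tcp", "flag": "SF", "label": "normal"},
    {"protocol": "tcp", "flag": "S0", "label": "attack"},
    {"protocol": "udp", "flag": "SF", "label": "normal"},
    {"protocol": "udp", "flag": "S0", "label": "attack"},
]

if __name__ == "__main__":
    for attr in ("protocol", "flag"):
        print(attr, information_gain(ROWS, attr))
```

In this toy data, `flag` separates the two classes perfectly while `protocol` carries no information about the label, so a selection step based on information gain would keep `flag` and could drop `protocol`.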

Keywords: Attributes selection, Conditional probabilities, information gain, network intrusion detection.

Downloads 2698
1488 Security Risk Analysis Based on the Policy Formalization and the Modeling of Big Systems

Authors: Luc Cessieux, French Navy, Adrien Derock, DCNS/IMATH

Abstract:

Security risk models have been successful in estimating the likelihood of attack for simple security threats. However, modeling complex systems and their security risks remains a challenge. Many methods have been proposed to address this problem. Often difficult to use and not comprehensive enough, they are not as well known among administrators and decision makers as they should be. In this paper we propose a new tool built specifically to model big systems. The software takes into account attack threats and security strength.

Keywords: Security, risk management, threat, modelization.

Downloads 1324