Search results for: Fawzy Torkey
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 6

6 A Text Mining Technique Using Association Rules Extraction

Authors: Hany Mahgoub, Dietmar Rösner, Nabil Ismail, Fawzy Torkey

Abstract:

This paper describes a text mining technique for automatically extracting association rules from collections of textual documents. The technique, called Extracting Association Rules from Text (EART), relies on keyword features to discover association rules among the keywords labeling the documents. The EART system ignores the order in which words occur and focuses instead on the words and their statistical distributions in documents. The main contributions of the technique are that it integrates XML technology with an Information Retrieval scheme (TFIDF) for keyword/feature selection, which automatically selects the most discriminative keywords for use in association rule generation, and that it uses a data mining technique for association rule discovery. It consists of three phases: a Text Preprocessing phase (transformation, filtration, stemming and indexing of the documents), an Association Rule Mining (ARM) phase (applying our algorithm for Generating Association Rules based on a Weighting scheme, GARW) and a Visualization phase (visualization of results). Experiments were carried out on news web pages related to the outbreak of the bird flu disease. The extracted association rules contain important features and describe the informative news included in the document collection. The performance of the EART system is compared with that of another system that uses the Apriori algorithm, in terms of execution time and an evaluation of the extracted association rules.
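The keyword selection and rule discovery steps described in this abstract can be illustrated with a minimal sketch. It is not the GARW algorithm, whose details the abstract does not give: it assumes a plain TF-IDF ranking, single-antecedent rules, and the usual support and confidence measures. The function names, the thresholds and the toy documents are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def tfidf_keywords(docs, top_n):
    """Return, per document, the set of top_n terms ranked by TF-IDF."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    keyword_sets = []
    for doc in docs:
        tf = Counter(doc)
        scores = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
        # Sort by score (descending), then alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keyword_sets.append(set(ranked[:top_n]))
    return keyword_sets

def association_rules(keyword_sets, min_support, min_confidence):
    """Single-antecedent rules lhs -> rhs over keyword sets (word order ignored)."""
    n = len(keyword_sets)
    item_count = Counter(k for ks in keyword_sets for k in ks)
    pair_count = Counter(
        pair for ks in keyword_sets for pair in combinations(sorted(ks), 2)
    )
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Hypothetical documents, assumed to be already tokenised and stemmed.
docs = [
    ["bird", "flu", "outbreak", "asia"],
    ["bird", "flu", "vaccine"],
    ["flu", "vaccine", "trial"],
    ["outbreak", "asia", "poultry"],
]
rules = association_rules(tfidf_keywords(docs, top_n=3), 0.5, 0.8)
```

Each rule is a tuple of antecedent, consequent, support and confidence. Restricting the rule search to the TF-IDF keywords, rather than all words, is what keeps the candidate space small in this sketch.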

Keywords: Text mining, data mining, association rule mining

Downloads: 4406
5 K-Means for Spherical Clusters with Large Variance in Sizes

Authors: A. M. Fahim, G. Saake, A. M. Salem, F. A. Torkey, M. A. Ramadan

Abstract:

Data clustering is an important data exploration technique with many applications in data mining. The k-means algorithm is well known for its efficiency in clustering large data sets. However, this algorithm is suitable for spherical clusters of similar sizes and densities. The quality of the resulting clusters decreases when the data set contains spherical clusters with large variance in sizes. In this paper, we introduce a competent procedure to overcome this problem. The proposed method is based on shifting the center of the large cluster toward the small cluster and recomputing the membership of the small cluster's points. The experimental results reveal that the proposed algorithm produces satisfactory results.
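For context, the baseline that this abstract builds on can be sketched as follows. This is only the standard k-means loop (assignment step, then update step), given caller-supplied initial centers. The center-shifting procedure proposed in the paper is not reproduced, because the abstract does not specify it. The function name and signature are illustrative.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Plain Lloyd-style k-means on tuples of numbers, given initial centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return labels, centers
```

Because every point is assigned to the nearest center, the boundary between a large and a small cluster falls midway between their centers, which is the situation in which points of the large cluster can be misassigned.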

Keywords: K-Means, Data Clustering, Cluster Analysis.

Downloads: 3259
4 Efficiency Evaluation of E-Commerce Websites

Authors: A. K. Abd El-Aleem, W. F. Abd El-wahed, N. A. Ismail, F. A. Torkey

Abstract:

This study proposes a model comprising a new set of evaluation criteria for measuring the efficiency of real-world e-commerce websites. The evaluation criteria include website design, usability and performance. The Data Envelopment Analysis (DEA) technique is used to measure website efficiency. An efficient website is defined as a site that generates the most outputs using the smallest amount of inputs. Inputs are measurements representing the amount of effort required to build, maintain and operate the site. Output is the amount of traffic the site generates, measured as the average number of daily hits and the average number of daily unique visitors.
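The definition of efficiency above can be illustrated in the simplest special case. With one input and one output, and assuming constant returns to scale, a DEA score reduces to each site's output/input ratio divided by the best ratio observed. The study itself uses several inputs and two outputs, which requires solving a linear program per site and is not shown here. The site names and figures below are hypothetical.

```python
def relative_efficiency(sites):
    """Score each site as its output/input ratio divided by the best ratio observed."""
    ratios = {name: output / inp for name, (inp, output) in sites.items()}
    best = max(ratios.values())
    return {name: r / best for name, r in ratios.items()}

# Hypothetical sites: (input effort score, average daily unique visitors).
sites = {
    "site_a": (10.0, 500.0),
    "site_b": (20.0, 600.0),
    "site_c": (5.0, 125.0),
}
scores = relative_efficiency(sites)
```

A score of 1 marks a site on the efficient frontier; lower scores indicate how far a site falls short of the best observed ratio.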

Keywords: Data Envelopment Analysis, E-commerce, Efficiency.

Downloads: 4074
3 Density Clustering Based On Radius of Data (DCBRD)

Authors: A. M. Fahim, A. M. Salem, F. A. Torkey, M. A. Ramadan

Abstract:

Clustering algorithms are attractive for the task of class identification in spatial databases. However, their application to large spatial databases raises the following requirements for clustering algorithms: minimal domain knowledge needed to determine the input parameters, discovery of clusters with arbitrary shape, and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, a density-based clustering algorithm (DCBRD) is presented, relying on knowledge acquired from the data by dividing the data space into overlapping regions. The proposed algorithm discovers arbitrarily shaped clusters, requires no input parameters and uses the same definitions as the DBSCAN algorithm. We performed an experimental evaluation of its effectiveness and efficiency, and compared the results with those of DBSCAN. The results of our experiments demonstrate that the proposed algorithm is efficient in discovering clusters of arbitrary shape and size.
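Since the abstract states that DCBRD reuses the DBSCAN definitions, a compact sketch of textbook DBSCAN may help as a reference point. This is not DCBRD: it takes the two input parameters (`eps`, `min_pts`) that DCBRD is said to avoid, and it uses a brute-force neighbourhood search rather than any partitioning of the data space. The function names are illustrative.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN: one cluster label per point, NOISE for noise points."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels
```

A point is a core point when its `eps`-neighbourhood holds at least `min_pts` points; clusters grow by chaining core points, which is what allows arbitrary shapes.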

Keywords: Clustering Algorithms, Arbitrary Shape of Clusters, Cluster Analysis.

Downloads: 1847
2 DCBOR: A Density Clustering Based on Outlier Removal

Authors: A. M. Fahim, G. Saake, A. M. Salem, F. A. Torkey, M. A. Ramadan

Abstract:

Data clustering is an important data exploration technique with many applications in data mining. We present an enhanced version of the well-known single link clustering algorithm, which we refer to as DCBOR. The proposed algorithm alleviates the chain effect by removing the outliers from the given dataset, so it provides outlier detection and data clustering simultaneously. The algorithm does not need to update the distance matrix, since it merges the k nearest objects in one step and each cluster continues to grow as long as possible under a specified condition. The algorithm consists of two phases: in the first phase, it removes the outliers from the input dataset; in the second phase, it performs the clustering process. The algorithm discovers clusters of different shapes, sizes and densities and requires only one input parameter, which represents a threshold for outlier points. The value of the input parameter ranges from 0 to 1, and the algorithm supports the user in determining an appropriate value for it. We have tested the algorithm on different datasets that contain outliers and clusters connected by chains of dense points, and the algorithm discovers the correct clusters. The results of our experiments demonstrate the effectiveness and the efficiency of DCBOR.
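The first phase, outlier removal with a threshold between 0 and 1, can be sketched under explicit assumptions. The abstract does not say how DCBOR scores outliers, so the score used here (sum of distances to the k nearest neighbours, scaled by the largest such sum) is an assumption chosen for illustration, as are the function names. The second, clustering phase is not shown.

```python
import math

def outlier_scores(points, k):
    """Sum of distances to the k nearest neighbours, scaled so the largest is 1."""
    raw = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        raw.append(sum(dists[:k]))
    largest = max(raw)
    return [r / largest if largest > 0 else 0.0 for r in raw]

def remove_outliers(points, k, threshold):
    """Keep points whose normalised score does not exceed the threshold in [0, 1]."""
    scores = outlier_scores(points, k)
    return [p for p, s in zip(points, scores) if s <= threshold]
```

Removing isolated points before linking is what breaks the thin chains that would otherwise let single link merge distinct clusters.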

Keywords: Data Clustering, Clustering Algorithms, Handling Noise, Arbitrary Shape of Clusters.

Downloads: 1911
1 Improving Production Traits for El-Salam and Mandarah Chicken Strains by Crossing II-Estimation of Crossbreeding Effects on Egg Production and Egg Quality Traits

Authors: Ayman E. Taha, Fawzy A. Abd El-Ghany

Abstract:

A crossbreeding experiment was carried out between two Egyptian strains of chickens, namely Mandarah (MM) and El-Salam (SS). The two purebred strains and their reciprocal crosses (MS and SM) were used to estimate the effect of crossing on egg laying and egg quality parameters, direct additive and maternal additive effects, as well as heterosis and direct heterosis percentages for the studied traits. Results revealed that the SM cross recorded the highest significant averages for most egg production traits, including body weight at sexual maturity (BW1); egg number at the first 90 days, 42 weeks and 65 weeks of age (EN1, EN2 and EN3, respectively); egg weight at 90 days and 42 weeks of age (EW1 and EW2); egg mass at 90 days, 42 weeks and 65 weeks of age (EM1, EM2 and EM3, respectively); feed conversion ratio to egg production at 90 days, 42 weeks and 65 weeks of age (FCR1, FCR2 and FCR3, respectively); and fertility and commercial hatchability percentages. Moreover, the SM line reached the age at sexual maturity (ASM) and the period to the first ten eggs (Pf10 egg) at an earlier age than the other lines. On the other hand, crossing did not markedly improve egg quality parameters. Estimates and percentages of the direct additive effect (GI) were negative for most of the studied traits, except for EN1, EN2, EN3, FCR3, fertility, and scientific and commercial hatchability percentages, which were positive. However, estimates and percentages of maternal heterosis (Gm) were positive for all the studied egg production traits, except for BW2, BW3, ASM, Pf10, FCR1, FCR2, FCR3 and scientific hatchability, which were negative. Also, positive estimates and percentages of heterosis were recorded for most egg production and egg quality traits. It was concluded that using the SS strain as a sire line and the MM strain as a dam line results in the best new commercial egg line (SM), which is of great interest to poultry breeders in Egypt.
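The heterosis percentage mentioned above is commonly computed as the deviation of the mean of the reciprocal crosses from the mean of the two purebred parents, expressed as a percentage of the parental mean. The sketch below assumes that definition; the abstract does not state the exact estimator used in the study, and the function name and the trait means in the usage line are hypothetical.

```python
def heterosis_percent(pure_a, pure_b, cross_ab, cross_ba):
    """Heterosis as a percentage of the mid-parent value, for one trait."""
    mid_parent = (pure_a + pure_b) / 2   # mean of the two purebred strains
    cross_mean = (cross_ab + cross_ba) / 2  # mean of the reciprocal crosses
    return (cross_mean - mid_parent) / mid_parent * 100

# Hypothetical trait means for MM, SS, MS and SM (not taken from the study).
h = heterosis_percent(100.0, 120.0, 115.0, 127.0)
```

A positive value means the crosses outperform the average of their parents for that trait; a negative value means they fall below it.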

Keywords: Mandarah and El-Salam chickens, Crossing, Egg production, Egg quality, Crossbreeding components.

Downloads: 2846