Search results for: Semantic Association Rule Mining

840 Clustering Unstructured Text Documents Using Fading Function

Authors: Pallav Roxy, Durga Toshniwal

Abstract:

Clustering unstructured text documents is an important problem in the data mining community, with applications such as document archive filtering, document organization, topic detection, and subject tracing. In the real world, some already-clustered documents may lose importance while new, more significant documents emerge. Most existing work on clustering unstructured text documents overlooks this aspect. This paper addresses the issue by using a Fading Function. The unstructured text documents are clustered, and for each cluster a statistics structure called a Cluster Profile (CP) is maintained. The Cluster Profile incorporates the Fading Function, which keeps track of the time-dependent importance of the cluster. The work proposes a novel algorithm, the Clustering n-ary Merge Algorithm (CnMA), for unstructured text documents that uses the Cluster Profile and the Fading Function. Experimental results illustrating the effectiveness of the proposed technique are also included.
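The abstract does not give the form of the Fading Function or the fields of the Cluster Profile. As a rough illustration only, the sketch below assumes an exponential fading weight of the kind common in stream clustering, and a profile that stores a faded document count. The names `fading`, `ClusterProfile`, `decay`, and `importance` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


def fading(age: float, decay: float = 0.1) -> float:
    """Exponential fading weight 2^(-decay * age). The form is an assumption."""
    return 2.0 ** (-decay * age)


@dataclass
class ClusterProfile:
    """Hypothetical cluster profile holding a time-faded document count."""
    weight: float = 0.0       # faded number of documents in the cluster
    last_update: float = 0.0  # time of the most recent update

    def add_document(self, now: float, decay: float = 0.1) -> None:
        # Fade the accumulated weight up to the current time, then count the new document.
        self.weight = self.weight * fading(now - self.last_update, decay) + 1.0
        self.last_update = now

    def importance(self, now: float, decay: float = 0.1) -> float:
        # Current time-dependent importance of the cluster, without modifying it.
        return self.weight * fading(now - self.last_update, decay)


profile = ClusterProfile()
profile.add_document(now=0.0)
profile.add_document(now=10.0)
print(profile.importance(now=20.0))
```

Under this assumption, a cluster that stops receiving documents loses half of its importance every `1 / decay` time units, so a merge or pruning step could compare `importance` values rather than raw document counts.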

Keywords: Clustering, Text Mining, Unstructured Text Documents, Fading Function.

Downloads: 1942

839 Body Mass Index and Dietary Habits among Nursing College Students Living in the University Residence in Kirkuk City, Iraq

Authors: Jenan Shakoor

Abstract:

Obesity prevalence is increasing worldwide. University life is a challenging period, especially for students who have to leave their familiar surroundings and settle in a new environment. The current study aimed to assess diet and exercise habits and their association with body mass index (BMI) among nursing college students living at the Kirkuk University residence. This was a descriptive study. A non-probability (purposive) sample of 101 students living in the Kirkuk University residence was recruited from 15th November 2015 to 5th May 2016. A questionnaire was constructed for the study, consisting of four parts: demographic characteristics of the study sample, eating habits, eating at college, and healthy habits. The data were collected by interviewing the study sample, and weight and height were measured by a trained researcher at the college. Descriptive statistical analysis was undertaken. Data were prepared, organized and entered into a computer file; the Statistical Package for the Social Sciences (SPSS 20) was used for data analysis. A p value ≤ 0.05 was accepted as statistically significant. A total of 63 (62.4%) of the sample were aged 20-21, with a mean age of 22.1 (SD ± 0.653). About a third of the sample, 38 (37.6%), were in level four at college; 67 (66.3%) were female; and 46 (45.5%) of participants were from a middle socio-economic status. 14 (13.9%) of the study sample were overweight (BMI = 25-29.9 kg/m²) and 6 (5.9%) were obese (BMI ≥ 30 kg/m²), compared to 73 (72.3%) who were of normal weight (BMI = 18.5-24.9 kg/m²). With regard to eating habits and exercise, 42 (41.6%) of the students rarely ate breakfast, 79 (78.2%) ate lunch at the university residence, 77 (78.2%) reported rarely doing exercise, and 62 (61.4%) slept for less than eight hours.
No significant association was found between age, sex, college level or socio-economic status and BMI, while there was a significant association between eating lunch at university and BMI (p = 0.03). No significant association was found between eating habits, healthy habits and BMI. The prevalence of overweight and obesity among the study sample was 19.8%, with female students being more obese than males. Further studies are needed to assess BMI among residence students in other colleges and to increase undergraduate students' awareness of healthy food habits.

Keywords: Body mass index, diet, obesity, university residence.

Downloads: 1225

838 Discovery of Time Series Event Patterns based on Time Constraints from Textual Data

Authors: Shigeaki Sakurai, Ken Ueno, Ryohei Orihara

Abstract:

This paper proposes a method that discovers time series event patterns from textual data with time information. The patterns are composed of sequences of events and each event is extracted from the textual data, where an event is characteristic content included in the textual data such as a company name, an action, and an impression of a customer. The method introduces 7 types of time constraints based on the analysis of the textual data. The method also evaluates these constraints when the frequency of a time series event pattern is calculated. We can flexibly define the time constraints for interesting combinations of events and can discover valid time series event patterns which satisfy these conditions. The paper applies the method to daily business reports collected by a sales force automation system and verifies its effectiveness through numerical experiments.
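The abstract does not list its seven time constraints. The sketch below therefore illustrates only one hypothetical constraint, a maximum gap in days between consecutive events, to show how a constraint can be evaluated while the frequency of a pattern is counted. The function names, the `max_gap` parameter, and the sample reports are all illustrative assumptions.

```python
from typing import List, Optional, Tuple

Event = Tuple[int, str]  # (day number, event label), assumed sorted by day


def matches(sequence: List[Event], pattern: List[str], max_gap: int) -> bool:
    """True if the pattern occurs in order with consecutive events at most max_gap days apart."""

    def search(start: int, p: int, prev_time: Optional[int]) -> bool:
        if p == len(pattern):
            return True
        for i in range(start, len(sequence)):
            t, label = sequence[i]
            if prev_time is not None and t - prev_time > max_gap:
                break  # the sequence is time-sorted, so later events are even farther away
            if label == pattern[p] and search(i + 1, p + 1, t):
                return True
        return False

    return search(0, 0, None)


def support(sequences: List[List[Event]], pattern: List[str], max_gap: int) -> int:
    """Number of sequences that contain the pattern under the time constraint."""
    return sum(matches(s, pattern, max_gap) for s in sequences)


# Made-up event sequences, one per customer, for illustration only.
reports = [
    [(1, "complaint"), (2, "visit"), (9, "order")],
    [(1, "complaint"), (3, "visit"), (4, "order")],
    [(1, "visit"), (2, "order")],
]
print(support(reports, ["complaint", "visit", "order"], max_gap=3))
```

Tightening or loosening `max_gap` changes which occurrences count toward the frequency, which is the general effect the abstract attributes to its time constraints.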

Keywords: Text mining, sequential mining, time constraints, daily business reports.

Downloads: 1441

837 Knowledge Mining in Web-based Learning Environments

Authors: Nittaya Kerdprasop, Kittisak Kerdprasop

Abstract:

The state of the art in instructional design for computer-assisted learning has been strongly influenced by advances in information technology, the Internet and Web-based systems. The emphasis of educational systems has shifted from training to learning. Course delivery has also changed from large, inflexible content to sequences of small learning objects. The concept of learning objects, together with advanced Web and communication technologies, supports the reusability, interoperability, and accessibility design criteria currently exploited by most learning systems. These concepts enable just-in-time learning. We propose to extend these design criteria further to include the learnability concept, which will help adapt content to the needs of learners. The learnability concept offers better personalization, leading to the creation and delivery of course content more appropriate to the performance and interests of each learner. In this paper we present a new framework for learning environments that contains knowledge discovery as a tool to automatically learn patterns of learning behavior from learners' profiles and history.

Keywords: Knowledge mining, Web-based learning, Learning environments.

Downloads: 1739

836 A New Version of Annotation Method with a XML-based Knowledge Base

Authors: Mohammad Yasrebi, Somayeh Khosravi

Abstract:

Machine-understandable data, when strongly interlinked, constitutes the basis for the Semantic Web. Annotating web documents is one of the major techniques for creating metadata on the Web. Annotating websites describes the data they contain in a form suitable for interpretation by machines. In this paper, we present an improved approach over the previous one [1] for annotating the text of websites based on a knowledge base.

Keywords: Knowledge base, ontology, semantic annotation, XML.

Downloads: 1526

835 Spatial Structure and Spatial Impacts of the Jakarta Metropolitan Area: A Southeast Asian EMR Perspective

Authors: Ikhwan Hakim, Bruno Parolin

Abstract:

This paper investigates the spatial structure of employment in the Jakarta Metropolitan Area (JMA), with reference to the concept of the Southeast Asian extended metropolitan region (EMR). A combination of factor analysis and local Getis-Ord (Gi*) hot-spot analysis is used to identify clusters of employment in the region, including those of the urban and agriculture sectors. Spatial statistical analysis is further used to probe the spatial association of identified employment clusters with their surroundings on several dimensions, including the spatial association between the central business district (CBD) in Jakarta city and employment density in the region, the spatial impacts of urban expansion on population growth, and the degree of urban-rural interaction. The degree of spatial interaction for the whole JMA is measured by the patterns of commuting trips destined for the various employment clusters. Results reveal the strong role of the urban core of Jakarta, and the regional CBD, as the centre for mixed job sectors such as retail, wholesale, services and finance. Manufacturing and local government services, on the other hand, form corridors radiating out of the urban core, reaching out to the agriculture zones in the fringes. Strong associations between the urban expansion corridors and population growth, and urban-rural mix, are revealed particularly in the eastern and western parts of the JMA. Metropolitan-wide commuting patterns are focussed on the urban core of Jakarta and the CBD, while relatively local commuting patterns are shown to be prevalent for the employment corridors.
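For reference, the local Getis-Ord statistic named in the abstract is usually written in the standardized form below. The abstract does not state which variant or spatial weights the authors used, so the notation here (attribute value $x_j$ at location $j$, spatial weight $w_{ij}$, and $n$ locations) is the conventional one rather than the paper's own.

```latex
G_i^* = \frac{\sum_{j=1}^{n} w_{ij} x_j - \bar{X} \sum_{j=1}^{n} w_{ij}}
             {S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{ij}^2 - \left( \sum_{j=1}^{n} w_{ij} \right)^2}{n - 1}}},
\qquad
\bar{X} = \frac{1}{n} \sum_{j=1}^{n} x_j,
\qquad
S = \sqrt{\frac{1}{n} \sum_{j=1}^{n} x_j^2 - \bar{X}^2}
```

In this form $G_i^*$ is a z-score: large positive values mark locations surrounded by high values (hot spots, for example dense employment), and large negative values mark cold spots.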

Keywords: Jakarta Metropolitan Area, Southeast Asian EMR, spatial association, spatial statistics, spatial structure.

Downloads: 2542

834 CoP-Networks: Virtual Spaces for New Faculty’s Professional Development in the 21st Higher Education

Authors: Eman AbuKhousa, Marwan Z. Bataineh

Abstract:

Twenty-first century higher education and globalization challenge new faculty members to build effective professional networks and partnerships with industry in order to accelerate their growth and success. This creates the need for community of practice (CoP)-oriented development approaches that focus on cognitive apprenticeship while considering individual predisposition and future career needs. This work adopts data mining, clustering analysis, and social networking technologies to present the CoP-Network as a virtual space that connects individuals with similar career aspirations who are socially influenced to join and engage in a process of acquiring domain-related knowledge and practice. The CoP-Network model can be integrated into higher education to extend traditional graduate and professional development programs.

Keywords: Clustering analysis, community of practice, data mining, higher education, new faculty challenges, social networks, social influence, professional development.

Downloads: 922

833 Anomaly Based On Frequent-Outlier for Outbreak Detection in Public Health Surveillance

Authors: Zalizah Awang Long, Abdul Razak Hamdan, Azuraliza Abu Bakar

Abstract:

Public health surveillance systems focus on outbreak detection and on the data sources used. Variation or aberration in the frequency distribution of health data, compared to historical data, is often used to detect outbreaks. It is important that new techniques be developed to improve the detection rate, thereby reducing wastage of resources in public health. The objective of this work is therefore to develop a technique that applies frequent mining and outlier mining to outbreak detection. The proposed technique was tested on 14 datasets from the UCI repository. The effectiveness of each technique was measured by t-test. The overall performance shows that DTK can be used to detect outliers within a frequent dataset. In conclusion, the anomaly-based frequent-outlier technique can be used for outbreak detection to identify outliers within a frequent dataset.

Keywords: Outlier detection, frequent-outlier, outbreak, anomaly, surveillance, public health

Downloads: 2231

832 Determination of the Bank's Customer Risk Profile: Data Mining Applications

Authors: Taner Ersoz, Filiz Ersoz, Seyma Ozbilge

Abstract:

In this study, clients who applied to a bank branch for a loan were analyzed through data mining. The data covered the period between 2010 and 2013 and comprised information such as the loan amounts received by personal and SME clients working with the bank branch, the number of installments, the number of delays in loan installments, payments due at other banks, and the number of banks to which the clients were in debt. The client risk profile was examined through Classification and Regression Tree (CART) analysis, one of the decision tree classification methods. At the end of the study, five different types of customers were identified on the decision tree. These customer types were rated by the risk they pose to the bank branch, and the customers were classified according to these risk ratings.
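To make the CART step concrete, the sketch below shows the basic operation a CART classifier repeats at every node: choosing the threshold on one numeric attribute that minimizes the weighted Gini impurity of the two resulting groups. It is a generic illustration, not the study's model. The attribute (number of delayed installments) and the sample values are made up for the example.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values: List[float], labels: List[str]) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted Gini) of the best split 'value <= threshold'."""
    n = len(values)
    best: Tuple[Optional[float], float] = (None, gini(labels))  # no split is the baseline
    for threshold in sorted(set(values))[:-1]:  # the largest value cannot split the data
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Illustrative data only: delayed installments per client and a risk label.
delays = [0, 1, 0, 4, 5, 6]
risk = ["low", "low", "low", "high", "high", "high"]
print(best_split(delays, risk))
```

A full CART tree applies this search over every attribute and recurses into each side of the chosen split until a stopping rule is met. The leaves then correspond to customer groups such as the five types reported in the abstract.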

Keywords: Client classification, loan suitability, risk rating, CART analysis, decision tree.

Downloads: 1032

831 Characteristics and Outcomes of COVID-19 Related Stroke: A Cohort Study

Authors: Kasra Afsahi, Maryam Soheilifar

Abstract:

Cerebrovascular accident (CVA) is a neurological complication of COVID-19 disease that occurs at a high rate during pandemics. The effect of COVID-19 disease on this disorder is unclear. In this cohort study, patients with COVID-19 disease were assessed. Sixty CVA cases were assessed in a referral hospital in 2020. The main outcome was mortality, and the cases were divided into those who died and those who survived. The groups were compared on all features related to mortality in patients with COVID-19 and CVA. In total, 23 of the 60 patients (38.3%) died. In univariate analysis, death was significantly associated with ischemic heart disease (P = 0.015), high-severity stroke (P = 0.012), high C-reactive protein (CRP) (P = 0.001), high ESR (P = 0.009), pleural effusion (P = 0.005), pericardial effusion (P = 0.027), cardiomegaly (P = 0.005), ground glass opacity (P = 0.001), and consolidation (P = 0.001). Among these factors, only CRP (P = 0.001) and consolidation (P = 0.003) remained significantly associated in multivariate analysis. Mortality in cases of COVID-19-related CVA is about one-third, and it is related to elevated CRP and to the finding of consolidation on computerized tomography of the lungs.

Keywords: COVID-19, stroke, prognosis, C-reactive protein, CRP.

Downloads: 272

830 Improving University Operations with Data Mining: Predicting Student Performance

Authors: Mladen Dragičević, Mirjana Pejić Bach, Vanja Šimičević

Abstract:

The purpose of this paper is to develop models that enable the prediction of student success. These models could improve the allocation of students among colleges and optimize the newly introduced model of government subsidies for higher education. To collect data, an anonymous survey was carried out among final-year undergraduate students using random sampling. Several decision trees were created, and the two that were most successful in predicting student success were chosen based on two criteria: Grade Point Average (GPA) and the time a student needs to finish the undergraduate program (time-to-degree). Decision trees proved to be a good method for classifying student success, and they could be improved further by increasing the survey sample and developing specialized decision trees for each type of college. Methods of this type have great potential for use in decision support systems.

Keywords: Data mining, knowledge discovery in databases, prediction models, student success.

Downloads: 2455

829 Data Mining on the Router Logs for Statistical Application Classification

Authors: M. Rahmati, S.M. Mirzababaei

Abstract:

With the advance of information technology, the use of the Internet to access data resources has steadily increased, and huge amounts of data have become accessible in various forms. Naturally, network providers and agencies seek to prevent electronic attacks that may be harmful or may be related to terrorist applications. This has led the authorities to undertake a variety of methods to protect particular regions from harmful data. One of the most important approaches is to use a firewall in the network facilities. The main objective of a firewall is to stop the transfer of suspicious packets in several ways. However, because of its blind packet stopping, high processing power requirements and high price, some providers are reluctant to use a firewall. In this paper we propose a method that finds a discriminant function to distinguish between normal packets and harmful ones by statistical processing of the network router logs. By discriminating these data, an administrator may take appropriate action against the user. This method is very fast and can be used simply in conjunction with Internet routers.

Keywords: Data Mining, Firewall, Optimization, Packet classification, Statistical Pattern Recognition.

Downloads: 1603

828 Bayesian Meta-Analysis to Account for Heterogeneity in Studies Relating Life Events to Disease

Authors: Elizabeth Stojanovski

Abstract:

Associations between life events and various forms of cancers have been identified. The purpose of a recent random-effects meta-analysis was to identify studies that examined the association between adverse events associated with changes to financial status including decreased income and breast cancer risk. The same association was studied in four separate studies which displayed traits that were not consistent between studies such as the study design, location, and time frame. It was of interest to pool information from various studies to help identify characteristics that differentiated study results. Two random-effects Bayesian meta-analysis models are proposed to combine the reported estimates of the described studies. The proposed models allow major sources of variation to be taken into account, including study level characteristics, between study variance and within study variance, and illustrate the ease with which uncertainty can be incorporated using a hierarchical Bayesian modelling approach.

Keywords: Random-effects, meta-analysis, Bayesian, variation.

Downloads: 612

827 Composite Kernels for Public Emotion Recognition from Twitter

Authors: Chien-Hung Chen, Yan-Chun Hsing, Yung-Chun Chang

Abstract:

The Internet has grown into a powerful medium for information dispersion and social interaction that leads to a rapid growth of social media which allows users to easily post their emotions and perspectives regarding certain topics online. Our research aims at using natural language processing and text mining techniques to explore the public emotions expressed on Twitter by analyzing the sentiment behind tweets. In this paper, we propose a composite kernel method that integrates tree kernel with the linear kernel to simultaneously exploit both the tree representation and the distributed emotion keyword representation to analyze the syntactic and content information in tweets. The experiment results demonstrate that our method can effectively detect public emotion of tweets while outperforming the other compared methods.
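The combination described above can be sketched as a weighted sum of two kernels, which is itself a valid kernel when the weights are non-negative. The abstract does not say how the two kernels are weighted or which tree kernel is used, so the sketch makes two assumptions. The mixing weight `alpha` is hypothetical. The function `fragment_kernel` is a simplified stand-in that counts shared parse-tree fragments; it is not the convolution tree kernel itself.

```python
from collections import Counter
from typing import Dict, List, Sequence


def linear_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Dot product of two emotion-keyword feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def fragment_kernel(frags_a: List[str], frags_b: List[str]) -> float:
    """Stand-in for a tree kernel: counts matching fragment pairs in two bags of fragments."""
    count_a, count_b = Counter(frags_a), Counter(frags_b)
    return float(sum(count_a[f] * count_b[f] for f in count_a))


def composite_kernel(a: Dict, b: Dict, alpha: float = 0.5) -> float:
    """Weighted sum alpha * K_tree + (1 - alpha) * K_linear, with 0 <= alpha <= 1."""
    tree_part = fragment_kernel(a["fragments"], b["fragments"])
    linear_part = linear_kernel(a["keywords"], b["keywords"])
    return alpha * tree_part + (1.0 - alpha) * linear_part


# Two made-up tweets, each with parse fragments and a keyword vector.
tweet_a = {"fragments": ["S->NP VP", "NP->PRP"], "keywords": [1.0, 0.0, 2.0]}
tweet_b = {"fragments": ["S->NP VP", "VP->VBP ADJP"], "keywords": [0.5, 1.0, 1.0]}
print(composite_kernel(tweet_a, tweet_b))
```

A kernel of this shape can be passed to any kernel-based classifier as a precomputed similarity matrix, so syntactic and keyword evidence both contribute to each decision.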

Keywords: Public emotion recognition, natural language processing, composite kernel, sentiment analysis, text mining.

Downloads: 714

826 Text Mining Analysis of the Reconstruction Plans after the Great East Japan Earthquake

Authors: Minami Ito, Akihiro Iijima

Abstract:

On March 11, 2011, the Great East Japan Earthquake occurred off the coast of Sanriku, Japan. It is important to build a sustainable society through the reconstruction process rather than simply restoring the infrastructure. To compare the goals of the reconstruction plans of quake-stricken municipalities, Japanese-language morphological analysis was performed using text mining techniques. Frequently used nouns were sorted into four main categories: “life”, “disaster prevention”, “economy”, and “harmony with environment”. Because Soma City was affected by the nuclear accident, sentences tagged “harmony with environment” tended to be more frequent there than in the other municipalities. Results from cluster analysis and principal component analysis clearly indicated that the local government treats efforts to reduce risks from radiation exposure as a top priority.

Keywords: Eco-friendly reconstruction, harmony with environment, decontamination, nuclear disaster.

Downloads: 1919

825 The Benefits of End-To-End Integrated Planning from the Mine to Client Supply for Minimizing Penalties

Authors: G. Martino, F. Silva, E. Marchal

Abstract:

Control over the characteristics of the delivered iron ore blend is one of the most important aspects of the mining business. The iron ore price is a function of its composition, which is the outcome of the beneficiation process. End-to-end integrated planning of mine operations can therefore reduce the risk of penalties on the iron ore price. In a standard iron mining company, the production chain is composed of mining, ore beneficiation, and client supply. When mine planning and client supply decisions are made without coordination, the beneficiation plant struggles to deliver the best blend possible. Technological improvements in several fields have made it possible to bridge the gap between departments and boost integrated decision-making processes. Clusterization and classification algorithms applied to historical production data generate reasonable predictions of the quality and volume of iron ore produced for each pile of run-of-mine (ROM) processed. Mathematical modeling can use those deterministic relations to propose iron ore blends that better fit specifications within a delivery schedule. Additionally, a model capable of representing the whole production chain can clearly compare the overall impact of different decisions in the process. This study shows how flexibilization combined with a planning optimization model between the mine and the ore beneficiation processes can reduce the risk of out-of-specification deliveries. The model capabilities are illustrated on a hypothetical iron ore mine with a magnetic separation process. Finally, this study shows ways of reducing cost or increasing profit by optimizing process indicators across the production chain and integrating the different plans with the sales decisions.

Keywords: Clusterization and classification algorithms, integrated planning, optimization, mathematical modeling, penalty minimization.

Downloads: 588

824 An Efficient Graph Query Algorithm Based on Important Vertices and Decision Features

Authors: Xiantong Li, Jianzhong Li

Abstract:

Graphs have become increasingly important in modeling complicated structures and schemaless data such as proteins, chemical compounds, and XML documents. Given a graph query, it is desirable to retrieve graphs quickly from a large database via graph-based indices. Unlike existing methods, our approach, called VFM (Vertex to Frequent Feature Mapping), makes use of vertices and decision features as the basic indexing features. VFM constructs two mappings between vertices and frequent features to answer graph queries. The VFM approach not only provides an elegant solution to the graph indexing problem, but also demonstrates how database indexing and query processing can benefit from data mining, especially frequent pattern mining. The results show that the proposed method not only avoids enumerating the subgraphs of the query graph, but also effectively reduces the number of subgraph isomorphism tests between the query graph and the graphs in the candidate answer set during the verification stage.

Keywords: Decision Feature, Frequent Feature, Graph Dataset, Graph Query

Downloads: 1827

823 Computational Method for Annotation of Protein Sequence According to Gene Ontology Terms

Authors: Razib M. Othman, Safaai Deris, Rosli M. Illias

Abstract:

Annotation of a protein sequence is pivotal for the understanding of its function. The accuracy of manual annotation provided by curators is still questionable because of its lower evidence strength, and manual annotation remains a hard and time-consuming task. A number of computational methods, including tools, have been developed to tackle this challenging task. However, they require high-cost hardware, are difficult for bioscientists to set up, or depend on time-intensive and blind sequence similarity searches such as the Basic Local Alignment Search Tool. This paper introduces a new method of assigning highly correlated Gene Ontology terms of annotated protein sequences to partially annotated or newly discovered protein sequences. This method is based entirely on Gene Ontology data and annotations. Two problems had to be solved to achieve this method. The first problem relates to splitting the single monolithic Gene Ontology RDF/XML file into a set of smaller files that can be assessed and processed easily. These files can then be enriched with protein sequences and Inferred from Electronic Annotation evidence associations. The second problem involves searching for a set of Gene Ontology terms that are semantically similar to a given query. The details of the macro and micro problems involved and their solutions, including the objective of this study, are described. This paper also describes protein sequence annotation and the Gene Ontology. The methodology of this study and a Gene Ontology based protein sequence annotation tool, namely extended UTMGO, are presented. Furthermore, its basic version, which is a Gene Ontology browser based on semantic similarity search, is also introduced.

Keywords: automatic clustering, bioinformatics tool, gene ontology, protein sequence annotation, semantic similarity search

Downloads: 3089

822 Collective Redress in Consumer Protection in South East Europe: Cross-National Comparisons, Issues of Commonality and Difference

Authors: Veronika Efremova

Abstract:

In recent decades, there have been significant developments in the European Union in the field of collective consumer redress. The South East European (SEE) countries covered by this paper, in line with their EU accession priorities and duties under Stabilisation and Association Agreements, have to harmonize their national laws with the relevant EU acquis for consumer protection (Chapter 28: Health and Consumer). In these countries, only minimal compliance has been achieved. SEE countries have introduced rudimentary collective redress mechanisms, with modest enforcement of collective redress and limited case law. This paper is based on comprehensive interdisciplinary research conducted for SEE countries on common principles for injunctive and compensatory collective redress mechanisms. It emphasizes cross-national comparisons and underlines issues of commonality and difference, aiming to develop recommendations for adequate enforcement of collective redress. SEE countries are characterized by a sectoral approach to regulating collective redress, in contrast to the majority of EU Member States, which have adopted a horizontal approach. In most SEE countries, the laws recognize only injunctive, not compensatory, collective redress in consumer protection. All stakeholders responsible for implementing collective redress in SEE countries lack information about and awareness of collective redress mechanisms and the way they function in practice. Therefore, specific actions are needed in these countries to make the whole system of collective redress for consumer protection operational and efficient. Given the various designated stakeholders in collective redress in each SEE country, there is a need for mutual coordination and cooperation among them in order to develop the consumer protection system and policies.
By putting the national collective redress mechanisms into practice, effective access to justice for all consumers and the principle of the rule of law will be secured, and appropriate procedural guarantees to avoid abusive litigation will be ensured.

Keywords: Collective redress mechanism, consumer protection, commonality and difference, South East Europe.

Downloads: 884

821 Influence of Apo E Polymorphism on Coronary Artery Disease

Authors: S. Fallah, M. Seifi, M. Firoozrai, T. Godarzi, M. Jafarzadeh, L. H. Ghohari

Abstract:

The ε4 allele of the ε2, ε3 and ε4 protein isoform polymorphism in the gene encoding apolipoprotein E (Apo E) has previously been associated with an increased risk of coronary artery disease (CAD). This study therefore investigated the significance of this polymorphism in the pathogenesis of CAD in Iranian patients with stenosis and in control subjects. To investigate the association between Apo E polymorphism and coronary artery disease, we performed a comparative case-control study of the frequency of Apo E polymorphism in 100 CAD patients with stenosis who underwent coronary angiography (>50% stenosis) and 100 control subjects (<10% stenosis). The Apo E alleles and genotypes were determined by polymerase chain reaction (PCR) and restriction fragment length polymorphism (RFLP). We observed an association between the Apo E polymorphism and CAD in this study. These data suggest that the Apo ε4 and ε2 alleles increase the risk for CAD in the Iranian population (χ² = 4.26, p = 0.05, OR = 2 and χ² = 0.38, p = 0.53, OR = 1.2, respectively). These results suggest that the ε4 and ε2 alleles are risk factors for stenosis.

Keywords: Arterial blood vessels, atherosclerosis, cholesterol.

Downloads: 1685

820 Performance Comparison of ADTree and Naive Bayes Algorithms for Spam Filtering

Authors: Thanh Nguyen, Andrei Doncescu, Pierre Siegel

Abstract:

Classification is an important data mining technique and can be used for data filtering in artificial intelligence. Its broad applicability to all kinds of data has led to its use in nearly every field of modern life. Classification helps us group different items according to the features judged to be interesting and useful. In this paper, we compare two classification methods, Naïve Bayes and ADTree, used to detect spam e-mail. This choice is motivated by the fact that the Naïve Bayes algorithm is based on probability calculus, while the ADTree algorithm is based on decision trees. The parameter settings of these classifiers are chosen to maximize the true positive rate and minimize the false positive rate. The experimental results present classification accuracy and cost analysis with a view to choosing the optimal classifier for spam detection. The number of attributes is also examined in order to obtain a tradeoff between that number and the classification accuracy.
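The two rates that drive the parameter settings can be computed from a classifier's predictions as shown in the sketch below. It is a generic illustration with made-up labels. The function name `rates` and the choice of `"spam"` as the positive class are assumptions for the example, not details from the paper.

```python
from typing import Sequence, Tuple


def rates(y_true: Sequence[str], y_pred: Sequence[str], positive: str = "spam") -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) for the given positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    tpr = tp / (tp + fn) if tp + fn else 0.0  # share of spam that is caught
    fpr = fp / (fp + tn) if fp + tn else 0.0  # share of legitimate mail flagged as spam
    return tpr, fpr


# Illustrative labels only.
actual = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]
predicted = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "spam"]
print(rates(actual, predicted))
```

For spam filtering the two errors are not equally costly, since a false positive hides a legitimate message. This is why a cost analysis is useful alongside plain accuracy.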

Keywords: Classification, data mining, spam filtering, naive Bayes, decision tree.

Downloads: 1448

819 Reuse of Huge Industrial Areas

Authors: Martina Perinkova, Lenka Kolarcikova, Marketa Twrda

Abstract:

Brownfields are one of the most important problems that today's cities must solve. This article describes the comprehensive transformation of the post-industrial area of the former ironworks in Lower Vítkovice, a national cultural heritage site. The city of Ostrava used to be an industrial powerhouse of the Czechoslovak Republic, especially in coal mining and iron production. The decline of industrial production and mining in the 1980s left many unused areas of former factories, generally brownfields and backfields. Since the late 1990s, city officials and private entities have been seeking to remedy this situation. Regeneration of brownfields is a very expensive and long-term process. The area has now been rebuilt as an entertainment, cultural, and social center for tourists and residents of the city. It was necessary to reconstruct the industrial monuments. Equally important was the construction of new buildings, which helped the entire complex to be reused. This is a unique example of the transformation of technical monuments and the completion of necessary new structures, so that the area could start working again and be reintegrated into the urban system.

Keywords: Brownfields, conversion, historical and industrial buildings, reconstruction.

Downloads 1537

818 LiDAR Based Real Time Multiple Vehicle Detection and Tracking

Authors: Zhongzhen Luo, Saeid Habibi, Martin v. Mohrenschildt

Abstract:

Self-driving vehicles require a high level of situational awareness in order to maneuver safely in real-world conditions. This paper presents a LiDAR-based real-time perception system that processes raw sensor data for multiple-target detection and tracking in dynamic environments. The proposed algorithm is nonparametric and deterministic; that is, no assumptions or a priori knowledge about the input data are needed and no initialization is required. Additionally, the proposed method works directly on the three-dimensional data generated by LiDAR, without sacrificing the rich information contained in the 3D domain. Moreover, a fast and efficient real-time clustering algorithm based on radially bounded nearest neighbor (RBNN) is applied. The Hungarian algorithm and adaptive Kalman filtering are used for data association and tracking. The proposed algorithm runs in real time with an average run time of 70 ms per frame.
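
To make the clustering step concrete, here is a minimal brute-force sketch of radius-bounded clustering in the spirit of RBNN: points are grouped when they can be linked by a chain of neighbours that are each within a fixed radius. This is an assumption-laden simplification, not the authors' implementation; a practical version would use a spatial index rather than the O(n²) scan, and the sample points, the radius and the function name are invented for the example.

```python
import math

def radius_bounded_clusters(points, radius):
    """Label points so that neighbours within `radius` share a cluster id.

    Brute-force connected-component search over the radius graph.
    """
    labels = [None] * len(points)
    next_label = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue  # already absorbed into an earlier cluster
        labels[start] = next_label
        stack = [start]
        while stack:
            current = stack.pop()
            for other in range(len(points)):
                if labels[other] is None and \
                        math.dist(points[current], points[other]) <= radius:
                    labels[other] = next_label
                    stack.append(other)
        next_label += 1
    return labels

# Hypothetical 3D returns (metres): two nearby objects and one isolated point.
scan = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.5, 0.2, 0.0),
        (5.0, 5.0, 0.0), (5.2, 5.1, 0.0), (10.0, 0.0, 1.0)]
cluster_ids = radius_bounded_clusters(scan, radius=0.5)
print(cluster_ids)
```

Note that the first and third points are more than 0.5 m apart but still share a cluster because the second point links them, which is the behaviour that lets such methods follow elongated objects.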

Keywords: LiDAR, real-time system, clustering, tracking, data association.

Downloads 4619

817 Job in Modern Arabic Poetry: A Semantic and Comparative Approach to Two Poems Referring to the Poet Al-Sayyab

Authors: Jeries Khoury

Abstract:

The use of legendary, folkloric and religious symbols is one of the most important phenomena in modern Arabic poetry. Interestingly, most of the pioneers of modern Arabic poetry were fascinated by biblical symbols and used many modern techniques to adapt these symbols to their personal lives on the one hand and to their Islamic beliefs on the other. One of the most famous poets to do so was al-Sayya:b. This study discusses the way he employed one of these symbols, ‘Job’, the new features he added to this character, and the link between this character and his personal life. The study also examines the influence of al-Sayya:b on another modern poet, Saadi Yusuf, who, following al-Sayya:b, used the character of Job in a special way, mixing its features with al-Sayya:b’s personal features and thereby creating a new mixed character. A semantic, cultural and comparative analysis of the poems written by al-Sayya:b himself and by the other poets who evoked the mixed image of al-Sayya:b-Job can reveal the changes Arab poets made to the original biblical figure of Job to bring it closer to Islamic culture. The paper makes intensive use of intertextuality idioms in order to shed light on the network of relations between three kinds of texts (indeed three ‘palimpsests’: 1- biblical, the primary text; 2- poetic, al-Sayya:b’s secondary version; 3- re-poetic, Saadi Yusuf’s tertiary version). The bottom line of this paper is that al-Sayya:b was influenced more directly by the dramatic biblical story of Job than by the brief Quranic version of the story. In fact, the ‘new’ character of Job designed by al-Sayya:b differs from the original in so many respects that we can safely say it is a Sayyabian Job that cannot be found in the poems of any other poet, unless they are evoking al-Sayya:b’s own tragedy, as Saadi Yusuf did.

Keywords: Arabic poetry, intertextuality, job, meter, modernism, symbolism.

Downloads 615

816 Acute Coronary Syndrome Prediction Using Data Mining Techniques- An Application

Authors: Tahseen A. Jilani, Huda Yasin, Madiha Yasin, C. Ardil

Abstract:

In this paper we use data mining techniques to investigate factors that contribute significantly to the risk of acute coronary syndrome. We assume that the dependent variable is the diagnosis, with dichotomous values indicating the presence or absence of disease. We apply binary logistic regression to the factors affecting the dependent variable. The data set was collected from two cardiac hospitals in Karachi, Pakistan. It contains sixteen variables in total, of which one is treated as dependent and the other fifteen as independent. To improve the performance of the regression model in predicting acute coronary syndrome, data reduction techniques such as principal component analysis are applied. Based on the results of data reduction, we consider only 14 of the sixteen factors.

Keywords: Acute coronary syndrome (ACS), binary logistic regression analyses, myocardial ischemia (MI), principal component analysis, unstable angina (U.A.).

Downloads 2074

815 Can Physical Activity and Dietary Fat Intake Influence Body Mass Index in a Cross-sectional Correlational Design?

Authors: D.O. Omondi, L.O.A. Othuon, G.M. Mbagaya

Abstract:

The purpose of this study was to determine the influence of physical activity and dietary fat intake on the Body Mass Index (BMI) of lecturers within an institution of higher learning. The study adopted a cross-sectional correlational design and included 120 lecturers selected proportionately by simple random sampling from a population of 600 lecturers. Data were collected using questionnaires with sections covering a physical activity checklist adopted from the International Physical Activity Questionnaire (IPAQ), a 24-hour food recall, and anthropometric measurements, mainly weight and height. Analysis involved bivariate correlations and linear regression. A significant inverse association was found between BMI and the duration (in minutes) of moderate-intensity physical activity per day (r=-0.322, p<0.01). Physical activity also predicted BMI (r²=0.096, F=13.616, β=-3.22, t=-3.69, n=120, p<0.01). However, the association between BMI and dietary fat was not significant (r=0.038, p>0.05). Physical activity emerged as a more powerful determinant of BMI than dietary fat intake.
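
The reported statistics can be cross-checked with the standard identities for a simple linear regression with one predictor. The sketch below uses only the r and n values quoted above. Reading the reported 0.096 as an adjusted coefficient of determination is an assumption of this example, not something the abstract states, and small differences from the reported F and t are expected because r is rounded to three decimals.

```python
import math

r = -0.322  # reported correlation between BMI and activity duration
n = 120     # reported sample size

# Plain and adjusted coefficient of determination (one predictor).
r_squared = r ** 2
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)

# F statistic with (1, n - 2) degrees of freedom, and the matching t.
f_stat = r_squared / (1 - r_squared) * (n - 2)
t_stat = math.copysign(math.sqrt(f_stat), r)

print(round(r_squared, 3), round(adjusted_r_squared, 3))
print(round(f_stat, 2), round(t_stat, 2))
```

Under these assumptions the adjusted value, the F statistic and the t statistic all land close to the figures in the abstract, and F equals t squared as expected for a single predictor.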

Keywords: Physical activity, dietary fat intake, Body Mass Index, Kenya.

Downloads 1657

814 Data Mining Determination of Sunlight Average Input for Solar Power Plant

Authors: Fl. Loury, P. Sablonière, C. Lamoureux, G. Magnier, Th. Gutierrez

Abstract:

A method is proposed to extract faithful representative patterns from a data set of observations that suffer from non-negligible fluctuations. Assuming that the time interval between measurements is extremely small compared to the observation time, the method first defines a subset of intermediate time intervals characterizing coherent behavior. Projecting the data onto these intervals gives a set of curves, from which an ideally “perfect” curve is constructed by taking their sup limit. Comparison with the average real curve in the corresponding interval then gives an efficiency parameter expressing the degradation caused by fluctuations. The method is applied to sunlight data collected at a specific location, where the ideal sunlight is that resulting from direct exposure at the location's latitude over the year, and the efficiency results from the action of meteorological parameters, mainly cloudiness, at different periods of the year. The extracted information already provides useful input for decision-making, before being used for analysis of plant control.

Keywords: Base Input Reconstruction, Data Mining, Efficiency Factor, Information Pattern Operator.

Downloads 1479

813 The Sequestration of Heavy Metals Contaminating the Wonderfonteinspruit Catchment Area using Natural Zeolite

Authors: P.P. Diale, S.S.L. Mkhize, E. Muzenda, J. Zimba

Abstract:

For more than 120 years, gold mining formed the backbone of South Africa's economy. The consequence of mine closure was observed in large-scale land degradation and widespread pollution of surface water and groundwater. This paper investigates the feasibility of using natural zeolite to remove heavy metals contaminating the Wonderfonteinspruit Catchment Area (WCA), a water stream with high levels of heavy metal and radionuclide pollution. Batch experiments were conducted to study the adsorption behavior of natural zeolite with respect to Fe2+, Mn2+, Ni2+, and Zn2+. The data were analysed using the Langmuir and Freundlich isotherms. The Langmuir isotherm was found to correlate the adsorption of Fe2+, Mn2+, Ni2+, and Zn2+ better, with adsorption capacities of 11.9 mg/g, 1.2 mg/g, 1.3 mg/g, and 14.7 mg/g, respectively. Two kinetic models, pseudo-first-order and pseudo-second-order, were also tested to fit the data. The pseudo-second-order equation was found to be the best fit for the adsorption of heavy metals by natural zeolite. Zeolite functionalization with humic acid increased its uptake ability.
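
For readers unfamiliar with the Langmuir isotherm, q = q_max·K·C / (1 + K·C), the sketch below fits it through its linearised form C/q = 1/(q_max·K) + C/q_max using ordinary least squares. The equilibrium concentrations and the Langmuir constant K are hypothetical values chosen for illustration; only the capacity of 11.9 mg/g is taken from the abstract, and the data points are generated from the model rather than measured.

```python
def langmuir(c_eq, q_max, k_l):
    """Langmuir uptake q (mg/g) at equilibrium concentration c_eq (mg/L)."""
    return q_max * k_l * c_eq / (1 + k_l * c_eq)

def fit_langmuir(c_eq, q_eq):
    """Fit q_max and K from the linearised form C/q = 1/(q_max*K) + C/q_max."""
    xs = list(c_eq)
    ys = [c / q for c, q in zip(c_eq, q_eq)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    q_max = 1 / slope          # slope of the line is 1/q_max
    k_l = slope / intercept    # intercept of the line is 1/(q_max*K)
    return q_max, k_l

# Synthetic, noise-free data: 11.9 mg/g is the Fe2+ capacity quoted above,
# K = 0.05 L/mg and the concentrations are assumptions for this sketch.
concentrations = [5.0, 10.0, 20.0, 50.0, 100.0]
uptakes = [langmuir(c, q_max=11.9, k_l=0.05) for c in concentrations]

q_fit, k_fit = fit_langmuir(concentrations, uptakes)
print(round(q_fit, 3), round(k_fit, 4))
```

With real batch data the fit would not be exact, and the quality of this straight-line fit is what is usually compared against the Freundlich alternative.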

Keywords: gold-mining, natural zeolites, water pollution, West Rand.

Downloads 2467

812 Determining Cluster Boundaries Using Particle Swarm Optimization

Authors: Anurag Sharma, Christian W. Omlin

Abstract:

The self-organizing map (SOM) is a well-known data reduction technique used in data mining. Data visualization can reveal structure in data sets that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for the automatic detection of clusters of code vectors found by SOMs, but they generally do not take into account the distribution of code vectors; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of a generic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOMs. The application of our method to unlabeled call data for a mobile phone operator demonstrates its feasibility. The PSO algorithm uses the U-matrix of SOMs to determine cluster boundaries; the results of this novel automatic method correspond well to boundary detection through visual inspection of code vectors and through the k-means algorithm.

Keywords: Particle swarm optimization, self-organizing maps, clustering, data mining.

Downloads 1674

811 Comparison of Adsorbents for Ammonia Removal from Mining Wastewater

Authors: Farooq A. Al-Sheikh, Carol Moralejo, Mark Pritzker, William A. Anderson, Ali Elkamel

Abstract:

Ammonia in mining wastewater is a significant problem, and treatment can be especially difficult in cold climates where biological treatment is not feasible. Adsorption is one of the alternative processes that can be used to reduce ammonia concentrations to acceptable limits; therefore, a LEWATIT strongly acidic H+-form ion exchange resin and a Bowie chabazite Na-form AZLB-Na zeolite were tested to assess their effectiveness. For these adsorption tests, two packed-bed columns containing varying quantities of the adsorbents were used: a mini-column constructed from a 32-cm long x 1-cm diameter piece of glass tubing, and a 60-cm long x 2.5-cm diameter Ace Glass chromatography column. A mining wastewater with an ammonia concentration of 22.7 mg/L was fed through the columns at controlled flow rates. In the experimental work, the maximum capacities of the LEWATIT ion exchange resin were 0.438, 0.448, and 1.472 mg/g for 3, 6, and 9 g, respectively, in the mini-column and 1.739 mg/g for 141.5 g in the larger Ace column, while the capacities of the AZLB-Na zeolite were 0.424 and 0.784 mg/g for 3 and 6 g, respectively, in the mini-column and 1.1636 mg/g for 38.5 g in the Ace column. In the theoretical work, Thomas, Adams-Bohart, and Yoon-Nelson models were constructed to describe the breakthrough curve of the adsorption process and to find the model constants. In the regeneration tests, 5% hydrochloric acid, HCl (v/v), and 10% sodium hydroxide, NaOH (w/v), were used to regenerate the LEWATIT resin and the AZLB-Na zeolite, with 44% and 63.8% recovery, respectively. In conclusion, continuous-flow adsorption using a LEWATIT ion exchange resin and an AZLB-Na zeolite is efficient when using a co-flow technique for removal of ammonia from wastewater. The Thomas, Adams-Bohart, and Yoon-Nelson models fit the data satisfactorily, with R² close to 1 in all cases.
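
As a rough illustration of what a breakthrough-curve model looks like, the sketch below evaluates the Thomas model in its usual form, C_t/C_0 = 1 / (1 + exp(k_Th·q_0·m/Q − k_Th·C_0·t)). The feed concentration (22.7 mg/L), bed mass (3 g) and capacity (0.438 mg/g) are taken from the abstract; the rate constant k_Th and the flow rate Q are hypothetical, since the abstract does not report them, and treating the reported maximum capacity as the Thomas q_0 is also an assumption.

```python
import math

def thomas_ratio(t_min, k_th, q0, mass_g, flow_l_min, c0):
    """Outlet-to-inlet concentration ratio C_t/C_0 from the Thomas model.

    Units: k_th in L/(mg*min), q0 in mg/g, mass_g in g,
    flow_l_min in L/min, c0 in mg/L, t_min in min.
    """
    exponent = k_th * q0 * mass_g / flow_l_min - k_th * c0 * t_min
    return 1.0 / (1.0 + math.exp(exponent))

C0 = 22.7      # mg/L, feed concentration from the abstract
MASS = 3.0     # g, smallest resin loading from the abstract
Q0 = 0.438     # mg/g, capacity reported for that loading
FLOW = 0.01    # L/min, hypothetical flow rate
K_TH = 0.02    # L/(mg*min), hypothetical Thomas rate constant

# Time at which the model predicts 50% breakthrough (exponent equals zero).
t_half = Q0 * MASS / (FLOW * C0)

for t in (0.0, t_half, 2 * t_half):
    print(round(t, 2), round(thomas_ratio(t, K_TH, Q0, MASS, FLOW, C0), 3))
```

Fitting k_Th and q_0 to measured outlet concentrations, rather than assuming them as here, is what produces the model constants and R² values mentioned in the abstract.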

Keywords: AZLB-Na zeolite, continuous adsorption, LEWATIT resin, models, regeneration.

Downloads 1181