Search results for: scientific databases.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 668

608 Application of Scientific Metrics to Evaluate Academic Reputation in Different Research Areas

Authors: Cristiano R. Cervi, Renata Galante, José Palazzo M. de Oliveira

Abstract:

In this paper, we address the problem of identifying the academic reputation of researchers using scientific metrics in different research areas. Due to the characteristics of each area, researchers can present different behaviors. In previous work, we defined the Rep-Index, which makes use of a profile template to individually identify the reputation of researchers. The Rep-Index is comprehensive and adaptive because it involves the whole trajectory of the researcher built throughout their career and can be used in different areas and in different contexts. Now, we compare our metric (Rep-Index) with the h-index and the g-index through experiments with researchers in the fields of Economics, Dentistry and Computer Science. We analyze the trajectory of 830 Brazilian researchers from the National Council of Technological and Scientific Development (CNPq) who receive research productivity grants. The grants are aimed at researchers who stand out among their peers in scientific production, according to normative criteria established by CNPq. Of the 830 researchers, 210 are in the area of Economics, 216 in Dentistry and 404 in Computer Science. The experiments show that our metric is strongly correlated with the h-index, the g-index and the CNPq ranking. We also show good results for our hypothesis that our metric can be used to evaluate research in several areas. We apply our metric (Rep-Index) to compare the behavior of researchers in relation to their h-index and g-index through extensive experiments.
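
For readers unfamiliar with the two baseline metrics named in this abstract, the following is a minimal Python sketch of the standard h-index and g-index definitions. It does not reproduce the Rep-Index, whose formula is not given here; the function names and the sample citation counts are illustrative assumptions.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the g most-cited papers total at least g*g citations.

    This variant caps g at the number of papers, a common convention.
    """
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts for one researcher (not data from the paper).
sample = [10, 8, 5, 4, 3]
print(h_index(sample), g_index(sample))
```

The g-index rewards a few highly cited papers more than the h-index does, which is one reason the two can rank the same researchers differently.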

Keywords: Researcher reputation, profile model, scientific metrics.

Downloads: 1975
607 Simulation Data Summarization Based on Spatial Histograms

Authors: Jing Zhao, Yoshiharu Ishikawa, Chuan Xiao, Kento Sugiura

Abstract:

In order to analyze large-scale scientific data, research on data exploration and visualization has gained popularity. In this paper, we focus on the exploration and visualization of scientific simulation data and define a spatial V-Optimal histogram for data summarization. We propose histogram construction algorithms based on a general binary hierarchical partitioning as well as a more specific one, the l-grid partitioning. For effective data summarization and efficient data visualization in scientific data analysis, we propose an optimal algorithm as well as a heuristic algorithm for histogram construction. To verify the effectiveness and efficiency of the proposed methods, we conduct experiments on massive evacuation simulation data.
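
The V-Optimal criterion mentioned in this abstract chooses bucket boundaries that minimize the total squared error within buckets. As a rough illustration only, the sketch below solves the classic one-dimensional case with dynamic programming; the paper's spatial histograms and its hierarchical and l-grid partitionings are not reproduced, and the function name and sample data are assumptions.

```python
def v_optimal_histogram(values, num_buckets):
    """Split `values` into contiguous buckets minimizing total squared error.

    Returns (bounds, cost), where bounds is a list of half-open index
    ranges (start, end) and cost is the summed within-bucket squared error.
    """
    n = len(values)
    if not 1 <= num_buckets <= n:
        raise ValueError("num_buckets must be between 1 and len(values)")

    # Prefix sums let us compute the squared error of any range in O(1).
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for i, v in enumerate(values):
        prefix[i + 1] = prefix[i] + v
        prefix_sq[i + 1] = prefix_sq[i] + v * v

    def sse(start, end):
        total = prefix[end] - prefix[start]
        return (prefix_sq[end] - prefix_sq[start]) - total * total / (end - start)

    inf = float("inf")
    # cost[b][j]: best error for the first j values using b buckets.
    cost = [[inf] * (n + 1) for _ in range(num_buckets + 1)]
    cut = [[0] * (n + 1) for _ in range(num_buckets + 1)]
    cost[0][0] = 0.0
    for b in range(1, num_buckets + 1):
        for j in range(b, n + 1):
            for i in range(b - 1, j):
                candidate = cost[b - 1][i] + sse(i, j)
                if candidate < cost[b][j]:
                    cost[b][j] = candidate
                    cut[b][j] = i

    # Walk the recorded cut points backwards to recover the buckets.
    bounds = []
    end = n
    for b in range(num_buckets, 0, -1):
        start = cut[b][end]
        bounds.append((start, end))
        end = start
    bounds.reverse()
    return bounds, cost[num_buckets][n]


print(v_optimal_histogram([1, 1, 1, 10, 10, 10], 2))
```

This exact dynamic program takes time cubic in the worst case (quadratic in the number of values per bucket count), which is the kind of cost that motivates heuristic construction for massive data.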

Keywords: Simulation data, data summarization, spatial histograms, exploration and visualization.

Downloads: 724
606 A Tree Based Association Rule Approach for XML Data with Semantic Integration

Authors: D. Sasikala, K. Premalatha

Abstract:

The use of eXtensible Markup Language (XML) in web, business and scientific databases has led to the development of methods, techniques and systems to manage and analyze XML data. Semi-structured documents suffer from heterogeneity and high dimensionality. XML structure and content mining represent a convergence of research in semi-structured data and text mining. As the information available on the internet grows drastically, extracting knowledge from XML documents becomes a harder task. Indeed, documents are often so large that the data set returned as the answer to a query may be too large to convey the required information. To improve query answering, a Semantic Tree Based Association Rule (STAR) mining method is proposed. This method provides intensional information by considering the structure, the content and the semantics of the content. The method is applied to the Reuters dataset, and the results show that the proposed method performs well.

Keywords: Semi-structured Document, Tree based Association Rule (TAR), Semantic Association Rule Mining.

Downloads: 2321
605 Mapping of Adrenal Gland Diseases Research in Middle East Countries: A Scientometric Analysis, 2007-2013

Authors: Zahra Emami, Mohammad Ebrahim Khamseh, Nahid Hashemi Madani, Iman Kermani

Abstract:

The aim of the study was to map scientific research on adrenal gland diseases in Middle East countries through the Web of Science database using scientometric analysis. Data were analyzed with Excel, and HistCite was used for mapping the scientific texts. In this study, from a total of 268 retrieved records, 1125 authors from 328 institutions published their texts in 138 journals. Among 17 Middle East countries, Turkey ranked first with 164 documents (61.19%), Israel ranked second with 47 documents (15.53%) and Iran came in third place with 26 documents. Most of the publications (185 documents, 69.2%) were articles. Among the universities of the Middle East, Istanbul University had the highest science production rate (9.7%). The Journal of Clinical Endocrinology & Metabolism had the highest TGCS (243 citations). In the scientific mapping, 7 clusters were formed based on TLCS (Total Local Citation Score) and TGCS (Total Global Citation Score). Considering the study results, the establishment of scientific connections and collaboration with other countries and the use of publications on adrenal gland diseases from high-ranking universities can help in the development of this field and promote medical practice in this regard. Moreover, investigation of the formed clusters in relation to Congenital Hyperplasia and puberty-related disorders can be a research priority for investigators.

Keywords: Mapping, scientific research, adrenal gland diseases, scientometric.

Downloads: 1340
604 Visualization and Indexing of Spectral Databases

Authors: Tibor Kulcsar, Gabor Sarossy, Gabor Bereznai, Robert Auer, Janos Abonyi

Abstract:

On-line (near-infrared) spectroscopy is widely used to support the operation of complex process systems. Information extracted from a spectral database can be used to estimate unmeasured product properties and monitor the operation of the process. These techniques are based on looking for similar spectra using nearest-neighbor algorithms and distance-based search methods. Searching for nearest neighbors in the spectral space is computationally expensive: the cost increases with the number of points in the discrete spectrum and the number of samples in the database. To reduce the calculation time, some kind of indexing can be used. The main idea presented in this paper is to combine indexing and visualization techniques to reduce the computational requirements of estimation algorithms by providing a two-dimensional indexing that can also be used to visualize the structure of the spectral database. This 2D visualization of the spectral database not only supports the application of distance- and similarity-based techniques but also enables the use of advanced clustering and prediction algorithms based on the Delaunay tessellation of the mapped spectral space. This means that the prediction does not have to use the high-dimensional space but can be based on the mapped space instead. The results illustrate that the proposed method is able to segment (cluster) spectral databases and detect outliers that are not suitable for instance-based learning algorithms.
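
To make the cost argument in this abstract concrete, here is a minimal Python sketch of the unindexed baseline: a brute-force nearest-neighbor search over full spectra, followed by a property estimate averaged over the neighbors. The paper's two-dimensional index and Delaunay-based prediction are not reproduced; the function names, the averaging rule and the toy data are assumptions.

```python
import math


def nearest_neighbors(spectra, query, k):
    """Return indices of the k spectra closest to `query` (Euclidean distance).

    Every query scans all N spectra and all d spectral points, so the cost
    grows with both the database size and the spectrum length.
    """
    scored = sorted(
        (math.dist(spectrum, query), index)
        for index, spectrum in enumerate(spectra)
    )
    return [index for _, index in scored[:k]]


def estimate_property(spectra, properties, query, k=3):
    """Estimate an unmeasured property as the mean over the k nearest spectra."""
    neighbors = nearest_neighbors(spectra, query, k)
    return sum(properties[i] for i in neighbors) / len(neighbors)


# Toy two-point "spectra" with a known property value each (made-up numbers).
spectra = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
properties = [1.0, 2.0, 3.0, 100.0]
print(estimate_property(spectra, properties, [0.1, 0.1], k=3))
```

An index that maps spectra to a low-dimensional space would replace the full scan in `nearest_neighbors` with a search over far fewer coordinates.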

Keywords: indexing high dimensional databases, dimensional reduction, clustering, similarity, k-nn algorithm.

Downloads: 1750
603 NOHIS-Tree: High-Dimensional Index Structure for Similarity Search

Authors: Mounira Taileb, Sami Touati

Abstract:

In Content-Based Image Retrieval systems, it is important to use an efficient indexing technique in order to perform and accelerate the search in huge databases. The indexing technique should also support the high dimensionality of image features. In this paper we present the hierarchical index NOHIS-tree (Non Overlapping Hierarchical Index Structure) and its behavior when scaling up to very large databases. We also present a study of the influence of clustering on search time. The performance test results show that NOHIS-tree performs better than SR-tree. Tests also show that NOHIS-tree maintains its performance in high-dimensional spaces. We include performance tests that try to determine the number of clusters in NOHIS-tree that gives the best search time.

Keywords: High-dimensional indexing, k-nearest neighbors search.

Downloads: 1423
602 Migration of the Relational Data Base (RDB) to the Object Relational Data Base (ORDB)

Authors: Alae El Alami, Mohamed Bahaj

Abstract:

This paper proposes an approach for translating an existing relational database (RDB) schema into an ORDB schema. The transition is done with methods that can extract various functions from an RDB, based on aggregations, associations between the various tables, and reflexive relationships. These methods can even extract inheritance, which existing reverse-engineering processes cannot recognize as such; in this respect, our approach goes beyond previous studies on the transition from RDB to ORDB. In summary, we create a New Data Model (NDM) that stores the RDB in the form of a structured table; from the NDM we create a navigational model that simplifies the object implementation, from which we develop our different types. Through these types we proceed to the last step, the creation of tables.

The steps described above do not require any human intervention. All of this is done automatically, and a prototype has already been created that demonstrates the effectiveness of this approach.

Keywords: Relational databases, Object-relational databases, Semantic enrichment.

Downloads: 1924
601 Web–Based Tools and Databases for Micro-RNA Analysis: A Review

Authors: Sitansu Kumar Verma, Soni Yadav, Jitendra Singh, Shraddha, Ajay Kumar

Abstract:

MicroRNAs (miRNAs) are a class of approximately 22-nucleotide-long non-coding RNAs that play a critical role in different biological processes. The mature microRNA is usually 19–27 nucleotides long and is derived from a larger precursor that folds into an imperfect stem-loop structure. Mature microRNAs are involved in many cellular processes, including development, proliferation, stress response, apoptosis, and fat metabolism, through gene regulation. Recent findings reveal that certain viruses encode their own miRNAs, which are processed by the cellular RNAi machinery. Recent research also indicates that cellular microRNAs can target the genetic material of invading viruses. Cellular microRNAs can be used in the virus life cycle, either to upregulate or to downregulate viral gene expression. The computational tools used in miRNA target prediction have been changing drastically in recent years. Many of the methods have been made available on the web and can be used by experimental researchers and scientists without expert knowledge of bioinformatics. With the development and ease of use of genomic technologies and computational tools, the field of microRNA biology has advanced tremendously over the previous decade. This review attempts to give an overview of the genome-wide approaches that have allowed for the discovery of new miRNAs and the development of new miRNA target prediction tools and databases.

Keywords: MicroRNAs, computational tools, gene regulation, databases, RNAi.

Downloads: 3159
600 Structure of Doctoral Students' Research Competences in Sustainability Context

Authors: I. Bolgzda, E. Olehnovica

Abstract:

The qualification of doctoral students and candidates for a scientific degree is evaluated by their ability to address scientific ideas in an innovative way; consequently, as a potential of research and science, they play a significant role in the sustainability context of society. The article deals with the analysis of the results of a pilot project, the aim of which has been to study the structure of doctoral students' research competences in the sustainability context. Given the existence of a variety of theories on research competence development, their analysis focuses on the attained-aim approach. Three competence groups have been identified in this study: informative, communicative and instrumental. Within the study, the doctoral students and candidates for a scientific degree (N=64) made a self-assessment of their research competences. The study results depict their present research competence development level and its dynamics according to the aim to attain.

Keywords: competence structure, doctoral students, research activity, sustainability.

Downloads: 1462
599 Multidimensional Data Mining by Means of Randomly Travelling Hyper-Ellipsoids

Authors: Pavel Y. Tabakov, Kevin Duffy

Abstract:

This study presents a new approach to automatic data clustering and classification problems in large and complex databases and, at the same time, derives specific types of explicit rules describing each cluster. The method works well in both sparse and dense multidimensional data spaces. The members of the data space can be of the same nature or represent different classes. A number of N-dimensional ellipsoids are used for enclosing the data clouds. Due to the geometry of an ellipsoid and its free rotation in space, the detection of clusters becomes very efficient. The method is based on genetic algorithms that are used for the optimization of the location, orientation and geometric characteristics of the hyper-ellipsoids. The proposed approach can serve as a basis for the development of general knowledge systems for discovering hidden knowledge and unexpected patterns and rules in various large databases.

Keywords: Classification, clustering, data mining, genetic algorithms.
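
One way to read the "explicit rule" that an enclosing ellipsoid provides is as a single inequality that a point either satisfies or does not. The Python sketch below tests membership in a rotated N-dimensional ellipsoid described by a center, orthonormal principal axes and semi-axis lengths. This parameterization and the function name are assumptions for illustration; the genetic-algorithm search that places the ellipsoids is not shown.

```python
import math


def inside_ellipsoid(point, center, axes, semi_axes):
    """Return True if `point` lies inside or on the ellipsoid.

    axes:      orthonormal direction vectors of the principal axes
    semi_axes: the semi-axis length along each direction

    The point is projected onto each principal axis, and the test is
    sum((projection / length) ** 2) <= 1.
    """
    offset = [p - c for p, c in zip(point, center)]
    total = 0.0
    for direction, length in zip(axes, semi_axes):
        projection = sum(o * u for o, u in zip(offset, direction))
        total += (projection / length) ** 2
    return total <= 1.0


# A 2D ellipse rotated by 45 degrees: long axis 2.0, short axis 0.5.
c, s = math.cos(math.pi / 4), math.sin(math.pi / 4)
axes = [(c, s), (-s, c)]
print(inside_ellipsoid((1.0, 1.0), (0.0, 0.0), axes, (2.0, 0.5)))
print(inside_ellipsoid((1.0, -1.0), (0.0, 0.0), axes, (2.0, 0.5)))
```

Because the axes can point in any direction, a rotated ellipsoid can hug an elongated data cloud that an axis-aligned box would enclose only loosely.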

Downloads: 1752
598 Contributions of Non-Formal Educational Spaces for the Scientific Literacy of Deaf Students

Authors: Rafael Dias Silva

Abstract:

The school is a social institution that should promote learning situations that remain throughout life. Based on this, the teaching activities promoted in museum spaces can represent an educational strategy that contributes to the learning process in a more meaningful way. This article systematizes a series of elements that guide the use of these spaces for the scientific literacy of deaf students and discusses how experiences of this nature favor school development through the concept of circularity. The study also reflects on the methodology for the didactic use of these non-formal educational spaces and on how such environments can contribute to learning in the classroom. The goal is to develop in the student the idea of association, leading him or her to create connections with the curricular proposal and to notice how the proposed activity is articulated. It is in our interest that the experience lived in the museum be shared, collaborating in the construction of scientific literacy and cultural identity through research.

Keywords: Accessibility in museums, Brazilian sign language, deaf students, teacher training.

Downloads: 780
597 The Strategy for Increasing the Competitiveness of Georgia

Authors: G. Erkomaishvili

Abstract:

The paper discusses the economic policy of Georgia aimed at increasing national competitiveness, as well as the tools and means that will help to improve the competitiveness of the country. The sectors of the economy in which the country can achieve a competitive advantage are studied. It is noted that the country’s economic policy plays an important role in obtaining and maintaining the competitive advantage: the authorities should take measures to ensure a high level of education; scientific and research activities should be funded by the state; foreign direct investment should be attracted mainly to science-intensive industries; and the country should adapt to the latest scientific achievements of the modern world and deepen scientific and technical cooperation. A stable business environment and an export-oriented strategy are the basis for the country’s economic growth. As the outcome of the research, the paper suggests a strategy for improving competitiveness in Georgia; recommendations are provided based on relevant conclusions.

Keywords: Competitive advantage, competitiveness, competitiveness improvement strategy, competitiveness of Georgia.

Downloads: 1871
596 Indexing and Searching of Image Data in Multimedia Databases Using Axial Projection

Authors: Khalid A. Kaabneh

Abstract:

This paper introduces and studies new indexing techniques for content-based queries in image databases. Indexing is the key to providing sophisticated, accurate and fast searches for queries in image data. This research describes a new indexing approach that depends on linear modeling of signals, using bases for modeling. A basis is a set of chosen images, and modeling an image is a least-squares approximation of the image as a linear combination of the basis images. The coefficients of the basis images are taken together to serve as the index for that image. The paper describes the implementation of the indexing scheme and presents the findings of our extensive evaluation, which was conducted to optimize (1) the choice of the basis matrix (B), and (2) the size of the index A (N). Furthermore, we compare the performance of our indexing scheme with other schemes. Our results show that our scheme has significantly higher performance.
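
The least-squares step described in this abstract can be sketched in a few lines of Python. The example below flattens images to lists of pixel values and solves the normal equations for the coefficients, which then act as the index vector. The function names, the solver choice (Gauss-Jordan elimination on the normal equations) and the toy data are assumptions, not the paper's implementation.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("basis images are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def image_index(image, basis):
    """Least-squares coefficients of `image` as a combination of basis images.

    Solves the normal equations (B^T B) a = B^T x, where each basis image
    and the image x are flattened pixel lists of equal length.
    """
    gram = [[sum(p * q for p, q in zip(bi, bj)) for bj in basis] for bi in basis]
    rhs = [sum(p * q for p, q in zip(bi, image)) for bi in basis]
    return solve_linear_system(gram, rhs)


# Toy 3-pixel "images": the query equals 2 * basis[0] + 3 * basis[1].
basis = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
print(image_index([2.0, 5.0, 3.0], basis))
```

Two images can then be compared through their short coefficient vectors instead of their full pixel arrays, which is what makes the coefficients usable as an index.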

Keywords: Axial Projection, images, indexing, multimedia database, searching.

Downloads: 1372
595 Database Placement on Large-Scale Systems

Authors: Cherif Haddad, Faouzi Ben Charrada

Abstract:

Large-scale systems such as Grids offer infrastructures for both data distribution and parallel processing. The use of Grid infrastructures is a more recent issue that is already impacting the Distributed Database Management System industry. In DBMS, distributed query processing has emerged as a fundamental technique for ensuring high performance in distributed databases. Database placement is particularly important in large-scale systems because it reduces communication costs and improves resource usage. In this paper, we propose a dynamic database placement policy that depends on query patterns and on the capabilities of Grid sites. We evaluate the performance of the proposed database placement policy using simulations. The results obtained show that dynamic database placement can significantly improve the performance of distributed query processing.

Keywords: Large-scale systems, Grid environment, Distributed Databases, Distributed query processing, Database placement

Downloads: 1476
594 Novelty as a Measure of Interestingness in Knowledge Discovery

Authors: Vasudha Bhatnagar, Ahmed Sultan Al-Hegami, Naveen Kumar

Abstract:

Rule discovery is an important technique for mining knowledge from large databases. The use of objective measures for discovering interesting rules leads to another data mining problem, although of reduced complexity. Data mining researchers have studied subjective measures of interestingness to reduce the volume of discovered rules and ultimately improve the overall efficiency of the KDD process. In this paper we study the novelty of the discovered rules as a subjective measure of interestingness. We propose a hybrid approach based on both objective and subjective measures to quantify the novelty of the discovered rules in terms of their deviations from the known rules (knowledge). We analyze the types of deviation that can arise between two rules and categorize the discovered rules according to a user-specified threshold. We implement the proposed framework and experiment with some public datasets. The experimental results are promising.

Keywords: Knowledge Discovery in Databases (KDD), Interestingness, Subjective Measures, Novelty Index.

Downloads: 1783
593 Extracting the Coupled Dynamics in Thin-Walled Beams from Numerical Data Bases

Authors: Mohammad A. Bani-Khaled

Abstract:

In this work we use the Discrete Proper Orthogonal Decomposition transform to characterize the properties of coupled dynamics in thin-walled beams by exploiting numerical data obtained from finite element simulations. The outcomes will improve our understanding of the linear and nonlinear coupled behavior of thin-walled beam structures. Thin-walled beams are widely used in modern engineering applications, both in large-scale structures (aeronautical structures) and in nano-structures (nano-tubes). Therefore, detailed knowledge of the properties of coupled vibrations and buckling in these structures is of great interest to the research community. Due to the geometric complexity of the overall structure, and in particular of the cross-sections, it is necessary to use computational mechanics to numerically simulate the dynamics. When using numerical computational techniques, it is not necessary to oversimplify a model in order to solve the equations of motion. Computational dynamics methods produce databases of controlled resolution in time and space. These numerical databases contain information on the properties of the coupled dynamics. In order to extract the system dynamic properties and the strength of coupling among the various fields of motion, processing techniques are required. The Time-Proper Orthogonal Decomposition transform is a powerful tool for processing such databases. It will be used to study the coupled dynamics of thin-walled basic structures. These structures are ideal to form a basis for a systematic study of coupled dynamics in structures of complex geometry.

Keywords: Coupled dynamics, geometric complexity, Proper Orthogonal Decomposition (POD), thin walled beams.

Downloads: 993
592 A Framework for Data Mining Based Multi-Agent: An Application to Spatial Data

Authors: H. Baazaoui Zghal, S. Faiz, H. Ben Ghezala

Abstract:

Data mining is an extraordinarily demanding field referring to the extraction of implicit knowledge and relationships that are not explicitly stored in databases. A wide variety of data mining methods have been introduced (classification, characterization, generalization, ...). Each of these methods includes more than one algorithm. A data mining system involves different user categories, which means that the user's behavior must be a component of the system. The problem at this level is to know which algorithm of which method to employ for an exploratory purpose, which one for a decisional purpose, and how they can collaborate and communicate. The agent paradigm offers a new way of designing and implementing data mining systems. The purpose is to combine different data mining algorithms to prepare elements for decision-makers, benefiting from the possibilities offered by multi-agent systems. In this paper, the agent framework for data mining is introduced, and its overall architecture and functionality are presented. The validation is performed on spatial data. The principal results are presented.

Keywords: Databases, data mining, multi-agent, spatial datamart.

Downloads: 2025
591 Analysis of Scientific Attitude, Computer Anxiety, Educational Internet Use, Problematic Internet Use, and Academic Achievement of Middle School Students According to Demographic Variables

Authors: Mehmet Bekmezci, Ismail Celik, Ismail Sahin, Ahmet Kiray, A. Oguz Akturk

Abstract:

In this research, students’ scientific attitude, computer anxiety, educational use of the Internet, academic achievement, and problematic use of the Internet are analyzed based on different variables (gender, parents’ educational level and daily access to the Internet). The research group involves 361 students from two middle schools located in the center of Konya. The “general survey method” is adopted in the research. In accordance with the purpose of the study, percentage, mean, standard deviation, independent-samples t-test, and ANOVA (analysis of variance) are employed. A total of four scales are implemented. These four scales include a total of 13 sub-dimensions. The scores from these scales and their subscales are studied in terms of various variables. Students’ scientific attitude, computer anxiety, educational use of the Internet, problematic Internet use and academic achievement are investigated based on these variables (gender, parents’ educational level, and daily access to the Internet), and some significant relations are found.

Keywords: Scientific Attitude, Educational use of the Internet, Computer Anxiety, Problematic use of the Internet, Academic Achievement.

Downloads: 1444
590 Algorithm for Information Retrieval Optimization

Authors: Kehinde K. Agbele, Kehinde Daniel Aruleba, Eniafe F. Ayetiran

Abstract:

When using Information Retrieval Systems (IRS), users often present search queries made of ad-hoc keywords. It is then up to the IRS to obtain a precise representation of the user’s information need and the context of the information. This paper investigates the optimization of IRS to individual information needs in order of relevance. The study addresses the development of algorithms that optimize the ranking of documents retrieved from IRS. It discusses and describes a Document Ranking Optimization (DROPT) algorithm for information retrieval (IR) in an Internet-based or designated-database environment. As the volume of information available online and in designated databases grows continuously, ranking algorithms can play a major role in the context of search results. In this paper, a DROPT technique for documents retrieved from a corpus is developed with respect to document index keywords and the query vectors. This is based on calculating the weight (

Keywords: Internet ranking,

Downloads: 1453
589 Face Detection in Color Images using Color Features of Skin

Authors: Fattah Alizadeh, Saeed Nalousi, Chiman Savari

Abstract:

Because of increasing demands for security in today's society, and due to growing attention to machine vision, biometric research, pattern recognition and data retrieval in color images, face detection has found more applications. In this article we present a scientific approach for modeling human skin color and offer an algorithm that tries to detect faces within color images by combining skin features with a threshold determined in the model. The proposed model is based on statistical data in different color spaces. Using a specified color threshold, the algorithm first divides image pixels into two groups, skin pixels and non-skin pixels, and then, based on geometric features of the face, decides which area belongs to a face. The two main results of this research are as follows: first, the proposed model can be applied easily to different databases and color spaces to establish a proper threshold; second, our algorithm can adapt itself to runtime conditions, and its results demonstrate desirable progress in comparison with similar cases.

Keywords: face detection, skin color modeling, color, colorful images, face recognition.
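
The first stage described in this abstract, splitting pixels into skin and non-skin groups by color thresholds, can be illustrated with a short Python sketch. The thresholds below follow a commonly used RGB heuristic from the skin-detection literature for daylight images; they are an assumption standing in for the paper's statistically derived model, and the geometric face-selection stage is not shown.

```python
def is_skin_pixel(r, g, b):
    """Classify one RGB pixel (0-255 per channel) with fixed thresholds.

    Illustrative heuristic only; not the thresholds derived in the paper.
    """
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_mask(image):
    """Return a boolean mask with True wherever a pixel is classified as skin.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    """
    return [[is_skin_pixel(*pixel) for pixel in row] for row in image]


# A 1x3 toy image: a skin-like tone, a saturated blue, and a mid gray.
print(skin_mask([[(220, 170, 140), (30, 30, 200), (128, 128, 128)]]))
```

Fixed thresholds like these are sensitive to lighting, which is why a model that can re-establish its threshold per database or color space is attractive.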

Downloads: 2280
588 UTMGO: A Tool for Searching a Group of Semantically Related Gene Ontology Terms and Application to Annotation of Anonymous Protein Sequence

Authors: Razib M. Othman, Safaai Deris, Rosli M. Illias

Abstract:

Gene Ontology terms have been actively used to annotate various protein sets. SWISS-PROT, TrEMBL, and InterPro are protein databases that are annotated according to the Gene Ontology terms. However, direct implementation of the Gene Ontology terms for annotation of anonymous protein sequences is not easy, especially for species not commonly represented in biological databases. UTMGO is developed as a tool that allows the user to quickly and easily search for a group of semantically related Gene Ontology terms. The applicability of UTMGO is demonstrated by applying it to the annotation of anonymous protein sequences. The extended UTMGO uses the Gene Ontology terms together with protein sequences associated with the terms to perform the annotation task. GOPET, GOtcha, GoFigure, and JAFA are used to compare the performance of the extended UTMGO.

Keywords: Anonymous protein sequence, Gene Ontology, Protein sequence annotation, Protein sequence alignment

Downloads: 1419
587 Incremental Mining of Shocking Association Patterns

Authors: Eiad Yafi, Ahmed Sultan Al-Hegami, M. A. Alam, Ranjit Biswas

Abstract:

Association rule mining is an important problem in data mining. The massively increasing volume of data in real-life databases has motivated researchers to design novel and incremental algorithms for association rule mining. In this paper, we propose an incremental association rule mining algorithm that integrates a shocking interestingness criterion during the process of building the model. A new interestingness measure, called the shocking measure, is introduced. One of the main features of the proposed approach is to capture the user's background knowledge, which is monotonically augmented. The incremental model that reflects the changing data and the user's beliefs is attractive in order to make the overall KDD process more effective and efficient. We implemented the proposed approach, experimented with it on some public datasets, and found the results quite promising.
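
The incremental idea in this abstract, updating the model as data arrives rather than re-mining from scratch, can be sketched with plain support and confidence counts. The Python class below keeps itemset counts that are updated per batch of transactions. The shocking measure itself is not defined in the abstract and is not reproduced; the class name, the itemset size limit and the sample baskets are assumptions.

```python
from collections import Counter
from itertools import combinations


class IncrementalRuleCounter:
    """Keeps itemset counts so rule statistics can be refreshed per batch."""

    def __init__(self, max_size=2):
        self.max_size = max_size  # largest itemset size that is tracked
        self.counts = Counter()
        self.num_transactions = 0

    def add_transactions(self, transactions):
        """Fold a new batch into the counts without rescanning old data."""
        for transaction in transactions:
            items = sorted(set(transaction))
            self.num_transactions += 1
            for size in range(1, self.max_size + 1):
                for itemset in combinations(items, size):
                    self.counts[itemset] += 1

    def support(self, itemset):
        """Fraction of all transactions seen so far that contain `itemset`."""
        if self.num_transactions == 0:
            return 0.0
        return self.counts[tuple(sorted(set(itemset)))] / self.num_transactions

    def confidence(self, antecedent, consequent):
        """Estimate P(consequent | antecedent) from the current counts."""
        base = self.counts[tuple(sorted(set(antecedent)))]
        if base == 0:
            return 0.0
        both = tuple(sorted(set(antecedent) | set(consequent)))
        return self.counts[both] / base


counter = IncrementalRuleCounter()
counter.add_transactions([["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"]])
print(counter.confidence(["milk"], ["bread"]))
counter.add_transactions([["milk"]])  # a later batch changes the rule's confidence
print(counter.confidence(["milk"], ["bread"]))
```

A subjective interestingness filter, such as the one the paper proposes, would sit on top of statistics like these and compare candidate rules against what the user already knows.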

Keywords: Knowledge discovery in databases (KDD), Data mining, Incremental Association rules, Domain knowledge, Interestingness, Shocking rules (SHR).

Downloads: 1848
586 Applying Spanning Tree Graph Theory for Automatic Database Normalization

Authors: Chetneti Srisa-an

Abstract:

In the field of Knowledge and Data Engineering, the relational database is the best repository for storing real-world data. It has been in use around the world for more than four decades. Normalization is the most important process in the analysis and design of relational databases. It aims at creating a set of relational tables with minimum data redundancy that preserve consistency and facilitate correct insertion, deletion, and modification. Despite its importance, very few algorithms have been developed for use in commercial automatic normalization tools, and normalization is rarely performed automatically rather than manually. Moreover, for today's large and complex databases, manual normalization becomes even harder. This paper presents a new, fully automated relational database normalization method. It first produces a directed graph and a spanning tree. It then proceeds to generate the 2NF, 3NF and BCNF normal forms. The benefit of this new algorithm is that it can cope with a large set of complex functional dependencies.
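
Any automatic normalization procedure has to reason about functional dependencies. As background, the Python sketch below shows the standard textbook attribute-closure computation and a BCNF violation check built on it. It does not reproduce the paper's directed-graph and spanning-tree method; the function names and the sample schema are assumptions.

```python
def attribute_closure(attributes, dependencies):
    """All attributes functionally determined by `attributes`.

    dependencies: iterable of (lhs, rhs) pairs, each a set of attribute names.
    """
    closure = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in dependencies:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def is_superkey(attributes, all_attributes, dependencies):
    """True if `attributes` determine every attribute of the relation."""
    return attribute_closure(attributes, dependencies) >= set(all_attributes)


def bcnf_violations(all_attributes, dependencies):
    """Non-trivial dependencies whose left side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in dependencies
        if not set(rhs) <= set(lhs)
        and not is_superkey(lhs, all_attributes, dependencies)
    ]


# Hypothetical relation R(A, B, C, D) with A -> B and B -> C.
schema = {"A", "B", "C", "D"}
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(attribute_closure({"A"}, fds))
print(is_superkey({"A", "D"}, schema, fds))
print(bcnf_violations(schema, fds))
```

Each reported violation marks a dependency that a decomposition step would split into its own table.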

Keywords: Relational Database, Functional Dependency, Automatic Normalization, Primary Key, Spanning tree.

Downloads 2843
585 Multiple-Level Sequential Pattern Discovery from Customer Transaction Databases

Authors: An Chen, Huilin Ye

Abstract:

Mining sequential patterns from large customer transaction databases has been recognized as a key research topic in database systems. However, previous work has focused mainly on mining sequential patterns at a single concept level. In this study, we introduce concept hierarchies into this problem and present several algorithms for discovering multiple-level sequential patterns based on the hierarchies. An experiment was conducted to assess the performance of the proposed algorithms, measured by the relative time spent completing the mining tasks on two different datasets. The experimental results show that performance depends on the characteristics of the datasets and on the pre-defined minimum support threshold for each level of the concept hierarchy. Based on the experimental results, some suggestions are also given on how to select an appropriate algorithm for a given dataset.
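To illustrate what support at different concept levels means, here is a minimal sketch that lifts items to a chosen level of a concept hierarchy and then counts the fraction of customer sequences containing a pattern. It is not one of the algorithms proposed in the paper. The hierarchy, the data, and all function names are hypothetical, and each transaction is simplified to a single item.

```python
from typing import Dict, List, Sequence

# Hypothetical concept hierarchy: item -> parent category (the root has no entry).
PARENT: Dict[str, str] = {
    "skim milk": "milk", "whole milk": "milk",
    "wheat bread": "bread", "white bread": "bread",
    "milk": "food", "bread": "food",
}


def ancestors(item: str) -> List[str]:
    """Return the item followed by its ancestors, most specific first."""
    chain = [item]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def generalize(item: str, level: int) -> str:
    """Lift an item to `level`, counted from the root (root = level 0)."""
    chain = ancestors(item)[::-1]  # root first
    return chain[min(level, len(chain) - 1)]


def contains(sequence: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(db: List[List[str]], pattern: Sequence[str], level: int) -> float:
    """Fraction of customer sequences that contain `pattern` at `level`."""
    lifted = [[generalize(x, level) for x in seq] for seq in db]
    return sum(contains(seq, pattern) for seq in lifted) / len(db)


# Hypothetical customer sequences, one purchased item per transaction.
DB = [
    ["skim milk", "wheat bread"],
    ["whole milk", "white bread"],
    ["wheat bread", "skim milk"],
    ["whole milk", "wheat bread"],
]
```

On this toy data the category-level pattern `milk` then `bread` holds for three of the four customers, while the item-level pattern `skim milk` then `wheat bread` holds for only one, which is why a separate minimum support threshold per level is useful.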

Keywords: Data Mining, Multiple-Level Sequential Pattern, Concept Hierarchy, Customer Transaction Database.

Downloads 1434
584 A General Framework for Modeling Replicated Real-Time Database

Authors: Hala Abdel hameed, Hazem M. El-Bakry, Torky Sultan

Abstract:

There are many issues that affect the modeling and design of real-time databases. One of them is maintaining consistency between the actual state of a real-time object in the external environment and its images as reflected by all of its replicas distributed over multiple nodes. The need to improve scalability is another important issue. In this paper, we present a general framework for designing a replicated real-time database for small to medium scale systems while maintaining all timing constraints. To extend the idea to large scale databases, we present a general outline that improves scalability by applying an existing static segmentation algorithm to the whole database. Segmentation is intended to lower the degree of replication and lets segments have individual degrees of replication, which avoids excessive resource usage. Together, these contribute to solving the scalability problem for distributed real-time database systems (DRTDBS).

Keywords: Database modeling, Distributed database, Real-time databases, Replication.

Downloads 1344
583 A Hybrid Approach for Quantification of Novelty in Rule Discovery

Authors: Vasudha Bhatnagar, Ahmed Sultan Al-Hegami, Naveen Kumar

Abstract:

Rule discovery is an important technique for mining knowledge from large databases. The use of objective measures for discovering interesting rules leads to another data mining problem, although one of reduced complexity. Data mining researchers have studied subjective measures of interestingness to reduce the volume of discovered rules and ultimately improve the overall efficiency of the KDD process. In this paper, we study the novelty of the discovered rules as a subjective measure of interestingness. We propose a hybrid approach that uses objective and subjective measures to quantify the novelty of the discovered rules in terms of their deviations from the known rules. We analyze the types of deviation that can arise between two rules and categorize the discovered rules according to a user-specified threshold. We implemented the proposed framework and experimented with some public datasets. The experimental results are quite promising.

Keywords: Knowledge Discovery in Databases (KDD), Data Mining, Rule Discovery, Interestingness, Subjective Measures, Novelty Measure.

Downloads 1334
582 Mining Sequential Patterns Using Hybrid Evolutionary Algorithm

Authors: Mourad Ykhlef, Hebah ElGibreen

Abstract:

Mining sequential patterns in large databases has become an important data mining task with broad applications. It describes potential sequenced relationships among items in a database. Many different algorithms have been introduced for this task. Conventional algorithms can find the exact optimal sequential pattern rule, but they take a long time, particularly when applied to large databases. More recently, evolutionary algorithms such as Particle Swarm Optimization and Genetic Algorithm have been proposed and applied to this problem. This paper introduces a new kind of hybrid evolutionary algorithm that combines Genetic Algorithm (GA) with Particle Swarm Optimization (PSO) to mine sequential patterns, in order to speed up the convergence of evolutionary algorithms. This algorithm is referred to as SP-GAPSO.

Keywords: Genetic Algorithm, Hybrid Evolutionary Algorithm, Particle Swarm Optimization algorithm, Sequential Pattern mining.

Downloads 2005
581 Deep Web Content Mining

Authors: Shohreh Ajoudanian, Mohammad Davarpanah Jazi

Abstract:

The rapid expansion of the web is causing constant growth of information, leading to several problems such as the increased difficulty of extracting potentially useful knowledge. Web content mining confronts this problem by gathering explicit information from different web sites for access and knowledge discovery. Query interfaces of web databases share common building blocks. After extracting information with a parsing approach, we use a new data mining algorithm to match a large number of database schemas at a time. Using this algorithm increases the speed of information matching. In addition, instead of simple 1:1 matching, it performs complex (m:n) matching between query interfaces. In this paper, we present a novel correlation mining algorithm that matches correlated attributes at a smaller cost. The algorithm uses the Jaccard measure to distinguish positively and negatively correlated attributes. The system then matches the user query against the different query interfaces in a specific domain and finally chooses the query interface nearest to the user query to answer it.
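The Jaccard measure mentioned above can be sketched as follows for attributes that co-occur across query interface schemas. This shows only the standard Jaccard coefficient. How the paper thresholds it or combines it into m:n matches is not stated in the abstract, and the schemas and function names here are hypothetical.

```python
from typing import Dict, List, Set


def occurrences(schemas: List[Set[str]]) -> Dict[str, Set[int]]:
    """Map each attribute to the indices of the schemas that contain it."""
    index: Dict[str, Set[int]] = {}
    for i, schema in enumerate(schemas):
        for attr in schema:
            index.setdefault(attr, set()).add(i)
    return index


def jaccard(a: Set[int], b: Set[int]) -> float:
    """|A intersect B| / |A union B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical query interfaces from a book-search domain.
SCHEMAS = [
    {"title", "author", "isbn"},
    {"title", "first name", "last name"},
    {"title", "author", "publisher"},
    {"title", "first name", "last name", "isbn"},
]
OCC = occurrences(SCHEMAS)
```

Here `first name` and `last name` always appear together and score 1.0, whereas `author` and `first name` never share an interface and score 0.0. A score near 1 indicates attributes that tend to co-occur, and a score near 0 indicates attributes that tend to exclude each other.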

Keywords: Content mining, complex matching, correlation mining, information extraction.

Downloads 2260
580 Methods for Data Selection in Medical Databases: The Binary Logistic Regression - Relations with the Calculated Risks

Authors: Cristina G. Dascalu, Elena Mihaela Carausu, Daniela Manuc

Abstract:

Medical studies often require different methods for parameter selection as a second step of processing, after the database has been designed and filled with information. One common task is the selection of fields that act as risk factors using well-known methods, in order to find the most relevant risk factors and to establish a possible hierarchy among them. Different methods are available for this purpose, one of the best known being binary logistic regression. We present the mathematical principles of this method and a practical example of its use in analyzing the influence of 10 different psychiatric diagnoses on 4 different types of offences (in a database of 289 psychiatric patients involved in different types of offences). Finally, we make some observations about the relation between the risk factor hierarchy established through binary logistic regression and the individual risks, as well as the results of the Chi-squared test. We show that the hierarchy built using binary logistic regression does not agree with the direct order of the risk factors, even though it would be natural to assume that it always does.

Keywords: Databases, risk factors, binary logistic regression, hierarchy.
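For reference, the standard textbook form of the binary logistic model and of the risk measures it is usually compared against can be written as below. The symbols are assumptions introduced here, not the paper's notation: $x_j$ are binary risk factors, $\beta_j$ their coefficients, and $a, b, c, d$ the cells of a 2x2 table (exposed with and without the outcome, then unexposed with and without the outcome).

```latex
% Binary logistic regression with k risk factors
P(Y = 1 \mid x_1, \dots, x_k)
  = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \sum_{j=1}^{k} \beta_j x_j)\bigr)}

% Adjusted odds ratio of factor j, holding the other factors fixed
\mathrm{OR}_j^{\text{adj}} = e^{\beta_j}

% Crude (one factor at a time) odds ratio and relative risk from a 2x2 table
\mathrm{OR}^{\text{crude}} = \frac{a\,d}{b\,c},
\qquad
\mathrm{RR} = \frac{a / (a + b)}{c / (c + d)}
```

Because $e^{\beta_j}$ is adjusted for all other factors in the model while the crude measures consider one factor at a time, the two rankings of risk factors need not coincide when the factors are correlated.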

Downloads 1306
579 Rhetorical Communication in the CogSci Discourse Community: The Cognitive Neurosciences (2004) in the Context of Scientific Dissemination

Authors: Lucia Abbamonte, Olimpia Matarazzo

Abstract:

In recent years linguistic research has turned increasing attention to covert/overt strategies to modulate authorial stance and positioning in scientific texts, and to the recipients' response. This study discussed some theoretical implications of the use of rhetoric in scientific communication and analysed qualitative data from the authoritative The Cognitive Neurosciences III (2004) volume. Its genre-identity, status and readability were considered, in the social interactive context of contemporary disciplinary discourses – in their polyphony of traditional and new, emerging genres. Evidence was given of the ways its famous authors negotiate and shape knowledge and research results – explicitly appraising team work and promoting faith in the fast-paced progress of Cognitive Neuroscience, also through experiential metaphors – by presenting a set of examples, ordered according to their dominant rhetorical quality.

Keywords: Appraisal, disciplinary discourses, experiential metaphors, genre, identity, knowledge, readability, rhetoric, strategies, theoretical implications.

Downloads 1365