Search results for: data discovery
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24519

Search results for: data discovery

24519 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, WangQun Lin

Abstract:

This paper introduces the concept and principle of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of typical cleaning process, puts forward a well scalability and versatility data cleaning framework, in view of data with attribute dependency relation, designs several of violation data discovery algorithms by formal formula, which can obtain inconsistent data to all target columns with condition attribute dependent no matter data is structured (SQL) or unstructured (NoSQL), and gives 6 data cleaning methods based on these algorithms.

Keywords: data cleaning, dependency rules, violation data discovery, data repair

Procedia PDF Downloads 533
24518 Knowledge Discovery from Production Databases for Hierarchical Process Control

Authors: Pavol Tanuska, Pavel Vazan, Michal Kebisek, Dominika Jurovata

Abstract:

The paper gives the results of the project that was oriented on the usage of knowledge discoveries from production systems for needs of the hierarchical process control. One of the main project goals was the proposal of knowledge discovery model for process control. Specifics data mining methods and techniques was used for defined problems of the process control. The gained knowledge was used on the real production system, thus, the proposed solution has been verified. The paper documents how it is possible to apply new discovery knowledge to be used in the real hierarchical process control. There are specified the opportunities for application of the proposed knowledge discovery model for hierarchical process control.

Keywords: hierarchical process control, knowledge discovery from databases, neural network, process control

Procedia PDF Downloads 450
24517 Application of Data Mining Techniques for Tourism Knowledge Discovery

Authors: Teklu Urgessa, Wookjae Maeng, Joong Seek Lee

Abstract:

Application of five implementations of three data mining classification techniques was experimented for extracting important insights from tourism data. The aim was to find out the best performing algorithm among the compared ones for tourism knowledge discovery. Knowledge discovery process from data was used as a process model. 10-fold cross validation method is used for testing purpose. Various data preprocessing activities were performed to get the final dataset for model building. Classification models of the selected algorithms were built with different scenarios on the preprocessed dataset. The outperformed algorithm tourism dataset was Random Forest (76%) before applying information gain based attribute selection and J48 (C4.5) (75%) after selection of top relevant attributes to the class (target) attribute. In terms of time for model building, attribute selection improves the efficiency of all algorithms. Artificial Neural Network (multilayer perceptron) showed the highest improvement (90%). The rules extracted from the decision tree model are presented, which showed intricate, non-trivial knowledge/insight that would otherwise not be discovered by simple statistical analysis with mediocre accuracy of the machine using classification algorithms.

Keywords: classification algorithms, data mining, knowledge discovery, tourism

Procedia PDF Downloads 267
24516 Knowledge-Driven Decision Support System Based on Knowledge Warehouse and Data Mining by Improving Apriori Algorithm with Fuzzy Logic

Authors: Pejman Hosseinioun, Hasan Shakeri, Ghasem Ghorbanirostam

Abstract:

In recent years, we have seen an increasing importance of research and study on knowledge source, decision support systems, data mining and procedure of knowledge discovery in data bases and it is considered that each of these aspects affects the others. In this article, we have merged information source and knowledge source to suggest a knowledge based system within limits of management based on storing and restoring of knowledge to manage information and improve decision making and resources. In this article, we have used method of data mining and Apriori algorithm in procedure of knowledge discovery one of the problems of Apriori algorithm is that, a user should specify the minimum threshold for supporting the regularity. Imagine that a user wants to apply Apriori algorithm for a database with millions of transactions. Definitely, the user does not have necessary knowledge of all existing transactions in that database, and therefore cannot specify a suitable threshold. Our purpose in this article is to improve Apriori algorithm. To achieve our goal, we tried using fuzzy logic to put data in different clusters before applying the Apriori algorithm for existing data in the database and we also try to suggest the most suitable threshold to the user automatically.

Keywords: decision support system, data mining, knowledge discovery, data discovery, fuzzy logic

Procedia PDF Downloads 306
24515 Data Mining As A Tool For Knowledge Management: A Review

Authors: Maram Saleh

Abstract:

Knowledge has become an essential resource in today’s economy and become the most important asset of maintaining competition advantage in organizations. The importance of knowledge has made organizations to manage their knowledge assets and resources through all multiple knowledge management stages such as: Knowledge Creation, knowledge storage, knowledge sharing and knowledge use. Researches on data mining are continues growing over recent years on both business and educational fields. Data mining is one of the most important steps of the knowledge discovery in databases process aiming to extract implicit, unknown but useful knowledge and it is considered as significant subfield in knowledge management. Data miming have the great potential to help organizations to focus on extracting the most important information on their data warehouses. Data mining tools and techniques can predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. This review paper explores the applications of data mining techniques in supporting knowledge management process as an effective knowledge discovery technique. In this paper, we identify the relationship between data mining and knowledge management, and then focus on introducing some application of date mining techniques in knowledge management for some real life domains.

Keywords: Data Mining, Knowledge management, Knowledge discovery, Knowledge creation.

Procedia PDF Downloads 179
24514 Algorithms used in Spatial Data Mining GIS

Authors: Vahid Bairami Rad

Abstract:

Extracting knowledge from spatial data like GIS data is important to reduce the data and extract information. Therefore, the development of new techniques and tools that support the human in transforming data into useful knowledge has been the focus of the relatively new and interdisciplinary research area ‘knowledge discovery in databases’. Thus, we introduce a set of database primitives or basic operations for spatial data mining which are sufficient to express most of the spatial data mining algorithms from the literature. This approach has several advantages. Similar to the relational standard language SQL, the use of standard primitives will speed-up the development of new data mining algorithms and will also make them more portable. We introduced a database-oriented framework for spatial data mining which is based on the concepts of neighborhood graphs and paths. A small set of basic operations on these graphs and paths were defined as database primitives for spatial data mining. Furthermore, techniques to efficiently support the database primitives by a commercial DBMS were presented.

Keywords: spatial data base, knowledge discovery database, data mining, spatial relationship, predictive data mining

Procedia PDF Downloads 425
24513 Review and Comparison of Associative Classification Data Mining Approaches

Authors: Suzan Wedyan

Abstract:

Data mining is one of the main phases in the Knowledge Discovery Database (KDD) which is responsible of finding hidden and useful knowledge from databases. There are many different tasks for data mining including regression, pattern recognition, clustering, classification, and association rule. In recent years a promising data mining approach called associative classification (AC) has been proposed, AC integrates classification and association rule discovery to build classification models (classifiers). This paper surveys and critically compares several AC algorithms with reference of the different procedures are used in each algorithm, such as rule learning, rule sorting, rule pruning, classifier building, and class allocation for test cases.

Keywords: associative classification, classification, data mining, learning, rule ranking, rule pruning, prediction

Procedia PDF Downloads 507
24512 Signal Strength Based Multipath Routing for Mobile Ad Hoc Networks

Authors: Chothmal

Abstract:

In this paper, we present a route discovery process which uses the signal strength on a link as a parameter of its inclusion in the route discovery method. The proposed signal-to-interference and noise ratio (SINR) based multipath reactive routing protocol is named as SINR-MP protocol. The proposed SINR-MP routing protocols has two following two features: a) SINR-MP protocol selects routes based on the SINR of the links during the route discovery process therefore it select the routes which has long lifetime and low frame error rate for data transmission, and b) SINR-MP protocols route discovery process is multipath which discovers more than one SINR based route between a given source destination pair. The multiple routes selected by our SINR-MP protocol are node-disjoint in nature which increases their robustness against link failures, as failure of one route will not affect the other route. The secondary route is very useful in situations where the primary route is broken because we can now use the secondary route without causing a new route discovery process. Due to this, the network overhead caused by a route discovery process is avoided. This increases the network performance greatly. The proposed SINR-MP routing protocol is implemented in the trail version of network simulator called Qualnet.

Keywords: ad hoc networks, quality of service, video streaming, H.264/SVC, multiple routes, video traces

Procedia PDF Downloads 218
24511 Network Word Discovery Framework Based on Sentence Semantic Vector Similarity

Authors: Ganfeng Yu, Yuefeng Ma, Shanliang Yang

Abstract:

The word discovery is a key problem in text information retrieval technology. Methods in new word discovery tend to be closely related to words because they generally obtain new word results by analyzing words. With the popularity of social networks, individual netizens and online self-media have generated various network texts for the convenience of online life, including network words that are far from standard Chinese expression. How detect network words is one of the important goals in the field of text information retrieval today. In this paper, we integrate the word embedding model and clustering methods to propose a network word discovery framework based on sentence semantic similarity (S³-NWD) to detect network words effectively from the corpus. This framework constructs sentence semantic vectors through a distributed representation model, uses the similarity of sentence semantic vectors to determine the semantic relationship between sentences, and finally realizes network word discovery by the meaning of semantic replacement between sentences. The experiment verifies that the framework not only completes the rapid discovery of network words but also realizes the standard word meaning of the discovery of network words, which reflects the effectiveness of our work.

Keywords: text information retrieval, natural language processing, new word discovery, information extraction

Procedia PDF Downloads 61
24510 CERD: Cost Effective Route Discovery in Mobile Ad Hoc Networks

Authors: Anuradha Banerjee

Abstract:

A mobile ad hoc network is an infrastructure less network, where nodes are free to move independently in any direction. The nodes have limited battery power; hence, we require energy efficient route discovery technique to enhance their lifetime and network performance. In this paper, we propose an energy-efficient route discovery technique CERD that greatly reduces the number of route requests flooded into the network and also gives priority to the route request packets sent from the routers that has communicated with the destination very recently, in single or multi-hop paths. This does not only enhance the lifetime of nodes but also decreases the delay in tracking the destination.

Keywords: ad hoc network, energy efficiency, flooding, node lifetime, route discovery

Procedia PDF Downloads 317
24509 A Review on Existing Challenges of Data Mining and Future Research Perspectives

Authors: Hema Bhardwaj, D. Srinivasa Rao

Abstract:

Technology for analysing, processing, and extracting meaningful data from enormous and complicated datasets can be termed as "big data." The technique of big data mining and big data analysis is extremely helpful for business movements such as making decisions, building organisational plans, researching the market efficiently, improving sales, etc., because typical management tools cannot handle such complicated datasets. Special computational and statistical issues, such as measurement errors, noise accumulation, spurious correlation, and storage and scalability limitations, are brought on by big data. These unique problems call for new computational and statistical paradigms. This research paper offers an overview of the literature on big data mining, its process, along with problems and difficulties, with a focus on the unique characteristics of big data. Organizations have several difficulties when undertaking data mining, which has an impact on their decision-making. Every day, terabytes of data are produced, yet only around 1% of that data is really analyzed. The idea of the mining and analysis of data and knowledge discovery techniques that have recently been created with practical application systems is presented in this study. This article's conclusion also includes a list of issues and difficulties for further research in the area. The report discusses the management's main big data and data mining challenges.

Keywords: big data, data mining, data analysis, knowledge discovery techniques, data mining challenges

Procedia PDF Downloads 80
24508 Recent Advances in Data Warehouse

Authors: Fahad Hanash Alzahrani

Abstract:

This paper describes some recent advances in a quickly developing area of data storing and processing based on Data Warehouses and Data Mining techniques, which are associated with software, hardware, data mining algorithms and visualisation techniques having common features for any specific problems and tasks of their implementation.

Keywords: data warehouse, data mining, knowledge discovery in databases, on-line analytical processing

Procedia PDF Downloads 367
24507 Proposing an Architecture for Drug Response Prediction by Integrating Multiomics Data and Utilizing Graph Transformers

Authors: Nishank Raisinghani

Abstract:

Efficiently predicting drug response remains a challenge in the realm of drug discovery. To address this issue, we propose four model architectures that combine graphical representation with varying positions of multiheaded self-attention mechanisms. By leveraging two types of multi-omics data, transcriptomics and genomics, we create a comprehensive representation of target cells and enable drug response prediction in precision medicine. A majority of our architectures utilize multiple transformer models, one with a graph attention mechanism and the other with a multiheaded self-attention mechanism, to generate latent representations of both drug and omics data, respectively. Our model architectures apply an attention mechanism to both drug and multiomics data, with the goal of procuring more comprehensive latent representations. The latent representations are then concatenated and input into a fully connected network to predict the IC-50 score, a measure of cell drug response. We experiment with all four of these architectures and extract results from all of them. Our study greatly contributes to the future of drug discovery and precision medicine by looking to optimize the time and accuracy of drug response prediction.

Keywords: drug discovery, transformers, graph neural networks, multiomics

Procedia PDF Downloads 112
24506 Study of Evaluation Model Based on Information System Success Model and Flow Theory Using Web-scale Discovery System

Authors: June-Jei Kuo, Yi-Chuan Hsieh

Abstract:

Because of the rapid growth of information technology, more and more libraries introduce the new information retrieval systems to enhance the users’ experience, improve the retrieval efficiency, and increase the applicability of the library resources. Nevertheless, few of them are discussed the usability from the users’ aspect. The aims of this study are to understand that the scenario of the information retrieval system utilization, and to know why users are willing to continuously use the web-scale discovery system to improve the web-scale discovery system and promote their use of university libraries. Besides of questionnaires, observations and interviews, this study employs both Information System Success Model introduced by DeLone and McLean in 2003 and the flow theory to evaluate the system quality, information quality, service quality, use, user satisfaction, flow, and continuing to use web-scale discovery system of students from National Chung Hsing University. Then, the results are analyzed through descriptive statistics and structural equation modeling using AMOS. The results reveal that in web-scale discovery system, the user’s evaluation of system quality, information quality, and service quality is positively related to the use and satisfaction; however, the service quality only affects user satisfaction. User satisfaction and the flow show a significant impact on continuing to use. Moreover, user satisfaction has a significant impact on user flow. According to the results of this study, to maintain the stability of the information retrieval system, to improve the information content quality, and to enhance the relationship between subject librarians and students are recommended for the academic libraries. Meanwhile, to improve the system user interface, to minimize layer from system-level, to strengthen the data accuracy and relevance, to modify the sorting criteria of the data, and to support the auto-correct function are required for system provider. Finally, to establish better communication with librariana commended for all users.

Keywords: web-scale discovery system, discovery system, information system success model, flow theory, academic library

Procedia PDF Downloads 73
24505 Intuitional Insight in Islamic Mysticism

Authors: Maryam Bakhtyar, Pegah Akrami

Abstract:

Intuitional insight or mystical cognition is a different insight from common, concrete and intellectual insights. This kind of insight is not achieved by visionary contemplation but by the recitation of God, self-purification, and mystical life. In this insight, there is no distance or medium between the subject of cognition and its object, and they have a sort of unification, unison, and incorporation. As a result, knowledgeable consider this insight as direct, immediate, and personal. The goal of this insight is God, cosmos’ creatures, and the general inner and hidden aspect of the world that is nothing except God’s manifestations in the view of mystics. AS our common cognitions have diversity and stages, intuitional insight also has diversity and levels. As our senses are divided into concrete and rational, mystical discovery is divided into superficial discovery and spiritual one. Based on Islamic mystics, the preferable way to know God and believe in him is intuitional insight. There are two important criteria for evaluating mystical intuition, especially for beginner mystics of intellect and revelation. Indeed, the conclusion and a brief evaluation of Islamic mystics’ viewpoint is the main subject of this paper.

Keywords: intuition, discovery, mystical insight, personal knowledge, superficial discovery, spiritual discovery

Procedia PDF Downloads 65
24504 Application of Knowledge Discovery in Database Techniques in Cost Overruns of Construction Projects

Authors: Mai Ghazal, Ahmed Hammad

Abstract:

Cost overruns in construction projects are considered as worldwide challenges since the cost performance is one of the main measures of success along with schedule performance. To overcome this problem, studies were conducted to investigate the cost overruns' factors, also projects' historical data were analyzed to extract new and useful knowledge from it. This research is studying and analyzing the effect of some factors causing cost overruns using the historical data from completed construction projects. Then, using these factors to estimate the probability of cost overrun occurrence and predict its percentage for future projects. First, an intensive literature review was done to study all the factors that cause cost overrun in construction projects, then another review was done for previous researcher papers about mining process in dealing with cost overruns. Second, a proposed data warehouse was structured which can be used by organizations to store their future data in a well-organized way so it can be easily analyzed later. Third twelve quantitative factors which their data are frequently available at construction projects were selected to be the analyzed factors and suggested predictors for the proposed model.

Keywords: construction management, construction projects, cost overrun, cost performance, data mining, data warehousing, knowledge discovery, knowledge management

Procedia PDF Downloads 333
24503 Tuberculosis Massive Active Case Discovery in East Jakarta 2016-2017: The Role of Ketuk Pintu Layani Dengan Hati and Juru Pemantau Batuk (Jumantuk) Cadre Programs

Authors: Ngabilas Salama

Abstract:

Background: Indonesia has the 2nd highest number of incidents of tuberculosis (TB). It accounts for 1.020.000 new cases per year, only 30% of which has been reported. To find the lost 70%, a massive active case discovery was conducted through two programs: Ketuk Pintu Layani Dengan Hati (KPLDH) and Kader Juru Pemantau Batuk (Jumantuk cadres), who also plays a role in child TB screening. Methods: Data was collected and analyzed through Tuberculosis Integrated Online System from 2014 to 2017 involving 129 DOTS facility with 86 primary health centers in East Jakarta. Results: East Jakarta consists of 2.900.722 people. KPLDH program started in February 2016 consisting of 84 teams (310 people). Jumantuk cadres was formed 4 months later (218 orang). The number of new TB cases in East Jakarta (primary health center) from 2014 to June 2017 respectively is as follows: 6.499 (2.637), 7.438 (2.651), 8.948 (3.211), 5.701 (1.830). Meanwhile, the percentage of child TB case discovery in primary health center was 8,5%, 9,8%, 12,1% from 2014 to 2016 respectively. In 2017, child TB case discovery was 13,1% for the first 3 months and 16,5% for the next 3 months. Discussion: Increased TB incidence rate from 2014 to 2017 was 14,4%, 20,3%, and 27,4% respectively in East Jakarta, and 0,5%, 21,1%, and 14% in primary health center. This reveals the positive role of KPLDH and Jumantuk in TB detection and reporting. Likewise, these programs were responsible for the increase in child TB case discovery, especially in the first 3 months of 2017 (Ketuk Pintu TB Day program) and the next 3 months (active TB screening). Conclusion: KPLDH dan Jumantuk are actively involved in increasing TB case discovery in both adults and children.

Keywords: tuberculosis, case discovery program, primary health center, cadre

Procedia PDF Downloads 293
24502 Ontology-Driven Knowledge Discovery and Validation from Admission Databases: A Structural Causal Model Approach for Polytechnic Education in Nigeria

Authors: Bernard Igoche Igoche, Olumuyiwa Matthew, Peter Bednar, Alexander Gegov

Abstract:

This study presents an ontology-driven approach for knowledge discovery and validation from admission databases in Nigerian polytechnic institutions. The research aims to address the challenges of extracting meaningful insights from vast amounts of admission data and utilizing them for decision-making and process improvement. The proposed methodology combines the knowledge discovery in databases (KDD) process with a structural causal model (SCM) ontological framework. The admission database of Benue State Polytechnic Ugbokolo (Benpoly) is used as a case study. The KDD process is employed to mine and distill knowledge from the database, while the SCM ontology is designed to identify and validate the important features of the admission process. The SCM validation is performed using the conditional independence test (CIT) criteria, and an algorithm is developed to implement the validation process. The identified features are then used for machine learning (ML) modeling and prediction of admission status. The results demonstrate the adequacy of the SCM ontological framework in representing the admission process and the high predictive accuracies achieved by the ML models, with k-nearest neighbors (KNN) and support vector machine (SVM) achieving 92% accuracy. The study concludes that the proposed ontology-driven approach contributes to the advancement of educational data mining and provides a foundation for future research in this domain.

Keywords: admission databases, educational data mining, machine learning, ontology-driven knowledge discovery, polytechnic education, structural causal model

Procedia PDF Downloads 23
24501 Machine Learning Methods for Network Intrusion Detection

Authors: Mouhammad Alkasassbeh, Mohammad Almseidin

Abstract:

Network security engineers work to keep services available all the time by handling intruder attacks. Intrusion Detection System (IDS) is one of the obtainable mechanisms that is used to sense and classify any abnormal actions. Therefore, the IDS must be always up to date with the latest intruder attacks signatures to preserve confidentiality, integrity, and availability of the services. The speed of the IDS is a very important issue as well learning the new attacks. This research work illustrates how the Knowledge Discovery and Data Mining (or Knowledge Discovery in Databases) KDD dataset is very handy for testing and evaluating different Machine Learning Techniques. It mainly focuses on the KDD preprocess part in order to prepare a decent and fair experimental data set. The J48, MLP, and Bayes Network classifiers have been chosen for this study. It has been proven that the J48 classifier has achieved the highest accuracy rate for detecting and classifying all KDD dataset attacks, which are of type DOS, R2L, U2R, and PROBE.

Keywords: IDS, DDoS, MLP, KDD

Procedia PDF Downloads 202
24500 Lightweight Cryptographically Generated Address for IPv6 Neighbor Discovery

Authors: Amjed Sid Ahmed, Rosilah Hassan, Nor Effendy Othman

Abstract:

Limited functioning of the Internet Protocol version 4 (IPv4) has necessitated the development of the Internetworking Protocol next generation (IPng) to curb the challenges. Indeed, the IPng is also referred to as the Internet Protocol version 6 (IPv6) and includes the Neighbor Discovery Protocol (NDP). The latter performs the role of Address Auto-configuration, Router Discovery (RD), and Neighbor Discovery (ND). Furthermore, the role of the NDP entails redirecting the service, detecting the duplicate address, and detecting the unreachable services. Despite the fact that there is an NDP’s assumption regarding the existence of trust the links’ nodes, several crucial attacks may affect the Protocol. Internet Engineering Task Force (IETF) therefore has recommended implementation of Secure Neighbor Discovery Protocol (SEND) to tackle safety issues in NDP. The SEND protocol is mainly used for validation of address rights, malicious response inhibiting techniques and finally router certification procedures. For routine running of these tasks, SEND utilizes on the following options, Cryptographically Generated Address (CGA), RSA Signature, Nonce and Timestamp option. CGA is produced at extra high costs making it the most notable disadvantage of SEND. In this paper a clear description of the constituents of CGA, its operation and also recommendations for improvements in its generation are given.

Keywords: CGA, IPv6, NDP, SEND

Procedia PDF Downloads 363
24499 Classification Rule Discovery by Using Parallel Ant Colony Optimization

Authors: Waseem Shahzad, Ayesha Tahir Khan, Hamid Hussain Awan

Abstract:

Ant-Miner algorithm that lies under ACO algorithms is used to extract knowledge from data in the form of rules. A variant of Ant-Miner algorithm named as cAnt-MinerPB is used to generate list of rules using pittsburgh approach in order to maintain the rule interaction among the rules that are generated. In this paper, we propose a parallel Ant MinerPB in which Ant colony optimization algorithm runs parallel. In this technique, a data set is divided vertically (i-e attributes) into different subsets. These subsets are created based on the correlation among attributes using Mutual Information (MI). It generates rules in a parallel manner and then merged to form a final list of rules. The results have shown that the proposed technique achieved higher accuracy when compared with original cAnt-MinerPB and also the execution time has also reduced.

Keywords: ant colony optimization, parallel Ant-MinerPB, vertical partitioning, classification rule discovery

Procedia PDF Downloads 268
24498 A General Framework for Knowledge Discovery from Echocardiographic and Natural Images

Authors: S. Nandagopalan, N. Pradeep

Abstract:

The aim of this paper is to propose a general framework for storing, analyzing, and extracting knowledge from two-dimensional echocardiographic images, color Doppler images, non-medical images, and general data sets. A number of high performance data mining algorithms have been used to carry out this task. Our framework encompasses four layers namely physical storage, object identification, knowledge discovery, user level. Techniques such as active contour model to identify the cardiac chambers, pixel classification to segment the color Doppler echo image, universal model for image retrieval, Bayesian method for classification, parallel algorithms for image segmentation, etc., were employed. Using the feature vector database that have been efficiently constructed, one can perform various data mining tasks like clustering, classification, etc. with efficient algorithms along with image mining given a query image. All these facilities are included in the framework that is supported by state-of-the-art user interface (UI). The algorithms were tested with actual patient data and Coral image database and the results show that their performance is better than the results reported already.

Keywords: active contour, Bayesian, echocardiographic image, feature vector

Procedia PDF Downloads 415
24497 A General Framework for Knowledge Discovery Using High Performance Machine Learning Algorithms

Authors: S. Nandagopalan, N. Pradeep

Abstract:

The aim of this paper is to propose a general framework for storing, analyzing, and extracting knowledge from two-dimensional echocardiographic images, color Doppler images, non-medical images, and general data sets. A number of high performance data mining algorithms have been used to carry out this task. Our framework encompasses four layers namely physical storage, object identification, knowledge discovery, user level. Techniques such as active contour model to identify the cardiac chambers, pixel classification to segment the color Doppler echo image, universal model for image retrieval, Bayesian method for classification, parallel algorithms for image segmentation, etc., were employed. Using the feature vector database that have been efficiently constructed, one can perform various data mining tasks like clustering, classification, etc. with efficient algorithms along with image mining given a query image. All these facilities are included in the framework that is supported by state-of-the-art user interface (UI). The algorithms were tested with actual patient data and Coral image database and the results show that their performance is better than the results reported already.

Keywords: active contour, bayesian, echocardiographic image, feature vector

Procedia PDF Downloads 389
24496 Research on Fuzzy Test Framework Based on Concolic Execution

Authors: Xiong Xie, Yuhang Chen

Abstract:

Vulnerability discovery technology is a significant field of the current. In this paper, a fuzzy framework based on concolic execution has been proposed. Fuzzy test and symbolic execution are widely used in the field of vulnerability discovery technology. But each of them has its own advantages and disadvantages. During the path generation stage, path traversal algorithm based on generation is used to get more accurate path. During the constraint solving stage, dynamic concolic execution is used to avoid the path explosion. If there is external call, the concolic based on function summary is used. Experiments show that the framework can effectively improve the ability of triggering vulnerabilities and code coverage.

Keywords: concolic execution, constraint solving, fuzzy test, vulnerability discovery

Procedia PDF Downloads 198
24495 Unseen Classes: The Paradigm Shift in Machine Learning

Authors: Vani Singhal, Jitendra Parmar, Satyendra Singh Chouhan

Abstract:

Unseen class discovery has now become an important part of a machine-learning algorithm to judge new classes. Unseen classes are the classes on which the machine learning model is not trained on. With the advancement in technology and AI replacing humans, the amount of data has increased to the next level. So while implementing a model on real-world examples, we come across unseen new classes. Our aim is to find the number of unseen classes by using a hierarchical-based active learning algorithm. The algorithm is based on hierarchical clustering as well as active sampling. The number of clusters that we will get in the end will give the number of unseen classes. The total clusters will also contain some clusters that have unseen classes. Instead of first discovering unseen classes and then finding their number, we directly calculated the number by applying the algorithm. The dataset used is for intent classification. The target data is the intent of the corresponding query. We conclude that when the machine learning model will encounter real-world data, it will automatically find the number of unseen classes. In the future, our next work would be to label these unseen classes correctly.

Keywords: active sampling, hierarchical clustering, open world learning, unseen class discovery

Procedia PDF Downloads 136
24494 Intrapreneurship Discovery: Standard Strategy to Boost Innovation inside Companies

Authors: Chiara Mansanta, Daniela Sani

Abstract:

This paper studies the concept of intrapreneurship discovery for innovation and technology development related to the manufacturing industries set up in the center of Italy, in Marche Region. The study underlined the key drivers of the innovation process and the main factors that influence innovation. Starting from a literature study on open innovation, this paper examines the role of human capital to support company’s development. The empirical part of the study is based on a survey to 151 manufacturing companies that represent the 34% of that universe at the regional level. The survey underlined the main KPI’s that influence companies in their decision processes; then tools for these decision processes are presented.

Keywords: business model, decision making, intrapreneurship discovery, standard methodology

Procedia PDF Downloads 150
24493 Pattern Discovery from Student Feedback: Identifying Factors to Improve Student Emotions in Learning

Authors: Angelina A. Tzacheva, Jaishree Ranganathan

Abstract:

Interest in (STEM) Science Technology Engineering Mathematics education especially Computer Science education has seen a drastic increase across the country. This fuels effort towards recruiting and admitting a diverse population of students. Thus the changing conditions in terms of the student population, diversity and the expected teaching and learning outcomes give the platform for use of Innovative Teaching models and technologies. It is necessary that these methods adapted should also concentrate on raising quality of such innovations and have positive impact on student learning. Light-Weight Team is an Active Learning Pedagogy, which is considered to be low-stake activity and has very little or no direct impact on student grades. Emotion plays a major role in student’s motivation to learning. In this work we use the student feedback data with emotion classification using surveys at a public research institution in the United States. We use Actionable Pattern Discovery method for this purpose. Actionable patterns are patterns that provide suggestions in the form of rules to help the user achieve better outcomes. The proposed method provides meaningful insight in terms of changes that can be incorporated in the Light-Weight team activities, resources utilized in the course. The results suggest how to enhance student emotions to a more positive state, in particular focuses on the emotions ‘Trust’ and ‘Joy’.

Keywords: actionable pattern discovery, education, emotion, data mining

Procedia PDF Downloads 63
24492 Identification of Arglecins B and C and Actinofuranosin A from a Termite Gut-Associated Streptomyces Species

Authors: Christian A. Romero, Tanja Grkovic, John. R. J. French, D. İpek Kurtböke, Ronald J. Quinn

Abstract:

A high-throughput and automated 1H NMR metabolic fingerprinting dereplication approach was used to accelerate the discovery of unknown bioactive secondary metabolites. The applied dereplication strategy accelerated the discovery of natural products, provided rapid and competent identification and quantification of the known secondary metabolites and avoided time-consuming isolation procedures. The effectiveness of the technique was demonstrated by the isolation and elucidation of arglecins B (1), C (2) and actinofuranosin A (3) from a termite-gut associated Streptomyces sp. (USC 597) grown under solid state fermentation. The structures of these compounds were elucidated by extensive interpretation of 1H, 13C and 2D NMR spectroscopic data. These represent the first report of arglecin analogs isolated from a termite gut-associated Streptomyces species.

Keywords: actinomycetes, actinofuranosin, antibiotics, arglecins, NMR spectroscopy

Procedia PDF Downloads 27
24491 Sequential Pattern Mining from Data of Medical Record with Sequential Pattern Discovery Using Equivalent Classes (SPADE) Algorithm (A Case Study : Bolo Primary Health Care, Bima)

Authors: Rezky Rifaini, Raden Bagus Fajriya Hakim

Abstract:

This research was conducted at the Bolo primary health Care in Bima Regency. The purpose of the research is to find out the association pattern that is formed of medical record database from Bolo Primary health care’s patient. The data used is secondary data from medical records database PHC. Sequential pattern mining technique is the method that used to analysis. Transaction data generated from Patient_ID, Check_Date and diagnosis. Sequential Pattern Discovery Algorithms Using Equivalent Classes (SPADE) is one of the algorithm in sequential pattern mining, this algorithm find frequent sequences of data transaction, using vertical database and sequence join process. Results of the SPADE algorithm is frequent sequences that then used to form a rule. It technique is used to find the association pattern between items combination. Based on association rules sequential analysis with SPADE algorithm for minimum support 0,03 and minimum confidence 0,75 is gotten 3 association sequential pattern based on the sequence of patient_ID, check_Date and diagnosis data in the Bolo PHC.

Keywords: diagnosis, primary health care, medical record, data mining, sequential pattern mining, SPADE algorithm

Procedia PDF Downloads 370
24490 Multi-Source Data Fusion for Urban Comprehensive Management

Authors: Bolin Hua

Abstract:

In city governance, various data are involved, including city component data, demographic data, housing data and all kinds of business data. These data reflects different aspects of people, events and activities. Data generated from various systems are different in form and data source are different because they may come from different sectors. In order to reflect one or several facets of an event or rule, data from multiple sources need fusion together. Data from different sources using different ways of collection raised several issues which need to be resolved. Problem of data fusion include data update and synchronization, data exchange and sharing, file parsing and entry, duplicate data and its comparison, resource catalogue construction. Governments adopt statistical analysis, time series analysis, extrapolation, monitoring analysis, value mining, scenario prediction in order to achieve pattern discovery, law verification, root cause analysis and public opinion monitoring. The result of Multi-source data fusion is to form a uniform central database, which includes people data, location data, object data, and institution data, business data and space data. We need to use meta data to be referred to and read when application needs to access, manipulate and display the data. A uniform meta data management ensures effectiveness and consistency of data in the process of data exchange, data modeling, data cleansing, data loading, data storing, data analysis, data search and data delivery.

Keywords: multi-source data fusion, urban comprehensive management, information fusion, government data

Procedia PDF Downloads 349