Search results for: Repository
32 Heritage Tree Expert Assessment and Classification: Malaysian Perspective
Authors: B.-Y.-S. Lau, Y.-C.-T. Jonathan, M.-S. Alias
Abstract:
Heritage trees are natural large, individual trees with exceptionally value due to association with age or event or distinguished people. In Malaysia, there is an abundance of tropical heritage trees throughout the country. It is essential to set up a repository of heritage trees to prevent valuable trees from being cut down. In this cross domain study, a web-based online expert system namely the Heritage Tree Expert Assessment and Classification (HTEAC) is developed and deployed for public to nominate potential heritage trees. Based on the nomination, tree care experts or arborists would evaluate and verify the nominated trees as heritage trees. The expert system automatically rates the approved heritage trees according to pre-defined grades via Delphi technique. Features and usability test of the expert system are presented. Preliminary result is promising for the system to be used as a full scale public system.Keywords: Arboriculture, Delphi, expert system, heritage tree, urban forestry.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 143031 Development of Knowledge Portal using Open Source Tools: A Case Study of FIIT, UNISEL
Authors: Nur Razia Mohd Suradi, Hema Subramaniam, Marina Hassan, Siti Fatimah Omar
Abstract:
Knowledge sharing culture contributes to a positive working environment. Currently, there is no platform for the Faculty of Industrial Information Technology (FIIT), Unisel academic staff to share knowledge among them. As it is done manually, the sharing process is through common meeting or by any offline discussions. There is no repository for future retrieval. However, with open source solution the development of knowledge based application may reduce the cost tremendously. In this paper we discuss about the domain on which this knowledge portal is being developed and also the deployment of open source tools such as JOOMLA, PHP programming language and MySQL. This knowledge portal is evidence that open source tools also reliable in developing knowledge based portal. These recommendations will be useful to the open source community to produce more open source products in future.Keywords: Knowledge management, Portal, ContentManagement, JOOMLA.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 249330 Modeling of Reusability of Object Oriented Software System
Authors: Parvinder S. Sandhu, Harpreet Kaur, Amanpreet Singh
Abstract:
Automatic reusability appraisal is helpful in evaluating the quality of developed or developing reusable software components and in identification of reusable components from existing legacy systems; that can save cost of developing the software from scratch. But the issue of how to identify reusable components from existing systems has remained relatively unexplored. In this research work, structural attributes of software components are explored using software metrics and quality of the software is inferred by different Neural Network based approaches, taking the metric values as input. The calculated reusability value enables to identify a good quality code automatically. It is found that the reusability value determined is close to the manual analysis used to be performed by the programmers or repository managers. So, the developed system can be used to enhance the productivity and quality of software development.Keywords: Neural Network, Software Reusability, Software Metric, Accuracy, MAE, RMSE.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 208029 Policy Management Framework for Managing Enterprise Policies
Authors: Dahir A. Ga'al, Wardah Zainal Abidin
Abstract:
Policy management in organizations became rising issue in the last decade. It’s because of today’s regulatory requirements in the organizations. To manage policies in large organizations is an imperative work. However, major challenges facing organizations in the last decade is managing all the policies in the organization and making them an active documents rather than simple (inactive) documents stored in computer hard drive or on a shelf. Because of this challenge, organizations need policy management program. This policy management program can be either manual or automated. This paper presents suggestions towards managing policies in organizations. As well as possible policy management solution or program to be utilized, manual or automated. The research first examines the models and frameworks used for managing policies from various perspectives in the literature of the research area/domain. At the end of this paper, a policy management framework is proposed for managing enterprise policies effectively and in a simplified manner.
Keywords: Policy, policy management, policy management program, policy repository.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 261628 Multi-Agent Systems for Intelligent Clustering
Authors: Jung-Eun Park, Kyung-Whan Oh
Abstract:
Intelligent systems are required in order to quickly and accurately analyze enormous quantities of data in the Internet environment. In intelligent systems, information extracting processes can be divided into supervised learning and unsupervised learning. This paper investigates intelligent clustering by unsupervised learning. Intelligent clustering is the clustering system which determines the clustering model for data analysis and evaluates results by itself. This system can make a clustering model more rapidly, objectively and accurately than an analyzer. The methodology for the automatic clustering intelligent system is a multi-agent system that comprises a clustering agent and a cluster performance evaluation agent. An agent exchanges information about clusters with another agent and the system determines the optimal cluster number through this information. Experiments using data sets in the UCI Machine Repository are performed in order to prove the validity of the system.
Keywords: Intelligent Clustering, Multi-Agent System, PCA, SOM, VC(Variance Criterion)
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 172627 Unsupervised Feature Selection Using Feature Density Functions
Authors: Mina Alibeigi, Sattar Hashemi, Ali Hamzeh
Abstract:
Since dealing with high dimensional data is computationally complex and sometimes even intractable, recently several feature reductions methods have been developed to reduce the dimensionality of the data in order to simplify the calculation analysis in various applications such as text categorization, signal processing, image retrieval, gene expressions and etc. Among feature reduction techniques, feature selection is one the most popular methods due to the preservation of the original features. In this paper, we propose a new unsupervised feature selection method which will remove redundant features from the original feature space by the use of probability density functions of various features. To show the effectiveness of the proposed method, popular feature selection methods have been implemented and compared. Experimental results on the several datasets derived from UCI repository database, illustrate the effectiveness of our proposed methods in comparison with the other compared methods in terms of both classification accuracy and the number of selected features.Keywords: Feature, Feature Selection, Filter, Probability Density Function
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 207626 Analysis of a Population of Diabetic Patients Databases with Classifiers
Authors: Murat Koklu, Yavuz Unal
Abstract:
Data mining can be called as a technique to extract information from data. It is the process of obtaining hidden information and then turning it into qualified knowledge by statistical and artificial intelligence technique. One of its application areas is medical area to form decision support systems for diagnosis just by inventing meaningful information from given medical data. In this study a decision support system for diagnosis of illness that make use of data mining and three different artificial intelligence classifier algorithms namely Multilayer Perceptron, Naive Bayes Classifier and J.48. Pima Indian dataset of UCI Machine Learning Repository was used. This dataset includes urinary and blood test results of 768 patients. These test results consist of 8 different feature vectors. Obtained classifying results were compared with the previous studies. The suggestions for future studies were presented.
Keywords: Artificial Intelligence, Classifiers, Data Mining, Diabetic Patients.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 543125 Clustering Protein Sequences with Tailored General Regression Model Technique
Authors: G. Lavanya Devi, Allam Appa Rao, A. Damodaram, GR Sridhar, G. Jaya Suma
Abstract:
Cluster analysis divides data into groups that are meaningful, useful, or both. Analysis of biological data is creating a new generation of epidemiologic, prognostic, diagnostic and treatment modalities. Clustering of protein sequences is one of the current research topics in the field of computer science. Linear relation is valuable in rule discovery for a given data, such as if value X goes up 1, value Y will go down 3", etc. The classical linear regression models the linear relation of two sequences perfectly. However, if we need to cluster a large repository of protein sequences into groups where sequences have strong linear relationship with each other, it is prohibitively expensive to compare sequences one by one. In this paper, we propose a new technique named General Regression Model Technique Clustering Algorithm (GRMTCA) to benignly handle the problem of linear sequences clustering. GRMT gives a measure, GR*, to tell the degree of linearity of multiple sequences without having to compare each pair of them.Keywords: Clustering, General Regression Model, Protein Sequences, Similarity Measure.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 156624 Applying Spanning Tree Graph Theory for Automatic Database Normalization
Authors: Chetneti Srisa-an
Abstract:
In Knowledge and Data Engineering field, relational database is the best repository to store data in a real world. It has been using around the world more than eight decades. Normalization is the most important process for the analysis and design of relational databases. It aims at creating a set of relational tables with minimum data redundancy that preserve consistency and facilitate correct insertion, deletion, and modification. Normalization is a major task in the design of relational databases. Despite its importance, very few algorithms have been developed to be used in the design of commercial automatic normalization tools. It is also rare technique to do it automatically rather manually. Moreover, for a large and complex database as of now, it make even harder to do it manually. This paper presents a new complete automated relational database normalization method. It produces the directed graph and spanning tree, first. It then proceeds with generating the 2NF, 3NF and also BCNF normal forms. The benefit of this new algorithm is that it can cope with a large set of complex function dependencies.
Keywords: Relational Database, Functional Dependency, Automatic Normalization, Primary Key, Spanning tree.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 286623 A Minimum Spanning Tree-Based Method for Initializing the K-Means Clustering Algorithm
Authors: J. Yang, Y. Ma, X. Zhang, S. Li, Y. Zhang
Abstract:
The traditional k-means algorithm has been widely used as a simple and efficient clustering method. However, the algorithm often converges to local minima for the reason that it is sensitive to the initial cluster centers. In this paper, an algorithm for selecting initial cluster centers on the basis of minimum spanning tree (MST) is presented. The set of vertices in MST with same degree are regarded as a whole which is used to find the skeleton data points. Furthermore, a distance measure between the skeleton data points with consideration of degree and Euclidean distance is presented. Finally, MST-based initialization method for the k-means algorithm is presented, and the corresponding time complexity is analyzed as well. The presented algorithm is tested on five data sets from the UCI Machine Learning Repository. The experimental results illustrate the effectiveness of the presented algorithm compared to three existing initialization methods.
Keywords: Degree, initial cluster center, k-means, minimum spanning tree.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 155222 Comparative Evaluation of Color-Based Video Signatures in the Presence of Various Distortion Types
Authors: Aritz Sánchez de la Fuente, Patrick Ndjiki-Nya, Karsten Sühring, Tobias Hinz, Karsten Müller, Thomas Wiegand
Abstract:
The robustness of color-based signatures in the presence of a selection of representative distortions is investigated. Considered are five signatures that have been developed and evaluated within a new modular framework. Two signatures presented in this work are directly derived from histograms gathered from video frames. The other three signatures are based on temporal information by computing difference histograms between adjacent frames. In order to obtain objective and reproducible results, the evaluations are conducted based on several randomly assembled test sets. These test sets are extracted from a video repository that contains a wide range of broadcast content including documentaries, sports, news, movies, etc. Overall, the experimental results show the adequacy of color-histogram-based signatures for video fingerprinting applications and indicate which type of signature should be preferred in the presence of certain distortions.
Keywords: color histograms, robust hashing, video retrieval, video signature
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 144621 Re-Optimization MVPP Using Common Subexpression for Materialized View Selection
Authors: Boontita Suchyukorn, Raweewan Auepanwiriyakul
Abstract:
A Data Warehouses is a repository of information integrated from source data. Information stored in data warehouse is the form of materialized in order to provide the better performance for answering the queries. Deciding which appropriated views to be materialized is one of important problem. In order to achieve this requirement, the constructing search space close to optimal is a necessary task. It will provide effective result for selecting view to be materialized. In this paper we have proposed an approach to reoptimize Multiple View Processing Plan (MVPP) by using global common subexpressions. The merged queries which have query processing cost not close to optimal would be rewritten. The experiment shows that our approach can help to improve the total query processing cost of MVPP and sum of query processing cost and materialized view maintenance cost is reduced as well after views are selected to be materialized.
Keywords: Data Warehouse, materialized views, query rewriting, common subexpressions.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 167820 Automated Buffer Box Assembly Cell Concept for the Canadian Used Fuel Packing Plant
Authors: Dimitrie Marinceu, Alan Murchison
Abstract:
The Canadian Used Fuel Container (UFC) is a mid-size hemispherical headed copper coated steel container measuring 2.5 meters in length and 0.5 meters in diameter containing 48 used fuel bundles. The contained used fuel produces significant gamma radiation requiring automated assembly processes to complete the assembly. The design throughput of 2,500 UFCs per year places constraints on equipment and hot cell design for repeatability, speed of processing, robustness and recovery from upset conditions. After UFC assembly, the UFC is inserted into a Buffer Box (BB). The BB is made from adequately pre-shaped blocks (lower and upper block) and Highly Compacted Bentonite (HCB) material. The blocks are practically ‘sandwiching’ the UFC between them after assembly. This paper identifies one possible approach for the BB automatic assembly cell and processes. Automation of the BB assembly will have a significant positive impact on nuclear safety, quality, productivity, and reliability.
Keywords: Used fuel packing plant, automatic assembly cell, used fuel container, buffer box, deep geological repository.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 105519 The Resource Description Framework (RDF) as a Modern Structure for Medical Data
Authors: Gabriela Lindemann, Danilo Schmidt, Thomas Schrader, Dietmar Keune
Abstract:
The amount and heterogeneity of data in biomedical research, notably in interdisciplinary fields, requires new methods for the collection, presentation and analysis of information. Important data from laboratory experiments as well as patient trials are available but come out of distributed resources. The Charité - University Hospital Berlin has established together with the German Research Foundation (DFG) a new information service centre for kidney diseases and transplantation (Open European Nephrology Science Centre - OpEN.SC). Beside a collaborative aspect to create new research groups every single partner or institution of this science information centre making his own data available is allowed to search the whole data pool of the various involved centres. A core task is the implementation of a non-restricting open data structure for the various different data sources. We decided to use a modern RDF model and in a first phase transformed original data coming from the web-based Electronic Patient Record database TBase©.
Keywords: Medical databases, Resource Description Framework (RDF), metadata repository.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 203018 A Distance Function for Data with Missing Values and Its Application
Authors: Loai AbdAllah, Ilan Shimshoni
Abstract:
Missing values in data are common in real world applications. Since the performance of many data mining algorithms depend critically on it being given a good metric over the input space, we decided in this paper to define a distance function for unlabeled datasets with missing values. We use the Bhattacharyya distance, which measures the similarity of two probability distributions, to define our new distance function. According to this distance, the distance between two points without missing attributes values is simply the Mahalanobis distance. When on the other hand there is a missing value of one of the coordinates, the distance is computed according to the distribution of the missing coordinate. Our distance is general and can be used as part of any algorithm that computes the distance between data points. Because its performance depends strongly on the chosen distance measure, we opted for the k nearest neighbor classifier to evaluate its ability to accurately reflect object similarity. We experimented on standard numerical datasets from the UCI repository from different fields. On these datasets we simulated missing values and compared the performance of the kNN classifier using our distance to other three basic methods. Our experiments show that kNN using our distance function outperforms the kNN using other methods. Moreover, the runtime performance of our method is only slightly higher than the other methods.
Keywords: Missing values, Distance metric, Bhattacharyya distance.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 275117 The Design of the HL7 RIM-based Sharing Components for Clinical Information Systems
Authors: Wei-Yi Yang, Li-Hui Lee, Hsiao-Li Gien, Hsing-Yi Chu, Yi-Ting Chou, Der-Ming Liou
Abstract:
The American Health Level Seven (HL7) Reference Information Model (RIM) consists of six back-bone classes that have different specialized attributes. Furthermore, for the purpose of enforcing the semantic expression, there are some specific mandatory vocabulary domains have been defined for representing the content values of some attributes. In the light of the fact that it is a duplicated effort on spending a lot of time and human cost to develop and modify Clinical Information Systems (CIS) for most hospitals due to the variety of workflows. This study attempts to design and develop sharing RIM-based components of the CIS for the different business processes. Therefore, the CIS contains data of a consistent format and type. The programmers can do transactions with the RIM-based clinical repository by the sharing RIM-based components. And when developing functions of the CIS, the sharing components also can be adopted in the system. These components not only satisfy physicians- needs in using a CIS but also reduce the time of developing new components of a system. All in all, this study provides a new viewpoint that integrating the data and functions with the business processes, it is an easy and flexible approach to build a new CIS.
Keywords: HL7, Reference Information Model (RIM), web service, process management.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 188516 Mining Correlated Bicluster from Web Usage Data Using Discrete Firefly Algorithm Based Biclustering Approach
Authors: K. Thangavel, R. Rathipriya
Abstract:
For the past one decade, biclustering has become popular data mining technique not only in the field of biological data analysis but also in other applications like text mining, market data analysis with high-dimensional two-way datasets. Biclustering clusters both rows and columns of a dataset simultaneously, as opposed to traditional clustering which clusters either rows or columns of a dataset. It retrieves subgroups of objects that are similar in one subgroup of variables and different in the remaining variables. Firefly Algorithm (FA) is a recently-proposed metaheuristic inspired by the collective behavior of fireflies. This paper provides a preliminary assessment of discrete version of FA (DFA) while coping with the task of mining coherent and large volume bicluster from web usage dataset. The experiments were conducted on two web usage datasets from public dataset repository whereby the performance of FA was compared with that exhibited by other population-based metaheuristic called binary Particle Swarm Optimization (PSO). The results achieved demonstrate the usefulness of DFA while tackling the biclustering problem.
Keywords: Biclustering, Binary Particle Swarm Optimization, Discrete Firefly Algorithm, Firefly Algorithm, Usage profile Web usage mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 213315 A Hybrid Feature Selection by Resampling, Chi squared and Consistency Evaluation Techniques
Authors: Amir-Massoud Bidgoli, Mehdi Naseri Parsa
Abstract:
In this paper a combined feature selection method is proposed which takes advantages of sample domain filtering, resampling and feature subset evaluation methods to reduce dimensions of huge datasets and select reliable features. This method utilizes both feature space and sample domain to improve the process of feature selection and uses a combination of Chi squared with Consistency attribute evaluation methods to seek reliable features. This method consists of two phases. The first phase filters and resamples the sample domain and the second phase adopts a hybrid procedure to find the optimal feature space by applying Chi squared, Consistency subset evaluation methods and genetic search. Experiments on various sized datasets from UCI Repository of Machine Learning databases show that the performance of five classifiers (Naïve Bayes, Logistic, Multilayer Perceptron, Best First Decision Tree and JRIP) improves simultaneously and the classification error for these classifiers decreases considerably. The experiments also show that this method outperforms other feature selection methods.Keywords: feature selection, resampling, reliable features, Consistency Subset Evaluation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 258214 Fast Adjustable Threshold for Uniform Neural Network Quantization
Authors: Alexander Goncharenko, Andrey Denisov, Sergey Alyamkin, Evgeny Terentev
Abstract:
The neural network quantization is highly desired procedure to perform before running neural networks on mobile devices. Quantization without fine-tuning leads to accuracy drop of the model, whereas commonly used training with quantization is done on the full set of the labeled data and therefore is both time- and resource-consuming. Real life applications require simplification and acceleration of quantization procedure that will maintain accuracy of full-precision neural network, especially for modern mobile neural network architectures like Mobilenet-v1, MobileNet-v2 and MNAS. Here we present a method to significantly optimize training with quantization procedure by introducing the trained scale factors for discretization thresholds that are separate for each filter. Using the proposed technique, we quantize the modern mobile architectures of neural networks with the set of train data of only ∼ 10% of the total ImageNet 2012 sample. Such reduction of train dataset size and small number of trainable parameters allow to fine-tune the network for several hours while maintaining the high accuracy of quantized model (accuracy drop was less than 0.5%). Ready-for-use models and code are available in the GitHub repository.Keywords: Distillation, machine learning, neural networks, quantization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 73213 Generic Workload Management System Using Condor-Based Pilot Factory in PanDA Framework
Authors: Po-Hsiang Chiu, Torre Wenaus
Abstract:
In the current Grid environment, efficient workload management presents a significant challenge, for which there are exorbitant de facto standards encompassing resource discovery, brokerage, and data transfer, among others. In addition, the real-time resource status, essential for an optimal resource allocation strategy, is often not readily accessible. To address these issues and provide a cleaner abstraction of the Grid with the potential of generalizing into arbitrary resource-sharing environment, this paper proposes a new Condor-based pilot mechanism applied in the PanDA architecture, PanDA-PF WMS, with the goal of providing a more generic yet efficient resource allocating strategy. In this architecture, the PanDA server primarily acts as a repository of user jobs, responding to pilot requests from distributed, remote resources. Scheduling decisions are subsequently made according to the real-time resource information reported by pilots. Pilot Factory is a Condor-inspired solution for a scalable pilot dissemination and effectively functions as a resource provisioning mechanism through which the user-job server, PanDA, reaches out to the candidate resources only on demand.Keywords: Condor, glidein, PanDA, Pilot, Pilot Factory.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 209812 Use of Gaussian-Euclidean Hybrid Function Based Artificial Immune System for Breast Cancer Diagnosis
Authors: Cuneyt Yucelbas, Seral Ozsen, Sule Yucelbas, Gulay Tezel
Abstract:
Due to the fact that there exist only a small number of complex systems in artificial immune system (AIS) that work out nonlinear problems, nonlinear AIS approaches, among the well-known solution techniques, need to be developed. Gaussian function is usually used as similarity estimation in classification problems and pattern recognition. In this study, diagnosis of breast cancer, the second type of the most widespread cancer in women, was performed with different distance calculation functions that euclidean, gaussian and gaussian-euclidean hybrid function in the clonal selection model of classical AIS on Wisconsin Breast Cancer Dataset (WBCD), which was taken from the University of California, Irvine Machine-Learning Repository. We used 3-fold cross validation method to train and test the dataset. According to the results, the maximum test classification accuracy was reported as 97.35% by using of gaussian-euclidean hybrid function for fold-3. Also, mean of test classification accuracies for all of functions were obtained as 94.78%, 94.45% and 95.31% with use of euclidean, gaussian and gaussian-euclidean, respectively. With these results, gaussian-euclidean hybrid function seems to be a potential distance calculation method, and it may be considered as an alternative distance calculation method for hard nonlinear classification problems.
Keywords: Artificial Immune System, Breast Cancer Diagnosis, Euclidean Function, Gaussian Function.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 212111 Static Analysis of Security Issues of the Python Packages Ecosystem
Authors: Adam Gorine, Faten Spondon
Abstract:
Python is considered the most popular programming language and offers its own ecosystem for archiving and maintaining open-source software packages. This system is called the Python Package Index (PyPI), the repository of this programming language. Unfortunately, one-third of these software packages have vulnerabilities that allow attackers to execute code automatically when a vulnerable or malicious package is installed. This paper contributes to large-scale empirical studies investigating security issues in the Python ecosystem by evaluating package vulnerabilities. These provide a series of implications that can help the security of software ecosystems by improving the process of discovering, fixing, and managing package vulnerabilities. The vulnerable dataset is generated using the NVD, the National Vulnerability Database, and the Snyk vulnerability dataset. In addition, we evaluated 807 vulnerability reports in the NVD and 3900 publicly known security vulnerabilities in Python Package Manager (Pip) from the Snyk database from 2002 to 2022. As a result, many Python vulnerabilities appear in high severity, followed by medium severity. The most problematic areas have been improper input validation and denial of service attacks. A hybrid scanning tool that combines the three scanners, Bandit, Snyk and Dlint, which provide a clear report of the code vulnerability, is also described.
Keywords: Python vulnerabilities, Bandit, Snyk, Dlint, Python Package Index, ecosystem, static analysis, malicious attacks.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 24010 Distances over Incomplete Diabetes and Breast Cancer Data Based on Bhattacharyya Distance
Authors: Loai AbdAllah, Mahmoud Kaiyal
Abstract:
Missing values in real-world datasets are a common problem. Many algorithms were developed to deal with this problem, most of them replace the missing values with a fixed value that was computed based on the observed values. In our work, we used a distance function based on Bhattacharyya distance to measure the distance between objects with missing values. Bhattacharyya distance, which measures the similarity of two probability distributions. The proposed distance distinguishes between known and unknown values. Where the distance between two known values is the Mahalanobis distance. When, on the other hand, one of them is missing the distance is computed based on the distribution of the known values, for the coordinate that contains the missing value. This method was integrated with Wikaya, a digital health company developing a platform that helps to improve prevention of chronic diseases such as diabetes and cancer. In order for Wikaya’s recommendation system to work distance between users need to be measured. Since there are missing values in the collected data, there is a need to develop a distance function distances between incomplete users profiles. To evaluate the accuracy of the proposed distance function in reflecting the actual similarity between different objects, when some of them contain missing values, we integrated it within the framework of k nearest neighbors (kNN) classifier, since its computation is based only on the similarity between objects. To validate this, we ran the algorithm over diabetes and breast cancer datasets, standard benchmark datasets from the UCI repository. Our experiments show that kNN classifier using our proposed distance function outperforms the kNN using other existing methods.Keywords: Missing values, distance metric, Bhattacharyya distance.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7819 A Fuzzy-Rough Feature Selection Based on Binary Shuffled Frog Leaping Algorithm
Authors: Javad Rahimipour Anaraki, Saeed Samet, Mahdi Eftekhari, Chang Wook Ahn
Abstract:
Feature selection and attribute reduction are crucial problems, and widely used techniques in the field of machine learning, data mining and pattern recognition to overcome the well-known phenomenon of the Curse of Dimensionality. This paper presents a feature selection method that efficiently carries out attribute reduction, thereby selecting the most informative features of a dataset. It consists of two components: 1) a measure for feature subset evaluation, and 2) a search strategy. For the evaluation measure, we have employed the fuzzy-rough dependency degree (FRFDD) of the lower approximation-based fuzzy-rough feature selection (L-FRFS) due to its effectiveness in feature selection. As for the search strategy, a modified version of a binary shuffled frog leaping algorithm is proposed (B-SFLA). The proposed feature selection method is obtained by hybridizing the B-SFLA with the FRDD. Nine classifiers have been employed to compare the proposed approach with several existing methods over twenty two datasets, including nine high dimensional and large ones, from the UCI repository. The experimental results demonstrate that the B-SFLA approach significantly outperforms other metaheuristic methods in terms of the number of selected features and the classification accuracy.Keywords: Binary shuffled frog leaping algorithm, feature selection, fuzzy-rough set, minimal reduct.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7318 Routing Medical Images with Tabu Search and Simulated Annealing: A Study on Quality of Service
Authors: Mejía M. Paula, Ramírez L. Leonardo, Puerta A. Gabriel
Abstract:
In telemedicine, the image repository service is important to increase the accuracy of diagnostic support of medical personnel. This study makes comparison between two routing algorithms regarding the quality of service (QoS), to be able to analyze the optimal performance at the time of loading and/or downloading of medical images. This study focused on comparing the performance of Tabu Search with other heuristic and metaheuristic algorithms that improve QoS in telemedicine services in Colombia. For this, Tabu Search and Simulated Annealing heuristic algorithms are chosen for their high usability in this type of applications; the QoS is measured taking into account the following metrics: Delay, Throughput, Jitter and Latency. In addition, routing tests were carried out on ten images in digital image and communication in medicine (DICOM) format of 40 MB. These tests were carried out for ten minutes with different traffic conditions, reaching a total of 25 tests, from a server of Universidad Militar Nueva Granada (UMNG) in Bogotá-Colombia to a remote user in Universidad de Santiago de Chile (USACH) - Chile. The results show that Tabu search presents a better QoS performance compared to Simulated Annealing, managing to optimize the routing of medical images, a basic requirement to offer diagnostic images services in telemedicine.
Keywords: Medical image, QoS, simulated annealing, Tabu search, telemedicine.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9567 Web Content Mining: A Solution to Consumer's Product Hunt
Authors: Syed Salman Ahmed, Zahid Halim, Rauf Baig, Shariq Bashir
Abstract:
With the rapid growth in business size, today's businesses orient towards electronic technologies. Amazon.com and e-bay.com are some of the major stakeholders in this regard. Unfortunately the enormous size and hugely unstructured data on the web, even for a single commodity, has become a cause of ambiguity for consumers. Extracting valuable information from such an everincreasing data is an extremely tedious task and is fast becoming critical towards the success of businesses. Web content mining can play a major role in solving these issues. It involves using efficient algorithmic techniques to search and retrieve the desired information from a seemingly impossible to search unstructured data on the Internet. Application of web content mining can be very encouraging in the areas of Customer Relations Modeling, billing records, logistics investigations, product cataloguing and quality management. In this paper we present a review of some very interesting, efficient yet implementable techniques from the field of web content mining and study their impact in the area specific to business user needs focusing both on the customer as well as the producer. The techniques we would be reviewing include, mining by developing a knowledge-base repository of the domain, iterative refinement of user queries for personalized search, using a graphbased approach for the development of a web-crawler and filtering information for personalized search using website captions. These techniques have been analyzed and compared on the basis of their execution time and relevance of the result they produced against a particular search.
Keywords: Data mining, web mining, search engines, knowledge discovery.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20526 An Extensible Software Infrastructure for Computer Aided Custom Monitoring of Patients in Smart Homes
Authors: Ritwik Dutta, Marilyn Wolf
Abstract:
This paper describes the tradeoffs and the design from scratch of a self-contained, easy-to-use health dashboard software system that provides customizable data tracking for patients in smart homes. The system is made up of different software modules and comprises a front-end and a back-end component. Built with HTML, CSS, and JavaScript, the front-end allows adding users, logging into the system, selecting metrics, and specifying health goals. The backend consists of a NoSQL Mongo database, a Python script, and a SimpleHTTPServer written in Python. The database stores user profiles and health data in JSON format. The Python script makes use of the PyMongo driver library to query the database and displays formatted data as a daily snapshot of user health metrics against target goals. Any number of standard and custom metrics can be added to the system, and corresponding health data can be fed automatically, via sensor APIs or manually, as text or picture data files. A real-time METAR request API permits correlating weather data with patient health, and an advanced query system is implemented to allow trend analysis of selected health metrics over custom time intervals. Available on the GitHub repository system, the project is free to use for academic purposes of learning and experimenting, or practical purposes by building on it.
Keywords: Flask, Java, JavaScript, health monitoring, long term care, Mongo, Python, smart home, software engineering, webserver.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21335 A Specification-Based Approach for Retrieval of Reusable Business Component for Software Reuse
Authors: Meng Fanchao, Zhan Dechen, Xu Xiaofei
Abstract:
Software reuse can be considered as the most realistic and promising way to improve software engineering productivity and quality. Automated assistance for software reuse involves the representation, classification, retrieval and adaptation of components. The representation and retrieval of components are important to software reuse in Component-Based on Software Development (CBSD). However, current industrial component models mainly focus on the implement techniques and ignore the semantic information about component, so it is difficult to retrieve the components that satisfy user-s requirements. This paper presents a method of business component retrieval based on specification matching to solve the software reuse of enterprise information system. First, a business component model oriented reuse is proposed. In our model, the business data type is represented as sign data type based on XML, which can express the variable business data type that can describe the variety of business operations. Based on this model, we propose specification match relationships in two levels: business operation level and business component level. In business operation level, we use input business data types, output business data types and the taxonomy of business operations evaluate the similarity between business operations. In the business component level, we propose five specification matches between business components. To retrieval reusable business components, we propose the measure of similarity degrees to calculate the similarities between business components. Finally, a business component retrieval command like SQL is proposed to help user to retrieve approximate business components from component repository.Keywords: Business component, business operation, business data type, specification matching.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14084 Development of an Ensemble Classification Model Based on Hybrid Filter-Wrapper Feature Selection for Email Phishing Detection
Authors: R. B. Ibrahim, M. S. Argungu, I. M. Mungadi
Abstract:
It is obvious in this present time, internet has become an indispensable part of human life since its inception. The Internet has provided diverse opportunities to make life so easy for human beings, through the adoption of various channels. Among these channels are email, internet banking, video conferencing, and the like. Email is one of the easiest means of communication hugely accepted among individuals and organizations globally. But over decades the security integrity of this platform has been challenged with malicious activities like Phishing. Email phishing is designed by phishers to fool the recipient into handing over sensitive personal information such as passwords, credit card numbers, account credentials, social security numbers, etc. This activity has caused a lot of financial damage to email users globally which has resulted in bankruptcy, sudden death of victims, and other health-related sicknesses. Although many methods have been proposed to detect email phishing, in this research, the results of multiple machine-learning methods for predicting email phishing have been compared with the use of filter-wrapper feature selection. It is worth noting that all three models performed substantially but one outperformed the other. The dataset used for these models is obtained from Kaggle online data repository, while three classifiers: decision tree, Naïve Bayes, and Logistic regression are ensemble (Bagging) respectively. Results from the study show that the Decision Tree (CART) bagging ensemble recorded the highest accuracy of 98.13% using PEF (Phishing Essential Features). This result further demonstrates the dependability of the proposed model.
Keywords: Ensemble, hybrid, filter-wrapper, phishing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1783 Designing for Inclusion within the Learning Management System: Social Justice, Identities, and Online Design for Digital Spaces in Higher Education
Authors: Christina Van Wingerden
Abstract:
The aim of this paper is to propose pedagogical design for learning management systems (LMS) that offers greater inclusion for students based on a number of theoretical perspectives and delineated through an example. Considering the impact of COVID-19, including on student mental health, the research suggesting the importance of student sense of belonging on retention, success, and student well-being, the author describes intentional LMS design incorporating theoretically based practices informed by critical theory, feminist theory, indigenous theory and practices, and new materiality. This article considers important aspects of these theories and practices which attend to inclusion, identities, and socially just learning environments. Additionally, increasing student sense of belonging and mental health through LMS design influenced by adult learning theory and the community of inquiry model are described. The process of thinking through LMS pedagogical design with inclusion intentionally in mind affords the opportunity to allow LMS to go beyond course use as a repository of documents, to an intentional community of practice that facilitates belonging and connection, something much needed in our times. In virtual learning environments it has been harder to discern how students are doing, especially in feeling connected to their courses, their faculty, and their student peers. Increasingly at the forefront of public universities is addressing the needs of students with multiple and intersecting identities and the multiplicity of needs and accommodations. Education in 2020, and moving forward, calls for embedding critical theories and inclusive ideals and pedagogies to the ways instructors design and teach in online platforms. Through utilization of critical theoretical frameworks and instructional practices, students may experience the LMS as a welcoming place with intentional plans for welcoming diversity in identities.
Keywords: Belonging, critical pedagogy, instructional design, Learning Management System, LMS.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 834