Search results for: clustered data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24192

Search results for: clustered data

24192 Design and Analysis of a Clustered Nozzle Configuration and Comparison of Its Thrust

Authors: Abdul Hadi Butt, Asfandyar Arshad

Abstract:

The purpose of this paper is to study the thrust variation in different configurations of clustered nozzles. It involves the design and analysis of clustered configuration of nozzles using Ansys fluent. Clustered nozzles with different configurations are simulated and compared on basis of effective exhaust thrust. Mixing length for the flow interaction is also calculated. Further clustered configurations are analyzed over different altitudes. An optimum value of the thrust among different configurations is proposed at the end of comparisons.

Keywords: CD nozzle, cluster, thrust, fluent, ANSYS

Procedia PDF Downloads 372
24191 An Approach for Estimation in Hierarchical Clustered Data Applicable to Rare Diseases

Authors: Daniel C. Bonzo

Abstract:

Practical considerations lead to the use of unit of analysis within subjects, e.g., bleeding episodes or treatment-related adverse events, in rare disease settings. This is coupled with data augmentation techniques such as extrapolation to enlarge the subject base. In general, one can think about extrapolation of data as extending information and conclusions from one estimand to another estimand. This approach induces hierarchichal clustered data with varying cluster sizes. Extrapolation of clinical trial data is being accepted increasingly by regulatory agencies as a means of generating data in diverse situations during drug development process. Under certain circumstances, data can be extrapolated to a different population, a different but related indication, and different but similar product. We consider here the problem of estimation (point and interval) using a mixed-models approach under an extrapolation. It is proposed that estimators (point and interval) be constructed using weighting schemes for the clusters, e.g., equally weighted and with weights proportional to cluster size. Simulated data generated under varying scenarios are then used to evaluate the performance of this approach. In conclusion, the evaluation result showed that the approach is a useful means for improving statistical inference in rare disease settings and thus aids not only signal detection but risk-benefit evaluation as well.

Keywords: clustered data, estimand, extrapolation, mixed model

Procedia PDF Downloads 104
24190 Developing a Clustered-Based Model and Strategy for Waterfront Urban Tourism in Manado, Indonesia

Authors: Bet El Silisna Lagarense, Agustinus Walansendow

Abstract:

Manado Waterfront Development (MWD) occurs along the coastline of the city to meet the communities’ various needs and interests. Manado waterfront, with its various kinds of tourist attractions, is being developed to strengthen opportunities for both tourism and other businesses. There are many buildings that are used for trade and business purposes. The spatial distributions of tourism, commercial and residential land uses overlap. Field research at the study site consisted desktop scan, questionnaire-based survey, observation and in-depth interview with key informants and Focus Group Discussion (FGD) identified how MWD was initially planned and designed in the whole process of decision making in terms of resource and environmental management particularly for the waterfront tourism development in the long run. The study developed a clustered-based model for waterfront urban tourism in Manado through evaluation of spatial distribution of tourism uses along the waterfront.

Keywords: clustered-based model, Manado, urban tourism, waterfront

Procedia PDF Downloads 269
24189 Improving Similarity Search Using Clustered Data

Authors: Deokho Kim, Wonwoo Lee, Jaewoong Lee, Teresa Ng, Gun-Ill Lee, Jiwon Jeong

Abstract:

This paper presents a method for improving object search accuracy using a deep learning model. A major limitation to provide accurate similarity with deep learning is the requirement of huge amount of data for training pairwise similarity scores (metrics), which is impractical to collect. Thus, similarity scores are usually trained with a relatively small dataset, which comes from a different domain, causing limited accuracy on measuring similarity. For this reason, this paper proposes a deep learning model that can be trained with a significantly small amount of data, a clustered data which of each cluster contains a set of visually similar images. In order to measure similarity distance with the proposed method, visual features of two images are extracted from intermediate layers of a convolutional neural network with various pooling methods, and the network is trained with pairwise similarity scores which is defined zero for images in identical cluster. The proposed method outperforms the state-of-the-art object similarity scoring techniques on evaluation for finding exact items. The proposed method achieves 86.5% of accuracy compared to the accuracy of the state-of-the-art technique, which is 59.9%. That is, an exact item can be found among four retrieved images with an accuracy of 86.5%, and the rest can possibly be similar products more than the accuracy. Therefore, the proposed method can greatly reduce the amount of training data with an order of magnitude as well as providing a reliable similarity metric.

Keywords: visual search, deep learning, convolutional neural network, machine learning

Procedia PDF Downloads 184
24188 Starlink Satellite Collision Probability Simulation Based on Simplified Geometry Model

Authors: Toby Li, Julian Zhu

Abstract:

In this paper, a model based on a simplified geometry is introduced to give a very conservative collision probability prediction for the Starlink satellite in its most densely clustered region. Under the model in this paper, the probability of collision for Starlink satellite where it clustered most densely is found to be 8.484 ∗ 10^−4. It is found that the predicted collision probability increased nonlinearly with the increased safety distance set. This simple model provides evidence that the continuous development of maneuver avoidance systems is necessary for the future of the orbital safety of satellites under the harsher Lower Earth Orbit environment.

Keywords: Starlink, collision probability, debris, geometry model

Procedia PDF Downloads 45
24187 Flowing Online Vehicle GPS Data Clustering Using a New Parallel K-Means Algorithm

Authors: Orhun Vural, Oguz Bayat, Rustu Akay, Osman N. Ucan

Abstract:

This study presents a new parallel approach clustering of GPS data. Evaluation has been made by comparing execution time of various clustering algorithms on GPS data. This paper aims to propose a parallel based on neighborhood K-means algorithm to make it faster. The proposed parallelization approach assumes that each GPS data represents a vehicle and to communicate between vehicles close to each other after vehicles are clustered. This parallelization approach has been examined on different sized continuously changing GPS data and compared with serial K-means algorithm and other serial clustering algorithms. The results demonstrated that proposed parallel K-means algorithm has been shown to work much faster than other clustering algorithms.

Keywords: parallel k-means algorithm, parallel clustering, clustering algorithms, clustering on flowing data

Procedia PDF Downloads 186
24186 Effectiveness of ISSR Technique in Revealing Genetic Diversity of Phaseolus vulgaris L. Representing Various Parts of the World

Authors: Mohamed El-Shikh

Abstract:

Phaseolus vulgaris L. is the world’s second most important bean after soybeans; used for human food and animal feed. It has generally been linked to reduced risk of cardiovascular disease, diabetes mellitus, obesity, cancer and diseases of digestive tract. The effectiveness of ISSR in achievement of the genetic diversity among 60 common bean accessions; represent various germplasms around the world was investigated. In general, the studied Phaseolus vulgaris accessions were divided into 2 major groups. All of the South-American accessions were separated into the second major group. These accessions may have different genetic features that are distinct from the rest of the accessions clustered in the major group. Asia and Europe accessions (1-20) seem to be more genetically similar (99%) to each other as they clustered in the same sub-group. The American and African varieties showed similarities as well and clustered in the same sub-tree group. In contrast, Asian and American accessions No. 22 and 23 showed a high level of genetic similarities, although these were isolated from different regions. The phylogenetic tree showed that all the Asian accessions (along with Australian No. 59 and 60) were similar except Indian and Yemen accessions No. 9 and 20. Only Netherlands accession No. 3 was different from the rest of European accessions. Morocco accession No. 52 was genetically different from the rest of the African accessions. Canadian accession No. 44 seems to be different from the other North American accessions including Guatemala, Mexico and USA.

Keywords: phylogenetic tree, Phaseolus vulgaris, ISSR technique, genetics

Procedia PDF Downloads 379
24185 Microsatellite-Based Genetic Variations and Relationships among Some Farmed Nile Tilapia Populations in Ghana: Implications for Nile Tilapia Culture

Authors: Acheampong Addo, Emmanuel Odartei Armah, Seth Koranteng Agyakwah, Ruby Asmah, Emmanuel Tetteh-Doku Mensah, Rhoda Lims Diyie, Sena Amewu, Catherine Ragasa, Edward Kofi Abban, Mike Yaw Osei-Atweneboana

Abstract:

The study investigated genetic variation and relationships among populations of Nile tilapia cultured in small-scale fish farms in selected regions of Ghana. A total of 700 samples were collected. All samples were screened with five microsatellite markers and results were analyzed using (Genetic Analysis in Excel), (Molecular and Evolutionary Genetic Analysis software, and Genpop on the web for Heterozygosity and Shannon diversity, (Analysis of Molecular Variance), and (Principal Coordinate Analysis). Fish from the 16 populations (made up of 14 farms and 2 selectively bred populations) clustered into three groups: 7 populations clustered with the GIFT-derived strain, 4 populations clustered with the Akosombo strain, and three populations were in a separate cluster. The clustering pattern indicated groups of different strains of Nile tilapia cultured. Mantel correlation test also showed low genetic variations among the 16 populations hence the need to boost seed quality in order to accelerate aquaculture production in Ghana.

Keywords: microsatellites, small- scale, Nile tilapia, akosombo strain, GIFT strain

Procedia PDF Downloads 121
24184 Sensor Data Analysis for a Large Mining Major

Authors: Sudipto Shanker Dasgupta

Abstract:

One of the largest mining companies wanted to look at health analytics for their driverless trucks. These trucks were the key to their supply chain logistics. The automated trucks had multi-level sub-assemblies which would send out sensor information. The use case that was worked on was to capture the sensor signal from the truck subcomponents and analyze the health of the trucks from repair and replacement purview. Open source software was used to stream the data into a clustered Hadoop setup in Amazon Web Services cloud and Apache Spark SQL was used to analyze the data. All of this was achieved through a 10 node amazon 32 core, 64 GB RAM setup real-time analytics was achieved on ‘300 million records’. To check the scalability of the system, the cluster was increased to 100 node setup. This talk will highlight how Open Source software was used to achieve the above use case and the insights on the high data throughput on a cloud set up.

Keywords: streaming analytics, data science, big data, Hadoop, high throughput, sensor data

Procedia PDF Downloads 376
24183 Customer Churn Analysis in Telecommunication Industry Using Data Mining Approach

Authors: Burcu Oralhan, Zeki Oralhan, Nilsun Sariyer, Kumru Uyar

Abstract:

Data mining has been becoming more and more important and a wide range of applications in recent years. Data mining is the process of find hidden and unknown patterns in big data. One of the applied fields of data mining is Customer Relationship Management. Understanding the relationships between products and customers is crucial for every business. Customer Relationship Management is an approach to focus on customer relationship development, retention and increase on customer satisfaction. In this study, we made an application of a data mining methods in telecommunication customer relationship management side. This study aims to determine the customers profile who likely to leave the system, develop marketing strategies, and customized campaigns for customers. Data are clustered by applying classification techniques for used to determine the churners. As a result of this study, we will obtain knowledge from international telecommunication industry. We will contribute to the understanding and development of this subject in Customer Relationship Management.

Keywords: customer churn analysis, customer relationship management, data mining, telecommunication industry

Procedia PDF Downloads 281
24182 Uncertainty Quantification of Corrosion Anomaly Length of Oil and Gas Steel Pipelines Based on Inline Inspection and Field Data

Authors: Tammeen Siraj, Wenxing Zhou, Terry Huang, Mohammad Al-Amin

Abstract:

The high resolution inline inspection (ILI) tool is used extensively in the pipeline industry to identify, locate, and measure metal-loss corrosion anomalies on buried oil and gas steel pipelines. Corrosion anomalies may occur singly (i.e. individual anomalies) or as clusters (i.e. a colony of corrosion anomalies). Although the ILI technology has advanced immensely, there are measurement errors associated with the sizes of corrosion anomalies reported by ILI tools due limitations of the tools and associated sizing algorithms, and detection threshold of the tools (i.e. the minimum detectable feature dimension). Quantifying the measurement error in the ILI data is crucial for corrosion management and developing maintenance strategies that satisfy the safety and economic constraints. Studies on the measurement error associated with the length of the corrosion anomalies (in the longitudinal direction of the pipeline) has been scarcely reported in the literature and will be investigated in the present study. Limitations in the ILI tool and clustering process can sometimes cause clustering error, which is defined as the error introduced during the clustering process by including or excluding a single or group of anomalies in or from a cluster. Clustering error has been found to be one of the biggest contributory factors for relatively high uncertainties associated with ILI reported anomaly length. As such, this study focuses on developing a consistent and comprehensive framework to quantify the measurement errors in the ILI-reported anomaly length by comparing the ILI data and corresponding field measurements for individual and clustered corrosion anomalies. The analysis carried out in this study is based on the ILI and field measurement data for a set of anomalies collected from two segments of a buried natural gas pipeline currently in service in Alberta, Canada. Data analyses showed that the measurement error associated with the ILI-reported length of the anomalies without clustering error, denoted as Type I anomalies is markedly less than that for anomalies with clustering error, denoted as Type II anomalies. A methodology employing data mining techniques is further proposed to classify the Type I and Type II anomalies based on the ILI-reported corrosion anomaly information.

Keywords: clustered corrosion anomaly, corrosion anomaly assessment, corrosion anomaly length, individual corrosion anomaly, metal-loss corrosion, oil and gas steel pipeline

Procedia PDF Downloads 279
24181 Estimation of Rare and Clustered Population Mean Using Two Auxiliary Variables in Adaptive Cluster Sampling

Authors: Muhammad Nouman Qureshi, Muhammad Hanif

Abstract:

Adaptive cluster sampling (ACS) is specifically developed for the estimation of highly clumped populations and applied to a wide range of situations like animals of rare and endangered species, uneven minerals, HIV patients and drug users. In this paper, we proposed a generalized semi-exponential estimator with two auxiliary variables under the framework of ACS design. The expressions of approximate bias and mean square error (MSE) of the proposed estimator are derived. Theoretical comparisons of the proposed estimator have been made with existing estimators. A numerical study is conducted on real and artificial populations to demonstrate and compare the efficiencies of the proposed estimator. The results indicate that the proposed generalized semi-exponential estimator performed considerably better than all the adaptive and non-adaptive estimators considered in this paper.

Keywords: auxiliary information, adaptive cluster sampling, clustered populations, Hansen-Hurwitz estimation

Procedia PDF Downloads 199
24180 Data Clustering in Wireless Sensor Network Implemented on Self-Organization Feature Map (SOFM) Neural Network

Authors: Krishan Kumar, Mohit Mittal, Pramod Kumar

Abstract:

Wireless sensor network is one of the most promising communication networks for monitoring remote environmental areas. In this network, all the sensor nodes are communicated with each other via radio signals. The sensor nodes have capability of sensing, data storage and processing. The sensor nodes collect the information through neighboring nodes to particular node. The data collection and processing is done by data aggregation techniques. For the data aggregation in sensor network, clustering technique is implemented in the sensor network by implementing self-organizing feature map (SOFM) neural network. Some of the sensor nodes are selected as cluster head nodes. The information aggregated to cluster head nodes from non-cluster head nodes and then this information is transferred to base station (or sink nodes). The aim of this paper is to manage the huge amount of data with the help of SOM neural network. Clustered data is selected to transfer to base station instead of whole information aggregated at cluster head nodes. This reduces the battery consumption over the huge data management. The network lifetime is enhanced at a greater extent.

Keywords: artificial neural network, data clustering, self organization feature map, wireless sensor network

Procedia PDF Downloads 483
24179 The Load Balancing Algorithm for the Star Interconnection Network

Authors: Ahmad M. Awwad, Jehad Al-Sadi

Abstract:

The star network is one of the promising interconnection networks for future high speed parallel computers, it is expected to be one of the future-generation networks. The star network is both edge and vertex symmetry, it was shown to have many gorgeous topological proprieties also it is owns hierarchical structure framework. Although much of the research work has been done on this promising network in literature, it still suffers from having enough algorithms for load balancing problem. In this paper we try to work on this issue by investigating and proposing an efficient algorithm for load balancing problem for the star network. The proposed algorithm is called Star Clustered Dimension Exchange Method SCDEM to be implemented on the star network. The proposed algorithm is based on the Clustered Dimension Exchange Method (CDEM). The SCDEM algorithm is shown to be efficient in redistributing the load balancing as evenly as possible among all nodes of different factor networks.

Keywords: load balancing, star network, interconnection networks, algorithm

Procedia PDF Downloads 286
24178 A Geographical Information System Supported Method for Determining Urban Transformation Areas in the Scope of Disaster Risks in Kocaeli

Authors: Tayfun Salihoğlu

Abstract:

Following the Law No: 6306 on Transformation of Disaster Risk Areas, urban transformation in Turkey found its legal basis. In the best practices all over the World, the urban transformation was shaped as part of comprehensive social programs through the discourses of renewing the economic, social and physical degraded parts of the city, producing spaces resistant to earthquakes and other possible disasters and creating a livable environment. In Turkish practice, a contradictory process is observed. In this study, it is aimed to develop a method for better understanding of the urban space in terms of disaster risks in order to constitute a basis for decisions in Kocaeli Urban Transformation Master Plan, which is being prepared by Kocaeli Metropolitan Municipality. The spatial unit used in the study is the 50x50 meter grids. In order to reflect the multidimensionality of urban transformation, three basic components that have spatial data in Kocaeli were identified. These components were named as 'Problems in Built-up Areas', 'Disaster Risks arising from Geological Conditions of the Ground and Problems of Buildings', and 'Inadequacy of Urban Services'. Each component was weighted and scored for each grid. In order to delimitate urban transformation zones Optimized Outlier Analysis (Local Moran I) in the ArcGIS 10.6.1 was conducted to test the type of distribution (clustered or scattered) and its significance on the grids by assuming the weighted total score of the grid as Input Features. As a result of this analysis, it was found that the weighted total scores were not significantly clustering at all grids in urban space. The grids which the input feature is clustered significantly were exported as the new database to use in further mappings. Total Score Map reflects the significant clusters in terms of weighted total scores of 'Problems in Built-up Areas', 'Disaster Risks arising from Geological Conditions of the Ground and Problems of Buildings' and 'Inadequacy of Urban Services'. Resulting grids with the highest scores are the most likely candidates for urban transformation in this citywide study. To categorize urban space in terms of urban transformation, Grouping Analysis in ArcGIS 10.6.1 was conducted to data that includes each component scores in significantly clustered grids. Due to Pseudo Statistics and Box Plots, 6 groups with the highest F stats were extracted. As a result of the mapping of the groups, it can be said that 6 groups can be interpreted in a more meaningful manner in relation to the urban space. The method presented in this study can be magnified due to the availability of more spatial data. By integrating with other data to be obtained during the planning process, this method can contribute to the continuation of research and decision-making processes of urban transformation master plans on a more consistent basis.

Keywords: urban transformation, GIS, disaster risk assessment, Kocaeli

Procedia PDF Downloads 90
24177 A Critical Look on Clustered Regularly Interspaced Short Palindromic Repeats Method Based on Different Mechanisms

Authors: R. Sulakshana, R. Lakshmi

Abstract:

Clustered Regularly Interspaced Short Palindromic Repeats, CRISPR associate (CRISPR/Cas) is an adaptive immunity system found in bacteria and archaea. It has been modified to serve as a potent gene editing tool. Moreover, it has found widespread use in the field of genome research because of its accessibility and low cost. Several bioinformatics methods have been created to aid in the construction of specific single guide RNA (sgRNA), which is highly active and crucial to CRISPR/Cas performance. Various Cas proteins, including Cas1, Cas2, Cas9, and Cas12, have been used to create genome engineering tools because of their programmable sequence specificity. Class 1 and 2 CRISPR/Cas systems, as well as the processes of all known Cas proteins (including Cas9 and Cas12), are discussed in this review paper. In addition, the various CRISPR methodologies and their tools so far discovered are discussed. Finally, the challenges and issues in the CRISPR system along with future works, are presented.

Keywords: gene editing tool, Cas proteins, CRISPR, guideRNA, programmable sequence

Procedia PDF Downloads 63
24176 Clustering-Based Computational Workload Minimization in Ontology Matching

Authors: Mansir Abubakar, Hazlina Hamdan, Norwati Mustapha, Teh Noranis Mohd Aris

Abstract:

In order to build a matching pattern for each class correspondences of ontology, it is required to specify a set of attribute correspondences across two corresponding classes by clustering. Clustering reduces the size of potential attribute correspondences considered in the matching activity, which will significantly reduce the computation workload; otherwise, all attributes of a class should be compared with all attributes of the corresponding class. Most existing ontology matching approaches lack scalable attributes discovery methods, such as cluster-based attribute searching. This problem makes ontology matching activity computationally expensive. It is therefore vital in ontology matching to design a scalable element or attribute correspondence discovery method that would reduce the size of potential elements correspondences during mapping thereby reduce the computational workload in a matching process as a whole. The objective of this work is 1) to design a clustering method for discovering similar attributes correspondences and relationships between ontologies, 2) to discover element correspondences by classifying elements of each class based on element’s value features using K-medoids clustering technique. Discovering attribute correspondence is highly required for comparing instances when matching two ontologies. During the matching process, any two instances across two different data sets should be compared to their attribute values, so that they can be regarded to be the same or not. Intuitively, any two instances that come from classes across which there is a class correspondence are likely to be identical to each other. Besides, any two instances that hold more similar attribute values are more likely to be matched than the ones with less similar attribute values. Most of the time, similar attribute values exist in the two instances across which there is an attribute correspondence. This work will present how to classify attributes of each class with K-medoids clustering, then, clustered groups to be mapped by their statistical value features. We will also show how to map attributes of a clustered group to attributes of the mapped clustered group, generating a set of potential attribute correspondences that would be applied to generate a matching pattern. The K-medoids clustering phase would largely reduce the number of attribute pairs that are not corresponding for comparing instances as only the coverage probability of attributes pairs that reaches 100% and attributes above the specified threshold can be considered as potential attributes for a matching. Using clustering will reduce the size of potential elements correspondences to be considered during mapping activity, which will in turn reduce the computational workload significantly. Otherwise, all element of the class in source ontology have to be compared with all elements of the corresponding classes in target ontology. K-medoids can ably cluster attributes of each class, so that a proportion of attribute pairs that are not corresponding would not be considered when constructing the matching pattern.

Keywords: attribute correspondence, clustering, computational workload, k-medoids clustering, ontology matching

Procedia PDF Downloads 219
24175 Clustering Categorical Data Using the K-Means Algorithm and the Attribute’s Relative Frequency

Authors: Semeh Ben Salem, Sami Naouali, Moetez Sallami

Abstract:

Clustering is a well known data mining technique used in pattern recognition and information retrieval. The initial dataset to be clustered can either contain categorical or numeric data. Each type of data has its own specific clustering algorithm. In this context, two algorithms are proposed: the k-means for clustering numeric datasets and the k-modes for categorical datasets. The main encountered problem in data mining applications is clustering categorical dataset so relevant in the datasets. One main issue to achieve the clustering process on categorical values is to transform the categorical attributes into numeric measures and directly apply the k-means algorithm instead the k-modes. In this paper, it is proposed to experiment an approach based on the previous issue by transforming the categorical values into numeric ones using the relative frequency of each modality in the attributes. The proposed approach is compared with a previously method based on transforming the categorical datasets into binary values. The scalability and accuracy of the two methods are experimented. The obtained results show that our proposed method outperforms the binary method in all cases.

Keywords: clustering, unsupervised learning, pattern recognition, categorical datasets, knowledge discovery, k-means

Procedia PDF Downloads 230
24174 Exploring the Physical Environment and Building Features in Earthquake Disaster Areas

Authors: Chang Hsueh-Sheng, Chen Tzu-Ling

Abstract:

Earthquake is an unpredictable natural disaster and intensive earthquakes have caused serious impacts on social-economic system, environmental and social resilience. Conventional ways to mitigate earthquake disaster are to enhance building codes and advance structural engineering measures. However, earthquake-induced ground damage such as liquefaction, land subsidence, landslide happen on places nearby earthquake prone or poor soil condition areas. Therefore, this study uses spatial statistical analysis to explore the spatial pattern of damaged buildings. Afterwards, principle components analysis (PCA) is applied to categorize the similar features in different kinds of clustered patterns. The results show that serious landslide prone area, close to fault, vegetated ground surface and mudslide prone area are common in those highly damaged buildings. In addition, the oldest building might not be directly referred to the most vulnerable one. In fact, it seems that buildings built between 1974 and 1989 become more fragile during the earthquake. The incorporation of both spatial statistical analyses and PCA can provide more accurate information to subsidize retrofit programs to enhance earthquake resistance in particular areas.

Keywords: earthquake disaster, spatial statistic analysis, principle components analysis (pca), clustered patterns

Procedia PDF Downloads 283
24173 Clustered Regularly Interspaced Short Palindromic Repeat/cas9-Based Lateral Flow and Fluorescence Diagnostics for Rapid Pathogen Detection

Authors: Mark Osborn

Abstract:

Clustered, regularly interspaced short palindromic repeat (CRISPR/Cas) proteins can be designed to bind specified DNA and RNA sequences and hold great promise for the accurate detection of nucleic acids for diagnostics. Commercially available reagents were integrated into a CRISPR/Cas9-based lateral flow assay that can detect severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequences with single-base specificity. This approach requires minimal equipment and represents a simplified platform for field-based deployment. A rapid, multiplex fluorescence CRISPR/Cas9 nuclease cleavage assay capable of detecting and differentiating SARS-CoV-2, influenza A and B, and respiratory syncytial virus in a single reaction was also developed. These findings provide proof of principle for CRISPR/Cas9 point-of-care diagnosis that can detect specific SARS-CoV-2 strain(s). Further, Cas9 cleavage allows for a scalable fluorescent platform for identifying respiratory viral pathogens with overlapping symptomology. Collectively, this approach is a facile platform for diagnostics with broad application to user-defined sequence interrogation and detection.

Keywords: CRISPR/Cas9, lateral flow assay, SARS-Co-V2, single-nucleotide resolution

Procedia PDF Downloads 146
24172 Parallel Genetic Algorithms Clustering for Handling Recruitment Problem

Authors: Walid Moudani, Ahmad Shahin

Abstract:

This research presents a study to handle the recruitment services system. It aims to enhance a business intelligence system by embedding data mining in its core engine and to facilitate the link between job searchers and recruiters companies. The purpose of this study is to present an intelligent management system for supporting recruitment services based on data mining methods. It consists to apply segmentation on the extracted job postings offered by the different recruiters. The details of the job postings are associated to a set of relevant features that are extracted from the web and which are based on critical criterion in order to define consistent clusters. Thereafter, we assign the job searchers to the best cluster while providing a ranking according to the job postings of the selected cluster. The performance of the proposed model used is analyzed, based on a real case study, with the clustered job postings dataset and classified job searchers dataset by using some metrics.

Keywords: job postings, job searchers, clustering, genetic algorithms, business intelligence

Procedia PDF Downloads 300
24171 Multimodal Optimization of Density-Based Clustering Using Collective Animal Behavior Algorithm

Authors: Kristian Bautista, Ruben A. Idoy

Abstract:

A bio-inspired metaheuristic algorithm inspired by the theory of collective animal behavior (CAB) was integrated to density-based clustering modeled as multimodal optimization problem. The algorithm was tested on synthetic, Iris, Glass, Pima and Thyroid data sets in order to measure its effectiveness relative to CDE-based Clustering algorithm. Upon preliminary testing, it was found out that one of the parameter settings used was ineffective in performing clustering when applied to the algorithm prompting the researcher to do an investigation. It was revealed that fine tuning distance δ3 that determines the extent to which a given data point will be clustered helped improve the quality of cluster output. Even though the modification of distance δ3 significantly improved the solution quality and cluster output of the algorithm, results suggest that there is no difference between the population mean of the solutions obtained using the original and modified parameter setting for all data sets. This implies that using either the original or modified parameter setting will not have any effect towards obtaining the best global and local animal positions. Results also suggest that CDE-based clustering algorithm is better than CAB-density clustering algorithm for all data sets. Nevertheless, CAB-density clustering algorithm is still a good clustering algorithm because it has correctly identified the number of classes of some data sets more frequently in a thirty trial run with a much smaller standard deviation, a potential in clustering high dimensional data sets. Thus, the researcher recommends further investigation in the post-processing stage of the algorithm.

Keywords: clustering, metaheuristics, collective animal behavior algorithm, density-based clustering, multimodal optimization

Procedia PDF Downloads 197
24170 Estimating the Receiver Operating Characteristic Curve from Clustered Data and Case-Control Studies

Authors: Yalda Zarnegarnia, Shari Messinger

Abstract:

Receiver operating characteristic (ROC) curves have been widely used in medical research to illustrate the performance of the biomarker in correctly distinguishing the diseased and non-diseased groups. Correlated biomarker data arises in study designs that include subjects that contain same genetic or environmental factors. The information about correlation might help to identify family members at increased risk of disease development, and may lead to initiating treatment to slow or stop the progression to disease. Approaches appropriate to a case-control design matched by family identification, must be able to accommodate both the correlation inherent in the design in correctly estimating the biomarker’s ability to differentiate between cases and controls, as well as to handle estimation from a matched case control design. This talk will review some developed methods for ROC curve estimation in settings with correlated data from case control design and will discuss the limitations of current methods for analyzing correlated familial paired data. An alternative approach using Conditional ROC curves will be demonstrated, to provide appropriate ROC curves for correlated paired data. The proposed approach will use the information about the correlation among biomarker values, producing conditional ROC curves that evaluate the ability of a biomarker to discriminate between diseased and non-diseased subjects in a familial paired design.

Keywords: biomarker, correlation, familial paired design, ROC curve

Procedia PDF Downloads 194
24169 Design and Implementation a Platform for Adaptive Online Learning Based on Fuzzy Logic

Authors: Budoor Al Abid

Abstract:

Educational systems are increasingly provided as open online services, providing guidance and support for individual learners. To adapt the learning systems, a proper evaluation must be made. This paper builds the evaluation model Fuzzy C Means Adaptive System (FCMAS) based on data mining techniques to assess the difficulty of the questions. The following steps are implemented; first using a dataset from an online international learning system called (slepemapy.cz) the dataset contains over 1300000 records with 9 features for students, questions and answers information with feedback evaluation. Next, a normalization process as preprocessing step was applied. Then FCM clustering algorithms are used to adaptive the difficulty of the questions. The result is three cluster labeled data depending on the higher Wight (easy, Intermediate, difficult). The FCM algorithm gives a label to all the questions one by one. Then Random Forest (RF) Classifier model is constructed on the clustered dataset uses 70% of the dataset for training and 30% for testing; the result of the model is a 99.9% accuracy rate. This approach improves the Adaptive E-learning system because it depends on the student behavior and gives accurate results in the evaluation process more than the evaluation system that depends on feedback only.

Keywords: machine learning, adaptive, fuzzy logic, data mining

Procedia PDF Downloads 161
24168 The Geographic Distribution of Complementary, Alternative, and Traditional Medicine in the United States in 2018

Authors: Janis E. Campbell

Abstract:

Most of what is known about complementary, alternative or traditional medicine (CATM) in the United States today is known from either the National Health Interview Survey a cross-sectional survey with a few questions or from small cross-sectional or cohort studies with specific populations. The broad geographical distribution in CATM use or providers is not known. For this project, we used geospatial cluster analysis to determine if there were clusters of CATM provider by county in the US. In this analysis, we used the National Provider Index to determine the geographic distribution of providers in the US. Of the 215,769 CAMT providers 211,603 resided in the contiguous US: Acupuncturist (26,563); Art, Poetry, Music and Dance Therapist (2,752); Chiropractor (89,514); Doula/Midwife (3,535); Exercise (507); Homeopath (380); Massage Therapist (36,540); Mechanotherapist (1,888); Naprapath (146); Naturopath (4,782); Nutrition (42,036); Reflexologist (522); Religious (2,438). ESRI® spatial autocorrelation was used to determine if the geographic location of CATM providers were random or clustered. For global analysis, we used Getis-Ord General G and for Local Indicators of Spatial Associations with an Optimized Hot Spot Analysis using an alpha of 0.05. Overall, CATM providers were clustered with both low and high. With Chiropractors, focusing in the Midwest, religious providers having very small clusters in the central US, and other types of CAMT focused in the northwest and west coast, Colorado and New Mexico, the great lakes areas and Florida. We will discuss some of the implications of this study, including associations with health, economic, social, and political systems.

Keywords: complementary medicine, alternative medicine, geospatial, United States

Procedia PDF Downloads 122
24167 Spatial Analysis of Survival Pattern and Treatment Outcomes of Multi-Drug Resistant Tuberculosis (MDR-TB) Patients in Lagos, Nigeria

Authors: Akinsola Oluwatosin, Udofia Samuel, Odofin Mayowa

Abstract:

The study is aimed at assessing the Geographic Information System (GIS)-based spatial analysis of Survival Pattern and Treatment Outcomes of Multi-Drug Resistant Tuberculosis (MDR-TB) cases for Lagos, Nigeria, with an objective to inform priority areas for public health planning and resource allocation. Multi-drug resistant tuberculosis (MDR-TB) develops due to problems such as irregular drug supply, poor drug quality, inappropriate prescription, and poor adherence to treatment. The shapefile(s) for this study were already georeferenced to Minna datum. The patient’s information was acquired on MS Excel and later converted to . CSV file for easy processing to ArcMap from various hospitals. To superimpose the patient’s information the spatial data, the addresses was geocoded to generate the longitude and latitude of the patients. The database was used for the SQL query to the various pattern of the treatment. To show the pattern of disease spread, spatial autocorrelation analysis was used. The result was displayed in a graphical format showing the areas of dispersing, random and clustered of patients in the study area. Hot and cold spot analysis was analyzed to show high-density areas. The distance between these patients and the closest health facility was examined using the buffer analysis. The result shows that 22% of the points were successfully matched, while 15% were tied. However, the result table shows that a greater percentage of it was unmatched; this is evident in the fact that most of the streets within the State are unnamed, and then again, most of the patients are likely to supply the wrong addresses. MDR-TB patients of all age groups are concentrated within Lagos-Mainland, Shomolu, Mushin, Surulere, Oshodi-Isolo, and Ifelodun LGAs. MDR-TB patients between the age group of 30-47 years had the highest number and were identified to be about 184 in number. The outcome of patients on ART treatment revealed that a high number of patients (300) were not ART treatment while a paltry 45 patients were on ART treatment. The result shows the Z-score of the distribution is greater than 1 (>2.58), which means that the distribution is highly clustered at a significance level of 0.01.

Keywords: tuberculosis, patients, treatment, GIS, MDR-TB

Procedia PDF Downloads 121
24166 Undernutrition Among Children Below Five Years of Age in Uganda: A Deep Dive into Space and Time

Authors: Vallence Ngabo Maniragaba

Abstract:

This study aimed at examining the variations of undernutrition among children below 5 years of age in Uganda. The approach of spatial and spatiotemporal analysis helped in identifying cluster patterns, hot spots and emerging hot spots. Data from the 6 Uganda Demographic and Health Surveys spanning from 1990 to 2016 were used with the main outcome variable being undernutrition among children <5 years of age. All data that were relevant to this study were retrieved from the survey datasets and combined with the 214 shape files for the districts of Uganda to enable spatial and spatiotemporal analysis. Spatial maps with the spatial distribution of the prevalence of undernutrition, both in space and time, were generated using ArcGIS Pro version 2.8. Moran’s I, an index of spatial autocorrelation, rules out doubts of spatial randomness in order to identify spatially clustered patterns of hot or cold spot areas. Furthermore, space-time cubes were generated to establish the trend in undernutrition as well as to mirror its variations over time and across Uganda. Moreover, emerging hot spot analysis was done to help identify the patterns of undernutrition over time. The results indicate a heterogeneous distribution of undernutrition across Uganda and the same variations were also evident over time. Moran’s I index confirmed spatial clustered patterns as opposed to random distributions of undernutrition prevalence. Four hot spot areas, namely; the Karamoja, the Sebei, the West Nile and the Toro regions were significantly evident, most of the central parts of Uganda were identified as cold spot clusters, while most of Western Uganda, the Acholi and the Lango regions had no statistically significant spatial patterns by the year 2016. The spatio-temporal analysis identified the Karamoja and Sebei regions as clusters of persistent, consecutive and intensifying hot spots, West Nile region was identified as a sporadic hot spot area while the Toro region was identified with both sporadic and emerging hotspots. In conclusion, undernutrition is a silent pandemic that needs to be handled with both hands. At 31.2 percent, the prevalence is still very high and unpleasant. The distribution across the country is nonuniform with some areas such as the Karamoja, the West Nile, the Sebei and the Toro regions being epicenters of undernutrition in Uganda. Over time, the same areas have experienced and exhibited high undernutrition prevalence. Policymakers, as well as the implementers, should bear in mind the spatial variations across the country and prioritize hot spot areas in order to have efficient, timely and region-specific interventions.

Keywords: undernutrition, spatial autocorrelation, hotspots analysis, geographically weighted regressions, emerging hotspots analysis, under-fives, Uganda

Procedia PDF Downloads 48
24165 Routing and Energy Efficiency through Data Coupled Clustering in Large Scale Wireless Sensor Networks (WSNs)

Authors: Jainendra Singh, Zaheeruddin

Abstract:

A typical wireless sensor networks (WSNs) consists of several tiny and low-power sensors which use radio frequency to perform distributed sensing tasks. The longevity of wireless sensor networks (WSNs) is a major issue that impacts the application of such networks. While routing protocols are striving to save energy by acting on sensor nodes, recent studies show that network lifetime can be enhanced by further involving sink mobility. A common approach for energy efficiency is partitioning the network into clusters with correlated data, where the representative nodes simply transmit or average measurements inside the cluster. In this paper, we propose an energy- efficient homogenous clustering (EHC) technique. In this technique, the decision of each sensor is based on their residual energy and an estimate of how many of its neighboring cluster heads (CHs) will benefit from it being a CH. We, also explore the routing algorithm in clustered WSNs. We show that the proposed schemes significantly outperform current approaches in terms of packet delay, hop count and energy consumption of WSNs.

Keywords: wireless sensor network, energy efficiency, clustering, routing

Procedia PDF Downloads 234
24164 The Effect of Parents and Coaches on Preschool Children's Self-Control in Preschool Centers in District 5 of Tehran

Authors: Alieh Arasteh

Abstract:

The main purpose of this study was to investigate the effect of parents and educators on the self-control of children in pre-primary schools in District 5 of Tehran. The method of this survey was a survey and post-correlation type. The statistical population of this study included all teachers and parents of children in preschool centers in the region. The 5th city of Tehran in 1397 was the number of kindergartens in 117 centers and the number of parents was 1872, the sample size of the parents was 320 and the sample size of the trainers was 76. The method of sampling in this study was randomized and clustered. The data gathering tool was Rosenbaum and Ronen (1992) self-control skills, a five-factor questionnaire NEO personality Costa and McCrae (1985) and a questionnaire on demographic characteristics, reliability using Cronbach's alpha, the data analysis was performed using the software spss24. The results of the research showed that the personality characteristics of parents, parents' socioeconomic status and personality traits of educators affect the self-control dimensions of pre-primary school children (P <0.05).

Keywords: self-control, pre-primary school, the effect of parents, couches

Procedia PDF Downloads 145
24163 Cleaning of Scientific References in Large Patent Databases Using Rule-Based Scoring and Clustering

Authors: Emiel Caron

Abstract:

Patent databases contain patent related data, organized in a relational data model, and are used to produce various patent statistics. These databases store raw data about scientific references cited by patents. For example, Patstat holds references to tens of millions of scientific journal publications and conference proceedings. These references might be used to connect patent databases with bibliographic databases, e.g. to study to the relation between science, technology, and innovation in various domains. Problematic in such studies is the low data quality of the references, i.e. they are often ambiguous, unstructured, and incomplete. Moreover, a complete bibliographic reference is stored in only one attribute. Therefore, a computerized cleaning and disambiguation method for large patent databases is developed in this work. The method uses rule-based scoring and clustering. The rules are based on bibliographic metadata, retrieved from the raw data by regular expressions, and are transparent and adaptable. The rules in combination with string similarity measures are used to detect pairs of records that are potential duplicates. Due to the scoring, different rules can be combined, to join scientific references, i.e. the rules reinforce each other. The scores are based on expert knowledge and initial method evaluation. After the scoring, pairs of scientific references that are above a certain threshold, are clustered by means of single-linkage clustering algorithm to form connected components. The method is designed to disambiguate all the scientific references in the Patstat database. The performance evaluation of the clustering method, on a large golden set with highly cited papers, shows on average a 99% precision and a 95% recall. The method is therefore accurate but careful, i.e. it weighs precision over recall. Consequently, separate clusters of high precision are sometimes formed, when there is not enough evidence for connecting scientific references, e.g. in the case of missing year and journal information for a reference. The clusters produced by the method can be used to directly link the Patstat database with bibliographic databases as the Web of Science or Scopus.

Keywords: clustering, data cleaning, data disambiguation, data mining, patent analysis, scientometrics

Procedia PDF Downloads 167