Search results for: weighted based clustering
28401 Machine Learning Prediction of Diabetes Prevalence in the U.S. Using Demographic, Physical, and Lifestyle Indicators: A Study Based on NHANES 2009-2018
Authors: Oluwafunmibi Omotayo Fasanya, Augustine Kena Adjei
Abstract:
To develop a machine learning model to predict diabetes (DM) prevalence in the U.S. population using demographic characteristics, physical indicators, and lifestyle habits, and to analyze how these factors contribute to the likelihood of diabetes. We analyzed data from 23,546 participants aged 20 and older, who were non-pregnant, from the 2009-2018 National Health and Nutrition Examination Survey (NHANES). The dataset included key demographic (age, sex, ethnicity), physical (BMI, leg length, total cholesterol [TCHOL], fasting plasma glucose), and lifestyle indicators (smoking habits). A weighted sample was used to account for NHANES survey design features such as stratification and clustering. A classification machine learning model was trained to predict diabetes status. The target variable was binary (diabetes or non-diabetes) based on fasting plasma glucose measurements. The following models were evaluated: Logistic Regression (baseline), Random Forest Classifier, Gradient Boosting Machine (GBM), Support Vector Machine (SVM). Model performance was assessed using accuracy, F1-score, AUC-ROC, and precision-recall metrics. Feature importance was analyzed using SHAP values to interpret the contributions of variables such as age, BMI, ethnicity, and smoking status. The Gradient Boosting Machine (GBM) model outperformed other classifiers with an AUC-ROC score of 0.85. Feature importance analysis revealed the following key predictors: Age: The most significant predictor, with diabetes prevalence increasing with age, peaking around the 60s for males and 70s for females. BMI: Higher BMI was strongly associated with a higher risk of diabetes. Ethnicity: Black participants had the highest predicted prevalence of diabetes (14.6%), followed by Mexican-Americans (13.5%) and Whites (10.6%). TCHOL: Diabetics had lower total cholesterol levels, particularly among White participants (mean decline of 23.6 mg/dL). 
Smoking: Smoking showed a slight increase in diabetes risk among Whites (0.2%) but had a limited effect in other ethnic groups. Using machine learning models, we identified key demographic, physical, and lifestyle predictors of diabetes in the U.S. population. The results confirm that diabetes prevalence varies significantly across age, BMI, and ethnic groups, with lifestyle factors such as smoking contributing differently by ethnicity. These findings provide a basis for more targeted public health interventions and resource allocation for diabetes management.
Keywords: diabetes, NHANES, random forest, gradient boosting machine, support vector machine
Procedia PDF Downloads 11
28400 TMBCoI-SIOT: Trust Management System Based on the Community of Interest for the Social Internet of Things
Authors: Oumaima Ben Abderrahim, Mohamed Houcine Elhedhili, Leila Saidane
Abstract:
In this paper, we propose a trust management system based on a clustering architecture for the social internet of things, called TMBCoI-SIOT. The proposed model integrates numerous factors such as direct and indirect trust, a transaction factor, a precaution factor, and social modeling of trust. The novelty of our approach can be summed up in two aspects. The first aspect concerns the architecture based on the community of interest (CoI), where each community is headed by an administrator (admin). The second aspect is the trust management system, which tries to prevent On-Off attacks and mitigates dishonest recommendations using the k-means algorithm and guarantor things. The effectiveness of the proposed system is demonstrated by simulation against malicious nodes.
Keywords: IoT, trust management system, attacks, trust, dishonest recommendations, K-means algorithm
Procedia PDF Downloads 212
28399 Fault-Detection and Self-Stabilization Protocol for Wireless Sensor Networks
Authors: Ather Saeed, Arif Khan, Jeffrey Gosper
Abstract:
Sensor devices are prone to errors and sudden node failures, which are difficult to detect in a timely manner when deployed in real-time, hazardous, large-scale harsh environments and in medical emergencies. Therefore, the loss of data can be life-threatening when the sensed phenomenon is not disseminated due to sudden node failure, battery depletion or temporary malfunctioning. We introduce a set of partial differential equations for localizing faults, similar to Green’s and Maxwell’s equations used in Electrostatics and Electromagnetism. We introduce a node organization and clustering scheme for self-stabilizing sensor networks. Green’s theorem is applied to regions where the curve is closed and continuously differentiable to ensure network connectivity. Experimental results show that the proposed GTFD (Green’s Theorem fault-detection and Self-stabilization) protocol not only detects faulty nodes but also accurately generates network stability graphs where urgent intervention is required for dynamically self-stabilizing the network.
Keywords: Green’s Theorem, self-stabilization, fault-localization, RSSI, WSN, clustering
Procedia PDF Downloads 75
28398 Design of Agricultural Machinery Factory Facility Layout
Authors: Nilda Tri Putri, Muhammad Taufik
Abstract:
Agricultural tools and machinery (Alsintan) are used in agribusiness activities. Alsintan transforms traditional farming systems, which generally rely on manual equipment, into modern mechanized agriculture. In 2012, CV Nugraha Chakti Consultant drew up an action plan for developing the Alsintan industry in West Sumatra, aiming to grow medium-sized Alsintan industries into a major industry; one of the efforts is to increase the industry's production capacity. The production capacity for the superior products, hydrotillers and threshers, is set at 2,000 units per year each. CV Citra Dragon, one of the medium-sized Alsintan industries in West Sumatra, plans to relocate its existing plant to meet consumer demand that grows each year. The increased production capacity and the relocation plan change the layout, so the plant facility layout of CV Citra Dragon needs to be designed. The first step in designing the plant layout is to design the production floor layout, which is done by applying a group technology layout. The initial step is to group machines and part families using Average Linkage Clustering (ALC) and Rank Order Clustering (ROC). Next, independent workstations and the layout are designed using the Modified Spanning Tree (MST). The best production floor layout is then selected from the ALC and ROC cell groupings. After that, the layout of the warehouses, offices, and other production support facilities is designed. The Activity Relationship Chart method is used to arrange the placement of the designed factory facilities. After the facilities are arranged, the cost of establishing the manufacturing plant is calculated. The layout type used on the production floor is a group technology layout. The production floor is composed of four machine cells, an assembly area, and a painting area. 
The total material movement distance in a single production run is 1120.16 m, which means 18.7 minutes of transportation time are needed per production run. The Alsintan factory is designed with a circular flow pattern and 11 facilities, consisting of 10 rooms and 1 parking space. The factory building measures 84 m x 52 m.
Keywords: Average Linkage Clustering (ALC), Rank Order Clustering (ROC), Modified Spanning Tree (MST), Activity Relationship Chart (ARC)
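As a rough illustration of the Rank Order Clustering step named in this abstract, the sketch below applies the textbook ROC procedure to a small 0/1 machine-part incidence matrix. The matrix, the machine and part labels, and the function name are hypothetical and are not taken from the paper; the authors' actual data and implementation are not given in the abstract.

```python
def rank_order_clustering(matrix, machines, parts, max_iter=100):
    """Textbook Rank Order Clustering on a 0/1 machine-part incidence matrix.

    Rows (machines) and columns (parts) are repeatedly sorted in descending
    order of the binary word they spell, until the ordering stops changing.
    """
    rows = list(range(len(machines)))
    cols = list(range(len(parts)))
    for _ in range(max_iter):
        # Comparing equal-length 0/1 lists lexicographically is the same as
        # comparing the binary numbers they represent.
        new_rows = sorted(rows, key=lambda r: [matrix[r][c] for c in cols], reverse=True)
        new_cols = sorted(cols, key=lambda c: [matrix[r][c] for r in new_rows], reverse=True)
        if new_rows == rows and new_cols == cols:
            break  # ordering is stable: blocks of 1s now sit on the diagonal
        rows, cols = new_rows, new_cols
    return [machines[r] for r in rows], [parts[c] for c in cols]


# Hypothetical example: two machine cells hidden in an interleaved matrix.
incidence = [
    [1, 0, 1, 0],  # M1 processes P1 and P3
    [0, 1, 0, 1],  # M2 processes P2 and P4
    [1, 0, 1, 0],  # M3 processes P1 and P3
    [0, 1, 0, 1],  # M4 processes P2 and P4
]
machine_order, part_order = rank_order_clustering(
    incidence, ["M1", "M2", "M3", "M4"], ["P1", "P2", "P3", "P4"]
)
print(machine_order, part_order)
```

After reordering, machines that share parts sit next to each other, so candidate cells can be read off as diagonal blocks.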
Procedia PDF Downloads 496
28397 A Mixing Matrix Estimation Algorithm for Speech Signals under the Under-Determined Blind Source Separation Model
Authors: Jing Wu, Wei Lv, Yibing Li, Yuanfan You
Abstract:
The separation of speech signals has become a research hotspot in the field of signal processing in recent years. It has many applications in teleconferencing, hearing aids, machine speech recognition, and so on. The sounds received are usually noisy. Identifying the sounds of interest and obtaining clear sounds in such an environment is a problem worth exploring, namely the problem of blind source separation. This paper focuses on under-determined blind source separation (UBSS). Sparse component analysis is generally used for the problem of under-determined blind source separation. The method is mainly divided into two parts. First, a clustering algorithm is used to estimate the mixing matrix from the observed signals. Then the signal is separated based on the known mixing matrix. In this paper, the problem of mixing matrix estimation is studied. This paper proposes an improved algorithm to estimate the mixing matrix for speech signals in the UBSS model. The traditional potential algorithm is not accurate for mixing matrix estimation, especially at low signal-to-noise ratio (SNR). In response to this problem, this paper considers an improved potential function method to estimate the mixing matrix. The algorithm not only avoids the influence of insufficient prior information in traditional clustering algorithms, but also improves the estimation accuracy of the mixing matrix. This paper takes the mixing of four speech signals into two channels as an example. The simulation results show that the approach in this paper not only improves the accuracy of estimation, but also applies to any mixing matrix.
Keywords: DBSCAN, potential function, speech signal, the UBSS model
Procedia PDF Downloads 135
28396 Identification of Watershed Landscape Character Types in Middle Yangtze River within Wuhan Metropolitan Area
Authors: Huijie Wang, Bin Zhang
Abstract:
In China, the middle reaches of the Yangtze River are well-developed, boasting a wealth of different types of watershed landscape. In this regard, landscape character assessment (LCA) can serve as a basis for protection, management and planning of trans-regional watershed landscape types. For this study, we chose the middle reaches of the Yangtze River in the Wuhan metropolitan area as our study site, where the water system contains a rich variety of landscape types. We analyzed trans-regional data to cluster and identify types of landscape characteristics at two levels. 55 basins were analyzed using topography, land cover and river system features as variables in order to identify the watershed landscape character types. For watershed landscape, drainage density and degree of curvature were specified as special variables to directly reflect the regional differences of river system features. Then, we used principal component analysis (PCA) and a hierarchical clustering algorithm, based on the geographic information system (GIS) and statistical products and services solution (SPSS), to obtain clusters of watershed landscape, which were divided into 8 characteristic groups. These groups highlighted watershed landscape characteristics of different river systems as well as key landscape characteristics that can serve as a basis for targeted protection of watershed landscape characteristics, thus helping to rationally develop multi-value landscape resources and promote coordinated development across regions.
Keywords: GIS, hierarchical clustering, landscape character, landscape typology, principal component analysis, watershed
Procedia PDF Downloads 231
28395 Scheduling Jobs with Stochastic Processing Times or Due Dates on a Server to Minimize the Number of Tardy Jobs
Authors: H. M. Soroush
Abstract:
The problem of scheduling products and services for on-time deliveries is of paramount importance in today’s competitive environments. It arises in many manufacturing and service organizations where it is desirable to complete jobs (products or services) with different weights (penalties) on or before their due dates. In such environments, schedulers must frequently decide whether to schedule a job based on its processing time, due date, and the penalty for tardy delivery to improve the system performance. For example, it is common to measure the weighted number of late jobs or the percentage of on-time shipments to evaluate the performance of a semiconductor production facility or an automobile assembly line. In this paper, we address the problem of scheduling a set of jobs on a server where processing times or due dates of jobs are random variables and fixed weights (penalties) are imposed on the jobs’ late deliveries. The goal is to find the schedule that minimizes the expected weighted number of tardy jobs. The problem is NP-hard to solve; however, we explore three scenarios of the problem wherein: (i) both processing times and due dates are stochastic; (ii) processing times are stochastic and due dates are deterministic; and (iii) processing times are deterministic and due dates are stochastic. We prove that special cases of these scenarios are solvable optimally in polynomial time, and introduce efficient heuristic methods for the general cases. Our computational results show that the heuristics perform well in yielding either optimal or near-optimal sequences. The results also demonstrate that the stochasticity of processing times or due dates can affect scheduling decisions. Moreover, the proposed problem is general in the sense that its special cases reduce to some new and some classical stochastic single machine models.
Keywords: number of late jobs, scheduling, single server, stochastic
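For orientation, the sketch below shows the classical Moore-Hodgson algorithm, which solves the much simpler deterministic, unit-weight single-machine case of minimizing the number of tardy jobs. It is not the paper's method: the stochastic and weighted variants studied in the abstract are not reproduced here, and the job data and function name are hypothetical.

```python
import heapq


def moore_hodgson(jobs):
    """Minimize the number of tardy jobs on one machine (deterministic, unweighted).

    jobs: list of (name, processing_time, due_date).
    Returns (on_time_sequence, tardy_jobs).
    """
    scheduled = []  # max-heap on processing time, stored as negated values
    elapsed = 0
    tardy = []
    # Consider jobs in earliest-due-date order.
    for name, p, d in sorted(jobs, key=lambda j: j[2]):
        heapq.heappush(scheduled, (-p, name, d))
        elapsed += p
        if elapsed > d:
            # The current job would be late: drop the longest job scheduled so far.
            neg_p, longest, _ = heapq.heappop(scheduled)
            elapsed += neg_p
            tardy.append(longest)
    on_time = [name for _, name, d in sorted(scheduled, key=lambda s: s[2])]
    return on_time, tardy


# Hypothetical jobs: (name, processing time, due date).
print(moore_hodgson([("A", 2, 3), ("B", 4, 5), ("C", 1, 6), ("D", 3, 7)]))
```

The on-time jobs are run in due-date order and the tardy jobs can be appended afterwards in any order.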
Procedia PDF Downloads 498
28394 Landslide Hazard Zonation Using Satellite Remote Sensing and GIS Technology
Authors: Ankit Tyagi, Reet Kamal Tiwari, Naveen James
Abstract:
Landslides are the major geo-environmental problem of the Himalaya because of its high ridges, steep slopes, deep valleys, and complex system of streams. They are mainly triggered by rainfall and earthquakes and cause severe damage to life and property. In Uttarakhand, the Tehri reservoir rim area, which is situated in the lesser Himalaya of the Garhwal hills, was selected for landslide hazard zonation (LHZ). The study utilized different types of data, including geological maps, topographic maps from the Survey of India, Landsat 8, and Cartosat DEM data. This paper presents the use of a weighted overlay method in LHZ using fourteen causative factors. The data layers generated and co-registered were slope, aspect, relative relief, soil cover, intensity of rainfall, seismic ground shaking, seismic amplification at surface level, lithology, land use/land cover (LULC), normalized difference vegetation index (NDVI), topographic wetness index (TWI), stream power index (SPI), drainage buffer and reservoir buffer. Seismic analysis is performed using peak horizontal acceleration (PHA) intensity and amplification factors in the evaluation of the landslide hazard index (LHI). Several digital image processing techniques such as topographic correction, NDVI, and supervised classification were widely used in the process of terrain factor extraction. Lithological features, LULC, drainage pattern, lineaments, and structural features are extracted using digital image processing techniques. Colour, tones, topography, and stream drainage pattern from the imagery are used to analyse geological features. The slope map, aspect map, and relative relief are created using Cartosat DEM data. DEM data is also used for the detailed drainage analysis, which includes TWI, SPI, drainage buffer, and reservoir buffer. In the weighted overlay method, the comparative importance of the causative factors is obtained from experience. 
In this method, the influence factor is multiplied by the corresponding rating of a particular class, the result is reclassified, and the LHZ map is prepared. Further, based on the land-use map developed from remote sensing images, a landslide vulnerability study for the study area is carried out and presented in this paper.
Keywords: weighted overlay method, GIS, landslide hazard zonation, remote sensing
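The weighted overlay step described in this abstract can be sketched for a single map cell as follows. This is a minimal illustration under assumed values: the factor names are a subset of those listed, while the weights, ratings, class thresholds, and class labels are hypothetical and do not come from the paper.

```python
from bisect import bisect_right


def landslide_hazard_index(ratings, weights):
    """Weighted overlay for one cell: sum of (influence weight x class rating)."""
    return sum(weights[factor] * ratings[factor] for factor in weights)


def reclassify(lhi, thresholds, labels):
    """Map a hazard index onto a zone label using ascending class breaks."""
    return labels[bisect_right(thresholds, lhi)]


# Hypothetical influence weights (summing to 1) and class ratings for one cell.
weights = {"slope": 0.5, "rainfall": 0.3, "lithology": 0.2}
ratings = {"slope": 4, "rainfall": 2, "lithology": 3}

lhi = landslide_hazard_index(ratings, weights)
zone = reclassify(lhi, thresholds=[2.0, 3.0, 4.0],
                  labels=["low", "moderate", "high", "very high"])
print(round(lhi, 2), zone)
```

In a GIS workflow the same arithmetic would be applied to every raster cell, with one rating layer per causative factor.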
Procedia PDF Downloads 133
28393 GBKMeans: A Genetic Based K-Means Applied to the Capacitated Planning of Reading Units
Authors: Anderson S. Fonseca, Italo F. S. Da Silva, Robert D. A. Santos, Mayara G. Da Silva, Pedro H. C. Vieira, Antonio M. S. Sobrinho, Victor H. B. Lemos, Petterson S. Diniz, Anselmo C. Paiva, Eliana M. G. Monteiro
Abstract:
In Brazil, the National Electric Energy Agency (ANEEL) establishes that electrical energy companies are responsible for measuring and billing their customers. Among these regulations, it is defined that a company must bill its customers within 27-33 days. If a relocation or a change of period is required, the consumer must be notified in writing, in advance of a billing period. To make it easier to organize a workday’s measurements, these companies create a reading plan. These plans consist of grouping customers into reading groups, which are visited by an employee responsible for measuring consumption and billing. Creating a plan efficiently and optimally is a capacitated clustering problem with constraints related to homogeneity and compactness, that is, the employee’s workload and the geographical position of the consuming unit. This process is done manually by several experts who have experience with the geography of the region; it takes many days to complete the final plan, and because it is a human activity, there is no guarantee of finding the best plan. This paper presents GBKMeans, a technique based on K-Means and genetic algorithms for creating capacitated clusters that respect the established constraints in an efficient and balanced manner and that minimize the cost of relocating consumer units and the time required to create the final plan. The results obtained by the presented method are compared with the current plan of a real city, showing an improvement of 54.71% in the standard deviation of workload and 11.97% in the compactness of the groups.
Keywords: capacitated clustering, k-means, genetic algorithm, districting problems
Procedia PDF Downloads 198
28392 Static vs. Stream Mining Trajectories Similarity Measures
Authors: Musaab Riyadh, Norwati Mustapha, Dina Riyadh
Abstract:
Trajectory similarity can be defined as the cost of transforming one trajectory into another based on a certain similarity method. It is the core of numerous mining tasks such as clustering, classification, and indexing. Various approaches have been suggested to measure similarity based on the geometric and dynamic properties of trajectories, the overlap between trajectory segments, and the confined area between entire trajectories. In this article, these approaches are evaluated based on computational cost, memory usage, accuracy, and the amount of data needed in advance, to determine their suitability for stream mining applications. The evaluation results show that, owing to the high-speed generation of data, stream mining applications favor similarity methods that have low computational cost and memory usage, require a single scan of the data, and are free of mathematical complexity.
Keywords: global distance measure, local distance measure, semantic trajectory, spatial dimension, stream data mining
Procedia PDF Downloads 396
28391 Genomic Prediction Reliability Using Haplotypes Defined by Different Methods
Authors: Sohyoung Won, Heebal Kim, Dajeong Lim
Abstract:
Genomic prediction is an effective way to measure the breeding abilities of livestock based on genomic estimated breeding values, which are statistically predicted from genotype data using best linear unbiased prediction (BLUP). Using haplotypes, clusters of linked single nucleotide polymorphisms (SNPs), as markers instead of individual SNPs can improve the reliability of genomic prediction, since the probability of a quantitative trait locus being in strong linkage disequilibrium (LD) with markers is higher. To use haplotypes efficiently in genomic prediction, optimal ways to define haplotypes need to be found. In this study, 770K SNP chip data was collected from a Hanwoo (Korean cattle) population consisting of 2506 cattle. Haplotypes were defined in three different ways using the 770K SNP chip data: based on 1) length of haplotypes (bp), 2) the number of SNPs, and 3) k-medoids clustering by LD. To compare the methods in parallel, haplotypes defined by all methods were set to have comparable sizes; in each method, haplotypes defined to have an average of 5, 10, 20, or 50 SNPs were tested. A modified GBLUP method using haplotype alleles as predictor variables was implemented to test the prediction reliability of each haplotype set. The conventional genomic BLUP (GBLUP) method, which uses individual SNPs, was also tested as a baseline for evaluating the performance of the haplotype sets in genomic prediction. Carcass weight was used as the phenotype for testing. Haplotypes defined by all three methods showed increased reliability compared to conventional GBLUP. There were few differences in reliability between the haplotype-defining methods. The reliability of genomic prediction was highest when the average number of SNPs per haplotype was 20 in all three methods, implying that haplotypes including around 20 SNPs can be optimal as markers for genomic prediction. 
When the numbers of alleles generated by the haplotype-defining methods were compared, clustering by LD generated the fewest alleles. Using haplotype alleles for genomic prediction showed better performance, suggesting improved accuracy in genomic selection. The number of predictor variables decreased when the LD-based method was used, while all three haplotype-defining methods showed similar performance. This suggests that defining haplotypes based on LD can reduce computational costs and allow efficient prediction. Finding optimal ways to define haplotypes and using the haplotype alleles as markers can provide improved performance and efficiency in genomic prediction.
Keywords: best linear unbiased predictor, genomic prediction, haplotype, linkage disequilibrium
Procedia PDF Downloads 141
28390 Towards a Distributed Computation Platform Tailored for Educational Process Discovery and Analysis
Authors: Awatef Hicheur Cairns, Billel Gueni, Hind Hafdi, Christian Joubert, Nasser Khelifa
Abstract:
Given the ever-changing needs of the job markets, education and training centers are increasingly held accountable for student success. Therefore, education and training centers have to focus on ways to streamline their offers and educational processes in order to achieve the highest level of quality in curriculum contents and managerial decisions. Educational process mining is an emerging field in the educational data mining (EDM) discipline, concerned with developing methods to discover, analyze and provide a visual representation of complete educational processes. In this paper, we present our distributed computation platform, which allows different education centers and institutions to load their data and access advanced data mining and process mining services. To achieve this, we also present a comparative study of the different clustering techniques developed in the context of process mining to partition educational traces efficiently. Our goal is to find the best strategy for distributing heavy analysis computations across the many processing nodes of our platform.
Keywords: educational process mining, distributed process mining, clustering, distributed platform, educational data mining, ProM
Procedia PDF Downloads 454
28389 Connecting MRI Physics to Glioma Microenvironment: Comparing Simulated T2-Weighted MRI Models of Fixed and Expanding Extracellular Space
Authors: Pamela R. Jackson, Andrea Hawkins-Daarud, Cassandra R. Rickertsen, Kamala Clark-Swanson, Scott A. Whitmire, Kristin R. Swanson
Abstract:
Glioblastoma Multiforme (GBM), the most common primary brain tumor, often presents with hyperintensity on T2-weighted or T2-weighted fluid attenuated inversion recovery (T2/FLAIR) magnetic resonance imaging (MRI). This hyperintensity corresponds with vasogenic edema, however there are likely many infiltrating tumor cells within the hyperintensity as well. While MRIs do not directly indicate tumor cells, MRIs do reflect the microenvironmental water abnormalities caused by the presence of tumor cells and edema. The inherent heterogeneity and resulting MRI features of GBMs complicate assessing disease response. To understand how hyperintensity on T2/FLAIR MRI may correlate with edema in the extracellular space (ECS), a multi-compartmental MRI signal equation which takes into account tissue compartments and their associated volumes with input coming from a mathematical model of glioma growth that incorporates edema formation was explored. The reasonableness of two possible extracellular space schema was evaluated by varying the T2 of the edema compartment and calculating the possible resulting T2s in tumor and peripheral edema. In the mathematical model, gliomas were comprised of vasculature and three tumor cellular phenotypes: normoxic, hypoxic, and necrotic. Edema was characterized as fluid leaking from abnormal tumor vessels. Spatial maps of tumor cell density and edema for virtual tumors were simulated with different rates of proliferation and invasion and various ECS expansion schemes. These spatial maps were then passed into a multi-compartmental MRI signal model for generating simulated T2/FLAIR MR images. Individual compartments’ T2 values in the signal equation were either from literature or estimated and the T2 for edema specifically was varied over a wide range (200 ms – 9200 ms). T2 maps were calculated from simulated images. 
T2 values based on simulated images were evaluated for regions of interest (ROIs) in normal appearing white matter, tumor, and peripheral edema. The ROI T2 values were compared to T2 values reported in literature. The expanding scheme of extracellular space had T2 values similar to the values calculated from the literature. The static scheme of extracellular space had much lower T2 values, and no matter what T2 was associated with edema, the intensities did not come close to literature values. Expanding the extracellular space is necessary to achieve simulated edema intensities commensurate with acquired MRIs.
Keywords: extracellular space, glioblastoma multiforme, magnetic resonance imaging, mathematical modeling
Procedia PDF Downloads 235
28388 Study for an Optimal Cable Connection within an Inner Grid of an Offshore Wind Farm
Authors: Je-Seok Shin, Wook-Won Kim, Jin-O Kim
Abstract:
An offshore wind farm needs to be designed carefully, considering both economic and reliability aspects. Designing an entire offshore wind farm involves many decision-making problems; this paper focuses on the inner grid layout, that is, the connections between wind turbines as well as between wind turbines and an offshore substation. The methodology proposed in this paper determines the connections and the cable type for each connection section using K-clustering, minimum spanning tree and cable selection algorithms. Then, a cost evaluation is performed in terms of investment, power loss and reliability. Through the cost evaluation, an optimal inner grid layout is determined so as to have the lowest total cost. To demonstrate the validity of the methodology, a case study is conducted on a 240 MW offshore wind farm, and the results show that the methodology is helpful for designing an offshore wind farm optimally.
Keywords: offshore wind farm, optimal layout, k-clustering algorithm, minimum spanning algorithm, cable type selection, power loss cost, reliability cost
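As a minimal sketch of the minimum spanning tree step mentioned in this abstract, the code below uses Prim's algorithm to connect a substation and a few turbines with the shortest total cable length. The coordinates, node names, and the use of straight-line Euclidean distance as the edge cost are assumptions for illustration; the paper's clustering, cable selection, and cost evaluation steps are not reproduced.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm with Euclidean edge lengths.

    points: dict mapping node name -> (x, y).
    Returns a list of (length, from_node, to_node) edges.
    """
    names = list(points)
    in_tree = {names[0]}  # start from the first node (the substation here)
    edges = []
    while len(in_tree) < len(names):
        # Pick the shortest edge leaving the current tree.
        best = min(
            (math.dist(points[a], points[b]), a, b)
            for a in in_tree
            for b in names
            if b not in in_tree
        )
        edges.append(best)
        in_tree.add(best[2])
    return edges


# Hypothetical layout, coordinates in km.
layout = {"substation": (0, 0), "T1": (1, 0), "T2": (2, 0), "T3": (1, 1)}
for length, a, b in minimum_spanning_tree(layout):
    print(f"{a} -> {b}: {length:.2f} km")
```

A real design would replace the distance with a cost that reflects cable type, power loss, and reliability, as the abstract describes.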
Procedia PDF Downloads 385
28387 Design of Low Latency Multiport Network Router on Chip
Authors: P. G. Kaviya, B. Muthupandian, R. Ganesan
Abstract:
On-chip routers typically have buffers at their input or output ports for temporarily storing packets. These buffers consume router area and power. A VC router uses multiple queues in parallel. While running a traffic trace, not all input ports have incoming packets that need to be transferred. Therefore, a large number of queues are empty while others are busy in the network, so time consumption is high under heavy traffic. RoShaQ is therefore used to minimize the buffer area and time. In the RoShaQ architecture, input packets travel through the shared queues at low traffic. At high traffic load, the input packets bypass the shared queues. As a result, power and area consumption are reduced. A parallel crossbar architecture is proposed in this project in order to reduce the power consumption. A new adaptive weighted routing algorithm for an 8-port router architecture is also proposed in order to decrease the delay of the network-on-chip router. The proposed system is simulated using ModelSim and synthesized using Xilinx Project Navigator.
Keywords: buffer, RoShaQ architecture, shared queue, VC router, weighted routing algorithm
Procedia PDF Downloads 542
28386 An Indoor Positioning System in Wireless Sensor Networks with Measurement Delay
Authors: Pyung Soo Kim, Eung Hyuk Lee, Mun Suck Jang
Abstract:
In the current paper, an indoor positioning system is proposed with consideration of measurement delay. Firstly, an estimation filter with a measurement delay is designed for the indoor positioning mechanism under a weighted least square criterion, which utilizes only finite measurements on the most recent window. The proposed estimation filtering based scheme gives the filtered estimates for position, velocity and acceleration of a moving target in real time, while removing undesired noisy effects and preserving desired moving positions. Secondly, the proposed scheme is shown to have good inherent properties such as unbiasedness, efficiency, time-invariance, deadbeat, and robustness due to the finite memory structure. Finally, computer simulations show that the proposed estimation filtering based scheme can outperform the existing infinite memory filtering based mechanism.
Keywords: indoor positioning system, wireless sensor networks, measurement delay
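To make the idea of a weighted least-squares estimate over a finite recent window concrete, here is a heavily simplified one-dimensional sketch. It assumes a constant-velocity model, exponentially decaying weights, and no measurement delay, all of which are illustrative assumptions; the paper's filter, which also estimates acceleration and accounts for delay, is not reproduced here.

```python
def weighted_line_fit(times, positions, weights):
    """Closed-form weighted least-squares fit of position = p0 + v * t."""
    sw = sum(weights)
    st = sum(w * t for w, t in zip(weights, times))
    sp = sum(w * p for w, p in zip(weights, positions))
    stt = sum(w * t * t for w, t in zip(weights, times))
    stp = sum(w * t * p for w, t, p in zip(weights, times, positions))
    velocity = (sw * stp - st * sp) / (sw * stt - st * st)
    p0 = (sp - velocity * st) / sw
    return p0, velocity


def windowed_estimate(times, positions, window, forgetting=0.9):
    """Estimate current position and velocity from the last `window` samples only."""
    t = times[-window:]
    p = positions[-window:]
    # Assumed weighting: the newest sample has weight 1, older ones decay.
    w = [forgetting ** (len(t) - 1 - i) for i in range(len(t))]
    p0, v = weighted_line_fit(t, p, w)
    return p0 + v * t[-1], v


# Hypothetical noise-free track: position = 2 + 3 * t.
ts = [0, 1, 2, 3, 4, 5]
ps = [2 + 3 * t for t in ts]
print(windowed_estimate(ts, ps, window=4))
```

Because only the last `window` samples are used, old measurements stop influencing the estimate, which is the finite-memory property the abstract refers to.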
Procedia PDF Downloads 482
28385 Semantic Based Analysis in Complaint Management System with Analytics
Authors: Francis Alterado, Jennifer Enriquez
Abstract:
Semantic Based Analysis in Complaint Management System with Analytics is an enhanced tool for clients to submit complaints, as well as a mechanism for Palawan Polytechnic College to gather, process, and monitor the status of these complaints. The study includes a mobile application that serves as a remote communication facility between the students and the school management regarding the issues encountered by students and the resolution of every complaint received. In processing the complaints, text mining and clustering algorithms were utilized. Every module of the system was tested and, based on the results, found to be 100% free from error before integration was done. System testing was also done by checking the expected functionality of the system, which was 100% functional. The system was tested by 10 students, who forwarded complaints to 10 departments. Based on the results, the students were able to submit complaints, the system was able to process them by identifying the department for which each complaint was intended, and the department concerned was able to give the student feedback on the complaint received. With this, the system gained a 4.7 rating, which means Excellent.
Keywords: technology adoption, emerging technology, issues challenges, algorithm, text mining, mobile technology
Procedia PDF Downloads 199
28384 Optimal Maintenance Clustering for Rail Track Components Subject to Possession Capacity Constraints
Authors: Cuong D. Dao, Rob J.I. Basten, Andreas Hartmann
Abstract:
This paper studies the optimal planning of preventive maintenance and renewal activities for components in a single railway track when the available time for maintenance is limited. The rail-track system consists of several types of components, such as rail, ballast, and switches, with different preventive maintenance and renewal intervals. To perform maintenance or renewal on the track, a train-free period for maintenance, called a possession, is required. Since a major possession directly affects the regular train schedule, maintenance and renewal activities are clustered as much as possible. In a dense and highly utilized railway network, the possession time on the track is critical, since the demand for train operations is very high and a long possession has a severe impact on the regular train schedule. We present an optimization model and investigate the maintenance schedules with and without the possession capacity constraint. In addition, we integrate into the optimization model the socio-economic cost associated with maintenance time, as part of the variable possession cost. A numerical example is provided to illustrate the model.
Keywords: rail-track components, maintenance, optimal clustering, possession capacity
Procedia PDF Downloads 263
28383 Detecting Port Maritime Communities in Spain with Complex Network Analysis
Authors: Nicanor Garcia Alvarez, Belarmino Adenso-Diaz, Laura Calzada Infante
Abstract:
In recent years, researchers have shown an interest in modelling maritime traffic as a complex network. In this paper, we propose a bipartite weighted network to model maritime traffic and detect port maritime communities. The bipartite weighted network considers two different types of nodes. The first represents Spanish ports, while the second represents the countries with which there is major import/export activity. The flow between the two types of nodes is modeled by weighting each link by the volume of product transported. To illustrate the model, the data is segmented by type of traffic. This allows fine-tuning and the creation of communities for each type of traffic, and therefore the identification of similar ports for a specific type of traffic, which provides decision-makers with tools to search for alliances or identify their competitors. The traffic with the greatest impact on the Spanish gross domestic product is selected, and the evolution of the communities formed by the most important ports, and their differences between 2009 and 2019, are analyzed. Finally, the set of communities formed by the ports of the Spanish port system is inspected to determine global similarities between them, by analyzing the sum of the memberships of the different ports in the communities formed for each type of traffic.
Keywords: bipartite networks, competition, infomap, maritime traffic, port communities
Procedia PDF Downloads 149
28382 Language Development and Growing Spanning Trees in Children Semantic Network
Authors: Somayeh Sadat Hashemi Kamangar, Fatemeh Bakouie, Shahriar Gharibzadeh
Abstract:
In this study, we aim to exploit Maximum Spanning Trees (MSTs) of children's semantic networks to investigate their language development. To do so, we examine the graph-theoretic properties of word-embedding networks. The nodes of the networks are the words that children normatively acquire before 30 months of age, and the links are built from the cosine similarity of the word vectors. These networks are weighted graphs, and the strength of each link is determined by the numerical similarity of the two words (nodes) it connects. Constructing MSTs offers a way to avoid converting the weighted networks into binary ones by setting a threshold. An MST is a sub-graph that connects all the nodes in such a way that the sum of all the link weights is maximized without forming cycles. As the backbone of the semantic networks, MSTs are suitable for examining developmental changes in semantic network topology in children. From these trees, several parameters were calculated to characterize the developmental change in network organization. We showed that MSTs provide an elegant method that is sensitive enough to capture subtle developmental changes in semantic network organization.
Keywords: maximum spanning trees, word-embedding, semantic networks, language development
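The construction described above can be sketched in a few lines. This is a minimal illustration using Kruskal's algorithm with edges taken heaviest first; the three-dimensional toy vectors and the function names are assumptions for demonstration, not the embeddings or code used in the study.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def maximum_spanning_tree(vectors):
    """Kruskal's algorithm on the complete similarity graph, heaviest edges first."""
    words = list(vectors)
    edges = sorted(
        ((cosine(vectors[a], vectors[b]), a, b) for a, b in combinations(words, 2)),
        reverse=True,
    )
    parent = {w: w for w in words}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    tree = []
    for weight, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # adding this edge does not create a cycle
            parent[root_a] = root_b
            tree.append((a, b, weight))
    return tree


# Hypothetical toy embeddings; real word vectors would have many more dimensions.
toy_vectors = {
    "dog": [1.0, 0.1, 0.0],
    "cat": [0.9, 0.2, 0.0],
    "milk": [0.0, 1.0, 0.2],
    "juice": [0.1, 0.9, 0.3],
}
mst = maximum_spanning_tree(toy_vectors)
```

The resulting tree keeps the two strong within-category links and the single strongest cross-category link. Note that a maximum spanning tree is guaranteed to be unique only when all link weights are distinct.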
Procedia PDF Downloads 145
28381 Wildlife Habitat Corridor Mapping in Urban Environments: A GIS-Based Approach Using Preliminary Category Weightings
Authors: Stefan Peters, Phillip Roetman
Abstract:
The global loss of biodiversity is threatening the benefits nature provides to human populations and has become a more pressing issue than climate change and requires immediate attention. While there have been successful global agreements for environmental protection, such as the Montreal Protocol, these are rare, and we cannot rely on them solely. Thus, it is crucial to take national and local actions to support biodiversity. Australia is one of the 17 countries in the world with a high level of biodiversity, and its cities are vital habitats for endangered species, with more of them found in urban areas than in non-urban ones. However, the protection of biodiversity in metropolitan Adelaide has been inadequate, with over 130 species disappearing since European colonization in 1836. In this research project we conceptualized, developed and implemented a framework for wildlife Habitat Hotspots and Habitat Corridor modelling in an urban context using geographic data and GIS modelling and analysis. We used detailed topographic and other geographic data provided by a local council, including spatial and attributive properties of trees, parcels, water features, vegetated areas, roads, verges, traffic, and census data. Weighted factors considered in our raster-based Habitat Hotspot model include parcel size, parcel shape, population density, canopy cover, habitat quality and proximity to habitats and water features. Weighted factors considered in our raster-based Habitat Corridor model include habitat potential (resulting from the Habitat Hotspot model), verge size, road hierarchy, road widths, human density, and presence of remnant indigenous vegetation species. We developed a GIS model, using Python scripting and ArcGIS-Pro Model-Builder, to establish an automated reproducible and adjustable geoprocessing workflow, adaptable to any study area of interest. 
Our habitat hotspot and corridor modelling framework makes it possible to determine and map existing habitat hotspots and wildlife habitat corridors. Our research was applied to the case study of Burnside, a local council in Adelaide, Australia, which encompasses an area of 30 km². We applied category weightings based on end-user expertise to refine our models and to optimize the use of our habitat map outputs for informing local strategic decision-making.
Keywords: biodiversity, GIS modeling, habitat hotspot, wildlife corridor
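A weighted raster overlay of the kind described above can be sketched as follows. This is an illustrative stand-in in plain Python, not the authors' ArcGIS Pro workflow; the layer names, the 0 to 1 scaling of each factor, and the weights are all assumptions.

```python
def weighted_overlay(layers, weights):
    """Combine equally sized raster layers into one score grid.

    Each cell of the result is the weight-normalized sum of the corresponding
    cells of the input layers (assumed to be pre-scaled to the range 0..1).
    """
    total_weight = sum(weights[name] for name in layers)
    names = list(layers)
    rows = len(layers[names[0]])
    cols = len(layers[names[0]][0])
    return [
        [
            sum(weights[name] * layers[name][r][c] for name in names) / total_weight
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical 2 x 2 rasters and category weightings.
toy_layers = {
    "canopy_cover": [[1.0, 0.0], [0.5, 0.5]],
    "water_proximity": [[0.0, 1.0], [0.5, 1.0]],
}
toy_weights = {"canopy_cover": 3.0, "water_proximity": 1.0}
hotspot_score = weighted_overlay(toy_layers, toy_weights)
```

Changing the entries of `toy_weights` corresponds to adjusting the category weightings; the cell scores are then recomputed without altering the input layers.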
Procedia PDF Downloads 115
28380 Pure Scalar Equilibria for Normal-Form Games
Authors: Herbert W. Corley
Abstract:
A scalar equilibrium (SE) is an alternative type of equilibrium in pure strategies for an n-person normal-form game G. It is defined using optimization techniques to obtain a pure strategy for each player of G by maximizing an appropriate utility function over the acceptable joint actions. The players’ actions are determined by the choice of the utility function. Such a utility function could be agreed upon by the players or chosen by an arbitrator. An SE is an equilibrium since no players of G can increase the value of this utility function by changing their strategies. SEs are formally defined, and examples are given. In a greedy SE, the goal is to assign actions to the players giving them the largest individual payoffs jointly possible. In a weighted SE, each player is assigned weights modeling the degree to which he helps every player, including himself, achieve as large a payoff as jointly possible. In a compromise SE, each player wants a fair payoff for a reasonable interpretation of fairness. In a parity SE, the players want their payoffs to be as nearly equal as jointly possible. Finally, a satisficing SE achieves a personal target payoff value for each player. The vector payoffs associated with each of these SEs are shown to be Pareto optimal among all such acceptable vectors, as well as computationally tractable.
Keywords: compromise equilibrium, greedy equilibrium, normal-form game, parity equilibrium, pure strategies, satisficing equilibrium, scalar equilibria, utility function, weighted equilibrium
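The core step, maximizing a single scalar utility over the joint actions, can be sketched as below. The two utility functions used here (total payoff and smallest payoff) are hypothetical stand-ins chosen for illustration; they are not the greedy, weighted, compromise, parity, or satisficing definitions from the paper, and the toy game is invented.

```python
def scalar_equilibrium(payoffs, utility):
    """Return the joint action whose payoff vector maximizes a scalar utility.

    `payoffs` maps each acceptable joint action (a tuple with one action per
    player) to the tuple of payoffs the players receive.
    """
    return max(payoffs, key=lambda joint_action: utility(payoffs[joint_action]))


# Hypothetical two-player game with actions "A" and "B" for each player.
toy_game = {
    ("A", "A"): (4, 1),
    ("A", "B"): (2, 2),
    ("B", "A"): (0, 0),
    ("B", "B"): (1, 5),
}

# Assumed utility 1: the sum of all players' payoffs.
total_payoff_choice = scalar_equilibrium(toy_game, sum)
# Assumed utility 2: the worst-off player's payoff, a simple fairness proxy.
fair_choice = scalar_equilibrium(toy_game, min)
```

The two utilities select different joint actions in this game, which illustrates the abstract's point that the players' actions are determined by the choice of the utility function.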
Procedia PDF Downloads 113
28379 New Hardy Type Inequalities of Two-Dimensional on Time Scales via Steklov Operator
Authors: Wedad Albalawi
Abstract:
Mathematical inequalities have been at the core of mathematical study and are used in almost all branches of mathematics, as well as in various areas of science and engineering. The inequalities compiled by Hardy, Littlewood and Pólya formed the first significant treatment of the subject. That work presents fundamental ideas, results and techniques, and it has had much influence on research in various branches of analysis. Since 1934, various inequalities have been produced and studied in the literature. Furthermore, some inequalities have been formulated in terms of operators; in 1989, weighted Hardy inequalities were obtained for integration operators. Weighted estimates were then obtained for Steklov operators, which were used in the solution of the Cauchy problem for the wave equation. These were improved upon in 2011 to include the boundedness of integral operators from the weighted Sobolev space to the weighted Lebesgue space. Some inequalities have been demonstrated and improved using the Hardy–Steklov operator. Recently, many integral inequalities have been improved by means of differential operators. The Hardy inequality has been one of the tools used to study the solutions of differential equations. Dynamic inequalities of Hardy and Copson have since been extended and improved by various integral operators. These inequalities would be interesting to apply in different fields of mathematics (functional spaces, partial differential equations, mathematical modeling). Some inequalities involving the Copson and Hardy inequalities on time scales have appeared, yielding new special versions of them. A time scale is an arbitrary nonempty closed subset of the real numbers. Dynamic inequalities on time scales have received a lot of attention in the literature and have become a major field in pure and applied mathematics.
There are many applications of dynamic equations on time scales to quantum mechanics, electrical engineering, neural networks, heat transfer, combinatorics, and population dynamics. This study focuses on Hardy and Copson inequalities, using the Steklov operator on time scales in double integrals to obtain special cases of time-scale inequalities of Hardy and Copson in higher dimensions. The advantage of this study is that it uses the one-dimensional classical Hardy inequality to obtain higher-dimensional time-scale versions that will be applied in the solution of the Cauchy problem for the wave equation. In addition, the obtained inequalities have various applications involving discontinuous domains, such as bug populations, phytoremediation of metals, wound healing, and maximization problems. The proof can be carried out by introducing restrictions on the operator in several cases. Time-scale concepts such as time-scales calculus are used, which makes it possible to unify and extend many problems from the theories of differential and of difference equations. In addition, the proofs use the chain rule, some properties of multiple integrals on time scales, Fubini's theorems, and Hölder's inequality.
Keywords: time scales, inequality of Hardy, inequality of Copson, Steklov operator
Procedia PDF Downloads 96
28378 Local Directional Encoded Derivative Binary Pattern Based Coral Image Classification Using Weighted Distance Gray Wolf Optimization Algorithm
Authors: Annalakshmi G., Sakthivel Murugan S.
Abstract:
This paper presents a local directional encoded derivative binary pattern (LDEDBP) feature extraction method that can be applied to the classification of submarine coral reef images. Classifying coral reef images using texture features is difficult because of the dissimilarities among samples of the same class. In the proposed method, texture features are extracted with LDEDBP. The approach extracts the complete structural arrangement of the local region using the local binary pattern (LBP) and also extracts edge information using the local directional pattern (LDP) from the edge responses available in a particular region, thereby achieving extra discriminative feature value. Typically, the LDP extracts the edge details in all eight directions. Integrating the edge responses with the local binary pattern yields a more robust texture descriptor than the other descriptors used in texture feature extraction methods. Finally, the proposed technique is applied to an extreme learning machine (ELM) with a meta-heuristic algorithm known as the weighted distance grey wolf optimizer (WDGWO), which optimizes the input weights and biases of single-hidden-layer feed-forward neural networks (SLFN). In the empirical results, ELM-WDGWO demonstrated better accuracy on all coral datasets, namely RSMAS, EILAT, EILAT2, and MLC, compared with other state-of-the-art algorithms. The proposed method achieves the highest overall classification accuracy, 94%, among the compared state-of-the-art methods.
Keywords: feature extraction, local directional pattern, ELM classifier, GWO optimization
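As background for the LBP component only, the basic 3 x 3 local binary pattern code can be sketched as follows. This is not the LDEDBP descriptor itself; the clockwise neighbor ordering starting at the top-left pixel and the `>=` comparison are common conventions assumed here, and the pixel values are invented.

```python
# Neighbor offsets (row, column), clockwise starting from the top-left pixel.
# The ordering is a convention; other orderings give a permuted but equivalent code.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """Basic 8-neighbor local binary pattern code for the interior pixel at (r, c)."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        # A neighbor at least as bright as the center sets its bit.
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


# Hypothetical 3 x 3 grayscale patch with center value 6.
toy_patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
center_code = lbp_code(toy_patch, 1, 1)
```

A texture descriptor is then typically built from the histogram of such codes over an image region.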
Procedia PDF Downloads 163
28377 An Inquiry of the Impact of Flood Risk on Housing Market with Enhanced Geographically Weighted Regression
Authors: Lin-Han Chiang Hsieh, Hsiao-Yi Lin
Abstract:
This study aims to determine the impact of the disclosure of a flood potential map on housing prices. The disclosure is supposed to mitigate market failure by reducing information asymmetry. On the other hand, opponents argue that the official disclosure of simulated results will only create unnecessary disturbances in the housing market. This study identifies the impact of the disclosure of the flood potential map by comparing the hedonic price of flood potential before and after the disclosure. The flood potential map used in this study was published by the Taipei municipal government in 2015 and is the result of a comprehensive simulation based on geographical, hydrological, and meteorological factors. The study uses residential property sales data from 2013 to 2016, collected from the actual sales price registration system of the Department of Land Administration (DLA). The results show that the impact of flood potential on the residential real estate market is statistically significant both before and after the disclosure. However, the trend is clearer after the disclosure, suggesting that the disclosure does have an impact on the market. The results also show that the impact of flood potential differs by the severity and frequency of precipitation. The negative impact of a relatively mild, high-frequency flood potential is stronger than that of a heavy, low-probability flood potential. This indicates that home buyers are more concerned with the frequency than with the intensity of floods. Another contribution of this study is methodological. The classic hedonic price analysis with OLS regression suffers from two spatial problems: the endogeneity problem caused by omitted spatially related variables, and the heterogeneity concern regarding the presumption that regression coefficients are spatially constant. These two problems are seldom considered in a single model.
This study tries to deal with the endogeneity and heterogeneity problems together by combining the spatial fixed-effect model with geographically weighted regression (GWR). A body of literature applying GWR indicates that the hedonic price of certain environmental assets varies spatially. Since the endogeneity problem is usually not considered in typical GWR models, it is arguable that omitted spatially related variables might bias the results of GWR models. By combining the spatial fixed-effect model and GWR, this study concludes that the effect of the flood potential map is highly sensitive to location, even after controlling for spatial autocorrelation at the same time. The main policy implication of this result is that it is improper to determine the potential benefit of a flood prevention policy by simply multiplying the hedonic price of flood risk by the number of houses. The effect of flood prevention might vary dramatically by location.
Keywords: flood potential, hedonic price analysis, endogeneity, heterogeneity, geographically-weighted regression
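The basic GWR idea, a separate weighted least-squares fit at each location with nearby observations weighted more heavily, can be sketched as below. This is plain GWR with one predictor and a Gaussian kernel; it does not include the spatial fixed effects that the study adds, and the coordinates, prices, and bandwidth are invented for illustration.

```python
import math


def gwr_fit_at(point, coords, x, y, bandwidth):
    """Local weighted least-squares fit y ~ intercept + slope * x around `point`.

    Observations are weighted by a Gaussian kernel of their distance to `point`.
    Returns (intercept, slope) for this location only.
    """
    weights = [
        math.exp(-0.5 * (math.dist(point, c) / bandwidth) ** 2) for c in coords
    ]
    w_sum = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / w_sum
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / w_sum
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data: flood-potential level x and price y in two distant districts.
toy_coords = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
toy_flood = [0, 1, 2, 3, 0, 1, 2, 3]
toy_price = [10, 9, 8, 7, 10, 7, 4, 1]  # slope -1 in the west, -3 in the east

_, west_slope = gwr_fit_at((0.5, 0.5), toy_coords, toy_flood, toy_price, bandwidth=1.0)
_, east_slope = gwr_fit_at((10.5, 0.5), toy_coords, toy_flood, toy_price, bandwidth=1.0)
```

The two fitted slopes differ by location, which is the kind of spatial variation that makes multiplying a single hedonic price by the number of houses misleading.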
Procedia PDF Downloads 290
28376 An Energy-Balanced Clustering Method on Wireless Sensor Networks
Authors: Yu-Ting Tsai, Chiun-Chieh Hsu, Yu-Chun Chu
Abstract:
In recent years, owing to the development of wireless network technology, many researchers have devoted themselves to the study of wireless sensor networks. Applications of wireless sensor networks mainly use sensor nodes to collect the required information and send it back to the users. Since the sensed area is difficult to reach, there are many restrictions on the design of the sensor nodes, the most important of which is their limited energy. Because of this limited energy, researchers have proposed a number of ways to reduce energy consumption and balance the load of sensor nodes in order to increase the network lifetime. In this paper, we propose the Energy-Balanced Clustering method with Auxiliary Members on Wireless Sensor Networks (EBCAM), based on cluster routing. The main purpose is to balance the energy consumption over the sensed area and even out the distribution of dead nodes, in order to avoid the excessive energy consumption caused by increased transmission distance. In addition, we use the residual energy and the average energy consumption of the nodes within the cluster to choose the cluster heads, use multi-hop transmission to deliver the data, and dynamically adjust the transmission radius according to the load conditions. We also use the auxiliary cluster members to change the delivery path according to the residual energy of the cluster head in order to reduce its load. Finally, we compare the proposed method with related algorithms via simulated experiments and analyze the results. The results reveal that the proposed method outperforms the other algorithms in the number of rounds used and in average energy consumption.
Keywords: auxiliary nodes, cluster, load balance, routing algorithm, wireless sensor network
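The abstract says cluster heads are chosen from residual energy and average energy consumption but does not give the rule. The sketch below uses one hypothetical scoring rule, residual energy divided by average per-round consumption (roughly the number of rounds a node could still serve), purely to illustrate the idea; the field names and values are invented.

```python
def choose_cluster_head(nodes):
    """Pick the node with the largest expected remaining service time.

    Assumed score: residual_energy / avg_consumption. This is an illustrative
    rule, not the selection formula of EBCAM.
    """
    best = max(nodes, key=lambda n: n["residual_energy"] / n["avg_consumption"])
    return best["id"]


# Hypothetical cluster members (energy in joules, consumption in joules per round).
toy_cluster = [
    {"id": "a", "residual_energy": 0.8, "avg_consumption": 0.04},
    {"id": "b", "residual_energy": 0.9, "avg_consumption": 0.06},
    {"id": "c", "residual_energy": 0.5, "avg_consumption": 0.02},
]
cluster_head = choose_cluster_head(toy_cluster)
```

Under this rule the node with the most residual energy is not necessarily selected, because a node that drains quickly is a poorer candidate than one that consumes little per round.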
Procedia PDF Downloads 274
28375 Building User Behavioral Models by Processing Web Logs and Clustering Mechanisms
Authors: Madhuka G. P. D. Udantha, Gihan V. Dias, Surangika Ranathunga
Abstract:
Today, websites contain very interesting applications, but there are only a few methodologies for analyzing user navigation through websites and determining whether a website is being put to correct use. Web logs are typically consulted only when a major attack or malfunction occurs, even though they contain a lot of interesting information about the users of the system. Analyzing web logs has become a challenge because of the huge log volume. Finding interesting patterns is not easy, owing to the size and distribution of the logs and the importance of minor details in each log. Web logs contain very important data about users and sites that has not been put to good use. Retrieving interesting information from logs gives an idea of what the users need, makes it possible to group users according to their various needs, and helps improve the site so that it is effective and efficient. The model we built is able to detect attacks, system malfunctions, and anomalies. Logs will become more complex as the volume of traffic and the size and complexity of the website grow. The solution uses unsupervised techniques and is fully automated; expert knowledge is used only in validation. In our approach, the logs are first cleaned and purified to bring them to a common platform with a standard format and structure. After the cleaning module, the web session builder is executed. It outputs two files, the Web Sessions file and the Indexed URLs file. The Indexed URLs file contains the list of URLs accessed and their indices. The Web Sessions file lists the indices of each web session. The DBSCAN and EM algorithms are then used iteratively and recursively to obtain the best clustering results for the web sessions. Using homogeneity, completeness, V-measure, intra- and inter-cluster distance, and the silhouette coefficient as evaluation measures, these algorithms evaluate themselves in order to supply better parameter values for subsequent runs. If a cluster is found to be too large, micro-clustering is used.
In the Cluster Signature Module, the clusters are annotated with a unique signature called a fingerprint. In this module, each cluster is fed to the Associative Rule Learning Module. If it outputs a confidence and support of 1 for an access sequence, that sequence is a potential signature for the cluster. The occurrences of the access sequence are then checked in the other clusters. If the sequence is found to be unique to the cluster under consideration, the cluster is annotated with the signature. These signatures are used for anomaly detection, preventing cyber attacks, real-time dashboards that visualize users accessing web pages, predicting user actions, and various other applications in finance, university websites, news and media websites, etc.
Keywords: anomaly detection, clustering, pattern recognition, web sessions
Procedia PDF Downloads 288
28374 Optimized Cluster Head Selection Algorithm Based on LEACH Protocol for Wireless Sensor Networks
Authors: Wided Abidi, Tahar Ezzedine
Abstract:
Low-Energy Adaptive Clustering Hierarchy (LEACH) is considered one of the effective hierarchical routing algorithms that optimize energy and prolong the lifetime of the network. Since the selection of the Cluster Head (CH) in LEACH is carried out randomly, in this paper we propose an approach for electing the CH based on the LEACH protocol. Specifically, we present a formula for calculating the threshold responsible for CH election. We adopt three principal criteria: the remaining energy of the node, the number of neighbors within cluster range, and the distance between the node and the CH. Simulation results show that our proposed approach outperforms the LEACH protocol in prolonging the lifetime of the network and saving residual energy.
Keywords: wireless sensors networks, LEACH protocol, cluster head election, energy efficiency
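For context, the standard LEACH election threshold is T(n) = p / (1 - p (r mod 1/p)) for nodes that have not yet served as CH in the current cycle, and 0 otherwise, where p is the desired fraction of cluster heads and r is the round index. The sketch below implements that standard threshold and then scales it by a hypothetical factor built from the three criteria named in the abstract. The scaling rule is an assumption for illustration only; the paper's actual formula is not given in the abstract.

```python
def leach_threshold(p, round_index, eligible=True):
    """Standard LEACH threshold; a node becomes CH if a uniform random draw is below it."""
    if not eligible:  # node already served as CH in the current cycle
        return 0.0
    period = round(1 / p)  # number of rounds in one election cycle
    return p / (1 - p * (round_index % period))


def weighted_threshold(p, round_index, energy_ratio, neighbor_ratio,
                       distance_ratio, eligible=True):
    """Hypothetical variant: scale the threshold by the three criteria.

    All ratios are assumed to be normalized to 0..1. More remaining energy and
    more neighbors raise the threshold; a larger distance lowers it.
    """
    factor = (energy_ratio + neighbor_ratio + (1 - distance_ratio)) / 3
    return leach_threshold(p, round_index, eligible) * factor


base = leach_threshold(0.1, 0)
last_round = leach_threshold(0.1, 9)
strong_node = weighted_threshold(0.1, 0, energy_ratio=0.9, neighbor_ratio=0.8, distance_ratio=0.2)
weak_node = weighted_threshold(0.1, 0, energy_ratio=0.2, neighbor_ratio=0.3, distance_ratio=0.7)
```

In the standard rule the threshold rises to 1 in the last round of a cycle, so every node that has not yet served is elected; the weighted variant keeps that schedule but makes well-placed, energy-rich nodes more likely to be elected earlier.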
Procedia PDF Downloads 330
28373 High-Capacity Image Steganography using Wavelet-based Fusion on Deep Convolutional Neural Networks
Authors: Amal Khalifa, Nicolas Vana Santos
Abstract:
Steganography has been known for centuries as an efficient approach to covert communication. Because of the popularity and easy availability of images, image steganography has attracted researchers seeking secure techniques for hiding information within an innocent-looking cover image. In this research, we propose a novel deep-learning approach to digital image steganography. The proposed method, DeepWaveletFusion, uses convolutional neural networks (CNNs) to hide a secret image in a cover image of the same size. Two CNNs are trained back-to-back to merge the Discrete Wavelet Transforms (DWT) of both color images and eventually to extract the hidden image blindly. Based on two different image similarity metrics, a weighted gain function is used to guide the learning process and maximize the quality of the retrieved secret image while maintaining acceptable imperceptibility. Experimental results verified the high recoverability of DeepWaveletFusion, which outperformed similar deep-learning-based methods.
Keywords: deep learning, steganography, image, discrete wavelet transform, fusion
Procedia PDF Downloads 92
28372 Integrating Molecular Approaches to Understand Diatom Assemblages in Marine Environment
Authors: Shruti Malviya, Chris Bowler
Abstract:
Environmental processes acting at multiple spatial scales control marine diatom community structure. However, the contribution of local factors (e.g., temperature and salinity) in these highly complex systems is poorly understood. We therefore investigated diatom community organization as a function of environmental predictors and determined the relative contribution of various environmental factors to the structure of marine diatom assemblages in the world's ocean. The dataset for this study was derived from the Tara Oceans expedition and comprises 46 sampling stations from diverse oceanic provinces. Ribotypes of the V9 hypervariable region of 18S rDNA were organized into assemblages based on their distributional co-occurrence. Using Ward's hierarchical clustering, nine clusters were defined. The number of ribotypes and reads varied between clusters: three clusters (II, VIII and IX) contained only a few reads, whereas two (I and IV) were highly abundant. Of the nine clusters, seven can be divided into two categories: one defined by a positive correlation with phosphate and nitrate and a negative correlation with longitude, and the other by a negative correlation with salinity, temperature, and latitude and a positive correlation with the Lyapunov exponent. All the clusters were found to be remarkably dominant in the South Pacific Ocean and can be placed into three classes, namely Southern Ocean-South Pacific Ocean clusters (I, II, V, VIII, IX), South Pacific Ocean clusters (IV and VII), and cosmopolitan clusters (III and VI). Our findings show that co-occurring ribotypes can be significantly associated into recognizable clusters that exhibit distinct responses to environmental variables. This study thus demonstrates the distinct behavior of each recognized assemblage, each displaying a taxonomic and environmental signature.
Keywords: assemblage, diatoms, hierarchical clustering, Tara Oceans
Procedia PDF Downloads 202