Search results for: data sets
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24744

24654 Rank of Semigroup: Generating Sets and Cases Revealing Limitations of the Concept of Independence

Authors: Zsolt Lipcsey, Sampson Marshal Imeh

Abstract:

We investigate a characterisation of the rank of a semigroup by Howie and Ribeiro (1999) to ascertain the relevance of the concept of independence. There are cases where the concept of independence fails to be useful for this purpose. One would expect a basis to be a maximal independent subset of a given semigroup. However, we construct examples of semigroups where a finite basis exists and the basis is larger than the number of independent elements.

Keywords: generating sets, independent set, rank, cyclic semigroup, basis, commutative

Procedia PDF Downloads 165
24653 An Extended Inverse Pareto Distribution, with Applications

Authors: Abdel Hadi Ebraheim

Abstract:

This paper introduces a new extension of the Inverse Pareto distribution in the framework of the Marshall-Olkin (1997) family of distributions. This model is capable of modeling various shapes of aging and failure data. The statistical properties of the new model are discussed. Several methods are used to estimate the parameters involved. Explicit expressions are derived for different types of moments of value in reliability analysis. In addition, the order statistics of samples from the proposed model are studied. Finally, the usefulness of the new model for modeling reliability data is illustrated using two real data sets and a simulation study.

Keywords: Pareto distribution, Marshall-Olkin, reliability, hazard functions, moments, estimation

Procedia PDF Downloads 51
24652 Investigating the Effects of Data Transformations on a Bi-Dimensional Chi-Square Test

Authors: Alexandru George Vaduva, Adriana Vlad, Bogdan Badea

Abstract:

In this research, we conduct a Monte Carlo analysis on a two-dimensional χ² test, which is used to determine the minimum distance required for independent sampling in the context of chaotic signals. We investigate the impact of transforming initial data sets from any probability distribution to new signals with a uniform distribution using the Spearman rank correlation on the χ² test. This transformation removes the randomness of the data pairs, and as a result, the observed distribution of χ² test values differs from the expected distribution. We propose a solution to this problem and evaluate it using another chaotic signal.

Keywords: chaotic signals, logistic map, Pearson’s test, Chi Square test, bivariate distribution, statistical independence

Procedia PDF Downloads 58
24651 A Fermatean Fuzzy MAIRCA Approach for Maintenance Strategy Selection of Process Plant Gearbox Using Sustainability Criteria

Authors: Soumava Boral, Sanjay K. Chaturvedi, Ian Howard, Kristoffer McKee, V. N. A. Naikan

Abstract:

Due to strict government regulations that promote sustainability practices in industry, and noting the advances in sustainable manufacturing practices, it is necessary that the associated processes are also sustainable. Maintenance of large-scale and complex machines is a pivotal task in maintaining the uninterrupted flow of manufacturing processes. Appropriate maintenance practices can prolong the lifetime of machines and prevent associated breakdowns, which subsequently reduces different cost heads. Selecting the best maintenance strategy for such machines is considered a burdensome task, as it requires the consideration of multiple technical criteria, complex mathematical calculations, previous fault data, maintenance records, etc. In the era of the fourth industrial revolution, organizations are rapidly changing their way of doing business, and they are giving the utmost importance to sensor technologies, artificial intelligence, data analytics, automation, etc. In this work, the effectiveness of several maintenance strategies (e.g., preventive, failure-based, reliability centered, condition based, total productive maintenance, etc.) related to a large-scale and complex gearbox operating in a steel processing plant is evaluated in terms of economic, social, environmental and technical criteria. As it is not possible to obtain or describe some criteria by exact numerical values, these criteria are evaluated linguistically by cross-functional experts. Fuzzy sets are a promising soft-computing technique that has been useful for dealing with linguistic data and providing inferences in many complex situations. To prioritize different maintenance practices based on the identified sustainability criteria, multi-criteria decision making (MCDM) approaches can be considered as potential tools.
Multi-Attributive Ideal Real Comparative Analysis (MAIRCA) is a recent addition to the MCDM family and has proven its superiority over some well-known MCDM approaches, like TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) and ELECTRE (ELimination Et Choix Traduisant la REalité). It has a simple but robust mathematical approach, which is easy to comprehend. On the other hand, due to some inherent drawbacks of Intuitionistic Fuzzy Sets (IFS) and Pythagorean Fuzzy Sets (PFS), the use of Fermatean Fuzzy Sets (FFSs) has recently been proposed. In this work, we propose the novel concept of FF-MAIRCA. We obtain the weights of the criteria from experts’ evaluations and use them to prioritize the different maintenance practices according to their suitability using the FF-MAIRCA approach. Finally, a sensitivity analysis is carried out to highlight the robustness of the approach.

Keywords: Fermatean fuzzy sets, Fermatean fuzzy MAIRCA, maintenance strategy selection, sustainable manufacturing, MCDM

Procedia PDF Downloads 112
24650 A Neural Network Based Clustering Approach for Imputing Multivariate Values in Big Data

Authors: S. Nickolas, Shobha K.

Abstract:

The treatment of incomplete data is an important step in data pre-processing. Missing values create a noisy environment in all applications and are an unavoidable problem in big data management and analysis. Numerous techniques, such as discarding rows with missing values, mean imputation, expectation maximization, neural networks with evolutionary algorithms or optimized techniques, and hot deck imputation, have been introduced by researchers for handling missing data. Among these, imputation techniques play a positive role in filling missing values when it is necessary to use all records in the data rather than discard records with missing values. In this paper, we propose a novel artificial neural network based clustering algorithm, Adaptive Resonance Theory-2 (ART2), for imputation of missing values in mixed attribute data sets. ART2 can recognize learned models quickly and adapt to new objects rapidly. It carries out model-based clustering by using competitive learning and a self-steady mechanism in a dynamic environment without supervision. The proposed approach not only imputes the missing values but also provides information about handling the outliers.

Keywords: ART2, data imputation, clustering, missing data, neural network, pre-processing

Procedia PDF Downloads 248
24649 A Deterministic Large Deviation Model Based on Complex N-Body Systems

Authors: David C. Ni

Abstract:

In previous efforts, we constructed N-Body Systems by an extended Blaschke product (EBP), which represents a non-temporal and nonlinear extension of the Lorentz transformation. In this construction, we rely on only two parameters, nonlinear degree and relative momentum, to characterize the systems. We further explored root computation via iteration with an algorithm extended from the Jenkins-Traub method. The solution sets take the form σ + i[-t, t], where σ and t are real numbers, and [-t, t] shows various canonical distributions. In this paper, we correlate the convergent sets in the original domain with the solution sets, which demonstrate large-deviation distributions in the codomain. We proceed to compare our approach with formulas or principles such as the Donsker-Varadhan and Wentzell-Freidlin theories. The deterministic model based on this construction allows us to explore applications in the areas of finance and statistical mechanics.

Keywords: nonlinear Lorentz transformation, Blaschke equation, iteration solutions, root computation, large deviation distribution, deterministic model

Procedia PDF Downloads 365
24648 Linguistic Features for Sentence Difficulty Prediction in Aspect-Based Sentiment Analysis

Authors: Adrian-Gabriel Chifu, Sebastien Fournier

Abstract:

One of the challenges of natural language understanding is to deal with the subjectivity of sentences, which may express opinions and emotions that add layers of complexity and nuance. Sentiment analysis is a field that aims to extract and analyze these subjective elements from text, and it can be applied at different levels of granularity, such as document, paragraph, sentence, or aspect. Aspect-based sentiment analysis is a well-studied topic with many available data sets and models. However, there is no clear definition of what makes a sentence difficult for aspect-based sentiment analysis. In this paper, we explore this question by conducting an experiment with three data sets: “Laptops”, “Restaurants”, and “MTSC” (Multi-Target-dependent Sentiment Classification), and a merged version of these three data sets. We study the impact of domain diversity and syntactic diversity on difficulty. We use a combination of classifiers to identify the most difficult sentences and analyze their characteristics. We employ two ways of defining sentence difficulty. The first one is binary and labels a sentence as difficult if the classifiers fail to correctly predict the sentiment polarity. The second one is a six-level scale based on how many of the top five best-performing classifiers can correctly predict the sentiment polarity. We also define 9 linguistic features that, combined, aim at estimating the difficulty at sentence level.
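
The two difficulty definitions in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the example predictions are hypothetical, the six-level scale is expressed here as the number of classifiers that fail (0 to 5), and the binary label assumes that "difficult" means every classifier fails, which the abstract does not state explicitly.

```python
from typing import Sequence


def difficulty_level(gold: str, predictions: Sequence[str]) -> int:
    """Count the classifiers that mispredict the polarity (0 = easiest).

    With five classifiers this yields six possible levels, 0 through 5.
    """
    return sum(1 for p in predictions if p != gold)


def is_difficult(gold: str, predictions: Sequence[str]) -> bool:
    """Binary label. Assumption: difficult means that all classifiers fail."""
    return all(p != gold for p in predictions)


# Hypothetical outputs of five classifiers for one sentence.
preds = ["positive", "negative", "negative", "neutral", "positive"]
level = difficulty_level("positive", preds)  # three classifiers are wrong
hard = is_difficult("positive", preds)       # two classifiers are right
print(level, hard)
```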

Keywords: sentiment analysis, difficulty, classification, machine learning

Procedia PDF Downloads 46
24647 Model of Optimal Centroids Approach for Multivariate Data Classification

Authors: Pham Van Nha, Le Cam Binh

Abstract:

Particle swarm optimization (PSO) is a population-based stochastic optimization algorithm. PSO was inspired by the natural behavior of birds and fish in migration and foraging for food. PSO is considered a multidisciplinary optimization model that can be applied to various optimization problems. PSO’s ideas are simple and easy to understand, but PSO has only been applied to simple model problems. We think that, in order to expand the applicability of PSO to complex problems, PSO should be described more explicitly in the form of a mathematical model. In this paper, we represent PSO as a mathematical model and apply it to multivariate data classification. First, PSO’s general mathematical model (MPSO) is analyzed as a universal optimization model. Then, the Model of Optimal Centroids (MOC) is proposed for multivariate data classification. Experiments were conducted on some benchmark data sets to demonstrate the effectiveness of MOC compared with several proposed schemes.

Keywords: analysis of optimization, artificial intelligence based optimization, optimization for learning and data analysis, global optimization

Procedia PDF Downloads 180
24646 The Interplay between Autophagy and Macrophages' Polarization in Wound Healing: A Genetic Regulatory Network Analysis

Authors: Mayada Mazher, Ahmed Moustafa, Ahmed Abdellatif

Abstract:

Background: Autophagy is a highly conserved eukaryotic catabolic process implicated in many pathophysiologies such as wound healing. Autophagy-associated genes serve as a scaffolding platform for signal transduction of macrophage polarization during the inflammatory phase of the wound healing and tissue repair process. In the current study, we report a model for the interplay between autophagy-associated genes and macrophage-polarization-associated genes. Methods: In silico analysis was performed on 249 autophagy-related genes retrieved from the public autophagy database and on gene expression data retrieved from the Gene Expression Omnibus (GEO): microarray data sets GSE81922 and GSE69607, with 199 macrophage-polarization DEGs. An integrated protein-protein interaction network was constructed for the autophagy and macrophage gene sets. The gene sets were then used for GO terms pathway enrichment analysis. Common transcription factors for autophagy and macrophages' polarization were identified. Finally, microRNAs enriched in both autophagy and macrophages were predicted. Results: In silico prediction of common transcription factors in the macrophage DEGs and autophagy gene sets revealed a new role for the transcription factors HOMEZ, GABPA, ELK1 and REL, which commonly regulate macrophage-associated genes (IL6, IL1M, IL1B, NOS1, SOC3) and autophagy-related genes (Atg12, Rictor, Rb1cc1, Gaparab1, Atg16l1). Conclusions: Autophagy and macrophages' polarization are interdependent cellular processes, and both autophagy-related proteins and macrophage-polarization-related proteins coordinate in tissue remodelling via a regulatory network of transcription factors and microRNAs. The current work highlights a potential new role for the transcription factors HOMEZ, GABPA, ELK1 and REL in wound healing.

Keywords: autophagy related proteins, integrated network analysis, macrophages polarization M1 and M2, tissue remodelling

Procedia PDF Downloads 120
24645 Approximation of Convex Set by Compactly Semidefinite Representable Set

Authors: Anusuya Ghosh, Vishnu Narayanan

Abstract:

The approximation of a convex set by a semidefinite representable set plays an important role in semidefinite programming, especially in modern convex optimization. Optimizing a linear function over a convex set is a hard problem. However, optimizing the linear function over a semidefinite representable set that approximates the convex set is easy, as there exist numerous efficient algorithms for solving semidefinite programming problems. So, our approximation technique is significant in optimization. We develop a technique to approximate any closed convex set, say K, by a compactly semidefinite representable set. Further, we prove that there exists a sequence of compactly semidefinite representable sets which gives progressively tighter approximations of the closed convex set K. We discuss the convergence of the sequence of compactly semidefinite representable sets to the closed convex set K. The recession cone of K and the recession cone of the compactly semidefinite representable set are equal. So, we say that the sequence of compactly semidefinite representable sets converges strongly to the closed convex set. Thus, this approximation technique is a very useful development in semidefinite programming.

Keywords: semidefinite programming, semidefinite representable set, compactly semidefinite representable set, approximation

Procedia PDF Downloads 353
24644 Minimizing Mutant Sets by Equivalence and Subsumption

Authors: Samia Alblwi, Amani Ayad

Abstract:

Mutation testing is the art of generating syntactic variations of a base program and checking whether a candidate test suite can identify all the mutants that are not semantically equivalent to the base; this technique is widely used by researchers to select quality test suites. One of the main obstacles to the widespread use of mutation testing is cost: even small programs (a few dozen lines of code) can give rise to a large number of mutants (up to hundreds). This has created an incentive to seek to reduce the number of mutants while preserving their collective effectiveness. Two criteria have been used to reduce the size of mutant sets: equivalence, which aims to partition the set of mutants into equivalence classes modulo semantic equivalence and to select one representative per class; and subsumption, which aims to define a partial ordering among mutants that ranks mutants by effectiveness and seeks to select maximal elements in this ordering. In this paper, we analyze these two policies using analytical and empirical criteria.
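
The two reduction criteria can be sketched in Python under a simplifying assumption: semantic equivalence and subsumption are approximated from test results, so that two mutants are treated as equivalent when the same tests kill them, and mutant A subsumes mutant B when every test that kills A also kills B. This test-based approximation, the function name `minimize_mutants`, and the kill matrix are illustrative and are not taken from the paper.

```python
from typing import Dict, FrozenSet, List


def minimize_mutants(kills: Dict[str, FrozenSet[str]]) -> List[str]:
    """Keep one representative per kill-set class, dropping subsumed classes."""
    # Equivalence step: mutants with identical kill sets share one class.
    classes: Dict[FrozenSet[str], str] = {}
    for mutant in sorted(kills):
        tests = kills[mutant]
        if tests:  # mutants that no test kills are left out of the ordering
            classes.setdefault(tests, mutant)
    # Subsumption step: a class is dropped when another class has a kill set
    # that is a strict subset of its own, because the other one subsumes it.
    kept = []
    for tests, mutant in classes.items():
        if not any(other < tests for other in classes):
            kept.append(mutant)
    return sorted(kept)


# Hypothetical kill matrix: mutant -> set of tests that kill it.
kill_matrix = {
    "m1": frozenset({"t1"}),
    "m2": frozenset({"t1", "t2"}),  # subsumed by m1
    "m3": frozenset({"t1"}),        # same kill set as m1
    "m4": frozenset({"t3"}),
    "m5": frozenset(),              # survives every test
}
minimal = minimize_mutants(kill_matrix)
print(minimal)
```

In this example any test suite that kills m1 and m4 also kills m2 and m3, so only two mutants need to be retained.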

Keywords: mutation testing, mutant sets, mutant equivalence, mutant subsumption, mutant set minimization

Procedia PDF Downloads 35
24643 Forecasting Free Cash Flow of an Industrial Enterprise Using Fuzzy Set Tools

Authors: Elena Tkachenko, Elena Rogova, Daria Koval

Abstract:

The paper examines ways of forecasting cash flows in a dynamic external environment. The so-called new reality in the economy lowers the predictability of companies’ performance indicators due to the lack of long-term steady trends in external conditions of development and fast changes in the markets. Traditional methods based on trend analysis lead to a very high approximation error. The macroeconomic situation of the last 10 years has been defined by the continuing consequences of one financial crisis and the emergence of another. In these conditions, forecasting instruments based on fuzzy sets show good results. Models based on fuzzy sets lower the approximation error to an acceptable level and provide companies with reliable cash flow estimates that help them reach financial stability. In the paper, the applicability of a cash flow forecasting model based on fuzzy logic is analyzed.

Keywords: cash flow, industrial enterprise, forecasting, fuzzy sets

Procedia PDF Downloads 172
24642 A New Approach for Improving Accuracy of Multi Label Stream Data

Authors: Kunal Shah, Swati Patel

Abstract:

Many real-world problems involve data that can be considered as multi-label data streams. Efficient methods exist for multi-label classification in non-streaming scenarios. However, learning in evolving streaming scenarios is more challenging, as the learners must be able to adapt to change using limited time and memory. Classification is used to predict the class of an unseen instance as accurately as possible. Multi-label classification is a variant of single-label classification in which a set of labels is associated with a single instance. Multi-label classification is used by modern applications such as text classification, functional genomics, image classification, music categorization, etc. This paper introduces the task of multi-label classification, methods for multi-label classification, and evaluation measures for multi-label classification. A comparative analysis of multi-label classification methods was also carried out on various data sets, first on the basis of theoretical study and then on the basis of simulation.
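
As an illustration of how multi-label predictions are scored, the following Python sketch computes two widely used measures, Hamming loss and subset accuracy. The abstract does not say which measures the paper uses, so the choice of measures, the function names, and the toy label sets are assumptions made for the example.

```python
from typing import List, Set


def hamming_loss(true: List[Set[str]], pred: List[Set[str]], labels: Set[str]) -> float:
    """Fraction of (instance, label) pairs that are predicted wrongly."""
    wrong = sum(len(t ^ p) for t, p in zip(true, pred))  # symmetric difference
    return wrong / (len(true) * len(labels))


def subset_accuracy(true: List[Set[str]], pred: List[Set[str]]) -> float:
    """Fraction of instances whose predicted label set matches exactly."""
    return sum(1 for t, p in zip(true, pred) if t == p) / len(true)


# Hypothetical label space and predictions for three instances.
labels = {"sports", "politics", "music"}
y_true = [{"sports"}, {"politics", "music"}, {"music"}]
y_pred = [{"sports"}, {"politics"}, {"sports", "music"}]

hl = hamming_loss(y_true, y_pred, labels)
sa = subset_accuracy(y_true, y_pred)
print(round(hl, 3), round(sa, 3))
```

Hamming loss gives partial credit for partly correct label sets, whereas subset accuracy counts an instance as correct only when every label matches.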

Keywords: binary relevance, concept drift, data stream mining, MLSC, multiple window with buffer

Procedia PDF Downloads 559
24641 From Text to Data: Sentiment Analysis of Presidential Election Political Forums

Authors: Sergio V Davalos, Alison L. Watkins

Abstract:

User generated content (UGC) such as website posts has data associated with it: time of the post, gender, location, type of device, and number of words. The text entered in UGC can provide a valuable dimension for analysis. In this research, each user post is treated as a collection of terms (words). In addition to the number of words per post, the frequency of each term is determined by post and by the sum of occurrences in all posts. This research focuses on one specific aspect of UGC: sentiment. Sentiment analysis (SA) was applied to the content (user posts) of two sets of political forums related to the US presidential elections for 2012 and 2016. Sentiment analysis derives data from the text, which enables the subsequent application of data analytic methods. The SASA (SAIL/SAI Sentiment Analyzer) model was used for sentiment analysis. The application of SASA resulted in a sentiment score for each post. Based on the sentiment scores for the posts, there are significant differences between the content and sentiment of the two sets for the 2012 and 2016 presidential election forums. In the 2012 forums, 38% of the forums started with positive sentiment and 16% with negative sentiment. In the 2016 forums, 29% started with positive sentiment and 15% with negative sentiment. There were also changes in sentiment over time. For both elections, as the election got closer, the cumulative sentiment score became negative. The candidate who won each election was mentioned in more posts than the losing candidates. In the case of Trump, there were more negative posts than Clinton’s highest number of posts, which were positive. KNIME topic modeling was used to derive topics from the posts. There were also changes in topics and keyword emphasis over time. Initially, the political parties were the most referenced, and as the election got closer the emphasis changed to the candidates.
The SASA method predicted sentiment better than four other methods in Sentibench. The research resulted in deriving sentiment data from text. In combination with other data, the sentiment data provided insight and discovery about user sentiment in the US presidential elections for 2012 and 2016.
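
The term-frequency step described in this abstract (word count per post, term frequency per post, and total occurrences across all posts) can be sketched with the Python standard library. The tokenizer, the function name `term_counts`, and the two sample posts are assumptions for illustration; the abstract does not describe the authors' tooling for this step.

```python
import re
from collections import Counter
from typing import List


def term_counts(post: str) -> Counter:
    """Treat a post as a bag of lower-cased terms (a deliberately simple tokenizer)."""
    return Counter(re.findall(r"[a-z']+", post.lower()))


# Hypothetical forum posts.
posts: List[str] = [
    "The debate was great",
    "The debate was a disaster, a real disaster",
]

per_post = [term_counts(p) for p in posts]             # term frequency by post
words_per_post = [sum(c.values()) for c in per_post]   # number of words per post
total = sum(per_post, Counter())                       # occurrences in all posts

print(words_per_post, total["debate"], total["disaster"])
```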

Keywords: sentiment analysis, text mining, user generated content, US presidential elections

Procedia PDF Downloads 160
24640 Assessing the Impact of Climate Change on Pulses Production in Khyber Pakhtunkhwa, Pakistan

Authors: Khuram Nawaz Sadozai, Rizwan Ahmad, Munawar Raza Kazmi, Awais Habib

Abstract:

Climate change and crop production are intrinsically associated with each other. Therefore, this research study is designed to assess the impact of climate change on pulses production in the southern districts of Khyber Pakhtunkhwa (KP) Province of Pakistan. Two pulses (i.e., chickpea and mung bean) were selected for this research study with respect to climate change. Climatic variables such as temperature, humidity and precipitation, along with pulses production and area under cultivation of pulses, were included as the major variables of this study. Secondary data on climatic variables and crop variables for a period of thirty-four years (1986-2020) were obtained from the Pakistan Meteorological Department and Agriculture Statistics of KP, respectively. The panel data sets for chickpea and mung bean were estimated separately. The analysis validates that both data sets were balanced panels. The Hausman specification test was run separately for both panel data sets; its findings suggested that the fixed effect model is appropriate for the chickpea panel data, whereas the random effect model is appropriate for the mung bean panel data. Major findings confirm that maximum temperature is statistically significant for the chickpea yield. This implies that if maximum temperature increases by 1 °C, it can enhance the chickpea yield by 0.0463 units. However, the impact of precipitation was reported to be insignificant. Furthermore, humidity was statistically significant and has a positive association with chickpea yield. In the case of mung bean, the minimum temperature contributed significantly to the yield. This study concludes that temperature and humidity can significantly contribute to enhancing pulses yield. It is recommended that capacity building be provided to pulses growers so that they can adopt climate change strategies.
Moreover, the government may ensure the availability of climate change resistant varieties of pulses to encourage pulses cultivation.

Keywords: climate change, pulses productivity, agriculture, Pakistan

Procedia PDF Downloads 18
24639 Effects of the Different Recovery Durations on Some Physiological Parameters during 3 X 3 Small-Sided Games in Soccer

Authors: Samet Aktaş, Nurtekin Erkmen, Faruk Guven, Halil Taskin

Abstract:

This study aimed to determine the effects of 3 versus 3 small-sided games (SSG) with different recovery times on some physiological parameters in soccer players. Twelve soccer players from the Regional Amateur League volunteered for this study (mean±SD age, 20.50±2.43 years; height, 177.73±4.13 cm; weight, 70.83±8.38 kg). Subjects were performing soccer training five days per week. The protocol of the study was approved by the local ethics committee of the School of Physical Education and Sport, Selcuk University. The subjects were divided into teams of 3 players according to the Yo-Yo Intermittent Recovery Test. The field was 26 m wide and 34 m long. In random order, subjects performed two series of 3 bouts of 3-a-side SSGs, one with 3 min and one with 5 min recovery durations. Each set lasted 6 min. The percent of maximal heart rate (%HRmax), blood lactate concentration (LA) and Rated Perceived Exertion (RPE) scale points were collected before the SSGs and at the end of each set. Data were analyzed by analysis of variance (ANOVA) with repeated measures. Significant differences were found between %HRmax before the SSG and at the 1st, 2nd, and 3rd sets in both the SSG with 3 min recovery duration and the SSG with 5 min recovery duration (p<0.05). Means of %HRmax in the SSG with 3 min recovery duration at both the 1st and 2nd sets were significantly higher than in the SSG with 5 min recovery duration (p<0.05). No significant difference was found between sets of either SSG in terms of LA (p>0.05). LA in the SSG with 3 min recovery duration was higher than in the SSG with 5 min recovery duration at the 2nd set (p<0.05). RPE in soccer players was not different between SSGs (p>0.05). In conclusion, this study demonstrates that exercise intensity in SSGs with 3 min recovery durations is higher than in SSGs with 5 min recovery durations.

Keywords: small-sided games, soccer, heart rate, lactate

Procedia PDF Downloads 433
24638 Timely Detection and Identification of Abnormalities for Process Monitoring

Authors: Hyun-Woo Cho

Abstract:

The detection and identification of abnormalities in multivariate manufacturing processes are quite important in order to maintain good product quality. Unusual behaviors or events encountered during operation can have a serious impact on the process and product quality. Thus, they should be detected and identified as soon as possible. This paper focuses on the efficient representation of process measurement data in detecting and identifying abnormalities. This qualitative method is effective in representing fault patterns of process data. In addition, it is quite sensitive to measurement noise so that reliable outcomes can be obtained. To evaluate its performance, a simulation process was utilized, and the effect of adopting linear and nonlinear methods in the detection and identification was tested with different simulation data. The results show that the use of a nonlinear technique produced more satisfactory and more robust results for the simulation data sets. This monitoring framework can help operating personnel to detect the occurrence of process abnormalities and identify their assignable causes on an on-line or real-time basis.

Keywords: detection, monitoring, identification, measurement data, multivariate techniques

Procedia PDF Downloads 202
24637 Towards a Balancing Medical Database by Using the Least Mean Square Algorithm

Authors: Kamel Belammi, Houria Fatrim

Abstract:

An imbalanced data set, a problem often found in real-world applications, can have a seriously negative effect on the classification performance of machine learning algorithms. There have been many attempts at dealing with the classification of imbalanced data sets. In medical diagnosis classification, we often face an imbalanced number of data samples between the classes, in which there are not enough samples in rare classes. In this paper, we propose a learning method based on a cost-sensitive extension of the Least Mean Square (LMS) algorithm that penalizes errors of different samples with different weights, together with some rules of thumb to determine those weights. After the balancing phase, we apply different classifiers (support vector machine (SVM), k-nearest neighbor (KNN) and multilayer neural networks (MNN)) to the balanced data set. We also compare the results obtained before and after applying the balancing method.

Keywords: multilayer neural networks, k-nearest neighbor, support vector machine, imbalanced medical data, least mean square algorithm, diabetes
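
A cost-sensitive LMS update of the kind described here can be sketched in plain Python. This is an illustration, not the authors' algorithm: the rule of thumb used for the costs (inverse class frequency), the learning rate, the number of epochs, and the toy one-dimensional data are all assumptions, since the abstract does not give these details.

```python
from typing import Dict, List, Sequence


def class_costs(labels: Sequence[int]) -> Dict[int, float]:
    """Assumed rule of thumb: cost inversely proportional to class frequency."""
    n, classes = len(labels), set(labels)
    return {c: n / (len(classes) * sum(1 for y in labels if y == c)) for c in classes}


def train_weighted_lms(xs: List[List[float]], ys: List[int],
                       lr: float = 0.05, epochs: int = 200) -> List[float]:
    """LMS (Widrow-Hoff) training where each error is scaled by its class cost."""
    costs = class_costs(ys)
    w = [0.0] * (len(xs[0]) + 1)  # the last weight is the bias
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            xb = list(x) + [1.0]
            err = y - sum(wi * xi for wi, xi in zip(w, xb))
            step = lr * costs[y] * err  # rare-class errors move w further
            w = [wi + step * xi for wi, xi in zip(w, xb)]
    return w


def predict(w: List[float], x: Sequence[float]) -> int:
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical imbalanced data: six samples of class -1, two of class +1.
xs = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [1.0], [1.2]]
ys = [-1, -1, -1, -1, -1, -1, 1, 1]
costs = class_costs(ys)
w = train_weighted_lms(xs, ys)
print(costs, [predict(w, x) for x in xs])
```

With these costs the two rare-class samples carry the same total weight as the six majority-class samples, so the fitted boundary is not pulled toward the minority class.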

Procedia PDF Downloads 499
24636 Process Data-Driven Representation of Abnormalities for Efficient Process Control

Authors: Hyun-Woo Cho

Abstract:

Unexpected operational events or abnormalities in industrial processes have a serious impact on the quality of the final product. In terms of statistical process control, fault detection and diagnosis is one of the essential tasks needed to run a process safely. In this work, a nonlinear representation of process measurement data is presented and evaluated using a simulation process. The effect of using different representation methods on diagnosis performance is tested in terms of computational efficiency and data handling. The results show that the nonlinear representation technique produced more reliable diagnosis results and outperformed linear methods. The use of a data filtering step improved computational speed and diagnosis performance for the test data sets. The presented scheme differs from existing ones in that it attempts to extract the fault pattern in the reduced space, not in the original process variable space. Thus, this scheme helps to reduce the sensitivity of empirical models to noise.

Keywords: fault diagnosis, nonlinear technique, process data, reduced spaces

Procedia PDF Downloads 222
24635 Using Genetic Algorithms and Rough Set Based Fuzzy K-Modes to Improve Centroid Model Clustering Performance on Categorical Data

Authors: Rishabh Srivastav, Divyam Sharma

Abstract:

We propose an algorithm for clustering categorical data, named ‘Genetic algorithm initialized rough set based fuzzy K-Modes for categorical data’. We propose an amalgamation of the simple K-modes algorithm, the rough and fuzzy set based K-modes, and the genetic algorithm to form a new algorithm which, we hypothesise, will provide better centroid model clustering results than existing standard algorithms. In the proposed algorithm, the initialization and updating of modes are done using genetic algorithms, while the membership values are calculated using rough sets and fuzzy logic.

Keywords: categorical data, fuzzy logic, genetic algorithm, K modes clustering, rough sets

Procedia PDF Downloads 213
24634 Implementation of Algorithm K-Means for Grouping District/City in Central Java Based on Macro Economic Indicators

Authors: Nur Aziza Luxfiati

Abstract:

Clustering is the partitioning of data sets into subsets or groups in such a way that elements sharing certain properties have a high level of similarity within one group and a low level of similarity between groups. The K-Means algorithm is one of the clustering algorithms most widely used as a grouping tool in scientific and industrial applications, because the basic idea of the k-means algorithm is very simple. This research applies clustering with the k-means algorithm as a method of addressing the problem of national development imbalances between regions in Central Java Province based on macroeconomic indicators. The data sample used is secondary data obtained from the Central Java Provincial Statistics Agency regarding macroeconomic indicators, which is part of the publication of the 2019 National Socio-Economic Survey (Susenas) data. Outliers are detected using the z-score, and the number of clusters (k) is determined using the elbow method. After the clustering process is carried out, the result is validated using the Between-Class Variation (BCV) and Within-Class Variation (WCV) methods. The results showed that outlier detection using z-score normalization found no outliers. In addition, the clustering test yielded a ratio value that was not high, namely 0.011%. There are two district/city clusters in Central Java Province which have economic similarities based on the variables used, namely the first cluster, with a high economic level, consisting of 13 districts/cities, and the second cluster, with a low economic level, consisting of 22 districts/cities. Within the second (low-economy) cluster, the authors grouped districts/cities based on similarities in macroeconomic indicators: 20 districts by Gross Regional Domestic Product, 19 districts by Poverty Depth Index, 5 districts by Human Development, and 10 districts by Open Unemployment Rate.
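
The z-score screening and the BCV/WCV validation mentioned in this abstract can be sketched as follows. The definitions used are one common convention and an assumption here: WCV is taken as the sum of squared distances of points to their cluster centroid, BCV as the sum of distances between cluster centroids, and an outlier as a value with |z| > 3. The data are invented, and the k-means step itself is omitted.

```python
from math import dist
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def z_normalize(column: Sequence[float]) -> List[float]:
    """Standardize one indicator to zero mean and unit (population) deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]


def centroid(points: Sequence[Point]) -> Point:
    return tuple(mean(axis) for axis in zip(*points))


def wcv(clusters: Sequence[Sequence[Point]]) -> float:
    """Within-cluster variation: squared distances of points to their centroid."""
    return sum(dist(p, centroid(c)) ** 2 for c in clusters for p in c)


def bcv(clusters: Sequence[Sequence[Point]]) -> float:
    """Between-cluster variation: distances between every pair of centroids."""
    cents = [centroid(c) for c in clusters]
    return sum(dist(a, b) for i, a in enumerate(cents) for b in cents[i + 1:])


# Hypothetical indicator values and a hypothetical two-cluster result.
z = z_normalize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
outliers = [v for v in z if abs(v) > 3]  # assumed threshold of 3
clusters = [[(0.0, 0.0), (0.0, 2.0)], [(4.0, 0.0), (4.0, 2.0)]]
ratio = bcv(clusters) / wcv(clusters)
print(z, outliers, ratio)
```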

Keywords: clustering, K-Means algorithm, macroeconomic indicators, inequality, national development

Procedia PDF Downloads 131
24633 Collision Detection Algorithm Based on Data Parallelism

Authors: Zhen Peng, Baifeng Wu

Abstract:

Modern computing technology has entered the era of parallel computing, with a trend toward sustainable and scalable parallelism. Single Instruction Multiple Data (SIMD) is an important way to follow this trend: it can gather more and more computing power by increasing the number of processor cores without the need to modify the program. Meanwhile, in scientific computing and engineering design, many computation-intensive applications face the challenge of increasingly large amounts of data. Data-parallel computing will be an important way to further improve the performance of these applications. In this paper, we take accurate collision detection in building information modeling as an example and demonstrate a model for constructing a data-parallel algorithm. According to the model, a complex object is decomposed into sets of simple objects, and collision detection among complex objects is converted into collision detection among simple objects. The resulting algorithm is a typical SIMD algorithm, and its advantages in parallelism and scalability are unmatched by traditional algorithms.
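The decomposition idea can be sketched as follows: each complex object is a set of simple objects, and the same small test is applied independently to every pair. This is a minimal sketch under assumptions, not the paper's algorithm: the abstract does not say which simple objects are used, so axis-aligned boxes are a hypothetical choice, and the `map` call only stands in for the uniform per-pair kernel that real SIMD hardware would execute in parallel.

```python
from itertools import product


def boxes_overlap(a, b):
    """Test two axis-aligned boxes given as (min_x, min_y, min_z, max_x, max_y, max_z).

    Boxes that merely touch are counted as overlapping.
    """
    return all(a[i] <= b[i + 3] and b[i] <= a[i + 3] for i in range(3))


def colliding_pairs(obj_a, obj_b):
    """Apply the same box test to every (simple, simple) pair of two complex objects."""
    pairs = list(product(obj_a, obj_b))
    # Every pair is tested independently with identical instructions,
    # which is what makes the problem data parallel.
    hits = map(lambda pair: boxes_overlap(*pair), pairs)
    return [pair for pair, hit in zip(pairs, hits) if hit]


def collide(obj_a, obj_b):
    """Two complex objects collide if any pair of their simple parts overlaps."""
    return len(colliding_pairs(obj_a, obj_b)) > 0
```

Because no pair depends on another, the list of pairs can be split across any number of cores without changing the per-pair code.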

Keywords: data parallelism, collision detection, single instruction multiple data, building information modeling, continuous scalability

Procedia PDF Downloads 259
24632 Data Augmentation for Automatic Graphical User Interface Generation Based on Generative Adversarial Network

Authors: Xulu Yao, Moi Hoon Yap, Yanlong Zhang

Abstract:

As a branch of artificial neural networks, deep learning is widely used in image recognition, but a lack of data leads to imperfect model learning. An analysis of the data-scale requirements of deep learning, aimed at its application in GUI generation, shows that collecting a GUI dataset is time-consuming and labor-intensive, which makes it difficult to meet the needs of current deep learning networks. To solve this problem, this paper proposes a semi-supervised deep learning model that relies on the original small-scale dataset to produce a large amount of reliable data. By combining a recurrent neural network with a generative adversarial network, the recurrent network learns the sequence relationships and characteristics of the data so that the generative adversarial network generates reasonable data, which is then used to expand the Rico dataset. Relying on this network structure, the characteristics of the collected data can be analysed well, and a large amount of reasonable data can be generated according to these characteristics. After data processing, a reliable dataset for model training can be formed, which alleviates the problem of dataset shortage in deep learning.

Keywords: GUI, deep learning, GAN, data augmentation

Procedia PDF Downloads 150
24631 New Two-Way Map-Reduce Join Algorithm: Hash Semi Join

Authors: Marwa Hussein Mohamed, Mohamed Helmy Khafagy, Samah Ahmed Senbel

Abstract:

MapReduce is a programming model used to process massive data sets. With data sizes growing rapidly, analyzing big data is one of today's most important issues. MapReduce is used to analyze data and extract useful information through two simple functions, map and reduce, which are the only parts written by the programmer; the framework provides load balancing, fault tolerance, and high scalability. Join is one of the most important operations in data analysis, but MapReduce does not directly support it. This paper explains two two-way MapReduce join algorithms, semi-join and per-split semi-join, and proposes a new algorithm, hash semi-join, which uses a hash table to increase performance by eliminating unused records as early as possible and, in the second phase, applies the join using the hash table rather than using the map function to match the join key against the other data table. Using the hash table does not affect memory size, because only the matched records from the second table are saved. Our experimental results show that hash semi-join with a hash table outperforms the other two algorithms as the data size increases from 10 million to 500 million records, and that running time increases with the number of joined records between the two tables.
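The core idea, filtering the second table by the first table's join keys and keeping only the matched records in a hash table, can be illustrated in memory with plain Python. This is a rough sketch under assumptions rather than the paper's MapReduce job: the three-step split, the row-as-dictionary representation, and the name `hash_semi_join` are all hypothetical, and no Hadoop API is involved.

```python
from collections import defaultdict


def hash_semi_join(left, right, key):
    """Join two lists of row dictionaries on `key` using a semi-join filter."""
    # Step 1: collect the distinct join keys of the first table.
    left_keys = {row[key] for row in left}

    # Step 2: scan the second table once and keep only rows whose key
    # appears in the first table; unmatched rows are dropped early.
    matched = defaultdict(list)
    for row in right:
        if row[key] in left_keys:
            matched[row[key]].append(row)

    # Step 3: probe the hash table with each row of the first table.
    joined = []
    for left_row in left:
        for right_row in matched.get(left_row[key], []):
            joined.append({**right_row, **left_row})
    return joined
```

Only the rows stored in `matched` occupy memory, which mirrors the abstract's point that saving just the matched records of the second table keeps the hash table small.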

Keywords: map reduce, hadoop, semi join, two way join

Procedia PDF Downloads 486
24630 Activity Data Analysis for Status Classification Using Fitness Trackers

Authors: Rock-Hyun Choi, Won-Seok Kang, Chang-Sik Son

Abstract:

Physical activity is important for healthy living. Wearable devices that motivate physical activity have recently developed quickly and become cheaper and more comfortable. In particular, fitness trackers provide a variety of information and need to present well-analyzed, user-friendly results. In this study, frequency analysis was performed to classify various Fitbit data sets into simple activity statuses. The data, taken from the Fitbit cloud server, cover 263 subjects, healthy factory and office workers in Korea, from March 7th to April 30th, 2016. The results suggest that the assumptions behind the activity status classification are sufficient and reasonable.

Keywords: activity status, fitness tracker, heart rate, steps

Procedia PDF Downloads 357
24629 The Effects of Passive and Active Recoveries on Responses of Platelet Indices and Hemodynamic Variables to Resistance Exercise

Authors: Mohammad Soltani, Sajad Ahmadizad, Fatemeh Hoseinzadeh, Atefe Sarvestan

Abstract:

Exercise recovery is an important variable in designing resistance exercise training. This study determined the effects of passive and active recoveries on the responses of platelet indices and hemodynamic variables to resistance exercise. Twelve healthy subjects (six men and six women; age 25.4 ± 2.5 yrs) performed two resistance exercise protocols (six exercises covering upper- and lower-body parts) in two separate sessions one week apart. The first protocol included three sets of six repetitions at 80% of 1RM with 2 min of passive rest between sets and exercises, while the second protocol included three sets of six repetitions at 60% of 1RM, each followed by an active recovery of six repetitions of the same exercise at 20% of 1RM. The exercise volume was equalized. Three blood samples were taken, before exercise, immediately after exercise, and after 1 hour of recovery, and analyzed for fibrinogen and platelet indices. Blood pressure (BP), heart rate (HR), and rate pressure product (RPP) were measured before exercise, immediately after exercise, and every 5 minutes during recovery. Data analyses showed a significant increase in systolic blood pressure (SBP), HR, RPP, and PLT in response to resistance exercise (P<0.05), and the changes in HR and RPP differed significantly between the two protocols (P<0.05). Furthermore, MPV and P_LCR did not change in response to resistance exercise, though significant reductions were observed after 1 h of recovery compared to before and after exercise (P<0.05). No significant changes in fibrinogen and PDW were observed following the two resistance exercise protocols (P>0.05). In addition, no significant differences in platelet indices were found between the two protocols (P>0.05). Resistance exercise induces changes in platelet indices and hemodynamic variables; these changes are not related to the type of recovery, and the variables return to normal levels after 1 h of recovery.

Keywords: hemodynamic variables, platelet indices, resistance exercise, recovery intensity

Procedia PDF Downloads 107
24628 Application of Unconventional Materials for ‘Statement Jewellery’

Authors: Shaleni Bajpai, V. Niveditha

Abstract:

A fashion accessory is a product used to contribute, in a secondary manner, to the wearer’s outfit. The term came into use in the 19th century, and accessories are specifically chosen to complement the wearer’s look. The aim of the project was to introduce unconventional materials for statement jewellery. The materials used were waste CDs and scrap fabric. These materials were combined with traditional raw materials such as beads, sequins, charms and chains to form unique jewellery sets. The sets were divided into two categories based on the type of raw material used: Category 1, Clef-CD Jewellery, and Category 2, Crumb-Fabric Jewellery. Each jewellery set consisted of a necklace, a pair of earrings, a ring and a bracelet.

Keywords: statement jewellery, unconventional, crumb fabric, CDs

Procedia PDF Downloads 233
24627 Acute Effects of Local Vibration on Muscle Activation, Metabolic and Hormone Responses

Authors: Zong Yan Cai, Wen-Chyuan Chen, Chih-Min Wu

Abstract:

The purpose of this study was to investigate the acute effects of local vibration (LV) on muscle activation and on metabolic and hormone responses. A total of 12 healthy, physically inactive male adults participated in this study and completed an LV exercise session. During the session, four custom-made vibration devices (diameter: 20 mm; thickness: 8 mm; weight: 0.022 g) were placed locally over the belly of the thigh of each subject’s non-dominant leg in a supine lying position, and subjects received 10 sets of 1 min at a frequency of 35-40 Hz, with 1–2 min of rest between sets. Surface electromyography (EMG) signals were obtained from the vastus medialis and rectus femoris, and the subjects’ rating of perceived exertion (RPE) and heart rate (HR) were measured. EMG data, RPE values, and HR were obtained by averaging the results of the 10 sets of each exercise session. Blood samples were drawn before exercise, immediately after exercise, and 15 min and 30 min after exercise in each session for analysis of lactic acid (LA), growth hormone (GH), testosterone (T) and cortisol (C). The results indicated that HR did not increase after LV (63.18±3.5 to 63.25±2.58 beats/min, p > 0.05). The average RPE value during the LV exposure was 2.86±0.39. The root mean square % EMG values from the vastus medialis and rectus femoris were 19.02±2.19 and 8.25±2.20, respectively. There were no significant differences in LA, GH, and T values after acute LV exercise compared with baseline values (LA: 0.68±0.11 to 0.7±0.1 mmol/L; GH: 0.06±0.05 to 0.57±0.27 ng/mL; T: 551.33±46.62 to 520.42±43.78 ng/dL, p>0.05). However, the LV treatment caused a significant decrease in C values after exercise (16.56±1.05 to 11.64±1.85 nmol/L, p<0.05). In conclusion, acute LV exercise only slightly increases muscle activation, which may not produce an effective exercise response. However, acute LV exercise reduces the C level, which may reduce the catabolic response. The probable reason may partly be that the vibration rhythmically massages the muscles.

Keywords: cortisol, growth hormone, lactic acid, testosterone

Procedia PDF Downloads 249
24626 Analyzing Environmental Emotive Triggers in Terrorist Propaganda

Authors: Travis Morris

Abstract:

The purpose of this study is to measure the intersection of environmental security entities in terrorist propaganda. To the best of the author’s knowledge, this is the first study of its kind to examine this intersection within terrorist propaganda. Rosoka natural language processing software and frame analysis are used to advance our understanding of how environmental frames function as emotive triggers. Violent jihadi demagogues use frames to suggest violent and non-violent solutions to their grievances. Emotive triggers are framed in a way that leverages individual and collective attitudes in psychological warfare. A comparative research design is used because of the differences and similarities that exist between two variants of violent jihadi propaganda that target western audiences. Analysis is based on salience and network text analysis, which generates violent jihadi semantic networks. Findings indicate that environmental frames are used as emotive triggers across both data sets, but also as tactical and information data points. A significant finding is that certain core environmental emotive triggers like “water,” “soil,” and “trees” are significantly salient at the aggregate level across both data sets. All environmental entities can be classified into two categories, symbolic and literal. Importantly, this research illustrates how demagogues use environmental emotive triggers in cyberspace from a subcultural perspective to mobilize target audiences to their ideology and praxis. Understanding the anatomy of propaganda construction is necessary in order to generate effective counter-narratives in information operations. This research advances an additional method to inform practitioners and policy makers of how environmental security and propaganda intersect.

Keywords: propaganda analysis, emotive triggers, environmental security, frames

Procedia PDF Downloads 114
24625 Location-Domination on Join of Two Graphs and Their Complements

Authors: Analen Malnegro, Gina Malacas

Abstract:

Dominating sets and related topics have been studied extensively in the past few decades. A dominating set of a graph G with vertex set V is a subset D of V such that every vertex not in D is adjacent to at least one member of D. The domination number γ(G) is the number of vertices in a smallest dominating set for G. Some problems involving detection devices can be modeled with graphs. Finding the minimum number of devices needed, according to the type of devices and the necessity of locating the object, gives rise to locating-dominating sets. A subset S of vertices of a graph G is called a locating-dominating set, or LD-set for short, if it is a dominating set and every vertex v not in S is uniquely determined by the set of neighbors of v belonging to S. The location-domination number λ(G) is the minimum cardinality of an LD-set for G. The complement of a graph G is a graph Ḡ on the same vertices such that two distinct vertices of Ḡ are adjacent if and only if they are not adjacent in G. An LD-set of a graph G is global if it is an LD-set of both G and its complement Ḡ. The global location-domination number λg(G) is defined as the minimum cardinality of a global LD-set of G. In this paper, global LD-sets on the join of two graphs are characterized. Global location-domination numbers of these graphs are also determined.
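The definitions above translate directly into a brute-force check for small graphs. The sketch below is only an illustration of the definitions, not the characterization obtained in the paper: it assumes a graph stored as a dictionary mapping each vertex to the set of its neighbors, and the function names (`is_ld_set`, `is_global_ld_set`, `min_size`, `join`) are hypothetical. The exhaustive search is exponential and meant for tiny examples only.

```python
from itertools import combinations


def complement(graph):
    """Complement graph: same vertices, adjacency flipped for distinct vertices."""
    vertices = set(graph)
    return {v: vertices - graph[v] - {v} for v in graph}


def join(g, h):
    """Join G + H: disjoint union plus every edge between a G-vertex and an H-vertex."""
    g_vertices = {("g", v) for v in g}
    h_vertices = {("h", v) for v in h}
    joined = {}
    for v in g:
        joined[("g", v)] = {("g", u) for u in g[v]} | h_vertices
    for v in h:
        joined[("h", v)] = {("h", u) for u in h[v]} | g_vertices
    return joined


def is_ld_set(graph, s):
    """S is an LD-set if every vertex outside S has a non-empty, unique set of neighbors in S."""
    s = set(s)
    seen = set()
    for v in graph:
        if v in s:
            continue
        trace = frozenset(graph[v] & s)
        if not trace or trace in seen:  # not dominated, or not uniquely located
            return False
        seen.add(trace)
    return True


def is_global_ld_set(graph, s):
    """S is a global LD-set if it is an LD-set of both G and its complement."""
    return is_ld_set(graph, s) and is_ld_set(complement(graph), s)


def min_size(graph, predicate):
    """Smallest cardinality of a vertex subset satisfying `predicate` (exhaustive search)."""
    for size in range(len(graph) + 1):
        for candidate in combinations(graph, size):
            if predicate(graph, candidate):
                return size
```

For example, `min_size(graph, is_ld_set)` computes λ(G) and `min_size(graph, is_global_ld_set)` computes λg(G) for a small graph such as the join of two single-vertex graphs.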

Keywords: dominating set, global locating-dominating set, global location-domination number, locating-dominating set, location-domination number

Procedia PDF Downloads 156