Search results for: evolutionary algorithms (EA's)
537 PaSA: A Dataset for Patent Sentiment Analysis to Highlight Patent Paragraphs
Authors: Renukswamy Chikkamath, Vishvapalsinhji Ramsinh Parmar, Christoph Hewel, Markus Endres
Abstract:
Given a patent document, identifying distinct semantic annotations is an interesting research aspect. Text annotation helps patent practitioners such as examiners and patent attorneys to quickly identify the key arguments of any invention, successively providing a timely marking of a patent text. In manual patent analysis, recognising the semantic information by marking paragraphs is common practice to attain better readability. This semantic annotation process is laborious and time-consuming. To alleviate this problem, we propose a dataset to train machine learning algorithms to automate the highlighting process. The contributions of this work are: i) we developed a multi-class dataset of 150k samples by traversing USPTO patents over a decade, ii) we articulated statistics and distributions of the data using exploratory data analysis, iii) we developed baseline machine learning models that use the dataset to address the patent paragraph highlighting task, and iv) we outline a future path to extend this work using deep learning and domain-specific pre-trained language models to develop a highlighting tool. This work assists patent practitioners in highlighting semantic information automatically and aids in creating a sustainable and efficient patent analysis using the aptitude of machine learning.
Keywords: machine learning, patents, patent sentiment analysis, patent information retrieval
Procedia PDF Downloads 90
536 Test Suite Optimization Using an Effective Meta-Heuristic BAT Algorithm
Authors: Anuradha Chug, Sunali Gandhi
Abstract:
Regression testing is a very expensive and time-consuming process carried out to ensure the validity of modified software. Because resources are insufficient to re-execute all test cases in a time-constrained environment, efforts are under way to generate test data automatically without human effort. Many search-based techniques have been proposed to generate efficient, effective and optimized test data so that the overall cost of software testing can be minimized. The generated test data should be able to uncover all potential lapses that exist in the software or product. Inspired by the natural behavior of bats searching for food sources, the current study employs a meta-heuristic, search-based bat algorithm to optimize the test data on the basis of certain parameters without compromising their effectiveness. Mathematical functions are also applied that can effectively filter out redundant test data. As many as 50 Java programs were used to check the effectiveness of the proposed test data generation, and it was found that an 86% saving in testing effort can be achieved using the bat algorithm while covering 100% of the software code for testing. The bat algorithm was found to be more efficient in terms of simplicity and flexibility when the results were compared with other nature-inspired algorithms such as the Firefly Algorithm (FA), Hill Climbing Algorithm (HC) and Ant Colony Optimization (ACO). The output of this study would be useful to testers, as they can achieve 100% path coverage with a minimum number of test cases.
Keywords: regression testing, test case selection, test case prioritization, genetic algorithm, bat algorithm
Procedia PDF Downloads 380
535 Aerodynamic Modelling of Unmanned Aerial System through Computational Fluid Dynamics: Application to the UAS-S45 Balaam
Authors: Maxime A. J. Kuitche, Ruxandra M. Botez, Arthur Guillemin
Abstract:
As Unmanned Aerial Systems have found diverse uses in both military and civil aviation, interest in obtaining an accurate aerodynamic model has grown enormously. Recent modeling techniques are procedures using optimization algorithms and statistics that require many flight tests and are therefore extremely demanding in terms of costs. This paper presents a procedure to estimate the aerodynamic behavior of an unmanned aerial system from a numerical approach using computational fluid dynamics analysis. The study was performed using an unstructured mesh obtained from a grid convergence analysis at a Mach number of 0.14 and an angle of attack of 0°. The flow around the aircraft was described using a standard k-ω turbulence model, and the Reynolds-Averaged Navier-Stokes (RANS) equations were solved using ANSYS FLUENT software. The method was applied to the UAS-S45 designed and manufactured by Hydra Technologies in Mexico. The lift, drag, and pitching moment coefficients were obtained at different angles of attack for several flight conditions defined in terms of altitudes and Mach numbers. The results obtained from the Computational Fluid Dynamics analysis were compared with the results obtained by using the DATCOM semi-empirical procedure. This comparison indicated that our approach is highly accurate and that the aerodynamic model obtained could be useful to estimate the flight dynamics of the UAS-S45.
Keywords: aerodynamic modelling, CFD Analysis, ANSYS FLUENT, UAS-S45
Procedia PDF Downloads 375
534 Constructions of Linear and Robust Codes Based on Wavelet Decompositions
Authors: Alla Levina, Sergey Taranov
Abstract:
The classical approach to providing noise immunity and integrity for information processed in computing devices and communication channels is to use linear codes. Linear codes have fast and efficient encoding and decoding algorithms, but these codes concentrate their detection and correction abilities on certain error configurations. Robust codes can protect against any configuration of errors with a predetermined probability. This is accomplished by using perfect nonlinear and almost perfect nonlinear functions to calculate the code redundancy. The paper presents an error-correcting coding scheme using the biorthogonal wavelet transform. The wavelet transform is applied in various fields of science; its applications include removing noise from signals, data compression, and spectral analysis of signal components. The article suggests methods for constructing linear codes based on wavelet decomposition. For the developed constructions we build generator and check matrices that contain the scaling function coefficients of the wavelet. Based on linear wavelet codes we develop robust codes that provide uniform protection against all errors. In the article we propose two constructions of robust codes. The first class of robust codes is based on the multiplicative inverse in a finite field. In the second robust code construction the redundancy part is the cube of the information part. The paper also investigates the characteristics of the proposed robust and linear codes.
Keywords: robust code, linear code, wavelet decomposition, scaling function, error masking probability
Procedia PDF Downloads 489
533 Unsupervised Feature Learning by Pre-Route Simulation of Auto-Encoder Behavior Model
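The second construction mentioned above (redundancy equal to the cube of the information part) can be illustrated with a minimal sketch. The field GF(2^4), the reduction polynomial x^4 + x + 1, and all function names are assumptions chosen for illustration; the abstract does not state the field size, and the wavelet-based linear stage is not modelled here.

```python
# Illustrative sketch only: a cube-based robust code over GF(2^4).
# Field size and reduction polynomial are assumptions, not taken from the paper.

def gf_mul(a, b, poly=0b10011, bits=4):
    """Multiply two elements of GF(2^bits) modulo the given irreducible polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^n) is XOR
        b >>= 1
        a <<= 1
        if a >> bits:            # degree overflow: reduce modulo poly
            a ^= poly
    return result

def encode(x):
    """Codeword = (information part, redundancy part), redundancy = x cubed."""
    return x, gf_mul(gf_mul(x, x), x)

def is_codeword(x, r):
    """Check whether the pair (x, r) satisfies r = x^3."""
    return encode(x)[1] == r

def masked_count(ex, er, bits=4):
    """Count codewords for which the additive error (ex, er) goes undetected."""
    return sum(is_codeword(x ^ ex, encode(x)[1] ^ er) for x in range(1 << bits))

# Worst case over all non-zero errors: how many of the 16 codewords mask it.
worst = max(masked_count(ex, er)
            for ex in range(16) for er in range(16) if (ex, er) != (0, 0))
print("worst-case masked codewords:", worst, "of 16")
```

In this sketch no non-zero error is hidden by more than a small fraction of the codewords, whereas in a linear code an error that is itself a codeword is hidden by every codeword. That contrast is what the abstract's "uniform protection against all errors" refers to.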
Authors: Youngjae Jin, Daeshik Kim
Abstract:
This paper describes cycle-accurate simulation results of weight values learned by an auto-encoder behavior model in terms of pre-route simulation. Given the results, we visualized the first-layer representations with natural images. Many common deep learning threads have focused on learning high-level abstractions of unlabeled raw data by unsupervised feature learning. However, in the process of handling such a huge amount of data, the computational complexity and running time of the learning methods limited advanced research. These limitations came from the fact that these algorithms were computed using only single-core CPUs. For this reason, parallel hardware such as FPGAs was seen as a possible solution to overcome these limitations. We adopted and simulated a ready-made auto-encoder to design a behavior model in Verilog HDL before designing hardware. With the auto-encoder behavior model pre-route simulation, we obtained cycle-accurate results for the parameters of each hidden layer using MODELSIM. Cycle-accurate results are a very important factor in designing parallel digital hardware. Finally, this paper shows the appropriate operation of behavior-model-based pre-route simulation. Moreover, we visualized the learned latent representations of the first hidden layer with the Kyoto natural image dataset.
Keywords: auto-encoder, behavior model simulation, digital hardware design, pre-route simulation, Unsupervised feature learning
Procedia PDF Downloads 446
532 The Messy and Irregular Experience of Entrepreneurial Life
Authors: Hannah Dean
Abstract:
The growth ideology, and its association with progress, is an important construct in the narrative of modernity. This ideology is embedded in neoclassical economic growth theory which conceptualises growth as linear and predictable, and the entrepreneur as a rational economic manager. This conceptualisation has been critiqued for reinforcing the managerial discourse in entrepreneurship studies. Despite these critiques, both the neoclassical growth theory and its adjacent managerial discourse dominate entrepreneurship studies notably the literature on female entrepreneurs. The latter is the focus of this paper. Given this emphasis on growth, female entrepreneurs are portrayed as problematic because their growth lags behind their male counterparts. This image which ignores the complexity and diversity of female entrepreneurs’ experience persists in the literature due to the lack of studies that analyse the process and contextual factors surrounding female entrepreneurs’ experience. This study aims to address the subordination of female entrepreneurs by questioning the hegemonic logic of economic growth and the managerial discourse as a true representation for the entrepreneurial experience. This objective is achieved by drawing on Schumpeter’s theorising and narrative inquiry. This exploratory study undertakes in depth interviews to gain insights into female entrepreneurs’ experience and the impact of the economic growth model and the managerial discourse on their performance. The narratives challenge a number of assumptions about female entrepreneurs. The participants occupied senior positions in the corporate world before setting up their businesses. This is at odds with much writing which assumes that women underperform because they leave their career without gaining managerial experience to achieve work-life balance. In line with Schumpeter, who distinguishes the entrepreneur from the manager, the participants’ main function was innovation. 
They did not believe that the managerial paradigm governing their corporate careers was applicable to their entrepreneurial experience. Formal planning and managerial rationality can hinder their decision-making process. The narratives point to the gap between the two worlds, which makes stepping into entrepreneurship a scary move. Schumpeter argues that the entrepreneurial process is evolutionary and that failure is an integral part of it. The participants’ entrepreneurial process was in fact irregular. The performance of new combinations was not always predictable. They therefore relied on their initiative. The inhibition to deploy these traits had an adverse effect on business growth. The narratives also indicate that over-reliance on growth threatens business survival as it faces competing pressures. The study offers theoretical and empirical contributions to (female) entrepreneurship studies by presenting Schumpeter’s theorising as an alternative theoretical framework to the neoclassical economic growth theory. The study also reduces entrepreneurs’ vulnerability by making them aware of the negative influence that the linear growth model and the managerial discourse have on their performance. The study has implications for policy makers as it generates new knowledge that incorporates the current social and economic changes in the context of entrepreneurs that can no longer be sustained by the linear growth models, especially in the current economic climate.
Keywords: economic growth, female entrepreneurs, managerial discourse, Schumpeter
Procedia PDF Downloads 296
531 Development of pm2.5 Forecasting System in Seoul, South Korea Using Chemical Transport Modeling and ConvLSTM-DNN
Authors: Ji-Seok Koo, Hee‑Yong Kwon, Hui-Young Yun, Kyung-Hui Wang, Youn-Seo Koo
Abstract:
This paper presents a forecasting system for PM2.5 levels in Seoul, South Korea, leveraging a combination of chemical transport modeling and ConvLSTM-DNN machine learning technology. Exposure to PM2.5 has known detrimental impacts on public health, making its prediction crucial for establishing preventive measures. Existing forecasting models, like the Community Multiscale Air Quality (CMAQ) and Weather Research and Forecasting (WRF), are hindered by their reliance on uncertain input data, such as anthropogenic emissions and meteorological patterns, as well as certain intrinsic model limitations. The system we've developed specifically addresses these issues by integrating machine learning and using carefully selected input features that account for local and distant sources of PM2.5. In South Korea, the PM2.5 concentration is greatly influenced by both local emissions and long-range transport from China, and our model effectively captures these spatial and temporal dynamics. Our PM2.5 prediction system combines the strengths of advanced hybrid machine learning algorithms, ConvLSTM and DNN, to improve upon the limitations of the traditional CMAQ model. Data used in the system include forecasted information from CMAQ and WRF models, along with actual PM2.5 concentration and weather variable data from monitoring stations in China and South Korea. The system was implemented specifically for Seoul's PM2.5 forecasting.
Keywords: PM2.5 forecast, machine learning, convLSTM, DNN
Procedia PDF Downloads 54
530 Hybrid GNN Based Machine Learning Forecasting Model For Industrial IoT Applications
Authors: Atish Bagchi, Siva Chandrasekaran
Abstract:
Background: According to World Bank national accounts data, the estimated global manufacturing value-added output in 2020 was 13.74 trillion USD. These manufacturing processes are monitored, modelled, and controlled by advanced, real-time, computer-based systems, e.g., Industrial IoT, PLC, SCADA, etc. These systems measure and manipulate a set of physical variables, e.g., temperature, pressure, etc. Despite the use of IoT, SCADA etc., in manufacturing, studies suggest that unplanned downtime leads to economic losses of approximately 864 billion USD each year. Therefore, real-time, accurate detection, classification and prediction of machine behaviour are needed to minimise financial losses. Although vast literature exists on time-series data processing using machine learning, the challenges faced by the industries that lead to unplanned downtimes are: (i) The current algorithms do not efficiently handle the high-volume streaming data from industrial IoT sensors and were tested on static and simulated datasets. (ii) While the existing algorithms can detect significant 'point' outliers, most do not handle contextual outliers (e.g., values within normal range but happening at an unexpected time of day) or subtle changes in machine behaviour. (iii) Machines are revamped periodically as part of planned maintenance programmes, which change the assumptions on which original AI models were created and trained. Aim: This research study aims to deliver a Graph Neural Network (GNN) based hybrid forecasting model that interfaces with the real-time machine control system and can detect and predict machine behaviour and behavioural changes (anomalies) in real-time. This research will help manufacturing industries and utilities, e.g., water, electricity etc., reduce unplanned downtimes and consequential financial losses.
Method: The data stored within a process control system, e.g., Industrial-IoT, Data Historian, is generally sampled during data acquisition from the sensor (source) and when persisting in the Data Historian to optimise storage and query performance. The sampling may inadvertently discard values that might contain subtle aspects of behavioural changes in machines. This research proposed a hybrid forecasting and classification model which combines the expressive and extrapolation capability of GNN enhanced with the estimates of entropy and spectral changes in the sampled data and additional temporal contexts to reconstruct the likely temporal trajectory of machine behavioural changes. The proposed real-time model belongs to the Deep Learning category of machine learning and interfaces with the sensors directly or through 'Process Data Historian', SCADA etc., to perform forecasting and classification tasks. Results: The model was interfaced with a Data Historian holding time-series data from 4 flow sensors within a water treatment plant for 45 days. The recorded sampling interval for a sensor varied from 10 sec to 30 min. Approximately 65% of the available data was used for training the model, 20% for validation, and the rest for testing. The model identified the anomalies within the water treatment plant and predicted the plant's performance. These results were compared with the data reported by the plant SCADA-Historian system and the official data reported by the plant authorities. The model's accuracy was much higher (20%) than that reported by the SCADA-Historian system and matched the validated results declared by the plant auditors. Conclusions: The research demonstrates that a hybrid GNN based approach enhanced with entropy calculation and spectral information can effectively detect and predict a machine's behavioural changes.
The model can interface with a plant's 'process control system' in real-time to perform forecasting and classification tasks to aid the asset management engineers to operate their machines more efficiently and reduce unplanned downtimes. A series of trials are planned for this model in the future in other manufacturing industries.
Keywords: GNN, Entropy, anomaly detection, industrial time-series, AI, IoT, Industry 4.0, Machine Learning
Procedia PDF Downloads 150
529 A Hierarchical Method for Multi-Class Probabilistic Classification Vector Machines
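The 65% / 20% / remainder partition reported in the Results can be sketched as follows. The abstract does not say how the partition was drawn; this sketch assumes a chronological split (common for time-series data, so that validation and test samples come after the training samples), and the function and parameter names are hypothetical.

```python
# Illustrative sketch only: a chronological train/validation/test split.
# Assumption: samples are already ordered by timestamp; the paper's actual
# partitioning procedure is not described in the abstract.

def chronological_split(samples, train_frac=0.65, val_frac=0.20):
    """Return (train, validation, test) slices in time order."""
    n = len(samples)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return samples[:train_end], samples[train_end:val_end], samples[val_end:]

readings = list(range(100))  # stand-in for time-ordered sensor readings
train, validation, test = chronological_split(readings)
print(len(train), len(validation), len(test))
```

Keeping the slices in time order avoids training on readings that occur after the ones being predicted.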
Authors: P. Byrnes, F. A. DiazDelaO
Abstract:
The Support Vector Machine (SVM) has become widely recognised as one of the leading algorithms in machine learning for both regression and binary classification. It expresses predictions in terms of a linear combination of kernel functions centred on a subset of the training points, referred to as support vectors. Despite its popularity amongst practitioners, SVM has some limitations, the most significant being the generation of point predictions as opposed to predictive distributions. Stemming from this issue, a probabilistic model, namely Probabilistic Classification Vector Machines (PCVM), has been proposed which respects the original functional form of SVM whilst also providing a predictive distribution. As physical system designs become more complex, an increasing number of classification tasks involving industrial applications consist of more than two classes. Consequently, this research proposes a framework which allows for the extension of PCVM to a multi-class setting. Additionally, the original PCVM framework relies on the use of type II maximum likelihood to provide estimates for both the kernel hyperparameters and model evidence. In a high-dimensional multi-class setting, however, this approach has been shown to be ineffective due to poor scaling as the number of classes increases. Accordingly, we propose the application of Markov Chain Monte Carlo (MCMC) based methods to provide a posterior distribution over both parameters and hyperparameters. The proposed framework will be validated against current multi-class classifiers through synthetic and real-life implementations.
Keywords: probabilistic classification vector machines, multi class classification, MCMC, support vector machines
Procedia PDF Downloads 221
528 Bridge Health Monitoring: A Review
Authors: Mohammad Bakhshandeh
Abstract:
Structural Health Monitoring (SHM) is a crucial and necessary practice that plays a vital role in ensuring the safety and integrity of critical structures, and in particular, bridges. The continuous monitoring of bridges for signs of damage or degradation through Bridge Health Monitoring (BHM) enables early detection of potential problems, allowing for prompt corrective action to be taken before significant damage occurs. Although all monitoring techniques aim to provide accurate and decisive information regarding the remaining useful life, safety, integrity, and serviceability of bridges, understanding the development and propagation of damage is vital for maintaining uninterrupted bridge operation. Over the years, extensive research has been conducted on BHM methods, and experts in the field have increasingly adopted new methodologies. In this article, we provide a comprehensive exploration of the various BHM approaches, including sensor-based, non-destructive testing (NDT), model-based, and artificial intelligence (AI)-based methods. We also discuss the challenges associated with BHM, including sensor placement and data acquisition, data analysis and interpretation, cost and complexity, and environmental effects, through an extensive review of relevant literature and research studies. Additionally, we examine potential solutions to these challenges and propose future research ideas to address critical gaps in BHM.
Keywords: structural health monitoring (SHM), bridge health monitoring (BHM), sensor-based methods, machine-learning algorithms, model-based techniques, sensor placement, data acquisition, data analysis
Procedia PDF Downloads 90
527 Design and Optimization of Open Loop Supply Chain Distribution Network Using Hybrid K-Means Cluster Based Heuristic Algorithm
Authors: P. Suresh, K. Gunasekaran, R. Thanigaivelan
Abstract:
Radio frequency identification (RFID) technology has been attracting considerable attention with the expectation of improved supply chain visibility for consumer goods, apparel, and pharmaceutical manufacturers, as well as retailers and government procurement agencies. It is also expected to improve the consumer shopping experience by making it more likely that the products they want to purchase are available. Recent announcements from some key retailers have brought interest in RFID to the forefront. Four approaches to the open loop supply chain network problem are proposed: a modified K-Means cluster based heuristic, a hybrid Genetic Algorithm (GA) - Simulated Annealing (SA) approach, a hybrid K-Means cluster based heuristic-GA, and a hybrid K-Means cluster based heuristic-GA-SA. The study incorporates a uniform crossover operator and a combined crossover operator in the GAs for solving the open loop supply chain distribution network problem. The algorithms are tested on 50 randomly generated data sets and compared with each other. The results of the numerical experiments show that the hybrid K-Means cluster based heuristic-GA-SA performs better than the other methods in solving the open loop supply chain distribution network problem.
Keywords: RFID, supply chain distribution network, open loop supply chain, genetic algorithm, simulated annealing
Procedia PDF Downloads 165
526 Predicting the Compressive Strength of Geopolymer Concrete Using Machine Learning Algorithms: Impact of Chemical Composition and Curing Conditions
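The uniform crossover operator mentioned above can be sketched in a few lines. The chromosome encoding used here (one depot index per customer) and the names `uniform_crossover` and `swap_prob` are assumptions for illustration; the abstract does not describe the encoding or the "combined" crossover operator.

```python
# Illustrative sketch only: uniform crossover for a genetic algorithm.
# Assumption: a chromosome is a list where position i holds the depot
# assigned to customer i; this encoding is hypothetical.
import random

def uniform_crossover(parent_a, parent_b, swap_prob=0.5, rng=random):
    """Swap each gene between the two parents independently with swap_prob."""
    child_a, child_b = list(parent_a), list(parent_b)
    for i in range(len(child_a)):
        if rng.random() < swap_prob:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b

rng = random.Random(42)  # fixed seed so runs are repeatable
mother = [0, 0, 1, 2, 2, 1]
father = [2, 1, 1, 0, 0, 2]
print(uniform_crossover(mother, father, rng=rng))
```

Unlike one-point crossover, every gene position is decided independently, so the children can mix assignments from both parents anywhere along the chromosome.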
Authors: Aya Belal, Ahmed Maher Eltair, Maggie Ahmed Mashaly
Abstract:
Geopolymer concrete is gaining recognition as a sustainable alternative to conventional Portland Cement concrete due to its environmentally friendly nature, which is a key goal for Smart City initiatives. It has demonstrated its potential as a reliable material for the design of structural elements. However, the production of Geopolymer concrete is hindered by batch-to-batch variations, which present a significant challenge to its widespread adoption. To date, machine learning has had a profound impact on various fields by enabling models to learn from large datasets and predict outputs accurately. This paper proposes an integration between the current drift to Artificial Intelligence and the composition of Geopolymer mixtures to predict their mechanical properties. This study uses Python to develop a machine learning model, specifically Decision Trees. The research uses the percentage of oxides and the chemical composition of the alkali solution, along with the curing conditions, as the independent input parameters, irrespective of the waste products used in the mixture, with the compressive strength of the mix as the output parameter. The results showed 90% agreement between the predicted and actual values, with the ratio of the Sodium Silicate to the Sodium Hydroxide solution being the dominant parameter in the mixture.
Keywords: decision trees, geopolymer concrete, machine learning, smart cities, sustainability
Procedia PDF Downloads 88
525 Client Hacked Server
Authors: Bagul Abhijeet
Abstract:
Background: The client-server model is the backbone of today’s internet communication, in which a normal user cannot have control over a particular website or server. By using the same processing model, one can gain unauthorized access to a particular server. In this paper, we discuss an application scenario of hacking a simple website or server, consisting of an unauthorized way to access the server database. This application autonomously takes direct access to a simple website or server and retrieves all essential information maintained by the administrator. In this system, the IP address of the server is given as input to retrieve the user ID and password of the server. This breaks the administrative security of the server and acquires control of the server database, whereas a virus helps to escape from server security by crashing the whole server. Objective: To control malicious attacks, protect government websites, and detect illegal hacker activity. Results: After implementing different hacking as well as non-hacking techniques, this system hacks simple websites with normal security credentials. It provides access to the server database and allows an attacker to perform database operations from the client machine. The above figure shows the experimental results of this application on different servers and provides satisfactory results as required. Conclusion: In this paper, we have presented a way to hack a server that includes some hacking as well as non-hacking methods. These algorithms and methods provide an efficient way to hack a server database. Breaking the network security makes it possible to introduce a new and better security framework. The term “hacking” should be considered not only for its illegal activities but also as a means to strengthen our global network.
Keywords: Hacking, Vulnerabilities, Dummy request, Virus, Server monitoring
Procedia PDF Downloads 251
524 Improving Activity Recognition Classification of Repetitious Beginner Swimming Using a 2-Step Peak/Valley Segmentation Method with Smoothing and Resampling for Machine Learning
Authors: Larry Powell, Seth Polsley, Drew Casey, Tracy Hammond
Abstract:
Human activity recognition (HAR) systems have shown positive performance when recognizing repetitive activities like walking, running, and sleeping. Water-based activities are a reasonably new area for activity recognition. However, water-based activity recognition has largely focused on supporting the elite and competitive swimming population, which already has amazing coordination and proper form. Beginner swimmers are not perfect, and activity recognition needs to support the individual motions to help beginners. Activity recognition algorithms are traditionally built around short segments of timed sensor data. Using a time window input can cause performance issues in the machine learning model. The window’s size can be too small or large, requiring careful tuning and precise data segmentation. In this work, we present a method that uses a time window as the initial segmentation, then separates the data based on the change in the sensor value. Our system uses a multi-phase segmentation method that pulls all peaks and valleys for each axis of an accelerometer placed on the swimmer’s lower back. This results in high recognition performance using leave-one-subject-out validation on our study with 20 beginner swimmers, with our model optimized from our final dataset resulting in an F-Score of 0.95.
Keywords: time window, peak/valley segmentation, feature extraction, beginner swimming, activity recognition
Procedia PDF Downloads 123
523 Feature Based Unsupervised Intrusion Detection
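The idea of refining a time window by cutting it at peaks and valleys can be sketched for a single accelerometer axis as follows. This is not the authors' implementation: the moving-average smoother, the strict local-extremum test, and all function names are assumptions, and the resampling step named in the title is omitted.

```python
# Illustrative sketch only: smooth one axis of a time window, locate its
# peaks and valleys, and cut the window into sub-segments at those extrema.

def smooth(signal, width=3):
    """Centered moving average; the window shrinks at the edges."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def peaks_and_valleys(signal):
    """Indices of strict local maxima (peaks) and minima (valleys)."""
    peaks, valleys = [], []
    for i in range(1, len(signal) - 1):
        if signal[i - 1] < signal[i] > signal[i + 1]:
            peaks.append(i)
        elif signal[i - 1] > signal[i] < signal[i + 1]:
            valleys.append(i)
    return peaks, valleys

def segment(signal):
    """Split a window into sub-segments bounded by consecutive extrema."""
    peaks, valleys = peaks_and_valleys(signal)
    cuts = sorted(set([0] + peaks + valleys + [len(signal) - 1]))
    return [signal[a:b + 1] for a, b in zip(cuts, cuts[1:])]

axis_window = [0, 1, 2, 1, 0, 1, 2]  # toy accelerometer axis
print(segment(axis_window))
```

Each sub-segment then covers one rise or fall of the motion, so its length follows the swimmer's movement rather than a fixed window size. In practice the smoothed signal would be segmented and the same procedure applied to each axis.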
Authors: Deeman Yousif Mahmood, Mohammed Abdullah Hussein
Abstract:
The goal of a network-based intrusion detection system is to classify network traffic activities into two major categories: normal and attack (intrusive) activities. Nowadays, data mining and machine learning play an important role in many sciences, including intrusion detection systems (IDS), using both supervised and unsupervised techniques. One of the essential steps of data mining is feature selection, which helps to improve the efficiency, performance and prediction rate of the proposed approach. This paper applies an unsupervised K-means clustering algorithm with information gain (IG) for feature selection and reduction to build a network intrusion detection system. For our experimental analysis, we used the NSL-KDD dataset, which is a modified version of the KDDCup 1999 intrusion detection benchmark dataset. With a split of 60.0% for the training set and the remainder for the testing set, a two-class classification (Normal, Attack) was implemented. The Weka framework, a Java-based open-source software package consisting of a collection of machine learning algorithms for data mining tasks, was used in the testing process. The experimental results show that the proposed approach is very accurate, with a low false positive rate and a high true positive rate, and that it takes less learning time than using the full feature set of the dataset with the same algorithm.
Keywords: information gain (IG), intrusion detection system (IDS), k-means clustering, Weka
Procedia PDF Downloads 296
522 A Framework Based on Dempster-Shafer Theory of Evidence Algorithm for the Analysis of the TV-Viewers’ Behaviors
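Information gain, the feature-selection criterion named above, can be computed for a categorical feature as in the sketch below. The paper used the Weka framework, so this standalone Python version is only an illustration; the toy feature names and values are hypothetical and are not taken from NSL-KDD.

```python
# Illustrative sketch only: information gain of a categorical feature
# with respect to a class label (Normal / Attack).
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

labels = ["normal", "normal", "attack", "attack"]
features = {
    "protocol": ["tcp", "tcp", "udp", "udp"],   # separates the classes perfectly
    "flag":     ["a", "b", "a", "b"],           # carries no class information
}
ranking = sorted(features, key=lambda f: information_gain(features[f], labels), reverse=True)
print(ranking)
```

Features would be ranked by this score and only the top-ranked ones passed to the clustering step, which is what reduces learning time relative to the full feature set.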
Authors: Hamdi Amroun, Yacine Benziani, Mehdi Ammi
Abstract:
In this paper, we propose an approach of detecting the behavior of the viewers of a TV program in a non-controlled environment. The experiment we propose is based on the use of three types of connected objects (smartphone, smart watch, and a connected remote control). 23 participants were observed while watching their TV programs during three phases: before, during and after watching a TV program. Their behaviors were detected using an approach based on The Dempster Shafer Theory (DST) in two phases. The first phase is to approximate dynamically the mass functions using an approach based on the correlation coefficient. The second phase is to calculate the approximate mass functions. To approximate the mass functions, two approaches have been tested: the first approach was to divide each features data space into cells; each one has a specific probability distribution over the behaviors. The probability distributions were computed statistically (estimated by empirical distribution). The second approach was to predict the TV-viewing behaviors through the use of classifiers algorithms and add uncertainty to the prediction based on the uncertainty of the model. Results showed that mixing the fusion rule with the computation of the initial approximate mass functions using a classifier led to an overall of 96%, 95% and 96% success rate for the first, second and third TV-viewing phase respectively. The results were also compared to those found in the literature. This study aims to anticipate certain actions in order to maintain the attention of TV viewers towards the proposed TV programs with usual connected objects, taking into account the various uncertainties that can be generated.Keywords: Iot, TV-viewing behaviors identification, automatic classification, unconstrained environment
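The fusion rule at the heart of Dempster-Shafer Theory can be sketched as below: two mass functions, one per connected object, are combined with Dempster's rule of combination. The behavior labels and mass values are made up for illustration, and the abstract's own procedure for approximating the mass functions (correlation-based or classifier-based) is not reproduced here.

```python
# Illustrative sketch only: Dempster's rule of combination for two sources.
# A mass function maps a frozenset of hypotheses to a mass; masses sum to 1.
from itertools import product

def combine(m1, m2):
    """Fuse two mass functions, renormalising by the non-conflicting mass."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b   # mass assigned to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Hypothetical behaviors and masses for two devices.
WATCHING, DISTRACTED = frozenset({"watching"}), frozenset({"distracted"})
EITHER = WATCHING | DISTRACTED            # mass on the whole frame = ignorance

m_phone = {WATCHING: 0.6, DISTRACTED: 0.1, EITHER: 0.3}
m_watch = {WATCHING: 0.5, DISTRACTED: 0.2, EITHER: 0.3}
fused = combine(m_phone, m_watch)
print({tuple(sorted(k)): round(v, 3) for k, v in fused.items()})
```

Mass placed on the whole frame expresses a source's uncertainty, which is how the classifier-based approach in the abstract could attach model uncertainty to a prediction before fusion.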
Procedia PDF Downloads 229
521 Detection and Classification of Mammogram Images Using Principle Component Analysis and Lazy Classifiers
Authors: Rajkumar Kolangarakandy
Abstract:
Feature extraction and selection are the primary part of any mammogram classification algorithm. The choice of features, attributes or measurements has an important influence on any classification system. Discrete Wavelet Transformation (DWT) coefficients are one of the prominent features for representing images in the frequency domain. The features obtained after the decomposition of the mammogram images using wavelet transformations have a higher dimension. Even though the features are higher in dimension, they are highly correlated and redundant in nature. Dimensionality reduction techniques play an important role in selecting the optimum number of features from higher-dimensional data that are highly correlated. PCA is a mathematical tool that reduces the dimensionality of the data while retaining most of the variation in the dataset. In this paper, a multilevel classification of mammogram images using reduced discrete wavelet transformation coefficients and lazy classifiers is proposed. The classification is accomplished in two different levels. In the first level, mammogram ROIs extracted from the dataset are classified as normal and abnormal types. In the second level, all the abnormal mammogram ROIs are classified into benign and malignant types. A further classification is also accomplished based on the variation in structure and intensity distribution of the images in the dataset. The lazy classifiers called Kstar, IBL and LWL are used for classification. The classification results obtained with the reduced feature set are highly promising, and the result is also compared with the performance obtained without dimension reduction.
Keywords: PCA, wavelet transformation, lazy classifiers, Kstar, IBL, LWL
Procedia PDF Downloads 335
520 Fuzzy Optimization Multi-Objective Clustering Ensemble Model for Multi-Source Data Analysis
Authors: C. B. Le, V. N. Pham
Abstract:
In modern data analysis, multi-source data appears more and more in real applications. Multi-source data clustering has emerged as an important issue in the data mining and machine learning community. Different data sources provide information about different data. Therefore, multi-source data linking is essential to improve clustering performance. However, in practice multi-source data is often heterogeneous, uncertain, and large. This issue is considered a major challenge of multi-source data. Ensemble is a versatile machine learning model in which learning techniques can work in parallel, with big data. Clustering ensemble has been shown to outperform any standard clustering algorithm in terms of accuracy and robustness. However, most of the traditional clustering ensemble approaches are based on a single-objective function and single-source data. This paper proposes a new clustering ensemble method for multi-source data analysis. The fuzzy optimized multi-objective clustering ensemble method is called FOMOCE. Firstly, a clustering ensemble mathematical model based on the structure of the multi-objective clustering function, multi-source data, and dark knowledge is introduced. Then, rules for extracting dark knowledge from the input data, clustering algorithms, and base clusterings are designed and applied. Finally, a clustering ensemble algorithm is proposed for multi-source data analysis. The experiments were performed on the standard sample data set. The experimental results demonstrate the superior performance of the FOMOCE method compared to the existing clustering ensemble methods and multi-source clustering methods.
Keywords: clustering ensemble, multi-source, multi-objective, fuzzy clustering
Procedia PDF Downloads 189
519 A Methodology for Automatic Diversification of Document Categories
Authors: Dasom Kim, Chen Liu, Myungsu Lim, Su-Hyeon Jeon, ByeoungKug Jeon, Kee-Young Kwahk, Namgyu Kim
Abstract:
Recently, numerous documents including unstructured data and text have been created due to the rapid increase in the usage of social media and the Internet. Each document is usually provided with a specific category for the convenience of the users. In the past, the categorization was performed manually. However, in the case of manual categorization, not only can the accuracy of the categorization not be guaranteed, but the categorization also requires a large amount of time and huge costs. Many studies have been conducted towards the automatic creation of categories to solve the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to categorizing complex documents with multiple topics because the methods work by assuming that one document can be categorized into one category only. In order to overcome this limitation, some studies have attempted to categorize each document into multiple categories. However, they are also limited in that their learning process involves training using a multi-categorized document set. These methods therefore cannot be applied to multi-categorization of most documents unless multi-categorized training sets are provided. To overcome the limitation of the requirement of a multi-categorized training set by traditional multi-categorization algorithms, we previously proposed a new methodology that can extend a category of a single-categorized document to multiple categories by analyzing relationships among categories, topics, and documents. In this paper, we design a survey-based verification scenario for estimating the accuracy of our automatic categorization methodology.
Keywords: big data analysis, document classification, multi-category, text mining, topic analysis
Procedia PDF Downloads 272
518 Using Monte Carlo Model for Simulation of Rented Housing in Mashhad, Iran
Authors: Mohammad Rahim Rahnama
Abstract:
The study employs the Monte Carlo method for simulation of rented housing in Mashhad, the second largest city in Iran. A total of 334 rental residential units in Mashhad, including both apartments and houses (villa), were randomly selected from advertisements placed in Khorasan Newspapers during the months of July and August of 2015. In order to simulate the monthly rent price, the rent index was calculated by combining the mortgage and the rent price. In the next step, the relation between the variables of the floor area and that of the number of bedrooms for each unit, in both apartments and houses (villa), was calculated through multivariate regression using SPSS and was coded in XML. The initial model was called using the simulation button in SPSS and was simulated using triangular and binomial algorithms. The findings revealed that the average simulated rental index was $548.5 per month. Calculating the sensitivity of the rental index to the number of bedrooms, we found that firstly, 97% of units have three bedrooms, and secondly, as the number of bedrooms increases from one to three, for the rent price of less than $200, the percentage of units having one bedroom decreases from 10% to 0. Contrariwise, for units with the rent price of more than $571.4, the percentage of bedrooms increases from 37% to 48%. In the light of these findings, it becomes clear that planning to build rental residential units, overseeing the rent prices, and granting subsidies to rental residential units, for apartments with two bedrooms, present a felicitous policy for regulating residential units in Mashhad.
Keywords: Mashhad, Monte Carlo, simulation, rent price, residential unit
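As a rough illustration of the kind of triangular-distribution Monte Carlo run mentioned in the abstract above, the sketch below draws a rent index from a triangular distribution and summarises the draws. The study itself used SPSS; the bounds, the mode and the number of draws here are hypothetical placeholders, not values from the paper.

```python
import random


def simulate_rent_index(low, mode, high, n_draws, seed=0):
    """Draw n_draws rent-index values from a triangular distribution."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Note the argument order of random.triangular: (low, high, mode).
    return [rng.triangular(low, high, mode) for _ in range(n_draws)]


# Hypothetical monthly rent-index bounds in dollars (assumptions for illustration only).
LOW, MODE, HIGH = 200.0, 500.0, 900.0

draws = simulate_rent_index(LOW, MODE, HIGH, n_draws=100_000)
mean_rent = sum(draws) / len(draws)

# Share of simulated units above a chosen threshold, a simple sensitivity-style summary.
share_above_600 = sum(1 for d in draws if d > 600.0) / len(draws)

# The mean of a triangular distribution is (low + mode + high) / 3,
# which gives a quick sanity check on the simulated average.
analytic_mean = (LOW + MODE + HIGH) / 3.0
```

With enough draws the simulated mean should sit close to `analytic_mean`; a large gap would point to a mistake in the sampling setup rather than to a property of the housing market.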
Procedia PDF Downloads 275
517 Measurement of Solids Concentration in Hydrocyclone Using ERT: Validation Against CFD
Authors: Vakamalla Teja Reddy, Narasimha Mangadoddy
Abstract:
Hydrocyclones are used to separate particles into different size fractions in the mineral processing, chemical and metallurgical industries. High speed video imaging, Laser Doppler Anemometry (LDA), and X-ray and gamma ray tomography have previously been used to measure the two-phase flow characteristics in the cyclone. However, investigation of solids flow characteristics inside the cyclone is often impeded by the nature of the process due to slurry opaqueness and solid metal wall vessels. In this work, dual-plane high speed electrical resistance tomography (ERT) is used to measure hydrocyclone internal flow dynamics in situ. Experiments are carried out in a 3-inch hydrocyclone for feed solid concentrations varying in the range of 0-50%. ERT data analysis through the optimized FEM mesh size and reconstruction algorithms on air-core and solid concentration tomograms is assessed. Results are presented in terms of the air-core diameter and solids volume fraction contours using Maxwell’s equation for various hydrocyclone operational parameters. It is confirmed by ERT that the air core occupied area and wall solids conductivity levels decrease with increasing feed solids concentration. An algebraic slip mixture based multi-phase computational fluid dynamics (CFD) model is used to predict the air-core size and the solid concentrations in the hydrocyclone. Validation of air-core size and mean solid volume fractions by ERT measurements with the CFD simulations is attempted.
Keywords: air-core, electrical resistance tomography, hydrocyclone, multi-phase CFD
Procedia PDF Downloads 379
516 A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data
Authors: Mais Nijim, Rama Devi Chennuboyina, Waseem Al Aqqad
Abstract:
Advances in spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data we acquire through satellites, radars and sensors consists of important geographical information that can be used for remote sensing applications such as regional planning and disaster management. Spatial data classification and object recognition are important tasks for many applications. However, classifying objects and identifying them manually from images is a difficult task. Object recognition is often considered a classification problem, so this task can be performed using machine-learning techniques. Despite the many machine-learning algorithms available, the classification is done using supervised classifiers such as Support Vector Machines (SVM), as the area of interest is known. We proposed a classification method which considers neighboring pixels in a region for feature extraction and evaluates classifications precisely according to neighboring classes for semantic interpretation of the region of interest (ROI). A dataset has been created for training and testing purposes; we generated the attributes by considering pixel intensity values and mean values of reflectance. We demonstrated the benefits of using knowledge discovery and data-mining techniques, which can be applied to image data for accurate information extraction and classification from high spatial resolution remote sensing imagery.
Keywords: remote sensing, object recognition, classification, data mining, waterbody identification, feature extraction
Procedia PDF Downloads 339
515 Monitoring the Drying and Grinding Process during Production of Celitement through a NIR-Spectroscopy Based Approach
Authors: Carolin Lutz, Jörg Matthes, Patrick Waibel, Ulrich Precht, Krassimir Garbev, Günter Beuchle, Uwe Schweike, Peter Stemmermann, Hubert B. Keller
Abstract:
Online measurement of the product quality is a challenging task in cement production, especially in the production of Celitement, a novel environmentally friendly hydraulic binder. The mineralogy and chemical composition of clinker in ordinary Portland cement production is measured by X-ray diffraction (XRD) and X-ray fluorescence (XRF), where only crystalline constituents can be detected. But only a small part of the Celitement components can be measured via XRD, because most constituents have an amorphous structure. This paper describes the development of algorithms suitable for on-line monitoring of the final processing step of Celitement based on NIR data. For calibration, intermediate products were dried at different temperatures and ground for variable durations. The products were analyzed using XRD and thermogravimetric analyses together with NIR spectroscopy to investigate the dependency between the drying and the milling processes on the one side and the NIR signal on the other. As a result, different characteristic parameters have been defined. A short overview of the Celitement process and the challenging tasks of the online measurement and evaluation of the product quality will be presented. Subsequently, methods for systematic development of near-infrared calibration models and the determination of the final calibration model will be introduced. The application of the model to experimental data illustrates that NIR spectroscopy allows for a quick and sufficiently exact determination of crucial process parameters.
Keywords: calibration model, celitement, cementitious material, NIR spectroscopy
Procedia PDF Downloads 500
514 Algorithms for Run-Time Task Mapping in NoC-Based Heterogeneous MPSoCs
Authors: M. K. Benhaoua, A. K. Singh, A. E. Benyamina, P. Boulet
Abstract:
Mapping parallelized tasks of applications onto MPSoCs can be done either at design time (static) or at run-time (dynamic). Static mapping strategies find the best placement of tasks at design time, and hence they are not suitable for dynamic workloads and seem incapable of run-time resource management. The number of tasks or applications executing on an MPSoC platform can exceed the available resources, requiring efficient run-time mapping strategies to meet these constraints. This paper describes a new Spiral Dynamic Task Mapping heuristic for mapping applications onto a NoC-based heterogeneous MPSoC. This heuristic is based on a packing strategy and a routing algorithm also proposed in this paper. The heuristic tries to map the tasks of an application in a clustering region to reduce the communication overhead between the communicating tasks. It attempts to map the tasks of an application that are most related to each other in a spiral manner and to find the best possible path load that minimizes the communication overhead. In this context, we have realized a simulation environment for experimental evaluations to map applications with a varying number of tasks onto an 8x8 NoC-based heterogeneous MPSoC platform. We demonstrate that the new mapping heuristic, together with the proposed modified Dijkstra routing algorithm, is capable of reducing the total execution time and energy consumption of applications when compared to state-of-the-art run-time mapping heuristics reported in the literature.
Keywords: multiprocessor system on chip, MPSoC, network on chip, NoC, heterogeneous architectures, run-time mapping heuristics, routing algorithm
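The abstract above gives no pseudocode for the spiral heuristic, so the sketch below only illustrates the general idea of placing tasks on free tiles in an outward spiral so that related tasks end up close together on the mesh. The function names, the square-spiral traversal and the assumption that tasks arrive already ordered by how strongly they communicate are all hypothetical; the paper's packing strategy, heterogeneity handling and routing are not modelled.

```python
def spiral_order(rows, cols, start):
    """Yield mesh coordinates in an outward square spiral from start.

    Positions that fall outside the rows x cols mesh are skipped,
    so every tile of the mesh is produced exactly once.
    """
    r, c = start
    yield (r, c)
    produced = 1
    directions = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # east, south, west, north
    d = 0
    step = 1
    while produced < rows * cols:
        for _ in range(2):  # two legs share each step length
            dr, dc = directions[d % 4]
            for _ in range(step):
                r, c = r + dr, c + dc
                if 0 <= r < rows and 0 <= c < cols:
                    yield (r, c)
                    produced += 1
            d += 1
        step += 1


def map_tasks(tasks, rows, cols, start, occupied=frozenset()):
    """Assign tasks, in the given order, to free tiles along the spiral."""
    free_tiles = (t for t in spiral_order(rows, cols, start) if t not in occupied)
    placement = dict(zip(tasks, free_tiles))
    if len(placement) < len(tasks):
        raise RuntimeError("not enough free tiles for all tasks")
    return placement


# Hypothetical application with three tasks on an 8x8 mesh, one tile already busy.
placement = map_tasks(["t0", "t1", "t2"], 8, 8, start=(3, 3), occupied={(3, 4)})
all_tiles = list(spiral_order(8, 8, (3, 3)))
```

Placing tasks along the spiral keeps the hop distance between consecutive tasks small, which is the intuition behind reducing communication overhead in a clustered region.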
Procedia PDF Downloads 489
513 A Machine Learning Based Framework for Education Levelling in Multicultural Countries: UAE as a Case Study
Authors: Shatha Ghareeb, Rawaa Al-Jumeily, Thar Baker
Abstract:
In Abu Dhabi, there are many different education curriculums, where the private schools and quality assurance sector supervises many private schools for many nationalities. As there are many different education curriculums in Abu Dhabi to meet expats’ needs, there are different requirements for registration and success. In addition, there are different age groups for starting education in each curriculum. In fact, each curriculum has a different number of years, assessment techniques, reassessment rules, and exam boards. Currently, students who transfer between curriculums are not being placed in the right year group, because the start and end dates of each academic year and the date-of-birth range for each year group differ between curriculums. As a result, we find students who are either younger or older than the rest of their year group, which creates gaps in their learning and performance. In addition, there is no way of storing student data throughout their academic journey so that schools can track the student learning process. In this paper, we propose to develop a computational framework applicable in multicultural countries such as the UAE in which multi-education systems are implemented. The ultimate goal is to use cloud and fog computing technology integrated with Artificial Intelligence techniques of Machine Learning to aid in a smooth transition when assigning students to their year groups, and to provide leveling and differentiation information for students who relocate from one education curriculum to another, whilst also having the ability to store and access student data from anywhere throughout their academic journey.
Keywords: admissions, algorithms, cloud computing, differentiation, fog computing, levelling, machine learning
Procedia PDF Downloads 142
512 Performance Comparison of Different Regression Methods for a Polymerization Process with Adaptive Sampling
Authors: Florin Leon, Silvia Curteanu
Abstract:
Developing complete mechanistic models for polymerization reactors is not easy, because complex reactions occur simultaneously; there is a large number of kinetic parameters involved and sometimes the chemical and physical phenomena for mixtures involving polymers are poorly understood. To overcome these difficulties, empirical models based on sampled data can be used instead, namely regression methods typical of the machine learning field. They have the ability to learn the trends of a process without any knowledge about its particular physical and chemical laws. Therefore, they are useful for modeling complex processes, such as the free radical polymerization of methyl methacrylate achieved in a batch bulk process. The goal is to generate accurate predictions of monomer conversion, numerical average molecular weight and gravimetrical average molecular weight. This process is associated with non-linear gel and glass effects. For this purpose, an adaptive sampling technique is presented, which can select more samples around the regions where the values have a higher variation. Several machine learning methods are used for the modeling and their performance is compared: support vector machines, k-nearest neighbor and random forest, as well as an original algorithm, large margin nearest neighbor regression. The suggested method provides very good results compared to the other well-known regression algorithms.
Keywords: batch bulk methyl methacrylate polymerization, adaptive sampling, machine learning, large margin nearest neighbor regression
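The adaptive sampling idea in the abstract above, taking more samples where the response varies most, can be sketched as a simple greedy refinement. This is an assumed illustration, not the authors' technique: the refinement rule (bisect the interval with the largest change in value) and the steep logistic curve standing in for a conversion profile with a gel effect are both hypothetical.

```python
import math


def refine_samples(f, xs, n_new):
    """Greedily add n_new sample points where f changes most between neighbours.

    At each step the interval with the largest absolute change in f
    is bisected and the midpoint is added to the sample set.
    """
    points = sorted((x, f(x)) for x in xs)
    for _ in range(n_new):
        widest = max(
            range(len(points) - 1),
            key=lambda k: abs(points[k + 1][1] - points[k][1]),
        )
        mid = 0.5 * (points[widest][0] + points[widest + 1][0])
        points.insert(widest + 1, (mid, f(mid)))
    return points


def steep_curve(x):
    """Hypothetical stand-in for a response with a sharp transition near x = 0.5."""
    return 1.0 / (1.0 + math.exp(-40.0 * (x - 0.5)))


initial_xs = [0.0, 0.25, 0.5, 0.75, 1.0]
samples = refine_samples(steep_curve, initial_xs, n_new=6)
new_xs = [x for x, _ in samples if x not in initial_xs]
```

All added points cluster around the transition, so a regression model trained on `samples` sees the steep region in more detail than a uniform grid of the same size would provide.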
Procedia PDF Downloads 304
511 Contrastive Analysis of Parameters Registered in Training Rowers and the Impact on the Olympic Performance
Authors: Gheorghe Braniste
Abstract:
The management of the training process in sports is closely related to the awareness of the close connection between performance and the morphological, functional and psychological characteristics of the athlete's body. Achieving high results in Olympic sports is influenced, on the one hand, by the genetically determined characteristics of the body and, on the other hand, by the morphological, functional and motor abilities of the athlete. Taking into account the importance of properly understanding the evolutionary specificity of athletes to assess their competitive potential, this study provides a comparative analysis of the parameters that characterize the growth and development of the level of adaptation of sweeping rowers, considering the growth interval between 12 and 20 years. The study established that, in the multi-annual training process, the bodies of the targeted athletes register significant adaptive changes while analyzing parameters of the morphological, functional, psychomotor and sports-technical spheres. As a result of the influence of physical efforts, both specific and non-specific, there is an increase in the adaptability of the body, its transfer to a much higher level of functionality within the parameters, useful and economical adaptive reactions influenced by environmental factors, be they internal or external. The research was carried out for 7 years, on a group of 28 athletes, following their evolution and recording the specific parameters of each age stage. In order to determine the level of physical, morpho-functional, psychomotor development and technical training of rowers, the screening data were applied at the State University of Physical Education and Sports in the Republic of Moldova. 
During the research, measurements were made on the waist, in the standing and sitting position, arm span, weight, circumference and chest perimeter, vital capacity of the lungs, with the subsequent determination of the vital index (tolerance level to oxygen deficiency in venous blood in Stange and Genchi breath-holding tests that characterize the level of oxygen saturation), absolute and relative strength of the hand and back, calculation of body mass and morphological maturity indices (Kettle index), body surface area (body gait), psychomotor tests (Romberg test), 10 s tapping test, reaction to a moving object, visual and auditory-motor reaction, and recording of technical parameters of rowing on a competitive distance of 200 m. At the end of the study, it was found that high performance in sports is to be associated on the one hand with the genetically determined characteristics of the body and, on the other hand, with favorable adaptive reactions and energy saving, as well as morphofunctional changes influenced by internal and external environmental factors. The importance of the results obtained at the end of the study was positively reflected in obtaining the maximum level of training of athletes in order to demonstrate performance in large-scale competitions and mostly in the Olympic Games.
Keywords: olympics, parameters, performance, peak
Procedia PDF Downloads 123
510 Machine Learning Models for the Prediction of Heating and Cooling Loads of a Residential Building
Authors: Aaditya U. Jhamb
Abstract:
Due to the current energy crisis that many countries are battling, energy-efficient buildings are the subject of extensive research in the modern technological era because of growing worries about energy consumption and its effects on the environment. The paper explores 8 factors that help determine energy efficiency for a building (relative compactness, surface area, wall area, roof area, overall height, orientation, glazing area, and glazing area distribution), with Tsanas and Xifara providing a dataset. The data set employed 768 different residential building models to anticipate heating and cooling loads with a low mean squared error. By optimizing these characteristics, machine learning algorithms may assess and properly forecast a building's heating and cooling loads, lowering energy usage while increasing the quality of people's lives. As a result, the paper studied the magnitude of the correlation between these input factors and the two output variables using various statistical methods of analysis after determining which input variable was most closely associated with the output loads. The most conclusive model was the Decision Tree Regressor, which had a mean squared error of 0.258, whilst the least definitive model was the Isotonic Regressor, which had a mean squared error of 21.68. This paper also investigated the KNN Regressor and the Linear Regression, which had mean squared errors of 3.349 and 18.141, respectively. In conclusion, the model, given the 8 input variables, was able to predict the heating and cooling loads of a residential building accurately and precisely.
Keywords: energy efficient buildings, heating load, cooling load, machine learning models
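The models in the abstract above are ranked by mean squared error, the average of the squared differences between measured and predicted loads. The sketch below shows that computation and then orders the four models by the MSE values reported in the abstract; the three-value toy load vectors are hypothetical and serve only to exercise the function.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# MSE values as reported in the abstract (lower is better).
reported_mse = {
    "Decision Tree Regressor": 0.258,
    "KNN Regressor": 3.349,
    "Linear Regression": 18.141,
    "Isotonic Regressor": 21.68,
}
ranking = sorted(reported_mse, key=reported_mse.get)

# Hypothetical heating loads, used only to illustrate the calculation.
toy_mse = mean_squared_error([10.0, 20.0, 30.0], [11.0, 19.0, 32.0])
```

Because the errors are squared, a single large miss weighs more than several small ones, which is worth keeping in mind when comparing values as far apart as 0.258 and 21.68.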
Procedia PDF Downloads 95
509 Designing and Prototyping Permanent Magnet Generators for Wind Energy
Authors: T. Asefi, J. Faiz, M. A. Khan
Abstract:
This paper introduces dual rotor axial flux machines with surface mounted and spoke type ferrite permanent magnets with concentrated windings; they are introduced as alternatives to a generator with surface mounted Nd-Fe-B magnets. The output power, voltage, speed and air gap clearance for all the generators are identical. The machine designs are optimized for minimum mass using a population-based algorithm, assuming the same efficiency as the Nd-Fe-B machine. A finite element analysis (FEA) is applied to predict the performance, emf, developed torque, cogging torque, no load losses, leakage flux and efficiency of both ferrite generators and that of the Nd-Fe-B generator. To minimize cogging torque, different rotor pole topologies and different pole arc to pole pitch ratios are investigated by means of 3D FEA. It was found that the surface mounted ferrite generator topology is unable to develop the nominal electromagnetic torque, and has higher torque ripple and is heavier than the spoke type machine. Furthermore, it was shown that the spoke type ferrite permanent magnet generator has favorable performance and could be an alternative to rare-earth permanent magnet generators, particularly in wind energy applications. Finally, the analytical and numerical results are verified using experimental results.
Keywords: axial flux, permanent magnet generator, dual rotor, ferrite permanent magnet generator, finite element analysis, wind turbines, cogging torque, population-based algorithms
Procedia PDF Downloads 151
508 Sequential Pattern Mining from Data of Medical Record with Sequential Pattern Discovery Using Equivalent Classes (SPADE) Algorithm (A Case Study : Bolo Primary Health Care, Bima)
Authors: Rezky Rifaini, Raden Bagus Fajriya Hakim
Abstract:
This research was conducted at the Bolo Primary Health Care (PHC) in Bima Regency. The purpose of the research is to find the association patterns formed in the medical record database of Bolo PHC patients. The data used is secondary data from the PHC medical records database. Sequential pattern mining is the method used for the analysis. Transaction data is generated from Patient_ID, Check_Date and diagnosis. Sequential Pattern Discovery Using Equivalent Classes (SPADE) is one of the algorithms in sequential pattern mining; it finds frequent sequences in transaction data using a vertical database and a sequence join process. The results of the SPADE algorithm are frequent sequences, which are then used to form rules. This technique is used to find the association pattern between item combinations. Based on sequential association rule analysis with the SPADE algorithm, a minimum support of 0.03 and a minimum confidence of 0.75 yield 3 sequential association patterns based on the sequence of Patient_ID, Check_Date and diagnosis data in the Bolo PHC.
Keywords: diagnosis, primary health care, medical record, data mining, sequential pattern mining, SPADE algorithm
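To make the vertical database and sequence join mentioned in the abstract above more concrete, the sketch below builds per-diagnosis id-lists of (patient, visit) pairs, joins two of them into the id-list of a two-step sequence, and derives support and confidence. It is a simplified illustration of the idea rather than a full SPADE implementation: it handles only sequences of length two, uses an integer visit index in place of Check_Date, and the patient records are invented.

```python
from collections import defaultdict


def vertical_idlists(records):
    """Turn (patient_id, visit_index, diagnosis) rows into per-diagnosis id-lists."""
    idlists = defaultdict(set)
    for patient_id, visit_index, diagnosis in records:
        idlists[diagnosis].add((patient_id, visit_index))
    return idlists


def temporal_join(idlist_a, idlist_b):
    """Id-list of the sequence a -> b: occurrences of b after some a for the same patient."""
    first_a = {}
    for patient_id, visit_index in idlist_a:
        first_a[patient_id] = min(visit_index, first_a.get(patient_id, visit_index))
    return {
        (patient_id, visit_index)
        for patient_id, visit_index in idlist_b
        if patient_id in first_a and visit_index > first_a[patient_id]
    }


def support(idlist, n_patients):
    """Fraction of patients whose records contain the pattern at least once."""
    return len({patient_id for patient_id, _ in idlist}) / n_patients


# Invented records: (patient_id, visit_index, diagnosis).
records = [
    ("p1", 1, "cough"), ("p1", 2, "fever"),
    ("p2", 1, "cough"), ("p2", 2, "fever"),
    ("p3", 1, "cough"), ("p3", 2, "rash"),
    ("p4", 1, "fever"),
]
n_patients = len({patient_id for patient_id, _, _ in records})

idlists = vertical_idlists(records)
cough_then_fever = temporal_join(idlists["cough"], idlists["fever"])

support_cough = support(idlists["cough"], n_patients)
support_rule = support(cough_then_fever, n_patients)
confidence_rule = support_rule / support_cough
```

A rule such as cough -> fever would be kept only if `support_rule` and `confidence_rule` both meet the chosen minimum thresholds, which the abstract sets at 0.03 and 0.75.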
Procedia PDF Downloads 401