Search results for: estimation algorithms
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3725

3305 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, WangQun Lin

Abstract:

This paper introduces the concept and principles of data cleaning, analyzes the types and causes of dirty data, and outlines the key steps of a typical cleaning process. It puts forward a scalable and versatile data cleaning framework. For data with attribute dependency relations, it formally defines several violation data discovery algorithms that can find the data inconsistent with a conditional attribute dependency across all target columns, whether the data is structured (SQL) or unstructured (NoSQL), and it gives six data cleaning methods based on these algorithms.
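As a rough illustration of what discovering violation data for a dependency rule can look like, the sketch below checks a simple functional dependency (one set of columns determining another column) over a list of records. It is not the paper's algorithm; the function name `find_fd_violations`, the column names and the sample records are all assumptions made for the example.

```python
from collections import defaultdict

def find_fd_violations(rows, lhs, rhs):
    """Return the rows that violate the dependency lhs -> rhs.

    rows: list of dicts; lhs: tuple of column names; rhs: one column name.
    Rows violate the rule when they agree on lhs but differ on rhs.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in lhs)
        groups[key].append(row)
    violations = []
    for group in groups.values():
        # More than one distinct rhs value within a group breaks the rule.
        if len({r[rhs] for r in group}) > 1:
            violations.extend(group)
    return violations

# Hypothetical records: the rule "zip determines city" is broken for 10001.
records = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "Newark"},
    {"zip": "94105", "city": "San Francisco"},
]
bad = find_fd_violations(records, ("zip",), "city")
```

Because the check only needs key/value access, the same idea applies to rows from an SQL table or to documents from a NoSQL store.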

Keywords: data cleaning, dependency rules, violation data discovery, data repair

Procedia PDF Downloads 545
3304 A Near-Optimal Domain Independent Approach for Detecting Approximate Duplicates

Authors: Abdelaziz Fellah, Allaoua Maamir

Abstract:

We propose a domain-independent merging-cluster filter approach complemented with a set of algorithms for identifying approximate duplicate entities efficiently and accurately within a single and across multiple data sources. The near-optimal merging-cluster filter (MCF) approach is based on the well-tuned Monge-Elkan algorithm and extended with an affine variant of the Smith-Waterman similarity measure. We then present constant, variable, and function threshold algorithms that work conceptually in a divide-merge filtering fashion for detecting near duplicates as hierarchical clusters along with their corresponding representatives. The algorithms take recursive refinement approaches in the spirit of filtering, merging, and updating cluster representatives to detect approximate duplicates at each level of the cluster tree. Experiments show the high effectiveness and accuracy of the MCF approach in detecting approximate duplicates: it outperforms the seminal Monge-Elkan algorithm on several real-world benchmarks and generated datasets.
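For readers unfamiliar with the Monge-Elkan scheme mentioned above, here is a minimal sketch of it with a constant-threshold duplicate test. Note the substitution: the inner token similarity uses Python's `difflib.SequenceMatcher` as a stand-in, not the affine Smith-Waterman variant the abstract describes, and the threshold value is purely illustrative.

```python
from difflib import SequenceMatcher

def token_similarity(a, b):
    # Stand-in inner measure (assumption): difflib's ratio in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def monge_elkan(s, t, inner=token_similarity):
    """Average, over the tokens of s, of the best match among the tokens of t."""
    s_tokens, t_tokens = s.split(), t.split()
    if not s_tokens or not t_tokens:
        return 0.0
    best_scores = [max(inner(a, b) for b in t_tokens) for a in s_tokens]
    return sum(best_scores) / len(s_tokens)

def is_approximate_duplicate(s, t, threshold=0.85):
    # Constant-threshold test; 0.85 is an illustrative value only.
    return monge_elkan(s, t) >= threshold
```

The measure is insensitive to token order, so "john smith" and "smith john" score as identical; it is also asymmetric in general, since it averages over the tokens of the first argument.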

Keywords: data mining, data cleaning, approximate duplicates, near-duplicates detection, data mining applications and discovery

Procedia PDF Downloads 368
3303 Project Progress Prediction in Software Development Integrating Time Prediction Algorithms and Large Language Modeling

Authors: Dong Wu, Michael Grenn

Abstract:

Managing software projects effectively is crucial for meeting deadlines, ensuring quality, and managing resources well. Traditional methods often struggle with predicting project timelines accurately due to uncertain schedules and complex data. This study addresses these challenges by combining time prediction algorithms with Large Language Models (LLMs). It makes use of real-world software project data to construct and validate a model. The model takes detailed project progress data, such as task completion dynamics, team interaction and development metrics, as its input and outputs predictions of project timelines. To evaluate the effectiveness of this model, a comprehensive methodology is employed, involving simulations and practical applications in a variety of real-world software project scenarios. This multifaceted evaluation strategy is designed to validate the model's significant role in enhancing forecast accuracy and elevating overall management efficiency, particularly in complex software project environments. The results indicate that the integration of time prediction algorithms with LLMs has the potential to optimize software project progress management. These quantitative results suggest the effectiveness of the method in practical applications. In conclusion, this study demonstrates that integrating time prediction algorithms with LLMs can significantly improve the predictive accuracy and efficiency of software project management. This offers an advanced project management tool for the industry, with the potential to improve operational efficiency, optimize resource allocation, and ensure timely project completion.

Keywords: software project management, time prediction algorithms, large language models (LLMs), forecast accuracy, project progress prediction

Procedia PDF Downloads 57
3302 Building Information Modeling-Based Approach for Automatic Quantity Take-off and Cost Estimation

Authors: Lo Kar Yin, Law Ka Mei

Abstract:

Architectural, engineering, construction and operations (AECO) industry practitioners have adapted well to the dynamic construction market, building on the fundamental training of their disciplines. Further prompted by the pandemic since 2019, the industry has taken great steps in the virtual environment and strives for the best collaboration among project teams without boundaries. Adopting a Building Information Modeling-based approach and qualitative analysis, this paper reviews the quantity take-off and cost estimation process through modeling techniques, in liaison with suppliers, fabricators, subcontractors, contractors, designers, consultants and service providers in the construction industry value chain. The review covers automatic project cost budgeting, project cost control and cost evaluation of design options for in-situ reinforced-concrete construction and Modular Integrated Construction (MiC) at the design stage, and variation of works and cash flow/spending analysis at the construction stage, as far as practicable. The findings are shared with a view to enhancing mutual trust and co-operation among AECO industry practitioners and to fostering development through a common prototype of the design-and-build project delivery method under NEC Engineering and Construction Contract (ECC) Options A and C.

Keywords: building information modeling, cost estimation, quantity take-off, modeling techniques

Procedia PDF Downloads 163
3301 Evaluation of Classification Algorithms for Diagnosis of Asthma in Iranian Patients

Authors: Taha SamadSoltani, Peyman Rezaei Hachesu, Marjan GhaziSaeedi, Maryam Zolnoori

Abstract:

Introduction: Data mining is defined as a process of finding patterns and relationships in the data of a database in order to build predictive models. The application of data mining has extended to many sectors, including healthcare services. Medical data mining aims to solve real-world problems in the diagnosis and treatment of diseases. It applies various techniques and algorithms, which differ in accuracy and precision. The purpose of this study was to apply knowledge discovery and data mining techniques to the diagnosis of asthma based on patient symptoms and history. Method: Data mining includes several steps and decisions to be made by the user. It starts with creating an understanding of the scope and application of previous knowledge in the area and identifying the knowledge discovery (KD) process from the point of view of the stakeholders, and it finishes with acting on the discovered knowledge: using it, integrating it with other systems, and documenting and reporting it. In this study, a stepwise methodology was followed to achieve a logical outcome. Results: The sensitivity, specificity and accuracy of the KNN, SVM, naïve Bayes, NN, classification tree and CN2 algorithms, and of related similar studies, were evaluated, and ROC curves were plotted to show the performance of the system. Conclusion: The results show that asthma can be diagnosed with approximately ninety percent accuracy based on demographic and clinical data. The study also showed that methods based on pattern discovery and data mining have a higher sensitivity than expert and knowledge-based systems. On the other hand, medical guidelines and evidence-based medicine should be the basis of diagnostic methods; it is therefore recommended that machine learning algorithms be used in combination with knowledge-based algorithms.
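The three evaluation measures named in the results follow directly from the counts of a binary confusion matrix. The sketch below shows the standard definitions; the counts used are made up for illustration and are not results from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true cases that are detected
    specificity = tn / (tn + fp)   # share of healthy cases correctly cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy

# Made-up counts for illustration; they are not results from the study.
sens, spec, acc = diagnostic_metrics(tp=45, fp=5, tn=40, fn=10)
```

An ROC curve is obtained by recomputing sensitivity and (1 - specificity) while sweeping the classifier's decision threshold.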

Keywords: asthma, data mining, classification, machine learning

Procedia PDF Downloads 431
3300 Stochastic Matrices and Lp Norms for Ill-Conditioned Linear Systems

Authors: Riadh Zorgati, Thomas Triboulet

Abstract:

In quite diverse application areas such as astronomy, medical imaging, geophysics or nondestructive evaluation, many problems related to the calibration, fitting or estimation of a large number of input parameters of a model from a small amount of noisy output data can be cast as inverse problems. Due to noisy data corruption, insufficient data and model errors, most inverse problems are ill-posed in the Hadamard sense, i.e. existence, uniqueness and stability of the solution are not guaranteed. A wide class of inverse problems in physics relates to the Fredholm equation of the first kind. The ill-posedness of such an inverse problem results, after discretization, in a very ill-conditioned linear system of equations; the condition number of the associated matrix can typically range from 10^9 to 10^18. This condition number acts as an amplifier of uncertainties in the data during inversion and thus renders the inverse problem difficult to handle numerically. Similar problems appear in other areas such as numerical optimization, where using interior-point algorithms to solve linear programs leads to ill-conditioned systems of linear equations. Devising efficient solution approaches for such systems of equations is therefore of great practical interest. Efficient iterative algorithms are proposed for solving a system of linear equations. The approach is based on preconditioning the initial matrix of the system with an approximation of a generalized inverse, leading to a stochastic preconditioned matrix. This approach, valid for non-negative matrices, is first extended to Hermitian positive semi-definite matrices and then generalized to any complex rectangular matrix. The main results obtained are as follows: 1) We are able to build a generalized inverse of any complex rectangular matrix which satisfies the convergence condition required in iterative algorithms for solving a system of linear equations. This completes the (short) list of generalized inverses having this property, after the Kaczmarz and Cimmino matrices. Theoretical results on both the characterization of the type of generalized inverse obtained and the convergence are derived. 2) Thanks to its properties, this matrix can be efficiently used in different solving schemes such as Richardson-Tanabe or preconditioned conjugate gradients. 3) By using Lp norms, we propose generalized Kaczmarz-type matrices. We also show how Cimmino's matrix can be considered as a particular case consisting in choosing the Euclidean norm in an asymmetrical structure. 4) Regarding numerical results obtained on some well-known pathological test cases (Hilbert, Nakasaka, …), some of the proposed algorithms are empirically shown to be more efficient on ill-conditioned problems and more robust to error propagation than the known classical techniques we have tested (Gauss, Moore-Penrose inverse, minimum residue, conjugate gradients, Kaczmarz, Cimmino). We end with a very early prospective application of our approach based on stochastic matrices, aiming at computing some parameters (such as the extreme values, the mean, the variance, …) of the solution of a linear system prior to its resolution. Such an approach, if it were to be efficient, would be a source of information on the solution of a system of linear equations.
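The classical Kaczmarz iteration, one of the baseline techniques the abstract compares against, can be sketched in a few lines. This is the textbook cyclic row-projection method for a consistent real system, not the generalized Lp-norm or stochastic-matrix variants proposed by the authors; the function name and the small test system are assumptions for illustration.

```python
def kaczmarz(A, b, sweeps=200):
    """Cyclic Kaczmarz iteration for a consistent real system A x = b.

    Each step projects the current iterate onto the hyperplane of one row:
        x <- x + (b_i - <a_i, x>) / ||a_i||^2 * a_i
    """
    n = len(A[0])
    x = [0.0] * n
    for _ in range(sweeps):
        for row, rhs in zip(A, b):
            norm_sq = sum(a * a for a in row)
            if norm_sq == 0.0:
                continue  # an all-zero row carries no information
            residual = rhs - sum(a * xi for a, xi in zip(row, x))
            step = residual / norm_sq
            x = [xi + step * a for xi, a in zip(x, row)]
    return x

# Small well-conditioned example whose exact solution is x = (1, 2).
A = [[2.0, 1.0], [1.0, 3.0]]
b = [4.0, 7.0]
x = kaczmarz(A, b)
```

On ill-conditioned systems the rows become nearly parallel and each projection makes little progress, which is the slow-convergence behaviour that motivates preconditioning.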

Keywords: conditioning, generalized inverse, linear system, norms, stochastic matrix

Procedia PDF Downloads 118
3299 Implicit Force Control of a Position Controlled Robot - A Comparison with Explicit Algorithms

Authors: Alexander Winkler, Jozef Suchý

Abstract:

This paper investigates simple implicit force control algorithms realizable with industrial robots. Many of the approaches already published are difficult to implement in commercial robot controllers because they require access to the robot joint torques or use the complete dynamic model of the manipulator. In previous work we dealt with explicit force control of a position-controlled robot. Well-known schemes of implicit force control are stiffness control, damping control and impedance control. With such algorithms the contact force cannot be set directly. Instead, it results from the controller impedance, the environment impedance and the commanded robot motion/position. The relationships between these properties are worked out in detail in this paper for the chosen implicit approaches, which have been adapted to be implementable on a position-controlled robot. The behaviors of stiffness control and damping control are verified by practical experiments, for which a suitable test bed was configured. Using the full mechanical impedance within the controller structure is not practical when the robot is in physical contact with the environment. This is verified by simulation.
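To make concrete the statement that the contact force follows from controller impedance, environment impedance and commanded position, the sketch below works through a one-dimensional stiffness-control case at steady state. The model is an assumption for illustration (a pure-spring controller and a pure-spring environment) and is not taken from the paper; the function and parameter names are hypothetical.

```python
def steady_state_contact_force(k_controller, k_environment, x_commanded, x_surface):
    """Steady-state contact force for a 1-D stiffness-controlled contact.

    Assumed model (not taken from the paper):
      controller:  x = x_commanded - F / k_controller
      environment: F = k_environment * (x - x_surface) while in contact
    Solving both equations for F gives two springs in series.
    """
    if x_commanded <= x_surface:
        return 0.0  # the commanded position does not reach the surface
    k_series = k_controller * k_environment / (k_controller + k_environment)
    return k_series * (x_commanded - x_surface)

# Illustrative numbers: stiffness in N/m, positions in m.
force = steady_state_contact_force(2000.0, 8000.0, 0.105, 0.100)
```

Under this model the force is set only indirectly: changing the commanded position or the controller stiffness shifts the resulting force, and the result also depends on an environment stiffness that is usually not known exactly.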

Keywords: robot force control, stiffness control, damping control, impedance control, stability

Procedia PDF Downloads 505
3298 Enhancing Athlete Training using Real Time Pose Estimation with Neural Networks

Authors: Jeh Patel, Chandrahas Paidi, Ahmed Hambaba

Abstract:

Traditional methods for analyzing athlete movement often lack the detail and immediacy required for optimal training. This project aims to address this limitation by developing a real-time human pose estimation system specifically designed to enhance athlete training across various sports. The system leverages convolutional neural networks (CNNs) to provide a comprehensive and immediate analysis of an athlete’s movement patterns during training sessions. The core architecture uses dilated convolutions to capture crucial long-range dependencies within video frames and combines them with a robust encoder-decoder architecture to further refine pose estimation accuracy. This capability is essential for precise joint localization across the diverse range of athletic poses encountered in different sports. Furthermore, by quantifying movement efficiency, power output, and range of motion, the system provides data-driven insights that can be used to optimize training programs. Pose estimation data analysis can also be used to develop personalized training plans that target specific weaknesses identified in an athlete’s movement patterns. To overcome the limitations posed by outdoor environments, the project employs strategies such as multi-camera configurations or depth sensing techniques. These approaches can enhance pose estimation accuracy in challenging lighting and occlusion scenarios. A dataset was collected from the labs of Martin Luther King at San Jose State University. The system is evaluated through a series of tests that measure its efficiency and accuracy in real-world scenarios. Results indicate a high level of precision in recognizing different poses, substantiating the potential of this technology in practical applications. Challenges such as enhancing the system’s ability to operate in varied environmental conditions and further expanding the dataset for training were identified and discussed. Future work will refine the model’s adaptability and incorporate haptic feedback to enhance the interactivity and richness of the user experience. This project demonstrates the feasibility of an advanced pose detection model and lays the groundwork for future innovations in assistive enhancement technologies.
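The role of dilation in capturing long-range dependencies can be seen in a one-dimensional toy case: with the same three-tap kernel, a larger dilation makes each output depend on a wider stretch of the input. The sketch below is a plain-Python illustration under that simplification; it is not the project's network, and the function name is an assumption.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D dilated convolution (cross-correlation form, no padding)."""
    span = dilation * (len(kernel) - 1) + 1  # input samples covered per output
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + j * dilation]
                       for j, w in enumerate(kernel)))
    return out

signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]
dense = dilated_conv1d(signal, kernel, dilation=1)  # each output spans 3 samples
wide = dilated_conv1d(signal, kernel, dilation=2)   # each output spans 5 samples
```

The parameter count stays at three weights in both cases; only the spacing between the taps changes, which is why dilation widens the receptive field without adding parameters.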

Keywords: computer vision, deep learning, human pose estimation, U-NET, CNN

Procedia PDF Downloads 16
3297 Performance of Non-Deterministic Structural Optimization Algorithms Applied to a Steel Truss Structure

Authors: Ersilio Tushaj

Abstract:

Finding an efficient solution that satisfies the optimality condition is an important issue in structural engineering design. New structural design codes embody a design methodology that seeks to exploit the full resources of the construction material. In recent years, non-deterministic or meta-heuristic structural optimization algorithms have been widely developed in the research community. These methods search for the optimum by simulating a natural phenomenon, such as survival of the fittest, the immune system, swarm intelligence or the cooling process of molten metal through annealing. Among these techniques, the best known are genetic algorithms, simulated annealing, evolution strategies, particle swarm optimization, tabu search, ant colony optimization, harmony search and big bang-big crunch optimization. In this study, five of these algorithms are applied to the optimum weight design of a steel truss structure with variable geometry but fixed topology. The design process selects optimum distances and section sizes from a set of commercial steel profiles. The formulation of the design problem considers deflection limitations, buckling and allowable stress constraints. The approach is repeated starting from different initial populations. The design problem topology is taken from an existing steel structure. The optimization process helps the engineer to achieve good final solutions while avoiding the repetitive, time-consuming evaluation of alternative designs. The algorithms used for the application, the optimal solutions, the number of iterations and the minimal weight designs are reported in the paper. Based on these results, the amount of steel that could be saved by applying structural analysis combined with non-deterministic optimization methods is estimated.
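As a small illustration of one of the listed meta-heuristics, the sketch below uses simulated annealing to pick discrete section areas that minimize weight under an allowable-stress constraint. It is a deliberately simplified stand-in: member forces are fixed inputs rather than the output of a structural analysis, deflection and buckling checks are omitted, and the catalog, loads and cooling schedule are all hypothetical.

```python
import math
import random

CATALOG = [5.0, 8.0, 12.0, 16.0, 20.0]  # hypothetical section areas, cm^2

def weight(sections, lengths):
    # Weight is proportional to area times length (density omitted).
    return sum(a * l for a, l in zip(sections, lengths))

def feasible(sections, forces, allowable_stress):
    # Allowable-stress check only: |force| / area must not exceed the limit.
    return all(abs(f) / a <= allowable_stress for f, a in zip(forces, sections))

def anneal(lengths, forces, allowable_stress, steps=2000, t0=50.0, seed=0):
    rng = random.Random(seed)
    current = [max(CATALOG)] * len(lengths)  # assumed feasible starting design
    best = list(current)
    for k in range(steps):
        temp = t0 * (1.0 - k / steps) + 1e-9      # linear cooling schedule
        trial = list(current)
        trial[rng.randrange(len(trial))] = rng.choice(CATALOG)
        if not feasible(trial, forces, allowable_stress):
            continue                               # reject infeasible designs
        delta = weight(trial, lengths) - weight(current, lengths)
        # Always accept improvements; accept worse designs with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = trial
            if weight(current, lengths) < weight(best, lengths):
                best = list(current)
    return best

lengths = [2.0, 2.0, 2.8]       # member lengths, m (illustrative)
forces = [60.0, -90.0, 120.0]   # member forces, kN (illustrative)
best = anneal(lengths, forces, allowable_stress=10.0)  # limit in kN/cm^2
```

Running the search from several seeds mirrors the abstract's practice of repeating the approach from different starting points, since a single stochastic run offers no guarantee of reaching the global optimum.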

Keywords: structural optimization, non-deterministic methods, truss structures, steel truss

Procedia PDF Downloads 207
3296 Incorporating Multiple Supervised Learning Algorithms for Effective Intrusion Detection

Authors: Umar Albalawi, Sang C. Suh, Jinoh Kim

Abstract:

As the Internet continues to expand its usage with an enormous number of applications, cyber-threats have increased significantly. Accurate and timely detection of malicious traffic is therefore a critical security concern in today’s Internet. One approach to intrusion detection is to use Machine Learning (ML) techniques. Several methods based on ML algorithms have been introduced over the past years, but they are largely limited in terms of detection accuracy and/or the time and space complexity needed to run them. In this work, we present a novel method for intrusion detection that incorporates a set of supervised learning algorithms. The proposed technique provides high accuracy and outperforms existing techniques that simply utilize a single learning method. In addition, our technique relies on partial flow information (rather than full information) for detection, and thus it is lightweight and desirable for online operations with the property of early identification. Using the publicly available mid-Atlantic CCDC intrusion dataset, we show that our proposed technique yields a high detection rate of over 99% with a very low false alarm rate (0.4%).
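One common way to incorporate several supervised learners is to let them vote. The abstract does not say how its learners are combined, so the majority vote below is an assumption chosen for illustration, and the three rule-based "classifiers" and flow features are hypothetical stand-ins for trained models.

```python
from collections import Counter

def majority_vote(classifiers, flow):
    """Combine several classifiers by majority vote over their labels."""
    votes = Counter(clf(flow) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Toy stand-in classifiers over hypothetical partial-flow features.
def by_packet_rate(flow):
    return "malicious" if flow["packets_per_s"] > 1000 else "benign"

def by_port(flow):
    return "malicious" if flow["dst_port"] in {23, 2323} else "benign"

def by_payload(flow):
    return "malicious" if flow["mean_payload"] < 10 else "benign"

ensemble = [by_packet_rate, by_port, by_payload]
flow = {"packets_per_s": 5000, "dst_port": 23, "mean_payload": 200}
label = majority_vote(ensemble, flow)
```

Using an odd number of voters avoids ties in the two-class case; with an even number, a tie-breaking rule would have to be chosen explicitly.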

Keywords: intrusion detection, supervised learning, traffic classification, computer networks

Procedia PDF Downloads 331
3295 Short Text Classification Using Part of Speech Feature to Analyze Students' Feedback of Assessment Components

Authors: Zainab Mutlaq Ibrahim, Mohamed Bader-El-Den, Mihaela Cocea

Abstract:

Students' textual feedback can hold unique patterns and useful information about the learning process: it can reveal the advantages and disadvantages of teaching methods, assessment components, facilities, and other aspects of teaching. The results of analysing such feedback can form a key input for institutions’ decision makers to advance and update their systems accordingly. This paper proposes a data mining framework for analysing end-of-unit general textual feedback using the part of speech (PoS) feature with four machine learning algorithms: support vector machines, decision tree, random forest, and naive Bayes. The proposed framework has two tasks. The first is to use the above algorithms to build an optimal model that automatically classifies the whole data set into two subsets: one related to assessment practices (assessment related) and the other containing the non-assessment related data. The second is to use the same algorithms to build an optimal model that automatically detects the sentiment of the whole data set and of the new data subsets. The significance of this paper lies in comparing the performance of the above four algorithms using the part of speech feature with their performance using the n-grams feature. The paper follows the Knowledge Discovery and Data Mining (KDDM) framework to construct the classification and sentiment analysis models, which consists of understanding the assessment domain, cleaning and pre-processing the data set, selecting and running the data mining algorithm, interpreting mined patterns, and consolidating the discovered knowledge. The experimental results show that models using either feature performed very well on the first task. On the second task, however, models that used the part of speech feature underperformed in comparison with models that used unigrams and bigrams.
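The n-gram baseline that the part of speech feature is compared against can be sketched without any external library: each short text becomes a bag of unigram and bigram counts. This is a minimal illustration with whitespace tokenization, which is an assumption; the paper's actual pre-processing is not described in the abstract.

```python
from collections import Counter

def ngram_features(text, n_values=(1, 2)):
    """Count word n-grams (unigrams and bigrams by default) in a short text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

features = ngram_features("the exam was too long")
```

A part of speech feature would replace each token with its grammatical tag before counting, which requires a tagger and is therefore left out of this sketch.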

Keywords: assessment, part of speech, sentiment analysis, student feedback

Procedia PDF Downloads 122
3294 Comparative Analysis of Reinforcement Learning Algorithms for Autonomous Driving

Authors: Migena Mana, Ahmed Khalid Syed, Abdul Malik, Nikhil Cherian

Abstract:

In recent years, advancements in deep learning have enabled researchers to tackle the problem of self-driving cars. Car companies use huge datasets to train their deep learning models to make autonomous cars a reality. However, this approach has a drawback: the state space of possible actions for a car is so huge that there cannot be a dataset for every possible road scenario. To overcome this problem, this research investigates reinforcement learning (RL). Since the problem of autonomous driving can be modeled in a simulation, it lends itself naturally to the domain of reinforcement learning. The advantage of this approach is that we can model different and complex road scenarios in a simulation without having to deploy in the real world. The autonomous agent can learn to drive by finding the optimal policy. This learned model can then be easily deployed in a real-world setting. In this project, we focus on three RL algorithms: Q-learning, Deep Deterministic Policy Gradient (DDPG), and Proximal Policy Optimization (PPO). To model the environment, we have used TORCS (The Open Racing Car Simulator), which provides us with a strong foundation to test our model. The inputs to the algorithms are the sensor data provided by the simulator, such as velocity, distance from side pavement, etc. The outcome of this research project is a comparative analysis of these algorithms. Based on the comparison, the PPO algorithm gives the best results. With the PPO algorithm, the reward is greater, and the acceleration, steering angle and braking are more stable than with the other algorithms, which means that the agent learns to drive in a better and more efficient way in this case. Additionally, we have produced a dataset taken from the training of the agent with the DDPG and PPO algorithms. It contains all the steps of the agent during one full training run in the form: (all input values, acceleration, steering angle, brake, loss, reward). This study can serve as a base for further, more complex road scenarios. Furthermore, it can be extended to the field of computer vision, using images to find the best policy.
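Of the three algorithms compared, Q-learning is the simplest to show in code. The sketch below performs one tabular update step; the discretized state and action names are hypothetical and do not come from TORCS, whose sensor readings are continuous and would need discretization or function approximation first.

```python
from collections import defaultdict

def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

ACTIONS = ("steer_left", "steer_right", "accelerate", "brake")  # hypothetical
q_table = defaultdict(float)
q_table[("near_left_edge", "steer_right")] = 2.0
new_value = q_learning_update(
    q_table, state="centered", action="accelerate", reward=1.0,
    next_state="near_left_edge", actions=ACTIONS, alpha=0.5, gamma=0.9,
)
```

DDPG and PPO replace the table with neural networks so that continuous controls such as steering angle can be output directly, which is one reason they suit driving tasks better than tabular Q-learning.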

Keywords: autonomous driving, DDPG (deep deterministic policy gradient), PPO (proximal policy optimization), reinforcement learning

Procedia PDF Downloads 125
3293 Markowitz and Implementation of a Multi-Objective Evolutionary Technique Applied to the Colombia Stock Exchange (2009-2015)

Authors: Feijoo E. Colomine Duran, Carlos E. Peñaloza Corredor

Abstract:

Modeling the selection of financial investment components (a portfolio) gives rise to a variety of problems that can be addressed with optimization techniques under evolutionary schemes. By its nature, the selection of investment components involves a dichotomous relationship between two opposing elements: the performance of the portfolio and the risk incurred by choosing it. Markowitz modeled this relationship as a mean (performance) - variance (risk) problem, i.e., performance must be maximized and risk minimized. This research covers the study and implementation of multi-objective evolutionary techniques to solve these problems, taking as its experimental framework the equities market of the Colombia Stock Exchange between 2009 and 2015. Three multiobjective evolutionary algorithms, namely the Nondominated Sorting Genetic Algorithm II (NSGA-II), the Strength Pareto Evolutionary Algorithm 2 (SPEA2) and Indicator-Based Selection in Multiobjective Search (IBEA), were compared using two well-known performance measures, the hypervolume indicator and the R_2 indicator; a nonparametric statistical analysis with the Wilcoxon rank-sum test was also carried out. The comparative analysis also includes an evaluation of the financial efficiency, measured by the Sharpe ratio, of the investment portfolios chosen by the various algorithms. It is shown that the portfolio provided by the algorithms mentioned above is very well placed among the different stock indices provided by the Colombia Stock Exchange.
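The two Markowitz objectives and the Sharpe ratio used in the evaluation can be computed directly from expected returns and a covariance matrix, and the Pareto dominance test is what the multi-objective algorithms rely on when ranking candidate portfolios. The sketch below uses made-up two-asset data, not Colombia Stock Exchange data; the function names are assumptions.

```python
def portfolio_return(weights, mean_returns):
    return sum(w * m for w, m in zip(weights, mean_returns))

def portfolio_variance(weights, cov):
    n = len(weights)
    return sum(weights[i] * cov[i][j] * weights[j]
               for i in range(n) for j in range(n))

def sharpe_ratio(weights, mean_returns, cov, risk_free=0.0):
    excess = portfolio_return(weights, mean_returns) - risk_free
    return excess / portfolio_variance(weights, cov) ** 0.5

def dominates(p, q):
    """p and q are (return, variance) pairs; p dominates q when it is no
    worse on both objectives and the two pairs differ."""
    return p[0] >= q[0] and p[1] <= q[1] and p != q

# Illustrative two-asset data, not taken from the Colombia Stock Exchange.
mean_returns = [0.10, 0.06]
cov = [[0.04, 0.00], [0.00, 0.01]]
weights = [0.5, 0.5]
ret = portfolio_return(weights, mean_returns)
var = portfolio_variance(weights, cov)
```

Portfolios that no other portfolio dominates form the Pareto front, and indicators such as the hypervolume score how well an algorithm's output covers that front.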

Keywords: finance, optimization, portfolio, Markowitz, evolutionary algorithms

Procedia PDF Downloads 282
3292 Internet of Things based AquaSwach Water Purifier

Authors: Karthiyayini J., Arpita Chowdary Vantipalli, Darshana Sailu Tanti, Malvika Ravi Kudari, Krtin Kannan

Abstract:

This paper builds on existing work in smart water quality management. It presents an IoT (Internet of Things) based smart water quality monitoring (SWQM) system, which we call AquaSwach, that supports the continuous measurement of water conditions based on five physical parameters, i.e., temperature, pH, electric conductivity and turbidity properties, and water purity estimation each time you drink water. Six sensors are connected to an Arduino Mega in a discrete way to detect the water parameters. Data extracted from the sensors are transmitted to a desktop application developed on the .NET platform and compared with the WHO (World Health Organization) standard values.
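The comparison step described above amounts to checking each sensor reading against a reference range. The sketch below shows that logic in Python for illustration (the paper's application runs on a different platform). The ranges are placeholders and are explicitly not WHO values; a real system would load the limits from the current WHO guidelines. All names are hypothetical.

```python
# Placeholder acceptance ranges for illustration only; these are NOT WHO values.
# A real deployment would load the limits from the current WHO guidelines.
REFERENCE_RANGES = {
    "temperature_c": (5.0, 30.0),
    "ph": (6.0, 9.0),
    "conductivity_us_cm": (0.0, 1000.0),
    "turbidity_ntu": (0.0, 5.0),
}

def check_reading(reading, ranges=REFERENCE_RANGES):
    """Return the parameters whose values fall outside their reference range."""
    out_of_range = {}
    for name, value in reading.items():
        low, high = ranges[name]
        if not (low <= value <= high):
            out_of_range[name] = value
    return out_of_range

sample = {"temperature_c": 24.5, "ph": 9.6,
          "conductivity_us_cm": 420.0, "turbidity_ntu": 1.2}
alerts = check_reading(sample)
```

Keeping the limits in a single table separates the standard being applied from the checking logic, so updated guideline values can be swapped in without touching the code.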

Keywords: AquaSwach, IoT, WHO, water quality

Procedia PDF Downloads 197
3291 Long Term Examination of the Profitability Estimation Focused on Benefits

Authors: Stephan Printz, Kristina Lahl, René Vossen, Sabina Jeschke

Abstract:

Strategic investment decisions are characterized by high innovation potential and long-term effects on the competitiveness of enterprises. Due to the uncertainty and risks involved in this complex decision making process, the need arises for well-structured support activities. Cost-benefit effectiveness estimation is an approach that considers both cost and long-term added value. One such method is the “profitability estimation focused on benefits – PEFB” method developed at the Institute of Management Cybernetics at RWTH Aachen University. The method copes with the challenges associated with strategic investment decisions by integrating long-term non-monetary aspects whilst also mapping the chronological sequence of an investment within the organization’s target system. Thus, this method is characterized as a holistic approach for the evaluation of costs and benefits of an investment. This participation-oriented method was applied to business environments in many workshops. The results of the workshops are a library of more than 96 cost aspects, as well as 122 benefit aspects. These aspects are preprocessed and comparatively analyzed with regard to their alignment to a series of risk levels. For the first time, an accumulation and a distribution of cost and benefit aspects regarding their impact and probability of occurrence are given. The results give evidence that the PEFB method combines precise measures of financial accounting with the incorporation of benefits. Finally, the results constitute the basis for using information technology and data science for decision support when applying the PEFB method.

Keywords: cost-benefit analysis, multi-criteria decision, profitability estimation focused on benefits, risk and uncertainty analysis

Procedia PDF Downloads 429
3290 A Flexible Pareto Distribution Using α-Power Transformation

Authors: Shumaila Ehtisham

Abstract:

In statistical distribution theory, adding an extra parameter to a classical distribution is common practice. In this study, a new distribution referred to as the α-Power Pareto distribution is introduced by including an extra parameter. Several properties of the proposed distribution, including explicit expressions for the moment generating function, mode, quantiles, entropies and order statistics, are obtained. Unknown parameters have been estimated using the maximum likelihood estimation technique. Two real datasets have been considered to examine the usefulness of the proposed distribution. It has been observed that the α-Power Pareto distribution outperforms different variants of the Pareto distribution on the basis of model selection criteria.
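The α-power transformation is usually stated as mapping a baseline CDF F(x) to (α^F(x) - 1) / (α - 1) for α > 0, α ≠ 1, and to F(x) itself when α = 1. The sketch below applies that form to a Pareto baseline; the parameterization of the Pareto distribution (scale `x_min`, shape `beta`) and the default values are assumptions and may differ from the paper's.

```python
def pareto_cdf(x, x_min=1.0, beta=2.0):
    """CDF of a Pareto distribution with scale x_min and shape beta."""
    return 0.0 if x < x_min else 1.0 - (x_min / x) ** beta

def alpha_power_cdf(base_cdf_value, alpha):
    """Alpha-power transformation of a baseline CDF value F(x), alpha > 0."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if alpha == 1.0:
        return base_cdf_value  # reduces to the baseline distribution
    return (alpha ** base_cdf_value - 1.0) / (alpha - 1.0)

def alpha_power_pareto_cdf(x, alpha, x_min=1.0, beta=2.0):
    return alpha_power_cdf(pareto_cdf(x, x_min, beta), alpha)
```

Because the transformation is monotone in F(x) and maps 0 to 0 and 1 to 1, the result is again a valid CDF, with α controlling how probability mass is shifted relative to the baseline.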

Keywords: α-power transformation, maximum likelihood estimation, moment generating function, Pareto distribution

Procedia PDF Downloads 204
3289 Combination of Unmanned Aerial Vehicle and Terrestrial Laser Scanner Data for Citrus Yield Estimation

Authors: Mohammed Hmimou, Khalid Amediaz, Imane Sebari, Nabil Bounajma

Abstract:

Annual crop production is one of the most important macroeconomic indicators for the majority of countries around the world. This information is valuable, especially for exporting countries, which need a yield estimate before harvest in order to plan the supply chain correctly. When it comes to estimating agricultural yield, especially for arboriculture, conventional methods are mostly applied. In the citrus industry, sale before harvest is widely practiced, which requires an estimate of production while the fruit is still on the tree. However, the conventional method, based on sampling surveys of some trees within the field, is still used to perform yield estimation, and the success of this process depends mainly on the expertise of the ‘estimator agent’. The present study proposes a methodology based on the combination of unmanned aerial vehicle (UAV) images and terrestrial laser scanner (TLS) point clouds to estimate citrus production. During data acquisition, fixed-wing and rotary-wing drones, as well as a terrestrial laser scanner, were tested. A pre-processing step was then performed in order to generate a point cloud and a digital surface model. At the processing stage, a machine vision workflow was implemented to extract the points corresponding to fruits from the whole tree point cloud, cluster them into fruits, and model them geometrically in 3D space. By linking the resulting geometric properties to the fruit weight, the yield can be estimated, and the statistical distribution of fruit sizes can be generated. This latter property, which is information required by countries importing citrus, cannot be estimated before harvest using the conventional method. Since the terrestrial laser scanner is static, data gathering using this technology can be performed over only some trees. The integration of drone data was therefore considered in order to estimate the yield over a whole orchard. To achieve that, features derived from the drone digital surface model were linked to the laser scanner yield estimates of some trees to build a regression model that predicts the yield of a tree given its features. Several missions were carried out to collect drone and laser scanner data within citrus orchards of different varieties by testing several data acquisition parameters (flight height, image overlap, flight mission plan). The accuracy of the results obtained by the proposed methodology, in comparison with the yield estimates from the conventional method, varies from 65% to 94%, depending mainly on the phenological stage of the studied citrus variety during the data acquisition mission. The proposed approach demonstrates strong potential for the early estimation of citrus production and the possibility of its extension to other fruit trees.
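The final step, linking a drone-derived feature to the laser scanner yield of a few calibration trees, can be illustrated with a one-feature least squares fit. The abstract does not state which features or regression form were used, so the single feature (canopy volume) and the linear model are assumptions, and all numbers are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical calibration trees: canopy volume from the drone surface model
# (m^3) against yield estimated from the laser scanner (kg). Values are made up.
canopy_volume = [10.0, 20.0, 30.0, 40.0]
tls_yield = [25.0, 45.0, 65.0, 85.0]
slope, intercept = fit_line(canopy_volume, tls_yield)
predicted = slope * 25.0 + intercept  # yield for a tree seen only by the drone
```

Once fitted on the scanned trees, the model can be applied to every tree in the orchard for which the drone provides the feature, which is how the static scanner's limited coverage is extended.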

Keywords: citrus, digital surface model, point cloud, terrestrial laser scanner, UAV, yield estimation, 3D modeling

Procedia PDF Downloads 123
3288 Hydrological, Hydraulics, Analysis and Design of the Aposto –Yirgalem Road Upgrading Project, Ethiopia

Authors: Azazhu Wassie

Abstract:

This study analyzed and identified the drainage pattern and catchment characteristics of the river basin and assessed the impact of the hydrologic parameters (catchment area, rainfall intensity, runoff coefficient, land use, and soil type) on the study area. Since there is no river gauging station near the road, even for large rivers, rainfall-runoff models are adopted for flood estimation: for catchment areas less than 50 ha, the rational method is used; for catchment areas less than 65 km², the SCS unit hydrograph method is used; and for catchment areas greater than 65 km², HEC-HMS is adopted.
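
The area-based choice of method described above can be sketched in a few lines. This is an illustrative sketch only, not the author's implementation: the function names are hypothetical, the handling of a catchment of exactly 65 km² is an assumption (the abstract does not say which method applies there), and the rational formula is written in its common metric form Q = C·i·A/360 (Q in m³/s, i in mm/h, A in ha).

```python
def select_flood_method(area_km2):
    """Pick a flood estimation method from the catchment area (km²).

    Thresholds follow the abstract: < 50 ha (0.5 km²) rational method,
    < 65 km² SCS unit hydrograph, otherwise HEC-HMS.
    Assumption: an area of exactly 65 km² is sent to HEC-HMS.
    """
    if area_km2 < 0.5:  # 50 ha = 0.5 km²
        return "rational"
    if area_km2 < 65.0:
        return "SCS unit hydrograph"
    return "HEC-HMS"


def rational_peak_flow(runoff_coefficient, intensity_mm_per_h, area_ha):
    """Peak discharge in m³/s from the rational formula Q = C * i * A / 360.

    The factor 360 converts mm/h times ha into m³/s.
    """
    return runoff_coefficient * intensity_mm_per_h * area_ha / 360.0


if __name__ == "__main__":
    # Hypothetical catchment: 10 ha, C = 0.5, design intensity 72 mm/h.
    print(select_flood_method(0.1))
    print(rational_peak_flow(0.5, 72.0, 10.0))
```

The runoff coefficient and design intensity in the usage example are made-up values chosen for easy arithmetic; they do not come from the study.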

Keywords: Arc GIS, catchment area, land use/land cover, peak flood, rainfall intensity

Procedia PDF Downloads 0
3287 EnumTree: An Enumerative Biclustering Algorithm for DNA Microarray Data

Authors: Haifa Ben Saber, Mourad Elloumi

Abstract:

In a number of domains, such as DNA microarray data analysis, we need to cluster simultaneously the rows (genes) and columns (conditions) of a data matrix to identify groups of constant rows with a group of columns. This kind of clustering is called biclustering. Biclustering algorithms are extensively used in DNA microarray data analysis, and more effective biclustering algorithms are highly desirable. We introduce a new algorithm called Enumerative Tree (EnumTree) for biclustering of binary microarray data. EnumTree adopts the approach of enumerating biclusters and extracts all biclusters of consistently good quality. The main idea of EnumLat is the construction of a new tree structure to represent adequately the different biclusters discovered during the process of enumeration. This algorithm adopts the strategy of discovering all biclusters at a time. The performance of the proposed algorithm is assessed using both synthetic and real DNA microarray data; our algorithm outperforms other biclustering algorithms for binary microarray data. Biclusters with different numbers of rows. Moreover, we test the biological significance using a gene annotation web tool to show that our proposed method is able to produce biologically relevant biclusters.

Keywords: DNA microarray, biclustering, gene expression data, tree, datamining.

Procedia PDF Downloads 358
3286 Method for Evaluating the Monetary Value of a Customized Version of the Digital Twin for the Additive Manufacturing

Authors: Fabio Oettl, Sebastian Hoerbrand, Tobias Wittmeir, Johannes Schilp

Abstract:

By combining the additive manufacturing (AM) process with digital concepts, like the digital twin (DT) or the downsized and basing concept of the digital part file (DPF), the competitiveness of additive manufacturing is enhanced and new use cases like decentralized production are enabled. However, the literature offers no quantitative approach for valuing the usage of a DT or DPF in AM. Such an approach is therefore developed in this paper in order to further promote or dissuade the usage of these concepts. The focus is set on production as an early lifecycle phase, which means that the AM production process is analyzed regarding the potential advantages of using the DPF in AM. These advantages are translated into a monetary value with this approach. By also calculating the costs of the DPF, an overall monetary value is obtained. A tool based on a simulation environment is then constructed, in which the algorithms are implemented as a program. The results of applying this tool show that an overall value of 20,81 € for the DPF can be realized for one special use case. For future applications of the DPF, it is recommended to integrate sustainability-related information in particular, because a higher value of the DPF can be expected from it.

Keywords: additive manufacturing, digital concept costs, digital part file, digital twin, monetary value estimation

Procedia PDF Downloads 184
3285 Image Encryption Using Eureqa to Generate an Automated Mathematical Key

Authors: Halima Adel Halim Shnishah, David Mulvaney

Abstract:

Applying traditional symmetric cryptography algorithms while computing encryption and decryption provides immunity to secret keys against different attacks. One of the popular techniques for generating automated secret keys is evolutionary computing using the Eureqa API tool, which gained attention in 2013. In this paper, we generate automated secret keys for image encryption and decryption using the Eureqa API (a tool used in evolutionary computing). The Eureqa API models pseudo-random input data obtained from a suitable source to generate secret keys. The validity of the generated secret keys is investigated by performing various statistical tests (histogram, chi-square, correlation of two adjacent pixels, correlation between original and encrypted images, entropy, and key sensitivity). Experimental results obtained from methods including histogram analysis, correlation coefficient, entropy, and key sensitivity show that the proposed image encryption algorithms are secure and reliable, with the potential to be adapted for secure image communication applications.
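
Two of the statistical tests named above, entropy and the correlation of adjacent pixels, can be illustrated with a short standard-library sketch. It assumes an 8-bit grayscale image flattened into a list of integers; the function names are hypothetical, and nothing here uses or reproduces the Eureqa API or the authors' key generation. A well-encrypted image is generally expected to show entropy close to 8 bits and adjacent-pixel correlation close to 0.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy in bits per pixel of a sequence of pixel values."""
    total = len(pixels)
    counts = Counter(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def adjacent_pixel_correlation(pixels):
    """Pearson correlation between each pixel and its right-hand neighbour."""
    xs, ys = pixels[:-1], pixels[1:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant data: correlation is undefined, report 0
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    ramp = list(range(256))  # smooth gradient, like an unencrypted image row
    print(shannon_entropy(ramp))
    print(adjacent_pixel_correlation(ramp))
```

The ramp input shows why entropy alone is not sufficient: it uses every gray level once, so its entropy is maximal, yet neighbouring pixels are perfectly correlated.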

Keywords: image encryption algorithms, Eureqa, statistical measurements, automated key generation

Procedia PDF Downloads 467
3284 A Novel Guided Search Based Multi-Objective Evolutionary Algorithm

Authors: A. Baviskar, C. Sandeep, K. Shankar

Abstract:

Solving multi-objective optimization problems requires fast convergence and good spread. Though existing Evolutionary Algorithms (EA's) are able to achieve this, the computational effort can be further reduced by hybridizing them with innovative strategies. This study focuses on converging to the Pareto front faster while adopting the advantages of the Strength Pareto Evolutionary Algorithm-II (SPEA-II) for a better spread. Two different approaches based on optimizing the objective functions independently are implemented. In the first method, the decision variables corresponding to the optima of the individual objective functions are strategically used to guide the search towards the Pareto front. In the second method, boundary points of the Pareto front are calculated and their decision variables are seeded into the initial population. Both methods are applied to different constrained and unconstrained multi-objective test functions. It is observed that the proposed guided search based algorithm gives better convergence and diversity than several well-known existing algorithms (such as NSGA-II and SPEA-II) in considerably fewer iterations.
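
The notions of Pareto front and boundary points used above can be made concrete with a small sketch. It assumes minimization of every objective and works on a finite list of objective vectors; the function names are hypothetical and the code illustrates the definitions only, not the guided search algorithm or SPEA-II themselves.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def boundary_points(front):
    """For each objective, the front member with the smallest value of it.

    These are the extreme ends of the front; in the abstract's second method
    the decision variables of such points are seeded into the initial population.
    """
    n_objectives = len(front[0])
    return [min(front, key=lambda p: p[k]) for k in range(n_objectives)]


if __name__ == "__main__":
    objective_values = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
    front = pareto_front(objective_values)
    print(front)
    print(boundary_points(front))
```

In this example (3, 4) is dominated by (2, 3) and (5, 5) by (1, 5), so three points remain on the front, and its two ends are the per-objective minima.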

Keywords: boundary points, evolutionary algorithms (EA's), guided search, strength pareto evolutionary algorithm-II (SPEA-II)

Procedia PDF Downloads 252
3283 Feature Weighting Comparison Based on Clustering Centers in the Detection of Diabetic Retinopathy

Authors: Kemal Polat

Abstract:

In this paper, three feature weighting methods have been used to improve the classification performance of diabetic retinopathy (DR). To classify diabetic retinopathy, features extracted from the output of several retinal image processing algorithms, such as image-level, lesion-specific, and anatomical components, have been used and fed into the classifier algorithms. The dataset used in this study has been taken from the University of California, Irvine (UCI) machine learning repository. Feature weighting methods including fuzzy c-means clustering based feature weighting, subtractive clustering based feature weighting, and Gaussian mixture clustering based feature weighting have been used and compared with each other in the classification of DR. After feature weighting, five different classifier algorithms comprising multi-layer perceptron (MLP), k-nearest neighbor (k-NN), decision tree, support vector machine (SVM), and Naïve Bayes have been used. The hybrid method based on the combination of subtractive clustering based feature weighting and the decision tree classifier has obtained a classification accuracy of 100% in the screening of DR. These results have demonstrated that the proposed hybrid scheme is very promising for medical data set classification.

Keywords: machine learning, data weighting, classification, data mining

Procedia PDF Downloads 312
3282 A Method for Compression of Short Unicode Strings

Authors: Masoud Abedi, Abbas Malekpour, Peter Luksch, Mohammad Reza Mojtabaei

Abstract:

The use of short texts in communication has been greatly increasing in recent years. Using different languages in short texts has led to the compulsory use of Unicode strings. These strings need twice the space of common strings; hence, applying compression algorithms to accelerate transmission and reduce cost is worthwhile. Nevertheless, compression methods like gzip, bzip2, or PAQ are not appropriate because of their high overhead data size. The Huffman algorithm is one of the rare algorithms effective in reducing the size of short Unicode strings. In this paper, an algorithm is proposed for the compression of very short Unicode strings. At first, every new character to be sent to a destination is inserted in the proposed mapping table. At the beginning, every character is new. If a character is repeated for the same destination, it is not considered a new character. Next, the new characters, together with the mapping values of repeated characters, are arranged through a specific technique and specially formatted to be transmitted. The results obtained from an assessment made on a set of short Persian and Arabic strings indicate that the proposed algorithm outperforms the Huffman algorithm in size reduction.
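
The overhead argument above is easy to check with Python's standard `gzip` and `bz2` modules. The snippet below is only an illustration of that claim, not the proposed mapping-table algorithm; the sample string and the choice of UTF-16 as the two-bytes-per-character encoding are assumptions made for the demonstration.

```python
import bz2
import gzip

# "salam" written in Persian script, given as escapes to avoid bidi rendering issues.
text = "\u0633\u0644\u0627\u0645"

# Two bytes per character, matching the "twice the space" remark in the abstract.
raw = text.encode("utf-16-le")

gzip_size = len(gzip.compress(raw))
bz2_size = len(bz2.compress(raw))

print("raw bytes:  ", len(raw))
print("gzip bytes: ", gzip_size)
print("bzip2 bytes:", bz2_size)

# Container headers and trailers alone exceed the payload for a string this short.
print("gzip expands the input: ", gzip_size > len(raw))
print("bzip2 expands the input:", bz2_size > len(raw))
```

Both general-purpose formats make such a short string larger, which is the motivation for a scheme designed specifically for very short inputs.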

Keywords: Algorithms, Data Compression, Decoding, Encoding, Huffman Codes, Text Communication

Procedia PDF Downloads 331
3281 The Clustering of Multiple Sclerosis Subgroups through L2 Norm Multifractal Denoising Technique

Authors: Yeliz Karaca, Rana Karabudak

Abstract:

Multifractal denoising techniques are used to identify significant attributes by removing noise from a dataset. Magnetic resonance (MR) imaging is the most sensitive method for identifying chronic disorders of the nervous system such as multiple sclerosis (MS). MRI and Expanded Disability Status Scale (EDSS) data belonging to 120 individuals who have one of the subgroups of MS (Relapsing Remitting MS (RRMS), Secondary Progressive MS (SPMS), Primary Progressive MS (PPMS)), as well as 19 healthy individuals in the control group, have been used in this study. The study comprises the following stages: (i) The L2 Norm Multifractal Denoising technique, one of the multifractal techniques, has been applied to the MS data (MRI and EDSS), yielding a new dataset. (ii) The new MS dataset, obtained by applying the L2 Norm Multifractal Denoising technique to the MS dataset, has been clustered with the K-Means and Fuzzy C-Means (FCM) algorithms, which are unsupervised methods, and the clustering performances have been compared. (iii) Identifying significant attributes in the MS dataset through the multifractal denoising (L2 Norm) technique and then applying the K-Means and FCM algorithms to the MS subgroups and the healthy control group has yielded excellent performance. According to the clustering results based on the MS subgroups, successful clustering has been obtained with the K-Means and FCM algorithms by applying the L2 norm multifractal denoising technique to the MS dataset. Clustering with K-Means and FCM has been more successful on the dataset in which significant attributes are obtained by applying the L2 Norm denoising technique (L2_Norm MS Data Set).

Keywords: clinical decision support, clustering algorithms, multiple sclerosis, multifractal techniques

Procedia PDF Downloads 149
3280 On the Application of Heuristics of the Traveling Salesman Problem for the Task of Restoring the DNA Matrix

Authors: Boris Melnikov, Dmitrii Chaikovskii, Elena Melnikova

Abstract:

The traveling salesman problem (TSP) is a well-known optimization problem that seeks the shortest possible route that visits a set of points and returns to the starting point. In this paper, we apply some heuristics of the TSP to the task of restoring the DNA matrix. This restoration problem is often considered in biocybernetics. In it, we must recover the matrix of distances between DNA sequences when not all the elements of the matrix are known at the input. We consider the possibility of using this method in the testing of algorithms that calculate the distance between a pair of DNAs, in order to restore the partially filled matrix.
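
For readers unfamiliar with TSP heuristics, the sketch below shows one of the simplest, the nearest-neighbour rule, operating on a complete distance matrix. The abstract does not say which heuristics the authors use, so this is a generic textbook example with hypothetical function names and a made-up matrix, not their method; in particular it assumes every matrix entry is known, which is exactly what the restoration problem lacks.

```python
def nearest_neighbor_tour(dist, start=0):
    """Greedy TSP tour: from each point, move to the closest unvisited point.

    dist is a full square distance matrix; ties are broken by lower index.
    """
    tour = [start]
    unvisited = set(range(len(dist))) - {start}
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: (dist[last][j], j))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def tour_length(dist, tour):
    """Length of the closed tour, including the return to the starting point."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


if __name__ == "__main__":
    # Hypothetical symmetric distances between four sequences.
    distances = [
        [0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0],
    ]
    tour = nearest_neighbor_tour(distances)
    print(tour, tour_length(distances, tour))
```

The greedy rule is fast but gives no optimality guarantee, which is why it is usually treated as a starting point for further improvement.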

Keywords: optimization problems, DNA matrix, partially filled matrix, traveling salesman problem, heuristic algorithms

Procedia PDF Downloads 133
3279 MIMO Radar-Based System for Structural Health Monitoring and Geophysical Applications

Authors: Davide D’Aria, Paolo Falcone, Luigi Maggi, Aldo Cero, Giovanni Amoroso

Abstract:

The paper presents a methodology for real-time structural health monitoring and geophysical applications. The key elements of the system are a high-performance MIMO radar sensor, an optical camera, and a dedicated set of software algorithms encompassing interferometry, tomography, and photogrammetry. The MIMO radar sensor proposed in this work provides extremely high sensitivity to displacements, making the system able to react to tiny deformations (up to tens of microns) on time scales spanning from milliseconds to hours. The MIMO feature makes the system capable of providing a set of two-dimensional images of the observed scene, each mapped on the azimuth-range directions with notable resolution in both dimensions and with an outstanding repetition rate. The back-scattered energy, which is distributed in 3D space, is projected onto a 2D plane, where each pixel has as coordinates the line-of-sight distance and the cross-range azimuthal angle. At the same time, the high-performance processing unit allows the observed scene to be sensed with remarkable refresh periods (up to milliseconds), thus opening the way for combined static and dynamic structural health monitoring. Thanks to the smart TX/RX antenna array layout, the MIMO data can be processed through a tomographic approach to reconstruct the three-dimensional map of the observed scene. This 3D point cloud is then accurately mapped onto a 2D digital optical image through photogrammetric techniques, allowing for easy and straightforward interpretation of the measurements. Once the three-dimensional image is reconstructed, a 'repeat-pass' interferometric approach is exploited to provide the user with high-frequency three-dimensional motion/vibration estimation for each point of the reconstructed image. At this stage, the methodology leverages consolidated atmospheric correction algorithms to provide reliable displacement and vibration measurements.

Keywords: interferometry, MIMO RADAR, SAR, tomography

Procedia PDF Downloads 173
3278 Data Mining in Medicine Domain Using Decision Trees and Vector Support Machine

Authors: Djamila Benhaddouche, Abdelkader Benyettou

Abstract:

In this paper, we used data mining to extract biomedical knowledge. In general, complex biomedical data collected in population studies are treated with statistical methods; although these are robust, they are not sufficient in themselves to harness the potential wealth of the data. For that, we used two learning algorithms: decision trees and the support vector machine (SVM). These supervised classification methods are used to diagnose thyroid disease. In this context, we propose to promote the study and use of symbolic data mining techniques.

Keywords: biomedical data, learning, classifier, algorithms decision tree, knowledge extraction

Procedia PDF Downloads 534
3277 Predication Model for Leukemia Diseases Based on Data Mining Classification Algorithms with Best Accuracy

Authors: Fahd Sabry Esmail, M. Badr Senousy, Mohamed Ragaie

Abstract:

In recent years, there has been an explosion in the use of technologies that help in discovering diseases. For example, DNA microarrays allow us, for the first time, to obtain a "global" view of the cell. This technology has great potential to provide accurate medical diagnosis and to help in finding the right treatment and cure for many diseases. Various classification algorithms can be applied to such microarray datasets to devise methods that can predict the occurrence of leukemia. In this study, we compared the classification accuracy and response time of eleven decision tree methods and six rule classifier methods using five performance criteria. The experimental results show that Random Tree produces better results and also takes the lowest time to build a model among the tree classifiers. Among the classification rule algorithms, the nearest-neighbor-like algorithm (NNge) is the best, owing to its high accuracy and the lowest time to build a model.

Keywords: data mining, classification techniques, decision tree, classification rule, leukemia diseases, microarray data

Procedia PDF Downloads 304
3276 MapReduce Logistic Regression Algorithms with RHadoop

Authors: Byung Ho Jung, Dong Hoon Lim

Abstract:

Logistic regression is a statistical method for analyzing a dataset in which one or more independent variables determine an outcome. Logistic regression is used extensively in numerous disciplines, including the medical and social science fields. In this paper, we address the problem of estimating the parameters of logistic regression based on the MapReduce framework with RHadoop, which integrates the R and Hadoop environments and is applicable to large-scale data. There exist three learning algorithms for logistic regression, namely the gradient descent method, the cost minimization method, and the Newton-Raphson method. The Newton-Raphson method does not require a learning rate, while the gradient descent and cost minimization methods need a manually picked learning rate. The experimental results demonstrated that our learning algorithms using RHadoop can scale well and efficiently process large data sets on commodity hardware. We also compared the performance of our Newton-Raphson method with the gradient descent and cost minimization methods. The results showed that our Newton-Raphson method appeared to be the most robust on all the data tested.
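
The contrast between the two kinds of update, one needing a learning rate and one not, can be seen in a single-machine sketch. This is a plain Python illustration for a model with an intercept and one feature, not the authors' RHadoop or MapReduce code; the function names, the tiny dataset, and the learning rate of 0.05 are all assumptions made for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gradient(b0, b1, xs, ys):
    """Gradient of the negative log-likelihood with respect to (b0, b1)."""
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        residual = sigmoid(b0 + b1 * x) - y
        g0 += residual
        g1 += residual * x
    return g0, g1


def fit_gradient_descent(xs, ys, learning_rate=0.05, iterations=20000):
    """Gradient descent: the step size must be chosen by hand."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1 = gradient(b0, b1, xs, ys)
        b0 -= learning_rate * g0
        b1 -= learning_rate * g1
    return b0, b1


def fit_newton_raphson(xs, ys, iterations=25):
    """Newton-Raphson: the step is scaled by the inverse Hessian, no learning rate."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1 = gradient(b0, b1, xs, ys)
        h00 = h01 = h11 = 0.0  # entries of the symmetric 2x2 Hessian
        for x in xs:
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Solve H * step = gradient using the closed-form 2x2 inverse.
        b0 -= (h11 * g0 - h01 * g1) / det
        b1 -= (h00 * g1 - h01 * g0) / det
    return b0, b1


if __name__ == "__main__":
    # Small, non-separable toy data so that a finite optimum exists.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [0, 0, 1, 0, 1, 1]
    print("Newton-Raphson:  ", fit_newton_raphson(xs, ys))
    print("Gradient descent:", fit_gradient_descent(xs, ys))
```

On this toy data both routines should arrive at essentially the same coefficients, but Newton-Raphson does so in a handful of iterations, while gradient descent needs many more and depends on a suitable learning rate.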

Keywords: big data, logistic regression, MapReduce, RHadoop

Procedia PDF Downloads 258