Search results for: data fitting
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24604

24274 Cloud Design for Storing Large Amount of Data

Authors: M. Strémy, P. Závacký, P. Cuninka, M. Juhás

Abstract:

The main goal of this paper is to introduce our design of a private cloud for storing large amounts of data, especially pictures, and to provide a good technological backend for data analysis based on parallel processing and business intelligence. We tested hypervisors, cloud management tools, storage for holding all the data, and Hadoop for analysing unstructured data. High availability, virtual network management, logical separation of projects, and rapid deployment of physical servers to our environment were also required.

Keywords: cloud, glusterfs, hadoop, juju, kvm, maas, openstack, virtualization

Procedia PDF Downloads 339
24273 Predicting Growth of Eucalyptus Marginata in a Mediterranean Climate Using an Individual-Based Modelling Approach

Authors: S.K. Bhandari, E. Veneklaas, L. McCaw, R. Mazanec, K. Whitford, M. Renton

Abstract:

Eucalyptus marginata, E. diversicolor and Corymbia calophylla form widespread forests in south-west Western Australia (SWWA). These forests have economic and ecological importance, and therefore, tree growth and sustainable management are of high priority. This paper aimed to analyse and model the growth of these species at both stand and individual levels, but this presentation will focus on predicting the growth of E. marginata at the individual tree level. More specifically, the study investigated how well individual E. marginata tree growth could be predicted from the diameter and height of the tree at the start of the growth period, and whether this prediction could be improved by also accounting for competition from neighbouring trees in different ways. The study also investigated how many neighbouring trees, or what neighbourhood distance, needed to be considered when accounting for competition. To this end, the Pearson correlation coefficient was examined among competition indices (CIs) and between CIs and dbh (diameter at breast height) growth, and the competition index that best predicted the diameter growth of individual trees was selected for E. marginata forest managed under different thinning regimes at Inglehope in SWWA. Furthermore, individual tree growth models were developed using simple linear regression, multiple linear regression, and linear mixed effect modelling approaches. Individual tree growth models were developed for thinned and unthinned stands separately. The developed models were validated using two approaches. In the first approach, models were validated using a subset of data that was not used in model fitting. In the second approach, the model of one growth period was validated with the data of another growth period. Tree size (diameter and height) was a significant predictor of growth. This prediction was improved when competition was included in the model. The fit statistic (coefficient of determination) of the models ranged from 0.31 to 0.68. Models with spatial competition indices were validated as more accurate than those with non-spatial indices. Model prediction can be optimized if 10 to 15 competitors (by number) or competitors within ~10 m (by distance) of the base of the subject tree are included, which can reduce the time and cost of collecting information about competitors. As competition from neighbours was a significant predictor with a negative effect on growth, we recommend including neighbourhood competition when predicting growth and considering thinning treatments to minimize the effect of competition on growth. These modelling approaches are likely to be useful tools for the conservation and sustainable management of E. marginata forests in SWWA. As a next step in optimizing the number and distance of competitors, further studies in larger plots and with a larger number of plots than those used in the present study are recommended.

Keywords: competition, growth, model, thinning

Procedia PDF Downloads 111
24272 Estimation of Missing Values in Aggregate Level Spatial Data

Authors: Amitha Puranik, V. S. Binu, Seena Biju

Abstract:

Missing data is a common problem in spatial analysis, especially at the aggregate level. At a given location, values can be missing in the covariates, in the response variable, or in both. Many techniques are available to estimate missing values, but not all of them can be applied to spatial data, since the data are autocorrelated. Hence there is a need for a method that estimates the missing values in both the response variable and the covariates in spatial data by taking the spatial autocorrelation into account. The present study aims to develop a model to estimate the missing data points at the aggregate level in spatial data by accounting for (a) spatial autocorrelation of the response variable, (b) spatial autocorrelation of the covariates, and (c) correlation between the covariates and the response variable. Estimating the missing values of spatial data requires a model that explicitly accounts for the spatial autocorrelation. The proposed model not only accounts for spatial autocorrelation but also utilizes the correlation that exists between covariates, within covariates, and between the response variable and the covariates. Precise estimation of the missing data points in spatial data will increase the precision of the estimated effects of independent variables on the response variable in spatial regression analysis.

Keywords: spatial regression, missing data estimation, spatial autocorrelation, simulation analysis

Procedia PDF Downloads 361
24271 Convergence Analysis of Training Two-Hidden-Layer Partially Over-Parameterized ReLU Networks via Gradient Descent

Authors: Zhifeng Kong

Abstract:

Over-parameterized neural networks have attracted a great deal of attention in recent deep learning theory research, as they challenge the classic perspective of over-fitting when the model has excessive parameters and have gained empirical success in various settings. While a number of theoretical works have been presented to demystify properties of such models, their convergence properties are still far from being thoroughly understood. In this work, we study the convergence properties of training two-hidden-layer partially over-parameterized fully connected networks with the Rectified Linear Unit activation via gradient descent. To our knowledge, this is the first theoretical work to understand convergence properties of deep over-parameterized networks without the equally-wide-hidden-layer assumption and other unrealistic assumptions. We provide a probabilistic lower bound on the widths of the hidden layers and prove a linear convergence rate for gradient descent. We also conduct experiments on synthetic and real-world datasets to validate our theory.

Keywords: over-parameterization, rectified linear units ReLU, convergence, gradient descent, neural networks

Procedia PDF Downloads 130
24270 Hyperspectral Imagery for Tree Speciation and Carbon Mass Estimates

Authors: Jennifer Buz, Alvin Spivey

Abstract:

The most common greenhouse gas emitted through human activities, carbon dioxide (CO2), is naturally consumed by plants during photosynthesis. This process is actively being monetized by companies wishing to offset their carbon dioxide emissions. For example, companies are now able to purchase protections for vegetated land due to be clear-cut or purchase barren land for reforestation. Therefore, by actively preventing the destruction/decay of plant matter or by introducing more plant matter (reforestation), a company can theoretically offset some of its emissions. One of the biggest issues in the carbon credit market is validating and verifying carbon offsets. There is a need for a system that can accurately and frequently ensure that the areas sold for carbon credits have the vegetation mass (and therefore the carbon offset capability) they claim. Traditional techniques for measuring vegetation mass and determining health are costly and require many person-hours. Orbital Sidekick offers an alternative approach that accurately quantifies carbon mass and assesses vegetation health through satellite hyperspectral imagery, a technique which enables us to remotely identify material composition (including plant species) and condition (e.g., health and growth stage). How much carbon a plant is capable of storing is ultimately tied to many factors, including material density (primarily species-dependent), plant size, and health (trees that are actively decaying are not effectively storing carbon). All of these factors can be observed through satellite hyperspectral imagery. This abstract focuses on speciation. To build a species classification model, we matched pixels in our remote sensing imagery to plants on the ground for which we know the species. To accomplish this, we collaborated with the researchers at the Teakettle Experimental Forest.
Our remote sensing data comes from our airborne “Kato” sensor, which flew over the study area and acquired hyperspectral imagery (400-2500 nm, 472 bands) at ~0.5 m/pixel resolution. Coverage of the entire Teakettle Experimental Forest required capturing dozens of individual hyperspectral images. In order to combine these images into a mosaic, we accounted for potential variations of atmospheric conditions throughout the data collection. To do this, we ran an open source atmospheric correction routine called ISOFIT (Imaging Spectrometer Optimal FITting), which converted all of our remote sensing data from radiance to reflectance. A database of reflectance spectra for each of the tree species within the study area was acquired using the Teakettle stem map and the geo-referenced hyperspectral images. We found that a wide variety of machine learning classifiers were able to identify the species within our images with high (>95%) accuracy. For the most robust quantification of carbon mass and the best assessment of the health of a vegetated area, speciation is critical. Through the use of high resolution hyperspectral data, ground-truth databases, and complex analytical techniques, we are able to determine the species present within a pixel to a high degree of accuracy. These species identifications will feed directly into our carbon mass model.

Keywords: hyperspectral, satellite, carbon, imagery, python, machine learning, speciation

Procedia PDF Downloads 104
24269 Association Rules Mining and NOSQL Oriented Document in Big Data

Authors: Sarra Senhadji, Imene Benzeguimi, Zohra Yagoub

Abstract:

Big Data refers to the recent technology for manipulating voluminous and unstructured data sets from multiple sources, and NoSQL has emerged to handle the problem of unstructured data. Association rules mining is one of the popular data mining techniques for extracting hidden relationships from transactional databases. The problem of finding association dependencies is well suited to MapReduce. The goal of our work is to reduce the time needed to generate frequent itemsets by using MapReduce and a document-oriented NoSQL database. A comparative study evaluates the performance of our algorithm against the classical Apriori algorithm.
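
As a rough illustration of the counting step this abstract refers to, the following sketch simulates a map phase and a reduce phase in a single Python process. It is not the authors' algorithm: it uses no Hadoop or MongoDB, it skips Apriori's candidate pruning, and the function names and the grocery transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, k):
    """Emit an (itemset, 1) pair for every k-item subset of one transaction."""
    items = sorted(set(transaction))
    return [(frozenset(combo), 1) for combo in combinations(items, k)]


def reduce_phase(pairs, min_support):
    """Sum the counts per itemset and keep those that reach min_support."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return {s: c for s, c in counts.items() if c >= min_support}


def frequent_itemsets(transactions, k, min_support):
    """Run the map phase over all transactions, then reduce the emitted pairs."""
    pairs = [pair for t in transactions for pair in map_phase(t, k)]
    return reduce_phase(pairs, min_support)


# Hypothetical transactions, for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
]
frequent_pairs = frequent_itemsets(transactions, k=2, min_support=3)
```

In a real MapReduce job the map calls would run on separate workers and the framework would group the emitted pairs by key before reduction; here a `Counter` plays that role.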

Keywords: Apriori, Association rules mining, Big Data, Data Mining, Hadoop, MapReduce, MongoDB, NoSQL

Procedia PDF Downloads 144
24268 Immunization-Data-Quality in Public Health Facilities in the Pastoralist Communities: A Comparative Study Evidence from Afar and Somali Regional States, Ethiopia

Authors: Melaku Tsehay

Abstract:

The Consortium of Christian Relief and Development Associations (CCRDA) and the CORE Group Polio Partners (CGPP) Secretariat have been working with the Global Alliance for Vaccines and Immunization (GAVI) to improve immunization data quality in Afar and Somali Regional States. The main aim of this study was to compare the quality of immunization data before and after these interventions in health facilities in the pastoralist communities in Ethiopia. To this end, a comparative cross-sectional study was conducted on 51 health facilities. The baseline data were collected in May 2019 and the endline data in August 2021. The WHO data quality self-assessment tool (DQS) was used to collect data. A significant improvement was seen in the accuracy of the pentavalent vaccine (PT)1 (p = 0.012) data at the health posts (HP), and of the PT3 (p = 0.010) and measles (p = 0.020) data at the health centers (HC). In addition, a highly significant improvement was observed in the accuracy of tetanus toxoid (TT)2 data at HP (p < 0.001). The level of over- or under-reporting for PT3 was found to be < 8% at the HP and < 10% at the HC. Data completeness also increased from 72.09% to 88.89% at the HC. Nearly 74% of the health facilities reported their immunization data on time, which is much better than the baseline (7.1%) (p < 0.001). These findings may provide some hints for policies and programs targeting improved immunization data quality in the pastoralist communities.

Keywords: data quality, immunization, verification factor, pastoralist region

Procedia PDF Downloads 84
24267 Identifying Critical Success Factors for Data Quality Management through a Delphi Study

Authors: Maria Paula Santos, Ana Lucas

Abstract:

Organizations base their operations and decision making on the data they have at their disposal, so the quality of these data is remarkably important. Data Quality (DQ) is currently a relevant issue, and the literature is unanimous in pointing out that poor DQ can result in large costs for organizations. The literature review identified and described 24 Critical Success Factors (CSF) for Data Quality Management (DQM). These were presented to a panel of experts, who ordered them according to their degree of importance, using the Delphi method with the Q-sort technique, based on an online questionnaire. The study shows that the five most important CSF for DQM are: definition of appropriate policies and standards, control of inputs, definition of a strategic plan for DQ, an organizational culture focused on data quality, and obtaining top management commitment and support.

Keywords: critical success factors, data quality, data quality management, Delphi, Q-Sort

Procedia PDF Downloads 202
24266 Adding a Degree of Freedom to Opinion Dynamics Models

Authors: Dino Carpentras, Alejandro Dinkelberg, Michael Quayle

Abstract:

Within agent-based modeling, opinion dynamics is the field that focuses on modeling people's opinions. In this prolific field, most of the literature is dedicated to the exploration of two 'degrees of freedom' and how they impact the model’s properties (e.g., the average final opinion, the number of final clusters, etc.). These degrees of freedom are (1) the interaction rule, which determines how agents update their own opinion, and (2) the network topology, which defines the possible interactions among agents. In this work, we show that a third degree of freedom exists. This can be used to change a model's output by up to 100% of its initial value or to transform two models (both from the literature) into each other. Since opinion dynamics models are representations of the real world, it is fundamental to understand how people’s opinions can be measured. Even for abstract models (i.e., not intended for the fitting of real-world data), it is important to understand whether the way of numerically representing opinions is unique and, if this is not the case, how the model dynamics would change under different representations. The process of measuring opinions is non-trivial, as it requires transforming a real-world opinion (e.g., supporting most of the liberal ideals) into a number. Such a process is usually not discussed in the opinion dynamics literature, but it has been intensively studied in a subfield of psychology called psychometrics. In psychometrics, opinion scales can be converted into each other, similarly to how meters can be converted to feet. Indeed, psychometrics routinely uses both linear and non-linear transformations of opinion scales. Here, we analyze how such transformations affect opinion dynamics models. We analyze this effect using mathematical modeling and then validate our analysis with agent-based simulations. Firstly, we study the case of perfect scales. In this way, we show that scale transformations affect the model’s dynamics up to a qualitative level. This means that if two researchers use the same opinion dynamics model and even the same dataset, they could make totally different predictions just because they followed different renormalization processes. A similar situation appears if two different scales are used to measure opinions, even on the same population. This effect may be as strong as producing an uncertainty of 100% on the simulation’s output (i.e., all results are possible). Still, by using perfect scales, we show that scale transformations can be used to perfectly transform one model into another. We test this using two models from the standard literature. Finally, we test the effect of scale transformation in the case of finite precision using a 7-point Likert scale. In this way, we show how a relatively small scale transformation introduces changes both at the qualitative level (i.e., the most shared opinion at the end of the simulation) and in the number of opinion clusters. Thus, scale transformation appears to be a third degree of freedom of opinion dynamics models. This result deeply impacts both theoretical research on models' properties and the application of models to real-world data.

Keywords: degrees of freedom, empirical validation, opinion scale, opinion dynamics

Procedia PDF Downloads 109
24265 Mapping of Urban Micro-Climate in Lyon (France) by Integrating Complementary Predictors at Different Scales into Multiple Linear Regression Models

Authors: Lucille Alonso, Florent Renard

Abstract:

The characterization of urban heat islands (UHI) and their interactions with climate change and urban climates is a major research and public health issue, due to the increasing urbanization of the population. Solutions require a better knowledge of the UHI and micro-climate in urban areas, obtained by combining measurements and modelling. This study contributes to this topic by evaluating microclimatic conditions in dense urban areas in the Lyon Metropolitan Area (France) using a combination of traditionally used data such as topography, together with LiDAR (Light Detection And Ranging) data, Landsat 8 and Sentinel satellite observations, and ground measurements by bike. These bicycle-based weather data collections are used to build the database of the variable to be modelled, the air temperature, over Lyon’s hyper-center. This study aims to model the air temperature, measured during 6 mobile campaigns in Lyon in clear weather, using multiple linear regressions based on 33 explanatory variables. They fall into various categories such as meteorological parameters from remote sensing, topographic variables, vegetation indices, the presence of water, humidity, bare soil, buildings, radiation, urban morphology, or proximity to and density of various land uses (water surfaces, vegetation, bare soil, etc.). The acquisition sources are multiple and come from the Landsat 8 and Sentinel satellites, LiDAR points, and cartographic products downloaded from an open data platform in Greater Lyon. Regarding the presence of low, medium, and high vegetation, the presence of buildings and ground, several buffers close to these factors were tested (5, 10, 20, 25, 50, 100, 200 and 500m). The buffers with the best linear correlations with air temperature for ground are 5m around the measurement points, for low and medium vegetation, and for building 50m and for high vegetation is 100m. The explanatory model of the dependent variable is obtained by multiple linear regression of the remaining explanatory variables (those retained using a Pearson correlation matrix with |r| < 0.7 and VIF < 5) by integrating a stepwise sorting algorithm. Moreover, holdout cross-validation with randomization (80% training, 20% testing) is performed because of its ability to detect over-fitting of multiple regression, although multiple regression provides internal validation. Multiple linear regression explained, on average, 72% of the variance for the study days, with an average RMSE of only 0.20°C. Surface temperature is the most important variable in the model for estimating air temperature. Other variables recur, such as distance to subway stations, distance to water areas, NDVI, digital elevation model, sky view factor, average vegetation density, and building density. Changing urban morphology influences the city's thermal patterns. The thermal atmosphere in dense urban areas can only be analysed on a microscale in order to consider the local impact of trees, streets, and buildings. There is currently no sufficiently dense network of fixed weather stations in central Lyon or in most major urban areas. Therefore, it is necessary to use mobile measurements, followed by modelling, to characterize the city's multiple thermal environments.
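
The predictor screening and holdout split described in this abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the greedy keep-first filtering rule, the function names, and the tiny made-up predictor table are hypothetical, and the VIF check and stepwise selection from the study are not reproduced.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # fails on constant columns


def drop_collinear(predictors, threshold=0.7):
    """Keep a predictor only if |r| < threshold against every kept one."""
    kept = {}
    for name, values in predictors.items():
        if all(abs(pearson_r(values, v)) < threshold for v in kept.values()):
            kept[name] = values
    return list(kept)


def holdout_split(rows, train_fraction=0.8, seed=0):
    """Shuffle the rows and split them into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Made-up values; surface_temp_f is surface_temp converted to Fahrenheit.
surface_temp = [30.0, 32.0, 35.0, 31.0, 29.0, 34.0]
predictors = {
    "surface_temp": surface_temp,
    "ndvi": [0.6, 0.2, 0.5, 0.1, 0.4, 0.3],
    "surface_temp_f": [t * 9.0 / 5.0 + 32.0 for t in surface_temp],
}
kept_names = drop_collinear(predictors)
train_rows, test_rows = holdout_split(range(10))
```

The order in which predictors are visited decides which member of a correlated pair survives, so a real analysis would usually rank candidates first, for example by their correlation with air temperature.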

Keywords: air temperature, LIDAR, multiple linear regression, surface temperature, urban heat island

Procedia PDF Downloads 120
24264 An Investigation into the Crystallization Tendency/Kinetics of Amorphous Active Pharmaceutical Ingredients: A Case Study with Dipyridamole and Cinnarizine

Authors: Shrawan Baghel, Helen Cathcart, Biall J. O'Reilly

Abstract:

Amorphous drug formulations have great potential to enhance the solubility, and thus the bioavailability, of BCS class II drugs. However, the higher free energy and molecular mobility of the amorphous form lower the activation energy barrier for crystallization and thermodynamically drive it towards the crystalline state, which makes it unstable. Accurate determination of the crystallization tendency/kinetics is the key to the successful design and development of such systems. In this study, dipyridamole (DPM) and cinnarizine (CNZ) have been selected as model compounds. Thermodynamic fragility (m_T) is measured from the heat capacity change at the glass transition temperature (Tg), whereas dynamic fragility (m_D) is evaluated using methods based on extrapolation of configurational entropy to zero (m_{D_CE}) and on the heating rate dependence of Tg (m_{D_Tg}). The mean relaxation time of the amorphous drugs was calculated from the Vogel-Tammann-Fulcher (VTF) equation. Furthermore, the correlation between fragility and glass forming ability (GFA) of the model drugs has been established, and the relevance of these parameters to the crystallization of amorphous drugs is also assessed. Moreover, the crystallization kinetics of the model drugs under isothermal conditions has been studied using the Johnson-Mehl-Avrami (JMA) approach to determine the Avrami constant ‘n’, which provides an insight into the mechanism of crystallization. To further probe the crystallization mechanism, the non-isothermal crystallization kinetics of the model systems was also analysed by statistically fitting the crystallization data to 15 different kinetic models, and the relevance of the model-free kinetic approach has been established. In addition, the crystallization mechanism for DPM and CNZ at each extent of transformation has been predicted. The calculated fragility, GFA and crystallization kinetics are found to be in good correlation with the stability prediction of amorphous solid dispersions. Thus, this research work involves a multidisciplinary approach to establish fragility, GFA and crystallization kinetics as stability predictors for amorphous drug formulations.
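
To illustrate how an Avrami constant can be extracted from isothermal data, the sketch below uses the standard JMA form X(t) = 1 - exp(-k t^n) and its linearisation ln(-ln(1 - X)) = ln k + n ln t. The abstract does not say how the fit was carried out, so the linearised least-squares route, the function names, and the synthetic data points are all assumptions made for illustration.

```python
import math


def avrami_fraction(t, k, n):
    """Crystallized fraction X(t) = 1 - exp(-k * t**n) under the JMA model."""
    return 1.0 - math.exp(-k * t ** n)


def fit_avrami(times, fractions):
    """Estimate (k, n) by a straight-line fit of ln(-ln(1 - X)) against ln t."""
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1.0 - x)) for x in fractions]
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope  # k, n


# Synthetic, noise-free data generated from assumed values k = 1e-3, n = 3.
times = [2.0, 4.0, 6.0, 8.0, 10.0]
fractions = [avrami_fraction(t, k=1e-3, n=3.0) for t in times]
k_hat, n_hat = fit_avrami(times, fractions)
```

The fit requires 0 < X < 1 at every point, so measurements at the very start or end of the transformation have to be excluded before taking logarithms.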

Keywords: amorphous, fragility, glass forming ability, molecular mobility, mean relaxation time, crystallization kinetics, stability

Procedia PDF Downloads 339
24263 Role of Web Graphics and Interface in Creating Visitor Trust

Authors: Pramika J. Muthya

Abstract:

This paper investigates the impact of web graphics and interface design on building visitor trust in websites. A quantitative survey approach was used to examine how aesthetic and usability elements of website design influence user perceptions of trustworthiness. 133 participants aged 18-25 who live in urban Bangalore and engage in online transactions were recruited via convenience sampling. Data was collected through an online survey measuring trust levels based on website design, using validated constructs like the Visual Aesthetic of Websites Inventory (VisAWI). Statistical analysis, including ordinal regression, was conducted to analyze the results. The findings show a statistically significant relationship between web graphics and interface design and the level of trust visitors place in a website. The goodness-of-fit statistics and highly significant model fitting information provide strong evidence for rejecting the null hypothesis of no relationship. Well-designed visual aesthetics like simplicity, diversity, colorfulness, and craftsmanship are key drivers of perceived credibility. Intuitive navigation and usability also increase trust. The results emphasize the strategic importance for companies to invest in appealing graphic design, consistent with existing theoretical frameworks. There are also implications for taking a user-centric approach to web design and acknowledging the reciprocal link between pre-existing user trust and perception of visuals. While generalizable, limitations include possible sampling and self-report biases. Further research can build on these findings to deepen understanding of nuanced cultural and temporal factors influencing online trust. Overall, this study makes a significant contribution by providing empirical evidence that reinforces the crucial impact of thoughtful graphic design in fostering lasting user trust in websites.

Keywords: web graphics, interface design, visitor trust, website design, aesthetics, user experience, online trust, visual design, graphic design, user perceptions, user expectations

Procedia PDF Downloads 34
24262 Approximating Maximum Speed on Road from Curvature Information of Bezier Curve

Authors: M. Yushalify Misro, Ahmad Ramli, Jamaludin M. Ali

Abstract:

Bezier curves have useful properties for the path generation problem; for instance, they can generate a reference trajectory for vehicles that satisfies the path constraints. Both algorithms join cubic Bezier curve segments smoothly to generate the path. One useful property of a Bezier curve is its curvature. In mathematics, curvature is the amount by which a geometric object deviates from being flat, or straight in the case of a line. A simple example is a circle, whose curvature equals the reciprocal of its radius at every point. The smaller the radius, the higher the curvature, and thus the more sharply the vehicle needs to turn. In this study, we use Bezier curves to fit highway-like curves. We use a different approach to find the best approximation so that the fitted curve resembles a highway-like curve. We compute the curvature by analytical differentiation of the Bezier curve and then compute the maximum driving speed from the curvature information obtained. Our research rests on some assumptions: first, the Bezier curve estimates the real shape of the curve, which can be verified visually. Even though the fitting process does not interpolate the curve of interest exactly, we believe that the estimation of speed is acceptable. We verified our results against a manual calculation of the curvature from the map.
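
The curvature computation described here can be sketched as follows for a planar cubic Bezier curve, using the standard formula κ = |x'y'' - y'x''| / (x'^2 + y'^2)^(3/2). The abstract does not state how speed is derived from curvature, so the flat-road friction limit v = sqrt(μ g / κ), the friction coefficient 0.7, and the control points below are assumptions chosen only for illustration.

```python
import math


def bezier_derivatives(p0, p1, p2, p3, t):
    """First and second derivatives of a cubic Bezier curve at parameter t."""
    u = 1.0 - t
    d1 = tuple(
        3.0 * (u * u * (b - a) + 2.0 * u * t * (c - b) + t * t * (d - c))
        for a, b, c, d in zip(p0, p1, p2, p3)
    )
    d2 = tuple(
        6.0 * (u * (c - 2.0 * b + a) + t * (d - 2.0 * c + b))
        for a, b, c, d in zip(p0, p1, p2, p3)
    )
    return d1, d2


def curvature(p0, p1, p2, p3, t):
    """Unsigned curvature of a planar cubic Bezier curve at parameter t."""
    (dx, dy), (ddx, ddy) = bezier_derivatives(p0, p1, p2, p3, t)
    return abs(dx * ddy - dy * ddx) / (dx * dx + dy * dy) ** 1.5


def max_speed(kappa, mu=0.7, g=9.81):
    """Assumed friction-limited speed (m/s) on a flat curve of curvature kappa."""
    if kappa == 0.0:
        return math.inf  # a straight segment imposes no curvature limit
    return math.sqrt(mu * g / kappa)


# Control points approximating a quarter circle of radius 100 m (illustrative).
R = 100.0
K = 0.5522847498  # usual constant for approximating a circular arc
points = ((R, 0.0), (R, K * R), (K * R, R), (0.0, R))
kappa_mid = curvature(*points, 0.5)
speed_mid = max_speed(kappa_mid)
```

Because the control points approximate a circle of radius 100 m, `kappa_mid` should come out close to 1/100, which gives a quick sanity check on the derivative formulas.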

Keywords: speed estimation, path constraints, reference trajectory, Bezier curve

Procedia PDF Downloads 360
24261 Data Mining in Medicine Domain Using Decision Trees and Vector Support Machine

Authors: Djamila Benhaddouche, Abdelkader Benyettou

Abstract:

In this paper, we use data mining to extract biomedical knowledge. In general, complex biomedical data collected in population studies are treated with statistical methods; although these are robust, they are not sufficient in themselves to harness the potential wealth of the data. To that end, we use two learning algorithms: decision trees and the Support Vector Machine (SVM). These supervised classification methods are used to diagnose thyroid disease. In this context, we propose to promote the study and use of symbolic data mining techniques.

Keywords: biomedical data, learning, classifier, algorithms decision tree, knowledge extraction

Procedia PDF Downloads 535
24260 Analysis of Different Classification Techniques Using WEKA for Diabetic Disease

Authors: Usama Ahmed

Abstract:

Data mining is the process of analysing data to predict useful information. It is a field of research that solves various types of problems. In data mining, classification is an important technique for categorizing different kinds of data. Diabetes is one of the most common diseases. This paper applies different classification techniques to a diabetes dataset using the Waikato Environment for Knowledge Analysis (WEKA) and identifies which algorithm is most suitable. The best classification algorithm on the diabetic data is Naïve Bayes, with an accuracy of 76.31% and a model build time of 0.06 seconds.

Keywords: data mining, classification, diabetes, WEKA

Procedia PDF Downloads 132
24259 Application of Natural Language Processing in Education

Authors: Khaled M. Alhawiti

Abstract:

Reading proficiency is a major component of language competency. However, finding topical texts at a fitting level for foreign and second language learners is a challenge for educators. We address this issue using natural language processing technology to assess reading level and simplify text. In the context of foreign and second-language learning, existing measures of reading level are not well suited to this task. Related work has demonstrated the benefit of using statistical language processing techniques; we extend these ideas and include other potential features to measure readability. In the first part of this study, we combine features from statistical language models, traditional reading level measures and other language processing tools to produce a better method for detecting reading level. We examine the performance of human annotators and evaluate the results of our detectors with respect to human ratings. A key contribution is that our detectors are trainable; with training and test data from the same domain, our detectors outperform more general reading level tools (Flesch-Kincaid and Lexile). Trainability will allow performance to be tuned to address the needs of specific groups or students.
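
For reference, the Flesch-Kincaid baseline named in this abstract is a fixed formula over word, sentence, and syllable counts. The sketch below implements the published grade-level formula; the function name is hypothetical, and the counts are passed in directly because syllable counting is itself a heuristic that the abstract does not describe.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw counts of a text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Illustrative counts: 100 words, 5 sentences, 150 syllables.
grade = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
```

A fixed formula like this cannot adapt to a particular learner population, which is the limitation that trainable detectors are meant to address.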

Keywords: natural language processing, trainability, syntactic simplification tools, education

Procedia PDF Downloads 472
24258 Comprehensive Study of Data Science

Authors: Asifa Amara, Prachi Singh, Kanishka, Debargho Pathak, Akshat Kumar, Jayakumar Eravelly

Abstract:

Today's generation is totally dependent on technology that uses data as its fuel. The present study is all about innovations and developments in data science and gives an idea about how efficiently to use the data provided. This study will help to understand the core concepts of data science. The concept of artificial intelligence was introduced by Alan Turing in which the main principle was to create an artificial system that can run independently of human-given programs and can function with the help of analyzing data to understand the requirements of the users. Data science comprises business understanding, analyzing data, ethical concerns, understanding programming languages, various fields and sources of data, skills, etc. The usage of data science has evolved over the years. In this review article, we have covered a part of data science, i.e., machine learning. Machine learning uses data science for its work. Machines learn through their experience, which helps them to do any work more efficiently. This article includes a comparative study image between human understanding and machine understanding, advantages, applications, and real-time examples of machine learning. Data science is an important game changer in the life of human beings. Since the advent of data science, we have found its benefits and how it leads to a better understanding of people, and how it cherishes individual needs. It has improved business strategies, services provided by them, forecasting, the ability to attend sustainable developments, etc. This study also focuses on a better understanding of data science which will help us to create a better world.

Keywords: data science, machine learning, data analytics, artificial intelligence

Procedia PDF Downloads 62
24257 Evaluation of the Effect of Milk Recording Intervals on the Accuracy of an Empirical Model Fitted to Dairy Sheep Lactations

Authors: L. Guevara, Glória L. S., Corea E. E, A. Ramírez-Zamora M., Salinas-Martinez J. A., Angeles-Hernandez J. C.

Abstract:

Mathematical models are useful for identifying the characteristics of sheep lactation curves to develop and implement improved strategies. However, the accuracy of these models is influenced by factors such as the recording regime, mainly the intervals between test day records (TDR). The current study aimed to evaluate the effect of different TDR intervals on the goodness of fit of the Wood model (WM) applied to dairy sheep lactations. A total of 4,494 weekly TDRs from 156 lactations of dairy crossbred sheep were analyzed. Three new databases were generated from the original weekly TDR data (7D), comprising intervals of 14 (14D), 21 (21D), and 28 (28D) days. The parameters of WM were estimated using the “minpack.lm” package in the R software. The shape of the lactation curve (typical and atypical) was defined based on the WM parameters. The goodness of fit was evaluated using the mean square of prediction error (MSPE), root of MSPE (RMSPE), Akaike's Information Criterion (AIC), Bayesian Information Criterion (BIC), and the coefficient of correlation (r) between the actual and estimated total milk yield (TMY). WM showed an adequate estimate of TMY regardless of the TDR interval (P=0.21) and shape of the lactation curve (P=0.42). However, we found higher values of r for typical curves compared to atypical curves (0.9 vs. 0.74), with the highest values for the 28D interval (r=0.95). In the same way, we observed an overestimated peak yield (0.92 vs. 6.6 l) and underestimated time of peak yield (21.5 vs. 1.46) in atypical curves. The best values of RMSPE were observed for the 28D interval in both lactation curve shapes. The significantly lowest values of AIC (P=0.001) and BIC (P=0.001) were shown by the 7D interval for typical and atypical curves. These results represent the first approach to defining the adequate recording interval for dairy sheep in Latin America and showed a better fit of the Wood model using a 7D interval. However, it is possible to obtain good estimates of TMY using a 28D interval, which reduces the sampling frequency and would save additional costs to dairy sheep producers.
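
As an illustration of the quantities discussed above, the sketch below evaluates the Wood model in its usual form y(t) = a t^b exp(-c t), derives the peak time b/c and peak yield from the parameters, and computes MSPE, RMSPE, AIC and BIC for a set of predictions. The least-squares forms of AIC and BIC used here (n ln(RSS/n) plus a penalty) are an assumption; the abstract does not state which form was used, and the study itself fitted the model with minpack.lm in R rather than with this code.

```python
import math

def wood(t, a, b, c):
    """Wood model: milk yield at time t for parameters a, b, c."""
    return a * t ** b * math.exp(-c * t)

def peak_time(b, c):
    """Time of peak yield; meaningful only when b > 0 and c > 0."""
    return b / c

def peak_yield(a, b, c):
    """Yield at the peak, i.e. the model evaluated at t = b / c."""
    return wood(peak_time(b, c), a, b, c)

def fit_metrics(observed, predicted, n_params=3):
    """MSPE and RMSPE, plus least-squares AIC/BIC when the residual sum is positive."""
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    mspe = rss / n
    metrics = {"MSPE": mspe, "RMSPE": math.sqrt(mspe)}
    if rss > 0:
        metrics["AIC"] = n * math.log(rss / n) + 2 * n_params
        metrics["BIC"] = n * math.log(rss / n) + n_params * math.log(n)
    return metrics
```

Thinning weekly records to a 14-, 21- or 28-day interval changes n and the residuals, so information criteria computed on datasets of different size are not directly comparable without care.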

Keywords: gamma incomplete, ewes, shape curves, modeling

Procedia PDF Downloads 55
24256 Application of Artificial Neural Network Technique for Diagnosing Asthma

Authors: Azadeh Bashiri

Abstract:

Introduction: Lack of proper diagnosis and inadequate treatment of asthma lead to physical and financial complications. This study aimed to use data mining techniques to create an intelligent neural network system for the diagnosis of asthma. Methods: The study population comprised patients who had visited one of the lung clinics in Tehran. Data were analyzed using the SPSS statistical tool, and Pearson's chi-square coefficient was the basis of decision making for data ranking. The neural network was trained using the back-propagation learning technique. Results: Based on the SPSS analysis used to select the top factors, 13 effective factors were selected. In different runs, the data were mixed in various ways, so different models were built for training and testing the networks; in all modes, the network correctly predicted 100% of cases. Conclusion: Applying data mining methods before designing the structure of the system, with the aim of reducing the data dimension and choosing the data optimally, leads to a more accurate system. Therefore, given the nature of medical data, data mining approaches are necessary.
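
The chi-square ranking step mentioned in the Methods can be sketched as follows. This is a generic Pearson chi-square statistic on a contingency table (feature level by diagnosis), with features ranked by that statistic; the function names and the table layout are assumptions for illustration, and the study itself used SPSS.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows.

    Every row and column total must be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

def rank_features(tables):
    """Return feature names ordered from strongest to weakest association."""
    return sorted(tables, key=lambda name: chi_square(tables[name]), reverse=True)
```

Keeping only the top-ranked features before training is one way to reduce the input dimension of the network, as the Conclusion describes.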

Keywords: asthma, data mining, Artificial Neural Network, intelligent system

Procedia PDF Downloads 258
24255 Interpreting Privacy Harms from a Non-Economic Perspective

Authors: Christopher Muhawe, Masooda Bashir

Abstract:

With the growth of Internet Communication Technology (ICT), the virtual world has become the new normal. At the same time, there is an unprecedented collection of massive amounts of data by both private and public entities. Unfortunately, this increase in data collection has gone in tandem with an increase in data misuse and data breaches. Regrettably, the majority of data breach and data misuse claims have been unsuccessful in United States courts for failure to prove direct injury to physical or economic interests. The requirement to express data privacy harms from an economic or physical stance negates the fact that not all data harms are physical or economic in nature. The challenge is compounded by the fact that data breach harms and risks do not attach immediately. This research will use a descriptive and normative approach to show that not all data harms can be expressed in economic or physical terms. Expressing privacy harms purely from an economic or physical harm perspective negates the fact that data insecurity may result in harms that run counter to the functions of privacy in our lives: the promotion of liberty, selfhood, and autonomy, the promotion of human social relations, and the furtherance of a free society. There is no economic value that can be placed on these functions of privacy. The proposed approach addresses data harms from a psychological and social perspective.

Keywords: data breach and misuse, economic harms, privacy harms, psychological harms

Procedia PDF Downloads 176
24254 Enhancing the Recruitment Process through Machine Learning: An Automated CV Screening System

Authors: Kaoutar Ben Azzou, Hanaa Talei

Abstract:

Human resources is an important department in each organization, as it manages the life cycle of employees from recruitment and training to retirement or termination of contracts. The recruitment process starts with a job opening, followed by a selection of the best-fit candidates from all applicants. Matching the best profile to a job position requires manually reviewing many CVs, which takes hours of work and can sometimes lead to choosing a profile that is not the best. The work presented in this paper aims at reducing the workload of HR personnel by automating the preliminary stages of the candidate screening process, thereby fostering a more streamlined recruitment workflow. This tool introduces an automated system designed to help with the recruitment process by scanning candidates' CVs, extracting pertinent features, and employing machine learning algorithms to decide the most fitting job profile for each candidate. Our work employs natural language processing (NLP) techniques to identify and extract key features, such as education, work experience, and skills, from the unstructured text of a CV. Subsequently, the system utilizes these features to match candidates with job profiles, leveraging the power of classification algorithms.
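
The two stages described above (feature extraction from CV text, then matching to a job profile) can be illustrated with a deliberately simple sketch. The skill vocabulary and job profiles are hypothetical, and the matching step uses Jaccard overlap as a stand-in for the trained classification algorithms the abstract refers to.

```python
import re

# Hypothetical skill vocabulary and job profiles, for illustration only.
SKILLS = {"python", "sql", "excel", "recruiting", "payroll", "java"}
PROFILES = {
    "data analyst": {"python", "sql", "excel"},
    "hr officer": {"recruiting", "payroll", "excel"},
}

def extract_skills(cv_text):
    """Feature extraction: keep the tokens of the CV that appear in the vocabulary."""
    tokens = set(re.findall(r"[a-z+#]+", cv_text.lower()))
    return tokens & SKILLS

def best_profile(cv_text):
    """Return the profile whose required skills overlap most with the CV (Jaccard)."""
    skills = extract_skills(cv_text)

    def jaccard(required):
        union = skills | required
        return len(skills & required) / len(union) if union else 0.0

    return max(PROFILES, key=lambda name: jaccard(PROFILES[name]))
```

A real system would replace the keyword lookup with an NLP pipeline and the overlap score with a classifier trained on labelled CVs.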

Keywords: automated recruitment, candidate screening, machine learning, human resources management

Procedia PDF Downloads 35
24253 Machine Learning Analysis of Student Success in Introductory Calculus Based Physics I Course

Authors: Chandra Prayaga, Aaron Wade, Lakshmi Prayaga, Gopi Shankar Mallu

Abstract:

This paper presents the use of machine learning algorithms to predict the success of students in an introductory physics course. A dataset of 140 rows pertaining to the performance of two batches of students was used. The lack of sufficient data to train robust machine learning models was compensated for by generating synthetic data similar to the real data. CTGAN and CTGAN with Gaussian Copula (Gaussian) were used to generate synthetic data, with the real data as input. To check the similarity between the real data and each synthetic dataset, pair plots were made. The synthetic data was used to train machine learning models using the PyCaret package. For the CTGAN data, the Ada Boost Classifier (ADA) was found to be the ML model with the best fit, whereas the CTGAN with Gaussian Copula yielded Logistic Regression (LR) as the best model. Both models were then tested for accuracy with the real data. ROC-AUC analysis was performed for all ten classes of the target variable (Grades A, A-, B+, B, B-, C+, C, C-, D, F). The ADA model with CTGAN data showed a mean AUC score of 0.4377, but the LR model with the Gaussian data showed a mean AUC score of 0.6149. ROC-AUC plots were obtained for each Grade value separately. The LR model with Gaussian data showed consistently better AUC scores compared to the ADA model with CTGAN data, except in two cases of the Grade value, C- and A-.
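
A mean AUC over several grade classes is commonly obtained one class at a time (one-vs-rest) and then averaged. The sketch below shows that computation using the rank interpretation of AUC; the unweighted (macro) average and the data layout are assumptions, since the abstract does not say how the mean was formed, and the study used PyCaret rather than this code.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive example outranks a negative one."""
    positives = [s for l, s in zip(labels, scores) if l]
    negatives = [s for l, s in zip(labels, scores) if not l]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

def mean_one_vs_rest_auc(y_true, y_score, classes):
    """Macro-average of per-class AUCs; y_score[i] maps class -> predicted probability."""
    aucs = []
    for cls in classes:
        labels = [y == cls for y in y_true]
        scores = [row[cls] for row in y_score]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)
```

An AUC of 0.5 corresponds to random ranking, so a mean below 0.5 indicates a model that ranks the classes worse than chance on the real data.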

Keywords: machine learning, student success, physics course, grades, synthetic data, CTGAN, gaussian copula CTGAN

Procedia PDF Downloads 28
24252 Design and Developing the Infrared Sensor for Detection and Measuring Mass Flow Rate in Seed Drills

Authors: Bahram Besharti, Hossein Navid, Hadi Karimi, Hossein Behfar, Iraj Eskandari

Abstract:

Multiple or missed sowing by seed drills is a common problem on the farm. This problem causes overuse of seeds, wasted energy, rising crop treatment costs, and reduced crop yield at harvest. To detect these faults and monitor the performance of seed drills during sowing, a seed sensor that detects and monitors the seed mass flow rate in the delivery tube is essential. In this research, an infrared seed sensor was developed to estimate seed mass flow rate in seed drills. The developed sensor comprised a pair of spaced-apart circuits, one acting as an IR transmitter and the other as an IR receiver. Optical coverage in the sensing section was obtained by setting IR LEDs and photodiodes directly on opposite sides. Passing seeds interrupted the radiation beams to the photodiodes, which caused the output voltages to change. The voltage differences of the sensing units were summed by a microcontroller and converted to an analog value by a DAC chip. The sensor was tested using a roller seed metering device with three types of seeds: chickpea, wheat, and alfalfa (representing large, medium, and fine seeds, respectively). The results revealed a good fit between the voltage received from the seed sensor and the mass flow of seeds in the delivery tube. A linear trend line was fitted to the data collected for the three seeds as a model of the mass flow of seeds. A final mass flow model was developed for seeds of various sizes based on the voltages received from the seed sensor, thousand-seed weight, and equivalent diameter of seeds. The developed infrared seed sensor, besides monitoring the mass flow of seeds in field operations, can be used to assess the performance of mechanical planter seed metering units in the laboratory and provides an easy calibration method for seed drills before planting in the field.
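
The linear trend line relating sensor voltage to mass flow can be sketched as an ordinary least-squares fit. The function names are illustrative assumptions, and the code covers only the single-seed linear calibration, not the final multi-seed model that also uses thousand-seed weight and equivalent diameter.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(x, y, slope, intercept):
    """Coefficient of determination of the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

Here x would hold the sensor voltages and y the measured mass flow rates from a calibration run; the fitted line then converts a voltage reading into an estimated flow rate.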

Keywords: seed flow, infrared, seed sensor, seed drills

Procedia PDF Downloads 343
24251 Data Access, AI Intensity, and Scale Advantages

Authors: Chuping Lo

Abstract:

This paper presents a simple model demonstrating that, ceteris paribus, countries with lower barriers to accessing global data tend to earn higher incomes than other countries. Therefore, large countries that inherently have greater data resources tend to have higher incomes than smaller countries, such that the former may be more hesitant than the latter to liberalize cross-border data flows, in order to maintain this advantage. Furthermore, countries with higher artificial intelligence (AI) intensity in production technologies tend to benefit more from economies of scale in data aggregation, leading to higher income and more trade as they are better able to utilize global data.

Keywords: digital intensity, digital divide, international trade, scale of economics

Procedia PDF Downloads 50
24250 Secured Transmission and Reserving Space in Images Before Encryption to Embed Data

Authors: G. R. Navaneesh, E. Nagarajan, C. H. Rajam Raju

Abstract:

Nowadays, multimedia data are used to store secure information. All previous methods allocate space in the image for data embedding after encryption. In this paper, we propose a novel method that reserves space in the image, surrounded by a boundary, before encryption with a traditional reversible data hiding (RDH) algorithm, which makes it easy for the data hider to reversibly embed data in the encrypted images. The proposed method can achieve real time performance, that is, data extraction and image recovery are free of any error. A secure transmission process is also discussed in this paper, which improves the efficiency by ten times compared to the other processes discussed.
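
To make the idea of error-free extraction and recovery concrete, the sketch below embeds bits in the least significant bits of pixel values while keeping a copy of the original LSBs, so that both the hidden bits and the original pixels can be restored exactly. This is plain LSB substitution with saved bits, chosen only as an illustration; it is not the authors' reserving-room scheme and it omits encryption entirely.

```python
def embed_bits(pixels, bits):
    """Write bits into the LSBs of the first len(bits) pixels.

    Returns the modified pixels and the original LSBs needed for recovery.
    """
    if len(bits) > len(pixels):
        raise ValueError("not enough pixels to hold the payload")
    stego = list(pixels)
    saved_lsbs = [p & 1 for p in pixels[: len(bits)]]
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit
    return stego, saved_lsbs

def extract_bits(stego, count):
    """Read the hidden bits back from the first count pixels."""
    return [p & 1 for p in stego[:count]]

def recover_pixels(stego, saved_lsbs):
    """Restore the original pixel values from the saved LSBs."""
    restored = list(stego)
    for i, bit in enumerate(saved_lsbs):
        restored[i] = (restored[i] & ~1) | bit
    return restored
```

In a reserving-room design, the saved bits would themselves be stored elsewhere in the image before encryption, which is the step this sketch leaves out.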

Keywords: secure communication, reserving room before encryption, least significant bits, image encryption, reversible data hiding

Procedia PDF Downloads 395
24249 Identity Verification Using k-NN Classifiers and Autistic Genetic Data

Authors: Fuad M. Alkoot

Abstract:

DNA data have been used in forensics for decades. However, current research looks at using DNA as a biometric identity verification modality. The goal is to improve the speed of identification. We aim to use gene data that was initially used for autism detection to find whether, and how accurately, these data can be used for identification applications. Mainly, our goal is to find whether our data preprocessing technique yields data useful as a biometric identification tool. We experiment with using the nearest neighbor classifier to identify subjects. Results show that the optimal classification rate is achieved when the test set is corrupted by normally distributed noise with zero mean and a standard deviation of 1. The classification rate remains close to optimal at higher noise standard deviations, up to 3. This shows that the data can be used for identity verification with high accuracy using a simple classifier such as the k-nearest neighbor (k-NN).
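
The experiment described above, a nearest neighbor classifier evaluated on test vectors corrupted by zero-mean Gaussian noise, can be sketched as follows. Euclidean distance, majority voting, and the fixed random seed are assumptions; the abstract does not specify the distance measure or the preprocessing.

```python
import math
import random
from collections import Counter

def knn_predict(train_x, train_y, query, k=1):
    """Label of the majority among the k training vectors closest to the query."""
    nearest = sorted(
        range(len(train_x)), key=lambda i: math.dist(train_x[i], query)
    )[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

def add_gaussian_noise(vector, sigma, rng):
    """Corrupt each component with zero-mean normal noise of standard deviation sigma."""
    return [v + rng.gauss(0.0, sigma) for v in vector]

def classification_rate(train_x, train_y, test_x, test_y, sigma, k=1, seed=0):
    """Fraction of noisy test vectors assigned to the correct subject."""
    rng = random.Random(seed)
    hits = sum(
        knn_predict(train_x, train_y, add_gaussian_noise(x, sigma, rng), k) == y
        for x, y in zip(test_x, test_y)
    )
    return hits / len(test_x)
```

Sweeping sigma over a range of values and plotting the resulting rate is one way to reproduce the kind of noise-robustness curve the abstract reports.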

Keywords: biometrics, genetic data, identity verification, k nearest neighbor

Procedia PDF Downloads 235
24248 A Review on Intelligent Systems for Geoscience

Authors: R. Palson Kennedy, P. Kiran Sai

Abstract:

This article introduces machine learning (ML) researchers to the hurdles that geoscience problems present, as well as the opportunities for improvement in both ML and geosciences. To meet that need, this article presents a review from the data life cycle perspective. Numerous facets of geosciences present unique difficulties for the study of intelligent systems. Geoscience data are notoriously difficult to analyze since they are frequently unpredictable, intermittent, sparse, multi-resolution, and multi-scale. The first half addresses data science’s essential concepts and theoretical underpinnings, while the second half presents key themes and shared experiences from current publications focused on each stage of the data life cycle. Finally, themes such as open science, smart data, and team science are considered.

Keywords: Data science, intelligent system, machine learning, big data, data life cycle, recent development, geo science

Procedia PDF Downloads 123
24247 Data Quality as a Pillar of Data-Driven Organizations: Exploring the Benefits of Data Mesh

Authors: Marc Bachelet, Abhijit Kumar Chatterjee, José Manuel Avila

Abstract:

Data quality is a key component of any data-driven organization. Without data quality, organizations cannot effectively make data-driven decisions, which often leads to poor business performance. Therefore, it is important for an organization to ensure that the data they use is of high quality. This is where the concept of data mesh comes in. Data mesh is an organizational and architectural decentralized approach to data management that can help organizations improve the quality of data. The concept of data mesh was first introduced in 2020. Its purpose is to decentralize data ownership, making it easier for domain experts to manage the data. This can help organizations improve data quality by reducing the reliance on centralized data teams and allowing domain experts to take charge of their data. This paper intends to discuss how a set of elements, including data mesh, are tools capable of increasing data quality. One of the key benefits of data mesh is improved metadata management. In a traditional data architecture, metadata management is typically centralized, which can lead to data silos and poor data quality. With data mesh, metadata is managed in a decentralized manner, ensuring accurate and up-to-date metadata, thereby improving data quality. Another benefit of data mesh is the clarification of roles and responsibilities. In a traditional data architecture, data teams are responsible for managing all aspects of data, which can lead to confusion and ambiguity in responsibilities. With data mesh, domain experts are responsible for managing their own data, which can help provide clarity in roles and responsibilities and improve data quality. Additionally, data mesh can also contribute to a new form of organization that is more agile and adaptable. 
By decentralizing data ownership, organizations can respond more quickly to changes in their business environment, which in turn can help improve overall performance by giving better insight into the business through better reports and visualization tools. Monitoring and analytics are also important aspects of data quality. With data mesh, monitoring and analytics are decentralized, allowing domain experts to monitor and analyze their own data. This helps in identifying and addressing data quality problems quickly, leading to improved data quality. Data culture is another major aspect of data quality. With data mesh, domain experts are encouraged to take ownership of their data, which can help create a data-driven culture within the organization. This can lead to improved data quality and better business outcomes. Finally, the paper explores the contribution of AI in the coming years. AI can help enhance data quality by automating many data-related tasks, like data cleaning and data validation. By integrating AI into data mesh, organizations can further enhance the quality of their data. The concepts mentioned above are illustrated by AEKIDEN experience feedback. AEKIDEN is an international data-driven consultancy that has successfully implemented a data mesh approach. By sharing their experience, AEKIDEN can help other organizations understand the benefits and challenges of implementing data mesh and improving data quality.

Keywords: data culture, data-driven organization, data mesh, data quality for business success

Procedia PDF Downloads 117
24246 Effectiveness of Participatory Ergonomic Education on Pain Due to Work Related Musculoskeletal Disorders in Food Processing Industrial Workers

Authors: Salima Bijapuri, Shweta Bhatbolan, Sejalben Patel

Abstract:

Ergonomics concerns the fitting of the environment and the equipment to the worker. Ergonomic principles can be employed in different dimensions of the industrial sector. Participation of all the stakeholders is the key to the formulation of a multifaceted and comprehensive approach to lessen the burden of occupational hazards. Taking responsibility for one’s own work activities by acquiring sufficient knowledge and potential to influence the practices and outcomes is the basis of participatory ergonomics and even hastens the process of identifying workplace hazards. The study aimed to check how effective participatory ergonomics can be in the management of work-related musculoskeletal disorders (WRMSDs). Method: A mega kitchen was identified in a twin city of Karnataka, India. Consent was taken, and the workers were screened using observation methods. Kitchen work was structured to include different tasks: preparing, cooking, distributing, and serving food; packing food to be delivered to schools; dishwashing; cleaning and maintenance of the kitchen and equipment; and receiving and storing raw material. A total of 100 workers attended the education session on participatory ergonomics and its role in implementing correct ergonomic practices, thus preventing WRMSDs. Demographic details and baseline data on related musculoskeletal pain and discomfort were collected using the Nordic pain questionnaire and VAS score pre- and post-study. Monthly visits were made, and the education sessions were reiterated on each visit, thus reminding, correcting, and problem-solving with each worker. After 9 months, with a total of 4 such education sessions, the post-education data were collected. SPSS 20 software was used to analyse the collected data. Results: The majority of the workers (78%), depending on availability and feasibility, participated in the intervention workshops, which were arranged four times. The average age of the participants was 39 years. 
The percentage of female participants was 79.49%, and males comprised 20.51% of participants. The Nordic Musculoskeletal Questionnaire (NMQ) showed that knee pain was the most commonly reported complaint (62%) over the last 12 months, with a mean VAS of 6.27, followed by low back pain. Post-intervention, the mean VAS score was reduced significantly to 2.38. The pre- and post-intervention scores were compared using the Wilcoxon matched pairs test. Upon enquiry, it was found that the participants learned the importance of applying ergonomics at their workplace, which in turn helped them handle problems arising at their workplace on their own with self-confidence. Conclusion: Participatory ergonomics proved effective with workers of the mega kitchen, and it is a feasible and practical approach. The advantage of the given study area was that it had sophisticated and ergonomically designed workstations; thus, what was most needed was education and practical knowledge on how to use these stations. There was a significant reduction in VAS scores with the implementation of changes in the working style, and the knowledge of ergonomics helped to decrease physical load and improve musculoskeletal health.
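
The Wilcoxon matched pairs comparison of pre- and post-intervention scores can be sketched as below. The code computes only the signed-rank sums (with average ranks for tied differences and zero differences dropped); turning them into a p-value needs a reference table or a normal approximation, which is left out. The study used SPSS, so this is an illustration of the statistic rather than the authors' procedure.

```python
def wilcoxon_signed_rank(before, after):
    """Return (W_plus, W_minus), the rank sums of positive and negative differences."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Group differences that share the same absolute value.
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus
```

When pain scores fall for nearly every worker, almost all differences are negative, so W_plus is close to zero; the smaller of the two sums is the usual test statistic.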

Keywords: ergonomic awareness session, mega kitchen, participatory ergonomics, work related musculoskeletal disorders

Procedia PDF Downloads 121
24245 Big Data Analysis with RHadoop

Authors: Ji Eun Shin, Byung Ho Jung, Dong Hoon Lim

Abstract:

It is almost impossible to store or analyze exponentially growing big data with traditional technologies. Hadoop is a new technology that makes this possible. The R programming language is by far the most popular statistical tool for big data analysis based on distributed processing with Hadoop technology. With RHadoop, which integrates the R and Hadoop environments, we implemented parallel multiple regression analysis with different sizes of actual data. Experimental results showed that our RHadoop system became much faster as the number of data nodes increased. We also compared the performance of our RHadoop with the lm function and the biglm package available on bigmemory. The results showed that our RHadoop was faster than the other packages owing to parallel processing, with the number of map tasks increasing as the size of data increases.
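
One common way to parallelize multiple regression in a map-reduce setting is to have each map task compute the partial sums X'X and X'y for its chunk of rows, add them up in the reduce step, and solve the normal equations once. The sketch below shows that pattern in plain Python; it is an assumption about the general technique and does not use the RHadoop API or reproduce the authors' implementation.

```python
from functools import reduce

def map_chunk(rows):
    """Map step: partial X'X and X'y for one chunk of (features, target) rows."""
    p = len(rows[0][0]) + 1  # +1 for the intercept column
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for features, target in rows:
        x = [1.0] + list(features)
        for i in range(p):
            xty[i] += x[i] * target
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty

def reduce_pair(left, right):
    """Reduce step: element-wise sum of two partial results."""
    xtx = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(left[0], right[0])]
    xty = [a + b for a, b in zip(left[1], right[1])]
    return xtx, xty

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit(chunks):
    """Coefficients [intercept, b1, b2, ...] from data split into chunks."""
    xtx, xty = reduce(reduce_pair, map(map_chunk, chunks))
    return solve(xtx, xty)
```

Because the partial sums are small (p by p) regardless of chunk size, adding more map tasks spreads the row-wise work without increasing the cost of the final solve.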

Keywords: big data, Hadoop, parallel regression analysis, R, RHadoop

Procedia PDF Downloads 420