Search results for: data fitting
24919 Repeatable Surface Enhanced Raman Spectroscopy Substrates from SERSitive for Wide Range of Chemical and Biological Substances
Authors: Monika Ksiezopolska-Gocalska, Pawel Albrycht, Robert Holyst
Abstract:
Surface Enhanced Raman Spectroscopy (SERS) is a technique used to analyze very low concentrations of substances in solutions, even in aqueous solutions - which is its advantage over IR. This technique can be used in the pharmacy (to check the purity of products); forensics (whether at a crime scene there were any illegal substances); or medicine (serving as a medical test) and lots more. Due to the high potential of this technique, its increasing popularity in analytical laboratories, and simultaneously - the absence of appropriate platforms enhancing the SERS signal (crucial to observe the Raman effect at low analyte concentration in solutions (1 ppm)), we decided to invent our own SERS platforms. As an enhancing layer, we have chosen gold and silver nanoparticles, because these two have the best SERS properties, and each has an affinity for the other kind of particles, which increases the range of research capabilities. The next step was to commercialize them, which resulted in the creation of the company ‘SERSitive.eu’ focusing on production of highly sensitive (Ef = 10⁵ – 10⁶), homogeneous and reproducible (70 - 80%) substrates. SERStive SERS substrates are made using the electrodeposition of silver or silver-gold nanoparticles technique. Thanks to a very detailed analysis of data based on studies optimizing such parameters as deposition time, temperature of the reaction solution, applied potential, used reducer, or reagent concentrations using a standardized compound - p-mercaptobenzoic acid (PMBA) at a concentration of 10⁻⁶ M, we have developed a high-performance process for depositing precious metal nanoparticles on the surface of ITO glass. In order to check a quality of the SERSitive platforms, we examined the wide range of the chemical compounds and the biological substances. Apart from analytes that have great affinity to the metal surfaces (e.g. PMBA) we obtained very good results for those fitting less the SERS measurements. Successfully we received intensive, and what’s more important - very repetitive spectra for; amino acids (phenyloalanine, 10⁻³ M), drugs (amphetamine, 10⁻⁴ M), designer drugs (cathinone derivatives, 10⁻³ M), medicines and ending with bacteria (Listeria, Salmonella, Escherichia coli) and fungi.Keywords: nanoparticles, Raman spectroscopy, SERS, SERS applications, SERS substrates, SERSitive
Procedia PDF Downloads 15124918 Data Mining in Medicine Domain Using Decision Trees and Vector Support Machine
Authors: Djamila Benhaddouche, Abdelkader Benyettou
Abstract:
In this paper, we used data mining to extract biomedical knowledge. In general, complex biomedical data collected in studies of populations are treated by statistical methods, although they are robust, they are not sufficient in themselves to harness the potential wealth of data. For that you used in step two learning algorithms: the Decision Trees and Support Vector Machine (SVM). These supervised classification methods are used to make the diagnosis of thyroid disease. In this context, we propose to promote the study and use of symbolic data mining techniques.Keywords: biomedical data, learning, classifier, algorithms decision tree, knowledge extraction
Procedia PDF Downloads 55924917 Analysis of Different Classification Techniques Using WEKA for Diabetic Disease
Authors: Usama Ahmed
Abstract:
Data mining is the process of analyze data which are used to predict helpful information. It is the field of research which solve various type of problem. In data mining, classification is an important technique to classify different kind of data. Diabetes is most common disease. This paper implements different classification technique using Waikato Environment for Knowledge Analysis (WEKA) on diabetes dataset and find which algorithm is suitable for working. The best classification algorithm based on diabetic data is Naïve Bayes. The accuracy of Naïve Bayes is 76.31% and take 0.06 seconds to build the model.Keywords: data mining, classification, diabetes, WEKA
Procedia PDF Downloads 14724916 The Algorithm to Solve the Extend General Malfatti’s Problem in a Convex Circular Triangle
Authors: Ching-Shoei Chiang
Abstract:
The Malfatti’s Problem solves the problem of fitting 3 circles into a right triangle such that these 3 circles are tangent to each other, and each circle is also tangent to a pair of the triangle’s sides. This problem has been extended to any triangle (called general Malfatti’s Problem). Furthermore, the problem has been extended to have 1+2+…+n circles inside the triangle with special tangency properties among circles and triangle sides; we call it extended general Malfatti’s problem. In the extended general Malfatti’s problem, call it Tri(Tn), where Tn is the triangle number, there are closed-form solutions for Tri(T₁) (inscribed circle) problem and Tri(T₂) (3 Malfatti’s circles) problem. These problems become more complex when n is greater than 2. In solving Tri(Tn) problem, n>2, algorithms have been proposed to solve these problems numerically. With a similar idea, this paper proposed an algorithm to find the radii of circles with the same tangency properties. Instead of the boundary of the triangle being a straight line, we use a convex circular arc as the boundary and try to find Tn circles inside this convex circular triangle with the same tangency properties among circles and boundary Carc. We call these problems the Carc(Tn) problems. The CPU time it takes for Carc(T16) problem, which finds 136 circles inside a convex circular triangle with specified tangency properties, is less than one second.Keywords: circle packing, computer-aided geometric design, geometric constraint solver, Malfatti’s problem
Procedia PDF Downloads 11024915 Comprehensive Study of Data Science
Authors: Asifa Amara, Prachi Singh, Kanishka, Debargho Pathak, Akshat Kumar, Jayakumar Eravelly
Abstract:
Today's generation is totally dependent on technology that uses data as its fuel. The present study is all about innovations and developments in data science and gives an idea about how efficiently to use the data provided. This study will help to understand the core concepts of data science. The concept of artificial intelligence was introduced by Alan Turing in which the main principle was to create an artificial system that can run independently of human-given programs and can function with the help of analyzing data to understand the requirements of the users. Data science comprises business understanding, analyzing data, ethical concerns, understanding programming languages, various fields and sources of data, skills, etc. The usage of data science has evolved over the years. In this review article, we have covered a part of data science, i.e., machine learning. Machine learning uses data science for its work. Machines learn through their experience, which helps them to do any work more efficiently. This article includes a comparative study image between human understanding and machine understanding, advantages, applications, and real-time examples of machine learning. Data science is an important game changer in the life of human beings. Since the advent of data science, we have found its benefits and how it leads to a better understanding of people, and how it cherishes individual needs. It has improved business strategies, services provided by them, forecasting, the ability to attend sustainable developments, etc. This study also focuses on a better understanding of data science which will help us to create a better world.Keywords: data science, machine learning, data analytics, artificial intelligence
Procedia PDF Downloads 8224914 Application of Artificial Neural Network Technique for Diagnosing Asthma
Authors: Azadeh Bashiri
Abstract:
Introduction: Lack of proper diagnosis and inadequate treatment of asthma leads to physical and financial complications. This study aimed to use data mining techniques and creating a neural network intelligent system for diagnosis of asthma. Methods: The study population is the patients who had visited one of the Lung Clinics in Tehran. Data were analyzed using the SPSS statistical tool and the chi-square Pearson's coefficient was the basis of decision making for data ranking. The considered neural network is trained using back propagation learning technique. Results: According to the analysis performed by means of SPSS to select the top factors, 13 effective factors were selected, in different performances, data was mixed in various forms, so the different models were made for training the data and testing networks and in all different modes, the network was able to predict correctly 100% of all cases. Conclusion: Using data mining methods before the design structure of system, aimed to reduce the data dimension and the optimum choice of the data, will lead to a more accurate system. Therefore, considering the data mining approaches due to the nature of medical data is necessary.Keywords: asthma, data mining, Artificial Neural Network, intelligent system
Procedia PDF Downloads 27324913 Improvement of Electric Aircraft Endurance through an Optimal Propeller Design Using Combined BEM, Vortex and CFD Methods
Authors: Jose Daniel Hoyos Giraldo, Jesus Hernan Jimenez Giraldo, Juan Pablo Alvarado Perilla
Abstract:
Range and endurance are the main limitations of electric aircraft due to the nature of its source of power. The improvement of efficiency on this kind of systems is extremely meaningful to encourage the aircraft operation with less environmental impact. The propeller efficiency highly affects the overall efficiency of the propulsion system; hence its optimization can have an outstanding effect on the aircraft performance. An optimization method is applied to an aircraft propeller in order to maximize its range and endurance by estimating the best combination of geometrical parameters such as diameter and airfoil, chord and pitch distribution for a specific aircraft design at a certain cruise speed, then the rotational speed at which the propeller operates at minimum current consumption is estimated. The optimization is based on the Blade Element Momentum (BEM) method, additionally corrected to account for tip and hub losses, Mach number and rotational effects; furthermore an airfoil lift and drag coefficients approximation is implemented from Computational Fluid Dynamics (CFD) simulations supported by preliminary studies of grid independence and suitability of different turbulence models, to feed the BEM method, with the aim of achieve more reliable results. Additionally, Vortex Theory is employed to find the optimum pitch and chord distribution to achieve a minimum induced loss propeller design. Moreover, the optimization takes into account the well-known brushless motor model, thrust constraints for take-off runway limitations, maximum allowable propeller diameter due to aircraft height and maximum motor power. The BEM-CFD method is validated by comparing its predictions for a known APC propeller with both available experimental tests and APC reported performance curves which are based on Vortex Theory fed with the NASA Transonic Airfoil code, showing a adequate fitting with experimental data even more than reported APC data. Optimal propeller predictions are validated by wind tunnel tests, CFD propeller simulations and a study of how the propeller will perform if it replaces the one of on known aircraft. Some tendency charts relating a wide range of parameters such as diameter, voltage, pitch, rotational speed, current, propeller and electric efficiencies are obtained and discussed. The implementation of CFD tools shows an improvement in the accuracy of BEM predictions. Results also showed how a propeller has higher efficiency peaks when it operates at high rotational speed due to the higher Reynolds at which airfoils present lower drag. On the other hand, the behavior of the current consumption related to the propulsive efficiency shows counterintuitive results, the best range and endurance is not necessary achieved in an efficiency peak.Keywords: BEM, blade design, CFD, electric aircraft, endurance, optimization, range
Procedia PDF Downloads 10824912 Interpreting Privacy Harms from a Non-Economic Perspective
Authors: Christopher Muhawe, Masooda Bashir
Abstract:
With increased Internet Communication Technology(ICT), the virtual world has become the new normal. At the same time, there is an unprecedented collection of massive amounts of data by both private and public entities. Unfortunately, this increase in data collection has been in tandem with an increase in data misuse and data breach. Regrettably, the majority of data breach and data misuse claims have been unsuccessful in the United States courts for the failure of proof of direct injury to physical or economic interests. The requirement to express data privacy harms from an economic or physical stance negates the fact that not all data harms are physical or economic in nature. The challenge is compounded by the fact that data breach harms and risks do not attach immediately. This research will use a descriptive and normative approach to show that not all data harms can be expressed in economic or physical terms. Expressing privacy harms purely from an economic or physical harm perspective negates the fact that data insecurity may result into harms which run counter the functions of privacy in our lives. The promotion of liberty, selfhood, autonomy, promotion of human social relations and the furtherance of the existence of a free society. There is no economic value that can be placed on these functions of privacy. The proposed approach addresses data harms from a psychological and social perspective.Keywords: data breach and misuse, economic harms, privacy harms, psychological harms
Procedia PDF Downloads 19524911 An Empirical Study of the Best Fitting Probability Distributions for Stock Returns Modeling
Authors: Jayanta Pokharel, Gokarna Aryal, Netra Kanaal, Chris Tsokos
Abstract:
Investment in stocks and shares aims to seek potential gains while weighing the risk of future needs, such as retirement, children's education etc. Analysis of the behavior of the stock market returns and making prediction is important for investors to mitigate risk on investment. Historically, the normal variance models have been used to describe the behavior of stock market returns. However, the returns of the financial assets are actually skewed with higher kurtosis, heavier tails, and a higher center than the normal distribution. The Laplace distribution and its family are natural candidates for modeling stock returns. The Variance-Gamma (VG) distribution is the most sought-after distributions for modeling asset returns and has been extensively discussed in financial literatures. In this paper, it explore the other Laplace family, such as Asymmetric Laplace, Skewed Laplace, Kumaraswamy Laplace (KS) together with Variance-Gamma to model the weekly returns of the S&P 500 Index and it's eleven business sector indices. The method of maximum likelihood is employed to estimate the parameters of the distributions and our empirical inquiry shows that the Kumaraswamy Laplace distribution performs much better for stock returns modeling among the choice of distributions used in this study and in practice, KS can be used as a strong alternative to VG distribution.Keywords: stock returns, variance-gamma, kumaraswamy laplace, maximum likelihood
Procedia PDF Downloads 7024910 PointNetLK-OBB: A Point Cloud Registration Algorithm with High Accuracy
Authors: Wenhao Lan, Ning Li, Qiang Tong
Abstract:
To improve the registration accuracy of a source point cloud and template point cloud when the initial relative deflection angle is too large, a PointNetLK algorithm combined with an oriented bounding box (PointNetLK-OBB) is proposed. In this algorithm, the OBB of a 3D point cloud is used to represent the macro feature of source and template point clouds. Under the guidance of the iterative closest point algorithm, the OBB of the source and template point clouds is aligned, and a mirror symmetry effect is produced between them. According to the fitting degree of the source and template point clouds, the mirror symmetry plane is detected, and the optimal rotation and translation of the source point cloud is obtained to complete the 3D point cloud registration task. To verify the effectiveness of the proposed algorithm, a comparative experiment was performed using the publicly available ModelNet40 dataset. The experimental results demonstrate that, compared with PointNetLK, PointNetLK-OBB improves the registration accuracy of the source and template point clouds when the initial relative deflection angle is too large, and the sensitivity of the initial relative position between the source point cloud and template point cloud is reduced. The primary contribution of this paper is the use of PointNetLK to avoid the non-convex problem of traditional point cloud registration and leveraging the regularity of the OBB to avoid the local optimization problem in the PointNetLK context.Keywords: mirror symmetry, oriented bounding box, point cloud registration, PointNetLK-OBB
Procedia PDF Downloads 15024909 The Philippine Collegian and the Catalyst's Journalistic Presentation of the UP and PUP: A Content Analysis
Authors: Diana Mariz Catangay, Irish-Ann Montano, Frances Janine Suyat
Abstract:
As an active pedestal for student’s interaction with both issues happening inside the school and out; may it be political, societal, international, or other current events, a school paper should at least meet the standard of providing a representation of the school’s morals and values and help the institution uplift its image. The researchers seek to ascertain how the two student publications from the Philippines’ two prime state universities, the University of the Philippines’ Philippine Collegian, and the Polytechnic University of the Philippines’ the Catalyst, presents iii their school through balanced journalism and objective documentation. The objectives include determining the number of school-related articles published versus those articles that are concerned outside the school’s jurisdiction, analyzing the insight it provides on the image of the university, assessing the similarities and/or differences between the two publications, and, finally, coming up with the conclusion of how the two newspapers uses their medium to present their respective schools. The research used the quantitative method of research in order to further analyze the articles that will serve as bases in coming up with the right conclusion based on the objectives of the study. Coding sheets and coding guides are utilized for the chosen research method. The gathered findings will then be interpreted as fitting to the goal of the research.Keywords: content analysis, journalistic presentation, student publications, state universities
Procedia PDF Downloads 18124908 Machine Learning Analysis of Student Success in Introductory Calculus Based Physics I Course
Authors: Chandra Prayaga, Aaron Wade, Lakshmi Prayaga, Gopi Shankar Mallu
Abstract:
This paper presents the use of machine learning algorithms to predict the success of students in an introductory physics course. Data having 140 rows pertaining to the performance of two batches of students was used. The lack of sufficient data to train robust machine learning models was compensated for by generating synthetic data similar to the real data. CTGAN and CTGAN with Gaussian Copula (Gaussian) were used to generate synthetic data, with the real data as input. To check the similarity between the real data and each synthetic dataset, pair plots were made. The synthetic data was used to train machine learning models using the PyCaret package. For the CTGAN data, the Ada Boost Classifier (ADA) was found to be the ML model with the best fit, whereas the CTGAN with Gaussian Copula yielded Logistic Regression (LR) as the best model. Both models were then tested for accuracy with the real data. ROC-AUC analysis was performed for all the ten classes of the target variable (Grades A, A-, B+, B, B-, C+, C, C-, D, F). The ADA model with CTGAN data showed a mean AUC score of 0.4377, but the LR model with the Gaussian data showed a mean AUC score of 0.6149. ROC-AUC plots were obtained for each Grade value separately. The LR model with Gaussian data showed consistently better AUC scores compared to the ADA model with CTGAN data, except in two cases of the Grade value, C- and A-.Keywords: machine learning, student success, physics course, grades, synthetic data, CTGAN, gaussian copula CTGAN
Procedia PDF Downloads 4424907 Data Access, AI Intensity, and Scale Advantages
Authors: Chuping Lo
Abstract:
This paper presents a simple model demonstrating that ceteris paribus countries with lower barriers to accessing global data tend to earn higher incomes than other countries. Therefore, large countries that inherently have greater data resources tend to have higher incomes than smaller countries, such that the former may be more hesitant than the latter to liberalize cross-border data flows to maintain this advantage. Furthermore, countries with higher artificial intelligence (AI) intensity in production technologies tend to benefit more from economies of scale in data aggregation, leading to higher income and more trade as they are better able to utilize global data.Keywords: digital intensity, digital divide, international trade, scale of economics
Procedia PDF Downloads 6824906 Secured Transmission and Reserving Space in Images Before Encryption to Embed Data
Authors: G. R. Navaneesh, E. Nagarajan, C. H. Rajam Raju
Abstract:
Nowadays the multimedia data are used to store some secure information. All previous methods allocate a space in image for data embedding purpose after encryption. In this paper, we propose a novel method by reserving space in image with a boundary surrounded before encryption with a traditional RDH algorithm, which makes it easy for the data hider to reversibly embed data in the encrypted images. The proposed method can achieve real time performance, that is, data extraction and image recovery are free of any error. A secure transmission process is also discussed in this paper, which improves the efficiency by ten times compared to other processes as discussed.Keywords: secure communication, reserving room before encryption, least significant bits, image encryption, reversible data hiding
Procedia PDF Downloads 41224905 Identity Verification Using k-NN Classifiers and Autistic Genetic Data
Authors: Fuad M. Alkoot
Abstract:
DNA data have been used in forensics for decades. However, current research looks at using the DNA as a biometric identity verification modality. The goal is to improve the speed of identification. We aim at using gene data that was initially used for autism detection to find if and how accurate is this data for identification applications. Mainly our goal is to find if our data preprocessing technique yields data useful as a biometric identification tool. We experiment with using the nearest neighbor classifier to identify subjects. Results show that optimal classification rate is achieved when the test set is corrupted by normally distributed noise with zero mean and standard deviation of 1. The classification rate is close to optimal at higher noise standard deviation reaching 3. This shows that the data can be used for identity verification with high accuracy using a simple classifier such as the k-nearest neighbor (k-NN).Keywords: biometrics, genetic data, identity verification, k nearest neighbor
Procedia PDF Downloads 25824904 Models to Calculate Lattice Spacing, Melting Point and Lattice Thermal Expansion of Ga₂Se₃ Nanoparticles
Authors: Mustafa Saeed Omar
Abstract:
The formula which contains the maximum increase of mean bond length, melting entropy and critical particle radius is used to calculate lattice volume in nanoscale size crystals of Ga₂Se₃. This compound belongs to the binary group of III₂VI₃. The critical radius is calculated from the values of the first surface atomic layer height which is equal to 0.336nm. The size-dependent mean bond length is calculated by using an equation-free from fitting parameters. The size-dependent lattice parameter then is accordingly used to calculate the size-dependent lattice volume. The lattice size in the nanoscale region increases to about 77.6 A³, which is up to four times of its bulk state value 19.97 A³. From the values of the nanosize scale dependence of lattice volume, the nanoscale size dependence of melting temperatures is calculated. The melting temperature decreases with the nanoparticles size reduction, it becomes zero when the radius reaches to its critical value. Bulk melting temperature for Ga₂Se₃, for example, has values of 1293 K. From the size-dependent melting temperature and mean bond length, the size-dependent lattice thermal expansion is calculated. Lattice thermal expansion decreases with the decrease of nanoparticles size and reaches to its minimum value as the radius drops down to about 5nm.Keywords: Ga₂Se₃, lattice volume, lattice thermal expansion, melting point, nanoparticles
Procedia PDF Downloads 16824903 Advanced Technologies for Detector Readout in Particle Physics
Authors: Y. Venturini, C. Tintori
Abstract:
Given the continuous demand for improved readout performances in particle and dark matter physics, CAEN SpA is pushing on the development of advanced technologies for detector readout. We present the Digitizers 2.0, the result of the success of the previous Digitizers generation, combined with expanded capabilities and a renovation of the user experience introducing the open FPGA. The first product of the family is the VX2740 (64 ch, 125 MS/s, 16 bit) for advanced waveform recording and Digital Pulse Processing, fitting with the special requirements of Dark Matter and Neutrino experiments. In parallel, CAEN is developing the FERS-5200 platform, a Front-End Readout System designed to read out large multi-detector arrays, such as SiPMs, multi-anode PMTs, silicon strip detectors, wire chambers, GEM, gas tubes, and others. This is a highly-scalable distributed platform, based on small Front-End cards synchronized and read out by a concentrator board, allowing to build extremely large experimental setup. We plan to develop a complete family of cost-effective Front-End cards tailored to specific detectors and applications. The first one available is the A5202, a 64-channel unit for SiPM readout based on CITIROC ASIC by Weeroc.Keywords: dark matter, digitizers, front-end electronics, open FPGA, SiPM
Procedia PDF Downloads 12824902 A Review on Intelligent Systems for Geoscience
Authors: R Palson Kennedy, P.Kiran Sai
Abstract:
This article introduces machine learning (ML) researchers to the hurdles that geoscience problems present, as well as the opportunities for improvement in both ML and geosciences. This article presents a review from the data life cycle perspective to meet that need. Numerous facets of geosciences present unique difficulties for the study of intelligent systems. Geosciences data is notoriously difficult to analyze since it is frequently unpredictable, intermittent, sparse, multi-resolution, and multi-scale. The first half addresses data science’s essential concepts and theoretical underpinnings, while the second section contains key themes and sharing experiences from current publications focused on each stage of the data life cycle. Finally, themes such as open science, smart data, and team science are considered.Keywords: Data science, intelligent system, machine learning, big data, data life cycle, recent development, geo science
Procedia PDF Downloads 13524901 Data Quality as a Pillar of Data-Driven Organizations: Exploring the Benefits of Data Mesh
Authors: Marc Bachelet, Abhijit Kumar Chatterjee, José Manuel Avila
Abstract:
Data quality is a key component of any data-driven organization. Without data quality, organizations cannot effectively make data-driven decisions, which often leads to poor business performance. Therefore, it is important for an organization to ensure that the data they use is of high quality. This is where the concept of data mesh comes in. Data mesh is an organizational and architectural decentralized approach to data management that can help organizations improve the quality of data. The concept of data mesh was first introduced in 2020. Its purpose is to decentralize data ownership, making it easier for domain experts to manage the data. This can help organizations improve data quality by reducing the reliance on centralized data teams and allowing domain experts to take charge of their data. This paper intends to discuss how a set of elements, including data mesh, are tools capable of increasing data quality. One of the key benefits of data mesh is improved metadata management. In a traditional data architecture, metadata management is typically centralized, which can lead to data silos and poor data quality. With data mesh, metadata is managed in a decentralized manner, ensuring accurate and up-to-date metadata, thereby improving data quality. Another benefit of data mesh is the clarification of roles and responsibilities. In a traditional data architecture, data teams are responsible for managing all aspects of data, which can lead to confusion and ambiguity in responsibilities. With data mesh, domain experts are responsible for managing their own data, which can help provide clarity in roles and responsibilities and improve data quality. Additionally, data mesh can also contribute to a new form of organization that is more agile and adaptable. By decentralizing data ownership, organizations can respond more quickly to changes in their business environment, which in turn can help improve overall performance by allowing better insights into business as an effect of better reports and visualization tools. Monitoring and analytics are also important aspects of data quality. With data mesh, monitoring, and analytics are decentralized, allowing domain experts to monitor and analyze their own data. This will help in identifying and addressing data quality problems in quick time, leading to improved data quality. Data culture is another major aspect of data quality. With data mesh, domain experts are encouraged to take ownership of their data, which can help create a data-driven culture within the organization. This can lead to improved data quality and better business outcomes. Finally, the paper explores the contribution of AI in the coming years. AI can help enhance data quality by automating many data-related tasks, like data cleaning and data validation. By integrating AI into data mesh, organizations can further enhance the quality of their data. The concepts mentioned above are illustrated by AEKIDEN experience feedback. AEKIDEN is an international data-driven consultancy that has successfully implemented a data mesh approach. By sharing their experience, AEKIDEN can help other organizations understand the benefits and challenges of implementing data mesh and improving data quality.Keywords: data culture, data-driven organization, data mesh, data quality for business success
Procedia PDF Downloads 13524900 Big Data Analysis with RHadoop
Authors: Ji Eun Shin, Byung Ho Jung, Dong Hoon Lim
Abstract:
It is almost impossible to store or analyze big data increasing exponentially with traditional technologies. Hadoop is a new technology to make that possible. R programming language is by far the most popular statistical tool for big data analysis based on distributed processing with Hadoop technology. With RHadoop that integrates R and Hadoop environment, we implemented parallel multiple regression analysis with different sizes of actual data. Experimental results showed our RHadoop system was much faster as the number of data nodes increases. We also compared the performance of our RHadoop with lm function and big lm packages available on big memory. The results showed that our RHadoop was faster than other packages owing to paralleling processing with increasing the number of map tasks as the size of data increases.Keywords: big data, Hadoop, parallel regression analysis, R, RHadoop
Procedia PDF Downloads 43724899 A Mutually Exclusive Task Generation Method Based on Data Augmentation
Authors: Haojie Wang, Xun Li, Rui Yin
Abstract:
In order to solve the memorization overfitting in the meta-learning MAML algorithm, a method of generating mutually exclusive tasks based on data augmentation is proposed. This method generates a mutex task by corresponding one feature of the data to multiple labels, so that the generated mutex task is inconsistent with the data distribution in the initial dataset. Because generating mutex tasks for all data will produce a large number of invalid data and, in the worst case, lead to exponential growth of computation, this paper also proposes a key data extraction method, that only extracts part of the data to generate the mutex task. The experiments show that the method of generating mutually exclusive tasks can effectively solve the memorization overfitting in the meta-learning MAML algorithm.Keywords: data augmentation, mutex task generation, meta-learning, text classification.
Procedia PDF Downloads 9324898 Efficient Positioning of Data Aggregation Point for Wireless Sensor Network
Authors: Sifat Rahman Ahona, Rifat Tasnim, Naima Hassan
Abstract:
Data aggregation is a helpful technique for reducing the data communication overhead in wireless sensor network. One of the important tasks of data aggregation is positioning of the aggregator points. There are a lot of works done on data aggregation. But, efficient positioning of the aggregators points is not focused so much. In this paper, authors are focusing on the positioning or the placement of the aggregation points in wireless sensor network. Authors proposed an algorithm to select the aggregators positions for a scenario where aggregator nodes are more powerful than sensor nodes.Keywords: aggregation point, data communication, data aggregation, wireless sensor network
Procedia PDF Downloads 15724897 Spatial Econometric Approaches for Count Data: An Overview and New Directions
Authors: Paula Simões, Isabel Natário
Abstract:
This paper reviews a number of theoretical aspects for implementing an explicit spatial perspective in econometrics for modelling non-continuous data, in general, and count data, in particular. It provides an overview of the several spatial econometric approaches that are available to model data that are collected with reference to location in space, from the classical spatial econometrics approaches to the recent developments on spatial econometrics to model count data, in a Bayesian hierarchical setting. Considerable attention is paid to the inferential framework, necessary for structural consistent spatial econometric count models, incorporating spatial lag autocorrelation, to the corresponding estimation and testing procedures for different assumptions, to the constrains and implications embedded in the various specifications in the literature. This review combines insights from the classical spatial econometrics literature as well as from hierarchical modeling and analysis of spatial data, in order to look for new possible directions on the processing of count data, in a spatial hierarchical Bayesian econometric context.Keywords: spatial data analysis, spatial econometrics, Bayesian hierarchical models, count data
Procedia PDF Downloads 59324896 A NoSQL Based Approach for Real-Time Managing of Robotics's Data
Authors: Gueidi Afef, Gharsellaoui Hamza, Ben Ahmed Samir
Abstract:
This paper deals with the secret of the continual progression data that new data management solutions have been emerged: The NoSQL databases. They crossed several areas like personalization, profile management, big data in real-time, content management, catalog, view of customers, mobile applications, internet of things, digital communication and fraud detection. Nowadays, these database management systems are increasing. These systems store data very well and with the trend of big data, a new challenge’s store demands new structures and methods for managing enterprise data. The new intelligent machine in the e-learning sector, thrives on more data, so smart machines can learn more and faster. The robotics are our use case to focus on our test. The implementation of NoSQL for Robotics wrestle all the data they acquire into usable form because with the ordinary type of robotics; we are facing very big limits to manage and find the exact information in real-time. Our original proposed approach was demonstrated by experimental studies and running example used as a use case.Keywords: NoSQL databases, database management systems, robotics, big data
Procedia PDF Downloads 35424895 Fuzzy Optimization Multi-Objective Clustering Ensemble Model for Multi-Source Data Analysis
Authors: C. B. Le, V. N. Pham
Abstract:
In modern data analysis, multi-source data appears more and more in real applications. Multi-source data clustering has emerged as a important issue in the data mining and machine learning community. Different data sources provide information about different data. Therefore, multi-source data linking is essential to improve clustering performance. However, in practice multi-source data is often heterogeneous, uncertain, and large. This issue is considered a major challenge from multi-source data. Ensemble is a versatile machine learning model in which learning techniques can work in parallel, with big data. Clustering ensemble has been shown to outperform any standard clustering algorithm in terms of accuracy and robustness. However, most of the traditional clustering ensemble approaches are based on single-objective function and single-source data. This paper proposes a new clustering ensemble method for multi-source data analysis. The fuzzy optimized multi-objective clustering ensemble method is called FOMOCE. Firstly, a clustering ensemble mathematical model based on the structure of multi-objective clustering function, multi-source data, and dark knowledge is introduced. Then, rules for extracting dark knowledge from the input data, clustering algorithms, and base clusterings are designed and applied. Finally, a clustering ensemble algorithm is proposed for multi-source data analysis. The experiments were performed on the standard sample data set. The experimental results demonstrate the superior performance of the FOMOCE method compared to the existing clustering ensemble methods and multi-source clustering methods.Keywords: clustering ensemble, multi-source, multi-objective, fuzzy clustering
Procedia PDF Downloads 18924894 Estimates of (Co)Variance Components and Genetic Parameters for Body Weights and Growth Efficiency Traits in the New Zealand White Rabbits
Authors: M. Sakthivel, A. Devaki, D. Balasubramanyam, P. Kumarasamy, A. Raja, R. Anilkumar, H. Gopi
Abstract:
The genetic parameters of growth traits in the New Zealand White rabbits maintained at Sheep Breeding and Research Station, Sandynallah, The Nilgiris, India were estimated by partitioning the variance and covariance components. The (co)variance components of body weights at weaning (W42), post-weaning (W70) and marketing (W135) age and growth efficiency traits viz., average daily gain (ADG), relative growth rate (RGR) and Kleiber ratio (KR) estimated on a daily basis at different age intervals (1=42 to 70 days; 2=70 to 135 days and 3=42 to 135 days) from weaning to marketing were estimated by restricted maximum likelihood, fitting six animal models with various combinations of direct and maternal effects. Data were collected over a period of 15 years (1998 to 2012). A log-likelihood ratio test was used to select the most appropriate univariate model for each trait, which was subsequently used in bivariate analysis. Heritability estimates for W42, W70 and W135 were 0.42 ± 0.07, 0.40 ± 0.08 and 0.27 ± 0.07, respectively. Heritability estimates of growth efficiency traits were moderate to high (0.18 to 0.42). Of the total phenotypic variation, maternal genetic effect contributed 14 to 32% for early body weight traits (W42 and W70) and ADG1. The contribution of maternal permanent environmental effect varied from 6 to 18% for W42 and for all the growth efficiency traits except for KR2. Maternal permanent environmental effect on most of the growth efficiency traits was a carryover effect of maternal care during weaning. Direct maternal genetic correlations, for the traits in which maternal genetic effect was significant, were moderate to high in magnitude and negative in direction. Maternal effect declined as the age of the animal increased. The estimates of total heritability and maternal across year repeatability for growth traits were moderate and an optimum rate of genetic progress seems possible in the herd by mass selection. The estimates of genetic and phenotypic correlations among body weight traits were moderate to high and positive; among growth efficiency traits were low to high with varying directions; between body weights and growth efficiency traits were very low to high in magnitude and mostly negative in direction. Moderate to high heritability and higher genetic correlation in body weight traits promise good scope for genetic improvement provided measures are taken to keep the inbreeding at the lowest level.Keywords: genetic parameters, growth traits, maternal effects, rabbit genetics
Procedia PDF Downloads 44724893 A Mutually Exclusive Task Generation Method Based on Data Augmentation
Authors: Haojie Wang, Xun Li, Rui Yin
Abstract:
In order to solve the memorization overfitting in the model-agnostic meta-learning MAML algorithm, a method of generating mutually exclusive tasks based on data augmentation is proposed. This method generates a mutex task by corresponding one feature of the data to multiple labels so that the generated mutex task is inconsistent with the data distribution in the initial dataset. Because generating mutex tasks for all data will produce a large number of invalid data and, in the worst case, lead to an exponential growth of computation, this paper also proposes a key data extraction method that only extract part of the data to generate the mutex task. The experiments show that the method of generating mutually exclusive tasks can effectively solve the memorization overfitting in the meta-learning MAML algorithm.Keywords: mutex task generation, data augmentation, meta-learning, text classification.
Procedia PDF Downloads 14324892 Probability Modeling and Genetic Algorithms in Small Wind Turbine Design Optimization: Mentored Interdisciplinary Undergraduate Research at LaGuardia Community College
Authors: Marina Nechayeva, Malgorzata Marciniak, Vladimir Przhebelskiy, A. Dragutan, S. Lamichhane, S. Oikawa
Abstract:
This presentation is a progress report on a faculty-student research collaboration at CUNY LaGuardia Community College (LaGCC) aimed at designing a small horizontal axis wind turbine optimized for the wind patterns on the roof of our campus. Our project combines statistical and engineering research. Our wind modeling protocol is based upon a recent wind study by a faculty-student research group at MIT, and some of our blade design methods are adopted from a senior engineering project at CUNY City College. Our use of genetic algorithms has been inspired by the work on small wind turbines’ design by David Wood. We combine these diverse approaches in our interdisciplinary project in a way that has not been done before and improve upon certain techniques used by our predecessors. We employ several estimation methods to determine the best fitting parametric probability distribution model for the local wind speed data obtained through correlating short-term on-site measurements with a long-term time series at the nearby airport. The model serves as a foundation for engineering research that focuses on adapting and implementing genetic algorithms (GAs) to engineering optimization of the wind turbine design using Blade Element Momentum Theory. GAs are used to create new airfoils with desirable aerodynamic specifications. Small scale models of best performing designs are 3D printed and tested in the wind tunnel to verify the accuracy of relevant calculations. Genetic algorithms are applied to selected airfoils to determine the blade design (radial cord and pitch distribution) that would optimize the coefficient of power profile of the turbine. Our approach improves upon the traditional blade design methods in that it lets us dispense with assumptions necessary to simplify the system of Blade Element Momentum Theory equations, thus resulting in more accurate aerodynamic performance calculations. Furthermore, it enables us to design blades optimized for a whole range of wind speeds rather than a single value. Lastly, we improve upon known GA-based methods in that our algorithms are constructed to work with XFoil generated airfoils data which enables us to optimize blades using our own high glide ratio airfoil designs, without having to rely upon available empirical data from existing airfoils, such as NACA series. Beyond its immediate goal, this ongoing project serves as a training and selection platform for CUNY Research Scholars Program (CRSP) through its annual Aerodynamics and Wind Energy Research Seminar (AWERS), an undergraduate summer research boot camp, designed to introduce prospective researchers to the relevant theoretical background and methodology, get them up to speed with the current state of our research, and test their abilities and commitment to the program. Furthermore, several aspects of the research (e.g., writing code for 3D printing of airfoils) are adapted in the form of classroom research activities to enhance Calculus sequence instruction at LaGCC.Keywords: engineering design optimization, genetic algorithms, horizontal axis wind turbine, wind modeling
Procedia PDF Downloads 23124891 Revolutionizing Traditional Farming Using Big Data/Cloud Computing: A Review on Vertical Farming
Authors: Milind Chaudhari, Suhail Balasinor
Abstract:
Due to massive deforestation and an ever-increasing population, the organic content of the soil is depleting at a much faster rate. Due to this, there is a big chance that the entire food production in the world will drop by 40% in the next two decades. Vertical farming can help in aiding food production by leveraging big data and cloud computing to ensure plants are grown naturally by providing the optimum nutrients sunlight by analyzing millions of data points. This paper outlines the most important parameters in vertical farming and how a combination of big data and AI helps in calculating and analyzing these millions of data points. Finally, the paper outlines how different organizations are controlling the indoor environment by leveraging big data in enhancing food quantity and quality.Keywords: big data, IoT, vertical farming, indoor farming
Procedia PDF Downloads 17524890 Data Challenges Facing Implementation of Road Safety Management Systems in Egypt
Authors: A. Anis, W. Bekheet, A. El Hakim
Abstract:
Implementing a Road Safety Management System (SMS) in a crowded developing country such as Egypt is a necessity. Beginning a sustainable SMS requires a comprehensive reliable data system for all information pertinent to road crashes. In this paper, a survey for the available data in Egypt and validating it for using in an SMS in Egypt. The research provides some missing data, and refer to the unavailable data in Egypt, looking forward to the contribution of the scientific society, the authorities, and the public in solving the problem of missing or unreliable crash data. The required data for implementing an SMS in Egypt are divided into three categories; the first is available data such as fatality and injury rates and it is proven in this research that it may be inconsistent and unreliable, the second category of data is not available, but it may be estimated, an example of estimating vehicle cost is available in this research, the third is not available and can be measured case by case such as the functional and geometric properties of a facility. Some inquiries are provided in this research for the scientific society, such as how to improve the links among stakeholders of road safety in order to obtain a consistent, non-biased, and reliable data system.Keywords: road safety management system, road crash, road fatality, road injury
Procedia PDF Downloads 147