Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 24800

Search results for: data clustering

24260 Analysis of Different Classification Techniques Using WEKA for Diabetic Disease

Abstract:

Data mining is the process of analyzing data to extract useful, predictive information. It is a field of research that addresses many types of problems. In data mining, classification is an important technique for categorizing different kinds of data. Diabetes is one of the most common diseases. This paper implements different classification techniques using the Waikato Environment for Knowledge Analysis (WEKA) on a diabetes dataset to find which algorithm is most suitable. The best classification algorithm on the diabetic data is Naïve Bayes, with an accuracy of 76.31% and a model build time of 0.06 seconds.

Keywords: data mining, classification, diabetes, WEKA

Procedia PDF Downloads 134

24259 Genetic Structuring of Four Tectona grandis L. F. Seed Production Areas in Southern India

Authors: P. M. Sreekanth

Abstract:

Teak (Tectona grandis L. f.) is a tree species indigenous to India and other Southeastern countries. It produces high-value timber and is easily established in plantations. Reforestation requires a constant supply of high quality seeds. Seed Production Areas (SPA) of teak are improved stands used for collection of open-pollinated quality seeds in large quantities. Information on the genetic diversity of major teak SPAs in India is scanty. The genetic structure of four important seed production areas of Kerala State in Southern India was analyzed employing amplified fragment length polymorphism markers using ten selective primer combinations on 80 samples (4 populations X 20 trees). The study revealed that the gene diversity of the SPAs varied from 0.169 (Konni SPA) to 0.203 (Wayanad SPA). The percentage of polymorphic loci ranged from 74.42 (Parambikulam SPA) to 84.06 (Konni SPA). The mean total gene diversity index (HT) of all the four SPAs was 0.2296 ±0.02. A high proportion of genetic diversity was observed within the populations (83%) while diversity between populations was lower (17%) (GST = 0.17). Principal coordinate analysis and STRUCTURE analysis of the genotypes indicated that the pattern of clustering was in accordance with the origin and geographic location of SPAs, indicating specific identity of each population. A UPGMA dendrogram was prepared and showed that all the twenty samples from each of Konni and Parambikulam SPAs clustered into two separate groups, respectively. However, five Nilambur genotypes and one Wayanad genotype intruded into the Konni cluster. The higher gene flow estimated (Nm = 2.4) reflected the inclusion of Konni origin planting stock in the Nilambur and Wayanad plantations. Evidence for population structure investigated using 3D Principal Coordinate Analysis of FAMD software 1.30 indicated that the pattern of clustering was in accordance with the origin of SPAs. 
The present study showed that assessment of genetic diversity in seed production plantations can be achieved using AFLP markers. AFLP fingerprinting was also capable of identifying the geographical origin of planting stock, thereby revealing the occurrence of errors in genotype labeling. Molecular marker-based selective culling of genetically similar trees from a stand, so as to broaden the genetic base of seed production areas, could be a new proposition for improving the quality of seeds required for raising commercial plantations of teak. The technique can also be used to assess the genetic diversity status of plus trees within provenances during their selection for raising clonal seed orchards, assuring the quality of seeds available for raising future plantations.
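
The abstract does not say how the gene flow value was estimated. One commonly used relation between GST and Nm, assumed here purely for illustration, reproduces the reported figure from the reported GST:

```latex
N_m = \frac{0.5\,(1 - G_{ST})}{G_{ST}}
    = \frac{0.5 \times (1 - 0.17)}{0.17}
    \approx 2.44
```

Under that assumption, GST = 0.17 corresponds to Nm of about 2.4, which is consistent with the value quoted above.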

Keywords: AFLP, genetic structure, SPA, teak

Procedia PDF Downloads 302

24258 Comprehensive Study of Data Science

Authors: Asifa Amara, Prachi Singh, Kanishka, Debargho Pathak, Akshat Kumar, Jayakumar Eravelly

Abstract:

Today's generation is heavily dependent on technology that uses data as its fuel. The present study surveys innovations and developments in data science and gives an idea of how to use the available data efficiently. It aims to help readers understand the core concepts of data science. The concept of artificial intelligence was introduced by Alan Turing; the main principle was to create an artificial system that can run independently of human-given programs and can function by analyzing data to understand the requirements of its users. Data science comprises business understanding, data analysis, ethical concerns, programming languages, various fields and sources of data, skills, etc. The usage of data science has evolved over the years. In this review article, we cover one part of data science, namely machine learning. Machine learning uses data science for its work. Machines learn through experience, which helps them do any work more efficiently. This article includes an illustrated comparison of human understanding and machine understanding, along with the advantages, applications, and real-time examples of machine learning. Data science is an important game changer in human life. Since the advent of data science, we have seen its benefits, how it leads to a better understanding of people, and how it cherishes individual needs. It has improved business strategies, the services businesses provide, forecasting, the ability to attain sustainable development, etc. This study also focuses on a better understanding of data science, which will help us create a better world.

Keywords: data science, machine learning, data analytics, artificial intelligence

Procedia PDF Downloads 69

24257 Cardiac Biosignal and Adaptation in Confined Nuclear Submarine Patrol

Authors: B. Lefranc, C. Aufauvre-Poupon, C. Martin-Krumm, M. Trousselard

Abstract:

Isolated and confined environments (ICE) present several challenges which may adversely affect human psychology and physiology. Submariners on Sub-Surface Ballistic Nuclear (SSBN) missions who are exposed to these environmental constraints must be able to perform complex tasks as part of their normal duties, as well as during crisis periods when emergency actions are required or imminent. The operational and environmental constraints they face challenge human adaptability. The impact of such a constrained environment has yet to be explored. Establishing a knowledge framework is a determining factor, particularly in view of future long-duration space travel. Ensuring that crews are maintained in optimal operational condition is a real challenge because the success of the mission depends on them. This study evaluated the impact of stress on the mental health and sensory degradation of submariners during an SSBN mission using clustering of a cardiac biosignal (heart rate variability, HRV). This is a pragmatic exploratory study of a prospective cohort that included 19 submariner volunteers. HRV was recorded at baseline to classify the submariners by clustering according to their stress level based on parasympathetic (Pa) activity. The impacts of a high Pa (HPa) versus low Pa (LPa) level at baseline were assessed on emotional state and sensory perception (interoception and exteroception), as well as on the cardiac biosignal, during the patrol and at a recovery time one month after. At no time point was a significant difference in mental health found between the groups. There are significant differences in interoceptive, exteroceptive and physiological functioning during the patrol and at recovery time. To sum up, compared to the LPa group, the HPa group maintains a higher level of psychosensory functioning during the patrol and at recovery but exhibits a decrease in Pa level.
The HPa group has less adaptable HRV characteristics, with less unpredictability and flexibility of cardiac biosignals, while the LPa group increases them during the patrol and at recovery time. This dissociation between psychosensory and physiological adaptation suggests two treatment modalities for ICE environments. To the best of our knowledge, our results are the first to highlight the impact of physiological differences in the HRV profile on the adaptability of submariners. Further studies are needed to evaluate the negative emotional and cognitive effects of ICEs based on the cardiac profile. Artificial intelligence offers a promising future for maintaining a high level of operational condition. These future perspectives will not only allow submariners to be better prepared, but will also make it possible to design feasible countermeasures that help support analog environments that bring us closer to a trip to Mars.

Keywords: adaptation, exteroception, HRV, ICE, interoception, SSBN

Procedia PDF Downloads 161

24256 Microstructure Evolution and Pre-transformation Microstructure Reconstruction in Ti-6Al-4V Alloy

Authors: Shreyash Hadke, Manendra Singh Parihar, Rajesh Khatirkar

Abstract:

In the present investigation, the variation in microstructure with changes in heat treatment conditions, i.e., temperature and time, was observed. Ti-6Al-4V alloy was subjected to solution annealing treatments in the β (1066 °C) and α+β (930 °C and 850 °C) phase fields, followed by quenching, air cooling and furnace cooling to room temperature, respectively. The effect of solution annealing and cooling on the microstructure was studied using optical microscopy (OM), scanning electron microscopy (SEM), electron backscattered diffraction (EBSD) and X-ray diffraction (XRD). The chemical composition of the β phase for the different conditions was determined with an energy dispersive spectrometer (EDS) attached to the SEM. Furnace cooling resulted in the development of a coarser (α+β) structure, while air cooling resulted in a much finer structure with Widmanstätten morphology of α at the grain boundaries. Quenching from the solution annealing temperature formed α’ martensite, its proportion depending on the temperature in the β phase field. It is well known that the transformation of β to α follows the Burgers orientation relationship (OR). In order to reconstruct the microstructure of the parent β phase, a MATLAB code was written using the neighbor-to-neighbor method, the triplet method and Tari’s method. The code was tested on the annealed samples (1066 °C solution annealing temperature followed by furnace cooling to room temperature). The parent phase data thus generated were then plotted using the TSL-OIM software. The reconstruction results of the above methods were compared and analyzed. Tari’s approach (a clustering approach) gave better results than the neighbor-to-neighbor and triplet methods, but the triplet method took the least time of the three.

Keywords: Ti-6Al-4V alloy, microstructure, electron backscattered diffraction, parent phase reconstruction

Procedia PDF Downloads 437

24255 Estimation of Source Parameters and Moment Tensor Solution through Waveform Modeling of 2013 Kishtwar Earthquake

Authors: Shveta Puri, Shiv Jyoti Pandey, G. M. Bhat, Neha Raina

Abstract:

The Jammu and Kashmir region of the Northwest Himalaya has witnessed many devastating earthquakes in the recent past, yet it has remained unexplored by seismic investigations apart from scanty records of past earthquakes in the region. In this study, we used local seismic data from 2013 recorded by the network of broadband seismographs in J&K. During this period, our seismic stations recorded about 207 earthquakes, including two moderate events of Mw 5.7 on 1st May, 2013 and Mw 5.1 on 2nd August, 2013. To minimize error, we analyzed only the events of Mw 3-4.6 and the main events for source parameters, b-value and sense of movement through waveform modeling, in order to understand the seismotectonics and seismic hazard of the region. Most of the events are bounded between 32.9° N and 33.3° N latitude and 75.4° E and 76.1° E longitude. Moment magnitude (Mw) ranges from 3 to 5.7, source radius (r) from 0.21 to 3.5 km, stress drop from 1.90 to 71.1 bars and corner frequency from 0.39 to 6.06 Hz. The b-value for this region was found to be 0.83±0 from these events, which is lower than the normal value (b=1), indicating that the area is under high stress. The travel time inversion and waveform inversion methods suggest focal depths of up to 10 km, probably above the detachment depth of the Himalayan region. The moment tensor solution of the main event of 2nd August (Mw 5.1, 02:32:47 UTC) suggests that the source fault strikes at 295° with a dip of 33° and a rake of 85°. These events form an intense cluster of small to moderate events within a narrow zone between the Panjal Thrust and the Kishtwar Window. The moment tensor solutions of the main events and their aftershocks indicate that thrust-type movement is occurring in this region.
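
The abstract does not state which source model was used to obtain the source radius and stress drop. As a rough sketch, the widely used Brune circular-source relations can be written as below; the shear-wave speed of 3.5 km/s and the function names are assumptions made for illustration, not values taken from the study.

```python
import math

# Assumed crustal shear-wave speed in m/s (not given in the abstract).
SHEAR_VELOCITY_M_S = 3500.0


def seismic_moment_from_mw(mw):
    """Hanks-Kanamori relation: seismic moment in N*m from moment magnitude."""
    return 10 ** (1.5 * mw + 9.1)


def brune_radius_m(corner_freq_hz, beta=SHEAR_VELOCITY_M_S):
    """Brune source radius in metres from the corner frequency in Hz."""
    return 2.34 * beta / (2 * math.pi * corner_freq_hz)


def stress_drop_bar(moment_nm, radius_m):
    """Static stress drop of a circular source, converted from Pa to bar."""
    pascals = 7 * moment_nm / (16 * radius_m ** 3)
    return pascals / 1e5


if __name__ == "__main__":
    # Corner-frequency bounds quoted in the abstract.
    for fc in (6.06, 0.39):
        print(fc, "Hz ->", round(brune_radius_m(fc) / 1000, 2), "km")
```

With the assumed shear-wave speed, the two corner-frequency bounds give radii of roughly 0.2 km and 3.3 km, close to the reported source-radius range.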

Keywords: b-value, moment tensor, seismotectonics, source parameters

Procedia PDF Downloads 305

24254 Application of Artificial Neural Network Technique for Diagnosing Asthma

Authors: Azadeh Bashiri

Abstract:

Introduction: Lack of proper diagnosis and inadequate treatment of asthma lead to physical and financial complications. This study aimed to use data mining techniques to create an intelligent neural network system for the diagnosis of asthma. Methods: The study population comprised patients who had visited one of the lung clinics in Tehran. Data were analyzed using the SPSS statistical tool, and Pearson's chi-square coefficient was the basis of decision making for data ranking. The neural network was trained using the backpropagation learning technique. Results: Based on the SPSS analysis, 13 effective factors were selected. The data were combined in various ways across runs, so different models were built for training and testing the networks, and in all of these modes the network correctly predicted 100% of cases. Conclusion: Applying data mining methods before designing the system structure, in order to reduce the data dimension and select the data optimally, leads to a more accurate system. Given the nature of medical data, data mining approaches are therefore necessary.

Keywords: asthma, data mining, Artificial Neural Network, intelligent system

Procedia PDF Downloads 261

24253 Using Nonhomogeneous Poisson Process with Compound Distribution to Price Catastrophe Options

Authors: Rong-Tsorng Wang

Abstract:

In this paper, we derive a pricing formula for catastrophe equity put options (or CatEPut) with non-homogeneous loss and approximated compound distributions. We assume that the loss claims arrival process is a nonhomogeneous Poisson process (NHPP) representing the clustering occurrences of loss claims, the size of loss claims is a sequence of independent and identically distributed random variables, and the accumulated loss distribution forms a compound distribution and is approximated by a heavy-tailed distribution. A numerical example is given to calibrate parameters, and we discuss how the value of CatEPut is affected by the changes of parameters in the pricing model we provided.
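
The loss model described above can be illustrated with a small simulation. The sketch below draws NHPP arrival times by thinning and sums i.i.d. claim sizes; the sinusoidal intensity, the lognormal claim sizes and all parameter values are hypothetical choices for illustration, not the paper's calibration or its approximating distribution.

```python
import math
import random


def simulate_nhpp(intensity, lam_max, horizon, rng):
    """Arrival times on (0, horizon] by thinning a rate-lam_max Poisson process.

    Requires intensity(t) <= lam_max for every t in the horizon.
    """
    arrivals, t = [], 0.0
    while True:
        t += rng.expovariate(lam_max)  # candidate arrival from the dominating process
        if t > horizon:
            return arrivals
        if rng.random() * lam_max <= intensity(t):  # keep with prob intensity/lam_max
            arrivals.append(t)


def accumulated_loss(arrivals, rng, mu=0.0, sigma=1.0):
    """Compound sum: one i.i.d. lognormal claim size per arrival (assumed law)."""
    return sum(rng.lognormvariate(mu, sigma) for _ in arrivals)


def seasonal_intensity(t):
    """Hypothetical intensity in [1, 9] that makes claims cluster periodically."""
    return 5.0 + 4.0 * math.sin(2 * math.pi * t)


rng = random.Random(42)
arrivals = simulate_nhpp(seasonal_intensity, lam_max=9.0, horizon=3.0, rng=rng)
total = accumulated_loss(arrivals, rng)
print(len(arrivals), "claims, accumulated loss", round(total, 3))
```

Repeating the simulation many times gives an empirical distribution of accumulated loss, which is the quantity a CatEPut payoff would be conditioned on.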

Keywords: catastrophe equity put options, compound distributions, nonhomogeneous Poisson process, pricing model

Procedia PDF Downloads 154

24252 Optimized Cluster Head Selection Algorithm Based on LEACH Protocol for Wireless Sensor Networks

Authors: Wided Abidi, Tahar Ezzedine

Abstract:

Low-Energy Adaptive Clustering Hierarchy (LEACH) is considered one of the effective hierarchical routing algorithms that optimize energy use and prolong network lifetime. Since the selection of the Cluster Head (CH) in LEACH is carried out randomly, in this paper we propose an approach for electing the CH based on the LEACH protocol. In other words, we present a formula for calculating the threshold responsible for CH election. We adopt three principal criteria: the remaining energy of the node, the number of neighbors within cluster range and the distance between the node and the CH. Simulation results show that our proposed approach outperforms the LEACH protocol in terms of prolonging network lifetime and saving residual energy.
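
For context, the threshold used by the original LEACH protocol can be sketched as follows. The authors' modified formula is not given in the abstract, so `weighted_threshold` is only a hypothetical illustration of how the three criteria might scale the base threshold; its weighting scheme and parameter names are assumptions.

```python
def leach_threshold(p, round_number, eligible=True):
    """Original LEACH threshold T(n) = p / (1 - p * (r mod 1/p)).

    p is the desired fraction of cluster heads; nodes that already served
    as cluster head in the current epoch are not eligible and get 0.
    """
    if not eligible:
        return 0.0
    epoch = round(1 / p)  # rounds per epoch; exact when 1/p is an integer
    return p / (1 - p * (round_number % epoch))


def weighted_threshold(p, round_number, energy_ratio, neighbor_ratio,
                       closeness, eligible=True):
    """Hypothetical variant: scale the base threshold by the mean of three
    factors in [0, 1] (residual energy, neighbor density, closeness)."""
    base = leach_threshold(p, round_number, eligible)
    return base * (energy_ratio + neighbor_ratio + closeness) / 3.0


if __name__ == "__main__":
    for r in range(10):
        print(r, round(leach_threshold(0.1, r), 4))
```

A node elects itself cluster head in a round when a uniform random draw in [0, 1) falls below its threshold, so the base threshold rises toward 1 as the epoch progresses and resets at the start of the next epoch.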

Keywords: wireless sensor networks, LEACH protocol, cluster head election, energy efficiency

Procedia PDF Downloads 321

24251 Interpreting Privacy Harms from a Non-Economic Perspective

Authors: Christopher Muhawe, Masooda Bashir

Abstract:

With the growth of Internet Communication Technology (ICT), the virtual world has become the new normal. At the same time, there is an unprecedented collection of massive amounts of data by both private and public entities. Unfortunately, this increase in data collection has gone hand in hand with an increase in data misuse and data breaches. Regrettably, the majority of data breach and data misuse claims have been unsuccessful in United States courts for failure to prove direct injury to physical or economic interests. The requirement to express data privacy harms from an economic or physical stance ignores the fact that not all data harms are physical or economic in nature. The challenge is compounded by the fact that data breach harms and risks do not attach immediately. This research will use a descriptive and normative approach to show that not all data harms can be expressed in economic or physical terms. Expressing privacy harms purely from an economic or physical harm perspective ignores the fact that data insecurity may result in harms that run counter to the functions of privacy in our lives: the promotion of liberty, selfhood, autonomy and human social relations, and the furtherance of a free society. No economic value can be placed on these functions of privacy. The proposed approach addresses data harms from a psychological and social perspective.

Keywords: data breach and misuse, economic harms, privacy harms, psychological harms

Procedia PDF Downloads 183

24250 Machine Learning Analysis of Student Success in Introductory Calculus Based Physics I Course

Authors: Chandra Prayaga, Aaron Wade, Lakshmi Prayaga, Gopi Shankar Mallu

Abstract:

This paper presents the use of machine learning algorithms to predict the success of students in an introductory physics course. Data having 140 rows pertaining to the performance of two batches of students was used. The lack of sufficient data to train robust machine learning models was compensated for by generating synthetic data similar to the real data. CTGAN and CTGAN with Gaussian Copula (Gaussian) were used to generate synthetic data, with the real data as input. To check the similarity between the real data and each synthetic dataset, pair plots were made. The synthetic data was used to train machine learning models using the PyCaret package. For the CTGAN data, the Ada Boost Classifier (ADA) was found to be the ML model with the best fit, whereas the CTGAN with Gaussian Copula yielded Logistic Regression (LR) as the best model. Both models were then tested for accuracy with the real data. ROC-AUC analysis was performed for all the ten classes of the target variable (Grades A, A-, B+, B, B-, C+, C, C-, D, F). The ADA model with CTGAN data showed a mean AUC score of 0.4377, but the LR model with the Gaussian data showed a mean AUC score of 0.6149. ROC-AUC plots were obtained for each Grade value separately. The LR model with Gaussian data showed consistently better AUC scores compared to the ADA model with CTGAN data, except in two cases of the Grade value, C- and A-.
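
A mean AUC over several grade classes is typically obtained one-vs-rest: each class in turn is treated as the positive class, and the per-class AUCs are averaged. The sketch below shows that computation under the assumption that the model outputs one score per class; the function names are illustrative, and the abstract does not say how the authors averaged their scores.

```python
def binary_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_ovr_auc(score_rows, y_true, classes):
    """One-vs-rest AUC per class plus the unweighted mean across classes.

    score_rows[i][k] is the model's score for sample i and class classes[k].
    """
    per_class = {}
    for k, cls in enumerate(classes):
        labels = [y == cls for y in y_true]
        scores = [row[k] for row in score_rows]
        per_class[cls] = binary_auc(scores, labels)
    return sum(per_class.values()) / len(per_class), per_class
```

An AUC of 0.5 corresponds to random ranking, which gives a reference point for reading the two mean scores reported above.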

Keywords: machine learning, student success, physics course, grades, synthetic data, CTGAN, gaussian copula CTGAN

Procedia PDF Downloads 32

24249 Efficient Subgoal Discovery for Hierarchical Reinforcement Learning Using Local Computations

Authors: Adrian Millea

Abstract:

In hierarchical reinforcement learning, one of the main issues encountered is the discovery of subgoal states or options (which are policies reaching subgoal states) by partitioning the environment in a meaningful way. This partitioning usually requires an expensive global clustering operation or an eigendecomposition of the Laplacian of the state graph. We propose a local solution to this issue, much more efficient than algorithms using global information, which successfully discovers subgoal states by computing a simple function, which we call heterogeneity, for each state as a function of its neighbors. Moreover, we construct a value function using the difference in heterogeneity from one step to the next as the reward, such that we are able to explore the state space much more efficiently than, say, epsilon-greedy. The same principle can then be applied to higher levels of the hierarchy, where the states are the subgoals discovered at the level below.

Keywords: exploration, hierarchical reinforcement learning, locality, options, value functions

Procedia PDF Downloads 155

24248 Data Access, AI Intensity, and Scale Advantages

Authors: Chuping Lo

Abstract:

This paper presents a simple model demonstrating that ceteris paribus countries with lower barriers to accessing global data tend to earn higher incomes than other countries. Therefore, large countries that inherently have greater data resources tend to have higher incomes than smaller countries, such that the former may be more hesitant than the latter to liberalize cross-border data flows to maintain this advantage. Furthermore, countries with higher artificial intelligence (AI) intensity in production technologies tend to benefit more from economies of scale in data aggregation, leading to higher income and more trade as they are better able to utilize global data.

Keywords: digital intensity, digital divide, international trade, scale of economics

Procedia PDF Downloads 54

24247 Secured Transmission and Reserving Space in Images Before Encryption to Embed Data

Authors: G. R. Navaneesh, E. Nagarajan, C. H. Rajam Raju

Abstract:

Nowadays, multimedia data are used to store secure information. Previous methods allocate space in the image for data embedding after encryption. In this paper, we propose a novel method that reserves space in the image, surrounded by a boundary, before encryption with a traditional reversible data hiding (RDH) algorithm, which makes it easy for the data hider to reversibly embed data in the encrypted images. The proposed method can achieve real time performance, that is, data extraction and image recovery are free of any error. A secure transmission process is also discussed in this paper, which improves efficiency tenfold compared to the other processes discussed.

Keywords: secure communication, reserving room before encryption, least significant bits, image encryption, reversible data hiding

Procedia PDF Downloads 402

24246 Identity Verification Using k-NN Classifiers and Autistic Genetic Data

Authors: Fuad M. Alkoot

Abstract:

DNA data have been used in forensics for decades. However, current research looks at using DNA as a biometric identity verification modality, with the goal of improving the speed of identification. We use gene data that were initially collected for autism detection to determine whether, and how accurately, these data can be used for identification applications. Our main goal is to find out whether our data preprocessing technique yields data useful as a biometric identification tool. We experiment with using the nearest neighbor classifier to identify subjects. Results show that the optimal classification rate is achieved when the test set is corrupted by normally distributed noise with zero mean and a standard deviation of 1. The classification rate remains close to optimal at higher noise standard deviations, up to 3. This shows that the data can be used for identity verification with high accuracy using a simple classifier such as the k-nearest neighbor (k-NN).
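
The evaluation protocol described above, classifying noise-corrupted test vectors with k-NN, can be sketched as follows. The Euclidean distance, the data layout and the function names are assumptions for illustration; the abstract does not describe the gene features or the preprocessing.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k=1):
    """train is a list of (feature_vector, subject_id) pairs; majority vote of k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def add_gaussian_noise(vector, sigma, rng):
    """Corrupt each feature with zero-mean normal noise of standard deviation sigma."""
    return [x + rng.gauss(0.0, sigma) for x in vector]


def classification_rate(train, test, sigma, k=1, seed=0):
    """Fraction of noise-corrupted test vectors assigned to the correct subject."""
    rng = random.Random(seed)
    hits = sum(
        knn_predict(train, add_gaussian_noise(x, sigma, rng), k) == y
        for x, y in test
    )
    return hits / len(test)
```

Sweeping `sigma` over a range such as 0 to 3 and plotting `classification_rate` would reproduce the kind of noise-robustness curve the abstract refers to.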

Keywords: biometrics, genetic data, identity verification, k nearest neighbor

Procedia PDF Downloads 241

24245 A Review on Intelligent Systems for Geoscience

Authors: R. Palson Kennedy, P. Kiran Sai

Abstract:

This article introduces machine learning (ML) researchers to the hurdles that geoscience problems present, as well as the opportunities for improvement in both ML and the geosciences. To meet that need, it presents a review from the data life cycle perspective. Numerous facets of the geosciences present unique difficulties for the study of intelligent systems. Geoscience data are notoriously difficult to analyze since they are frequently unpredictable, intermittent, sparse, multi-resolution, and multi-scale. The first half addresses the essential concepts and theoretical underpinnings of data science, while the second half covers key themes and shared experiences from current publications focused on each stage of the data life cycle. Finally, themes such as open science, smart data, and team science are considered.

Keywords: Data science, intelligent system, machine learning, big data, data life cycle, recent development, geo science

Procedia PDF Downloads 124

24244 Data Quality as a Pillar of Data-Driven Organizations: Exploring the Benefits of Data Mesh

Authors: Marc Bachelet, Abhijit Kumar Chatterjee, José Manuel Avila

Abstract:

Data quality is a key component of any data-driven organization. Without data quality, organizations cannot effectively make data-driven decisions, which often leads to poor business performance. Therefore, it is important for an organization to ensure that the data they use is of high quality. This is where the concept of data mesh comes in. Data mesh is an organizational and architectural decentralized approach to data management that can help organizations improve the quality of data. The concept of data mesh was first introduced in 2020. Its purpose is to decentralize data ownership, making it easier for domain experts to manage the data. This can help organizations improve data quality by reducing the reliance on centralized data teams and allowing domain experts to take charge of their data. This paper intends to discuss how a set of elements, including data mesh, are tools capable of increasing data quality. One of the key benefits of data mesh is improved metadata management. In a traditional data architecture, metadata management is typically centralized, which can lead to data silos and poor data quality. With data mesh, metadata is managed in a decentralized manner, ensuring accurate and up-to-date metadata, thereby improving data quality. Another benefit of data mesh is the clarification of roles and responsibilities. In a traditional data architecture, data teams are responsible for managing all aspects of data, which can lead to confusion and ambiguity in responsibilities. With data mesh, domain experts are responsible for managing their own data, which can help provide clarity in roles and responsibilities and improve data quality. Additionally, data mesh can also contribute to a new form of organization that is more agile and adaptable. 
By decentralizing data ownership, organizations can respond more quickly to changes in their business environment, which in turn can help improve overall performance by providing better business insights through better reports and visualization tools. Monitoring and analytics are also important aspects of data quality. With data mesh, monitoring and analytics are decentralized, allowing domain experts to monitor and analyze their own data. This helps identify and address data quality problems quickly, leading to improved data quality. Data culture is another major aspect of data quality. With data mesh, domain experts are encouraged to take ownership of their data, which can help create a data-driven culture within the organization. This can lead to improved data quality and better business outcomes. Finally, the paper explores the contribution of AI in the coming years. AI can help enhance data quality by automating many data-related tasks, such as data cleaning and data validation. By integrating AI into data mesh, organizations can further enhance the quality of their data. The concepts mentioned above are illustrated by feedback from AEKIDEN's experience. AEKIDEN is an international data-driven consultancy that has successfully implemented a data mesh approach. By sharing its experience, AEKIDEN can help other organizations understand the benefits and challenges of implementing data mesh and improving data quality.

Keywords: data culture, data-driven organization, data mesh, data quality for business success

Procedia PDF Downloads 121

24243 Investigation and Analysis of Residential Building Energy End-Use Profile in Hot and Humid Area with Reference to Zhuhai City in China

Authors: Qingqing Feng, S. Thomas Ng, Frank Xu

Abstract:

Energy consumption in the domestic sector has been increasing rapidly in China in recent years. Confronted with environmental challenges, the international community has made a concerted effort by setting the Paris Agreement, the Sustainable Development Goals, and the New Urban Agenda. It is therefore very important for China to put forward reasonable countermeasures to boost building energy conservation, which requires examining the actual residential energy end-use profile and the factors that influence it. In this study, questionnaire surveys were conducted in Zhuhai, China, a typical city in the hot summer and warm winter climate zone. The data collected mainly include the occupancy schedule, building information, residents’ information, household energy uses, the type, quantity and use patterns of appliances, and occupants’ satisfaction. Over 200 valid samples were collected through face-to-face interviews. Descriptive analysis, clustering analysis, correlation analysis and sensitivity analysis were then conducted on the dataset to understand the energy end-use profile. The findings identify: 1) several typical clusters of occupancy patterns and appliance utilization patterns; 2) the top three sensitive factors influencing energy consumption; 3) the correlations between satisfaction and energy consumption. For China, with its many different climate zones, it is difficult to find a silver bullet for energy conservation. The aim of this paper is to provide a theoretical basis for multiple stakeholders, including policy makers, residents, and academic communities, to formulate reasonable energy-saving blueprints for urban residential buildings in hot and humid areas of China.

Keywords: residential building, energy end-use profile, questionnaire survey, sustainability

Procedia PDF Downloads 117

24242 Genome-Wide Identification and Characterization of MLO Family Genes in Pumpkin (Cucurbita maxima Duch.)

Authors: Khin Thanda Win, Chunying Zhang, Sanghyeob Lee

Abstract:

Mildew resistance locus o (Mlo), a plant-specific gene family encoding seven-transmembrane (TM) proteins, plays an important role in plant resistance to powdery mildew (PM). PM caused by Podosphaera xanthii is a widespread plant disease and probably represents the major fungal threat for many cucurbits. The recent Cucurbita maxima genome sequence data provide an opportunity to identify and characterize the MLO gene family in this species. A total of twenty genes (designated CmaMLO1 through CmaMLO20) were identified using an in silico cloning method with the MLO gene sequences of Cucumis sativus, Cucumis melo, Citrullus lanatus and Cucurbita pepo as probes. These CmaMLOs were evenly distributed on 15 of the 20 C. maxima chromosomes without any obvious clustering. Multiple sequence alignment showed that the common structural features of the MLO gene family, such as the TM domains, a calmodulin-binding domain and 30 amino acid residues important for MLO function, were well conserved. Phylogenetic analysis of the CmaMLO genes and those of other plant species reveals seven different clades (I through VII), of which only clade IV is specific to monocots (rice, barley, and wheat). Phylogenetic and structural analyses provided preliminary evidence that five genes belonging to clade V could be the susceptibility genes, which may play an important role in PM resistance. To our knowledge, this study is the first comprehensive report on MLO genes in C. maxima. These findings will facilitate the functional analysis of the MLOs related to PM susceptibility and are valuable resources for the development of disease resistance in pumpkin.

Keywords: Mildew resistance locus o (Mlo), powdery mildew, phylogenetic relationship, susceptibility genes

Procedia PDF Downloads 172

24241 Big Data Analysis with RHadoop

Authors: Ji Eun Shin, Byung Ho Jung, Dong Hoon Lim

Abstract:

It is almost impossible to store or analyze exponentially growing big data with traditional technologies. Hadoop is a new technology that makes this possible. The R programming language is by far the most popular statistical tool for big data analysis based on distributed processing with Hadoop technology. With RHadoop, which integrates the R and Hadoop environments, we implemented parallel multiple regression analysis on actual data of different sizes. Experimental results showed that our RHadoop system became much faster as the number of data nodes increased. We also compared the performance of our RHadoop system with the lm function and the big lm packages available on big memory. The results showed that our RHadoop system was faster than the other packages owing to parallel processing, with the number of map tasks increasing as the size of the data increases.
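The abstract does not say how the regression is parallelized. One common way to fit a multiple regression in a map/reduce style is sketched below in plain Python (not R or RHadoop), purely as an illustration: each map task computes the partial sums XᵀX and Xᵀy for its chunk of rows, the reduce step adds them, and the normal equations are solved once at the end. The function names (`map_chunk`, `reduce_sums`, `chunked_regression`) and the toy data are assumptions and are not taken from the paper.

```python
from functools import reduce


def map_chunk(rows):
    """Map step: partial sums X^T X and X^T y for one chunk of (features, y) rows."""
    p = len(rows[0][0]) + 1  # +1 for the intercept column
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for features, y in rows:
        x = [1.0] + list(features)
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def reduce_sums(a, b):
    """Reduce step: element-wise addition of two partial results."""
    xtx = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xty = [u + v for u, v in zip(a[1], b[1])]
    return xtx, xty


def solve(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - s) / m[i][i]
    return beta


def chunked_regression(chunks):
    """Combine per-chunk sums, then solve the normal equations once."""
    xtx, xty = reduce(reduce_sums, map(map_chunk, chunks))
    return solve(xtx, xty)


# Toy data generated from y = 1 + 2*x1 + 3*x2, split into three uneven chunks.
data = [((x1, x2), 1 + 2 * x1 + 3 * x2) for x1 in range(4) for x2 in range(4)]
chunks = [data[:5], data[5:11], data[11:]]
beta = chunked_regression(chunks)  # [intercept, coefficient of x1, coefficient of x2]
```

Because the partial sums are additive, the result does not depend on how the rows are split, which is what lets the number of map tasks grow with the data size.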

Keywords: big data, Hadoop, parallel regression analysis, R, RHadoop

Procedia PDF Downloads 423

24240 A Mutually Exclusive Task Generation Method Based on Data Augmentation

Authors: Haojie Wang, Xun Li, Rui Yin

Abstract:

To address memorization overfitting in the model-agnostic meta-learning (MAML) algorithm, a method of generating mutually exclusive tasks based on data augmentation is proposed. This method generates a mutex task by mapping one feature of the data to multiple labels, so that the generated mutex task is inconsistent with the data distribution in the initial dataset. Because generating mutex tasks for all data would produce a large amount of invalid data and, in the worst case, lead to exponential growth of computation, this paper also proposes a key data extraction method that extracts only part of the data to generate the mutex task. The experiments show that the method of generating mutually exclusive tasks can effectively address memorization overfitting in the MAML algorithm.
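The idea of mapping one feature to multiple labels can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a task is a list of `(feature, label)` pairs and creates a mutex task by cyclically shifting the label set, so every feature receives a label that contradicts the original task. The helper `select_key_data` is a hypothetical placeholder for the key data extraction step, whose actual selection criterion is not described in the abstract.

```python
def make_mutex_task(task, shift=1):
    """Return a copy of `task` in which every label is replaced by a different one.

    Labels are remapped by a cyclic shift over the sorted label set, so the same
    feature now maps to a label that conflicts with the original task.
    """
    labels = sorted({y for _, y in task})
    if len(labels) < 2 or shift % len(labels) == 0:
        raise ValueError("need at least two labels and a non-trivial shift")
    remap = {y: labels[(i + shift) % len(labels)] for i, y in enumerate(labels)}
    return [(x, remap[y]) for x, y in task]


def select_key_data(task, per_label):
    """Hypothetical stand-in for key data extraction: keep the first few items per label."""
    kept, counts = [], {}
    for x, y in task:
        if counts.get(y, 0) < per_label:
            kept.append((x, y))
            counts[y] = counts.get(y, 0) + 1
    return kept


# Toy text-classification task (features and labels are made up for illustration).
task = [
    ("great film", "pos"), ("loved it", "pos"), ("superb", "pos"),
    ("dull plot", "neg"), ("awful", "neg"),
    ("it exists", "neu"),
]
key_data = select_key_data(task, per_label=2)
mutex_task = make_mutex_task(key_data)
```

Generating the mutex task only from `key_data` rather than from the full task is what keeps the amount of generated data bounded in this sketch.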

Keywords: data augmentation, mutex task generation, meta-learning, text classification

Procedia PDF Downloads 82

24239 Mechanisms and Regulation of the Bi-directional Motility of Mitotic Kinesin Nano-motors

Authors: Larisa Gheber

Abstract:

Mitosis is an essential process by which duplicated genetic information is transmitted from mother to daughter cells. Incorrect chromosome segregation during mitosis can lead to genetic diseases, chromosome instability and cancer. This process is mediated by a dynamic microtubule-based intracellular structure, the mitotic spindle. Among the major factors that govern mitotic spindle dynamics are the kinesin-5 biological nano-motors, which were believed to move unidirectionally on the microtubule filaments, using ATP hydrolysis, thus performing essential functions in mitotic spindle dynamics. Surprisingly, several reports from our and other laboratories have demonstrated that some kinesin-5 motors are bi-directional: they move in the minus-end direction on the microtubules as single molecules and can switch directionality under a number of conditions. These findings broke a twenty-five-year-old dogma regarding kinesin directionality (1, 2). The mechanism of this bi-directional motility and its physiological significance remain unclear. To address this unresolved problem, we apply an interdisciplinary approach combining live cell imaging, biophysical single-molecule, and structural experiments to examine the activity of these motors and their mutated variants in vivo and in vitro. Our data show that factors such as protein phosphorylation (3, 4), motor clustering on the microtubules (5, 6) and structural elements (7, 8) regulate the bi-directional motility of kinesin motors. We also show, using cryo-EM, that bi-directional kinesin motors exhibit non-canonical microtubule binding, which is essential to their special motile properties and intracellular functions. We will discuss the implications of these findings for the mechanism of bi-directional motility and its physiological roles in mitosis.

Keywords: mitosis, cancer, kinesin, microtubules, biochemistry, biophysics

Procedia PDF Downloads 65

24238 Efficient Positioning of Data Aggregation Point for Wireless Sensor Network

Authors: Sifat Rahman Ahona, Rifat Tasnim, Naima Hassan

Abstract:

Data aggregation is a helpful technique for reducing the data communication overhead in wireless sensor networks. One of the important tasks in data aggregation is positioning the aggregator points. A great deal of work has been done on data aggregation, but efficient positioning of the aggregator points has received little attention. In this paper, the authors focus on the positioning, or placement, of the aggregation points in a wireless sensor network. The authors propose an algorithm to select the aggregator positions for a scenario where aggregator nodes are more powerful than sensor nodes.

Keywords: aggregation point, data communication, data aggregation, wireless sensor network

Procedia PDF Downloads 145

24237 Spatial Econometric Approaches for Count Data: An Overview and New Directions

Authors: Paula Simões, Isabel Natário

Abstract:

This paper reviews a number of theoretical aspects of implementing an explicit spatial perspective in econometrics for modelling non-continuous data in general, and count data in particular. It provides an overview of the spatial econometric approaches available for modelling data collected with reference to location in space, from the classical spatial econometric approaches to recent developments in spatial econometrics for modelling count data in a Bayesian hierarchical setting. Considerable attention is paid to the inferential framework necessary for structurally consistent spatial econometric count models incorporating spatial lag autocorrelation, to the corresponding estimation and testing procedures under different assumptions, and to the constraints and implications embedded in the various specifications in the literature. This review combines insights from the classical spatial econometrics literature with those from hierarchical modeling and analysis of spatial data, in order to look for new possible directions in the processing of count data in a spatial hierarchical Bayesian econometric context.

Keywords: spatial data analysis, spatial econometrics, Bayesian hierarchical models, count data

Procedia PDF Downloads 578

24236 A NoSQL Based Approach for Real-Time Managing of Robotics's Data

Authors: Gueidi Afef, Gharsellaoui Hamza, Ben Ahmed Samir

Abstract:

This paper deals with the continual growth of data, in response to which new data management solutions have emerged: NoSQL databases. They have spread across several areas such as personalization, profile management, real-time big data, content management, catalogs, customer views, mobile applications, the internet of things, digital communication and fraud detection. Nowadays, the use of these database management systems is increasing. These systems store data very well, and with the trend of big data, new storage challenges demand new structures and methods for managing enterprise data. The new intelligent machines in the e-learning sector thrive on more data, so smart machines can learn more and faster. Robotics is the use case on which we focus our tests. Implementing NoSQL for robotics helps wrangle all the data that robots acquire into usable form, because with ordinary data management for robotics we face severe limits in managing and finding the exact information in real time. Our proposed approach is demonstrated by experimental studies and by a running example used as a use case.

Keywords: NoSQL databases, database management systems, robotics, big data

Procedia PDF Downloads 336

24235 Integrating Radar Sensors with an Autonomous Vehicle Simulator for an Enhanced Smart Parking Management System

Authors: Mohamed Gazzeh, Bradley Null, Fethi Tlili, Hichem Besbes

Abstract:

The burgeoning global ownership of personal vehicles has placed a significant strain on urban infrastructure, notably parking facilities, leading to traffic congestion and environmental concerns. Effective parking management systems (PMS) are indispensable for optimizing urban traffic flow and reducing emissions. The most commonly deployed systems nowadays rely on computer vision technology. This paper explores the integration of radar sensors and simulation in the context of smart parking management. We concentrate on radar sensors due to their versatility and utility in automotive applications, which extends to PMS. Additionally, radar sensors play a crucial role in driver assistance systems and autonomous vehicle development. However, the resource-intensive nature of radar data collection for algorithm development and testing necessitates innovative solutions. Simulation, particularly the monoDrive simulator, an internal development tool used by NI, the Test and Measurement division of Emerson, offers a practical means to overcome this challenge. The primary objectives of this study encompass simulating radar sensors to generate a substantial dataset for algorithm development, testing, and, critically, assessing the transferability of models between simulated and real radar data. We focus on occupancy detection in parking as a practical use case, categorizing each parking space as vacant or occupied. The simulation approach using monoDrive enables algorithm validation and reliability assessment for virtual radar sensors. We meticulously designed various parking scenarios, involving manual measurements of parking spot coordinates and orientations and the use of a TI AWR1843 radar. To create a diverse dataset, we generated 4950 scenarios, comprising a total of 455,400 parking spots.
This extensive dataset encompasses radar configuration details, ground truth occupancy information, radar detections, and associated object attributes such as range, azimuth, elevation, radar cross-section, and velocity data. The paper also addresses the intricacies and challenges of real-world radar data collection, highlighting the advantages of simulation in producing radar data for parking lot applications. We developed classification models based on Support Vector Machines (SVM) and Density-Based Spatial Clustering of Applications with Noise (DBSCAN), exclusively trained and evaluated on simulated data. Subsequently, we applied these models to real-world data, comparing their performance against the monoDrive dataset. The study demonstrates the feasibility of transferring models from a simulated environment to real-world applications, achieving an impressive accuracy score of 92% using only one radar sensor. This finding underscores the potential of radar sensors and simulation in the development of smart parking management systems, offering significant benefits for improving urban mobility and reducing environmental impact. The integration of radar sensors and simulation represents a promising avenue for enhancing smart parking management systems, addressing the challenges posed by the exponential growth in personal vehicle ownership. This research contributes valuable insights into the practicality of using simulated radar data in real-world applications and underscores the role of radar technology in advancing urban sustainability.
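How DBSCAN is combined with the SVM classifier is not detailed in the abstract. As a rough illustration of the clustering half only, the sketch below implements a textbook DBSCAN in plain Python on 2D radar detections and marks a parking spot as occupied when at least one clustered (non-noise) detection falls inside its bounding box. The occupancy rule, the axis-aligned spot rectangles, the parameter values and the coordinates are all assumptions made for this example.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN: returns one cluster id per point, or NOISE."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point of the current cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_nbrs)
        cluster += 1
    return labels


def spot_occupied(points, labels, spot):
    """Assumed rule: a spot is occupied if any clustered detection lies inside it."""
    xmin, ymin, xmax, ymax = spot
    return any(
        lab != NOISE and xmin <= x <= xmax and ymin <= y <= ymax
        for (x, y), lab in zip(points, labels)
    )


# Hypothetical detections in metres: a dense group in spot A, one stray return in spot B.
detections = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.3), (1.1, 0.8), (6.0, 1.0)]
labels = dbscan(detections, eps=0.6, min_pts=3)
spot_a = (0.0, 0.0, 2.5, 2.5)
spot_b = (5.0, 0.0, 7.5, 2.5)
occupancy = {"A": spot_occupied(detections, labels, spot_a),
             "B": spot_occupied(detections, labels, spot_b)}
```

Treating isolated returns as noise is the property that makes a density-based method attractive here: a single spurious detection does not flip a vacant spot to occupied.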

Keywords: autonomous vehicle simulator, FMCW radar sensors, occupancy detection, smart parking management, transferability of models

Procedia PDF Downloads 66

24234 Quantified Metabolomics for the Determination of Phenotypes and Biomarkers across Species in Health and Disease

Authors: Miroslava Cuperlovic-Culf, Lipu Wang, Ketty Boyle, Nadine Makley, Ian Burton, Anissa Belkaid, Mohamed Touaibia, Marc E. Surrette

Abstract:

Metabolic changes are one of the major factors in the development of a variety of diseases in various species. The metabolism of agricultural plants is altered following infection with pathogens, sometimes contributing to resistance. At the same time, pathogens use metabolites for infection and progression. In humans, for example, altered metabolism is a hallmark of cancer development. Quantified metabolomics data, combined with other omics or clinical data and analyzed using various unsupervised and supervised methods, can lead to better diagnosis and prognosis. They can also provide information about resistance and contribute knowledge of compounds significant for disease progression or prevention. In this work, different methods for metabolomics quantification and analysis from Nuclear Magnetic Resonance (NMR) measurements, used for the investigation of disease development in wheat and human cells, will be presented. One-dimensional 1H NMR spectra are used extensively for metabolic profiling due to their high reliability, wide range of applicability, speed, trivial sample preparation and low cost. This presentation will describe a new method for metabolite quantification from NMR data that combines alignment of spectra of standards to sample spectra with subsequent multivariate linear regression optimization of the spectra of assigned metabolites to the samples’ spectra. Several different alignment methods were tested, and the multivariate linear regression results were compared with other quantification methods. Quantified metabolomics data can be analyzed in a variety of ways, and we will present different clustering methods used for phenotype determination, network analysis providing knowledge about the relationships between metabolites through the metabolic network, and biomarker selection providing novel markers.
These analysis methods have been utilized for the investigation of fusarium head blight resistance in wheat cultivars as well as for analysis of the effect of estrogen receptor and carbonic anhydrase activation and inhibition on breast cancer cell metabolism. Metabolic changes in spikelets of the wheat cultivars FL62R1, Stettler, MuchMore and Sumai3 following Fusarium graminearum infection were explored. Extensive 1D 1H and 2D NMR measurements provided information for detailed metabolite assignment and quantification, leading to possible metabolic markers discriminating resistance level in wheat subtypes. Quantification data are compared to results obtained using other published methods. Metabolic changes induced by Fusarium infection in different wheat varieties are discussed in the context of the metabolic network and resistance. Quantitative metabolomics has been used for the investigation of the effect of targeted enzyme inhibition in cancer. In this work, the effect of 17β-estradiol and ferulic acid on the metabolism of ER+ breast cancer cells has been compared to their effect on ER- control cells. The effect of the inhibitors of carbonic anhydrase on the observed metabolic changes resulting from ER activation has also been determined. Metabolic profiles were studied using 1D and 2D metabolomic NMR experiments, combined with the identification and quantification of metabolites, and the annotation of the results is provided in the context of biochemical pathways.
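The two-stage quantification idea described above (align each standard spectrum to the sample, then fit the sample as a linear combination of the aligned standards) can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the authors' method: alignment is a single integer shift chosen by maximum dot product, the fit is ordinary least squares via the normal equations, and the spectra are short made-up intensity lists.

```python
def shift(spectrum, s):
    """Shift a spectrum by s points to the right (negative s shifts left), zero-padded."""
    n = len(spectrum)
    return [spectrum[i - s] if 0 <= i - s < n else 0.0 for i in range(n)]


def best_shift(standard, sample, max_shift):
    """Integer shift of `standard` that maximizes its dot product with `sample`."""
    return max(
        range(-max_shift, max_shift + 1),
        key=lambda s: sum(a * b for a, b in zip(shift(standard, s), sample)),
    )


def solve(a, b):
    """Solve a @ c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    out = [0.0] * n
    for i in reversed(range(n)):
        out[i] = (m[i][n] - sum(m[i][j] * out[j] for j in range(i + 1, n))) / m[i][i]
    return out


def quantify(standards, sample, max_shift=2):
    """Align each standard to the sample, then least-squares fit their amounts."""
    aligned = [shift(s, best_shift(s, sample, max_shift)) for s in standards]
    gram = [[sum(u * v for u, v in zip(a, b)) for b in aligned] for a in aligned]
    rhs = [sum(u * v for u, v in zip(a, sample)) for a in aligned]
    return solve(gram, rhs)


# Made-up standards with one peak each; the sample contains them shifted and scaled.
std_a = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0]
std_b = [0, 0, 0, 0, 0, 0, 0, 1, 3, 1, 0, 0]
sample = [2.0 * a + 0.5 * b for a, b in zip(shift(std_a, 1), shift(std_b, -1))]
amounts = quantify([std_a, std_b], sample)  # relative amounts of std_a and std_b
```

Real spectra have overlapping peaks and non-uniform shifts, so a single global shift per standard is only the simplest possible stand-in for the alignment methods the abstract says were tested.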

Keywords: metabolic biomarkers, metabolic network, metabolomics, multivariate linear regression, NMR quantification, quantified metabolomics, spectral alignment

Procedia PDF Downloads 329

24233 Modeling Activity Pattern Using XGBoost for Mining Smart Card Data

Authors: Eui-Jin Kim, Hasik Lee, Su-Jin Park, Dong-Kyu Kim

Abstract:

Smart-card data are expected to provide information on activity patterns as an alternative to conventional person trip surveys. The focus of this study is to propose a method that trains on person trip survey data to supplement smart-card data, which do not contain the purpose of each trip. To train on the survey data, we selected only the features available from smart-card data, such as spatiotemporal information on the trip, together with geographic information system (GIS) data near the stations. XGBoost, a state-of-the-art tree-based ensemble classifier, was used to train on data from multiple sources. This classifier uses a more regularized model formalization to control over-fitting and shows very fast execution with good performance. The validation results showed that the proposed method efficiently estimated the trip purpose. GIS data of the station and duration of stay at the destination were significant features in modeling trip purpose.

Keywords: activity pattern, data fusion, smart-card, XGBoost

Procedia PDF Downloads 230

24232 A Mutually Exclusive Task Generation Method Based on Data Augmentation

Authors: Haojie Wang, Xun Li, Rui Yin

Abstract:

To address memorization overfitting in the model-agnostic meta-learning (MAML) algorithm, a method of generating mutually exclusive tasks based on data augmentation is proposed. This method generates a mutex task by mapping one feature of the data to multiple labels, so that the generated mutex task is inconsistent with the data distribution in the initial dataset. Because generating mutex tasks for all data would produce a large amount of invalid data and, in the worst case, lead to exponential growth of computation, this paper also proposes a key data extraction method that extracts only part of the data to generate the mutex task. The experiments show that the method of generating mutually exclusive tasks can effectively address memorization overfitting in the MAML algorithm.

Keywords: mutex task generation, data augmentation, meta-learning, text classification

Procedia PDF Downloads 123

24231 Revolutionizing Traditional Farming Using Big Data/Cloud Computing: A Review on Vertical Farming

Authors: Milind Chaudhari, Suhail Balasinor

Abstract:

Due to massive deforestation and an ever-increasing population, the organic content of the soil is depleting at a much faster rate. As a result, there is a significant chance that total world food production will drop by 40% in the next two decades. Vertical farming can aid food production by leveraging big data and cloud computing, analyzing millions of data points to ensure that plants are grown naturally with optimum nutrients and sunlight. This paper outlines the most important parameters in vertical farming and how a combination of big data and AI helps in calculating and analyzing these millions of data points. Finally, the paper outlines how different organizations are controlling the indoor environment by leveraging big data to enhance food quantity and quality.

Keywords: big data, IoT, vertical farming, indoor farming

Procedia PDF Downloads 162