Search results for: image and telemetric data
25592 A Construct to Perform in Situ Deformation Measurement of Material Extrusion-Fabricated Structures
Authors: Daniel Nelson, Valeria La Saponara
Abstract:
Material extrusion is an additive manufacturing modality that continues to show great promise in the ability to create low-cost, highly intricate, and exceedingly useful structural elements. As more capable and versatile filament materials are devised, and the resolution of manufacturing systems continues to increase, the need to understand and predict manufacturing-induced warping will gain ever greater importance. The following study presents an in situ remote sensing and data analysis construct that allows for the in situ mapping and quantification of surface displacements induced by residual stresses on a specified test structure. This proof-of-concept experimental process shows that it is possible to provide designers and manufacturers with insight into the manufacturing parameters that lead to the manifestation of these deformations and a greater understanding of the behavior of these warping events over the course of the manufacturing process.Keywords: additive manufacturing, deformation, digital image correlation, fused filament fabrication, residual stress, warping
Procedia PDF Downloads 9225591 Data Access, AI Intensity, and Scale Advantages
Authors: Chuping Lo
Abstract:
This paper presents a simple model demonstrating that ceteris paribus countries with lower barriers to accessing global data tend to earn higher incomes than other countries. Therefore, large countries that inherently have greater data resources tend to have higher incomes than smaller countries, such that the former may be more hesitant than the latter to liberalize cross-border data flows to maintain this advantage. Furthermore, countries with higher artificial intelligence (AI) intensity in production technologies tend to benefit more from economies of scale in data aggregation, leading to higher income and more trade as they are better able to utilize global data.Keywords: digital intensity, digital divide, international trade, scale of economics
Procedia PDF Downloads 7025590 Identity Verification Using k-NN Classifiers and Autistic Genetic Data
Authors: Fuad M. Alkoot
Abstract:
DNA data have been used in forensics for decades. However, current research looks at using the DNA as a biometric identity verification modality. The goal is to improve the speed of identification. We aim at using gene data that was initially used for autism detection to find if and how accurate is this data for identification applications. Mainly our goal is to find if our data preprocessing technique yields data useful as a biometric identification tool. We experiment with using the nearest neighbor classifier to identify subjects. Results show that optimal classification rate is achieved when the test set is corrupted by normally distributed noise with zero mean and standard deviation of 1. The classification rate is close to optimal at higher noise standard deviation reaching 3. This shows that the data can be used for identity verification with high accuracy using a simple classifier such as the k-nearest neighbor (k-NN).Keywords: biometrics, genetic data, identity verification, k nearest neighbor
Procedia PDF Downloads 25925589 Population Dynamics and Land Use/Land Cover Change on the Chilalo-Galama Mountain Range, Ethiopia
Authors: Yusuf Jundi Sado
Abstract:
Changes in land use are mostly credited to human actions that result in negative impacts on biodiversity and ecosystem functions. This study aims to analyze the dynamics of land use and land cover changes for sustainable natural resources planning and management. Chilalo-Galama Mountain Range, Ethiopia. This study used Thematic Mapper 05 (TM) for 1986, 2001 and Landsat 8 (OLI) data 2017. Additionally, data from the Central Statistics Agency on human population growth were analyzed. Semi-Automatic classification plugin (SCP) in QGIS 3.2.3 software was used for image classification. Global positioning system, field observations and focus group discussions were used for ground verification. Land Use Land Cover (LU/LC) change analysis was using maximum likelihood supervised classification and changes were calculated for the 1986–2001 and the 2001–2017 and 1986-2017 periods. The results show that agricultural land increased from 27.85% (1986) to 44.43% and 51.32% in 2001 and 2017, respectively with the overall accuracies of 92% (1986), 90.36% (2001), and 88% (2017). On the other hand, forests decreased from 8.51% (1986) to 7.64 (2001) and 4.46% (2017), and grassland decreased from 37.47% (1986) to 15.22%, and 15.01% in 2001 and 2017, respectively. It indicates for the years 1986–2017 the largest area cover gain of agricultural land was obtained from grassland. The matrix also shows that shrubland gained land from agricultural land, afro-alpine, and forest land. Population dynamics is found to be one of the major driving forces for the LU/LU changes in the study area.Keywords: Landsat, LU/LC change, Semi-Automatic classification plugin, population dynamics, Ethiopia
Procedia PDF Downloads 8925588 A Review on Intelligent Systems for Geoscience
Authors: R Palson Kennedy, P.Kiran Sai
Abstract:
This article introduces machine learning (ML) researchers to the hurdles that geoscience problems present, as well as the opportunities for improvement in both ML and geosciences. This article presents a review from the data life cycle perspective to meet that need. Numerous facets of geosciences present unique difficulties for the study of intelligent systems. Geosciences data is notoriously difficult to analyze since it is frequently unpredictable, intermittent, sparse, multi-resolution, and multi-scale. The first half addresses data science’s essential concepts and theoretical underpinnings, while the second section contains key themes and sharing experiences from current publications focused on each stage of the data life cycle. Finally, themes such as open science, smart data, and team science are considered.Keywords: Data science, intelligent system, machine learning, big data, data life cycle, recent development, geo science
Procedia PDF Downloads 13825587 Integrative Omics-Portrayal Disentangles Molecular Heterogeneity and Progression Mechanisms of Cancer
Authors: Binder Hans
Abstract:
Cancer is no longer seen as solely a genetic disease where genetic defects such as mutations and copy number variations affect gene regulation and eventually lead to aberrant cell functioning which can be monitored by transcriptome analysis. It has become obvious that epigenetic alterations represent a further important layer of (de-)regulation of gene activity. For example, aberrant DNA methylation is a hallmark of many cancer types, and methylation patterns were successfully used to subtype cancer heterogeneity. Hence, unraveling the interplay between different omics levels such as genome, transcriptome and epigenome is inevitable for a mechanistic understanding of molecular deregulation causing complex diseases such as cancer. This objective requires powerful downstream integrative bioinformatics methods as an essential prerequisite to discover the whole genome mutational, transcriptome and epigenome landscapes of cancer specimen and to discover cancer genesis, progression and heterogeneity. Basic challenges and tasks arise ‘beyond sequencing’ because of the big size of the data, their complexity, the need to search for hidden structures in the data, for knowledge mining to discover biological function and also systems biology conceptual models to deduce developmental interrelations between different cancer states. These tasks are tightly related to cancer biology as an (epi-)genetic disease giving rise to aberrant genomic regulation under micro-environmental control and clonal evolution which leads to heterogeneous cellular states. Machine learning algorithms such as self organizing maps (SOM) represent one interesting option to tackle these bioinformatics tasks. The SOMmethod enables recognizing complex patterns in large-scale data generated by highthroughput omics technologies. It portrays molecular phenotypes by generating individualized, easy to interpret images of the data landscape in combination with comprehensive analysis options. Our image-based, reductionist machine learning methods provide one interesting perspective how to deal with massive data in the discovery of complex diseases, gliomas, melanomas and colon cancer on molecular level. As an important new challenge, we address the combined portrayal of different omics data such as genome-wide genomic, transcriptomic and methylomic ones. The integrative-omics portrayal approach is based on the joint training of the data and it provides separate personalized data portraits for each patient and data type which can be analyzed by visual inspection as one option. The new method enables an integrative genome-wide view on the omics data types and the underlying regulatory modes. It is applied to high and low-grade gliomas and to melanomas where it disentangles transversal and longitudinal molecular heterogeneity in terms of distinct molecular subtypes and progression paths with prognostic impact.Keywords: integrative bioinformatics, machine learning, molecular mechanisms of cancer, gliomas and melanomas
Procedia PDF Downloads 14925586 Shark Detection and Classification with Deep Learning
Authors: Jeremy Jenrette, Z. Y. C. Liu, Pranav Chimote, Edward Fox, Trevor Hastie, Francesco Ferretti
Abstract:
Suitable shark conservation depends on well-informed population assessments. Direct methods such as scientific surveys and fisheries monitoring are adequate for defining population statuses, but species-specific indices of abundance and distribution coming from these sources are rare for most shark species. We can rapidly fill these information gaps by boosting media-based remote monitoring efforts with machine learning and automation. We created a database of shark images by sourcing 24,546 images covering 219 species of sharks from the web application spark pulse and the social network Instagram. We used object detection to extract shark features and inflate this database to 53,345 images. We packaged object-detection and image classification models into a Shark Detector bundle. We developed the Shark Detector to recognize and classify sharks from videos and images using transfer learning and convolutional neural networks (CNNs). We applied these models to common data-generation approaches of sharks: boosting training datasets, processing baited remote camera footage and online videos, and data-mining Instagram. We examined the accuracy of each model and tested genus and species prediction correctness as a result of training data quantity. The Shark Detector located sharks in baited remote footage and YouTube videos with an average accuracy of 89\%, and classified located subjects to the species level with 69\% accuracy (n =\ eight species). The Shark Detector sorted heterogeneous datasets of images sourced from Instagram with 91\% accuracy and classified species with 70\% accuracy (n =\ 17 species). Data-mining Instagram can inflate training datasets and increase the Shark Detector’s accuracy as well as facilitate archiving of historical and novel shark observations. Base accuracy of genus prediction was 68\% across 25 genera. The average base accuracy of species prediction within each genus class was 85\%. The Shark Detector can classify 45 species. All data-generation methods were processed without manual interaction. As media-based remote monitoring strives to dominate methods for observing sharks in nature, we developed an open-source Shark Detector to facilitate common identification applications. Prediction accuracy of the software pipeline increases as more images are added to the training dataset. We provide public access to the software on our GitHub page.Keywords: classification, data mining, Instagram, remote monitoring, sharks
Procedia PDF Downloads 12325585 An Investigation of the University Council’s Image: A Case of Suan Sunandha Rajabhat University
Authors: Phitsanu Phunphetchphan
Abstract:
The purposes of this research was to investigate opinions of Rajabhat University staff towards performance of the university council committee by focusing on (1) personal characteristics of the committees; (2) duties designated by the university council; and (3) relationship between university council and staff. The population of this study included all high level of management from Suan Sunandha Rajabhat University which made a total of 200 respondents. Data analysis included frequency, percentage, mean and standard deviation. The findings revealed that the majority of staff rated the performance of university council at a high level. The 'overall appropriate qualification of the university council' was rated as the highest score while 'good governance' was rated as the lowest mean score. Moreover, the findings also revealed that the relationship between university council’s members and the staff was rated at a high level while 'the integrity of policy implementation' was rated as the lowest score.Keywords: investigation, performance, university council, management
Procedia PDF Downloads 25125584 Optical-Based Lane-Assist System for Rowing Boats
Authors: Stephen Tullis, M. David DiDonato, Hong Sung Park
Abstract:
Rowing boats (shells) are often steered by a small rudder operated by one of the backward-facing rowers; the attention required of that athlete then slightly decreases the power that that athlete can provide. Reducing the steering distraction would then increase the overall boat speed. Races are straight 2000 m courses with each boat in a 13.5 m wide lane marked by small (~15 cm) widely-spaced (~10 m) buoys, and the boat trajectory is affected by both cross-currents and winds. An optical buoy recognition and tracking system has been developed that provides the boat’s location and orientation with respect to the lane edges. This information is provided to the steering athlete as either: a simple overlay on a video display, or fed to a simplified autopilot system giving steering directions to the athlete or directly controlling the rudder. The system is then effectively a “lane-assist” device but with small, widely-spaced lane markers viewed from a very shallow angle due to constraints on camera height. The image is captured with a lightweight 1080p webcam, and most of the image analysis is done in OpenCV. The colour RGB-image is converted to a grayscale using the difference of the red and blue channels, which provides good contrast between the red/yellow buoys and the water, sky, land background and white reflections and noise. Buoy detection is done with thresholding within a tight mask applied to the image. Robust linear regression using Tukey’s biweight estimator of the previously detected buoy locations is used to develop the mask; this avoids the false detection of noise such as waves (reflections) and, in particular, buoys in other lanes. The robust regression also provides the current lane edges in the camera frame that are used to calculate the displacement of the boat from the lane centre (lane location), and its yaw angle. The interception of the detected lane edges provides a lane vanishing point, and yaw angle can be calculated simply based on the displacement of this vanishing point from the camera axis and the image plane distance. Lane location is simply based on the lateral displacement of the vanishing point from any horizontal cut through the lane edges. The boat lane position and yaw are currently fed what is essentially a stripped down marine auto-pilot system. Currently, only the lane location is used in a PID controller of a rudder actuator with integrator anti-windup to deal with saturation of the rudder angle. Low Kp and Kd values decrease unnecessarily fast return to lane centrelines and response to noise, and limiters can be used to avoid lane departure and disqualification. Yaw is not used as a control input, as cross-winds and currents can cause a straight course with considerable yaw or crab angle. Mapping of the controller with rudder angle “overall effectiveness” has not been finalized - very large rudder angles stall and have decreased turning moments, but at less extreme angles the increased rudder drag slows the boat and upsets boat balance. The full system has many features similar to automotive lane-assist systems, but with the added constraints of the lane markers, camera positioning, control response and noise increasing the challenge.Keywords: auto-pilot, lane-assist, marine, optical, rowing
Procedia PDF Downloads 13425583 Segmentation of Liver Using Random Forest Classifier
Authors: Gajendra Kumar Mourya, Dinesh Bhatia, Akash Handique, Sunita Warjri, Syed Achaab Amir
Abstract:
Nowadays, Medical imaging has become an integral part of modern healthcare. Abdominal CT images are an invaluable mean for abdominal organ investigation and have been widely studied in the recent years. Diagnosis of liver pathologies is one of the major areas of current interests in the field of medical image processing and is still an open problem. To deeply study and diagnose the liver, segmentation of liver is done to identify which part of the liver is mostly affected. Manual segmentation of the liver in CT images is time-consuming and suffers from inter- and intra-observer differences. However, automatic or semi-automatic computer aided segmentation of the Liver is a challenging task due to inter-patient Liver shape and size variability. In this paper, we present a technique for automatic segmenting the liver from CT images using Random Forest Classifier. Random forests or random decision forests are an ensemble learning method for classification that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes of the individual trees. After comparing with various other techniques, it was found that Random Forest Classifier provide a better segmentation results with respect to accuracy and speed. We have done the validation of our results using various techniques and it shows above 89% accuracy in all the cases.Keywords: CT images, image validation, random forest, segmentation
Procedia PDF Downloads 31825582 Voxel Models as Input for Heat Transfer Simulations with Siemens NX Based on X-Ray Microtomography Images of Random Fibre Reinforced Composites
Authors: Steven Latré, Frederik Desplentere, Ilya Straumit, Stepan V. Lomov
Abstract:
A method is proposed in order to create a three-dimensional finite element model representing fibre reinforced insulation materials for the simulation software Siemens NX. VoxTex software, a tool for quantification of µCT images of fibrous materials, is used for the transformation of microtomography images of random fibre reinforced composites into finite element models. An automatic tool was developed to execute the import of the models to the thermal solver module of Siemens NX. The paper describes the numerical tools used for the image quantification and the transformation and illustrates them on several thermal simulations of fibre reinforced insulation blankets filled with low thermal conductive fillers. The calculation of thermal conductivity is validated by comparison with the experimental data.Keywords: analysis, modelling, thermal, voxel
Procedia PDF Downloads 28825581 Introduction of Integrated Image Deep Learning Solution and How It Brought Laboratorial Level Heart Rate and Blood Oxygen Results to Everyone
Authors: Zhuang Hou, Xiaolei Cao
Abstract:
The general public and medical professionals recognized the importance of accurately measuring and storing blood oxygen levels and heart rate during the COVID-19 pandemic. The demand for accurate contactless devices was motivated by the need for cross-infection reduction and the shortage of conventional oximeters, partially due to the global supply chain issue. This paper evaluated a contactless mini program HealthyPai’s heart rate (HR) and oxygen saturation (SpO2) measurements compared with other wearable devices. In the HR study of 185 samples (81 in the laboratory environment, 104 in the real-life environment), the mean absolute error (MAE) ± standard deviation was 1.4827 ± 1.7452 in the lab, 6.9231 ± 5.6426 in the real-life setting. In the SpO2 study of 24 samples, the MAE ± standard deviation of the measurement was 1.0375 ± 0.7745. Our results validated that HealthyPai utilizing the Integrated Image Deep Learning Solution (IIDLS) framework, can accurately measure HR and SpO2, providing the test quality at least comparable to other FDA-approved wearable devices in the market and surpassing the consumer-grade and research-grade wearable standards.Keywords: remote photoplethysmography, heart rate, oxygen saturation, contactless measurement, mini program
Procedia PDF Downloads 13825580 Data Quality as a Pillar of Data-Driven Organizations: Exploring the Benefits of Data Mesh
Authors: Marc Bachelet, Abhijit Kumar Chatterjee, José Manuel Avila
Abstract:
Data quality is a key component of any data-driven organization. Without data quality, organizations cannot effectively make data-driven decisions, which often leads to poor business performance. Therefore, it is important for an organization to ensure that the data they use is of high quality. This is where the concept of data mesh comes in. Data mesh is an organizational and architectural decentralized approach to data management that can help organizations improve the quality of data. The concept of data mesh was first introduced in 2020. Its purpose is to decentralize data ownership, making it easier for domain experts to manage the data. This can help organizations improve data quality by reducing the reliance on centralized data teams and allowing domain experts to take charge of their data. This paper intends to discuss how a set of elements, including data mesh, are tools capable of increasing data quality. One of the key benefits of data mesh is improved metadata management. In a traditional data architecture, metadata management is typically centralized, which can lead to data silos and poor data quality. With data mesh, metadata is managed in a decentralized manner, ensuring accurate and up-to-date metadata, thereby improving data quality. Another benefit of data mesh is the clarification of roles and responsibilities. In a traditional data architecture, data teams are responsible for managing all aspects of data, which can lead to confusion and ambiguity in responsibilities. With data mesh, domain experts are responsible for managing their own data, which can help provide clarity in roles and responsibilities and improve data quality. Additionally, data mesh can also contribute to a new form of organization that is more agile and adaptable. By decentralizing data ownership, organizations can respond more quickly to changes in their business environment, which in turn can help improve overall performance by allowing better insights into business as an effect of better reports and visualization tools. Monitoring and analytics are also important aspects of data quality. With data mesh, monitoring, and analytics are decentralized, allowing domain experts to monitor and analyze their own data. This will help in identifying and addressing data quality problems in quick time, leading to improved data quality. Data culture is another major aspect of data quality. With data mesh, domain experts are encouraged to take ownership of their data, which can help create a data-driven culture within the organization. This can lead to improved data quality and better business outcomes. Finally, the paper explores the contribution of AI in the coming years. AI can help enhance data quality by automating many data-related tasks, like data cleaning and data validation. By integrating AI into data mesh, organizations can further enhance the quality of their data. The concepts mentioned above are illustrated by AEKIDEN experience feedback. AEKIDEN is an international data-driven consultancy that has successfully implemented a data mesh approach. By sharing their experience, AEKIDEN can help other organizations understand the benefits and challenges of implementing data mesh and improving data quality.Keywords: data culture, data-driven organization, data mesh, data quality for business success
Procedia PDF Downloads 13925579 Architectures and Implementations of Data Spaces: A Comparative Study of Gaia-X and Eclipse Data Space Components Frameworks
Authors: Ryan Kelvin Ford
Abstract:
For individuals and organizations, significant potential benefits were assured by sharing the data in a secure, trusted, and standardized environment. Technical trust and standards help each participant to use data space securely to share and access data. Sharing data in a safe environment helps acquire new business opportunities. Data sovereignty, interoperability, and trust were considered key factors to evaluate data spaces. Businesses and policymakers assure a fair data economy by integrating data space in organizations. A collaborative environment was needed to facilitate data sharing among organizations, satisfied with the implementation of different architectures using data spaces such as Eclipse Data Space Components (EDC), International Data Space Association (IDSA), Gaia-X, and Gaia-X Federation Services (GXFS). The last 15 years of application were reviewed and compared based on the architectures and implementations of different data spaces such as IDSA, EDC, Gaia-X and GXFS, EDC framework, IDSA, GXFS, data connector, data space architecture, characteristics of data space connectors, federated data spaces initiatives, data spaces overview, eclipse data space connector, designing data spaces, building data spaces based on technical overview, European future digital ecosystem based on Gaia-Vision and strategy of Gaia-Architecture. Empirical research based on an organized view was conducted. The current discussion elaborates on the systematic review of the impact of data space technology from various perspectives. The systematic review uses multiple databases such as IEEE Explore, Taylor & Francis, Science Direct, and Google Scholar to pursue publications on the impact of Data space from January 2019 to December 2024. The search results showcased a comparative review of 150 articles, out of which 20 were related to the IDSA, Gaia‑X, and EDC architecture and implementation.Keywords: IDSA, Gaia-X, Gaia-X architecture, EDC, EDC architecture, GXFS architecture, IDSA, data space connector
Procedia PDF Downloads 225578 Best-Performing Color Space for Land-Sea Segmentation Using Wavelet Transform Color-Texture Features and Fusion of over Segmentation
Authors: Seynabou Toure, Oumar Diop, Kidiyo Kpalma, Amadou S. Maiga
Abstract:
Color and texture are the two most determinant elements for perception and recognition of the objects in an image. For this reason, color and texture analysis find a large field of application, for example in image classification and segmentation. But, the pioneering work in texture analysis was conducted on grayscale images, thus discarding color information. Many grey-level texture descriptors have been proposed and successfully used in numerous domains for image classification: face recognition, industrial inspections, food science medical imaging among others. Taking into account color in the definition of these descriptors makes it possible to better characterize images. Color texture is thus the subject of recent work, and the analysis of color texture images is increasingly attracting interest in the scientific community. In optical remote sensing systems, sensors measure separately different parts of the electromagnetic spectrum; the visible ones and even those that are invisible to the human eye. The amounts of light reflected by the earth in spectral bands are then transformed into grayscale images. The primary natural colors Red (R) Green (G) and Blue (B) are then used in mixtures of different spectral bands in order to produce RGB images. Thus, good color texture discrimination can be achieved using RGB under controlled illumination conditions. Some previous works investigate the effect of using different color space for color texture classification. However, the selection of the best performing color space in land-sea segmentation is an open question. Its resolution may bring considerable improvements in certain applications like coastline detection, where the detection result is strongly dependent on the performance of the land-sea segmentation. The aim of this paper is to present the results of a study conducted on different color spaces in order to show the best-performing color space for land-sea segmentation. In this sense, an experimental analysis is carried out using five different color spaces (RGB, XYZ, Lab, HSV, YCbCr). For each color space, the Haar wavelet decomposition is used to extract different color texture features. These color texture features are then used for Fusion of Over Segmentation (FOOS) based classification; this allows segmentation of the land part from the sea one. By analyzing the different results of this study, the HSV color space is found as the best classification performance while using color and texture features; which is perfectly coherent with the results presented in the literature.Keywords: classification, coastline, color, sea-land segmentation
Procedia PDF Downloads 25225577 Big Data Analysis with RHadoop
Authors: Ji Eun Shin, Byung Ho Jung, Dong Hoon Lim
Abstract:
It is almost impossible to store or analyze big data increasing exponentially with traditional technologies. Hadoop is a new technology to make that possible. R programming language is by far the most popular statistical tool for big data analysis based on distributed processing with Hadoop technology. With RHadoop that integrates R and Hadoop environment, we implemented parallel multiple regression analysis with different sizes of actual data. Experimental results showed our RHadoop system was much faster as the number of data nodes increases. We also compared the performance of our RHadoop with lm function and big lm packages available on big memory. The results showed that our RHadoop was faster than other packages owing to paralleling processing with increasing the number of map tasks as the size of data increases.Keywords: big data, Hadoop, parallel regression analysis, R, RHadoop
Procedia PDF Downloads 43925576 Predictive Analysis of Chest X-rays Using NLP and Large Language Models with the Indiana University Dataset and Random Forest Classifier
Authors: Azita Ramezani, Ghazal Mashhadiagha, Bahareh Sanabakhsh
Abstract:
This study researches the combination of Random. Forest classifiers with large language models (LLMs) and natural language processing (NLP) to improve diagnostic accuracy in chest X-ray analysis using the Indiana University dataset. Utilizing advanced NLP techniques, the research preprocesses textual data from radiological reports to extract key features, which are then merged with image-derived data. This improved dataset is analyzed with Random Forest classifiers to predict specific clinical results, focusing on the identification of health issues and the estimation of case urgency. The findings reveal that the combination of NLP, LLMs, and machine learning not only increases diagnostic precision but also reliability, especially in quickly identifying critical conditions. Achieving an accuracy of 99.35%, the model shows significant advancements over conventional diagnostic techniques. The results emphasize the large potential of machine learning in medical imaging, suggesting that these technologies could greatly enhance clinician judgment and patient outcomes by offering quicker and more precise diagnostic approximations.Keywords: natural language processing (NLP), large language models (LLMs), random forest classifier, chest x-ray analysis, medical imaging, diagnostic accuracy, indiana university dataset, machine learning in healthcare, predictive modeling, clinical decision support systems
Procedia PDF Downloads 4925575 A Mutually Exclusive Task Generation Method Based on Data Augmentation
Authors: Haojie Wang, Xun Li, Rui Yin
Abstract:
In order to solve the memorization overfitting in the meta-learning MAML algorithm, a method of generating mutually exclusive tasks based on data augmentation is proposed. This method generates a mutex task by corresponding one feature of the data to multiple labels, so that the generated mutex task is inconsistent with the data distribution in the initial dataset. Because generating mutex tasks for all data will produce a large number of invalid data and, in the worst case, lead to exponential growth of computation, this paper also proposes a key data extraction method, that only extracts part of the data to generate the mutex task. The experiments show that the method of generating mutually exclusive tasks can effectively solve the memorization overfitting in the meta-learning MAML algorithm.Keywords: data augmentation, mutex task generation, meta-learning, text classification.
Procedia PDF Downloads 9725574 Efficient Positioning of Data Aggregation Point for Wireless Sensor Network
Authors: Sifat Rahman Ahona, Rifat Tasnim, Naima Hassan
Abstract:
Data aggregation is a helpful technique for reducing the data communication overhead in wireless sensor network. One of the important tasks of data aggregation is positioning of the aggregator points. There are a lot of works done on data aggregation. But, efficient positioning of the aggregators points is not focused so much. In this paper, authors are focusing on the positioning or the placement of the aggregation points in wireless sensor network. Authors proposed an algorithm to select the aggregators positions for a scenario where aggregator nodes are more powerful than sensor nodes.Keywords: aggregation point, data communication, data aggregation, wireless sensor network
Procedia PDF Downloads 16325573 Urban Heat Island Intensity Assessment through Comparative Study on Land Surface Temperature and Normalized Difference Vegetation Index: A Case Study of Chittagong, Bangladesh
Authors: Tausif A. Ishtiaque, Zarrin T. Tasin, Kazi S. Akter
Abstract:
Current trend of urban expansion, especially in the developing countries has caused significant changes in land cover, which is generating great concern due to its widespread environmental degradation. Energy consumption of the cities is also increasing with the aggravated heat island effect. Distribution of land surface temperature (LST) is one of the most significant climatic parameters affected by urban land cover change. Recent increasing trend of LST is causing elevated temperature profile of the built up area with less vegetative cover. Gradual change in land cover, especially decrease in vegetative cover is enhancing the Urban Heat Island (UHI) effect in the developing cities around the world. Increase in the amount of urban vegetation cover can be a useful solution for the reduction of UHI intensity. LST and Normalized Difference Vegetation Index (NDVI) have widely been accepted as reliable indicators of UHI and vegetation abundance respectively. Chittagong, the second largest city of Bangladesh, has been a growth center due to rapid urbanization over the last several decades. This study assesses the intensity of UHI in Chittagong city by analyzing the relationship between LST and NDVI based on the type of land use/land cover (LULC) in the study area applying an integrated approach of Geographic Information System (GIS), remote sensing (RS), and regression analysis. Land cover map is prepared through an interactive supervised classification using remotely sensed data from Landsat ETM+ image along with NDVI differencing using ArcGIS. LST and NDVI values are extracted from the same image. The regression analysis between LST and NDVI indicates that within the study area, UHI is directly correlated with LST while negatively correlated with NDVI. It interprets that surface temperature reduces with increase in vegetation cover along with reduction in UHI intensity. Moreover, there are noticeable differences in the relationship between LST and NDVI based on the type of LULC. In other words, depending on the type of land usage, increase in vegetation cover has a varying impact on the UHI intensity. This analysis will contribute to the formulation of sustainable urban land use planning decisions as well as suggesting suitable actions for mitigation of UHI intensity within the study area.Keywords: land cover change, land surface temperature, normalized difference vegetation index, urban heat island
Procedia PDF Downloads 27525572 Spatial Econometric Approaches for Count Data: An Overview and New Directions
Authors: Paula Simões, Isabel Natário
Abstract:
This paper reviews a number of theoretical aspects for implementing an explicit spatial perspective in econometrics for modelling non-continuous data, in general, and count data, in particular. It provides an overview of the several spatial econometric approaches that are available to model data that are collected with reference to location in space, from the classical spatial econometrics approaches to the recent developments on spatial econometrics to model count data, in a Bayesian hierarchical setting. Considerable attention is paid to the inferential framework, necessary for structural consistent spatial econometric count models, incorporating spatial lag autocorrelation, to the corresponding estimation and testing procedures for different assumptions, to the constrains and implications embedded in the various specifications in the literature. This review combines insights from the classical spatial econometrics literature as well as from hierarchical modeling and analysis of spatial data, in order to look for new possible directions on the processing of count data, in a spatial hierarchical Bayesian econometric context.Keywords: spatial data analysis, spatial econometrics, Bayesian hierarchical models, count data
Procedia PDF Downloads 59725571 Increasing the Speed of the Apriori Algorithm by Dimension Reduction
Authors: A. Abyar, R. Khavarzadeh
Abstract:
The most basic and important decision-making tool for industrial and service managers is understanding the market and customer behavior. In this regard, the Apriori algorithm, as one of the well-known machine learning methods, is used to identify customer preferences. On the other hand, with the increasing diversity of goods and services and the speed of changing customer behavior, we are faced with big data. Also, due to the large number of competitors and changing customer behavior, there is an urgent need for continuous analysis of this big data. While the speed of the Apriori algorithm decreases with increasing data volume. In this paper, the big data PCA method is used to reduce the dimension of the data in order to increase the speed of Apriori algorithm. Then, in the simulation section, the results are examined by generating data with different volumes and different diversity. The results show that when using this method, the speed of the a priori algorithm increases significantly.Keywords: association rules, Apriori algorithm, big data, big data PCA, market basket analysis
Procedia PDF Downloads 925570 A NoSQL Based Approach for Real-Time Managing of Robotics's Data
Authors: Gueidi Afef, Gharsellaoui Hamza, Ben Ahmed Samir
Abstract:
This paper deals with the secret of the continual progression data that new data management solutions have been emerged: The NoSQL databases. They crossed several areas like personalization, profile management, big data in real-time, content management, catalog, view of customers, mobile applications, internet of things, digital communication and fraud detection. Nowadays, these database management systems are increasing. These systems store data very well and with the trend of big data, a new challenge’s store demands new structures and methods for managing enterprise data. The new intelligent machine in the e-learning sector, thrives on more data, so smart machines can learn more and faster. The robotics are our use case to focus on our test. The implementation of NoSQL for Robotics wrestle all the data they acquire into usable form because with the ordinary type of robotics; we are facing very big limits to manage and find the exact information in real-time. Our original proposed approach was demonstrated by experimental studies and running example used as a use case.Keywords: NoSQL databases, database management systems, robotics, big data
Procedia PDF Downloads 35725569 Fuzzy Optimization Multi-Objective Clustering Ensemble Model for Multi-Source Data Analysis
Authors: C. B. Le, V. N. Pham
Abstract:
In modern data analysis, multi-source data appears more and more in real applications. Multi-source data clustering has emerged as a important issue in the data mining and machine learning community. Different data sources provide information about different data. Therefore, multi-source data linking is essential to improve clustering performance. However, in practice multi-source data is often heterogeneous, uncertain, and large. This issue is considered a major challenge from multi-source data. Ensemble is a versatile machine learning model in which learning techniques can work in parallel, with big data. Clustering ensemble has been shown to outperform any standard clustering algorithm in terms of accuracy and robustness. However, most of the traditional clustering ensemble approaches are based on single-objective function and single-source data. This paper proposes a new clustering ensemble method for multi-source data analysis. The fuzzy optimized multi-objective clustering ensemble method is called FOMOCE. Firstly, a clustering ensemble mathematical model based on the structure of multi-objective clustering function, multi-source data, and dark knowledge is introduced. Then, rules for extracting dark knowledge from the input data, clustering algorithms, and base clusterings are designed and applied. Finally, a clustering ensemble algorithm is proposed for multi-source data analysis. The experiments were performed on the standard sample data set. The experimental results demonstrate the superior performance of the FOMOCE method compared to the existing clustering ensemble methods and multi-source clustering methods.Keywords: clustering ensemble, multi-source, multi-objective, fuzzy clustering
Procedia PDF Downloads 19325568 Evaluating the Effect of Climate Change and Land Use/Cover Change on Catchment Hydrology of Gumara Watershed, Upper Blue Nile Basin, Ethiopia
Authors: Gashaw Gismu Chakilu
Abstract:
Climate and land cover change are very important issues in terms of global context and their responses to environmental and socio-economic drivers. The dynamic of these two factors is currently affecting the environment in unbalanced way including watershed hydrology. In this paper individual and combined impacts of climate change and land use land cover change on hydrological processes were evaluated through applying the model Soil and Water Assessment Tool (SWAT) in Gumara watershed, Upper Blue Nile basin Ethiopia. The regional climate; temperature and rainfall data of the past 40 years in the study area were prepared and changes were detected by using trend analysis applying Mann-Kendall trend test. The land use land cover data were obtained from land sat image and processed by ERDAS IMAGIN 2010 software. Three land use land cover data; 1973, 1986, and 2013 were prepared and these data were used for base line, model calibration and change study respectively. The effects of these changes on high flow and low flow of the catchment have also been evaluated separately. The high flow of the catchment for these two decades was analyzed by using Annual Maximum (AM) model and the low flow was evaluated by seven day sustained low flow model. Both temperature and rainfall showed increasing trend; and then the extent of changes were evaluated in terms of monthly bases by using two decadal time periods; 1973-1982 was taken as baseline and 2004-2013 was used as change study. The efficiency of the model was determined by Nash-Sutcliffe (NS) and Relative Volume error (RVe) and their values were 0.65 and 0.032 for calibration and 0.62 and 0.0051 for validation respectively. The impact of climate change was higher than that of land use land cover change on stream flow of the catchment; the flow has been increasing by 16.86% and 7.25% due to climate and LULC change respectively, and the combined change effect accounted 22.13% flow increment. The overall results of the study indicated that Climate change is more responsible for high flow than low flow; and reversely the land use land cover change showed more significant effect on low flow than high flow of the catchment. From the result we conclude that the hydrology of the catchment has been altered because of changes of climate and land cover of the study area.Keywords: climate, LULC, SWAT, Ethiopia
Procedia PDF Downloads 38025567 Modeling Activity Pattern Using XGBoost for Mining Smart Card Data
Authors: Eui-Jin Kim, Hasik Lee, Su-Jin Park, Dong-Kyu Kim
Abstract:
Smart-card data are expected to provide information on activity pattern as an alternative to conventional person trip surveys. The focus of this study is to propose a method for training the person trip surveys to supplement the smart-card data that does not contain the purpose of each trip. We selected only available features from smart card data such as spatiotemporal information on the trip and geographic information system (GIS) data near the stations to train the survey data. XGboost, which is state-of-the-art tree-based ensemble classifier, was used to train data from multiple sources. This classifier uses a more regularized model formalization to control the over-fitting and show very fast execution time with well-performance. The validation results showed that proposed method efficiently estimated the trip purpose. GIS data of station and duration of stay at the destination were significant features in modeling trip purpose.Keywords: activity pattern, data fusion, smart-card, XGboost
Procedia PDF Downloads 25125566 Iris Feature Extraction and Recognition Based on Two-Dimensional Gabor Wavelength Transform
Authors: Bamidele Samson Alobalorun, Ifedotun Roseline Idowu
Abstract:
Biometrics technologies apply the human body parts for their unique and reliable identification based on physiological traits. The iris recognition system is a biometric–based method for identification. The human iris has some discriminating characteristics which provide efficiency to the method. In order to achieve this efficiency, there is a need for feature extraction of the distinct features from the human iris in order to generate accurate authentication of persons. In this study, an approach for an iris recognition system using 2D Gabor for feature extraction is applied to iris templates. The 2D Gabor filter formulated the patterns that were used for training and equally sent to the hamming distance matching technique for recognition. A comparison of results is presented using two iris image subjects of different matching indices of 1,2,3,4,5 filter based on the CASIA iris image database. By comparing the two subject results, the actual computational time of the developed models, which is measured in terms of training and average testing time in processing the hamming distance classifier, is found with best recognition accuracy of 96.11% after capturing the iris localization or segmentation using the Daughman’s Integro-differential, the normalization is confined to the Daugman’s rubber sheet model.Keywords: Daugman rubber sheet, feature extraction, Hamming distance, iris recognition system, 2D Gabor wavelet transform
Procedia PDF Downloads 6925565 A Mutually Exclusive Task Generation Method Based on Data Augmentation
Authors: Haojie Wang, Xun Li, Rui Yin
Abstract:
In order to solve the memorization overfitting in the model-agnostic meta-learning MAML algorithm, a method of generating mutually exclusive tasks based on data augmentation is proposed. This method generates a mutex task by corresponding one feature of the data to multiple labels so that the generated mutex task is inconsistent with the data distribution in the initial dataset. Because generating mutex tasks for all data will produce a large number of invalid data and, in the worst case, lead to an exponential growth of computation, this paper also proposes a key data extraction method that only extract part of the data to generate the mutex task. The experiments show that the method of generating mutually exclusive tasks can effectively solve the memorization overfitting in the meta-learning MAML algorithm.Keywords: mutex task generation, data augmentation, meta-learning, text classification.
Procedia PDF Downloads 14825564 Main Chaos-Based Image Encryption Algorithm
Authors: Ibtissem Talbi
Abstract:
During the last decade, a variety of chaos-based cryptosystems have been investigated. Most of them are based on the structure of Fridrich, which is based on the traditional confusion-diffusion architecture proposed by Shannon. Compared with traditional cryptosystems (DES, 3DES, AES, etc.), the chaos-based cryptosystems are more flexible, more modular and easier to be implemented, which make them suitable for large scale-data encyption, such as images and videos. The heart of any chaos-based cryptosystem is the chaotic generator and so, a part of the efficiency (robustness, speed) of the system depends greatly on it. In this talk, we give an overview of the state of the art of chaos-based block ciphers and we describe some of our schemes already proposed. Also we will focus on the essential characteristics of the digital chaotic generator, The needed performance of a chaos-based block cipher in terms of security level and speed of calculus depends on the considered application. There is a compromise between the security and the speed of the calculation. The security of these block block ciphers will be analyzed.Keywords: chaos-based cryptosystems, chaotic generator, security analysis, structure of Fridrich
Procedia PDF Downloads 68825563 Revolutionizing Traditional Farming Using Big Data/Cloud Computing: A Review on Vertical Farming
Authors: Milind Chaudhari, Suhail Balasinor
Abstract:
Due to massive deforestation and an ever-increasing population, the organic content of the soil is depleting at a much faster rate. Due to this, there is a big chance that the entire food production in the world will drop by 40% in the next two decades. Vertical farming can help in aiding food production by leveraging big data and cloud computing to ensure plants are grown naturally by providing the optimum nutrients sunlight by analyzing millions of data points. This paper outlines the most important parameters in vertical farming and how a combination of big data and AI helps in calculating and analyzing these millions of data points. Finally, the paper outlines how different organizations are controlling the indoor environment by leveraging big data in enhancing food quantity and quality.Keywords: big data, IoT, vertical farming, indoor farming
Procedia PDF Downloads 178