Search results for: Big data
24056 A Comparative Analysis of Classification Models with Wrapper-Based Feature Selection for Predicting Student Academic Performance
Authors: Abdullah Al Farwan, Ya Zhang
Abstract:
In today’s educational arena, it is critical to understand educational data and be able to evaluate important aspects, particularly data on student achievement. Educational Data Mining (EDM) is a research area that focusing on uncovering patterns and information in data from educational institutions. Teachers, if they are able to predict their students' class performance, can use this information to improve their teaching abilities. It has evolved into valuable knowledge that can be used for a wide range of objectives; for example, a strategic plan can be used to generate high-quality education. Based on previous data, this paper recommends employing data mining techniques to forecast students' final grades. In this study, five data mining methods, Decision Tree, JRip, Naive Bayes, Multi-layer Perceptron, and Random Forest with wrapper feature selection, were used on two datasets relating to Portuguese language and mathematics classes lessons. The results showed the effectiveness of using data mining learning methodologies in predicting student academic success. The classification accuracy achieved with selected algorithms lies in the range of 80-94%. Among all the selected classification algorithms, the lowest accuracy is achieved by the Multi-layer Perceptron algorithm, which is close to 70.45%, and the highest accuracy is achieved by the Random Forest algorithm, which is close to 94.10%. This proposed work can assist educational administrators to identify poor performing students at an early stage and perhaps implement motivational interventions to improve their academic success and prevent educational dropout.Keywords: classification algorithms, decision tree, feature selection, multi-layer perceptron, Naïve Bayes, random forest, students’ academic performance
Procedia PDF Downloads 16524055 A Novel Framework for User-Friendly Ontology-Mediated Access to Relational Databases
Authors: Efthymios Chondrogiannis, Vassiliki Andronikou, Efstathios Karanastasis, Theodora Varvarigou
Abstract:
A large amount of data is typically stored in relational databases (DB). The latter can efficiently handle user queries which intend to elicit the appropriate information from data sources. However, direct access and use of this data requires the end users to have an adequate technical background, while they should also cope with the internal data structure and values presented. Consequently the information retrieval is a quite difficult process even for IT or DB experts, taking into account the limited contributions of relational databases from the conceptual point of view. Ontologies enable users to formally describe a domain of knowledge in terms of concepts and relations among them and hence they can be used for unambiguously specifying the information captured by the relational database. However, accessing information residing in a database using ontologies is feasible, provided that the users are keen on using semantic web technologies. For enabling users form different disciplines to retrieve the appropriate data, the design of a Graphical User Interface is necessary. In this work, we will present an interactive, ontology-based, semantically enable web tool that can be used for information retrieval purposes. The tool is totally based on the ontological representation of underlying database schema while it provides a user friendly environment through which the users can graphically form and execute their queries.Keywords: ontologies, relational databases, SPARQL, web interface
Procedia PDF Downloads 27124054 Anomaly Detection in Financial Markets Using Tucker Decomposition
Authors: Salma Krafessi
Abstract:
The financial markets have a multifaceted, intricate environment, and enormous volumes of data are produced every day. To find investment possibilities, possible fraudulent activity, and market oddities, accurate anomaly identification in this data is essential. Conventional methods for detecting anomalies frequently fail to capture the complex organization of financial data. In order to improve the identification of abnormalities in financial time series data, this study presents Tucker Decomposition as a reliable multi-way analysis approach. We start by gathering closing prices for the S&P 500 index across a number of decades. The information is converted to a three-dimensional tensor format, which contains internal characteristics and temporal sequences in a sliding window structure. The tensor is then broken down using Tucker Decomposition into a core tensor and matching factor matrices, allowing latent patterns and relationships in the data to be captured. A possible sign of abnormalities is the reconstruction error from Tucker's Decomposition. We are able to identify large deviations that indicate unusual behavior by setting a statistical threshold. A thorough examination that contrasts the Tucker-based method with traditional anomaly detection approaches validates our methodology. The outcomes demonstrate the superiority of Tucker's Decomposition in identifying intricate and subtle abnormalities that are otherwise missed. This work opens the door for more research into multi-way data analysis approaches across a range of disciplines and emphasizes the value of tensor-based methods in financial analysis.Keywords: tucker decomposition, financial markets, financial engineering, artificial intelligence, decomposition models
Procedia PDF Downloads 6824053 Analyzing the Relationship between the Spatial Characteristics of Cultural Structure, Activities, and the Tourism Demand
Authors: Deniz Karagöz
Abstract:
This study is attempt to comprehend the relationship between the spatial characteristics of cultural structure, activities and the tourism demand in Turkey. The analysis divided into four parts. The first part consisted of a cultural structure and cultural activity (CSCA) index provided by principal component analysis. The analysis determined four distinct dimensions, namely, cultural activity/structure, accessing culture, consumption, and cultural management. The exploratory spatial data analysis employed to determine the spatial models of cultural structure and cultural activities in 81 provinces in Turkey. Global Moran I indices is used to ascertain the cultural activities and the structural clusters. Finally, the relationship between the cultural activities/cultural structure and tourism demand was analyzed. The raw/original data of the study official databases. The data on the cultural structure and activities gathered from the Turkish Statistical Institute and the data related to the tourism demand was provided by the Republic of Turkey Ministry of Culture and Tourism.Keywords: cultural activities, cultural structure, spatial characteristics, tourism demand, Turkey
Procedia PDF Downloads 55824052 The Synergistic Effects of Blockchain and AI on Enhancing Data Integrity and Decision-Making Accuracy in Smart Contracts
Authors: Sayor Ajfar Aaron, Sajjat Hossain Abir, Ashif Newaz, Mushfiqur Rahman
Abstract:
Investigating the convergence of blockchain technology and artificial intelligence, this paper examines their synergistic effects on data integrity and decision-making within smart contracts. By implementing AI-driven analytics on blockchain-based platforms, the research identifies improvements in automated contract enforcement and decision accuracy. The paper presents a framework that leverages AI to enhance transparency and trust while blockchain ensures immutable record-keeping, culminating in significantly optimized operational efficiencies in various industries.Keywords: artificial intelligence, blockchain, data integrity, smart contracts
Procedia PDF Downloads 5324051 Time-Series Load Data Analysis for User Power Profiling
Authors: Mahdi Daghmhehci Firoozjaei, Minchang Kim, Dima Alhadidi
Abstract:
In this paper, we present a power profiling model for smart grid consumers based on real time load data acquired smart meters. It profiles consumers’ power consumption behaviour using the dynamic time warping (DTW) clustering algorithm. Due to the invariability of signal warping of this algorithm, time-disordered load data can be profiled and consumption features be extracted. Two load types are defined and the related load patterns are extracted for classifying consumption behaviour by DTW. The classification methodology is discussed in detail. To evaluate the performance of the method, we analyze the time-series load data measured by a smart meter in a real case. The results verify the effectiveness of the proposed profiling method with 90.91% true positive rate for load type clustering in the best case.Keywords: power profiling, user privacy, dynamic time warping, smart grid
Procedia PDF Downloads 14724050 Evaluation of Dual Polarization Rainfall Estimation Algorithm Applicability in Korea: A Case Study on Biseulsan Radar
Authors: Chulsang Yoo, Gildo Kim
Abstract:
Dual polarization radar provides comprehensive information about rainfall by measuring multiple parameters. In Korea, for the rainfall estimation, JPOLE and CSU-HIDRO algorithms are generally used. This study evaluated the local applicability of JPOLE and CSU-HIDRO algorithms in Korea by using the observed rainfall data collected on August, 2014 by the Biseulsan dual polarization radar data and KMA AWS. A total of 11,372 pairs of radar-ground rain rate data were classified according to thresholds of synthetic algorithms into suitable and unsuitable data. Then, evaluation criteria were derived by comparing radar rain rate and ground rain rate, respectively, for entire, suitable, unsuitable data. The results are as follows: (1) The radar rain rate equation including KDP, was found better in the rainfall estimation than the other equations for both JPOLE and CSU-HIDRO algorithms. The thresholds were found to be adequately applied for both algorithms including specific differential phase. (2) The radar rain rate equation including horizontal reflectivity and differential reflectivity were found poor compared to the others. The result was not improved even when only the suitable data were applied. Acknowledgments: This work was supported by the Basic Science Research Program through the National Research Foundation of Korea, funded by the Ministry of Education (NRF-2013R1A1A2011012).Keywords: CSU-HIDRO algorithm, dual polarization radar, JPOLE algorithm, radar rainfall estimation algorithm
Procedia PDF Downloads 21024049 Framework for Socio-Technical Issues in Requirements Engineering for Developing Resilient Machine Vision Systems Using Levels of Automation through the Lifecycle
Authors: Ryan Messina, Mehedi Hasan
Abstract:
This research is to examine the impacts of using data to generate performance requirements for automation in visual inspections using machine vision. These situations are intended for design and how projects can smooth the transfer of tacit knowledge to using an algorithm. We have proposed a framework when specifying machine vision systems. This framework utilizes varying levels of automation as contingency planning to reduce data processing complexity. Using data assists in extracting tacit knowledge from those who can perform the manual tasks to assist design the system; this means that real data from the system is always referenced and minimizes errors between participating parties. We propose using three indicators to know if the project has a high risk of failing to meet requirements related to accuracy and reliability. All systems tested achieved a better integration into operations after applying the framework.Keywords: automation, contingency planning, continuous engineering, control theory, machine vision, system requirements, system thinking
Procedia PDF Downloads 20324048 Wreathed Hornbill (Rhyticeros undulatus) on Mount Ungaran: Are their Habitat Threatened?
Authors: Margareta Rahayuningsih, Nugroho Edi K., Siti Alimah
Abstract:
Wreathed Hornbill (Rhyticeros undulatus) is the one of hornbill species (Family: Bucerotidae) that found on Mount Ungaran. In the preservation or planning in situ conservation of Wreathed Hornbill require the habitat condition data. The objective of the research was to determine the land cover change on Mount Ungaran using satellite image data and GIS. Based on the land cover data on 1999-2009 the research showed that the primer forest on Mount Ungaran was decreased almost 50%, while the seconder forest, tea and coffee plantation, and the settlement were increased.Keywords: GIS, Mount Ungaran, threatened habitat, Wreathed Hornbill (Rhyticeros undulatus)
Procedia PDF Downloads 35924047 Performance Comparison of ADTree and Naive Bayes Algorithms for Spam Filtering
Authors: Thanh Nguyen, Andrei Doncescu, Pierre Siegel
Abstract:
Classification is an important data mining technique and could be used as data filtering in artificial intelligence. The broad application of classification for all kind of data leads to be used in nearly every field of our modern life. Classification helps us to put together different items according to the feature items decided as interesting and useful. In this paper, we compare two classification methods Naïve Bayes and ADTree use to detect spam e-mail. This choice is motivated by the fact that Naive Bayes algorithm is based on probability calculus while ADTree algorithm is based on decision tree. The parameter settings of the above classifiers use the maximization of true positive rate and minimization of false positive rate. The experiment results present classification accuracy and cost analysis in view of optimal classifier choice for Spam Detection. It is point out the number of attributes to obtain a tradeoff between number of them and the classification accuracy.Keywords: classification, data mining, spam filtering, naive bayes, decision tree
Procedia PDF Downloads 40824046 Mapping of Electrical Energy Consumption Yogyakarta Province in 2014-2025
Authors: Alfi Al Fahreizy
Abstract:
Yogyakarta is one of the provinces in Indonesia that often get a power outage because of high load electrical consumption. The authors mapped the electrical energy consumption [GWh] for the province of Yogyakarta in 2014-2025 using LEAP (Long-range Energy Alternatives Planning system) software. This paper use BAU (Business As Usual) scenario. BAU scenario in which the projection is based on the assumption that growth in electricity consumption will run as normally as before. The goal is to be able to see the electrical energy consumption in the household sector, industry , business, social, government office building, and street lighting. The data is the data projected statistical population and consumption data electricity [GWh] 2010, 2011, 2012 in Yogyakarta province.Keywords: LEAP, energy consumption, Yogyakarta, BAU
Procedia PDF Downloads 59624045 Research and Application of Multi-Scale Three Dimensional Plant Modeling
Authors: Weiliang Wen, Xinyu Guo, Ying Zhang, Jianjun Du, Boxiang Xiao
Abstract:
Reconstructing and analyzing three-dimensional (3D) models from situ measured data is important for a number of researches and applications in plant science, including plant phenotyping, functional-structural plant modeling (FSPM), plant germplasm resources protection, agricultural technology popularization. It has many scales like cell, tissue, organ, plant and canopy from micro to macroscopic. The techniques currently used for data capture, feature analysis, and 3D reconstruction are quite different of different scales. In this context, morphological data acquisition, 3D analysis and modeling of plants on different scales are introduced systematically. The commonly used data capture equipment for these multiscale is introduced. Then hot issues and difficulties of different scales are described respectively. Some examples are also given, such as Micron-scale phenotyping quantification and 3D microstructure reconstruction of vascular bundles within maize stalks based on micro-CT scanning, 3D reconstruction of leaf surfaces and feature extraction from point cloud acquired by using 3D handheld scanner, plant modeling by combining parameter driven 3D organ templates. Several application examples by using the 3D models and analysis results of plants are also introduced. A 3D maize canopy was constructed, and light distribution was simulated within the canopy, which was used for the designation of ideal plant type. A grape tree model was constructed from 3D digital and point cloud data, which was used for the production of science content of 11th international conference on grapevine breeding and genetics. By using the tissue models of plants, a Google glass was used to look around visually inside the plant to understand the internal structure of plants. With the development of information technology, 3D data acquisition, and data processing techniques will play a greater role in plant science.Keywords: plant, three dimensional modeling, multi-scale, plant phenotyping, three dimensional data acquisition
Procedia PDF Downloads 27524044 Principal Component Analysis in Drug-Excipient Interactions
Authors: Farzad Khajavi
Abstract:
Studies about the interaction between active pharmaceutical ingredients (API) and excipients are so important in the pre-formulation stage of development of all dosage forms. Analytical techniques such as differential scanning calorimetry (DSC), Thermal gravimetry (TG), and Furrier transform infrared spectroscopy (FTIR) are commonly used tools for investigating regarding compatibility and incompatibility of APIs with excipients. Sometimes the interpretation of data obtained from these techniques is difficult because of severe overlapping of API spectrum with excipients in their mixtures. Principal component analysis (PCA) as a powerful factor analytical method is used in these situations to resolve data matrices acquired from these analytical techniques. Binary mixtures of API and interested excipients are considered and produced. Peaks of FTIR, DSC, or TG of pure API and excipient and their mixtures at different mole ratios will construct the rows of the data matrix. By applying PCA on the data matrix, the number of principal components (PCs) is determined so that it contains the total variance of the data matrix. By plotting PCs or factors obtained from the score of the matrix in two-dimensional spaces if the pure API and its mixture with the excipient at the high amount of API and the 1:1mixture form a separate cluster and the other cluster comprise of the pure excipient and its blend with the API at the high amount of excipient. This confirms the existence of compatibility between API and the interested excipient. Otherwise, the incompatibility will overcome a mixture of API and excipient.Keywords: API, compatibility, DSC, TG, interactions
Procedia PDF Downloads 13124043 Activity Data Analysis for Status Classification Using Fitness Trackers
Authors: Rock-Hyun Choi, Won-Seok Kang, Chang-Sik Son
Abstract:
Physical activity is important for healthy living. Recently wearable devices which motivate physical activity are quickly developing, and become cheaper and more comfortable. In particular, fitness trackers provide a variety of information and need to provide well-analyzed, and user-friendly results. In this study, frequency analysis was performed to classify various data sets of Fitbit into simple activity status. The data from Fitbit cloud server consists of 263 subjects who were healthy factory and office workers in Korea from March 7th to April 30th, 2016. In the results, we found assumptions of activity state classification seem to be sufficient and reasonable.Keywords: activity status, fitness tracker, heart rate, steps
Procedia PDF Downloads 38224042 A Crowdsourced Homeless Data Collection System and Its Econometric Analysis: Strengthening Inclusive Public Administration Policies
Authors: Praniil Nagaraj
Abstract:
This paper proposes a method to collect homeless data using crowdsourcing and presents an approach to analyze the data, demonstrating its potential to strengthen existing and future policies aimed at promoting socio-economic equilibrium. This paper's contributions can be categorized into three main areas. Firstly, a unique method for collecting homeless data is introduced, utilizing a user-friendly smartphone app (currently available for Android). The app enables the general public to quickly record information about homeless individuals, including the number of people and details about their living conditions. The collected data, including date, time, and location, is anonymized and securely transmitted to the cloud. It is anticipated that an increasing number of users motivated to contribute to society will adopt the app, thus expanding the data collection efforts. Duplicate data is addressed through simple classification methods, and historical data is utilized to fill in missing information. The second contribution of this paper is the description of data analysis techniques applied to the collected data. By combining this new data with existing information, statistical regression analysis is employed to gain insights into various aspects, such as distinguishing between unsheltered and sheltered homeless populations, as well as examining their correlation with factors like unemployment rates, housing affordability, and labor demand. Initial data is collected in San Francisco, while pre-existing information is drawn from three cities: San Francisco, New York City, and Washington D.C., facilitating the conduction of simulations. The third contribution focuses on demonstrating the practical implications of the data processing results. The challenges faced by key stakeholders, including charitable organizations and local city governments, are taken into consideration. Two case studies are presented as examples. The first case study explores improving the efficiency of food and necessities distribution, as well as medical assistance, driven by charitable organizations. The second case study examines the correlation between micro-geographic budget expenditure by local city governments and homeless information to justify budget allocation and expenditures. The ultimate objective of this endeavor is to enable the continuous enhancement of the quality of life for the underprivileged. It is hoped that through increased crowdsourcing of data from the public, the Generosity Curve and the Need Curve will intersect, leading to a better world for all.Keywords: crowdsourcing, homelessness, socio-economic policies, statistical analysis
Procedia PDF Downloads 4124041 Does Level of Countries Corruption Affect Firms Working Capital Management?
Authors: Ebrahim Mansoori, Datin Joriah Muhammad
Abstract:
Recent studies in finance have focused on the effect of external variables on working capital management. This study investigates the effect of corruption indexes on firms' working capital management. A large data set that covers data from 2005 to 2013 from five ASEAN countries, namely, Malaysia, Indonesia, Singapore, Thailand, and the Philippines, was selected to investigate how the level of corruption in these countries affect working capital management. The results of panel data analysis include fixed effect estimations showed that a high level of countries' corruption indexes encourages managers to shorten the CCC length. Meanwhile, the managers reduce the level of investment in cash and cash equivalents when the levels of corruption indexes increase. Therefore, increasing the level of countries' corruption indexes encourages managers to select conservative working capital strategies by reducing the level of NLB.Keywords: ASEAN, corruption indexes, panel data analysis, working capital management
Procedia PDF Downloads 43724040 BIM Data and Digital Twin Framework: Preserving the Past and Predicting the Future
Authors: Mazharuddin Syed Ahmed
Abstract:
This research presents a framework used to develop The Ara Polytechnic College of Architecture Studies building “Kahukura” which is Green Building certified. This framework integrates the development of a smart building digital twin by utilizing Building Information Modelling (BIM) and its BIM maturity levels, including Levels of Development (LOD), eight dimensions of BIM, Heritage-BIM (H-BIM) and Facility Management BIM (FM BIM). The research also outlines a structured approach to building performance analysis and integration with the circular economy, encapsulated within a five-level digital twin framework. Starting with Level 1, the Descriptive Twin provides a live, editable visual replica of the built asset, allowing for specific data inclusion and extraction. Advancing to Level 2, the Informative Twin integrates operational and sensory data, enhancing data verification and system integration. At Level 3, the Predictive Twin utilizes operational data to generate insights and proactive management suggestions. Progressing to Level 4, the Comprehensive Twin simulates future scenarios, enabling robust “what-if” analyses. Finally, Level 5, the Autonomous Twin, represents the pinnacle of digital twin evolution, capable of learning and autonomously acting on behalf of users.Keywords: building information modelling, circular economy integration, digital twin, predictive analytics
Procedia PDF Downloads 4124039 Monitor Vehicle Speed Using Internet of Things Based Wireless Sensor Network System
Authors: Akber Oumer Abdurezak
Abstract:
Road traffic accident is a major problem in Ethiopia, resulting in the deaths of many people and potential injuries and crash every year and loss of properties. According to the Federal Transport Authority, one of the main causes of traffic accident and crash in Ethiopia is over speeding. Implementation of different technologies is used to monitor the speed of vehicles in order to minimize accidents and crashes. This research aimed at designing a speed monitoring system to monitor the speed of travelling vehicles and movements, reporting illegal speeds or overspeeding vehicles to the concerned bodies. The implementation of the system is through a wireless sensor network. The proposed system can sense and detect the movement of vehicles, process, and analysis the data obtained from the sensor and the cloud system. The data is sent to the central controlling server. The system contains accelerometer and gyroscope sensors to sense and collect the data of the vehicle. Arduino to process the data and Global System for Mobile Communication (GSM) module for communication purposes to send the data to the concerned body. When the speed of the vehicle exceeds the allowable speed limit, the system sends a message to database as “over speeding”. Both accelerometer and gyroscope sensors are used to collect acceleration data. The acceleration data then convert to speed, and the corresponding speed is checked with the speed limit, and those above the speed limit are reported to the concerned authorities to avoid frequent accidents. The proposed system decreases the occurrence of accidents and crashes due to overspeeding and can be used as an eye opener for the implementation of other intelligent transport system technologies. This system can also integrate with other technologies like GPS and Google Maps to obtain better output.Keywords: accelerometer, IOT, GSM, gyroscope
Procedia PDF Downloads 7424038 Image Distortion Correction Method of 2-MHz Side Scan Sonar for Underwater Structure Inspection
Authors: Youngseok Kim, Chul Park, Jonghwa Yi, Sangsik Choi
Abstract:
The 2-MHz Side Scan SONAR (SSS) attached to the boat for inspection of underwater structures is affected by shaking. It is difficult to determine the exact scale of damage of structure. In this study, a motion sensor is attached to the inside of the 2-MHz SSS to get roll, pitch, and yaw direction data, and developed the image stabilization tool to correct the sonar image. We checked that reliable data can be obtained with an average error rate of 1.99% between the measured value and the actual distance through experiment. It is possible to get the accurate sonar data to inspect damage in underwater structure.Keywords: image stabilization, motion sensor, safety inspection, sonar image, underwater structure
Procedia PDF Downloads 27924037 Futuristic Black Box Design Considerations and Global Networking for Real Time Monitoring of Flight Performance Parameters
Authors: K. Parandhama Gowd
Abstract:
The aim of this research paper is to conceptualize, discuss, analyze and propose alternate design methodologies for futuristic Black Box for flight safety. The proposal also includes global networking concepts for real time surveillance and monitoring of flight performance parameters including GPS parameters. It is expected that this proposal will serve as a failsafe real time diagnostic tool for accident investigation and location of debris in real time. In this paper, an attempt is made to improve the existing methods of flight data recording techniques and improve upon design considerations for futuristic FDR to overcome the trauma of not able to locate the block box. Since modern day communications and information technologies with large bandwidth are available coupled with faster computer processing techniques, the attempt made in this paper to develop a failsafe recording technique is feasible. Further data fusion/data warehousing technologies are available for exploitation.Keywords: flight data recorder (FDR), black box, diagnostic tool, global networking, cockpit voice and data recorder (CVDR), air traffic control (ATC), air traffic, telemetry, tracking and control centers ATTTCC)
Procedia PDF Downloads 57124036 Applying Hybrid Graph Drawing and Clustering Methods on Stock Investment Analysis
Authors: Mouataz Zreika, Maria Estela Varua
Abstract:
Stock investment decisions are often made based on current events of the global economy and the analysis of historical data. Conversely, visual representation could assist investors’ gain deeper understanding and better insight on stock market trends more efficiently. The trend analysis is based on long-term data collection. The study adopts a hybrid method that combines the Clustering algorithm and Force-directed algorithm to overcome the scalability problem when visualizing large data. This method exemplifies the potential relationships between each stock, as well as determining the degree of strength and connectivity, which will provide investors another understanding of the stock relationship for reference. Information derived from visualization will also help them make an informed decision. The results of the experiments show that the proposed method is able to produced visualized data aesthetically by providing clearer views for connectivity and edge weights.Keywords: clustering, force-directed, graph drawing, stock investment analysis
Procedia PDF Downloads 30024035 Clinical and Laboratory Diagnosis of Malaria in Surat Thani, Southern Thailand
Authors: Manas Kotepui, Chatree Ratcha, Kwuntida Uthaisar
Abstract:
Malaria infection is still to be considered a major public health problem in Thailand. This study, a retrospective data of patients in Surat Thani Province, Southern Thailand during 2012-2015 was retrieved and analyzed. These data include demographic data, clinical characteristics and laboratory diagnosis. Statistical analyses were performed to demonstrate the frequency, proportion, data tendency, and group comparisons. Total of 395 malaria patients were found. Most of patients were male (253 cases, 64.1%). Most of patients (262 cases, 66.3%) were admitted at 6 am-11.59 am of the day. Three hundred and fifty-five patients (97.5%) were positive with P. falciparum. Hemoglobin, hematocrit, and MCHC between P. falciparum and P. vivax were significant different (P value<0.05).During 2012-2015, prevalence of malaria was highest in 2013. Neutrophils, lymphocytes, and monocytes were significantly changed among patients with fever ≤ 3 days compared with patients with fever >3 days. This information will guide to understanding pathogenesis and characteristic of malaria infection in Sothern Thailand.Keywords: prevalence, malaria, Surat Thani, Thailand
Procedia PDF Downloads 27624034 Adaptive Swarm Balancing Algorithms for Rare-Event Prediction in Imbalanced Healthcare Data
Authors: Jinyan Li, Simon Fong, Raymond Wong, Mohammed Sabah, Fiaidhi Jinan
Abstract:
Clinical data analysis and forecasting have make great contributions to disease control, prevention and detection. However, such data usually suffer from highly unbalanced samples in class distributions. In this paper, we target at the binary imbalanced dataset, where the positive samples take up only the minority. We investigate two different meta-heuristic algorithms, particle swarm optimization and bat-inspired algorithm, and combine both of them with the synthetic minority over-sampling technique (SMOTE) for processing the datasets. One approach is to process the full dataset as a whole. The other is to split up the dataset and adaptively process it one segment at a time. The experimental results reveal that while the performance improvements obtained by the former methods are not scalable to larger data scales, the later one, which we call Adaptive Swarm Balancing Algorithms, leads to significant efficiency and effectiveness improvements on large datasets. We also find it more consistent with the practice of the typical large imbalanced medical datasets. We further use the meta-heuristic algorithms to optimize two key parameters of SMOTE. Leading to more credible performances of the classifier, and shortening the running time compared with the brute-force method.Keywords: Imbalanced dataset, meta-heuristic algorithm, SMOTE, big data
Procedia PDF Downloads 43924033 Convergence and Stability in Federated Learning with Adaptive Differential Privacy Preservation
Authors: Rizwan Rizwan
Abstract:
This paper provides an overview of Federated Learning (FL) and its application in enhancing data security, privacy, and efficiency. FL utilizes three distinct architectures to ensure privacy is never compromised. It involves training individual edge devices and aggregating their models on a server without sharing raw data. This approach not only provides secure models without data sharing but also offers a highly efficient privacy--preserving solution with improved security and data access. Also we discusses various frameworks used in FL and its integration with machine learning, deep learning, and data mining. In order to address the challenges of multi--party collaborative modeling scenarios, a brief review FL scheme combined with an adaptive gradient descent strategy and differential privacy mechanism. The adaptive learning rate algorithm adjusts the gradient descent process to avoid issues such as model overfitting and fluctuations, thereby enhancing modeling efficiency and performance in multi-party computation scenarios. Additionally, to cater to ultra-large-scale distributed secure computing, the research introduces a differential privacy mechanism that defends against various background knowledge attacks.Keywords: federated learning, differential privacy, gradient descent strategy, convergence, stability, threats
Procedia PDF Downloads 2924032 Data Security in Cloud Storage
Authors: Amir Rashid
Abstract:
Today is the world of innovation and Cloud Computing is becoming a day to day technology with every passing day offering remarkable services and features on the go with rapid elasticity. This platform took business computing into an innovative dimension where clients interact and operate through service provider web portals. Initially, the trust relationship between client and service provider remained a big question but with the invention of several cryptographic paradigms, it is becoming common in everyday business. This research work proposes a solution for building a cloud storage service with respect to Data Security addressing public cloud infrastructure where the trust relationship matters a lot between client and service provider. For the great satisfaction of client regarding high-end Data Security, this research paper propose a layer of cryptographic primitives combining several architectures in order to achieve the goal. A survey has been conducted to determine the benefits for such an architecture would provide to both clients/service providers and recent developments in cryptography specifically by cloud storage.Keywords: data security in cloud computing, cloud storage architecture, cryptographic developments, token key
Procedia PDF Downloads 29324031 Fuzzy Total Factor Productivity by Credibility Theory
Authors: Shivi Agarwal, Trilok Mathur
Abstract:
This paper proposes the method to measure the total factor productivity (TFP) change by credibility theory for fuzzy input and output variables. Total factor productivity change has been widely studied with crisp input and output variables, however, in some cases, input and output data of decision-making units (DMUs) can be measured with uncertainty. These data can be represented as linguistic variable characterized by fuzzy numbers. Malmquist productivity index (MPI) is widely used to estimate the TFP change by calculating the total factor productivity of a DMU for different time periods using data envelopment analysis (DEA). The fuzzy DEA (FDEA) model is solved using the credibility theory. The results of FDEA is used to measure the TFP change for fuzzy input and output variables. Finally, numerical examples are presented to illustrate the proposed method to measure the TFP change input and output variables. The suggested methodology can be utilized for performance evaluation of DMUs and help to assess the level of integration. The methodology can also apply to rank the DMUs and can find out the DMUs that are lagging behind and make recommendations as to how they can improve their performance to bring them at par with other DMUs.Keywords: chance-constrained programming, credibility theory, data envelopment analysis, fuzzy data, Malmquist productivity index
Procedia PDF Downloads 36424030 What the Future Holds for Social Media Data Analysis
Authors: P. Wlodarczak, J. Soar, M. Ally
Abstract:
The dramatic rise in the use of Social Media (SM) platforms such as Facebook and Twitter provide access to an unprecedented amount of user data. Users may post reviews on products and services they bought, write about their interests, share ideas or give their opinions and views on political issues. There is a growing interest in the analysis of SM data from organisations for detecting new trends, obtaining user opinions on their products and services or finding out about their online reputations. A recent research trend in SM analysis is making predictions based on sentiment analysis of SM. Often indicators of historic SM data are represented as time series and correlated with a variety of real world phenomena like the outcome of elections, the development of financial indicators, box office revenue and disease outbreaks. This paper examines the current state of research in the area of SM mining and predictive analysis and gives an overview of the analysis methods using opinion mining and machine learning techniques.Keywords: social media, text mining, knowledge discovery, predictive analysis, machine learning
Procedia PDF Downloads 42224029 Development of Automatic Laser Scanning Measurement Instrument
Authors: Chien-Hung Liu, Yu-Fen Chen
Abstract:
This study used triangular laser probe and three-axial direction mobile platform for surface measurement, programmed it and applied it to real-time analytic statistics of different measured data. This structure was used to design a system integration program: using triangular laser probe for scattering or reflection non-contact measurement, transferring the captured signals to the computer through RS-232, and using RS-485 to control the three-axis platform for a wide range of measurement. The data captured by the laser probe are formed into a 3D surface. This study constructed an optical measurement application program in the concept of visual programming language. First, the signals are transmitted to the computer through RS-232/RS-485, and then the signals are stored and recorded in graphic interface timely. This programming concept analyzes various messages, and makes proper presentation graphs and data processing to provide the users with friendly graphic interfaces and data processing state monitoring, and identifies whether the present data are normal in graphic concept. The major functions of the measurement system developed by this study are thickness measurement, SPC, surface smoothness analysis, and analytical calculation of trend line. A result report can be made and printed promptly. This study measured different heights and surfaces successfully, performed on-line data analysis and processing effectively, and developed a man-machine interface for users to operate.Keywords: laser probe, non-contact measurement, triangulation measurement principle, statistical process control, labVIEW
Procedia PDF Downloads 35924028 An Optimized Association Rule Mining Algorithm
Authors: Archana Singh, Jyoti Agarwal, Ajay Rana
Abstract:
Data Mining is an efficient technology to discover patterns in large databases. Association Rule Mining techniques are used to find the correlation between the various item sets in a database, and this co-relation between various item sets are used in decision making and pattern analysis. In recent years, the problem of finding association rules from large datasets has been proposed by many researchers. Various research papers on association rule mining (ARM) are studied and analyzed first to understand the existing algorithms. Apriori algorithm is the basic ARM algorithm, but it requires so many database scans. In DIC algorithm, less amount of database scan is needed but complex data structure lattice is used. The main focus of this paper is to propose a new optimized algorithm (Friendly Algorithm) and compare its performance with the existing algorithms A data set is used to find out frequent itemsets and association rules with the help of existing and proposed (Friendly Algorithm) and it has been observed that the proposed algorithm also finds all the frequent itemsets and essential association rules from databases as compared to existing algorithms in less amount of database scan. In the proposed algorithm, an optimized data structure is used i.e. Graph and Adjacency Matrix.Keywords: association rules, data mining, dynamic item set counting, FP-growth, friendly algorithm, graph
Procedia PDF Downloads 41924027 Failure Statistics Analysis of China’s Spacecraft in Full-Life
Authors: Xin-Yan Ji
Abstract:
The historical failures data of the spacecraft is very useful to improve the spacecraft design and the test philosophies and reduce the spacecraft flight risk. A study of spacecraft failures data was performed, which is the most comprehensive statistics of spacecrafts in China. 2593 on-orbit failures data and 1298 ground data that occurred on 150 spacecraft launched from 2000 to 2016 were identified and collected, which covered the navigation satellites, communication satellites, remote sensing deep space exploration manned spaceflight platforms. In this paper, the failures were analyzed to compare different spacecraft subsystem and estimate their impact on the mission, then the development of spacecraft in China was evaluated from design, software, workmanship, management, parts, and materials. Finally, the lessons learned from the past years show that electrical and mechanical failures are responsible for the largest parts, and the key solution to reduce in-orbit failures is improving design technology, enough redundancy, adequate space environment protection measures, and adequate ground testing.Keywords: spacecraft anomalies, anomalies mechanism, failure cause, spacecraft testing
Procedia PDF Downloads 115