Search results for: startup data analytics
23999 Short Text Classification Using Part of Speech Feature to Analyze Students' Feedback of Assessment Components
Authors: Zainab Mutlaq Ibrahim, Mohamed Bader-El-Den, Mihaela Cocea
Abstract:
Students' textual feedback can hold unique patterns and useful information about the learning process; it can reveal the advantages and disadvantages of teaching methods, assessment components, facilities, and other aspects of teaching. The results of analysing such feedback can form a key input for institutions' decision makers to advance and update their systems accordingly. This paper proposes a data mining framework for analysing general end-of-unit textual feedback using a part-of-speech (PoS) feature with four machine learning algorithms: support vector machines, decision tree, random forest, and naive Bayes. The proposed framework has two tasks. The first is to use these algorithms to build an optimal model that automatically classifies the whole data set into two subsets: one tailored to assessment practices (assessment related) and the other containing the non-assessment-related data. The second is to use the same algorithms to build optimal models, for the whole data set and for the new data subsets, that automatically detect their sentiment. The significance of this paper lies in comparing the performance of the four algorithms using the part-of-speech feature with the performance of the same algorithms using n-gram features. The paper follows the Knowledge Discovery and Data Mining (KDDM) framework to construct the classification and sentiment analysis models: understanding the assessment domain, cleaning and pre-processing the data set, selecting and running the data mining algorithms, interpreting the mined patterns, and consolidating the discovered knowledge. The experimental results show that models using either feature performed very well on the first task. On the second task, however, models using the part-of-speech feature underperformed compared with models using unigrams and bigrams.
Keywords: assessment, part of speech, sentiment analysis, student feedback
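The abstract compares part-of-speech features against unigram and bigram features but does not describe its preprocessing. As a minimal sketch, assuming lowercase tokenization with punctuation stripped, unigram-plus-bigram count vectors for feedback comments could be built as follows; the function names are illustrative, and a PoS variant would instead replace each token with its tag (for example via NLTK's `pos_tag`) before counting.

```python
from collections import Counter

def tokenize(text):
    """Lowercase, replace punctuation with spaces, split on whitespace (assumed preprocessing)."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return cleaned.split()

def ngram_features(text, n_values=(1, 2)):
    """Count unigrams and bigrams; bigram tokens are joined with an underscore."""
    tokens = tokenize(text)
    features = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            features["_".join(tokens[i:i + n])] += 1
    return features

def build_vocabulary(documents):
    """Assign a column index to every n-gram seen in the training documents."""
    vocab = {}
    for doc in documents:
        for gram in ngram_features(doc):
            vocab.setdefault(gram, len(vocab))
    return vocab

def vectorize(text, vocab):
    """Map one comment to a fixed-length count vector; unseen n-grams are ignored."""
    vector = [0] * len(vocab)
    for gram, count in ngram_features(text).items():
        if gram in vocab:
            vector[vocab[gram]] += count
    return vector

feedback = [
    "The exam was too long.",
    "Great lectures, but the exam was unfair.",
]
vocab = build_vocabulary(feedback)
print(vectorize("the exam was fine", vocab))
```

The resulting vectors can be fed to any of the four classifiers named in the abstract.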
23998 Advancing Dialysis Care Access and Health Information Management: A Blueprint for Nairobi Hospital
Authors: Kimberly Winnie Achieng Otieno
Abstract:
The Nairobi Hospital plays a pivotal role in healthcare provision in East and Central Africa, yet it faces challenges in providing accessible dialysis care. This paper explores strategic interventions to enhance dialysis care, improve access, and streamline health information management, with the aim of fostering an integrated and patient-centered healthcare system in our region.
Challenges at The Nairobi Hospital
The Nairobi Hospital currently grapples with an insufficient number of dialysis machines, which results in extended turnaround times. This issue stems from both staffing bottlenecks and infrastructural limitations, given the growing demand for renal care services. Our paper-based record-keeping system and the fragmented downstream flow of information hinder the hospital's ability to manage health data effectively. There is also a need for investment in extending The Nairobi Hospital's dialysis facilities to far-reaching communities. Setting up satellite clinics closer to people who live far from the main hospital will improve access for underserved areas.
Community Outreach and Education
Implementing education programs on kidney health within local communities is vital for early detection and prevention. Collaborating with local leaders and organizations can establish a proactive approach to renal health, thereby reducing the demand for acute dialysis interventions. We can amplify this effort by expanding The Nairobi Hospital's corporate social responsibility outreach program with weekend engagement activities such as walks, awareness classes, and fund drives.
Enhancing Efficiency in Dialysis Care
Demand for dialysis services continues to rise due to an aging Kenyan population and the increasing prevalence of chronic kidney disease (CKD).
Present at this year's International Nursing Conference is a diverse group of caregivers from around the world who can share their process optimization strategies, patient engagement techniques, and resource utilization efficiencies to catapult The Nairobi Hospital into the 21st century and beyond. Plans are underway to offer ongoing education opportunities to keep staff updated on best practices and emerging technologies, in addition to using patient feedback mechanisms to identify areas for improvement and enhance satisfaction. Staff empowerment and suggestion boxes help address The Nairobi Hospital's organizational challenges. Current financial constraints may limit a leap forward in technology integration, such as the acquisition of new dialysis machines and investment in predictive analytics to forecast patient needs and optimize resource allocation.
Streamlining Health Information Management
Fully embracing a shift to 100% Electronic Health Records (EHRs) is a transformative step toward efficient health information management. Shared information promotes a holistic understanding of patients' medical history, minimizing redundancies and enhancing overall care quality. To manage the transition to community-based care and EHRs effectively, a phased implementation approach is recommended.
Conclusion
By strategically enhancing dialysis care access and streamlining health information management, The Nairobi Hospital can strengthen its position as a leading healthcare institution in both East and Central Africa. This comprehensive approach aligns with the hospital's commitment to providing high-quality, accessible, and patient-centered care in an evolving landscape of healthcare delivery.
Keywords: Africa, urology, dialysis, healthcare
23997 Fast Fourier Transform-Based Steganalysis of Covert Communications over Streaming Media
Authors: Jinghui Peng, Shanyu Tang, Jia Li
Abstract:
Steganalysis seeks to detect the presence of secret data embedded in cover objects, and there is a pressing demand to detect hidden messages in streaming media. This paper shows how a steganalysis algorithm based on the Fast Fourier Transform (FFT) can be used to detect secret data embedded in streaming media. The proposed algorithm uses machine parameter characteristics and a network sniffer to determine whether Internet traffic contains streaming channels. The detected streaming data is then transformed from the time domain to the frequency domain through the FFT. The power-spectrum distributions of original VoIP streams and stego VoIP streams in the frequency domain are then compared using a t-test, yielding a p-value of 7.5686E-176, which is below the significance threshold. The results indicate that the proposed FFT-based steganalysis algorithm is effective in detecting secret data embedded in VoIP streaming media.
Keywords: steganalysis, security, Fast Fourier Transform, streaming media
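The frequency-domain comparison described above can be sketched in plain Python. This is only an illustration under assumptions not taken from the paper: a recursive radix-2 FFT, a per-frame feature defined as the share of power in the upper half of the band, Welch's unequal-variance t statistic, and synthetic frames in which extra broadband noise stands in for embedding distortion. A p-value would additionally need the t distribution, for example `scipy.stats.ttest_ind(..., equal_var=False)`.

```python
import cmath
import math
import random
import statistics

def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even, odd = fft(samples[0::2]), fft(samples[1::2])
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result

def power_spectrum(frame):
    """|X_k|^2 / N for every frequency bin k."""
    return [abs(c) ** 2 / len(frame) for c in fft(frame)]

def high_band_ratio(frame):
    """Share of positive-frequency power in the upper half of the band (assumed feature)."""
    spectrum = power_spectrum(frame)
    half = len(frame) // 2
    return sum(spectrum[half // 2:half + 1]) / sum(spectrum[1:half + 1])

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples with unequal variances."""
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    return diff / math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))

# Synthetic frames: a pure tone plus noise; "stego" frames carry more broadband noise
# as a crude stand-in for embedding distortion (hypothetical data).
random.seed(1)

def make_frame(noise_sd, n=64):
    return [math.sin(2 * math.pi * 5 * i / n) + random.gauss(0, noise_sd) for i in range(n)]

cover = [high_band_ratio(make_frame(0.05)) for _ in range(30)]
stego = [high_band_ratio(make_frame(0.30)) for _ in range(30)]
print(welch_t(stego, cover))
```

A large positive statistic indicates that the stego frames carry noticeably more high-frequency power than the cover frames.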
23996 Privacy-Preserving Model for Social Network Sites to Prevent Unwanted Information Diffusion
Authors: Sanaz Kavianpour, Zuraini Ismail, Bharanidharan Shanmugam
Abstract:
Social Network Sites (SNSs) can serve as an invaluable platform for transferring information across a large number of individuals. A substantial part of communicating and managing information is identifying which individuals will influence others in propagating information, and whether information will be disseminated in the absence of social signals about it. Identifying the final audience of social data is difficult because the social contexts passed among individuals cannot be fully controlled. Hence, undesirable diffusion of information to unauthorized individuals on SNSs can threaten individuals' privacy. This paper examines information diffusion in SNSs and highlights the most significant privacy issues facing SNS users. The goal of this paper is to propose a privacy-preserving model that gives due regard to individuals' data in order to control data availability and improve privacy, while still providing access to the data for appropriate third parties without compromising the advantages of information sharing through SNSs.
Keywords: anonymization algorithm, classification algorithm, information diffusion, privacy, social network sites
23995 Application Difference between Cox and Logistic Regression Models
Authors: Idrissa Kayijuka
Abstract:
The logistic regression and Cox regression (proportional hazards) models are currently employed in the analysis of prospective epidemiologic research investigating risk factors for chronic diseases, and a theoretical relationship between the two models has been studied. By definition, the Cox regression model, also called the Cox proportional hazards model, is used to model time-to-event data in which censored cases exist. The logistic regression model, by contrast, is mostly applicable when the independent variables are numerical as well as nominal and the outcome variable is binary (dichotomous). Many researchers have focused on overviews of the Cox and logistic regression models and their applications in different areas. In this work, the analysis is performed on secondary data from the SPSS exercise data set on BREAST CANCER, with a sample of 1,121 women; the main objective is to show the application difference between the Cox and logistic regression models based on factors that cause women to die from breast cancer. Some of the analysis (for example, on lymph node status) was done manually, and SPSS software was used to analyze the data. The study found an application difference between the two models: the Cox regression model is used to analyze data that include follow-up time, whereas the logistic regression model analyzes data without follow-up time. The models also use different measures of association: the hazard ratio for Cox regression and the odds ratio for logistic regression. A similarity between the two models is that both can be applied to predict the outcome of a categorical variable, i.e., a variable that can take only a limited number of categories.
In conclusion, the Cox regression model differs from logistic regression in that it assesses a rate instead of a proportion. Both models can be applied in many other studies, since both are suitable methods for analyzing data, but the Cox regression model is the more recommended.
Keywords: logistic regression model, Cox regression model, survival analysis, hazard ratio
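The distinction drawn above, a proportion-based odds ratio versus a rate-based hazard ratio, can be made concrete with a crude two-group calculation. The counts and person-years below are hypothetical (not from the breast cancer data set), and the incidence rate ratio is only the constant-hazard analogue of what a fitted Cox model would estimate.

```python
def odds_ratio(events_exposed, total_exposed, events_unexposed, total_unexposed):
    """Odds ratio: uses only whether the event happened, not when."""
    odds_exposed = events_exposed / (total_exposed - events_exposed)
    odds_unexposed = events_unexposed / (total_unexposed - events_unexposed)
    return odds_exposed / odds_unexposed

def rate_ratio(events_exposed, person_years_exposed, events_unexposed, person_years_unexposed):
    """Incidence rate ratio: events per unit of follow-up time (constant-hazard analogue of a hazard ratio)."""
    rate_exposed = events_exposed / person_years_exposed
    rate_unexposed = events_unexposed / person_years_unexposed
    return rate_exposed / rate_unexposed

# Hypothetical cohort: 100 node-positive and 100 node-negative women.
print(odds_ratio(40, 100, 20, 100))        # ignores follow-up time
print(rate_ratio(40, 300.0, 20, 450.0))    # uses total person-years of follow-up
```

The same deaths give different answers because the second measure also accounts for how long each group was followed, which is the information logistic regression discards.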
23994 Text Mining of Twitter Data Using a Latent Dirichlet Allocation Topic Model and Sentiment Analysis
Authors: Sidi Yang, Haiyi Zhang
Abstract:
Twitter is a microblogging platform where millions of users share their attitudes, views, and opinions every day. Using a probabilistic Latent Dirichlet Allocation (LDA) topic model to discern the most popular topics in Twitter data is an effective and computationally efficient way to analyze a large set of tweets and find a set of topics. Sentiment analysis provides an effective method for revealing the emotions and sentiments expressed in each tweet and an efficient way to summarize the results in a clearly understandable manner. The primary goal of this paper is to explore text mining, extracting and analyzing useful information from unstructured text with two approaches, LDA topic modelling and sentiment analysis, by examining plain-text English Twitter data. These two methods allow people to mine data more effectively and efficiently. LDA topic modelling and sentiment analysis can also be applied to provide insights in business and scientific fields.
Keywords: text mining, Twitter, topic model, sentiment analysis
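The abstract does not give its LDA configuration. For readers unfamiliar with how LDA assigns words to topics, below is a minimal collapsed Gibbs sampler in plain Python; the hyperparameters `alpha` and `beta`, the iteration count, and the toy "tweets" are assumptions for illustration, and in practice a library implementation such as gensim's `LdaModel` would normally be used.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; `docs` is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]                 # n_{d,k}
    topic_word = [defaultdict(int) for _ in range(n_topics)]  # n_{k,w}
    topic_total = [0] * n_topics                               # n_k
    assignments = []
    for d, doc in enumerate(docs):                             # random initial topics
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the current token from all counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | rest), up to a normalizing constant.
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta) / (topic_total[t] + v * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Per-document topic proportions (theta) and the three most frequent words per topic.
    theta = [[(row[k] + alpha) / (sum(row) + n_topics * alpha) for k in range(n_topics)] for row in doc_topic]
    top_words = [sorted(vocab, key=lambda w: topic_word[k][w], reverse=True)[:3] for k in range(n_topics)]
    return theta, top_words

tweets = [
    "battery phone screen battery".split(),
    "phone screen charger battery".split(),
    "goal match team goal".split(),
    "team match referee goal".split(),
]
theta, top_words = lda_gibbs(tweets, n_topics=2)
print(top_words)
```

Each row of `theta` is one tweet's topic mixture; a sentiment score per tweet could then be aggregated by dominant topic.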
23993 Value Chain Based New Business Opportunity
Authors: Seonjae Lee, Sungjoo Lee
Abstract:
Discovering (excavating) new business opportunities is necessary to remain competitive in the current business environment. Companies survive rapidly changing industry conditions by adopting new business strategies and reducing technology challenges. Traditionally, two methods have been used to discover new businesses. The first is qualitative analysis of expert opinion, through which opportunities are gathered; the second discovers new technologies through quantitative analysis of patent data. The second method increases time and cost, and patent data is restricted in its use for discovering business opportunities. This study presents new business opportunities in a form customized to a company's characteristics (sector, size, etc.) by taking a value chain perspective, and aims to contribute to creating new business opportunities through the proposed model. It uses the trademark database of the Korean Intellectual Property Office (KIPO) and the proprietary company information database of Korea Enterprise Data (KED). These data are key to discovering new business opportunities through analysis of competitors' and advanced businesses' trademarks (Module 1) and trading analysis of competitors found in the KED (Module 2).
Keywords: value chain, trademark, trading analysis, new business opportunity
23992 Towards Addressing the Cultural Snapshot Phenomenon in Cultural Mapping Libraries
Authors: Mousouris Spiridon, Kavakli Evangelia
Abstract:
This paper focuses on Digital Libraries (DLs) that contain and geovisualise cultural data, highlighting the need to define them as a separate category, termed Cultural Mapping Libraries, based on their inherent connection of culture with geographic location and on their design requirements for the visual representation of cultural data on the map. An exploratory analysis of DLs that conform to this definition revealed that existing Cultural Mapping Libraries fail to geovisualise the entirety of cultural data per point of interest, resulting in a Cultural Snapshot phenomenon. The existence of this phenomenon was reinforced by the results of a systematic bibliographic search. To address the Cultural Snapshot, this paper proposes using Semantic Web principles to efficiently interconnect spatial cultural data through time, per geographic location. In this way, points of interest are transformed into scenery where culture evolves over time. This evolution is expressed as occurrences taking place chronologically, in an event-oriented approach, a conceptualization also endorsed by the CIDOC Conceptual Reference Model (CIDOC CRM). In particular, we propose using CIDOC CRM as the baseline for defining the logic of Cultural Mapping Libraries as part of the Culture Domain, in accordance with the Digital Library Reference Model, in order to define the rules by which the system manages cultural data. Our future goal is to transform this conceptual definition into inferencing rules that resolve the Cultural Snapshot and lead to a more complete geovisualisation of cultural data.
Keywords: digital libraries, semantic web, geovisualization, CIDOC-CRM
23991 An Evaluation of the Impact of E-Banking on Operational Efficiency of Banks in Nigeria
Authors: Ibrahim Rabiu Darazo
Abstract:
This research examines the impact of e-banking on the operational efficiency of banks in Nigeria, using a case study of selected banks (Diamond Bank Plc, GTBank Plc, and Fidelity Bank Plc). It is a quantitative study that uses both primary and secondary data sources. Questionnaires were used to obtain accurate data: 150 questionnaires were distributed among staff and customers of the three banks, and the collected data were analysed using the chi-square test, whereas the secondary data were obtained from relevant textbooks, journals, and websites. The findings show that the use of e-banking has improved the efficiency of these banks in terms of providing efficient electronic services to customers through Internet banking, telephone banking, and ATMs; reducing the time taken to serve customers; allowing new customers to open an account online; and giving customers access to their accounts at all times (24/7). E-banking also provides access to customer information from the database, and the costs of checks and postage were eliminated. The recommendations at the end of the research include: banks should update their electronic gadgets; e-fraud (internal and external) should be controlled; banks should employ qualified manpower; and biometric ATMs should be introduced to reduce ATM card fraud, as is done in other countries such as the USA.
Keywords: banks, electronic banking, operational efficiency of banks, biometric ATMs
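The questionnaire responses were analysed with a chi-square test. As a minimal sketch of Pearson's chi-square test of independence on a contingency table (the counts below are hypothetical, not the study's data), the statistic and degrees of freedom can be computed in plain Python and compared with a critical value, 3.841 for one degree of freedom at the 5% level.

```python
def chi_square_independence(table):
    """Pearson's chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Hypothetical counts: rows = staff, customers; columns = agree, disagree
# with "e-banking reduced the time taken to serve customers".
stat, dof = chi_square_independence([[10, 20], [30, 40]])
print(stat, dof, stat > 3.841)
```

Here the statistic is well below 3.841, so these illustrative counts would not show a significant association.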
23990 Optimize Data Evaluation Metrics for Fraud Detection Using Machine Learning
Authors: Jennifer Leach, Umashanger Thayasivam
Abstract:
The use of technology has benefited society in more ways than anyone thought possible. Unfortunately, as society's knowledge of technology has advanced, so has its knowledge of ways to use technology to manipulate people, which has led to a simultaneous advancement in fraud. Machine learning techniques can offer a possible solution to help counter this trend. This research explores how various machine learning techniques can aid in detecting fraudulent activity across two different types of fraud data; the accuracy, precision, recall, and F1 score were recorded for each method. Each machine learning model was also tested across five different training and testing splits in order to discover which split and technique lead to the best results.
Keywords: data science, fraud detection, machine learning, supervised learning
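The four metrics recorded for each model, and the idea of evaluating across several train/test splits, can be sketched in plain Python. The split fractions below are hypothetical, since the abstract does not list its five splits. Because fraud data is usually heavily imbalanced, precision, recall, and F1 on the fraud class are typically more informative than accuracy alone.

```python
import random

def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives and true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1, treating `positive` as the fraud class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def train_test_split(rows, train_fraction, seed=0):
    """Shuffle a copy of `rows` and cut it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical split ratios; each model would be trained and scored once per fraction.
SPLIT_FRACTIONS = (0.5, 0.6, 0.7, 0.8, 0.9)

print(classification_metrics([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0]))
```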
23989 Suitability of Satellite-Based Data for Groundwater Modelling in Southwest Nigeria
Authors: O. O. Aiyelokun, O. A. Agbede
Abstract:
Numerical modelling of groundwater flow can be susceptible to calibration errors due to the lack of adequate ground-based hydro-meteorological stations in river basins. Groundwater resources management in Southwest Nigeria is currently challenged by overexploitation, lack of planning and monitoring, urbanization, and climate change; hence, if models are to be adopted as decision support tools for sustainable groundwater management, they must be adequately calibrated. Since river basins in Southwest Nigeria are characterized by missing data and a lack of adequate ground-based hydro-meteorological stations, adopting satellite-based data for constructing distributed models is crucial. This study evaluates the suitability of satellite-based data as a substitute for ground-based data in computing boundary conditions, by determining whether ground-based and satellite-based meteorological data fit well in the Ogun and Oshun River basins. The Climate Forecast System Reanalysis (CFSR) global meteorological dataset was first obtained in daily form and converted to monthly form for a period of 432 months (January 1979 to June 2014). Ground-based meteorological data for Ikeja (1981-2010), Abeokuta (1983-2010), and Oshogbo (1981-2010) were then compared with the CFSR data using goodness-of-fit (GOF) statistics. The study revealed that, based on the mean absolute error (MAE), the coefficient of correlation (r), and the coefficient of determination (R²), all meteorological variables except wind speed fit well. It further revealed that maximum and minimum temperature, relative humidity, and rainfall had high values of the index of agreement (d) and the ratio of standard deviations (rSD), implying that the CFSR dataset could be used to compute boundary conditions such as groundwater recharge and potential evapotranspiration.
The study concluded that satellite-based data such as CFSR should be used as input when constructing groundwater flow models in river basins in Southwest Nigeria, where the majority of river basins are partially gauged and characterized by long gaps in hydro-meteorological data.
Keywords: boundary condition, goodness of fit, groundwater, satellite-based data
23988 An Intelligent Prediction Method for Annular Pressure Driven by Mechanism and Data
Authors: Zhaopeng Zhu, Xianzhi Song, Gensheng Li, Shuo Zhu, Shiming Duan, Xuezhe Yao
Abstract:
Accurate calculation of wellbore pressure is of great significance for preventing wellbore risks during drilling. The traditional mechanism model requires many iterative solution steps, which reduces computational efficiency and makes it difficult to meet the demand for dynamic control of wellbore pressure. In recent years, many scholars have introduced artificial intelligence algorithms into wellbore pressure calculation, significantly improving its efficiency and accuracy. However, due to the 'black box' nature of intelligent algorithms, existing intelligent wellbore pressure models struggle outside the range of their training data and overreact to data noise, often producing abnormal results. In this study, the multiphase flow mechanism is embedded into the objective function of a neural network model as a constraint, and an intelligent wellbore pressure prediction model under this constraint is established based on more than 400,000 sets of pressure measurement while drilling (MPD) data. The multiphase flow constraint makes the neural network's predictions more consistent with the distribution law of wellbore pressure, which partly overcomes the black-box nature of the model. In particular, accuracy on the independent test data set improves further, and abnormal calculated values essentially disappear. This method is driven by both MPD data and the multiphase flow mechanism, and it represents a main route to accurate and efficient wellbore pressure prediction in the future.
Keywords: multiphase flow mechanism, pressure while drilling data, wellbore pressure, mechanism constraints, combined drive
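One common way to embed a mechanism model into a network's objective is a soft penalty: the loss combines the misfit to measurements with the misfit to the mechanism's prediction. The sketch below illustrates only that structure. A simple hydrostatic formula stands in for the paper's multiphase flow model, and the penalty weight, inputs, and readings are hypothetical; in training, this loss would be minimized over the network's weights.

```python
def hydrostatic_pressure_mpa(depth_m, mud_density_kg_m3, g=9.81):
    """Placeholder mechanism model: hydrostatic pressure in MPa (stand-in for multiphase flow)."""
    return mud_density_kg_m3 * g * depth_m / 1e6

def constrained_loss(predicted, measured, inputs, weight=0.5):
    """Mean squared data misfit plus a weighted penalty for departing from the mechanism model."""
    n = len(predicted)
    data_term = sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n
    mechanism = [hydrostatic_pressure_mpa(depth, rho) for depth, rho in inputs]
    physics_term = sum((p - q) ** 2 for p, q in zip(predicted, mechanism)) / n
    return data_term + weight * physics_term

inputs = [(1000.0, 1200.0), (2000.0, 1200.0)]   # hypothetical (depth in m, mud density in kg/m3)
measured = [12.0, 24.0]                          # hypothetical pressure readings in MPa
mechanism = [hydrostatic_pressure_mpa(d, rho) for d, rho in inputs]
print(constrained_loss(mechanism, measured, inputs))
```

A prediction that fits noisy data exactly but violates the mechanism is still penalized, which is how such a constraint discourages abnormal outputs outside the training range.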
23987 Prediction of Embankment Fires at Railway Infrastructure Using Machine Learning, Geospatial Data and VIIRS Remote Sensing Imagery
Authors: Jan-Peter Mund, Christian Kind
Abstract:
In view of ongoing climate change and global warming, fires along railways in Germany are occurring more frequently, sometimes with massive consequences for railway operations and the affected railroad infrastructure. In the absence of systematic studies within the German Rail infrastructure network, little is known about the causes of such embankment fires. Since these hazards are expected to increase further in the near future, sound knowledge of the triggers and drivers of embankment fires, as well as methodological knowledge of prediction tools, is needed. Two predictable future trends underline the growing relevance of the topic: with the intensified use of rail for passenger and freight transport (e.g., a doubling of annual passenger numbers by 2030 compared with 2019), there will be more rail traffic and also more maintenance and construction work on the railways. This research project uses satellite data to identify historical embankment fires along the rail network infrastructure. The team links data on these fires with infrastructure and weather data and trains a machine learning model to predict fire hazards on sections of track. Companies review the results and use them on a pilot basis for precautionary measures.
Keywords: embankment fires, railway maintenance, machine learning, remote sensing, VIIRS data
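The linking step, attaching satellite fire detections to track sections before adding weather and infrastructure attributes, could look roughly like the following. This is a hypothetical sketch: sections are reduced to midpoints, matching uses great-circle distance, and the 375 m buffer (the nominal resolution of VIIRS I-band fire pixels) is an assumed choice. A production version would measure distance to the track geometry itself.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def link_detections(detections, sections, buffer_km=0.375):
    """Count fire detections per track section, using the nearest section midpoint within the buffer.

    detections: list of (lat, lon); sections: dict of hypothetical section_id -> (lat, lon) midpoint.
    """
    counts = {}
    for lat, lon in detections:
        best_id, best_dist = None, buffer_km
        for section_id, (s_lat, s_lon) in sections.items():
            dist = haversine_km(lat, lon, s_lat, s_lon)
            if dist <= best_dist:
                best_id, best_dist = section_id, dist
        if best_id is not None:
            counts[best_id] = counts.get(best_id, 0) + 1
    return counts

sections = {"A": (52.5, 13.4), "B": (52.6, 13.4)}
detections = [(52.5001, 13.4001), (52.6, 13.4002), (53.0, 13.0)]
print(link_detections(detections, sections))
```

The per-section counts can then serve as labels for a model trained on weather and infrastructure features.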
23986 A Hybrid Data Mining Algorithm Based System for Intelligent Defence Mission Readiness and Maintenance Scheduling
Authors: Shivam Dwivedi, Sumit Prakash Gupta, Durga Toshniwal
Abstract:
Keeping defence forces in the highest state of combat readiness under budgetary constraints is a challenging task today. A huge amount of time and money is squandered on unnecessary and expensive traditional maintenance activities. To overcome this limitation, a Defence Intelligent Mission Readiness and Maintenance Scheduling System is proposed, which improves the maintenance system by diagnosing equipment condition and predicting maintenance requirements. Based on new data mining algorithms, the system intelligently optimises mission readiness for imminent operations and maintenance scheduling in repair echelons. Using modified data mining algorithms such as a Weighted Feature Ranking Genetic Algorithm and an SVM-Random Forest linear ensemble, it improves reliability, availability, and safety while reducing maintenance cost and Equipment Out of Action (EOA) time. The results show that the introduced algorithms have an edge over conventional data mining algorithms. By using an intelligent condition-based maintenance approach, the system improves the operational and maintenance decision strategy of the defence force.
Keywords: condition based maintenance, data mining, defence maintenance, ensemble, genetic algorithms, maintenance scheduling, mission capability
23985 Using Emerging Hot Spot Analysis to Analyze Overall Effectiveness of Policing Policy and Strategy in Chicago
Authors: Tyler Gill, Sophia Daniels
Abstract:
The paper examines how assessing the spatial-temporal constraints of data can help inform policymakers and law enforcement officials. The authors use Chicago crime data from 2006-2016 to demonstrate that the Emerging Hot Spot tool is an ideal hot spot clustering approach for analyzing crime data. Traditional approaches include density maps or building a spatial weights matrix that incorporates the spatial-temporal constraints. This new approach uses a space-time implementation of the Getis-Ord Gi* statistic to visualize the data more quickly and support better decisions. The research will help complement socio-cultural research in finding key patterns that can frame future policies and in evaluating the implementation of prior strategies. Through this analysis, homicide trends and patterns are identified more effectively, and recommendations are offered for real-life implementation by non-traditional users of GIS.
Keywords: crime mapping, emerging hot spot analysis, Getis-Ord Gi*, spatial-temporal analysis
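The Getis-Ord Gi* statistic at the core of the method can be illustrated for a single time step. This minimal sketch uses binary weights with each cell counted as its own neighbour, which is an assumption, and a hypothetical strip of seven grid cells. Emerging Hot Spot Analysis additionally defines neighbourhoods across time slices of a space-time cube and applies a Mann-Kendall trend test to each location's Gi* series, which is not shown here.

```python
import math

def getis_ord_gi_star(values, neighbors):
    """Gi* z-scores with binary weights; each location is included in its own neighbourhood.

    values: one count per cell; neighbors: list of sets of neighbour indices.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(x * x for x in values) / n - mean * mean)
    scores = []
    for i in range(n):
        window = neighbors[i] | {i}     # w_ij = 1 for j in window, including j = i
        w_sum = len(window)             # sum of w_ij; equals sum of w_ij^2 for binary weights
        local_sum = sum(values[j] for j in window)
        denom = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores.append((local_sum - mean * w_sum) / denom)
    return scores

# Hypothetical homicide counts along a strip of seven adjacent cells.
counts = [1, 1, 8, 10, 8, 1, 1]
strip_neighbors = [{j for j in (i - 1, i + 1) if 0 <= j < len(counts)} for i in range(len(counts))]
print([round(z, 2) for z in getis_ord_gi_star(counts, strip_neighbors)])
```

Cells with large positive z-scores are hot spots; large negative z-scores indicate cold spots.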
23984 Active Learning in Engineering Courses Using Excel Spreadsheet
Authors: Promothes Saha
Abstract:
Recently, transportation engineering industry members at the study university expressed concern that students lacked the skills needed to solve real-world engineering problems using spreadsheet data analysis. In response, this study investigated how to engage students better by incorporating spreadsheet analysis during class, while also helping them learn the course topics. Helping students link theoretical knowledge to real-world problems can be a challenge. In this effort, in-class activities and worksheets were redesigned to integrate Excel for solving example problems with built-in tools, including cell referencing, equations, the Analysis ToolPak, the Solver tool, conditional formatting, and charts. The effectiveness of this technique was investigated using students' course evaluations, enrollment data, and students' comments. Based on these data, the spreadsheet activities appear to increase student learning.
Keywords: civil, engineering, active learning, transportation
23983 Understanding Cruise Passengers’ On-board Experience throughout the Customer Decision Journey
Authors: Sabina Akter, Osiris Valdez Banda, Pentti Kujala, Jani Romanoff
Abstract:
This paper examines the relationship between on-board environmental factors and overall customer satisfaction in the context of the cruise on-board experience. The on-board environmental factors considered are ambient, layout/design, social, product/service, and on-board enjoyment factors. The study presents a data-driven framework and model of the on-board cruise experience. Data were collected from 893 respondents through a self-administered online questionnaire about their cruise experience. This study reveals cruise passengers' on-board experience throughout the customer decision journey based on publicly available data. Pearson correlation and regression analysis were applied, and the results show a positive and significant relationship between the environmental factors and the on-board experience. These data help explain cruise passengers' on-board experience and will be used in the ultimate decision-making process for cruise ship design.
Keywords: cruise behavior, customer activities, on-board environmental factors, on-board experience, user or customer satisfaction
23982 Holistic Risk Assessment Based on Continuous Data from the User’s Behavior and Environment
Authors: Cinzia Carrodano, Dimitri Konstantas
Abstract:
Risk is part of our lives. In today's society, risk is connected to our safety, and safety has become a major priority in our lives. Each person lives his or her life based on an evaluation of the risk he or she is ready to accept and sustain, and the level of safety he or she wishes to reach, according to highly personal criteria. The assessment of the risk a person takes in a complex environment, and the impact of other people's actions and of events on our perception of risk, are elements to be considered. The concept of Holistic Risk Assessment (HRA) aims to develop a methodology and a model that take into account elements outside the direct influence of the individual and provide a personalized risk assessment. The concept rests on the expectation that, in the near future, we will be able to gather and process extremely large amounts of data about an individual and his or her environment in real time. The interaction and correlation of these data are the key elements of holistic risk assessment. In this paper, we present the HRA concept and describe its most important elements and considerations.
Keywords: continuous data, dynamic risk, holistic risk assessment, risk concept
23981 A Comparative Analysis of Classification Models with Wrapper-Based Feature Selection for Predicting Student Academic Performance
Authors: Abdullah Al Farwan, Ya Zhang
Abstract:
In today's educational arena, it is critical to understand educational data and to be able to evaluate its important aspects, particularly data on student achievement. Educational Data Mining (EDM) is a research area focused on uncovering patterns and information in data from educational institutions. Teachers who can predict their students' class performance can use this information to improve their teaching. Such predictions have become valuable knowledge that can serve a wide range of objectives; for example, they can inform a strategic plan for delivering high-quality education. Based on previous data, this paper recommends employing data mining techniques to forecast students' final grades. In this study, five data mining methods, Decision Tree, JRip, Naive Bayes, Multi-layer Perceptron, and Random Forest, were used with wrapper feature selection on two datasets relating to Portuguese language and mathematics classes. The results showed the effectiveness of data mining methods in predicting student academic success. The classification accuracy achieved with the selected algorithms lies in the range of 80-94%. Among all the selected classification algorithms, the lowest accuracy, close to 70.45%, was achieved by the Multi-layer Perceptron, and the highest, close to 94.10%, by Random Forest. This work can help educational administrators identify poorly performing students at an early stage and implement motivational interventions to improve their academic success and prevent dropout.
Keywords: classification algorithms, decision tree, feature selection, multi-layer perceptron, Naïve Bayes, random forest, students’ academic performance
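A wrapper method scores candidate feature subsets by actually training and evaluating the classifier on them. The sketch below shows greedy forward selection in plain Python; a nearest-centroid classifier with 5-fold interleaved cross-validation stands in for the paper's learners (any of Decision Tree, JRip, Naive Bayes, MLP, or Random Forest could be plugged in as `score`), and the toy data are hypothetical.

```python
import statistics

def nearest_centroid_accuracy(rows, labels, features, folds=5):
    """k-fold accuracy of a nearest-centroid classifier restricted to `features` (stand-in learner)."""
    n = len(rows)
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, n, folds))
        centroids = {}
        for label in set(labels):
            members = [rows[i] for i in range(n) if i not in test_idx and labels[i] == label]
            centroids[label] = [statistics.mean(r[j] for r in members) for j in features]
        for i in test_idx:
            point = [rows[i][j] for j in features]
            predicted = min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])))
            correct += predicted == labels[i]
    return correct / n

def forward_selection(rows, labels, n_features, score=nearest_centroid_accuracy):
    """Greedy wrapper: add the feature that most improves the CV score; stop when none helps."""
    selected, best_score = [], 0.0
    remaining = list(range(n_features))
    while remaining:
        candidate_scores = {f: score(rows, labels, selected + [f]) for f in remaining}
        best_f = max(candidate_scores, key=candidate_scores.get)
        if candidate_scores[best_f] <= best_score:
            break
        selected.append(best_f)
        best_score = candidate_scores[best_f]
        remaining.remove(best_f)
    return selected, best_score

# Hypothetical data: feature 0 separates pass/fail cleanly, feature 1 is noise.
labels = [i % 2 for i in range(20)]
rows = [[label * 10 + (i % 3), (i * 7) % 5] for i, label in enumerate(labels)]
print(forward_selection(rows, labels, n_features=2))
```

Because every candidate subset requires retraining, wrapper selection is more expensive than filter methods but tailors the subset to the specific classifier.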
23980 A Novel Framework for User-Friendly Ontology-Mediated Access to Relational Databases
Authors: Efthymios Chondrogiannis, Vassiliki Andronikou, Efstathios Karanastasis, Theodora Varvarigou
Abstract:
A large amount of data is typically stored in relational databases (DBs), which can efficiently handle user queries intended to elicit the appropriate information from data sources. However, direct access to and use of this data requires end users to have an adequate technical background and to cope with the internal data structure and the stored values. Consequently, information retrieval is a rather difficult process even for IT or DB experts, given the limited conceptual expressiveness of relational databases. Ontologies enable users to formally describe a domain of knowledge in terms of concepts and the relations among them, and hence they can be used to specify unambiguously the information captured by a relational database. However, accessing information residing in a database through ontologies is feasible only if users are comfortable with semantic web technologies. To enable users from different disciplines to retrieve the appropriate data, a Graphical User Interface is necessary. In this work, we present an interactive, ontology-based, semantically enabled web tool for information retrieval. The tool is based entirely on the ontological representation of the underlying database schema, and it provides a user-friendly environment in which users can graphically form and execute their queries.
Keywords: ontologies, relational databases, SPARQL, web interface
23979 Anomaly Detection in Financial Markets Using Tucker Decomposition
Authors: Salma Krafessi
Abstract:
Financial markets form a multifaceted, intricate environment in which enormous volumes of data are produced every day. Accurate anomaly detection in this data is essential for finding investment opportunities, possible fraudulent activity, and market irregularities. Conventional anomaly detection methods frequently fail to capture the complex structure of financial data. To improve the identification of anomalies in financial time series, this study presents Tucker Decomposition as a reliable multi-way analysis approach. We start by gathering closing prices for the S&P 500 index across several decades. The data is converted into a three-dimensional tensor that holds internal characteristics and temporal sequences in a sliding-window structure. The tensor is then decomposed using Tucker Decomposition into a core tensor and corresponding factor matrices, allowing latent patterns and relationships in the data to be captured. The reconstruction error of the Tucker Decomposition serves as a possible indicator of anomalies: by setting a statistical threshold on it, we identify large deviations that indicate unusual behavior. A thorough comparison of the Tucker-based method with traditional anomaly detection approaches validates our methodology. The outcomes demonstrate the superiority of Tucker Decomposition in identifying intricate and subtle anomalies that are otherwise missed. This work opens the door to further research into multi-way data analysis across a range of disciplines and emphasizes the value of tensor-based methods in financial analysis.
Keywords: tucker decomposition, financial markets, financial engineering, artificial intelligence, decomposition models
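Around the decomposition itself, the pipeline described above has two simple steps. First, the price series is sliced into overlapping windows. Second, windows whose reconstruction error exceeds a statistical threshold are flagged. The stdlib-only sketch below covers just those two steps under stated assumptions. The Tucker fit is not reproduced; a tensor library such as TensorLy (`tensorly.decomposition.tucker`) would normally supply it. The threshold rule mean + k·std with k = 3 is a hypothetical choice, since the abstract does not name its rule.

```python
import statistics
from typing import List, Sequence

def sliding_windows(series: Sequence[float], width: int, step: int = 1) -> List[List[float]]:
    """Cut a 1-D series into overlapping windows (one tensor slice per window)."""
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    return [list(series[i:i + width]) for i in range(0, len(series) - width + 1, step)]

def flag_anomalies(errors: Sequence[float], k: float = 3.0) -> List[int]:
    """Return indices whose reconstruction error exceeds mean + k * population std.

    `errors` is assumed to hold one reconstruction error per window, e.g. the
    norm of (window - Tucker reconstruction); that step is not computed here.
    """
    mu = statistics.fmean(errors)
    sigma = statistics.pstdev(errors)
    threshold = mu + k * sigma
    return [i for i, e in enumerate(errors) if e > threshold]

print(sliding_windows([1, 2, 3, 4, 5], 3))  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(flag_anomalies([0.1] * 19 + [5.0]))   # [19]
```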
23978 Analyzing the Relationship between the Spatial Characteristics of Cultural Structure, Activities, and the Tourism Demand
Authors: Deniz Karagöz
Abstract:
This study attempts to understand the relationship between the spatial characteristics of cultural structure, cultural activities, and tourism demand in Turkey. The analysis is divided into four parts. The first part constructs a cultural structure and cultural activity (CSCA) index using principal component analysis, which identified four distinct dimensions: cultural activity/structure, access to culture, consumption, and cultural management. Exploratory spatial data analysis was then employed to determine the spatial patterns of cultural structure and cultural activities across the 81 provinces of Turkey. The Global Moran's I index was used to identify clusters of cultural activity and cultural structure. Finally, the relationship between cultural activities/cultural structure and tourism demand was analyzed. The raw data of the study come from official databases: data on cultural structure and activities were gathered from the Turkish Statistical Institute, and data on tourism demand were provided by the Republic of Turkey Ministry of Culture and Tourism.
Keywords: cultural activities, cultural structure, spatial characteristics, tourism demand, Turkey
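Global Moran's I, used above to detect spatial clusters, compares each province's deviation from the mean with its neighbours' deviations through a spatial weights matrix. Positive values indicate clustering of similar values, and negative values indicate that dissimilar values are neighbours. A minimal stdlib sketch follows. The binary adjacency matrix and the four-region example are hypothetical. A real analysis of Turkey's 81 provinces would build the weights from province boundaries and often row-standardize them.

```python
from typing import Sequence

def morans_i(values: Sequence[float], weights: Sequence[Sequence[float]]) -> float:
    """Global Moran's I = (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean."""
    n = len(values)
    if n == 0 or len(weights) != n or any(len(row) != n for row in weights):
        raise ValueError("weights must be an n x n matrix matching values")
    mean = sum(values) / n
    z = [x - mean for x in values]
    total_weight = sum(sum(row) for row in weights)  # W, the sum of all weights
    numerator = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    denominator = sum(zi * zi for zi in z)
    if total_weight == 0 or denominator == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    return (n / total_weight) * numerator / denominator

# Hypothetical example: four regions in a row, each adjacent to its neighbours.
adjacency = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(morans_i([1, 2, 3, 4], adjacency))  # positive: similar values sit together
print(morans_i([1, 4, 1, 4], adjacency))  # negative: values alternate
```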
23977 Unveiling Comorbidities in Irritable Bowel Syndrome: A UK BioBank Study utilizing Supervised Machine Learning
Authors: Uswah Ahmad Khan, Muhammad Moazam Fraz, Humayoon Shafique Satti, Qasim Aziz
Abstract:
Approximately 10-14% of the global population experiences a functional disorder known as irritable bowel syndrome (IBS). The disorder is defined by persistent abdominal pain and an irregular bowel pattern. IBS significantly impairs work productivity and disrupts patients' daily lives and activities. Although IBS is widespread, its underlying pathophysiology is still incompletely understood. This study aims to help characterize the phenotype of IBS patients by using machine learning algorithms to differentiate the comorbidities found in IBS patients from those in non-IBS patients. We extracted records coded for IBS from the UK BioBank cohort and randomly selected patients without an IBS code, for a total sample size of 18,000. For the IBS cases, we selected comorbidity codes recorded from 2 years before to 2 years after the IBS diagnosis and compared them with the comorbidities in the non-IBS cohort. Machine learning models, including Decision Trees, Gradient Boosting, Support Vector Machine (SVM), AdaBoost, Logistic Regression, and XGBoost, were evaluated for their accuracy in predicting IBS, and the most accurate model was then used to identify the features associated with IBS. Specifically, we used XGBoost feature importance as the feature selection method and applied the different models to the top 10% of features (50 features). Gradient Boosting, Logistic Regression, and XGBoost predicted an IBS diagnosis with optimal accuracies of 71.08%, 71.427%, and 71.53%, respectively. The comorbidities most closely associated with IBS included gut diseases (haemorrhoids, diverticular diseases), atopic conditions (asthma), and psychiatric comorbidities (depressive episodes or disorder, anxiety).
This finding emphasizes the need for a comprehensive approach when evaluating the phenotype of IBS and suggests the possibility of identifying new subsets of IBS rather than relying solely on the conventional classification based on stool type. Our study also demonstrates the potential of machine learning algorithms to predict the development of IBS from comorbidities, which may enhance diagnosis and facilitate better management of modifiable risk factors for IBS. Further research is necessary to confirm our findings and establish cause and effect. Alternative feature selection methods and larger, more diverse datasets may lead to more accurate classification models. Despite these limitations, our findings highlight the effectiveness of Logistic Regression and XGBoost in predicting an IBS diagnosis.
Keywords: comorbidities, disease association, irritable bowel syndrome (IBS), predictive analytics
23976 Time-Series Load Data Analysis for User Power Profiling
Authors: Mahdi Daghmhehci Firoozjaei, Minchang Kim, Dima Alhadidi
Abstract:
In this paper, we present a power profiling model for smart grid consumers based on real-time load data acquired from smart meters. It profiles consumers' power consumption behaviour using the dynamic time warping (DTW) clustering algorithm. Because DTW is invariant to signal warping, time-misaligned load data can be profiled and consumption features extracted. Two load types are defined, and the corresponding load patterns are extracted to classify consumption behaviour with DTW. The classification methodology is discussed in detail. To evaluate the performance of the method, we analyze time-series load data measured by a smart meter in a real case. The results verify the effectiveness of the proposed profiling method, with a 90.91% true positive rate for load type clustering in the best case.
Keywords: power profiling, user privacy, dynamic time warping, smart grid
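Dynamic time warping is the distance underlying the clustering described above. It aligns two load curves by letting either one stretch in time before summing point-wise differences. The sketch below is the textbook dynamic-programming form with an absolute-difference cost. The clustering stage that would group consumers by these distances is not shown, and the sample load values are hypothetical.

```python
from typing import Sequence

def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# A load peak shifted by one time step: DTW sees no difference,
# whereas the point-wise sum of absolute differences would be 10.
print(dtw_distance([0, 0, 5, 0], [0, 5, 0, 0]))  # 0.0
```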
23975 Evaluation of Dual Polarization Rainfall Estimation Algorithm Applicability in Korea: A Case Study on Biseulsan Radar
Authors: Chulsang Yoo, Gildo Kim
Abstract:
Dual polarization radar provides comprehensive information about rainfall by measuring multiple parameters. In Korea, the JPOLE and CSU-HIDRO algorithms are generally used for rainfall estimation. This study evaluated the local applicability of the JPOLE and CSU-HIDRO algorithms in Korea using rainfall data observed in August 2014 by the Biseulsan dual polarization radar and the KMA AWS. A total of 11,372 pairs of radar and ground rain rate data were classified into suitable and unsuitable data according to the thresholds of the synthetic algorithms. Evaluation criteria were then derived by comparing radar rain rate with ground rain rate for the entire, suitable, and unsuitable data, respectively. The results are as follows: (1) The radar rain rate equation including KDP (specific differential phase) performed better in rainfall estimation than the other equations for both the JPOLE and CSU-HIDRO algorithms, and the thresholds were found to be adequately applied for both algorithms including specific differential phase. (2) The radar rain rate equations including horizontal reflectivity and differential reflectivity performed poorly compared to the others, and the results did not improve even when only the suitable data were used. Acknowledgments: This work was supported by the Basic Science Research Program through the National Research Foundation of Korea, funded by the Ministry of Education (NRF-2013R1A1A2011012).
Keywords: CSU-HIDRO algorithm, dual polarization radar, JPOLE algorithm, radar rainfall estimation algorithm
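The abstract does not name its evaluation criteria. The sketch below assumes two common ones, mean bias and root-mean-square error, computed over radar-ground rain rate pairs for the entire, suitable, and unsuitable subsets. The function names, the boolean suitability flags, and the sample values are hypothetical. The actual JPOLE and CSU-HIDRO rain rate equations and thresholds are not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[float, float]  # (radar rain rate, ground rain rate), e.g. in mm/h

def rain_rate_scores(pairs: Sequence[Pair]) -> Dict[str, float]:
    """Mean bias (radar minus ground) and RMSE over radar-ground pairs."""
    if not pairs:
        raise ValueError("need at least one radar-ground pair")
    diffs = [radar - ground for radar, ground in pairs]
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"bias": bias, "rmse": rmse}

def scores_by_subset(pairs: Sequence[Pair], suitable: Sequence[bool]) -> Dict[str, Dict[str, float]]:
    """Score the entire data set and its suitable/unsuitable subsets separately."""
    if len(pairs) != len(suitable):
        raise ValueError("one suitability flag per pair is required")
    groups: Dict[str, List[Pair]] = {
        "entire": list(pairs),
        "suitable": [p for p, ok in zip(pairs, suitable) if ok],
        "unsuitable": [p for p, ok in zip(pairs, suitable) if not ok],
    }
    # Skip empty subsets rather than reporting undefined scores.
    return {name: rain_rate_scores(g) for name, g in groups.items() if g}

# Hypothetical pairs and flags, for illustration only.
result = scores_by_subset([(2.0, 1.0), (1.0, 2.0), (4.0, 4.0)], [True, True, False])
print(result["suitable"])  # {'bias': 0.0, 'rmse': 1.0}
```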
23974 Framework for Socio-Technical Issues in Requirements Engineering for Developing Resilient Machine Vision Systems Using Levels of Automation through the Lifecycle
Authors: Ryan Messina, Mehedi Hasan
Abstract:
This research examines the impact of using data to generate performance requirements for automating visual inspections with machine vision. The situations considered concern design, and how projects can smooth the transfer of tacit knowledge to an algorithm. We propose a framework for specifying machine vision systems. This framework uses varying levels of automation as contingency planning to reduce data processing complexity. Using data helps extract tacit knowledge from those who can perform the manual tasks, which in turn helps design the system; this means that real data from the system is always referenced, minimizing errors between participating parties. We propose three indicators for determining whether a project is at high risk of failing to meet requirements related to accuracy and reliability. All systems tested achieved better integration into operations after the framework was applied.
Keywords: automation, contingency planning, continuous engineering, control theory, machine vision, system requirements, system thinking
23973 Wreathed Hornbill (Rhyticeros undulatus) on Mount Ungaran: Are their Habitat Threatened?
Authors: Margareta Rahayuningsih, Nugroho Edi K., Siti Alimah
Abstract:
Wreathed Hornbill (Rhyticeros undulatus) is one of the hornbill species (Family: Bucerotidae) found on Mount Ungaran. Preserving the species and planning in situ conservation of the Wreathed Hornbill require data on habitat condition. The objective of the research was to determine land cover change on Mount Ungaran using satellite image data and GIS. Land cover data for 1999-2009 showed that the primary forest on Mount Ungaran decreased by almost 50%, while secondary forest, tea and coffee plantations, and settlements increased.
Keywords: GIS, Mount Ungaran, threatened habitat, Wreathed Hornbill (Rhyticeros undulatus)
23972 Performance Comparison of ADTree and Naive Bayes Algorithms for Spam Filtering
Authors: Thanh Nguyen, Andrei Doncescu, Pierre Siegel
Abstract:
Classification is an important data mining technique and can be used for data filtering in artificial intelligence. Because classification applies to all kinds of data, it is used in nearly every field of modern life. Classification groups items according to features judged interesting and useful. In this paper, we compare two classification methods, Naïve Bayes and ADTree, for detecting spam e-mail. This choice is motivated by the fact that the Naive Bayes algorithm is based on probability calculus, while the ADTree algorithm is based on decision trees. The parameters of both classifiers are set to maximize the true positive rate and minimize the false positive rate. The experimental results present classification accuracy and cost analysis with a view to choosing the optimal classifier for spam detection. The results also identify the number of attributes that gives a good trade-off between the number of attributes and classification accuracy.
Keywords: classification, data mining, spam filtering, naive bayes, decision tree
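The sketch below makes the probabilistic side of the comparison concrete. It is a minimal multinomial Naive Bayes spam filter with add-one (Laplace) smoothing that scores in log space to avoid underflow. It is illustrative only: whitespace tokenization, the class labels, and the four training messages are hypothetical. Neither ADTree nor the paper's tuning toward true and false positive rates is reproduced.

```python
import math
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple

class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.class_docs: Counter = Counter()        # documents per label
        self.word_counts: Dict[str, Counter] = {}   # word frequencies per label
        self.vocab: set = set()

    def fit(self, docs: Sequence[Tuple[str, str]]) -> "NaiveBayesSpamFilter":
        for text, label in docs:
            self.class_docs[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for word in text.lower().split():
                counts[word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text: str) -> Optional[str]:
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(label) + sum of log P(word | label), with add-one smoothing
            score = math.log(n_docs / total_docs)
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

training = [
    ("win cash now", "spam"),
    ("cheap cash offer", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]
clf = NaiveBayesSpamFilter().fit(training)
print(clf.predict("cash offer now"))       # spam
print(clf.predict("agenda for tomorrow"))  # ham
```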
23971 Mapping of Electrical Energy Consumption Yogyakarta Province in 2014-2025
Authors: Alfi Al Fahreizy
Abstract:
Yogyakarta is one of the provinces in Indonesia that often experience power outages because of high electrical load. The authors mapped electrical energy consumption [GWh] for the province of Yogyakarta in 2014-2025 using the LEAP (Long-range Energy Alternatives Planning system) software. This paper uses a BAU (Business As Usual) scenario, in which the projection assumes that growth in electricity consumption will continue as before. The goal is to estimate electrical energy consumption in the household, industry, business, social, government office building, and street lighting sectors. The data consist of projected population statistics and electricity consumption data [GWh] for 2010, 2011, and 2012 in Yogyakarta province.
Keywords: LEAP, energy consumption, Yogyakarta, BAU
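LEAP can model demand in far more detail, for example as activity levels multiplied by energy intensities per sector. The core Business As Usual assumption described above, that historical growth simply continues, can be sketched as constant compound-growth extrapolation. This is a simplified stand-in, not how LEAP computes its projections. The consumption figures in the example are hypothetical, not Yogyakarta data, and the helper names are illustrative.

```python
from typing import Dict, Sequence

def average_growth_rate(history: Sequence[float]) -> float:
    """Compound annual growth rate between the first and last observed years."""
    if len(history) < 2 or history[0] <= 0:
        raise ValueError("need at least two years of positive consumption data")
    years = len(history) - 1
    return (history[-1] / history[0]) ** (1 / years) - 1

def project_bau(history: Sequence[float], last_year: int, target_year: int) -> Dict[int, float]:
    """Extrapolate consumption (GWh) assuming the historical growth rate simply continues."""
    rate = average_growth_rate(history)
    value = history[-1]
    projection: Dict[int, float] = {}
    for year in range(last_year + 1, target_year + 1):
        value *= 1 + rate
        projection[year] = value
    return projection

# Hypothetical consumption for 2010-2012 (GWh), growing about 10% per year.
print(project_bau([100.0, 110.0, 121.0], 2012, 2014))
```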
23970 Research and Application of Multi-Scale Three Dimensional Plant Modeling
Authors: Weiliang Wen, Xinyu Guo, Ying Zhang, Jianjun Du, Boxiang Xiao
Abstract:
Reconstructing and analyzing three-dimensional (3D) models from in situ measured data is important for a number of research areas and applications in plant science, including plant phenotyping, functional-structural plant modeling (FSPM), plant germplasm resource protection, and agricultural technology popularization. Plants span many scales, from microscopic to macroscopic: cell, tissue, organ, plant, and canopy. The techniques currently used for data capture, feature analysis, and 3D reconstruction differ considerably across these scales. In this context, morphological data acquisition, 3D analysis, and modeling of plants at different scales are introduced systematically. The data capture equipment commonly used at each scale is introduced, and the hot issues and difficulties at each scale are then described. Some examples are also given, such as micron-scale phenotyping quantification and 3D microstructure reconstruction of vascular bundles within maize stalks based on micro-CT scanning, 3D reconstruction of leaf surfaces and feature extraction from point clouds acquired with a 3D handheld scanner, and plant modeling by combining parameter-driven 3D organ templates. Several applications of the resulting 3D models and analysis results are also introduced. A 3D maize canopy was constructed and light distribution within the canopy was simulated, which was used to design an ideal plant type. A grape tree model was constructed from 3D digitized and point cloud data and was used to produce science content for the 11th International Conference on Grapevine Breeding and Genetics. Using tissue models of plants, Google Glass was used to look around inside the plant and visually understand its internal structure.
With the development of information technology, 3D data acquisition and data processing techniques will play a greater role in plant science.
Keywords: plant, three dimensional modeling, multi-scale, plant phenotyping, three dimensional data acquisition