Search results for: startup data analytics
25182 Local Binary Patterns-Based Statistical Data Analysis for Accurate Soccer Match Prediction
Authors: Mohammad Ghahramani, Fahimeh Saei Manesh
Abstract:
Winning a soccer game is based on thorough and deep analysis of the ongoing match. On the other hand, giant gambling companies are in vital need of such analysis to reduce their loss against their customers. In this research work, we perform deep, real-time analysis on every soccer match around the world that distinguishes our work from others by focusing on particular seasons, teams and partial analytics. Our contributions are presented in the platform called “Analyst Masters.” First, we introduce various sources of information available for soccer analysis for teams around the world that helped us record live statistical data and information from more than 50,000 soccer matches a year. Our second and main contribution is to introduce our proposed in-play performance evaluation. The third contribution is developing new features from stable soccer matches. The statistics of soccer matches and their odds before and in-play are considered in the image format versus time including the halftime. Local Binary patterns, (LBP) is then employed to extract features from the image. Our analyses reveal incredibly interesting features and rules if a soccer match has reached enough stability. For example, our “8-minute rule” implies if 'Team A' scores a goal and can maintain the result for at least 8 minutes then the match would end in their favor in a stable match. We could also make accurate predictions before the match of scoring less/more than 2.5 goals. We benefit from the Gradient Boosting Trees, GBT, to extract highly related features. Once the features are selected from this pool of data, the Decision trees decide if the match is stable. A stable match is then passed to a post-processing stage to check its properties such as betters’ and punters’ behavior and its statistical data to issue the prediction. The proposed method was trained using 140,000 soccer matches and tested on more than 100,000 samples achieving 98% accuracy to select stable matches. Our database from 240,000 matches shows that one can get over 20% betting profit per month using Analyst Masters. Such consistent profit outperforms human experts and shows the inefficiency of the betting market. Top soccer tipsters achieve 50% accuracy and 8% monthly profit in average only on regional matches. Both our collected database of more than 240,000 soccer matches from 2012 and our algorithm would greatly benefit coaches and punters to get accurate analysis.Keywords: soccer, analytics, machine learning, database
Procedia PDF Downloads 23925181 Game Structure and Spatio-Temporal Action Detection in Soccer Using Graphs and 3D Convolutional Networks
Authors: Jérémie Ochin
Abstract:
Soccer analytics are built on two data sources: the frame-by-frame position of each player on the terrain and the sequences of events, such as ball drive, pass, cross, shot, throw-in... With more than 2000 ball-events per soccer game, their precise and exhaustive annotation, based on a monocular video stream such as a TV broadcast, remains a tedious and costly manual task. State-of-the-art methods for spatio-temporal action detection from a monocular video stream, often based on 3D convolutional neural networks, are close to reach levels of performances in mean Average Precision (mAP) compatibles with the automation of such task. Nevertheless, to meet their expectation of exhaustiveness in the context of data analytics, such methods must be applied in a regime of high recall – low precision, using low confidence score thresholds. This setting unavoidably leads to the detection of false positives that are the product of the well documented overconfidence behaviour of neural networks and, in this case, their limited access to contextual information and understanding of the game: their predictions are highly unstructured. Based on the assumption that professional soccer players’ behaviour, pose, positions and velocity are highly interrelated and locally driven by the player performing a ball-action, it is hypothesized that the addition of information regarding surrounding player’s appearance, positions and velocity in the prediction methods can improve their metrics. Several methods are compared to build a proper representation of the game surrounding a player, from handcrafted features of the local graph, based on domain knowledge, to the use of Graph Neural Networks trained in an end-to-end fashion with existing state-of-the-art 3D convolutional neural networks. It is shown that the inclusion of information regarding surrounding players helps reaching higher metrics.Keywords: fine-grained action recognition, human action recognition, convolutional neural networks, graph neural networks, spatio-temporal action recognition
Procedia PDF Downloads 2925180 A Methodology of Using Fuzzy Logics and Data Analytics to Estimate the Life Cycle Indicators of Solar Photovoltaics
Authors: Thor Alexis Sazon, Alexander Guzman-Urbina, Yasuhiro Fukushima
Abstract:
This study outlines the method of how to develop a surrogate life cycle model based on fuzzy logic using three fuzzy inference methods: (1) the conventional Fuzzy Inference System (FIS), (2) the hybrid system of Data Analytics and Fuzzy Inference (DAFIS), which uses data clustering for defining the membership functions, and (3) the Adaptive-Neuro Fuzzy Inference System (ANFIS), a combination of fuzzy inference and artificial neural network. These methods were demonstrated with a case study where the Global Warming Potential (GWP) and the Levelized Cost of Energy (LCOE) of solar photovoltaic (PV) were estimated using Solar Irradiation, Module Efficiency, and Performance Ratio as inputs. The effects of using different fuzzy inference types, either Sugeno- or Mamdani-type, and of changing the number of input membership functions to the error between the calibration data and the model-generated outputs were also illustrated. The solution spaces of the three methods were consequently examined with a sensitivity analysis. ANFIS exhibited the lowest error while DAFIS gave slightly lower errors compared to FIS. Increasing the number of input membership functions helped with error reduction in some cases but, at times, resulted in the opposite. Sugeno-type models gave errors that are slightly lower than those of the Mamdani-type. While ANFIS is superior in terms of error minimization, it could generate solutions that are questionable, i.e. the negative GWP values of the Solar PV system when the inputs were all at the upper end of their range. This shows that the applicability of the ANFIS models highly depends on the range of cases at which it was calibrated. FIS and DAFIS generated more intuitive trends in the sensitivity runs. DAFIS demonstrated an optimal design point wherein increasing the input values does not improve the GWP and LCOE anymore. In the absence of data that could be used for calibration, conventional FIS presents a knowledge-based model that could be used for prediction. In the PV case study, conventional FIS generated errors that are just slightly higher than those of DAFIS. The inherent complexity of a Life Cycle study often hinders its widespread use in the industry and policy-making sectors. While the methodology does not guarantee a more accurate result compared to those generated by the Life Cycle Methodology, it does provide a relatively simpler way of generating knowledge- and data-based estimates that could be used during the initial design of a system.Keywords: solar photovoltaic, fuzzy logic, inference system, artificial neural networks
Procedia PDF Downloads 16725179 Competitive DNA Calibrators as Quality Reference Standards (QRS™) for Germline and Somatic Copy Number Variations/Variant Allelic Frequencies Analyses
Authors: Eirini Konstanta, Cedric Gouedard, Aggeliki Delimitsou, Stefania Patera, Samuel Murray
Abstract:
Introduction: Quality reference DNA standards (QRS) for molecular testing by next-generation sequencing (NGS) are essential for accurate quantitation of copy number variations (CNV) for germline and variant allelic frequencies (VAF) for somatic analyses. Objectives: Presently, several molecular analytics for oncology patients are reliant upon quantitative metrics. Test validation and standardisation are also reliant upon the availability of surrogate control materials allowing for understanding test LOD (limit of detection), sensitivity, specificity. We have developed a dual calibration platform allowing for QRS pairs to be included in analysed DNA samples, allowing for accurate quantitation of CNV and VAF metrics within and between patient samples. Methods: QRS™ blocks up to 500nt were designed for common NGS panel targets incorporating ≥ 2 identification tags (IDTDNA.com). These were analysed upon spiking into gDNA, somatic, and ctDNA using a proprietary CalSuite™ platform adaptable to common LIMS. Results: We demonstrate QRS™ calibration reproducibility spiked to 5–25% at ± 2.5% in gDNA and ctDNA. Furthermore, we demonstrate CNV and VAF within and between samples (gDNA and ctDNA) with the same reproducibility (± 2.5%) in a clinical sample of lung cancer and HBOC (EGFR and BRCA1, respectively). CNV analytics was performed with similar accuracy using a single pair of QRS calibrators when using multiple single targeted sequencing controls. Conclusion: Dual paired QRS™ calibrators allow for accurate and reproducible quantitative analyses of CNV, VAF, intrinsic sample allele measurement, inter and intra-sample measure not only simplifying NGS analytics but allowing for monitoring clinically relevant biomarker VAF across patient ctDNA samples with improved accuracy.Keywords: calibrator, CNV, gene copy number, VAF
Procedia PDF Downloads 15325178 Machine Learning Facing Behavioral Noise Problem in an Imbalanced Data Using One Side Behavioral Noise Reduction: Application to a Fraud Detection
Authors: Salma El Hajjami, Jamal Malki, Alain Bouju, Mohammed Berrada
Abstract:
With the expansion of machine learning and data mining in the context of Big Data analytics, the common problem that affects data is class imbalance. It refers to an imbalanced distribution of instances belonging to each class. This problem is present in many real world applications such as fraud detection, network intrusion detection, medical diagnostics, etc. In these cases, data instances labeled negatively are significantly more numerous than the instances labeled positively. When this difference is too large, the learning system may face difficulty when tackling this problem, since it is initially designed to work in relatively balanced class distribution scenarios. Another important problem, which usually accompanies these imbalanced data, is the overlapping instances between the two classes. It is commonly referred to as noise or overlapping data. In this article, we propose an approach called: One Side Behavioral Noise Reduction (OSBNR). This approach presents a way to deal with the problem of class imbalance in the presence of a high noise level. OSBNR is based on two steps. Firstly, a cluster analysis is applied to groups similar instances from the minority class into several behavior clusters. Secondly, we select and eliminate the instances of the majority class, considered as behavioral noise, which overlap with behavior clusters of the minority class. The results of experiments carried out on a representative public dataset confirm that the proposed approach is efficient for the treatment of class imbalances in the presence of noise.Keywords: machine learning, imbalanced data, data mining, big data
Procedia PDF Downloads 13225177 Data Monetisation by E-commerce Companies: A Need for a Regulatory Framework in India
Authors: Anushtha Saxena
Abstract:
This paper examines the process of data monetisation bye-commerce companies operating in India. Data monetisation is collecting, storing, and analysing consumers’ data to use further the data that is generated for profits, revenue, etc. Data monetisation enables e-commerce companies to get better businesses opportunities, innovative products and services, a competitive edge over others to the consumers, and generate millions of revenues. This paper analyses the issues and challenges that are faced due to the process of data monetisation. Some of the issues highlighted in the paper pertain to the right to privacy, protection of data of e-commerce consumers. At the same time, data monetisation cannot be prohibited, but it can be regulated and monitored by stringent laws and regulations. The right to privacy isa fundamental right guaranteed to the citizens of India through Article 21 of The Constitution of India. The Supreme Court of India recognized the Right to Privacy as a fundamental right in the landmark judgment of Justice K.S. Puttaswamy (Retd) and Another v. Union of India . This paper highlights the legal issue of how e-commerce businesses violate individuals’ right to privacy by using the data collected, stored by them for economic gains and monetisation and protection of data. The researcher has mainly focused on e-commerce companies like online shopping websitesto analyse the legal issue of data monetisation. In the Internet of Things and the digital age, people have shifted to online shopping as it is convenient, easy, flexible, comfortable, time-consuming, etc. But at the same time, the e-commerce companies store the data of their consumers and use it by selling to the third party or generating more data from the data stored with them. This violatesindividuals’ right to privacy because the consumers do not know anything while giving their data online. Many times, data is collected without the consent of individuals also. Data can be structured, unstructured, etc., that is used by analytics to monetise. The Indian legislation like The Information Technology Act, 2000, etc., does not effectively protect the e-consumers concerning their data and how it is used by e-commerce businesses to monetise and generate revenues from that data. The paper also examines the draft Data Protection Bill, 2021, pending in the Parliament of India, and how this Bill can make a huge impact on data monetisation. This paper also aims to study the European Union General Data Protection Regulation and how this legislation can be helpful in the Indian scenarioconcerning e-commerce businesses with respect to data monetisation.Keywords: data monetization, e-commerce companies, regulatory framework, GDPR
Procedia PDF Downloads 12025176 Advancing in Cricket Analytics: Novel Approaches for Pitch and Ball Detection Employing OpenCV and YOLOV8
Authors: Pratham Madnur, Prathamkumar Shetty, Sneha Varur, Gouri Parashetti
Abstract:
In order to overcome conventional obstacles, this research paper investigates novel approaches for cricket pitch and ball detection that make use of cutting-edge technologies. The research integrates OpenCV for pitch inspection and modifies the YOLOv8 model for cricket ball detection in order to overcome the shortcomings of manual pitch assessment and traditional ball detection techniques. To ensure flexibility in a range of pitch environments, the pitch detection method leverages OpenCV’s color space transformation, contour extraction, and accurate color range defining features. Regarding ball detection, the YOLOv8 model emphasizes the preservation of minor object details to improve accuracy and is specifically trained to the unique properties of cricket balls. The methods are more reliable because of the careful preparation of the datasets, which include novel ball and pitch information. These cutting-edge methods not only improve cricket analytics but also set the stage for flexible methods in more general sports technology applications.Keywords: OpenCV, YOLOv8, cricket, custom dataset, computer vision, sports
Procedia PDF Downloads 8425175 Commercial Automobile Insurance: A Practical Approach of the Generalized Additive Model
Authors: Nicolas Plamondon, Stuart Atkinson, Shuzi Zhou
Abstract:
The insurance industry is usually not the first topic one has in mind when thinking about applications of data science. However, the use of data science in the finance and insurance industry is growing quickly for several reasons, including an abundance of reliable customer data, ferocious competition requiring more accurate pricing, etc. Among the top use cases of data science, we find pricing optimization, customer segmentation, customer risk assessment, fraud detection, marketing, and triage analytics. The objective of this paper is to present an application of the generalized additive model (GAM) on a commercial automobile insurance product: an individually rated commercial automobile. These are vehicles used for commercial purposes, but for which there is not enough volume to apply pricing to several vehicles at the same time. The GAM model was selected as an improvement over GLM for its ease of use and its wide range of applications. The model was trained using the largest split of the data to determine model parameters. The remaining part of the data was used as testing data to verify the quality of the modeling activity. We used the Gini coefficient to evaluate the performance of the model. For long-term monitoring, commonly used metrics such as RMSE and MAE will be used. Another topic of interest in the insurance industry is to process of producing the model. We will discuss at a high level the interactions between the different teams with an insurance company that needs to work together to produce a model and then monitor the performance of the model over time. Moreover, we will discuss the regulations in place in the insurance industry. Finally, we will discuss the maintenance of the model and the fact that new data does not come constantly and that some metrics can take a long time to become meaningful.Keywords: insurance, data science, modeling, monitoring, regulation, processes
Procedia PDF Downloads 7625174 A Text Classification Approach Based on Natural Language Processing and Machine Learning Techniques
Authors: Rim Messaoudi, Nogaye-Gueye Gning, François Azelart
Abstract:
Automatic text classification applies mostly natural language processing (NLP) and other AI-guided techniques to automatically classify text in a faster and more accurate manner. This paper discusses the subject of using predictive maintenance to manage incident tickets inside the sociality. It focuses on proposing a tool that treats and analyses comments and notes written by administrators after resolving an incident ticket. The goal here is to increase the quality of these comments. Additionally, this tool is based on NLP and machine learning techniques to realize the textual analytics of the extracted data. This approach was tested using real data taken from the French National Railways (SNCF) company and was given a high-quality result.Keywords: machine learning, text classification, NLP techniques, semantic representation
Procedia PDF Downloads 10325173 BIM Data and Digital Twin Framework: Preserving the Past and Predicting the Future
Authors: Mazharuddin Syed Ahmed
Abstract:
This research presents a framework used to develop The Ara Polytechnic College of Architecture Studies building “Kahukura” which is Green Building certified. This framework integrates the development of a smart building digital twin by utilizing Building Information Modelling (BIM) and its BIM maturity levels, including Levels of Development (LOD), eight dimensions of BIM, Heritage-BIM (H-BIM) and Facility Management BIM (FM BIM). The research also outlines a structured approach to building performance analysis and integration with the circular economy, encapsulated within a five-level digital twin framework. Starting with Level 1, the Descriptive Twin provides a live, editable visual replica of the built asset, allowing for specific data inclusion and extraction. Advancing to Level 2, the Informative Twin integrates operational and sensory data, enhancing data verification and system integration. At Level 3, the Predictive Twin utilizes operational data to generate insights and proactive management suggestions. Progressing to Level 4, the Comprehensive Twin simulates future scenarios, enabling robust “what-if” analyses. Finally, Level 5, the Autonomous Twin, represents the pinnacle of digital twin evolution, capable of learning and autonomously acting on behalf of users.Keywords: building information modelling, circular economy integration, digital twin, predictive analytics
Procedia PDF Downloads 4425172 Post Pandemic Mobility Analysis through Indexing and Sharding in MongoDB: Performance Optimization and Insights
Authors: Karan Vishavjit, Aakash Lakra, Shafaq Khan
Abstract:
The COVID-19 pandemic has pushed healthcare professionals to use big data analytics as a vital tool for tracking and evaluating the effects of contagious viruses. To effectively analyze huge datasets, efficient NoSQL databases are needed. The analysis of post-COVID-19 health and well-being outcomes and the evaluation of the effectiveness of government efforts during the pandemic is made possible by this research’s integration of several datasets, which cuts down on query processing time and creates predictive visual artifacts. We recommend applying sharding and indexing technologies to improve query effectiveness and scalability as the dataset expands. Effective data retrieval and analysis are made possible by spreading the datasets into a sharded database and doing indexing on individual shards. Analysis of connections between governmental activities, poverty levels, and post-pandemic well being is the key goal. We want to evaluate the effectiveness of governmental initiatives to improve health and lower poverty levels. We will do this by utilising advanced data analysis and visualisations. The findings provide relevant data that supports the advancement of UN sustainable objectives, future pandemic preparation, and evidence-based decision-making. This study shows how Big Data and NoSQL databases may be used to address problems with global health.Keywords: big data, COVID-19, health, indexing, NoSQL, sharding, scalability, well being
Procedia PDF Downloads 7125171 Predictive Analytics in Traffic Flow Management: Integrating Temporal Dynamics and Traffic Characteristics to Estimate Travel Time
Authors: Maria Ezziani, Rabie Zine, Amine Amar, Ilhame Kissani
Abstract:
This paper introduces a predictive model for urban transportation engineering, which is vital for efficient traffic management. Utilizing comprehensive datasets and advanced statistical techniques, the model accurately forecasts travel times by considering temporal variations and traffic dynamics. Machine learning algorithms, including regression trees and neural networks, are employed to capture sequential dependencies. Results indicate significant improvements in predictive accuracy, particularly during peak hours and holidays, with the incorporation of traffic flow and speed variables. Future enhancements may integrate weather conditions and traffic incidents. The model's applications range from adaptive traffic management systems to route optimization algorithms, facilitating congestion reduction and enhancing journey reliability. Overall, this research extends beyond travel time estimation, offering insights into broader transportation planning and policy-making realms, empowering stakeholders to optimize infrastructure utilization and improve network efficiency.Keywords: predictive analytics, traffic flow, travel time estimation, urban transportation, machine learning, traffic management
Procedia PDF Downloads 8525170 Critically Analyzing the Application of Big Data for Smart Transportation: A Case Study of Mumbai
Authors: Tanuj Joshi
Abstract:
Smart transportation is fast emerging as a solution to modern cities’ approach mobility issues, delayed emergency response rate and high congestion on streets. Present day scenario with Google Maps, Waze, Yelp etc. demonstrates how information and communications technologies controls the intelligent transportation system. This intangible and invisible infrastructure is largely guided by the big data analytics. On the other side, the exponential increase in Indian urban population has intensified the demand for better services and infrastructure to satisfy the transportation needs of its citizens. No doubt, India’s huge internet usage is looked as an important resource to guide to achieve this. However, with a projected number of over 40 billion objects connected to the Internet by 2025, the need for systems to handle massive volume of data (big data) also arises. This research paper attempts to identify the ways of exploiting the big data variables which will aid commuters on Indian tracks. This study explores real life inputs by conducting survey and interviews to identify which gaps need to be targeted to better satisfy the customers. Several experts at Mumbai Metropolitan Region Development Authority (MMRDA), Mumbai Metro and Brihanmumbai Electric Supply and Transport (BEST) were interviewed regarding the Information Technology (IT) systems currently in use. The interviews give relevant insights and requirements into the workings of public transportation systems whereas the survey investigates the macro situation.Keywords: smart transportation, mobility issue, Mumbai transportation, big data, data analysis
Procedia PDF Downloads 17925169 The Use of Rule-Based Cellular Automata to Track and Forecast the Dispersal of Classical Biocontrol Agents at Scale, with an Application to the Fopius arisanus Fruit Fly Parasitoid
Authors: Agboka Komi Mensah, John Odindi, Elfatih M. Abdel-Rahman, Onisimo Mutanga, Henri Ez Tonnang
Abstract:
Ecosystems are networks of organisms and populations that form a community of various species interacting within their habitats. Such habitats are defined by abiotic and biotic conditions that establish the initial limits to a population's growth, development, and reproduction. The habitat’s conditions explain the context in which species interact to access resources such as food, water, space, shelter, and mates, allowing for feeding, dispersal, and reproduction. Dispersal is an essential life-history strategy that affects gene flow, resource competition, population dynamics, and species distributions. Despite the importance of dispersal in population dynamics and survival, understanding the mechanism underpinning the dispersal of organisms remains challenging. For instance, when an organism moves into an ecosystem for survival and resource competition, its progression is highly influenced by extrinsic factors such as its physiological state, climatic variables and ability to evade predation. Therefore, greater spatial detail is necessary to understand organism dispersal dynamics. Understanding organisms dispersal can be addressed using empirical and mechanistic modelling approaches, with the adopted approach depending on the study's purpose Cellular automata (CA) is an example of these approaches that have been successfully used in biological studies to analyze the dispersal of living organisms. Cellular automata can be briefly described as occupied cells by an individual that evolves based on proper decisions based on a set of neighbours' rules. However, in the ambit of modelling individual organisms dispersal at the landscape scale, we lack user friendly tools that do not require expertise in mathematical models and computing ability; such as a visual analytics framework for tracking and forecasting the dispersal behaviour of organisms. The term "visual analytics" (VA) describes a semiautomated approach to electronic data processing that is guided by users who can interact with data via an interface. Essentially, VA converts large amounts of quantitative or qualitative data into graphical formats that can be customized based on the operator's needs. Additionally, this approach can be used to enhance the ability of users from various backgrounds to understand data, communicate results, and disseminate information across a wide range of disciplines. To support effective analysis of the dispersal of organisms at the landscape scale, we therefore designed Pydisp which is a free visual data analytics tool for spatiotemporal dispersal modeling built in Python. Its user interface allows users to perform a quick and interactive spatiotemporal analysis of species dispersal using bioecological and climatic data. Pydisp enables reuse and upgrade through the use of simple principles such as Fuzzy cellular automata algorithms. The potential of dispersal modeling is demonstrated in a case study by predicting the dispersal of Fopius arisanus (Sonan), endoparasitoids to control Bactrocera dorsalis (Hendel) (Diptera: Tephritidae) in Kenya. The results obtained from our example clearly illustrate the parasitoid's dispersal process at the landscape level and confirm that dynamic processes in an agroecosystem are better understood when designed using mechanistic modelling approaches. Furthermore, as demonstrated in the example, the built software is highly effective in portraying the dispersal of organisms despite the unavailability of detailed data on the species dispersal mechanisms.Keywords: cellular automata, fuzzy logic, landscape, spatiotemporal
Procedia PDF Downloads 7925168 Framework to Quantify Customer Experience
Authors: Anant Sharma, Ashwin Rajan
Abstract:
Customer experience is measured today based on defining a set of metrics and KPIs, setting up thresholds and defining triggers across those thresholds. While this is an effective way of measuring against a Key Performance Indicator ( referred to as KPI in the rest of the paper ), this approach cannot capture the various nuances that make up the overall customer experience. Customers consume a product or service at various levels, which is not reflected in metrics like Customer Satisfaction or Net Promoter Score, but also across other measurements like recurring revenue, frequency of service usage, e-learning and depth of usage. Here we explore an alternative method of measuring customer experience by flipping the traditional views. Rather than rolling customers up to a metric, we roll up metrics to hierarchies and then measure customer experience. This method allows any team to quantify customer experience across multiple touchpoints in a customer’s journey. We make use of various data sources which contain information for metrics like CXSAT, NPS, Renewals, and depths of service usage collected across a customer lifecycle. This data can be mined systematically to get linkages between different data points like geographies, business groups, products and time. Additional views can be generated by blending synthetic contexts into the data to show trends and top/bottom types of reports. We have created a framework that allows us to measure customer experience using the above logic.Keywords: analytics, customers experience, BI, business operations, KPIs, metrics
Procedia PDF Downloads 7525167 Wealth Creation and its Externalities: Evaluating Economic Growth and Corporate Social Responsibility
Authors: Zhikang Rong
Abstract:
The 4th industrial revolution has introduced technologies like interconnectivity, machine learning, and real-time big data analytics that improve operations and business efficiency. This paper examines how these advancements have led to a concentration of wealth, specifically among the top 1%, and investigates whether this wealth provides value to society. Through analyzing impacts on employment, productivity, supply-demand dynamics, and potential externalities, it is shown that successful businesspeople, by enhancing productivity and creating jobs, contribute positively to long-term economic growth. Additionally, externalities such as environmental degradation are managed by social entrepreneurship and government policies.Keywords: wealth creation, employment, productivity, social entrepreneurship
Procedia PDF Downloads 3325166 Gradient Boosted Trees on Spark Platform for Supervised Learning in Health Care Big Data
Authors: Gayathri Nagarajan, L. D. Dhinesh Babu
Abstract:
Health care is one of the prominent industries that generate voluminous data thereby finding the need of machine learning techniques with big data solutions for efficient processing and prediction. Missing data, incomplete data, real time streaming data, sensitive data, privacy, heterogeneity are few of the common challenges to be addressed for efficient processing and mining of health care data. In comparison with other applications, accuracy and fast processing are of higher importance for health care applications as they are related to the human life directly. Though there are many machine learning techniques and big data solutions used for efficient processing and prediction in health care data, different techniques and different frameworks are proved to be effective for different applications largely depending on the characteristics of the datasets. In this paper, we present a framework that uses ensemble machine learning technique gradient boosted trees for data classification in health care big data. The framework is built on Spark platform which is fast in comparison with other traditional frameworks. Unlike other works that focus on a single technique, our work presents a comparison of six different machine learning techniques along with gradient boosted trees on datasets of different characteristics. Five benchmark health care datasets are considered for experimentation, and the results of different machine learning techniques are discussed in comparison with gradient boosted trees. The metric chosen for comparison is misclassification error rate and the run time of the algorithms. The goal of this paper is to i) Compare the performance of gradient boosted trees with other machine learning techniques in Spark platform specifically for health care big data and ii) Discuss the results from the experiments conducted on datasets of different characteristics thereby drawing inference and conclusion. The experimental results show that the accuracy is largely dependent on the characteristics of the datasets for other machine learning techniques whereas gradient boosting trees yields reasonably stable results in terms of accuracy without largely depending on the dataset characteristics.Keywords: big data analytics, ensemble machine learning, gradient boosted trees, Spark platform
Procedia PDF Downloads 24125165 The Role of Business Incubation Centers (BICS) in Fostering Entrepreneurship in Pakistani Universities
Authors: Shah Hussain Awan
Abstract:
This study investigates the role of Business Incubation Centers (BICs) in fostering entrepreneurship in Pakistani universities. The high failure rate in new startups around the world has opened a challenging discussion. Though encouraging steps and strategies have been taken to overcome these challenges, still more aggressive action needs to be taken. However, Pakistan is one of the countries that promote entrepreneurship through BICs. The purpose of the present study is to develop a conceptual model that assesses the moderating impact of government policy on entrepreneurship development through business incubation centers of public and private universities. This area is under-researched, particularly in the context of Pakistan; therefore, this research may contribute to the existing body of knowledge to appraise the industry in Pakistan. The data collection procedure included a survey of stakeholders from private and public universities in Pakistan and was analyzed by SmartPLS 4.0. The findings show that Business incubation centers and government support have a significant impact on entrepreneurship development in Pakistan. Moderation analysis showed that for business incubation centers to be successful and effective, the government needs to implement efficient policies to inculcate entrepreneurship.Keywords: business incubation centers, government support, policies, startup performance, entrepreneurship development
Procedia PDF Downloads 5725164 Earnings Management from Taiwan Gisa Firms
Authors: An-an Chiu, Shaio Yan Huang, Ling-Na Chen, Wei-Hua Lin
Abstract:
Research has primarily focused on listed companies, less is done regarding small and medium-sized enterprises. Under the authorities' support, Taipei Exchange (TPEx) started Go Incubation Board for Startup and Acceleration Firms (GISA) in January 2014. This platform is designed to help small-sized innovative companies grow and to enter the capital market in the future. This research yield insight into earnings management activities around seasoned equity offerings (SEO) based on Taiwan’s GISA firms and the effectiveness of external corporate governance. Data for the study come from the GISA Market Observation Post System from January 2014 to December 2016. The result finds that GISA firms prone to upward accrual-based earnings management during SEO to avoid long-term negative consequences. Especially, firms with paid-in capital more than NT$ 30 million, higher fundraising amounts, or smaller-sized firms, tend to increase discretionary accruals. Finally, consistent with prior literature, CPA firms effectively serve as the role of external corporate governances on mitigating earnings management.Keywords: GISA, earnings management, CPA, seasoned equity offerings
Procedia PDF Downloads 14125163 Status Report of the GERDA Phase II Startup
Authors: Valerio D’Andrea
Abstract:
The GERmanium Detector Array (GERDA) experiment, located at the Laboratori Nazionali del Gran Sasso (LNGS) of INFN, searches for 0νββ of 76Ge. Germanium diodes enriched to ∼ 86 % in the double beta emitter 76Ge(enrGe) are exposed being both source and detectors of 0νββ decay. Neutrinoless double beta decay is considered a powerful probe to address still open issues in the neutrino sector of the (beyond) Standard Model of particle Physics. Since 2013, just after the completion of the first part of its experimental program (Phase I), the GERDA setup has been upgraded to perform its next step in the 0νββ searches (Phase II). Phase II aims to reach a sensitivity to the 0νββ decay half-life larger than 1026 yr in about 3 years of physics data taking. This exposing a detector mass of about 35 kg of enrGe and with a background index of about 10^−3 cts/(keV·kg·yr). One of the main new implementations is the liquid argon scintillation light read-out, to veto those events that only partially deposit their energy both in Ge and in the surrounding LAr. In this paper, the GERDA Phase II expected goals, the upgrade work and few selected features from the 2015 commissioning and 2016 calibration runs will be presented. The main Phase I achievements will be also reviewed.Keywords: gerda, double beta decay, LNGS, germanium
Procedia PDF Downloads 36825162 Ranking Priorities for Digital Health in Portugal: Aligning Health Managers’ Perceptions with Official Policy Perspectives
Authors: Pedro G. Rodrigues, Maria J. Bárrios, Sara A. Ambrósio
Abstract:
The digitalisation of health is a profoundly transformative economic, political, and social process. As is often the case, such processes need to be carefully managed if misunderstandings, policy misalignments, or outright conflicts between the government and a wide gamut of stakeholders with competing interests are to be avoided. Thus, ensuring open lines of communication where all parties know what each other’s concerns are is key to good governance, as well as efficient and effective policymaking. This project aims to make a small but still significant contribution in this regard in that we seek to determine the extent to which health managers’ perceptions of what is a priority for digital health in Portugal are aligned with official policy perspectives. By applying state-of-the-art artificial intelligence technology first to the indexed literature on digital health and then to a set of official policy documents on the same topic, followed by a survey directed at health managers working in public and private hospitals in Portugal, we obtain two priority rankings that, when compared, will allow us to produce a synthesis and toolkit on digital health policy in Portugal, with a view to identifying areas of policy convergence and divergence. This project is also particularly peculiar in the sense that sophisticated digital methods related to text analytics are employed to study good governance aspects of digitalisation applied to health care.Keywords: digital health, health informatics, text analytics, governance, natural language understanding
Procedia PDF Downloads 6625161 Predicting Loss of Containment in Surface Pipeline using Computational Fluid Dynamics and Supervised Machine Learning Model to Improve Process Safety in Oil and Gas Operations
Authors: Muhammmad Riandhy Anindika Yudhy, Harry Patria, Ramadhani Santoso
Abstract:
Loss of containment is the primary hazard that process safety management is concerned within the oil and gas industry. Escalation to more serious consequences all begins with the loss of containment, starting with oil and gas release from leakage or spillage from primary containment resulting in pool fire, jet fire and even explosion when reacted with various ignition sources in the operations. Therefore, the heart of process safety management is avoiding loss of containment and mitigating its impact through the implementation of safeguards. The most effective safeguard for the case is an early detection system to alert Operations to take action prior to a potential case of loss of containment. The detection system value increases when applied to a long surface pipeline that is naturally difficult to monitor at all times and is exposed to multiple causes of loss of containment, from natural corrosion to illegal tapping. Based on prior researches and studies, detecting loss of containment accurately in the surface pipeline is difficult. The trade-off between cost-effectiveness and high accuracy has been the main issue when selecting the traditional detection method. The current best-performing method, Real-Time Transient Model (RTTM), requires analysis of closely positioned pressure, flow and temperature (PVT) points in the pipeline to be accurate. Having multiple adjacent PVT sensors along the pipeline is expensive, hence generally not a viable alternative from an economic standpoint.A conceptual approach to combine mathematical modeling using computational fluid dynamics and a supervised machine learning model has shown promising results to predict leakage in the pipeline. Mathematical modeling is used to generate simulation data where this data is used to train the leak detection and localization models. Mathematical models and simulation software have also been shown to provide comparable results with experimental data with very high levels of accuracy. While the supervised machine learning model requires a large training dataset for the development of accurate models, mathematical modeling has been shown to be able to generate the required datasets to justify the application of data analytics for the development of model-based leak detection systems for petroleum pipelines. This paper presents a review of key leak detection strategies for oil and gas pipelines, with a specific focus on crude oil applications, and presents the opportunities for the use of data analytics tools and mathematical modeling for the development of robust real-time leak detection and localization system for surface pipelines. A case study is also presented.Keywords: pipeline, leakage, detection, AI
Procedia PDF Downloads 19325160 Enhancing Predictive Accuracy in Pharmaceutical Sales through an Ensemble Kernel Gaussian Process Regression Approach
Authors: Shahin Mirshekari, Mohammadreza Moradi, Hossein Jafari, Mehdi Jafari, Mohammad Ensaf
Abstract:
This research employs Gaussian Process Regression (GPR) with an ensemble kernel, integrating Exponential Squared, Revised Matern, and Rational Quadratic kernels to analyze pharmaceutical sales data. Bayesian optimization was used to identify optimal kernel weights: 0.76 for Exponential Squared, 0.21 for Revised Matern, and 0.13 for Rational Quadratic. The ensemble kernel demonstrated superior performance in predictive accuracy, achieving an R² score near 1.0, and significantly lower values in MSE, MAE, and RMSE. These findings highlight the efficacy of ensemble kernels in GPR for predictive analytics in complex pharmaceutical sales datasets.Keywords: Gaussian process regression, ensemble kernels, bayesian optimization, pharmaceutical sales analysis, time series forecasting, data analysis
Procedia PDF Downloads 7125159 Insight-Based Evaluation of a Map-Based Dashboard
Authors: Anna Fredriksson Häägg, Charlotte Weil, Niklas Rönnberg
Abstract:
Map-based dashboards are used for data exploration every day. The present study used an insight-based methodology for evaluating a map-based dashboard that presents research findings of water management and ecosystem services in the Amazon. In addition to analyzing the insights gained from using the dashboard, the evaluation method was compared to standardized questionnaires and task-based evaluations. The result suggests that the dashboard enabled the participants to gain domain-relevant, complex insights regarding the topic presented. Furthermore, the insight-based analysis highlighted unexpected insights and hypotheses regarding causes and potential adaptation strategies for remediation. Although time- and resource-consuming, the insight-based methodology was shown to have the potential of thoroughly analyzing how end users can utilize map-based dashboards for data exploration and decision making. Finally, the insight-based methodology is argued to evaluate tools in scenarios more similar to real-life usage compared to task-based evaluation methods.Keywords: visual analytics, dashboard, insight-based evaluation, geographic visualization
Procedia PDF Downloads 11625158 Q-Map: Clinical Concept Mining from Clinical Documents
Authors: Sheikh Shams Azam, Manoj Raju, Venkatesh Pagidimarri, Vamsi Kasivajjala
Abstract:
Over the past decade, there has been a steep rise in the data-driven analysis in major areas of medicine, such as clinical decision support system, survival analysis, patient similarity analysis, image analytics etc. Most of the data in the field are well-structured and available in numerical or categorical formats which can be used for experiments directly. But on the opposite end of the spectrum, there exists a wide expanse of data that is intractable for direct analysis owing to its unstructured nature which can be found in the form of discharge summaries, clinical notes, procedural notes which are in human written narrative format and neither have any relational model nor any standard grammatical structure. An important step in the utilization of these texts for such studies is to transform and process the data to retrieve structured information from the haystack of irrelevant data using information retrieval and data mining techniques. To address this problem, the authors present Q-Map in this paper, which is a simple yet robust system that can sift through massive datasets with unregulated formats to retrieve structured information aggressively and efficiently. It is backed by an effective mining technique which is based on a string matching algorithm that is indexed on curated knowledge sources, that is both fast and configurable. The authors also briefly examine its comparative performance with MetaMap, one of the most reputed tools for medical concepts retrieval and present the advantages the former displays over the latter.Keywords: information retrieval, unified medical language system, syntax based analysis, natural language processing, medical informatics
Procedia PDF Downloads 13525157 The Impact of AI on Higher Education
Authors: Georges Bou Ghantous
Abstract:
This literature review examines the transformative impact of Artificial Intelligence (AI) on higher education, highlighting both the potential benefits and challenges associated with its adoption. The review reveals that AI significantly enhances personalized learning by tailoring educational experiences to individual student needs, thereby boosting engagement and learning outcomes. Automated grading systems streamline assessment processes, allowing educators to focus on improving instructional quality and student interaction. AI's data-driven insights provide valuable analytics, helping educators identify trends in at-risk students and refine teaching strategies. Moreover, AI promotes enhanced instructional innovation through the adoption of advanced teaching methods and technologies, enriching the educational environment. Administrative efficiency is also improved as AI automates routine tasks, freeing up time for educators to engage in research and curriculum development. However, the review also addresses the challenges that accompany AI integration, such as data privacy concerns, algorithmic bias, dependency on technology, reduced human interaction, and ethical dilemmas. This balanced exploration underscores the need for careful consideration of both the advantages and potential hurdles in the implementation of AI in higher education.Keywords: administrative efficiency, data-driven insights, data privacy, ethical dilemmas, higher education, personalized learning
Procedia PDF Downloads 2825156 Application of Deep Learning Algorithms in Agriculture: Early Detection of Crop Diseases
Authors: Manaranjan Pradhan, Shailaja Grover, U. Dinesh Kumar
Abstract:
Farming community in India, as well as other parts of the world, is one of the highly stressed communities due to reasons such as increasing input costs (cost of seeds, fertilizers, pesticide), droughts, reduced revenue leading to farmer suicides. Lack of integrated farm advisory system in India adds to the farmers problems. Farmers need right information during the early stages of crop’s lifecycle to prevent damage and loss in revenue. In this paper, we use deep learning techniques to develop an early warning system for detection of crop diseases using images taken by farmers using their smart phone. The research work leads to building a smart assistant using analytics and big data which could help the farmers with early diagnosis of the crop diseases and corrective actions. The classical approach for crop disease management has been to identify diseases at crop level. Recently, ImageNet Classification using the convolutional neural network (CNN) has been successfully used to identify diseases at individual plant level. Our model uses convolution filters, max pooling, dense layers and dropouts (to avoid overfitting). The models are built for binary classification (healthy or not healthy) and multi class classification (identifying which disease). Transfer learning is used to modify the weights of parameters learnt through ImageNet dataset and apply them on crop diseases, which reduces number of epochs to learn. One shot learning is used to learn from very few images, while data augmentation techniques are used to improve accuracy with images taken from farms by using techniques such as rotation, zoom, shift and blurred images. Models built using combination of these techniques are more robust for deploying in the real world. Our model is validated using tomato crop. In India, tomato is affected by 10 different diseases. Our model achieves an accuracy of more than 95% in correctly classifying the diseases. The main contribution of our research is to create a personal assistant for farmers for managing plant disease, although the model was validated using tomato crop, it can be easily extended to other crops. The advancement of technology in computing and availability of large data has made possible the success of deep learning applications in computer vision, natural language processing, image recognition, etc. With these robust models and huge smartphone penetration, feasibility of implementation of these models is high resulting in timely advise to the farmers and thus increasing the farmers' income and reducing the input costs.Keywords: analytics in agriculture, CNN, crop disease detection, data augmentation, image recognition, one shot learning, transfer learning
Procedia PDF Downloads 12025155 Housing Price Prediction Using Machine Learning Algorithms: The Case of Melbourne City, Australia
Authors: The Danh Phan
Abstract:
House price forecasting is a main topic in the real estate market research. Effective house price prediction models could not only allow home buyers and real estate agents to make better data-driven decisions but may also be beneficial for the property policymaking process. This study investigates the housing market by using machine learning techniques to analyze real historical house sale transactions in Australia. It seeks useful models which could be deployed as an application for house buyers and sellers. Data analytics show a high discrepancy between the house price in the most expensive suburbs and the most affordable suburbs in the city of Melbourne. In addition, experiments demonstrate that the combination of Stepwise and Support Vector Machine (SVM), based on the Mean Squared Error (MSE) measurement, consistently outperforms other models in terms of prediction accuracy.Keywords: house price prediction, regression trees, neural network, support vector machine, stepwise
Procedia PDF Downloads 23225154 Six Failure Points Innovators and Entrepreneurs Risk Falling into: An Exploratory Study of Underlying Emotions and Behaviors of Self- Perceived Failure
Authors: Katarzyna Niewiadomska
Abstract:
Many technology startups fail to achieve a worthwhile return on investment for their funders, founders, and employees. Failures in product development, to-market strategy, sales, and delivery are commonly recognized. Founder failures are not as obvious and harder to identify. This paper explores six critical failure points that entrepreneurs and innovators are susceptible to and aims to link their emotional intelligence and behavioral profile to the points at which they experienced self-perceived failure. A model of six failure points from the perspective of the technology entrepreneur ranging from pre-startup to maturity is provided. By analyzing emotional and behavioral profile data from entrepreneurs and recording in-person accounts, certain key emotional and behavioral clusters contributing to each failure point are determined, and several underlying factors are defined and discussed. Recommendations that support entrepreneurs and innovators stalling at each failure point are given. This work can enable stakeholders to evaluate founder emotional and behavioral profiles and to take risk-mitigating action, either through coaching or through more robust team creation, to avoid founder-related company failure. The paper will be of interest to investors funding startups, executives leading them and mentors supporting them.Keywords: behavior, emotional intelligence, entrepreneur, failure
Procedia PDF Downloads 23025153 Training AI to Be Empathetic and Determining the Psychotype of a Person During a Conversation with a Chatbot
Authors: Aliya Grig, Konstantin Sokolov, Igor Shatalin
Abstract:
The report describes the methodology for collecting data and building an ML model for determining the personality psychotype using profiling and personality traits methods based on several short messages of a user communicating on an arbitrary topic with a chitchat bot. In the course of the experiments, the minimum amount of text was revealed to confidently determine aspects of personality. Model accuracy - 85%. Users' language of communication is English. AI for a personalized communication with a user based on his mood, personality, and current emotional state. Features investigated during the research: personalized communication; providing empathy; adaptation to a user; predictive analytics. In the report, we describe the processes that captures both structured and unstructured data pertaining to a user in large quantities and diverse forms. This data is then effectively processed through ML tools to construct a knowledge graph and draw inferences regarding users of text messages in a comprehensive manner. Specifically, the system analyzes users' behavioral patterns and predicts future scenarios based on this analysis. As a result of the experiments, we provide for further research on training AI models to be empathetic, creating personalized communication for a userKeywords: AI, empathetic, chatbot, AI models
Procedia PDF Downloads 94