Search results for: data analytics
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24163

Search results for: data analytics

24043 Enhancing Large Language Models' Data Analysis Capability with Planning-and-Execution and Code Generation Agents: A Use Case for Southeast Asia Real Estate Market Analytics

Authors: Kien Vu, Jien Min Soh, Mohamed Jahangir Abubacker, Piyawut Pattamanon, Soojin Lee, Suvro Banerjee

Abstract:

Recent advances in Generative Artificial Intelligence (GenAI), in particular Large Language Models (LLMs) have shown promise to disrupt multiple industries at scale. However, LLMs also present unique challenges, notably, these so-called "hallucination" which is the generation of outputs that are not grounded in the input data that hinders its adoption into production. Common practice to mitigate hallucination problem is utilizing Retrieval Agmented Generation (RAG) system to ground LLMs'response to ground truth. RAG converts the grounding documents into embeddings, retrieve the relevant parts with vector similarity between user's query and documents, then generates a response that is not only based on its pre-trained knowledge but also on the specific information from the retrieved documents. However, the RAG system is not suitable for tabular data and subsequent data analysis tasks due to multiple reasons such as information loss, data format, and retrieval mechanism. In this study, we have explored a novel methodology that combines planning-and-execution and code generation agents to enhance LLMs' data analysis capabilities. The approach enables LLMs to autonomously dissect a complex analytical task into simpler sub-tasks and requirements, then convert them into executable segments of code. In the final step, it generates the complete response from output of the executed code. When deployed beta version on DataSense, the property insight tool of PropertyGuru, the approach yielded promising results, as it was able to provide market insights and data visualization needs with high accuracy and extensive coverage by abstracting the complexities for real-estate agents and developers from non-programming background. In essence, the methodology not only refines the analytical process but also serves as a strategic tool for real estate professionals, aiding in market understanding and enhancement without the need for programming skills. The implication extends beyond immediate analytics, paving the way for a new era in the real estate industry characterized by efficiency and advanced data utilization.

Keywords: large language model, reasoning, planning and execution, code generation, natural language processing, prompt engineering, data analysis, real estate, data sense, PropertyGuru

Procedia PDF Downloads 34
24042 Intellectual Property in Digital Environment

Authors: Balamurugan L.

Abstract:

Artificial intelligence (AI) and its applications in Intellectual Property Rights (IPR) has been significantly growing in recent years. In last couple of years, AI tools for Patent Research and Patent Analytics have been well-stabilized in terms of accuracy of references and representation of identified patent insights. However, AI tools for Patent Prosecution and Patent Litigation are still in the nascent stage and there may be a significant potential if such market is explored further. Our research is primarily focused on identifying potential whitespaces and schematic algorithms to automate the Patent Prosecution and Patent Litigation Process of the Intellectual Property. The schematic algorithms may assist leading AI tool developers, to explore such opportunities in the field of Intellectual Property. Our research is also focused on identification of pitfalls of the AI. For example, Information Security and its impact in IPR, and Potential remediations to sustain the IPR in the digital environment.

Keywords: artificial intelligence, patent analytics, patent drafting, patent litigation, patent prosecution, patent research

Procedia PDF Downloads 37
24041 Optimizing Data Integration and Management Strategies for Upstream Oil and Gas Operations

Authors: Deepak Singh, Rail Kuliev

Abstract:

The abstract highlights the critical importance of optimizing data integration and management strategies in the upstream oil and gas industry. With its complex and dynamic nature generating vast volumes of data, efficient data integration and management are essential for informed decision-making, cost reduction, and maximizing operational performance. Challenges such as data silos, heterogeneity, real-time data management, and data quality issues are addressed, prompting the proposal of several strategies. These strategies include implementing a centralized data repository, adopting industry-wide data standards, employing master data management (MDM), utilizing real-time data integration technologies, and ensuring data quality assurance. Training and developing the workforce, “reskilling and upskilling” the employees and establishing robust Data Management training programs play an essential role and integral part in this strategy. The article also emphasizes the significance of data governance and best practices, as well as the role of technological advancements such as big data analytics, cloud computing, Internet of Things (IoT), and artificial intelligence (AI) and machine learning (ML). To illustrate the practicality of these strategies, real-world case studies are presented, showcasing successful implementations that improve operational efficiency and decision-making. In present study, by embracing the proposed optimization strategies, leveraging technological advancements, and adhering to best practices, upstream oil and gas companies can harness the full potential of data-driven decision-making, ultimately achieving increased profitability and a competitive edge in the ever-evolving industry.

Keywords: master data management, IoT, AI&ML, cloud Computing, data optimization

Procedia PDF Downloads 36
24040 Generating Real-Time Visual Summaries from Located Sensor-Based Data with Chorems

Authors: Z. Bouattou, R. Laurini, H. Belbachir

Abstract:

This paper describes a new approach for the automatic generation of the visual summaries dealing with cartographic visualization methods and sensors real time data modeling. Hence, the concept of chorems seems an interesting candidate to visualize real time geographic database summaries. Chorems have been defined by Roger Brunet (1980) as schematized visual representations of territories. However, the time information is not yet handled in existing chorematic map approaches, issue has been discussed in this paper. Our approach is based on spatial analysis by interpolating the values recorded at the same time, by sensors available, so we have a number of distributed observations on study areas and used spatial interpolation methods to find the concentration fields, from these fields and by using some spatial data mining procedures on the fly, it is possible to extract important patterns as geographic rules. Then, those patterns are visualized as chorems.

Keywords: geovisualization, spatial analytics, real-time, geographic data streams, sensors, chorems

Procedia PDF Downloads 368
24039 Indian Premier League (IPL) Score Prediction: Comparative Analysis of Machine Learning Models

Authors: Rohini Hariharan, Yazhini R, Bhamidipati Naga Shrikarti

Abstract:

In the realm of cricket, particularly within the context of the Indian Premier League (IPL), the ability to predict team scores accurately holds significant importance for both cricket enthusiasts and stakeholders alike. This paper presents a comprehensive study on IPL score prediction utilizing various machine learning algorithms, including Support Vector Machines (SVM), XGBoost, Multiple Regression, Linear Regression, K-nearest neighbors (KNN), and Random Forest. Through meticulous data preprocessing, feature engineering, and model selection, we aimed to develop a robust predictive framework capable of forecasting team scores with high precision. Our experimentation involved the analysis of historical IPL match data encompassing diverse match and player statistics. Leveraging this data, we employed state-of-the-art machine learning techniques to train and evaluate the performance of each model. Notably, Multiple Regression emerged as the top-performing algorithm, achieving an impressive accuracy of 77.19% and a precision of 54.05% (within a threshold of +/- 10 runs). This research contributes to the advancement of sports analytics by demonstrating the efficacy of machine learning in predicting IPL team scores. The findings underscore the potential of advanced predictive modeling techniques to provide valuable insights for cricket enthusiasts, team management, and betting agencies. Additionally, this study serves as a benchmark for future research endeavors aimed at enhancing the accuracy and interpretability of IPL score prediction models.

Keywords: indian premier league (IPL), cricket, score prediction, machine learning, support vector machines (SVM), xgboost, multiple regression, linear regression, k-nearest neighbors (KNN), random forest, sports analytics

Procedia PDF Downloads 6
24038 Microclimate Variations in Rio de Janeiro Related to Massive Public Transportation

Authors: Marco E. O. Jardim, Frederico A. M. Souza, Valeria M. Bastos, Myrian C. A. Costa, Nelson F. F. Ebecken

Abstract:

Urban public transportation in Rio de Janeiro is based on bus lines, powered by diesel, and four limited metro lines that support only some neighborhoods. This work presents an infrastructure built to better understand microclimate variations related to massive urban transportation in some specific areas of the city. The use of sensor nodes with small analytics capacity provides environmental information to population or public services. The analyses of data collected from a few small sensors positioned near some heavy traffic streets show the harmful impact due to poor bus route plan.

Keywords: big data, IoT, public transportation, public health system

Procedia PDF Downloads 212
24037 Evolution of Approaches to Cost Calculation in the Conditions of the Modern Russian Economy

Authors: Elena Tkachenko, Vladimir Kokh, Alina Osipenko, Vladislav Surkov

Abstract:

The modern period of development of Russian economy is fraught with a number of problems related to limitations in the use of traditional planning and financial management tools. Restrictions in the use of foreign software when performing an order of the Russian Government, on the one hand, and sanctions limiting the support of the major ERP and MRP II systems in the Russian Federation, on the other hand, entail the necessity to appeal to the basics of developing budgeting and analysis systems for industrial enterprises. Thus, cost calculation theory becomes the theoretical foundation for the development of industrial cost management systems. Based on the foregoing, it would be fair to make an assumption that the development of a working managerial accounting model on an industrial enterprise using an automated enterprise resource management system should rest upon the concept of the inevitability of alterations of business processes. On the other hand, optimized business processes make the architecture of financial analytics more transparent and permit the use of all the benefits of data cubes. The metrics and indicator slices provide online assessment of the state of key business processes at a given moment of time, which improves the quality of managerial decisions considerably. Therefore, the bilateral sanctions situation boosted the development of corporate business analytics and took industrial companies to the next level of understanding of business processes.

Keywords: cost culculation, ERP, OLAP, modern Russian economy

Procedia PDF Downloads 191
24036 Employing a Knime-based and Open-source Tools to Identify AMI and VER Metabolites from UPLC-MS Data

Authors: Nouf Alourfi

Abstract:

This study examines the metabolism of amitriptyline (AMI) and verapamil (VER) using a KNIME-based method. KNIME improved workflow is an open-source data-analytics platform that integrates a number of open-source metabolomics tools such as CFMID and MetFrag to provide standard data visualisations, predict candidate metabolites, assess them against experimental data, and produce reports on identified metabolites. The use of this workflow is demonstrated by employing three types of liver microsomes (human, rat, and Guinea pig) to study the in vitro metabolism of the two drugs (AMI and VER). This workflow is used to create and treat UPLC-MS (Orbitrap) data. The formulas and structures of these drugs' metabolites can be assigned automatically. The key metabolic routes for amitriptyline are hydroxylation, N-dealkylation, N-oxidation, and conjugation, while N-demethylation, O-demethylation and N-dealkylation, and conjugation are the primary metabolic routes for verapamil. The identified metabolites are compatible to the published, clarifying the solidity of the workflow technique and the usage of computational tools like KNIME in supporting the integration and interoperability of emerging novel software packages in the metabolomics area.

Keywords: KNIME, CFMID, MetFrag, Data Analysis, Metabolomics

Procedia PDF Downloads 87
24035 The Role of Technology in Transforming the Finance, Banking, and Insurance Sectors

Authors: Farid Fahami

Abstract:

This article explores the transformative role of technology in the finance, banking, and insurance sectors. It examines key technological trends such as AI, blockchain, data analytics, and digital platforms and their impact on operations, customer experiences, and business models. The article highlights the benefits of technology adoption, including improved efficiency, cost reduction, enhanced customer experiences, and expanded financial inclusion. It also addresses challenges like cybersecurity, data privacy, and the need for upskilling. Real-world case studies demonstrate successful technology integration, and recommendations for stakeholders emphasize embracing innovation and collaboration. The article concludes by emphasizing the importance of technology in shaping the future of these sectors.

Keywords: banking, finance, insurance, technology

Procedia PDF Downloads 43
24034 High School Gain Analytics From National Assessment Program – Literacy and Numeracy and Australian Tertiary Admission Rankin Linkage

Authors: Andrew Laming, John Hattie, Mark Wilson

Abstract:

Nine Queensland Independent high schools provided deidentified student-matched ATAR and NAPLAN data for all 1217 ATAR graduates since 2020 who also sat NAPLAN at the school. Graduating cohorts from the nine schools contained a mean 100 ATAR graduates with previous NAPLAN data from their school. Excluded were vocational students (mean=27) and any ATAR graduates without NAPLAN data (mean=20). Based on Index of Community Socio-Educational Access (ICSEA) prediction, all schools had larger that predicted proportions of their students graduating with ATARs. There were an additional 173 students not releasing their ATARs to their school (14%), requiring this data to be inferred by schools. Gain was established by first converting each student’s strongest NAPLAN domain to a statewide percentile, then subtracting this result from final ATAR. The resulting ‘percentile shift’ was corrected for plausible ATAR participation at each NAPLAN level. Strongest NAPLAN domain had the highest correlation with ATAR (R2=0.58). RESULTS School mean NAPLAN scores fitted ICSEA closely (R2=0.97). Schools achieved a mean cohort gain of two ATAR rankings, but only 66% of students gained. This ranged from 46% of top-NAPLAN decile students gaining, rising to 75% achieving gains outside the top decile. The 54% of top-decile students whose ATAR fell short of prediction lost a mean 4.0 percentiles (or 6.2 percentiles prior to correction for regression to the mean). 71% of students in smaller schools gained, compared to 63% in larger schools. NAPLAN variability in each of the 13 ICSEA1100 cohorts was 17%, with both intra-school and inter-school variation of these values extremely low (0.3% to 1.8%). Mean ATAR change between years in each school was just 1.1 ATAR ranks. This suggests consecutive school cohorts and ICSEA-similar schools share very similar distributions and outcomes over time. Quantile analysis of the NAPLAN/ATAR revealed heteroscedasticity, but splines offered little additional benefit over simple linear regression. The NAPLAN/ATAR R2 was 0.33. DISCUSSION Standardised data like NAPLAN and ATAR offer educators a simple no-cost progression metric to analyse performance in conjunction with their internal test results. Change is expressed in percentiles, or ATAR shift per student, which is layperson intuitive. Findings may also reduce ATAR/vocational stream mismatch, reveal proportions of cohorts meeting or falling short of expectation and demonstrate by how much. Finally, ‘crashed’ ATARs well below expectation are revealed, which schools can reasonably work to minimise. The percentile shift method is neither value-add nor a growth percentile. In the absence of exit NAPLAN testing, this metric is unable to discriminate academic gain from legitimate ATAR-maximizing strategies. But by controlling for ICSEA, ATAR proportion variation and student mobility, it uncovers progression to ATAR metrics which are not currently publicly available. However achieved, ATAR maximisation is a sought-after private good. So long as standardised nationwide data is available, this analysis offers useful analytics for educators and reasonable predictivity when counselling subsequent cohorts about their ATAR prospects.  

Keywords: NAPLAN, ATAR, analytics, measurement, gain, performance, data, percentile, value-added, high school, numeracy, reading comprehension, variability, regression to the mean

Procedia PDF Downloads 34
24033 Big Data in Construction Project Management: The Colombian Northeast Case

Authors: Sergio Zabala-Vargas, Miguel Jiménez-Barrera, Luz VArgas-Sánchez

Abstract:

In recent years, information related to project management in organizations has been increasing exponentially. Performance data, management statistics, indicator results have forced the collection, analysis, traceability, and dissemination of project managers to be essential. In this sense, there are current trends to facilitate efficient decision-making in emerging technology projects, such as: Machine Learning, Data Analytics, Data Mining, and Big Data. The latter is the most interesting in this project. This research is part of the thematic line Construction methods and project management. Many authors present the relevance that the use of emerging technologies, such as Big Data, has taken in recent years in project management in the construction sector. The main focus is the optimization of time, scope, budget, and in general mitigating risks. This research was developed in the northeastern region of Colombia-South America. The first phase was aimed at diagnosing the use of emerging technologies (Big-Data) in the construction sector. In Colombia, the construction sector represents more than 50% of the productive system, and more than 2 million people participate in this economic segment. The quantitative approach was used. A survey was applied to a sample of 91 companies in the construction sector. Preliminary results indicate that the use of Big Data and other emerging technologies is very low and also that there is interest in modernizing project management. There is evidence of a correlation between the interest in using new data management technologies and the incorporation of Building Information Modeling BIM. The next phase of the research will allow the generation of guidelines and strategies for the incorporation of technological tools in the construction sector in Colombia.

Keywords: big data, building information modeling, tecnology, project manamegent

Procedia PDF Downloads 96
24032 Choosing an Optimal Epsilon for Differentially Private Arrhythmia Analysis

Authors: Arin Ghazarian, Cyril Rakovski

Abstract:

Differential privacy has become the leading technique to protect the privacy of individuals in a database while allowing useful analysis to be done and the results to be shared. It puts a guarantee on the amount of privacy loss in the worst-case scenario. Differential privacy is not a toggle between full privacy and zero privacy. It controls the tradeoff between the accuracy of the results and the privacy loss using a single key parameter called

Keywords: arrhythmia, cardiology, differential privacy, ECG, epsilon, medi-cal data, privacy preserving analytics, statistical databases

Procedia PDF Downloads 118
24031 Structuring and Visualizing Healthcare Claims Data Using Systems Architecture Methodology

Authors: Inas S. Khayal, Weiping Zhou, Jonathan Skinner

Abstract:

Healthcare delivery systems around the world are in crisis. The need to improve health outcomes while decreasing healthcare costs have led to an imminent call to action to transform the healthcare delivery system. While Bioinformatics and Biomedical Engineering have primarily focused on biological level data and biomedical technology, there is clear evidence of the importance of the delivery of care on patient outcomes. Classic singular decomposition approaches from reductionist science are not capable of explaining complex systems. Approaches and methods from systems science and systems engineering are utilized to structure healthcare delivery system data. Specifically, systems architecture is used to develop a multi-scale and multi-dimensional characterization of the healthcare delivery system, defined here as the Healthcare Delivery System Knowledge Base. This paper is the first to contribute a new method of structuring and visualizing a multi-dimensional and multi-scale healthcare delivery system using systems architecture in order to better understand healthcare delivery.

Keywords: health informatics, systems thinking, systems architecture, healthcare delivery system, data analytics

Procedia PDF Downloads 313
24030 The Analyzer: Clustering Based System for Improving Business Productivity by Analyzing User Profiles to Enhance Human Computer Interaction

Authors: Dona Shaini Abhilasha Nanayakkara, Kurugamage Jude Pravinda Gregory Perera

Abstract:

E-commerce platforms have revolutionized the shopping experience, offering convenient ways for consumers to make purchases. To improve interactions with customers and optimize marketing strategies, it is essential for businesses to understand user behavior, preferences, and needs on these platforms. This paper focuses on recommending businesses to customize interactions with users based on their behavioral patterns, leveraging data-driven analysis and machine learning techniques. Businesses can improve engagement and boost the adoption of e-commerce platforms by aligning behavioral patterns with user goals of usability and satisfaction. We propose TheAnalyzer, a clustering-based system designed to enhance business productivity by analyzing user-profiles and improving human-computer interaction. The Analyzer seamlessly integrates with business applications, collecting relevant data points based on users' natural interactions without additional burdens such as questionnaires or surveys. It defines five key user analytics as features for its dataset, which are easily captured through users' interactions with e-commerce platforms. This research presents a study demonstrating the successful distinction of users into specific groups based on the five key analytics considered by TheAnalyzer. With the assistance of domain experts, customized business rules can be attached to each group, enabling The Analyzer to influence business applications and provide an enhanced personalized user experience. The outcomes are evaluated quantitatively and qualitatively, demonstrating that utilizing TheAnalyzer’s capabilities can optimize business outcomes, enhance customer satisfaction, and drive sustainable growth. The findings of this research contribute to the advancement of personalized interactions in e-commerce platforms. By leveraging user behavioral patterns and analyzing both new and existing users, businesses can effectively tailor their interactions to improve customer satisfaction, loyalty and ultimately drive sales.

Keywords: data clustering, data standardization, dimensionality reduction, human computer interaction, user profiling

Procedia PDF Downloads 35
24029 Video Analytics on Pedagogy Using Big Data

Authors: Jamuna Loganath

Abstract:

Education is the key to the development of any individual’s personality. Today’s students will be tomorrow’s citizens of the global society. The education of the student is the edifice on which his/her future will be built. Schools therefore should provide an all-round development of students so as to foster a healthy society. The behaviors and the attitude of the students in school play an essential role for the success of the education process. Frequent reports of misbehaviors such as clowning, harassing classmates, verbal insults are becoming common in schools today. If this issue is left unattended, it may develop a negative attitude and increase the delinquent behavior. So, the need of the hour is to find a solution to this problem. To solve this issue, it is important to monitor the students’ behaviors in school and give necessary feedback and mentor them to develop a positive attitude and help them to become a successful grownup. Nevertheless, measuring students’ behavior and attitude is extremely challenging. None of the present technology has proven to be effective in this measurement process because actions, reactions, interactions, response of the students are rarely used in the course of the data due to complexity. The purpose of this proposal is to recommend an effective supervising system after carrying out a feasibility study by measuring the behavior of the Students. This can be achieved by equipping schools with CCTV cameras. These CCTV cameras installed in various schools of the world capture the facial expressions and interactions of the students inside and outside their classroom. The real time raw videos captured from the CCTV can be uploaded to the cloud with the help of a network. The video feeds get scooped into various nodes in the same rack or on the different racks in the same cluster in Hadoop HDFS. The video feeds are converted into small frames and analyzed using various Pattern recognition algorithms and MapReduce algorithm. Then, the video frames are compared with the bench marking database (good behavior). When misbehavior is detected, an alert message can be sent to the counseling department which helps them in mentoring the students. This will help in improving the effectiveness of the education process. As Video feeds come from multiple geographical areas (schools from different parts of the world), BIG DATA helps in real time analysis as it analyzes computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions. It also analyzes data that can’t be analyzed by traditional software applications such as RDBMS, OODBMS. It has also proven successful in handling human reactions with ease. Therefore, BIG DATA could certainly play a vital role in handling this issue. Thus, effectiveness of the education process can be enhanced with the help of video analytics using the latest BIG DATA technology.

Keywords: big data, cloud, CCTV, education process

Procedia PDF Downloads 214
24028 Developing a Cloud Intelligence-Based Energy Management Architecture Facilitated with Embedded Edge Analytics for Energy Conservation in Demand-Side Management

Authors: Yu-Hsiu Lin, Wen-Chun Lin, Yen-Chang Cheng, Chia-Ju Yeh, Yu-Chuan Chen, Tai-You Li

Abstract:

Demand-Side Management (DSM) has the potential to reduce electricity costs and carbon emission, which are associated with electricity used in the modern society. A home Energy Management System (EMS) commonly used by residential consumers in a down-stream sector of a smart grid to monitor, control, and optimize energy efficiency to domestic appliances is a system of computer-aided functionalities as an energy audit for residential DSM. Implementing fault detection and classification to domestic appliances monitored, controlled, and optimized is one of the most important steps to realize preventive maintenance, such as residential air conditioning and heating preventative maintenance in residential/industrial DSM. In this study, a cloud intelligence-based green EMS that comes up with an Internet of Things (IoT) technology stack for residential DSM is developed. In the EMS, Arduino MEGA Ethernet communication-based smart sockets that module a Real Time Clock chip to keep track of current time as timestamps via Network Time Protocol are designed and implemented for readings of load phenomena reflecting on voltage and current signals sensed. Also, a Network-Attached Storage providing data access to a heterogeneous group of IoT clients via Hypertext Transfer Protocol (HTTP) methods is configured to data stores of parsed sensor readings. Lastly, a desktop computer with a WAMP software bundle (the Microsoft® Windows operating system, Apache HTTP Server, MySQL relational database management system, and PHP programming language) serves as a data science analytics engine for dynamic Web APP/REpresentational State Transfer-ful web service of the residential DSM having globally-Advanced Internet of Artificial Intelligence (AI)/Computational Intelligence. Where, an abstract computing machine, Java Virtual Machine, enables the desktop computer to run Java programs, and a mash-up of Java, R language, and Python is well-suited and -configured for AI in this study. Having the ability of sending real-time push notifications to IoT clients, the desktop computer implements Google-maintained Firebase Cloud Messaging to engage IoT clients across Android/iOS devices and provide mobile notification service to residential/industrial DSM. In this study, in order to realize edge intelligence that edge devices avoiding network latency and much-needed connectivity of Internet connections for Internet of Services can support secure access to data stores and provide immediate analytical and real-time actionable insights at the edge of the network, we upgrade the designed and implemented smart sockets to be embedded AI Arduino ones (called embedded AIduino). With the realization of edge analytics by the proposed embedded AIduino for data analytics, an Arduino Ethernet shield WizNet W5100 having a micro SD card connector is conducted and used. The SD library is included for reading parsed data from and writing parsed data to an SD card. And, an Artificial Neural Network library, ArduinoANN, for Arduino MEGA is imported and used for locally-embedded AI implementation. The embedded AIduino in this study can be developed for further applications in manufacturing industry energy management and sustainable energy management, wherein in sustainable energy management rotating machinery diagnostics works to identify energy loss from gross misalignment and unbalance of rotating machines in power plants as an example.

Keywords: demand-side management, edge intelligence, energy management system, fault detection and classification

Procedia PDF Downloads 222
24027 Data Monetisation by E-commerce Companies: A Need for a Regulatory Framework in India

Authors: Anushtha Saxena

Abstract:

This paper examines the process of data monetisation bye-commerce companies operating in India. Data monetisation is collecting, storing, and analysing consumers’ data to use further the data that is generated for profits, revenue, etc. Data monetisation enables e-commerce companies to get better businesses opportunities, innovative products and services, a competitive edge over others to the consumers, and generate millions of revenues. This paper analyses the issues and challenges that are faced due to the process of data monetisation. Some of the issues highlighted in the paper pertain to the right to privacy, protection of data of e-commerce consumers. At the same time, data monetisation cannot be prohibited, but it can be regulated and monitored by stringent laws and regulations. The right to privacy isa fundamental right guaranteed to the citizens of India through Article 21 of The Constitution of India. The Supreme Court of India recognized the Right to Privacy as a fundamental right in the landmark judgment of Justice K.S. Puttaswamy (Retd) and Another v. Union of India . This paper highlights the legal issue of how e-commerce businesses violate individuals’ right to privacy by using the data collected, stored by them for economic gains and monetisation and protection of data. The researcher has mainly focused on e-commerce companies like online shopping websitesto analyse the legal issue of data monetisation. In the Internet of Things and the digital age, people have shifted to online shopping as it is convenient, easy, flexible, comfortable, time-consuming, etc. But at the same time, the e-commerce companies store the data of their consumers and use it by selling to the third party or generating more data from the data stored with them. This violatesindividuals’ right to privacy because the consumers do not know anything while giving their data online. Many times, data is collected without the consent of individuals also. Data can be structured, unstructured, etc., that is used by analytics to monetise. The Indian legislation like The Information Technology Act, 2000, etc., does not effectively protect the e-consumers concerning their data and how it is used by e-commerce businesses to monetise and generate revenues from that data. The paper also examines the draft Data Protection Bill, 2021, pending in the Parliament of India, and how this Bill can make a huge impact on data monetisation. This paper also aims to study the European Union General Data Protection Regulation and how this legislation can be helpful in the Indian scenarioconcerning e-commerce businesses with respect to data monetisation.

Keywords: data monetization, e-commerce companies, regulatory framework, GDPR

Procedia PDF Downloads 81
24026 A Study on the New Weapon Requirements Analytics Using Simulations and Big Data

Authors: Won Il Jung, Gene Lee, Luis Rabelo

Abstract:

Since many weapon systems are getting more complex and diverse, various problems occur in terms of the acquisition cost, time, and performance limitation. As a matter of fact, the experiment execution in real world is costly, dangerous, and time-consuming to obtain Required Operational Characteristics (ROC) for a new weapon acquisition although enhancing the fidelity of experiment results. Also, until presently most of the research contained a large amount of assumptions so therefore a bias is present in the experiment results. At this moment, the new methodology is proposed to solve these problems without a variety of assumptions. ROC of the new weapon system is developed through the new methodology, which is a way to analyze big data generated by simulating various scenarios based on virtual and constructive models which are involving 6 Degrees of Freedom (6DoF). The new methodology enables us to identify unbiased ROC on new weapons by reducing assumptions and provide support in terms of the optimal weapon systems acquisition.

Keywords: big data, required operational characteristics (ROC), virtual and constructive models, weapon acquisition

Procedia PDF Downloads 261
24025 Machine Learning Facing Behavioral Noise Problem in an Imbalanced Data Using One Side Behavioral Noise Reduction: Application to a Fraud Detection

Authors: Salma El Hajjami, Jamal Malki, Alain Bouju, Mohammed Berrada

Abstract:

With the expansion of machine learning and data mining in the context of Big Data analytics, the common problem that affects data is class imbalance. It refers to an imbalanced distribution of instances belonging to each class. This problem is present in many real world applications such as fraud detection, network intrusion detection, medical diagnostics, etc. In these cases, data instances labeled negatively are significantly more numerous than the instances labeled positively. When this difference is too large, the learning system may face difficulty when tackling this problem, since it is initially designed to work in relatively balanced class distribution scenarios. Another important problem, which usually accompanies these imbalanced data, is the overlapping instances between the two classes. It is commonly referred to as noise or overlapping data. In this article, we propose an approach called: One Side Behavioral Noise Reduction (OSBNR). This approach presents a way to deal with the problem of class imbalance in the presence of a high noise level. OSBNR is based on two steps. Firstly, a cluster analysis is applied to groups similar instances from the minority class into several behavior clusters. Secondly, we select and eliminate the instances of the majority class, considered as behavioral noise, which overlap with behavior clusters of the minority class. The results of experiments carried out on a representative public dataset confirm that the proposed approach is efficient for the treatment of class imbalances in the presence of noise.

Keywords: machine learning, imbalanced data, data mining, big data

Procedia PDF Downloads 101
24024 A Study of Predicting Judgments on Causes of Online Privacy Invasions: Based on U.S Judicial Cases

Authors: Minjung Park, Sangmi Chai, Myoung Jun Lee

Abstract:

Since there are growing concerns on online privacy, enterprises could involve various personal privacy infringements cases resulting legal causations. For companies that are involving online business, it is important for them to pay extra attentions to protect users’ privacy. If firms can aware consequences from possible online privacy invasion cases, they can more actively prevent future online privacy infringements. This study attempts to predict the probability of ruling types caused by various invasion cases under U.S Personal Privacy Act. More specifically, this research explores online privacy invasion cases which was sentenced guilty to identify types of criminal punishments such as penalty, imprisonment, probation as well as compensation in civil cases. Based on the 853 U.S judicial cases ranged from January, 2000 to May, 2016, which related on data privacy, this research examines the relationship between personal information infringements cases and adjudications. Upon analysis results of 41,724 words extracted from 853 regal cases, this study examined online users’ privacy invasion cases to predict the probability of conviction for a firm as an offender in both of criminal and civil law. This research specifically examines that a cause of privacy infringements and a judgment type, whether it leads a civil or criminal liability, from U.S court. This study applies network text analysis (NTA) for data analysis, which is regarded as a useful method to discover embedded social trends within texts. According to our research results, certain online privacy infringement cases caused by online spamming and adware have a high possibility that firms are liable in the case. Our research results provide meaningful insights to academia as well as industry. First, our study is providing a new insight by applying Big Data analytics to legal cases so that it can predict the cause of invasions and legal consequences. Since there are few researches applying big data analytics in the domain of law, specifically in online privacy, this study suggests new area that future studies can explore. Secondly, this study reflects social influences, such as a development of privacy invasion technologies and changes of users’ level of awareness of online privacy on judicial cases analysis by adopting NTA method. Our research results indicate that firms need to improve technical and managerial systems to protect users’ online privacy to avoid negative legal consequences.

Keywords: network text analysis, online privacy invasions, personal information infringements, predicting judgements

Procedia PDF Downloads 199
24023 Commercial Automobile Insurance: A Practical Approach of the Generalized Additive Model

Authors: Nicolas Plamondon, Stuart Atkinson, Shuzi Zhou

Abstract:

The insurance industry is usually not the first topic one has in mind when thinking about applications of data science. However, the use of data science in the finance and insurance industry is growing quickly for several reasons, including an abundance of reliable customer data, ferocious competition requiring more accurate pricing, etc. Among the top use cases of data science, we find pricing optimization, customer segmentation, customer risk assessment, fraud detection, marketing, and triage analytics. The objective of this paper is to present an application of the generalized additive model (GAM) on a commercial automobile insurance product: an individually rated commercial automobile. These are vehicles used for commercial purposes, but for which there is not enough volume to apply pricing to several vehicles at the same time. The GAM model was selected as an improvement over GLM for its ease of use and its wide range of applications. The model was trained using the largest split of the data to determine model parameters. The remaining part of the data was used as testing data to verify the quality of the modeling activity. We used the Gini coefficient to evaluate the performance of the model. For long-term monitoring, commonly used metrics such as RMSE and MAE will be used. Another topic of interest in the insurance industry is to process of producing the model. We will discuss at a high level the interactions between the different teams with an insurance company that needs to work together to produce a model and then monitor the performance of the model over time. Moreover, we will discuss the regulations in place in the insurance industry. Finally, we will discuss the maintenance of the model and the fact that new data does not come constantly and that some metrics can take a long time to become meaningful.

Keywords: insurance, data science, modeling, monitoring, regulation, processes

Procedia PDF Downloads 49
24022 Challenges in E-Government: Conceptual Views and Solutions

Authors: Rasim Alguliev, Farhad Yusifov

Abstract:

Considering the international experience, conceptual and architectural principles of forming of electron government are researched and some suggestions were made. The assessment of monitoring of forming processes of electron government, intellectual analysis of web-resources, provision of information security, electron democracy problems were researched, conceptual approaches were suggested. By taking into consideration main principles of electron government theory, important research directions were specified.

Keywords: electron government, public administration, information security, web-analytics, social networks, data mining

Procedia PDF Downloads 432
24021 Local Binary Patterns-Based Statistical Data Analysis for Accurate Soccer Match Prediction

Authors: Mohammad Ghahramani, Fahimeh Saei Manesh

Abstract:

Winning a soccer game is based on thorough and deep analysis of the ongoing match. On the other hand, giant gambling companies are in vital need of such analysis to reduce their loss against their customers. In this research work, we perform deep, real-time analysis on every soccer match around the world that distinguishes our work from others by focusing on particular seasons, teams and partial analytics. Our contributions are presented in the platform called “Analyst Masters.” First, we introduce various sources of information available for soccer analysis for teams around the world that helped us record live statistical data and information from more than 50,000 soccer matches a year. Our second and main contribution is to introduce our proposed in-play performance evaluation. The third contribution is developing new features from stable soccer matches. The statistics of soccer matches and their odds before and in-play are considered in the image format versus time including the halftime. Local Binary patterns, (LBP) is then employed to extract features from the image. Our analyses reveal incredibly interesting features and rules if a soccer match has reached enough stability. For example, our “8-minute rule” implies if 'Team A' scores a goal and can maintain the result for at least 8 minutes then the match would end in their favor in a stable match. We could also make accurate predictions before the match of scoring less/more than 2.5 goals. We benefit from the Gradient Boosting Trees, GBT, to extract highly related features. Once the features are selected from this pool of data, the Decision trees decide if the match is stable. A stable match is then passed to a post-processing stage to check its properties such as betters’ and punters’ behavior and its statistical data to issue the prediction. The proposed method was trained using 140,000 soccer matches and tested on more than 100,000 samples achieving 98% accuracy to select stable matches. Our database from 240,000 matches shows that one can get over 20% betting profit per month using Analyst Masters. Such consistent profit outperforms human experts and shows the inefficiency of the betting market. Top soccer tipsters achieve 50% accuracy and 8% monthly profit in average only on regional matches. Both our collected database of more than 240,000 soccer matches from 2012 and our algorithm would greatly benefit coaches and punters to get accurate analysis.

Keywords: soccer, analytics, machine learning, database

Procedia PDF Downloads 207
24020 A Methodology of Using Fuzzy Logics and Data Analytics to Estimate the Life Cycle Indicators of Solar Photovoltaics

Authors: Thor Alexis Sazon, Alexander Guzman-Urbina, Yasuhiro Fukushima

Abstract:

This study outlines the method of how to develop a surrogate life cycle model based on fuzzy logic using three fuzzy inference methods: (1) the conventional Fuzzy Inference System (FIS), (2) the hybrid system of Data Analytics and Fuzzy Inference (DAFIS), which uses data clustering for defining the membership functions, and (3) the Adaptive-Neuro Fuzzy Inference System (ANFIS), a combination of fuzzy inference and artificial neural network. These methods were demonstrated with a case study where the Global Warming Potential (GWP) and the Levelized Cost of Energy (LCOE) of solar photovoltaic (PV) were estimated using Solar Irradiation, Module Efficiency, and Performance Ratio as inputs. The effects of using different fuzzy inference types, either Sugeno- or Mamdani-type, and of changing the number of input membership functions to the error between the calibration data and the model-generated outputs were also illustrated. The solution spaces of the three methods were consequently examined with a sensitivity analysis. ANFIS exhibited the lowest error while DAFIS gave slightly lower errors compared to FIS. Increasing the number of input membership functions helped with error reduction in some cases but, at times, resulted in the opposite. Sugeno-type models gave errors that are slightly lower than those of the Mamdani-type. While ANFIS is superior in terms of error minimization, it could generate solutions that are questionable, i.e. the negative GWP values of the Solar PV system when the inputs were all at the upper end of their range. This shows that the applicability of the ANFIS models highly depends on the range of cases at which it was calibrated. FIS and DAFIS generated more intuitive trends in the sensitivity runs. DAFIS demonstrated an optimal design point wherein increasing the input values does not improve the GWP and LCOE anymore. In the absence of data that could be used for calibration, conventional FIS presents a knowledge-based model that could be used for prediction. In the PV case study, conventional FIS generated errors that are just slightly higher than those of DAFIS. The inherent complexity of a Life Cycle study often hinders its widespread use in the industry and policy-making sectors. While the methodology does not guarantee a more accurate result compared to those generated by the Life Cycle Methodology, it does provide a relatively simpler way of generating knowledge- and data-based estimates that could be used during the initial design of a system.

Keywords: solar photovoltaic, fuzzy logic, inference system, artificial neural networks

Procedia PDF Downloads 122
24019 BIM Data and Digital Twin Framework: Preserving the Past and Predicting the Future

Authors: Mazharuddin Syed Ahmed

Abstract:

This research presents a framework used to develop The Ara Polytechnic College of Architecture Studies building “Kahukura” which is Green Building certified. This framework integrates the development of a smart building digital twin by utilizing Building Information Modelling (BIM) and its BIM maturity levels, including Levels of Development (LOD), eight dimensions of BIM, Heritage-BIM (H-BIM) and Facility Management BIM (FM BIM). The research also outlines a structured approach to building performance analysis and integration with the circular economy, encapsulated within a five-level digital twin framework. Starting with Level 1, the Descriptive Twin provides a live, editable visual replica of the built asset, allowing for specific data inclusion and extraction. Advancing to Level 2, the Informative Twin integrates operational and sensory data, enhancing data verification and system integration. At Level 3, the Predictive Twin utilizes operational data to generate insights and proactive management suggestions. Progressing to Level 4, the Comprehensive Twin simulates future scenarios, enabling robust “what-if” analyses. Finally, Level 5, the Autonomous Twin, represents the pinnacle of digital twin evolution, capable of learning and autonomously acting on behalf of users.

Keywords: building information modelling, circular economy integration, digital twin, predictive analytics

Procedia PDF Downloads 10
24018 Semantic Based Analysis in Complaint Management System with Analytics

Authors: Francis Alterado, Jennifer Enriquez

Abstract:

Semantic Based Analysis in Complaint Management System with Analytics is an enhanced tool of providing complaints by the clients as well as a mechanism for Palawan Polytechnic College to gather, process, and monitor status of these complaints. The study has a mobile application that serves as a remote facility of communication between the students and the school management on the issues encountered by the student and the solution of every complaint received. In processing the complaints, text mining and clustering algorithms were utilized. Every module of the systems was tested and based on the results; these are 100% free from error before integration was done. A system testing was also done by checking the expected functionality of the system which was 100% functional. The system was tested by 10 students by forwarding complaints to 10 departments. Based on results, the students were able to submit complaints, the system was able to process accordingly by identifying to which department the complaints are intended, and the concerned department was able to give feedback on the complaint received to the student. With this, the system gained 4.7 rating which means Excellent.

Keywords: technology adoption, emerging technology, issues challenges, algorithm, text mining, mobile technology

Procedia PDF Downloads 169
24017 Post Pandemic Mobility Analysis through Indexing and Sharding in MongoDB: Performance Optimization and Insights

Authors: Karan Vishavjit, Aakash Lakra, Shafaq Khan

Abstract:

The COVID-19 pandemic has pushed healthcare professionals to use big data analytics as a vital tool for tracking and evaluating the effects of contagious viruses. To effectively analyze huge datasets, efficient NoSQL databases are needed. The analysis of post-COVID-19 health and well-being outcomes and the evaluation of the effectiveness of government efforts during the pandemic is made possible by this research’s integration of several datasets, which cuts down on query processing time and creates predictive visual artifacts. We recommend applying sharding and indexing technologies to improve query effectiveness and scalability as the dataset expands. Effective data retrieval and analysis are made possible by spreading the datasets into a sharded database and doing indexing on individual shards. Analysis of connections between governmental activities, poverty levels, and post-pandemic well being is the key goal. We want to evaluate the effectiveness of governmental initiatives to improve health and lower poverty levels. We will do this by utilising advanced data analysis and visualisations. The findings provide relevant data that supports the advancement of UN sustainable objectives, future pandemic preparation, and evidence-based decision-making. This study shows how Big Data and NoSQL databases may be used to address problems with global health.

Keywords: big data, COVID-19, health, indexing, NoSQL, sharding, scalability, well being

Procedia PDF Downloads 38
24016 A Text Classification Approach Based on Natural Language Processing and Machine Learning Techniques

Authors: Rim Messaoudi, Nogaye-Gueye Gning, François Azelart

Abstract:

Automatic text classification applies mostly natural language processing (NLP) and other AI-guided techniques to automatically classify text in a faster and more accurate manner. This paper discusses the subject of using predictive maintenance to manage incident tickets inside the sociality. It focuses on proposing a tool that treats and analyses comments and notes written by administrators after resolving an incident ticket. The goal here is to increase the quality of these comments. Additionally, this tool is based on NLP and machine learning techniques to realize the textual analytics of the extracted data. This approach was tested using real data taken from the French National Railways (SNCF) company and was given a high-quality result.

Keywords: machine learning, text classification, NLP techniques, semantic representation

Procedia PDF Downloads 56
24015 Detecting Elderly Abuse in US Nursing Homes Using Machine Learning and Text Analytics

Authors: Minh Huynh, Aaron Heuser, Luke Patterson, Chris Zhang, Mason Miller, Daniel Wang, Sandeep Shetty, Mike Trinh, Abigail Miller, Adaeze Enekwechi, Tenille Daniels, Lu Huynh

Abstract:

Machine learning and text analytics have been used to analyze child abuse, cyberbullying, domestic abuse and domestic violence, and hate speech. However, to the authors’ knowledge, no research to date has used these methods to study elder abuse in nursing homes or skilled nursing facilities from field inspection reports. We used machine learning and text analytics methods to analyze 356,000 inspection reports, which have been extracted from CMS Form-2567 field inspections of US nursing homes and skilled nursing facilities between 2016 and 2021. Our algorithm detected occurrences of the various types of abuse, including physical abuse, psychological abuse, verbal abuse, sexual abuse, and passive and active neglect. For example, to detect physical abuse, our algorithms search for combinations or phrases and words suggesting willful infliction of damage (hitting, pinching or burning, tethering, tying), or consciously ignoring an emergency. To detect occurrences of elder neglect, our algorithm looks for combinations or phrases and words suggesting both passive neglect (neglecting vital needs, allowing malnutrition and dehydration, allowing decubiti, deprivation of information, limitation of freedom, negligence toward safety precautions) and active neglect (intimidation and name-calling, tying the victim up to prevent falls without consent, consciously ignoring an emergency, not calling a physician in spite of indication, stopping important treatments, failure to provide essential care, deprivation of nourishment, leaving a person alone for an inappropriate amount of time, excessive demands in a situation of care). We further compare the prevalence of abuse before and after Covid-19 related restrictions on nursing home visits. We also identified the facilities with the most number of cases of abuse with no abuse facilities within a 25-mile radius as most likely candidates for additional inspections. We also built an interactive display to visualize the location of these facilities.

Keywords: machine learning, text analytics, elder abuse, elder neglect, nursing home abuse

Procedia PDF Downloads 111
24014 Critically Analyzing the Application of Big Data for Smart Transportation: A Case Study of Mumbai

Authors: Tanuj Joshi

Abstract:

Smart transportation is fast emerging as a solution to modern cities’ approach mobility issues, delayed emergency response rate and high congestion on streets. Present day scenario with Google Maps, Waze, Yelp etc. demonstrates how information and communications technologies controls the intelligent transportation system. This intangible and invisible infrastructure is largely guided by the big data analytics. On the other side, the exponential increase in Indian urban population has intensified the demand for better services and infrastructure to satisfy the transportation needs of its citizens. No doubt, India’s huge internet usage is looked as an important resource to guide to achieve this. However, with a projected number of over 40 billion objects connected to the Internet by 2025, the need for systems to handle massive volume of data (big data) also arises. This research paper attempts to identify the ways of exploiting the big data variables which will aid commuters on Indian tracks. This study explores real life inputs by conducting survey and interviews to identify which gaps need to be targeted to better satisfy the customers. Several experts at Mumbai Metropolitan Region Development Authority (MMRDA), Mumbai Metro and Brihanmumbai Electric Supply and Transport (BEST) were interviewed regarding the Information Technology (IT) systems currently in use. The interviews give relevant insights and requirements into the workings of public transportation systems whereas the survey investigates the macro situation.

Keywords: smart transportation, mobility issue, Mumbai transportation, big data, data analysis

Procedia PDF Downloads 143