Search results for: Data Mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25529

25079 Modeling Food Popularity Dependencies Using Social Media Data

Authors: DEVASHISH KHULBE, MANU PATHAK

Abstract:

The rise in popularity of major social media platforms has enabled people to share photos and textual information about their daily lives. One popular topic is food. Since much food-related media is attributed to particular locations and restaurants, information such as the spatio-temporal popularity of various cuisines can be analyzed. Tracking the popularity of food types and retail locations across space and time can also be useful for business owners and restaurant investors. In this work, we present an approach that uses off-the-shelf machine learning techniques to identify trends in the popularity of cuisine types in an area using geo-tagged data from social media, Google Images and Yelp. After adjusting for time, we use Kernel Density Estimation to find hot spots across the location and model the dependencies among cuisine popularities using Bayesian Networks. We consider the Manhattan borough of New York City as the location for our analyses, but the approach can be used for any area with social media data and information about retail businesses.
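
The hot-spot step can be illustrated with a minimal sketch of two-dimensional Gaussian kernel density estimation. The abstract does not specify the kernel, bandwidth, or grid, so the isotropic Gaussian kernel, the bandwidth of 0.3, the coordinates, and the function name `gaussian_kde_2d` are all assumptions made for illustration.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Return a density function estimated from (x, y) points.

    Assumes an isotropic Gaussian kernel with one bandwidth h:
    f(x, y) = 1 / (n * 2*pi*h^2) * sum_i exp(-d_i^2 / (2*h^2))
    """
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            d2 = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        return norm * total

    return density


# Hypothetical projected coordinates (km) of geo-tagged posts for one cuisine.
posts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (2.0, 2.0)]
density = gaussian_kde_2d(posts, bandwidth=0.3)

# Evaluate the density on a coarse grid and pick the densest cell.
grid = [(gx * 0.5, gy * 0.5) for gx in range(5) for gy in range(5)]
hot_spot = max(grid, key=lambda p: density(*p))
print(hot_spot)
```

In practice the bandwidth controls how smooth the hot-spot map is, so it would normally be tuned to the spatial scale of interest rather than fixed as here.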

Keywords: Web Mining, Geographic Information Systems, Business popularity, Spatial Data Analyses

Procedia PDF Downloads 115
25078 EDM for Prediction of Academic Trends and Patterns

Authors: Trupti Diwan

Abstract:

Predicting student failure at school has become a difficult challenge due to both the large number of factors that can affect student performance and the imbalanced nature of these kinds of data sets. This paper surveys the two elements needed to predict students’ academic performance: parameters and methods. It also proposes a framework for predicting the performance of engineering students. Genetic programming can be used to predict student failure or success, and a ranking algorithm is used to rank students according to their credit points. The framework can serve as a basis for system implementation and for predicting students’ academic performance in higher learning institutions.

Keywords: classification, educational data mining, student failure, grammar-based genetic programming

Procedia PDF Downloads 422
25077 Measurement of Natural Radioactivity and Health Hazard Index Evaluation in Major Soils of Tin Mining Areas of Perak

Authors: Habila Nuhu

Abstract:

Natural radionuclides in the environment can significantly contribute to human exposure to ionizing radiation. Knowledge of their levels in an environment can help radiological protection agencies in policymaking. Measurement of natural radioactivity in major soils in the tin mining state of Perak, Malaysia, has been conducted using an HPGe detector. Seventy (70) soil samples were collected at widely distributed locations in the state. Six major soil types were sampled, and thirteen districts around the state were covered. The results for 226Ra (238U), 228Ra (232Th), and 40K activity in the soil samples were as follows. 226Ra (238U) has a mean activity concentration of 191.83 Bq kg⁻¹, more than five times the UNSCEAR reference value of 35 Bq kg⁻¹. The mean activity concentration of 228Ra (232Th), 232.41 Bq kg⁻¹, is over seven times the UNSCEAR reference value of 30 Bq kg⁻¹. The average 40K activity concentration was 275.24 Bq kg⁻¹, which is less than the UNSCEAR reference value of 400 Bq kg⁻¹. The external hazard index (Hₑₓ) ranged from 1.03 to 2.05, while the internal hazard index (Hᵢₙ) ranged from 1.48 to 3.08. Hₑₓ and Hᵢₙ should be less than one for minimal external and internal radiation threats and for safe use of soil material in building construction. The Hₑₓ and Hᵢₙ results generally indicate that care must be taken when using these soil types and their derivatives as building materials in the study area.
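
As a worked illustration, the external and internal hazard indices can be computed from the three activity concentrations. The abstract does not state which formulas were used; the sketch below assumes the forms widely used in the radiological literature, with divisors of 370, 259 and 4810 Bq kg⁻¹ (and 185 Bq kg⁻¹ for radium in the internal index). It is applied to the reported mean concentrations only, so it gives a single illustrative pair of values rather than the per-sample ranges.

```python
def external_hazard_index(a_ra, a_th, a_k):
    """External hazard index from activity concentrations in Bq/kg.

    Assumed formula (not stated in the abstract):
    H_ex = A_Ra/370 + A_Th/259 + A_K/4810
    """
    return a_ra / 370.0 + a_th / 259.0 + a_k / 4810.0


def internal_hazard_index(a_ra, a_th, a_k):
    """Internal hazard index; the radium term is doubled in weight.

    Assumed formula (not stated in the abstract):
    H_in = A_Ra/185 + A_Th/259 + A_K/4810
    """
    return a_ra / 185.0 + a_th / 259.0 + a_k / 4810.0


# Mean activity concentrations reported in the abstract (Bq/kg).
mean_ra, mean_th, mean_k = 191.83, 232.41, 275.24

h_ex = external_hazard_index(mean_ra, mean_th, mean_k)
h_in = internal_hazard_index(mean_ra, mean_th, mean_k)
print(round(h_ex, 2), round(h_in, 2))
```

Under these assumed formulas, both values computed from the means exceed one and fall inside the reported ranges, which is consistent with the abstract's caution about using the soil as building material.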

Keywords: activity concentration, hazard index, soil samples, tin mining

Procedia PDF Downloads 111
25076 Application of a Modified Crank-Nicolson Method in Metallurgy

Authors: Kobamelo Mashaba

Abstract:

Molten slag has a high temperature, ranging between 1723 and 1923, and carries a large amount of energy whose recovery can reduce energy consumption and CO₂ emissions. In this study, we therefore investigated the performance of a modified Crank-Nicolson method for a delayed partial differential equation describing heat recovery from molten slag in the metallurgical mining environment. It was proved that the proposed method has a unique solution and converges more quickly than the classic method. The numerical results suggest that the proposed methodology is more viable and profitable for the mining industry.
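
For orientation, the classic Crank-Nicolson scheme that the abstract compares against can be sketched for the one-dimensional heat equation u_t = α u_xx. This is not the authors' modified method and it omits the delay term entirely; the grid size, time step, zero boundary values and initial condition are assumptions chosen so that an exact solution is available for comparison.

```python
import math


def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm.

    lower[i] multiplies x[i-1] and upper[i] multiplies x[i+1] in row i;
    lower[0] and upper[-1] are ignored.
    """
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def crank_nicolson_heat(u0, alpha, dx, dt, steps):
    """Advance interior values of u_t = alpha * u_xx with u = 0 at both ends."""
    r = alpha * dt / dx ** 2
    n = len(u0)
    lower = [-r / 2.0] * n
    diag = [1.0 + r] * n
    upper = [-r / 2.0] * n
    u = list(u0)
    for _ in range(steps):
        rhs = []
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            # Explicit half of the scheme: (I + r/2 * D) u^n
            rhs.append(u[i] + (r / 2.0) * (left - 2.0 * u[i] + right))
        # Implicit half of the scheme: (I - r/2 * D) u^{n+1} = rhs
        u = thomas_solve(lower, diag, upper, rhs)
    return u


# Assumed test problem: u(x, 0) = sin(pi x) on [0, 1], exact solution known.
nx = 49
dx = 1.0 / (nx + 1)
alpha, dt, steps = 1.0, 0.001, 100
xs = [(i + 1) * dx for i in range(nx)]
u0 = [math.sin(math.pi * x) for x in xs]
u = crank_nicolson_heat(u0, alpha, dx, dt, steps)

decay = math.exp(-math.pi ** 2 * alpha * dt * steps)
exact = [decay * math.sin(math.pi * x) for x in xs]
max_error = max(abs(a - b) for a, b in zip(u, exact))
print(max_error)
```

The scheme averages the explicit and implicit spatial differences, which is what makes it second-order accurate in time and unconditionally stable for this problem.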

Keywords: delayed partial differential equation, modified Crank-Nicolson Method, molten slag, heat recovery, parabolic equation

Procedia PDF Downloads 101
25075 Data-Driven Decision Making: A Reference Model for Organizational, Educational and Competency-Based Learning Systems

Authors: Emanuel Koseos

Abstract:

Data-Driven Decision Making (DDDM) refers to making decisions that are based on historical data in order to inform practice, develop strategies and implement policies that benefit organizational settings. In educational technology, DDDM facilitates the implementation of differential educational learning approaches such as Educational Data Mining (EDM) and Competency-Based Education (CBE), which commonly target university classrooms. There is a current need for DDDM models applied to middle and secondary schools, driven by the need to assess the needs, progress and performance of students and educators with respect to regional standards, policies and the evolution of curriculums. To address these concerns, we propose a DDDM reference model developed using educational key process initiatives as inputs to a machine learning framework implemented with statistical software (SAS, R) to provide a best-practices, complexity-free and automated approach for educators at their regional level. We assessed the efficiency of the model over a six-year period using data from 45 schools and grades K-12 in the Langley, BC, Canada regional school district. We concluded that the model has wider applicability, for example to business learning systems.

Keywords: competency-based learning, data-driven decision making, machine learning, secondary schools

Procedia PDF Downloads 173
25074 Automated Prediction of HIV-associated Cervical Cancer Patients Using Data Mining Techniques for Survival Analysis

Authors: O. J. Akinsola, Yinan Zheng, Rose Anorlu, F. T. Ogunsola, Lifang Hou, Robert Leo-Murphy

Abstract:

Cervical Cancer (CC) is the 2nd most common cancer among women living in low and middle-income countries, with no associated symptoms during formative periods. With advances in innovative medical research, numerous preventive measures are being utilized, but the incidence of cervical cancer cannot be curtailed by screening tests alone. The mortality associated with invasive cervical cancer can be nipped in the bud through early-stage detection. This study selected an array of top feature selection techniques with the aim of developing a model that could validly diagnose the risk factors of cervical cancer. A retrospective clinic-based cohort study was conducted on 178 HIV-associated cervical cancer patients in Lagos University Teaching Hospital, Nigeria (U54 data repository) in April 2022. The outcome measure was the automated prediction of HIV-associated cervical cancer cases, while the predictor variables included demographic information, reproductive history, birth control, sexual history, and cervical cancer screening history for invasive cervical cancer. The proposed technique was implemented in R and Python, using classification algorithms for the detection and diagnosis of cervical cancer. Four machine learning classification algorithms were used: Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), and K-Nearest Neighbor (KNN). The dataset was split into training and testing sets in an 80:20 ratio. The numerical features were standardized, and hyperparameter tuning was carried out on the machine learning models.
Features were selected from the characteristics in the dataset using various selection methods for classifying cervical cancer status as healthy or diseased. The mean age of patients was 49.7±12.1 years, mean age at pregnancy was 23.3±5.5 years, mean age at first sexual experience was 19.4±3.2 years, while the mean BMI was 27.1±5.6 kg/m². A larger percentage of the patients were married (62.9%), while most of them had at least two sexual partners (72.5%). Age of patients (OR=1.065, p<0.001**), marital status (OR=0.375, p=0.011**), number of pregnancy live-births (OR=1.317, p=0.007**), and use of birth control pills (OR=0.291, p=0.015**) were found to be significantly associated with HIV-associated cervical cancer. On the top ten features (variables) considered in the analysis, RF was reported to have the best overall model performance, with an accuracy of 72.0%, a precision of 84.6%, a recall of 84.6% and an F1-score of 74.0%, while LR had an accuracy of 74.0%, a precision of 70.0%, a recall of 70.0% and an F1-score of 70.0%. The RF model identified 10 features predictive of developing cervical cancer. The age of patients was the most important risk factor, followed by the number of pregnancy live-births, marital status, and use of birth control pills. The study shows that data mining techniques could be used to identify women living with HIV at high risk of developing cervical cancer in Nigeria and other sub-Saharan African countries.
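
The reported evaluation metrics follow from a classifier's confusion counts. The sketch below shows the standard definitions; the confusion counts are hypothetical, chosen only so that precision and recall both equal 0.70 as in the reported LR figures, and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2.0 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Hypothetical counts for a small test set (not from the study).
accuracy, precision, recall, f1 = classification_metrics(tp=7, fp=3, fn=3, tn=7)
print(accuracy, precision, recall, f1)
```

Because F1 is a harmonic mean, it equals precision and recall whenever those two are equal, which is the pattern shown by the LR figures of 70.0% for all three.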

Keywords: associated cervical cancer, data mining, random forest, logistic regression

Procedia PDF Downloads 83
25073 Design and Development of a Computerized Medical Record System for Hospitals in Remote Areas

Authors: Grace Omowunmi Soyebi

Abstract:

A computerized medical record system is a collection of medical information about a person that is stored on a computer. One principal problem of most hospitals in rural areas is the use of a file management system for keeping records. A lot of time is wasted when a patient visits the hospital, probably in an emergency, and the nurse or attendant has to search through voluminous files before the patient's file can be retrieved; this may cause something unexpected to happen to the patient. This data mining application is to be designed using a Structured System Analysis and Design method, which will help in a well-articulated analysis of the existing file management system, a feasibility study, and proper documentation of the design and implementation of a computerized medical record system. The computerized system will replace the file management system and help to quickly retrieve a patient's record with increased data security, provide access to clinical records for decision-making, and reduce the time a patient waits to be attended to.

Keywords: programming, computing, data, innovation

Procedia PDF Downloads 119
25072 Occupational Safety and Health in the Wake of Drones

Authors: Hoda Rahmani, Gary Weckman

Abstract:

The body of research examining the integration of drones into various industries is expanding rapidly. Despite progress made in addressing the cybersecurity concerns for commercial drones, knowledge deficits remain in determining the potential occupational hazards and risks of drone use to employees’ well-being and health in the workplace. This makes it difficult to identify key risk mitigation strategies and reflects the need to raise awareness among employers, safety professionals, and policymakers about workplace drone-related accidents. The purpose of this study is to investigate the prevalence of and possible risk factors for drone-related mishaps by comparing the application of drones in the construction and manufacturing industries. The chief reason for considering these specific sectors is to ascertain whether there is any significant difference between indoor and outdoor flights, since most construction sites use drones outdoors while most manufacturing sites use them indoors. The current research therefore seeks to examine the causes and patterns of workplace drone-related mishaps and suggest possible ergonomic interventions through data collection. Potential ergonomic practices to mitigate hazards associated with flying drones could include providing operators with professional training, conducting a risk analysis, and promoting the use of personal protective equipment. For data analysis, two data mining techniques, the random forest and association rule mining algorithms, will be applied to find meaningful associations and trends in the data, as well as influential features that affect the occurrence of drone-related accidents in the construction and manufacturing sectors. In addition, Spearman’s correlation and chi-square tests will be used to measure the possible correlation between different variables.
Indeed, by recognizing risks and hazards, occupational safety stakeholders will be able to pursue data-driven and evidence-based policy change with the aim of reducing drone mishaps, increasing productivity, creating a safer work environment, and extending human performance in safe and fulfilling ways. This research study was supported by the National Institute for Occupational Safety and Health through the Pilot Research Project Training Program of the University of Cincinnati Education and Research Center Grant #T42OH008432.
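
A minimal sketch of Spearman's rank correlation, one of the tests named above, is given here. The data are hypothetical (the study describes planned analysis, and no dataset appears in the abstract), and the variable names `flight_hours` and `incidents` are invented for illustration. Ties receive average ranks, and the coefficient is the Pearson correlation of the ranks.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-site records: monthly drone flight hours and mishaps.
flight_hours = [5, 12, 20, 33, 47, 60]
incidents = [0, 1, 1, 2, 4, 7]
rho = spearman(flight_hours, incidents)
print(round(rho, 3))
```

Spearman's coefficient measures monotonic association only, so a value near one here would indicate that mishaps tend to rise with flight hours without implying a linear relationship.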

Keywords: commercial drones, ergonomic interventions, occupational safety, pattern recognition

Procedia PDF Downloads 209
25071 Brainbow Image Segmentation Using Bayesian Sequential Partitioning

Authors: Yayun Hsu, Henry Horng-Shing Lu

Abstract:

This paper proposes a data-driven, biology-inspired neural segmentation method for 3D Drosophila Brainbow images. We use the Bayesian Sequential Partitioning algorithm for probabilistic modeling, which can be used to detect somas and to eliminate cross-talk effects. This work attempts to develop an automatic methodology for neuron image segmentation, which still lacks a complete solution due to the complexity of the images. The proposed method does not need any predetermined, risk-prone thresholds, since biological information is inherently included in the image processing procedure. Therefore, it is less sensitive to variations in neuron morphology; meanwhile, its flexibility would be beneficial for tracing the intertwining structure of neurons.

Keywords: brainbow, 3D imaging, image segmentation, neuron morphology, biological data mining, non-parametric learning

Procedia PDF Downloads 487
25070 The Reduction of Post-Blast Fumes to Improve Productivity and Safety: A Review Paper

Authors: Nhleko Monique Chiloane

Abstract:

The gold mining industry has predominantly used ammonium nitrate fuel oil (ANFO) explosives for decades, although these are known to be “gassier” and their detonation results in toxic fumes, for example, carbon monoxide (CO), nitrogen oxides (NOx) and ammonia. Re-entry into underground workings too soon after blasting can lead to fatal exposure to toxic fumes. It is, therefore, required that the polluted air be removed from the affected areas within a reasonable period before employees' re-entry into the working area. Post-blast re-entry times have therefore been described as a productivity bottleneck. The known causes of post-blast fumes are water ingress, incorrect fuel to oxygen ratio, confinement, explosive additives etc. To prevent or minimize post-blast fumes, some researchers have used neutralization, re-burning technique and non-explosive products or different oxidizing agents. The use of commercial explosives without nitrate oxidizing agents can also minimize the production of blasting fumes and thereby reduce the time needed for the clearance of these fumes to allow workers to re-enter the underground workings safely. The reduction in non-production time directly contributes to an increase in the available time per shift for productive work, thus leading to continuous mining. However, owing to its low cost and ease of use, ANFO is still widely used in South African underground blasting operations.

Keywords: post-blast fumes, continuous mining, ammonium nitrate explosive, non-explosive blasting, re-entry period

Procedia PDF Downloads 183
25069 Improvement of Microstructure, Wear and Mechanical Properties of Modified G38NiCrMo8-4-4 Steel Used in Mining Industry

Authors: Mustafa Col, Funda Gul Koc, Merve Yangaz, Eylem Subasi, Can Akbasoglu

Abstract:

G38NiCrMo8-4-4 steel is widely used in mining industries, machine parts, gears due to its high strength and toughness properties. In this study, microstructure, wear and mechanical properties of G38NiCrMo8-4-4 steel modified with boron used in the mining industry were investigated. For this purpose, cast materials were alloyed by melting in an induction furnace to include boron with the rates of 0 ppm, 15 ppm, and 50 ppm (wt.) and were formed in the dimensions of 150x200x150 mm by casting into the sand mould. Homogenization heat treatment was applied to the specimens at 1150˚C for 7 hours. Then all specimens were austenitized at 930˚C for 1 hour, quenched in the polymer solution and tempered at 650˚C for 1 hour. Microstructures of the specimens were investigated by using light microscope and SEM to determine the effect of boron and heat treatment conditions. Changes in microstructure properties and material hardness were obtained due to increasing boron content and heat treatment conditions after microstructure investigations and hardness tests. Wear tests were carried out using a pin-on-disc tribometer under dry sliding conditions. Charpy V notch impact test was performed to determine the toughness properties of the specimens. Fracture and worn surfaces were investigated with scanning electron microscope (SEM). The results show that boron element has a positive effect on the hardness and wear properties of G38NiCrMo8-4-4 steel.

Keywords: G38NiCrMo8-4-4 steel, boron, heat treatment, microstructure, wear, mechanical properties

Procedia PDF Downloads 195
25068 Impact of Coal Mining on River Sediment Quality in the Sydney Basin, Australia

Authors: A. Ali, V. Strezov, P. Davies, I. Wright, T. Kan

Abstract:

The environmental impacts arising from mining activities affect air, water, and soil quality. Impacts may result in unexpected and adverse environmental outcomes. This study reports on the impact of coal production on sediment in the Sydney region of Australia. Sediment samples were taken upstream and downstream of the discharge points of three mines, and 80 parameters were tested. The results were assessed against sediment quality guidelines based on the presence of metals. The study revealed an increase in metal content in the sediment downstream of the reference locations. In many cases, the sediment exceeded the Australia and New Zealand Environment Conservation Council and international sediment quality guideline values (SQGV). The major outliers to the guidelines were nickel (Ni) and zinc (Zn).

Keywords: coal mine, environmental impact, produced water, sediment quality guidelines value (SQGV)

Procedia PDF Downloads 304
25067 A Methodology for Developing New Technology Ideas to Avoid Patent Infringement: F-Term Based Patent Analysis

Authors: Kisik Song, Sungjoo Lee

Abstract:

With the growing importance of intangible assets, the impact of patent infringement on a company's business has become more evident. Accordingly, it is essential for firms to estimate the risk of patent infringement before developing a technology and to create new technology ideas that avoid the risk. Recognizing this need, several attempts have been made to help develop new technology opportunities, and most of them have focused on identifying emerging vacant technologies from patent analysis. In these studies, the IPC (International Patent Classification) system or keywords obtained by applying text-mining to patent documents were generally used to define vacant technologies. Unlike those studies, this study adopted F-term, which classifies patent documents according to the technical features of the inventions described in them. Since F-term analyzes technical features from various perspectives, it provides more detailed information about technologies than IPC and more systematic information than keywords. Therefore, if well utilized, it can be a useful guideline for creating a new technology idea. Recognizing the potential of F-term, this paper aims to suggest a novel approach to developing new technology ideas to avoid patent infringement based on F-term. For this purpose, we first collected data about F-term and then applied text-mining to the descriptions of classification criteria and attributes. From the text-mining results, we could identify other technologies with technical features similar to those of the existing one, the patented technology. Finally, we compare the technologies and extract the technical features that are commonly used in other technologies but have not been used in the existing one.
These features are presented in terms of “purpose”, “function”, “structure”, “material”, “method”, “processing and operation procedure” and “control means” and so are useful for creating new technology ideas that help avoid infringing patent rights of other companies. Theoretically, this is one of the earliest attempts to adopt F-term to patent analysis; the proposed methodology can show how to best take advantage of F-term with the wealth of technical information. In practice, the proposed methodology can be valuable in the ideation process for successful product and service innovation without infringing the patents of other companies.
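
The final comparison step, extracting features common in similar technologies but absent from the existing one, can be sketched as a counting and set-difference operation. The feature labels, the `min_count` threshold, and the function name `candidate_features` are hypothetical; the abstract does not describe how "commonly used" is decided, so a simple occurrence count is assumed.

```python
from collections import Counter


def candidate_features(existing, similar_technologies, min_count=2):
    """Features used by at least min_count similar technologies
    that the existing technology does not use."""
    counts = Counter()
    for features in similar_technologies:
        counts.update(set(features))  # count each technology once per feature
    return sorted(
        feature
        for feature, count in counts.items()
        if count >= min_count and feature not in existing
    )


# Hypothetical feature labels in "viewpoint:value" form.
existing_technology = {"material:steel", "function:cooling"}
similar = [
    {"material:steel", "function:cooling", "control:feedback"},
    {"material:ceramic", "function:cooling", "control:feedback"},
    {"material:ceramic", "structure:layered"},
]

ideas = candidate_features(existing_technology, similar)
print(ideas)
```

Raising `min_count` narrows the output to features with broader precedent among similar technologies, which is one possible reading of "commonly used".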

Keywords: patent infringement, new technology ideas, patent analysis, F-term

Procedia PDF Downloads 269
25066 Combination of Artificial Neural Network Model and Geographic Information System for Prediction Water Quality

Authors: Sirilak Areerachakul

Abstract:

Water quality concerns have prompted serious management efforts in many countries. Artificial Neural Network (ANN) models are developed as forecasting tools for predicting water quality trends based on historical data. This study endeavors to automatically classify water quality. The water quality classes are evaluated using six factor indices: pH value (pH), Dissolved Oxygen (DO), Biochemical Oxygen Demand (BOD), Nitrate Nitrogen (NO3N), Ammonia Nitrogen (NH3N) and Total Coliform (T-Coliform). The methodology involves applying data mining techniques using multilayer perceptron (MLP) neural network models. The data cover 11 sites of the Saen Saep canal in Bangkok, Thailand, and were obtained from the Department of Drainage and Sewerage, Bangkok Metropolitan Administration, for 2007-2011. The multilayer perceptron neural network achieves a high accuracy rate of 94.23% in classifying the water quality of the Saen Saep canal. This encouraging result could subsequently be combined with GIS data to further improve classification accuracy.

Keywords: artificial neural network, geographic information system, water quality, computer science

Procedia PDF Downloads 343
25065 The Impact of Mining Activities on the Surface Water Quality: A Case Study of the Kaap River in Barberton, Mpumalanga

Authors: M. F. Mamabolo

Abstract:

Mining activities are identified as the most significant source of heavy metal contamination in river basins, because inadequate disposal of mining waste results in acid mine drainage. Waste materials generated from gold mining and processing have severe and widespread impacts on water resources. Therefore, a total of 30 water samples were collected from Fig Tree Creek, the Kaap River, the Sheba mine stream and the Suid Kaap River to investigate the impact of gold mines on the Kaap River system. Physicochemical parameters (pH, EC and TDS) were measured using a BANTE 900P portable water quality meter. The concentrations of Fe, Cu, Co, and SO₄²⁻ in the water samples were analysed using Inductively Coupled Plasma Mass Spectrometry (ICP-MS) at 0.01 mg/L. The results were compared to the regulatory guidelines of the World Health Organization (WHO) and the South African National Standards (SANS). Fe, Cu and Co were below the guideline values, while SO₄²⁻ detected in the Sheba mine stream exceeded the 250 mg/L limit in both seasons, which is attributed to mine wastewater. SO₄²⁻ was higher in the wet season due to high evaporation rates and greater interaction between rocks and water. The pH of all the streams was within the limit (≥5 to ≤9.7); however, the EC of the Sheba mine stream, the Suid Kaap River and the point where the tributary connects with Fig Tree Creek exceeded 1700 uS/m due to dissolved material. The TDS of the Sheba mine stream exceeded 1000 mg/L, which is attributed to the high SO₄²⁻ concentration, while the tributary connecting to Fig Tree Creek exceeded this value due to pollution from household waste, agricultural runoff, etc. In conclusion, the water from all sampled streams was safe for consumption due to low concentrations of physicochemical parameters. However, the elevated concentration of SO₄²⁻ should be monitored and managed to avoid water quality deterioration in the Kaap River system.

Keywords: Kaap river system, mines, heavy metals, sulphate

Procedia PDF Downloads 81
25064 Unsupervised Domain Adaptive Text Retrieval with Query Generation

Authors: Rui Yin, Haojie Wang, Xun Li

Abstract:

Recently, mainstream dense retrieval methods have obtained state-of-the-art results on some datasets and tasks. However, they require large amounts of training data, which is not available in most domains. The severe performance degradation of dense retrievers on new data domains has limited the use of dense retrieval methods to only a few domains with large training datasets. In this paper, we propose an unsupervised domain-adaptive approach based on query generation. First, a generative model is used to generate relevant queries for each passage in the target corpus, and then the generated queries are used for mining negative passages. Finally, the query-passage pairs are labeled with a cross-encoder and used to train a domain-adapted dense retriever. Experiments show that our approach is more robust than previous methods in target domains that require less unlabeled data.

Keywords: dense retrieval, query generation, unsupervised training, text retrieval

Procedia PDF Downloads 73
25063 Artificial Reproduction System and Imbalanced Dataset: A Mendelian Classification

Authors: Anita Kushwaha

Abstract:

We propose a new evolutionary computational model called the Artificial Reproduction System, which is based on the complex process of meiotic reproduction occurring between male and female cells of living organisms. The Artificial Reproduction System is an attempt at a new computational intelligence approach inspired by the theoretical reproduction mechanism and by observed reproduction functions, principles and mechanisms. A reproductive organism is programmed by genes and can be viewed as an automaton, mapping and reducing so as to create copies of those genes in its offspring. In the Artificial Reproduction System, the binding mechanism between male and female cells is studied, parameters are chosen, a network is constructed, and a feedback system for self-regularization is established. The model then applies Mendel’s law of inheritance and allele-allele associations, and can be used to analyze imbalanced, multivariate, multiclass and big data. In the experimental study, the Artificial Reproduction System is compared with other state-of-the-art classifiers such as SVM, Radial Basis Function, neural networks and K-Nearest Neighbor on some benchmark datasets, and the comparison results indicate good performance.

Keywords: bio-inspired computation, nature-inspired computation, natural computing, data mining

Procedia PDF Downloads 272
25062 Design and Development of a Computerized Medical Record System for Hospitals in Remote Areas

Authors: Grace Omowunmi Soyebi

Abstract:

A computerized medical record system is a collection of medical information about a person that is stored on a computer. One principal problem of most hospitals in rural areas is the use of a file management system for keeping records. A lot of time is wasted when a patient visits the hospital, probably in an emergency, and the nurse or attendant has to search through voluminous files before the patient's file can be retrieved; this may cause something unexpected to happen to the patient. This data mining application is to be designed using a structured system analysis and design method, which will help in a well-articulated analysis of the existing file management system, a feasibility study, and proper documentation of the design and implementation of a computerized medical record system. The computerized system will replace the file management system and help to quickly retrieve a patient's record with increased data security, provide access to clinical records for decision-making, and reduce the time a patient waits to be attended to.

Keywords: programming, data, software development, innovation

Procedia PDF Downloads 87
25061 Statistical Analysis to Select Evacuation Route

Authors: Zaky Musyarof, Dwi Yono Sutarto, Dwima Rindy Atika, R. B. Fajriya Hakim

Abstract:

Each country should be responsible for the safety of its people, especially those living in disaster-prone areas. One such responsibility is providing evacuation routes for them. So far, however, the selection of evacuation routes does not seem to be well organized: when a disaster happens, large crowds accumulate along the stages of the evacuation route. This is dangerous because it hampers the evacuation process. Using several methods of statistical analysis, the authors suggest how to prepare an evacuation route that is organized and based on people's habits. These methods are association rules, sequential pattern mining, hierarchical cluster analysis and fuzzy logic.
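
Of the methods listed, association rules are the simplest to illustrate. The sketch below computes the support and confidence of a rule over hypothetical evacuation records; the place names and the records themselves are invented, since the abstract does not describe the data.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical records: where a person started and which path and
# shelter they used during an evacuation drill.
trips = [
    {"market", "main_road", "shelter_A"},
    {"market", "main_road", "shelter_A"},
    {"school", "river_path", "shelter_B"},
    {"market", "river_path", "shelter_B"},
    {"market", "main_road", "shelter_A"},
]

rule_support = support(trips, {"market", "main_road"})
rule_confidence = confidence(trips, {"market"}, {"main_road"})
print(rule_support, rule_confidence)
```

A rule with high confidence, such as people starting at the market tending to take the main road, points to where crowding is likely and where an alternative route might be planned.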

Keywords: association rules, sequential pattern mining, cluster analysis, fuzzy logic, evacuation route

Procedia PDF Downloads 504
25060 Cultural Dynamics in Online Consumer Behavior: Exploring Cross-Country Variances in Review Influence

Authors: Eunjung Lee

Abstract:

This research investigates the intricate connection between cultural differences and online consumer behaviors by integrating Hofstede's Cultural Dimensions theory with analysis methodologies such as text mining, data mining, and topic analysis. Our aim is to provide a comprehensive understanding of how national cultural differences influence individuals' behaviors when engaging with online reviews. To ensure the relevance of our investigation, we systematically analyze and interpret the cultural nuances influencing online consumer behaviors, especially in the context of online reviews. By anchoring our research in Hofstede's Cultural Dimensions theory, we seek to offer valuable insights for marketers to tailor their strategies based on the cultural preferences of diverse global consumer bases. In our methodology, we employ advanced text mining techniques to extract insights from a diverse range of online reviews gathered globally for a specific product or service like Netflix. This approach allows us to reveal hidden cultural cues in the language used by consumers from various backgrounds. Complementing text mining, data mining techniques are applied to extract meaningful patterns from online review datasets collected from different countries, aiming to unveil underlying structures and gain a deeper understanding of the impact of cultural differences on online consumer behaviors. The study also integrates topic analysis to identify recurring subjects, sentiments, and opinions within online reviews. Marketers can leverage these insights to inform the development of culturally sensitive strategies, enhance target audience segmentation, and refine messaging approaches aligned with cultural preferences. Anchored in Hofstede's Cultural Dimensions theory, our research employs sophisticated methodologies to delve into the intricate relationship between cultural differences and online consumer behaviors.
Applied to specific cultural dimensions, such as individualism vs. collectivism, masculinity vs. femininity, uncertainty avoidance, and long-term vs. short-term orientation, the study uncovers nuanced insights. For example, in exploring individualism vs. collectivism, we examine how reviewers from individualistic cultures prioritize personal experiences while those from collectivistic cultures emphasize communal opinions. Similarly, within masculinity vs. femininity, we investigate whether distinct topics align with cultural notions, such as robust features in masculine cultures and user-friendliness in feminine cultures. Examining information-seeking behaviors under uncertainty avoidance reveals how cultures differ in seeking detailed information or providing succinct reviews based on their comfort with ambiguity. Additionally, in assessing long-term vs. short-term orientation, the research explores how cultural focus on enduring benefits or immediate gratification influences reviews. These concrete examples contribute to the theoretical enhancement of Hofstede's Cultural Dimensions theory, providing a detailed understanding of cultural impacts on online consumer behaviors. As online reviews become increasingly crucial in decision-making, this research not only contributes to the academic understanding of cultural influences but also proposes practical recommendations for enhancing online review systems. Marketers can leverage these findings to design targeted and culturally relevant strategies, ultimately enhancing their global marketing effectiveness and optimizing online review systems for maximum impact.

Keywords: comparative analysis, cultural dimensions, marketing intelligence, national culture, online consumer behavior, text mining

Procedia PDF Downloads 47
25059 Data Transformations in Data Envelopment Analysis

Authors: Mansour Mohammadpour

Abstract:

Data transformation refers to the modification of any point in a data set by a mathematical function. When applying transformations, the measurement scale of the data is modified. Data transformations are commonly employed to turn data into the appropriate form, which can serve various functions in the quantitative analysis of the data. This study addresses the investigation of the use of data transformations in Data Envelopment Analysis (DEA). Although data transformations are important options for analysis, they do fundamentally alter the nature of the variable, making the interpretation of the results somewhat more complex.

Keywords: data transformation, data envelopment analysis, undesirable data, negative data

Procedia PDF Downloads 20
25058 Seasonal Variation of the Impact of Mining Activities on Ga-Selati River in Limpopo Province, South Africa

Authors: Joshua N. Edokpayi, John O. Odiyo, Patience P. Shikwambana

Abstract:

Water is a scarce natural resource in South Africa. Ga-Selati River is used for both domestic and industrial purposes. This study was carried out to assess the quality of Ga-Selati River in a mining area of Limpopo Province (Phalaborwa). The pH, Electrical Conductivity (EC) and Total Dissolved Solids (TDS) were determined using a Crinson multimeter, while turbidity was measured using a Labcon Turbidimeter. The concentrations of Al, Ca, Cd, Cr, Fe, K, Mg, Mn, Na and Pb were analysed in triplicate using a Varian 520 flame atomic absorption spectrometer (AAS) supplied by PerkinElmer, after acid digestion with nitric acid in a fume cupboard. The average pH of the river from eight different sampling sites was 8.00 and 9.38 in the wet and dry seasons, respectively. Higher EC values were determined in the dry season (138.7 mS/m) than in the wet season (96.93 mS/m). Similarly, TDS values were higher in the dry season (929.29 mg/L) than in the wet season (640.72 mg/L). These values exceeded the recommended guideline of the South Africa Department of Water Affairs and Forestry (DWAF) for domestic water use (70 mS/m) and that of the World Health Organization (WHO) (600 mg/L), respectively. Turbidity ranged from 1.78 to 5.20 NTU in the wet season and from 0.95 to 2.37 NTU in the dry season. Total hardness of 312.50 mg/L and 297.75 mg/L as CaCO₃ was computed for the river in the wet and the dry seasons, respectively, and the river water was categorised as very hard. Mean concentrations of the metals studied in the wet and the dry seasons, respectively, were: Na (94.06 mg/L and 196.3 mg/L), K (11.79 mg/L and 13.62 mg/L), Ca (45.60 mg/L and 41.30 mg/L), Mg (48.41 mg/L and 44.71 mg/L), Al (0.31 mg/L and 0.38 mg/L), Cd (0.01 mg/L and 0.01 mg/L), Cr (0.02 mg/L and 0.09 mg/L), Pb (0.05 mg/L and 0.06 mg/L), Mn (0.31 mg/L and 0.11 mg/L) and Fe (0.76 mg/L and 0.69 mg/L). Results from this study reveal that most of the metals were present in concentrations higher than the recommended guidelines of DWAF and WHO for domestic use and the protection of aquatic life.

Keywords: contamination, mining activities, surface water, trace metals

Procedia PDF Downloads 317
25057 Heart Ailment Prediction Using Machine Learning Methods

Authors: Abhigyan Hedau, Priya Shelke, Riddhi Mirajkar, Shreyash Chaple, Mrunali Gadekar, Himanshu Akula

Abstract:

The heart is the coordinating centre of the major endocrine glandular structure of the body, which produces hormones that profoundly affect the operations of the body, and diagnosing cardiovascular disease is a difficult but critical task. By extracting knowledge and information about the disease from patient data, data mining offers a practical way to help doctors detect disorders. We use a variety of machine learning methods, including logistic regression, support vector classifiers (SVC), k-nearest neighbours (KNN) classifiers, decision tree classifiers, random forest classifiers and gradient boosting classifiers. These algorithms are applied to patient data containing 13 different factors to build a system that predicts heart disease in less time and with greater accuracy.

Keywords: logistic regression, support vector classifier, k-nearest neighbour, decision tree, random forest, gradient boosting

Procedia PDF Downloads 50
25056 Destination Port Detection For Vessels: An Analytic Tool For Optimizing Port Authorities Resources

Authors: Lubna Eljabu, Mohammad Etemad, Stan Matwin

Abstract:

Port authorities in congested ports face many challenges in allocating their resources to provide safe and secure loading/unloading procedures for cargo vessels. Selecting a destination port is the decision of a vessel master based on many factors such as weather, wavelength and changes of priorities. Having access to a tool which leverages AIS messages to monitor vessels’ movements and accurately predict their next destination port promotes an effective resource allocation process for port authorities. In this research, we propose a method, namely Reference Route of Trajectory (RRoT), to assist port authorities in predicting inflow and outflow traffic in their local environment by monitoring Automatic Identification System (AIS) messages. Our RRoT method creates a reference route based on historical AIS messages. It utilizes some of the best trajectory similarity measures to identify the destination of a vessel from its recent movement. We evaluated five different similarity measures: Discrete Fréchet Distance (DFD), Dynamic Time Warping (DTW), Partial Curve Mapping (PCM), Area between two curves (Area) and Curve length (CL). Our experiments show that our method identifies the destination port with an accuracy of 98.97% and an F-measure of 99.08% using the Dynamic Time Warping (DTW) similarity measure.
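The matching step described in this abstract can be illustrated with a minimal sketch. It assumes each reference route and the vessel's recent track are short lists of planar (x, y) points and that the predicted destination is simply the port whose reference route has the smallest DTW distance to the track. The names `dtw_distance`, `predict_destination`, `port_A` and `port_B`, and all coordinates, are hypothetical toy values, not the authors' RRoT implementation or data; real AIS positions are latitude/longitude pairs and would need a geodesic point distance.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two sequences of (x, y) points."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean point distance (assumption)
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def predict_destination(recent_track, reference_routes):
    """Return the port whose reference route is closest to the track under DTW."""
    return min(
        reference_routes,
        key=lambda port: dtw_distance(recent_track, reference_routes[port]),
    )


# Toy reference routes (hypothetical, planar coordinates)
reference_routes = {
    "port_A": [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)],
    "port_B": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],
}
recent_track = [(0.0, 0.1), (1.1, 0.9), (2.0, 2.1)]
print(predict_destination(recent_track, reference_routes))
```

Because DTW allows one point to align with several points of the other sequence, the track and the reference route do not need the same number of samples, which suits irregularly reported AIS messages.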

Keywords: spatial temporal data mining, trajectory mining, trajectory similarity, resource optimization

Procedia PDF Downloads 121
25055 Study of the Transport of ²²⁶Ra Colloidal in Mining Context Using a Multi-Disciplinary Approach

Authors: Marine Reymond, Michael Descostes, Marie Muguet, Clemence Besancon, Martine Leermakers, Catherine Beaucaire, Sophie Billon, Patricia Patrier

Abstract:

²²⁶Ra is one of the radionuclides resulting from the disintegration of ²³⁸U. Due to its half-life (1600 y) and its high specific activity (3.7 × 10¹⁰ Bq/g), ²²⁶Ra is found at the ultra-trace level in the natural environment (usually below 1 Bq/L, i.e. 10⁻¹³ mol/L). Because it decays into ²²²Rn, a radioactive gas with a shorter half-life (3.8 days) which is difficult to control and dangerous for humans when inhaled, ²²⁶Ra is subject to dedicated monitoring in surface waters, especially in the context of uranium mining. In natural waters, radionuclides occur in dissolved, colloidal or particulate forms. Due to the size of colloids, generally ranging between 1 nm and 1 µm, and their high specific surface areas, the colloidal fraction could be involved in the transport of trace elements, including radionuclides, in the environment. The colloidal fraction is not always easy to determine, and few existing studies focus on ²²⁶Ra. In the present study, a complete multidisciplinary approach is proposed to assess the colloidal transport of ²²⁶Ra. It includes water sampling by conventional filtration (0.2 µm) and the innovative Diffusive Gradient in Thin Films technique to measure the dissolved fraction (<10 nm), from which the colloidal fraction could be estimated. Suspended matter in these waters was also sampled and characterized mineralogically by X-Ray Diffraction, infrared spectroscopy and scanning electron microscopy. All of these data, which were acquired on a rehabilitated former uranium mine, made it possible to build a geochemical model using the geochemical calculation code PhreeqC to describe, as accurately as possible, the colloidal transport of ²²⁶Ra. Colloidal transport of ²²⁶Ra was found, for some of the sampling points, to account for up to 95% of the total ²²⁶Ra measured in water. Mineralogical characterization and associated geochemical modelling highlight the role of barite, a barium sulfate mineral well known to trap ²²⁶Ra into its structure. Barite was shown to be responsible for the colloidal ²²⁶Ra fraction despite the presence of kaolinite and ferrihydrite, which are also known to retain ²²⁶Ra by sorption.

Keywords: colloids, mining context, radium, transport

Procedia PDF Downloads 156
25054 Convergence and Stability in Federated Learning with Adaptive Differential Privacy Preservation

Authors: Rizwan Rizwan

Abstract:

This paper provides an overview of Federated Learning (FL) and its application in enhancing data security, privacy, and efficiency. FL utilizes three distinct architectures to ensure privacy is never compromised. It involves training models on individual edge devices and aggregating them on a server without sharing raw data. This approach not only provides secure models without data sharing but also offers a highly efficient privacy-preserving solution with improved security and data access. We also discuss various frameworks used in FL and its integration with machine learning, deep learning, and data mining. To address the challenges of multi-party collaborative modeling scenarios, we briefly review an FL scheme that combines an adaptive gradient descent strategy with a differential privacy mechanism. The adaptive learning rate algorithm adjusts the gradient descent process to avoid issues such as model overfitting and fluctuations, thereby enhancing modeling efficiency and performance in multi-party computation scenarios. Additionally, to cater to ultra-large-scale distributed secure computing, the research introduces a differential privacy mechanism that defends against various background knowledge attacks.
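The abstract does not specify its differential privacy mechanism, so the following is only a sketch of one common pattern, assumed here for illustration: the server clips each client's model update to a maximum L2 norm, sums the clipped updates, adds Gaussian noise, and averages. The function names `clip` and `dp_federated_average`, the parameters `max_norm` and `noise_std`, and the toy updates are all hypothetical; calibrating `noise_std` to a target privacy budget is deliberately left out.

```python
import math
import random


def clip(update, max_norm):
    """Scale an update so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def dp_federated_average(client_updates, max_norm, noise_std, rng):
    """Average clipped client updates after adding Gaussian noise to their sum.

    Raw client data never reaches this function; only model updates do.
    noise_std is an illustrative parameter, not a calibrated privacy guarantee.
    """
    clipped = [clip(u, max_norm) for u in client_updates]
    dim = len(clipped[0])
    total = [sum(u[k] for u in clipped) for k in range(dim)]
    noisy = [t + rng.gauss(0.0, noise_std) for t in total]
    return [v / len(clipped) for v in noisy]


# Toy updates from two hypothetical clients
rng = random.Random(0)
updates = [[3.0, 4.0], [0.0, 1.0]]
print(dp_federated_average(updates, max_norm=1.0, noise_std=0.1, rng=rng))
```

Clipping bounds how much any single client can shift the aggregate, which is what allows the added noise to mask an individual contribution.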

Keywords: federated learning, differential privacy, gradient descent strategy, convergence, stability, threats

Procedia PDF Downloads 30
25053 Information Needs and Information Usage of the Older Person Club’s Members in Bangkok

Authors: Siriporn Poolsuwan

Abstract:

This research aims to explore the information needs, information usage, and problems of information usage of the older people club’s members in Dusit District, Bangkok. There are 12 clubs and 746 club members in this district. The research results are intended for use in older person services in this district. Data were gathered from 252 club members using questionnaires. A quantitative approach is used, with data analysed by percentage, mean and standard deviation. The results are as follows: (1) The older people need information for entertainment, occupation and academic purposes, in the fields of short stories, computer work, and religion and morality. (2) The participants use information from various sources. (3) The main problem of information usage is language skills, owing to the older people’s literacy problems.

Keywords: information behavior, older person, information seeking, knowledge discovery and data mining

Procedia PDF Downloads 270
25052 A Survey on Compression Methods for Table Constraints

Authors: N. Gharbi

Abstract:

Constraint satisfaction problems are mathematical problems often used to model real-world problems in which we ask whether there exists a solution satisfying all constraints. Table constraints are important for modeling parts of many problems since they list all combinations of allowed or forbidden values. However, they have practical limitations because they are sometimes too large to be represented directly. In this paper, we present a survey of the different categories of approaches proposed to compress table constraints in order to reduce both space and time complexities.
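One family of compression approaches can be illustrated with a small sketch, under the assumption that a table of allowed tuples is rewritten as "compressed tuples", each holding a set of values per variable and standing for the Cartesian product of those sets. The greedy merge below is an illustrative choice rather than an algorithm taken from the survey, and the names `compress`, `expand` and `allows` are hypothetical.

```python
from itertools import product


def compress(table):
    """Greedily merge rows that differ in exactly one position.

    Each row is a tuple of frozensets; merging two rows that agree everywhere
    except position k (by taking the union at k) preserves the set of allowed tuples.
    """
    rows = [tuple(frozenset([v]) for v in t) for t in sorted(set(table))]
    merged = True
    while merged:
        merged = False
        for i in range(len(rows)):
            for j in range(i + 1, len(rows)):
                diff = [k for k in range(len(rows[i])) if rows[i][k] != rows[j][k]]
                if len(diff) == 1:
                    k = diff[0]
                    new = rows[i][:k] + (rows[i][k] | rows[j][k],) + rows[i][k + 1:]
                    rows = [r for idx, r in enumerate(rows) if idx not in (i, j)]
                    rows.append(new)
                    merged = True
                    break
            if merged:
                break
    return rows


def expand(rows):
    """Recover the plain set of tuples represented by compressed rows."""
    return {t for row in rows for t in product(*row)}


def allows(rows, assignment):
    """Check whether an assignment is allowed without expanding the table."""
    return any(all(v in s for v, s in zip(assignment, row)) for row in rows)


table = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)]
compressed = compress(table)
print(len(table), "tuples ->", len(compressed), "compressed tuples")
```

Membership tests run directly on the compressed rows, so the table never has to be expanded back to its original size during search.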

Keywords: constraint programming, compression, data mining, table constraints

Procedia PDF Downloads 325
25051 A Method to Evaluate and Compare Web Information Extractors

Authors: Patricia Jiménez, Rafael Corchuelo, Hassan A. Sleiman

Abstract:

Web mining is gaining importance at an increasing pace. Currently, there are many complementary research topics under this umbrella. Their common theme is that they all focus on applying knowledge discovery techniques to data gathered from the Web. Sometimes, these data are relatively easy to gather, chiefly when they come from server logs. Unfortunately, there are cases in which the data to be mined are the data displayed on a web document. In such cases, it is necessary to apply a pre-processing step to first extract the information of interest from the web documents. Such pre-processing steps are performed using so-called information extractors, which are software components that are typically configured by means of rules tailored to extracting the information of interest from a web page and structuring it according to a pre-defined schema. Paramount to getting good mining results is that the technique used to extract the source information is exact, which requires evaluating and comparing the different proposals in the literature from an empirical point of view. According to Google Scholar, about 4,200 papers on information extraction have been published during the last decade. Unfortunately, they were not evaluated within a homogeneous framework, which makes it difficult to compare them empirically. In this paper, we report on an original information extraction evaluation method. Our contribution is three-fold: a) this is the first attempt to provide an evaluation method for proposals that work on semi-structured documents; the little existing work on this topic focuses on proposals that work on free text, which has little to do with extracting information from semi-structured documents; b) it provides a method that relies on statistically sound tests to support the conclusions drawn; previous work does not provide clear guidelines or recommend statistically sound tests, but rather a survey that collects many features to take into account as well as related work; c) we provide a novel method to compute the performance measures for unsupervised proposals, which would otherwise require the intervention of a user to compute them using the annotations on the evaluation sets and the information extracted. Our contributions will help researchers in this area make sure that they have advanced the state of the art not only conceptually but also from an empirical point of view; they will also help practitioners make informed decisions on which proposal is the most adequate for a particular problem. This conference is a good forum to discuss our ideas so that we can spread them to help improve the evaluation of information extraction proposals and gather valuable feedback from other researchers.

Keywords: web information extractors, information extraction evaluation method, Google scholar, web

Procedia PDF Downloads 248
25050 Text Mining Analysis of the Reconstruction Plans after the Great East Japan Earthquake

Authors: Minami Ito, Akihiro Iijima

Abstract:

On March 11, 2011, the Great East Japan Earthquake occurred off the coast of Sanriku, Japan. It is important to build a sustainable society through the reconstruction process rather than simply restoring the infrastructure. To compare the goals of the reconstruction plans of quake-stricken municipalities, Japanese-language morphological analysis was performed using text mining techniques. Frequently used nouns were sorted into four main categories: “life”, “disaster prevention”, “economy”, and “harmony with environment”. Because Soma City was affected by the nuclear accident, sentences tagged with “harmony with environment” tended to be more frequent than in the other municipalities. Results from cluster analysis and principal component analysis clearly indicated that the local government reinforces efforts to reduce risks from radiation exposure as a top priority.

Keywords: eco-friendly reconstruction, harmony with environment, decontamination, nuclear disaster

Procedia PDF Downloads 220