Search results for: data analysis of Uzbekistan
41936 Big Data: Concepts, Technologies and Applications in the Public Sector
Authors: A. Alexandru, C. A. Alexandru, D. Coardos, E. Tudora
Abstract:
Big Data (BD) is associated with a new generation of technologies and architectures which can harness the value of extremely large volumes of very varied data through real time processing and analysis. It involves changes in (1) data types, (2) accumulation speed, and (3) data volume. This paper presents the main concepts related to the BD paradigm, and introduces architectures and technologies for BD and BD sets. The integration of BD with the Hadoop Framework is also underlined. BD has attracted a lot of attention in the public sector due to the newly emerging technologies that allow the availability of network access. The volume of different types of data has exponentially increased. Some applications of BD in the public sector in Romania are briefly presented.Keywords: big data, big data analytics, Hadoop, cloud
Procedia PDF Downloads 308
41935 Influence of Thermal Annealing on Phase Composition and Structure of Quartz-Sericite Minerale
Authors: Atabaev I. G., Fayziev Sh. A., Irmatova Sh. K.
Abstract:
Raw materials with a high potassium oxide content are widely used in ceramic technology to prevent or reduce the deformation of ceramic goods during drying and thermal annealing. Because of their low melting temperature, they are also used to lower the annealing temperature during the fabrication of ceramic goods [1,2]. So-called "porcelain stones" or "china stones", which are quartz-sericite (muscovite) minerals, can also be used to prevent deformation, because the potassium oxide content of muscovite is rather high (SiO2 + KAl2[AlSi3O10](OH)2) [3]. To assess whether this mineral can be used in ceramic manufacture, this article investigates the influence of thermal processing on the phase and chemical composition of the raw material. As for other ceramic raw materials (kaolin, white-burning clays), the basic industrial requirements for the quality of a "porcelain stone" are: small particle size, relatively high uniformity in the distribution of components and phases, white color after firing, and a low content of colorant oxides or chromophores (Fe2O3, FeO, TiO2, etc.) [4,5]. In the present work, a natural mineral from the Boynaksay deposit (Uzbekistan) is investigated. The samples were mechanically polished for investigation by scanning electron microscope. Powder with a particle size of up to 63 μm was used for X-ray diffractometry and chemical analysis. The samples were annealed at 900, 1120, and 1350 °C for 1 hour. The chemical composition of the Boynaksay raw material according to chemical analysis is presented in Table 1. For comparison, the compositions of raw materials from Russia and the USA are also presented. In the Boynaksay quartz-sericite, the average proportions of quartz and sericite are 55-60% and 30-35%, respectively. The distribution of the quartz and sericite phases in the raw material was investigated using a JEOL JXA-8800R electron probe scanning electron microscope.
Figure 1 presents scanning electron microscope (SEM) micrographs of the surface and the distributions of Al, Si, and K atoms in the sample. As can be seen, the fine-grained, white, dense mineral consists of quartz, sericite, and a small amount of impurity minerals. The quartz crystals mostly range in size from 80 to 500 μm. Between the quartz crystals lie sericite inclusions of tabular form with a radiating structure. The sericite crystals are ~40-250 μm in size. Using data on interplanar distances [6,7] and the ASTM Powder X-ray Diffraction Data, it is shown that the natural "porcelain stone" quartz-sericite consists of quartz SiO2, sericite (muscovite type) KAl2[AlSi3O10](OH)2, and kaolinite Al2O3·2SiO2·2H2O (see Figure 2 and Table 2). As seen in Figure 3 and Table 3a, after annealing at 900 °C the quartz-sericite contains quartz (SiO2) and muscovite (KAl2[AlSi3O10](OH)2); the peaks related to kaolinite are absent. After annealing at 1120 °C, the full decomposition of muscovite and the formation of a mullite phase (3Al2O3·2SiO2) are observed (weak mullite peaks appear in Figure 3b and Table 3b). After annealing at 1350 °C, the samples contain crystalline phases of quartz and mullite (Figure 3c and Table 3c). Mullite is well known to give ceramics high density and abrasion and chemical stability. Thus, the experimental data obtained on the formation of various phases during thermal annealing can be used to develop fabrication technology for advanced materials. Conclusion: The influence of thermal annealing in the range 900-1350 °C on the phase composition and structure of the quartz-sericite mineral is investigated. It is shown that the phase content of the raw material changes during annealing. After annealing at 1350 °C, the samples contain crystalline phases of quartz and mullite (which gives ceramics high density and abrasion and chemical stability). Keywords: quartz-sericite, kaolinite, mullite, thermal processing
Procedia PDF Downloads 412
41934 Data Stream Association Rule Mining with Cloud Computing
Authors: B. Suraj Aravind, M. H. M. Krishna Prasad
Abstract:
There are emerging applications of data streams that require association rule mining, such as network traffic monitoring, web click stream analysis, sensor data, and satellite data. Data streams typically arrive continuously, at high speed, in huge volumes, and with a changing data distribution. This raises new issues that need to be considered when developing association rule mining techniques for stream data. This paper proposes an improved data stream association rule mining algorithm that removes the limitation of resources by using cloud computing. Including cloud computing may lead to additional unknown problems, which need further research. Keywords: data stream, association rule mining, cloud computing, frequent itemsets
Procedia PDF Downloads 498
41933 Using SNAP and RADTRAD to Establish the Analysis Model for Maanshan PWR Plant
Authors: J. R. Wang, H. C. Chen, C. Shih, S. W. Chen, J. H. Yang, Y. Chiang
Abstract:
In this study, we focus on the establishment of the analysis model for Maanshan PWR nuclear power plant (NPP) by using RADTRAD and SNAP codes with the FSAR, manuals, and other data. In order to evaluate the cumulative dose at the Exclusion Area Boundary (EAB) and Low Population Zone (LPZ) outer boundary, Maanshan NPP RADTRAD/SNAP model was used to perform the analysis of the DBA LOCA case. The analysis results of RADTRAD were similar to FSAR data. These analysis results were lower than the failure criteria of 10 CFR 100.11 (a total radiation dose to the whole body, 250 mSv; a total radiation dose to the thyroid from iodine exposure, 3000 mSv).Keywords: RADionuclide, transport, removal, and dose estimation (RADTRAD), symbolic nuclear analysis package (SNAP), dose, PWR
Procedia PDF Downloads 459
41932 Efficiency of the Slovak Commercial Banks Applying the DEA Window Analysis
Authors: Iveta Řepková
Abstract:
The aim of this paper is to estimate the efficiency of Slovak commercial banks using the Data Envelopment Analysis (DEA) window analysis approach over the period 2003-2012. The research is based on unbalanced panel data of the Slovak commercial banks. An undesirable output was included in the analysis of banking efficiency. The most efficient banks were Postovabanka, UniCredit Bank and Istrobanka in the CCR model, and Slovenskasporitelna, Istrobanka and UniCredit Bank in the BCC model. By contrast, the least efficient banks were Privatbanka and CitiBank. The largest banks in the Slovak banking market were less efficient than medium-sized and small banks. Average efficiency increased during the period 2003-2008 and then decreased during the period 2010-2011 as a result of the financial crisis. Keywords: data envelopment analysis, efficiency, Slovak banking sector, window analysis
Procedia PDF Downloads 355
41931 Spatial Variability of Brahmaputra River Flow Characteristics
Authors: Hemant Kumar
Abstract:
In Hindu mythology, the Brahmaputra River is known as the son of Lord Brahma. The river causes mass destruction during the monsoon season in Assam, a state in the north-east of India. Assam is one of the seven states of north-eastern India and carries almost the entire flow of the Brahmaputra; the other states carry its tributaries. In the present case study, a spatial analysis is performed on a number of acquired MODIS data sets. In the change detection method, the aerosol (spray) content was determined during heavy rainfall and in the flooded monsoon season. Applied to the Brahmaputra outflow, this analysis identifies the flooded season. The charged particles associated with the aerosol content indicate heavy water content below the ground surface, which is validated by trend analysis of rainfall spectrum data and confirmed by in-situ sampled data from different positions along the Brahmaputra River. Further, Hyperion hyperspectral data with 30 m resolution were used to scan the sediment deposits, which is also confirmed by in-situ sampled data from a different position. Keywords: aerosol, change detection, spatial analysis, trend analysis
Procedia PDF Downloads 146
41930 Attribute Analysis of Quick Response Code Payment Users Using Discriminant Non-negative Matrix Factorization
Authors: Hironori Karachi, Haruka Yamashita
Abstract:
Recently, quick response (QR) code payment systems have been gaining popularity. Many companies are introducing new QR code payment services, and the services compete with each other to increase their number of users. To increase the number of users, it is necessary to grasp how the demographic information, usage information, and value of users differ between services. In this study, we analyze real-world data provided by Nomura Research Institute, including demographic data of users and information on their usage of two services: LINE Pay and PayPay. Nonnegative Matrix Factorization (NMF) is widely used to analyze such data and interpret their features; however, the target data contain missing values. EM-algorithm NMF (EMNMF) completes the unknown values so that the features of data given in matrix form can be understood. Moreover, for comparing the NMF results of two matrices, Discriminant NMF (DNMF) shows the difference in user features between the two matrices. In this study, we combine EMNMF and DNMF and analyze the target data. As the interpretation, we show the differences in user features between LINE Pay and PayPay. Keywords: data science, non-negative matrix factorization, missing data, quality of services
Procedia PDF Downloads 130
41929 Machine Learning Analysis of Student Success in Introductory Calculus Based Physics I Course
Authors: Chandra Prayaga, Aaron Wade, Lakshmi Prayaga, Gopi Shankar Mallu
Abstract:
This paper presents the use of machine learning algorithms to predict the success of students in an introductory physics course. Data having 140 rows pertaining to the performance of two batches of students was used. The lack of sufficient data to train robust machine learning models was compensated for by generating synthetic data similar to the real data. CTGAN and CTGAN with Gaussian Copula (Gaussian) were used to generate synthetic data, with the real data as input. To check the similarity between the real data and each synthetic dataset, pair plots were made. The synthetic data was used to train machine learning models using the PyCaret package. For the CTGAN data, the Ada Boost Classifier (ADA) was found to be the ML model with the best fit, whereas the CTGAN with Gaussian Copula yielded Logistic Regression (LR) as the best model. Both models were then tested for accuracy with the real data. ROC-AUC analysis was performed for all the ten classes of the target variable (Grades A, A-, B+, B, B-, C+, C, C-, D, F). The ADA model with CTGAN data showed a mean AUC score of 0.4377, but the LR model with the Gaussian data showed a mean AUC score of 0.6149. ROC-AUC plots were obtained for each Grade value separately. The LR model with Gaussian data showed consistently better AUC scores compared to the ADA model with CTGAN data, except in two cases of the Grade value, C- and A-.Keywords: machine learning, student success, physics course, grades, synthetic data, CTGAN, gaussian copula CTGAN
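The per-grade ROC-AUC evaluation described above can be sketched in plain Python. This is a minimal illustration, not the authors' PyCaret pipeline: the function names (`binary_auc`, `one_vs_rest_auc`), the three-grade toy labels, and the scores are all invented for the example. AUC is computed as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
from typing import Dict, List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(true_grades: Sequence[str],
                    class_scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Treat each grade in turn as the positive class and score it against the rest."""
    result = {}
    for grade, scores in class_scores.items():
        labels = [1 if g == grade else 0 for g in true_grades]
        result[grade] = binary_auc(labels, scores)
    return result


# Hypothetical toy data: six students, three grade classes.
true_grades = ["A", "B", "A", "C", "B", "C"]
class_scores = {
    "A": [0.8, 0.7, 0.6, 0.1, 0.2, 0.3],
    "B": [0.1, 0.5, 0.3, 0.4, 0.6, 0.2],
    "C": [0.1, 0.2, 0.1, 0.5, 0.2, 0.5],
}
per_class = one_vs_rest_auc(true_grades, class_scores)
mean_auc = sum(per_class.values()) / len(per_class)
print(per_class, mean_auc)
```

A mean AUC near 0.5 corresponds to chance-level ranking, which is a useful yardstick when reading the mean scores of 0.4377 and 0.6149 reported in the abstract.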
Procedia PDF Downloads 43
41928 Frequent Itemset Mining Using Rough-Sets
Authors: Usman Qamar, Younus Javed
Abstract:
Frequent pattern mining is the process of finding patterns (sets of items, subsequences, substructures, etc.) that occur frequently in a data set. It was proposed in the context of frequent itemsets and association rule mining. Frequent pattern mining is used to find inherent regularities in data, for example, which products are often purchased together. Its applications include basket data analysis, cross-marketing, catalog design, sales campaign analysis, Web log (click stream) analysis, and DNA sequence analysis. However, one of the bottlenecks of frequent itemset mining is that, as the data grow, the time and resources required to mine the data increase at an exponential rate. In this investigation, a new algorithm is proposed that can be used as a pre-processor for frequent itemset mining. FASTER (FeAture SelecTion using Entropy and Rough sets) is a hybrid pre-processor algorithm that uses entropy and rough sets to carry out record reduction and feature (attribute) selection, respectively. FASTER for frequent itemset mining can produce a speed-up of 3.1 times compared with the original algorithm while maintaining an accuracy of 71%. Keywords: rough-sets, classification, feature selection, entropy, outliers, frequent itemset mining
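To make the notion of a frequent itemset concrete, here is a minimal level-wise (Apriori-style) sketch in Python. It is not the FASTER pre-processor from the abstract, whose details are not given here; the function name `frequent_itemsets`, the `min_count` threshold, and the toy baskets are assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Return {itemset: count} for every itemset found in at least min_count transactions."""
    baskets = [frozenset(t) for t in transactions]
    items = sorted({item for basket in baskets for item in basket})
    result = {}
    current = [frozenset([item]) for item in items]  # candidate 1-itemsets
    k = 1
    while current:
        # Count how many baskets contain each candidate.
        counts = {c: sum(1 for b in baskets if c <= b) for c in current}
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-candidates and keep only those
        # whose k-subsets are all frequent (the Apriori pruning step).
        k += 1
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = list(candidates)
    return result


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]
found = frequent_itemsets(baskets, min_count=3)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The number of candidate itemsets can grow exponentially with the number of distinct items, which is the bottleneck that motivates reducing records and attributes before mining.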
Procedia PDF Downloads 435
41927 Dynamics of Protest Mobilization and Rapid Demobilization in Post-2001 Afghanistan: Facing Enlightening Movement
Authors: Ali Aqa Mohammad Jawad
Abstract:
Taking a relational approach, this paper analyzes the causal mechanisms associated with successful mobilization and rapid demobilization of the Enlightening Movement in post-2001 Afghanistan. The movement emerged after the state-owned Da Afghan Bereshna Sherkat (DABS) decided to divert the route for the Turkmenistan-Uzbekistan-Tajikistan-Afghanistan-Pakistan (TUTAP) electricity project. The grid was initially planned to go through the Hazara-inhabited province of Bamiyan, according to Afghanistan’s Power Sector Master Plan. The reroute served as an aide-mémoire of historical subordination to other ethno-religious groups for the Hazara community. It was also perceived as deprivation from post-2001 development projects, financed by international aid. This torched the accumulated grievances, which then gave birth to the Enlightening Movement. The movement had a successful mobilization. However, it demobilized after losing much of its mobilizing capabilities through an amalgamation of external and internal relational factors. The successful mobilization yet rapid demobilization constitutes the puzzle of this paper. From the theoretical perspective, this paper is significant as it establishes the applicability of contentious politics theory to protest mobilizations that occurred in Afghanistan, a context-specific, characterized by ethnic politics. Both primary and secondary data are utilized to address the puzzle. As for the primary resources, media coverage, interviews, reports, public media statements of the movement, involved in contentious performances, and data from Social Networking Services (SNS) are used. The covered period is from 2001-2018. As for the secondary resources, published academic articles and books are used to give a historical account of contentious politics. For data analysis, a qualitative comparative historical method is utilized to uncover the causal mechanisms associated with successful mobilization and rapid demobilization of the Movement. 
In this pursuit, both mobilization and demobilization are considered larger political processes that can be decomposed into constituent mechanisms. The Enlightening Movement's framing and campaigns are first studied to uncover the associated mechanisms. Then, to avoid introducing ad hoc mechanisms, the recurrence of mechanisms is checked against another case. Mechanisms qualify as robust if they are "recurrent" in different episodes of contention. Checking the recurrence of causal mechanisms is vital, as past contentious events tend to reinforce future events. The findings of this paper suggest that the public sphere in Afghanistan is drastically different from that of Western democracies, known as the birthplace of social movements. In Western democracies, when institutional politics did not respond, movement organizers occupied the public sphere, undermining the legitimacy of the government. In Afghanistan, the public sphere is ethnicized. Given the inter- and intra-relational dynamics of ethnic groups in Afghanistan, the movement was reduced to an erosive inter- and intra-ethnic conflict. This undermined the cohesiveness of the movement, which then kicked off its demobilization process. Keywords: enlightening movement, contentious politics, mobilization, demobilization
Procedia PDF Downloads 193
41926 Harmonic Data Preparation for Clustering and Classification
Authors: Ali Asheibi
Abstract:
The rapid increase in the size of databases required to store power quality monitoring data has demanded new techniques for analysing and understanding the data. One suggested technique to assist in analysis is data mining. Preparing raw data for data mining exploration takes up most of the effort and time spent in the whole data mining process. Clustering is an important technique in data mining and machine learning in which underlying and meaningful groups of data are discovered. Large amounts of harmonic data were collected over three years from an actual harmonic monitoring system in a distribution system in Australia. This amount of acquired data makes it difficult to identify operational events that significantly impact the harmonics generated on the system. In this paper, harmonic data preparation processes that support a better understanding of the data are presented. Underlying classes in the data are then identified using a clustering technique based on the Minimum Message Length (MML) method. The underlying operational information contained within the clusters can be rapidly visualised by engineers. The C5.0 algorithm was used for classification and interpretation of the generated clusters. Keywords: data mining, harmonic data, clustering, classification
Procedia PDF Downloads 245
41925 Simulation Data Summarization Based on Spatial Histograms
Authors: Jing Zhao, Yoshiharu Ishikawa, Chuan Xiao, Kento Sugiura
Abstract:
In order to analyze large-scale scientific data, research on data exploration and visualization has gained popularity. In this paper, we focus on the exploration and visualization of scientific simulation data, and define a spatial V-Optimal histogram for data summarization. We propose histogram construction algorithms based on a general binary hierarchical partitioning as well as a more specific one, the l-grid partitioning. For effective data summarization and efficient data visualization in scientific data analysis, we propose an optimal algorithm as well as a heuristic algorithm for histogram construction. To verify the effectiveness and efficiency of the proposed methods, we conduct experiments on the massive evacuation simulation data.Keywords: simulation data, data summarization, spatial histograms, exploration, visualization
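As background for the V-Optimal criterion mentioned above, the following sketch builds a one-dimensional V-Optimal histogram by dynamic programming: it splits a sequence into a fixed number of contiguous buckets so that the total within-bucket sum of squared errors is minimal. It is a deliberate simplification, since the paper's spatial histograms rely on binary hierarchical and l-grid partitioning, which are not reproduced here. The function name and the sample values are hypothetical.

```python
def v_optimal_histogram(values, num_buckets):
    """Return (total_sse, buckets), where buckets are half-open index ranges (start, end)."""
    n = len(values)
    if not 1 <= num_buckets <= n:
        raise ValueError("num_buckets must be between 1 and len(values)")

    # Prefix sums of values and squared values give O(1) bucket error lookups.
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for i, v in enumerate(values):
        prefix[i + 1] = prefix[i] + v
        prefix_sq[i + 1] = prefix_sq[i] + v * v

    def sse(i, j):
        """Sum of squared deviations from the mean for values[i:j]."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[b][j]: minimal error when the first j values are split into b buckets.
    cost = [[inf] * (n + 1) for _ in range(num_buckets + 1)]
    cut = [[0] * (n + 1) for _ in range(num_buckets + 1)]
    cost[0][0] = 0.0
    for b in range(1, num_buckets + 1):
        for j in range(b, n + 1):
            for i in range(b - 1, j):
                candidate = cost[b - 1][i] + sse(i, j)
                if candidate < cost[b][j]:
                    cost[b][j] = candidate
                    cut[b][j] = i

    # Walk the stored cut points backwards to recover the bucket boundaries.
    buckets = []
    j = n
    for b in range(num_buckets, 0, -1):
        i = cut[b][j]
        buckets.append((i, j))
        j = i
    buckets.reverse()
    return cost[num_buckets][n], buckets


# Hypothetical cell counts along one axis of a simulation grid.
sample = [1, 1, 1, 10, 10, 10, 5, 5]
total_error, buckets = v_optimal_histogram(sample, num_buckets=3)
print(total_error, buckets)
```

This exact dynamic program costs O(B·n²) time for n values and B buckets, which illustrates why a heuristic alternative is attractive for massive simulation data.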
Procedia PDF Downloads 175
41924 Structural Equation Modeling Semiparametric Truncated Spline Using Simulation Data
Authors: Adji Achmad Rinaldo Fernandes
Abstract:
SEM analysis is a complex multivariate analysis because it involves a number of exogenous and endogenous variables that are interconnected to form a model. The measurement model is divided into two, namely, the reflective model (reflecting) and the formative model (forming). Before carrying out further tests on SEM, there are assumptions that must be met, namely the linearity assumption, to determine the form of the relationship. There are three modeling approaches to path analysis, including parametric, nonparametric and semiparametric approaches. The aim of this research is to develop semiparametric SEM and obtain the best model. The data used in the research is secondary data as the basis for the process of obtaining simulation data. Simulation data was generated with various sample sizes of 100, 300, and 500. In the semiparametric SEM analysis, the form of the relationship studied was determined, namely linear and quadratic and determined one and two knot points with various levels of error variance (EV=0.5; 1; 5). There are three levels of closeness of relationship for the analysis process in the measurement model consisting of low (0.1-0.3), medium (0.4-0.6) and high (0.7-0.9) levels of closeness. The best model lies in the form of the relationship X1Y1 linear, and. In the measurement model, a characteristic of the reflective model is obtained, namely that the higher the closeness of the relationship, the better the model obtained. The originality of this research is the development of semiparametric SEM, which has not been widely studied by researchers.Keywords: semiparametric SEM, measurement model, structural model, reflective model, formative model
Procedia PDF Downloads 38
41923 Prediction of Marine Ecosystem Changes Based on the Integrated Analysis of Multivariate Data Sets
Authors: Prozorkevitch D., Mishurov A., Sokolov K., Karsakov L., Pestrikova L.
Abstract:
The current body of knowledge about the marine environment and the dynamics of marine ecosystems includes a huge amount of heterogeneous data collected over decades. It generally includes a wide range of hydrological, biological and fishery data. Marine researchers collect these data and analyze how and why the ecosystem changes from past to present. Based on these historical records and linkages between the processes it is possible to predict future changes. Multivariate analysis of trends and their interconnection in the marine ecosystem may be used as an instrument for predicting further ecosystem evolution. A wide range of information about the components of the marine ecosystem for more than 50 years needs to be used to investigate how these arrays can help to predict the future.Keywords: barents sea ecosystem, abiotic, biotic, data sets, trends, prediction
Procedia PDF Downloads 113
41922 Combining Diffusion Maps and Diffusion Models for Enhanced Data Analysis
Authors: Meng Su
Abstract:
High-dimensional data analysis often presents challenges in capturing the complex, nonlinear relationships and manifold structures inherent to the data. This article presents a novel approach that leverages the strengths of two powerful techniques, Diffusion Maps and Diffusion Probabilistic Models (DPMs), to address these challenges. By integrating the dimensionality reduction capability of Diffusion Maps with the data modeling ability of DPMs, the proposed method aims to provide a comprehensive solution for analyzing and generating high-dimensional data. The Diffusion Map technique preserves the nonlinear relationships and manifold structure of the data by mapping it to a lower-dimensional space using the eigenvectors of the graph Laplacian matrix. Meanwhile, DPMs capture the dependencies within the data, enabling effective modeling and generation of new data points in the low-dimensional space. The generated data points can then be mapped back to the original high-dimensional space, ensuring consistency with the underlying manifold structure. Through a detailed example implementation, the article demonstrates the potential of the proposed hybrid approach to achieve more accurate and effective modeling and generation of complex, high-dimensional data. Furthermore, it discusses possible applications in various domains, such as image synthesis, time-series forecasting, and anomaly detection, and outlines future research directions for enhancing the scalability, performance, and integration with other machine learning techniques. By combining the strengths of Diffusion Maps and DPMs, this work paves the way for more advanced and robust data analysis methods.Keywords: diffusion maps, diffusion probabilistic models (DPMs), manifold learning, high-dimensional data analysis
Procedia PDF Downloads 105
41921 Industrial Process Mining Based on Data Pattern Modeling and Nonlinear Analysis
Authors: Hyun-Woo Cho
Abstract:
Unexpected events may occur with serious impacts on industrial processes. This work uses a data representation technique to model and analyze process data patterns for the purpose of diagnosis. The use of a triangular representation of process data is evaluated using a simulated process. Furthermore, the effects of different pre-treatment techniques based on linear or nonlinear reduced spaces were compared. This work extracted the fault pattern in the reduced space, not in the original data space. The results show that the diagnosis method based on the nonlinear technique produced more reliable results and outperformed the linear method. Keywords: process monitoring, data analysis, pattern modeling, fault, nonlinear techniques
Procedia PDF Downloads 386
41920 Analysis of Lead Time Delays in Supply Chain: A Case Study
Authors: Abdel-Aziz M. Mohamed, Nermeen Coutry
Abstract:
Lead time is an important measure of supply chain performance. It affects both customer satisfaction and the total cost of inventory. This paper presents the results of a study analyzing customer order lead time for a multinational company. In the study, lead time was divided into three stages: order entry, order fulfillment, and order delivery. A sample of 2,425 order lines from the company records was considered. The sample data include information on customer orders from the time of order entry until order delivery. Data on the lead time of each stage for different orders were also provided. Summary statistics on the lead time data reveal that about 30% of the orders were delivered after the scheduled due date. Multiple linear regression analysis revealed that component type, logistics parameter, order size, and customer type have a significant impact on lead time. Analysis of the stages of lead time indicates that stage 2 consumes over 50% of the lead time. Pareto analysis was used to study the reasons for customer order delay in each of the three stages. Recommendations were given to resolve the problem. Keywords: lead time reduction, customer satisfaction, service quality, statistical analysis
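A Pareto analysis of delay reasons, as mentioned above, amounts to ranking the reasons by frequency and tracking the cumulative share. The sketch below shows the idea in Python; the delay reasons and their counts are invented for illustration and do not come from the study, and `pareto_table` is a hypothetical helper name.

```python
from collections import Counter


def pareto_table(reasons):
    """Rank reasons by frequency and attach the cumulative share of all occurrences."""
    counts = Counter(reasons)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for reason, count in counts.most_common():
        cumulative += count
        rows.append((reason, count, cumulative / total))
    return rows


# Hypothetical delay reasons recorded for one lead-time stage.
reasons = ["credit hold"] * 5 + ["missing part"] * 3 + ["carrier delay"] * 2
rows = pareto_table(reasons)
for reason, count, share in rows:
    print(f"{reason:15s} {count:3d} {share:6.1%}")
```

Reading the table from the top shows which few reasons account for most of the delays, which is where corrective action is usually aimed first.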
Procedia PDF Downloads 729
41919 Differentiation between Different Rangeland Sites Using Principal Component Analysis in Semi-Arid Areas of Sudan
Authors: Nancy Ibrahim Abdalla, Abdelaziz Karamalla Gaiballa
Abstract:
Rangelands in semi-arid areas provide a good source of feed for huge numbers of animals and are of environmental, economic and social importance; these areas are therefore considered economically very important for the pastoral sector in Sudan. This paper investigates means of differentiating between rangeland sites according to soil type using principal component analysis, to assist in monitoring and assessment. Three rangeland sites were identified in the study area: flat sandy sites, a sand dune site, and a hard clay site. Principal component analysis (PCA) was used to reduce the number of factors needed to distinguish between rangeland sites and to produce a new data set containing the most useful spectral information for satellite image processing. It was performed on selected types of data (two vegetation indices, topographic data, and vegetation surface reflectance within the three bands of MODIS data). The PCA indicated a relatively high correspondence between vegetation and soil in the total variance of the data set. The results showed that PCA with the selected variables revealed large differences between sites, reflected in the variance and eigenvalues, and that it can be used to differentiate between range sites. Keywords: principal component analysis, PCA, rangeland sites, semi-arid areas, soil types
Procedia PDF Downloads 184
41918 Interpretation and Clustering Framework for Analyzing ECG Survey Data
Authors: Irum Matloob, Shoab Ahmad Khan, Fahim Arif
Abstract:
The Indo-Pak region has suffered from heart disease for many decades. Many surveys have shown that the percentage of cardiac patients in Pakistan is increasing day by day, and the issue needs special attention. A framework is proposed for detailed analysis of ECG survey data collected to measure the prevalence of heart disease in Pakistan. The ECG survey data are evaluated and filtered using automated Minnesota codes, and only those ECGs that fulfill the standardized conditions specified in the Minnesota codes are used for further analysis. Feature selection is then performed by applying a proposed algorithm based on the discernibility matrix to select relevant features from the database. Clustering is performed to expose natural clusters in the ECG survey data by applying a spectral clustering algorithm that uses the fuzzy c-means algorithm. The hidden patterns and interesting relationships exposed by this analysis are useful for further detailed analysis and for many other purposes. Keywords: arrhythmias, centroids, ECG, clustering, discernibility matrix
Procedia PDF Downloads 469
41917 A Study on Sentiment Analysis Using Various ML/NLP Models on Historical Data of Indian Leaders
Authors: Sarthak Deshpande, Akshay Patil, Pradip Pandhare, Nikhil Wankhede, Rushali Deshmukh
Abstract:
Sentiment analysis is among the most significant tasks for any language and a key area of NLP that has recently made impressive strides. Several models and datasets are available for this task in popular and commonly used languages such as English, Russian, and Spanish. While sentiment analysis research is extensive, it lags behind for regional languages with few resources, such as Hindi and Marathi. Marathi is one of the languages included in the 8th Schedule of the Indian Constitution; it is the third most widely spoken language in the country and is primarily spoken in the Deccan region, which encompasses Maharashtra and Goa. There has not been sufficient study of sentiment analysis methods for Marathi text because of the lack of available resources and information. Therefore, this project proposes the use of different ML/NLP models for the analysis of Marathi data from comments below YouTube content, tweets, or Instagram posts. We aim to achieve a short and precise analysis and summary of the related data using our dataset (dates, names, root words) and lexicons to locate exact information. Keywords: multilingual sentiment analysis, Marathi, natural language processing, text summarization, lexicon-based approaches
Procedia PDF Downloads 71
41916 Extreme Temperature Forecast in Mbonge, Cameroon Through Return Level Analysis of the Generalized Extreme Value (GEV) Distribution
Authors: Nkongho Ayuketang Arreyndip, Ebobenow Joseph
Abstract:
In this paper, temperature extremes are forecast by employing the block maxima method of the generalized extreme value (GEV) distribution to analyse temperature data from the Cameroon Development Corporation (CDC). Return level analysis is carried out using two sets of data (raw data and simulated data) and two models of the GEV distribution (stationary and non-stationary). In the stationary model, the return values are constant over time for the raw data, while for the simulated data they show an increasing trend with an upper bound. In the non-stationary model, the return levels of both the raw data and the simulated data show an increasing trend with an upper bound. This shows that although temperatures in the tropics show signs of increasing in the future, there is a maximum temperature that is not exceeded. The results of this paper are important for agricultural and environmental research.
Keywords: forecasting, generalized extreme value (GEV), meteorology, return level
Procedia PDF Downloads 477
41915 Longitudinal Analysis of Internet Speed Data in the Gulf Cooperation Council Region
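To make the notion of a bounded return level concrete, here is a minimal sketch of the textbook stationary GEV return-level formula. It is not the authors' fitted model, and the parameter values are hypothetical. For location `mu`, scale `sigma`, and shape `xi`, the level exceeded on average once every T blocks is mu - (sigma / xi) * (1 - y^(-xi)) with y = -log(1 - 1/T); when `xi` is negative the distribution has a finite upper endpoint at mu - sigma / xi, which is one way an increasing return level can still be bounded.

```python
import math


def gev_return_level(mu, sigma, xi, return_period):
    """Return level of a stationary GEV for a return period in blocks."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if return_period <= 1:
        raise ValueError("return_period must be greater than 1")
    y = -math.log(1.0 - 1.0 / return_period)
    if abs(xi) < 1e-12:
        # Gumbel limit (xi -> 0)
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


def gev_upper_bound(mu, sigma, xi):
    """Finite upper endpoint when xi < 0; unbounded otherwise."""
    return mu - sigma / xi if xi < 0 else math.inf


# Hypothetical parameters for annual maximum temperature in degrees Celsius.
MU, SIGMA, XI = 35.0, 1.5, -0.2
levels = [gev_return_level(MU, SIGMA, XI, t) for t in (10, 100, 1000)]
bound = gev_upper_bound(MU, SIGMA, XI)
```

With a negative shape parameter, the return levels grow with the return period but never reach `bound`.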
Authors: Musab Isah
Abstract:
This paper presents a longitudinal analysis of Internet speed data in the Gulf Cooperation Council (GCC) region, focusing on the most populous city of each of the six countries: Riyadh, Saudi Arabia; Dubai, UAE; Kuwait City, Kuwait; Doha, Qatar; Manama, Bahrain; and Muscat, Oman. The study utilizes data collected from the Measurement Lab (M-Lab) infrastructure over a five-year period from January 1, 2019, to December 31, 2023. The analysis includes downstream and upstream throughput data for the cities, covering significant events such as the launch of 5G networks in 2019, COVID-19-induced lockdowns in 2020 and 2021, and the subsequent recovery period and return to normalcy. The results show substantial increases in Internet speeds across the cities, highlighting improvements in both download and upload throughput over the years. All the GCC countries have achieved above-average Internet speeds that can conveniently support various online activities and applications with excellent user experience.
Keywords: internet data science, internet performance measurement, throughput analysis, internet speed, measurement lab, network diagnostic tool
Procedia PDF Downloads 61
41914 Analysis of Spatial and Temporal Data Using Remote Sensing Technology
Authors: Kapil Pandey, Vishnu Goyal
Abstract:
Spatial and temporal data analysis is well established in the field of satellite image processing. When spatial data are combined with time series analysis, they give significant results in change detection studies. In this paper, GIS and remote sensing techniques have been used to detect change using time series satellite imagery of Uttarakhand state for the years 1990-2010. Natural vegetation, urban area, forest cover, etc. were chosen as the main land use classes to study. Land use/land cover classes for several years were prepared using satellite images. A maximum likelihood supervised classification technique was adopted in this work; finally, a land use change index was generated, and graphical models were used to present the changes.
Keywords: GIS, landuse/landcover, spatial and temporal data, remote sensing
Procedia PDF Downloads 430
41913 On the Estimation of Crime Rate in the Southwest of Nigeria: Principal Component Analysis Approach
Authors: Kayode Balogun, Femi Ayoola
Abstract:
Crime is at an alarming rate in this part of the world, and many factors contribute to this antisocial behaviour among both the young and the old. In this work, principal component analysis (PCA) was used to reduce the dimensionality of the data, while retaining as much of the information as possible, and to identify the variables that contribute most to crime in the study region. Data were collected on twenty-eight crime variables from the National Bureau of Statistics (NBS) databank for a period of fifteen years. We use PCA in this study to determine the number of major variables contributing to crime in Southwest Nigeria. The results of our analysis revealed that eight principal components were retained using the scree plot and loading plot, which implies that an eight-equation solution is appropriate for the data. The eight components explained 93.81% of the total variation in the data set. We also found that the most commonly committed crimes in Southwestern Nigeria were assault, grievous harm and wounding, theft/stealing, burglary, house breaking, false pretence, unlawful arms possession, and breach of public peace.
Keywords: crime rates, data, Southwest Nigeria, principal component analysis, variables
Procedia PDF Downloads 443
41912 AI-Driven Solutions for Optimizing Master Data Management
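A figure such as "eight components explain 93.81% of the variation" comes from the eigenvalues of the correlation (or covariance) matrix. The sketch below shows that arithmetic under a cumulative-variance threshold, which is a common companion to the scree plot and not necessarily the authors' procedure. The eigenvalues and the 85% threshold are hypothetical.

```python
def explained_variance_ratio(eigenvalues):
    """Share of total variance carried by each component, largest first."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    return [ev / total for ev in sorted(eigenvalues, reverse=True)]


def components_needed(eigenvalues, threshold):
    """Smallest number of leading components whose cumulative explained
    variance reaches `threshold` (a fraction between 0 and 1)."""
    cumulative = 0.0
    for k, ratio in enumerate(explained_variance_ratio(eigenvalues), start=1):
        cumulative += ratio
        if cumulative >= threshold:
            return k, cumulative
    return len(eigenvalues), cumulative


# Hypothetical eigenvalues of a 5-variable correlation matrix.
eigs = [4.0, 2.0, 1.0, 0.5, 0.5]
k, covered = components_needed(eigs, 0.85)
```

In practice the eigenvalues would come from a numerical library's eigendecomposition of the data's correlation matrix; they are hard-coded here to keep the example self-contained.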
Authors: Srinivas Vangari
Abstract:
In the era of big data, ensuring the accuracy, consistency, and reliability of critical data assets is crucial for data-driven enterprises. Master Data Management (MDM) plays a central role in this endeavor. This paper investigates the role of Artificial Intelligence (AI) in enhancing MDM, focusing on how AI-driven solutions can automate and optimize various stages of the master data lifecycle. By integrating AI (Quantitative and Qualitative Analysis) into processes such as data creation, maintenance, enrichment, and usage, organizations can achieve significant improvements in data quality and operational efficiency. Quantitative analysis is employed to measure the impact of AI on key metrics, including data accuracy, processing speed, and error reduction. For instance, our study demonstrates an 18% improvement in data accuracy and a 75% reduction in duplicate records across multiple systems post-AI implementation. Furthermore, AI’s predictive maintenance capabilities reduced data obsolescence by 22%, as indicated by statistical analyses of data usage patterns over a 12-month period. Complementing this, a qualitative analysis delves into the specific AI-driven strategies that enhance MDM practices, such as automating data entry and validation, which resulted in a 28% decrease in manual errors. Insights from case studies highlight how AI-driven data cleansing processes reduced inconsistencies by 25% and how AI-powered enrichment strategies improved data relevance by 24%, thus boosting decision-making accuracy. The findings demonstrate that AI significantly enhances data quality and integrity, leading to improved enterprise performance through cost reduction, increased compliance, and more accurate, real-time decision-making. These insights underscore the value of AI as a critical tool in modern data management strategies, offering a competitive edge to organizations that leverage its capabilities.
Keywords: artificial intelligence, master data management, data governance, data quality
Procedia PDF Downloads 16
41911 Customer Churn Analysis in Telecommunication Industry Using Data Mining Approach
Authors: Burcu Oralhan, Zeki Oralhan, Nilsun Sariyer, Kumru Uyar
Abstract:
Data mining has become increasingly important and has found a wide range of applications in recent years. Data mining is the process of finding hidden and unknown patterns in big data. One of the applied fields of data mining is Customer Relationship Management. Understanding the relationships between products and customers is crucial for every business. Customer Relationship Management is an approach that focuses on developing customer relationships, retaining customers, and increasing customer satisfaction. In this study, we apply data mining methods to customer relationship management in telecommunications. The study aims to determine the profile of customers who are likely to leave the system and to develop marketing strategies and customized campaigns for customers. Data are clustered by applying classification techniques in order to identify the churners. As a result of this study, we will obtain knowledge from the international telecommunication industry and contribute to the understanding and development of this subject in Customer Relationship Management.
Keywords: customer churn analysis, customer relationship management, data mining, telecommunication industry
Procedia PDF Downloads 316
41910 Accelerating Side Channel Analysis with Distributed and Parallelized Processing
Authors: Kyunghee Oh, Dooho Choi
Abstract:
Even when a cryptographic algorithm has no theoretical weakness, Side Channel Analysis can recover secret data from the physical implementation of a cryptosystem. The analysis is based on extra information such as timing information, power consumption, electromagnetic leaks, or even sound, which can be exploited to break the system. Differential Power Analysis is one of the most popular analyses; it computes the statistical correlations between the secret keys and power consumption. It usually requires processing a huge amount of data and takes a long time; for some devices with countermeasures, it may take several weeks. We suggest and evaluate methods to shorten the time needed to analyze cryptosystems. Our methods include distributed computing and parallelized processing.
Keywords: DPA, distributed computing, parallelized processing, side channel analysis
Procedia PDF Downloads 425
41909 Analysis and Rule Extraction of Coronary Artery Disease Data Using Data Mining
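The abstract gives no implementation details, so the following is only a toy sketch of why this kind of analysis parallelizes well: each key guess is scored independently by correlating a predicted leakage with the measured power values. Everything here is an assumption for illustration: the leakage model (Hamming weight of plaintext XOR key plus Gaussian noise), the simulated traces, and the function names. A thread pool is used for portability; in CPython, pure-Python CPU-bound work would need `ProcessPoolExecutor` or several machines to obtain a real speed-up.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def hamming_weight(x):
    return bin(x).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def simulate_traces(secret_key, n_traces, noise_sd, seed):
    """Hypothetical device: one power sample per encryption."""
    rng = random.Random(seed)
    plaintexts = [rng.randrange(256) for _ in range(n_traces)]
    power = [hamming_weight(p ^ secret_key) + rng.gauss(0.0, noise_sd)
             for p in plaintexts]
    return plaintexts, power


def score_guess(guess, plaintexts, power):
    """Correlate the leakage predicted under one key guess with the traces."""
    model = [hamming_weight(p ^ guess) for p in plaintexts]
    return guess, pearson(model, power)


def recover_key_byte(plaintexts, power, workers=4):
    """Score all 256 guesses as independent tasks and keep the best one."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(
            lambda g: score_guess(g, plaintexts, power), range(256)))
    return max(results, key=lambda r: r[1])[0]


plaintexts, power = simulate_traces(0x3C, 500, 0.5, seed=1)
recovered = recover_key_byte(plaintexts, power)
```

Because the 256 scoring tasks share no state, the same split works across processes or hosts, with only the final `max` needing to see all results.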
Authors: Rezaei Hachesu Peyman, Oliyaee Azadeh, Salahzadeh Zahra, Alizadeh Somayyeh, Safaei Naser
Abstract:
Coronary Artery Disease (CAD) is a major cause of disability in adults and a main cause of death in developed countries. In this study, data mining techniques including decision trees, artificial neural networks (ANNs), and support vector machines (SVMs) are used to analyze CAD data. Data from 4948 patients who had suffered from heart diseases were included in the analysis. CAD is the target variable, and 24 input or predictor variables are used for the classification. The performance of these techniques is compared in terms of sensitivity, specificity, and accuracy. The most significant factor influencing CAD is chest pain. Elderly males (age > 53) have a high probability of being diagnosed with CAD. The SVM algorithm is the most useful for evaluating and predicting CAD patients as compared to non-CAD ones. Applying data mining techniques to the analysis of coronary artery disease is a good method for investigating the existing relationships between variables.
Keywords: classification, coronary artery disease, data-mining, knowledge discovery, extract
Procedia PDF Downloads 657
41908 Analyzing the Evolution of Adverse Events in Pharmacovigilance: A Data-Driven Approach
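The three comparison measures named in the abstract follow directly from a classifier's confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical and are not results from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts.

    tp: patients with CAD predicted as CAD
    fn: patients with CAD predicted as non-CAD
    tn: patients without CAD predicted as non-CAD
    fp: patients without CAD predicted as CAD
    """
    total = tp + fp + tn + fn
    if tp + fn == 0 or tn + fp == 0 or total == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / total,
    }


# Hypothetical counts for one classifier on a held-out set.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
```

Reporting all three matters because accuracy alone can look high on an imbalanced data set even when sensitivity is poor.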
Authors: Kwaku Damoah
Abstract:
This study presents a comprehensive data-driven analysis to understand the evolution of adverse events (AEs) in pharmacovigilance. Utilizing data from the FDA Adverse Event Reporting System (FAERS), we employed three analytical methods: rank-based, frequency-based, and percentage change analyses. These methods assessed temporal trends and patterns in AE reporting, focusing on various drug-active ingredients and patient demographics. Our findings reveal significant trends in AE occurrences, with both increasing and decreasing patterns from 2000 to 2023. This research highlights the importance of continuous monitoring and advanced analysis in pharmacovigilance, offering valuable insights for healthcare professionals and policymakers to enhance drug safety.
Keywords: event analysis, FDA adverse event reporting system, pharmacovigilance, temporal trend analysis
Procedia PDF Downloads 48
41907 An Exploratory Analysis of Brisbane's Commuter Travel Patterns Using Smart Card Data
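As a rough illustration of two of the three analyses named above, the sketch below ranks items by report frequency within a period and computes the percentage change between two periods. The abstract does not specify the authors' exact definitions, so these are the plain textbook versions, and the ingredient names and counts are hypothetical rather than FAERS data.

```python
def percentage_change(old, new):
    """Relative change from `old` to `new` in percent; None if `old` is 0."""
    if old == 0:
        return None
    return (new - old) / old * 100.0


def rank_by_frequency(counts):
    """Rank items by report count (1 = most reported).
    Ties are broken alphabetically so the result is deterministic."""
    ordered = sorted(counts, key=lambda name: (-counts[name], name))
    return {name: rank for rank, name in enumerate(ordered, start=1)}


# Hypothetical yearly report counts per active ingredient.
reports_early = {"drug_a": 200, "drug_b": 300, "drug_c": 0}
reports_late = {"drug_a": 250, "drug_b": 150, "drug_c": 40}

changes = {name: percentage_change(reports_early[name], reports_late[name])
           for name in reports_early}
ranks_late = rank_by_frequency(reports_late)
```

Returning `None` for a zero baseline avoids a division by zero and flags items that first appear in the later period, which a rank-based view can still capture.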
Authors: Ming Wei
Abstract:
Over the past two decades, Location Based Service (LBS) data have been increasingly applied to urban and transportation studies due to their comprehensiveness and consistency. However, compared to other LBS data, including mobile phone data, GPS, and social networking platforms, smart card data collected from public transport users have arguably yet to be fully exploited in urban systems analysis. Using five weekdays of passenger travel transaction data taken from go card, Southeast Queensland’s transit smart card, this paper analyses the spatiotemporal distribution of passenger movement with regard to land use patterns in Brisbane. Work and residential places for public transport commuters were identified after extracting journey-to-work patterns. Our results show that the locations of the workplaces and residential suburbs identified from the go card data are largely consistent with those marked in the land use map. However, the intensity of some residential locations, in terms of population or commuter density, does not match well between the map and the go card data. This indicates a degree of misalignment between residential areas and workplaces, shedding light on how enhancements to service management and infrastructure expansion might be undertaken.
Keywords: big data, smart card data, travel pattern, land use
Procedia PDF Downloads 285