Search results for: statistical data analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 41640


41370 As a Little-Known Side a Passionate Statistician: Florence Nightingale

Authors: Gülcan Taşkıran, Ayla Bayık Temel

Abstract:

Background: Florence Nightingale, the founder of modern nursing, is most famous for her role as a nurse, but much less is known about her contributions as a mathematician and statistician. Aim: This conceptual article examines Florence Nightingale's statistics education, how she channelled her passion for statistics into applying statistical data in nursing care, and her scientific contributions to statistical science. Design: A literature review was conducted. The databases of the Istanbul University Library Search Engine, Turkish Medical Directory, Thesis Scanning Center of the Higher Education Council, PubMed, Google Scholar, EBSCO Host, and Web of Science were searched. The keywords 'statistics' and 'Florence Nightingale' were used in Turkish and English. As a result of the screening, a total of 41 studies from the national and international literature were examined. Results: Florence Nightingale was interested in mathematics and statistics from an early age and received various training in these subjects. The lessons she learned in a cultured family environment, her talent for mathematics and numbers, and her religious beliefs played a crucial role in steering her toward statistics. She was influenced by Quetelet's ideas in forming her statistical philosophy and received support from William Farr in her statistical studies. During the Crimean War, she applied statistical knowledge to nursing care and developed many statistical methods and graphics, through which she brought about revolutionary reforms in the health field. Conclusions: Nightingale's interest in statistics, her broad vision, her statistical ideas fused with religious beliefs, the innovative graphics she developed, and the extraordinary statistical projects she carried out underpinned her professional achievements. Florence Nightingale has also become a model for women in statistics. Today, the use and teaching of statistics and research in nursing care practice and education programs continue in the light she gave.

Keywords: Crimean war, Florence Nightingale, nursing, statistics

Procedia PDF Downloads 281
41369 Mixture statistical modeling for predicting mortality human immunodeficiency virus (HIV) and tuberculosis (TB) infection patients

Authors: Mohd Asrul Affendi Bi Abdullah, Nyi Nyi Naing

Abstract:

The purpose of this study was to compare the negative binomial death rate (NBDR) model with the zero-inflated negative binomial death rate (ZINBDR) model for deaths among patients with (HIV+TB+) and (HIV+TB−). HIV and TB are a serious worldwide problem in developing countries. Data were analyzed by applying NBDR and ZINBDR to determine which model is preferable. The ZINBDR model is able to account for the disproportionately large number of zeros in the data and is shown to be a consistently better fit than the NBDR model. Hence, the ZINBDR model is a superior fit to the data and provides additional information regarding the mechanisms of death with HIV+TB. The ZINBDR model is shown to be a useful tool for analyzing death rates by age category.
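The comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the counts are hypothetical, the parameterization (mean `mu`, dispersion `alpha`, zero-inflation probability `pi`) is one common choice rather than necessarily the authors', and a coarse grid search stands in for proper maximum likelihood estimation. AIC and BIC, which appear in the keywords, are then computed from the best log-likelihood of each model.

```python
import math

def nb_logpmf(y, mu, alpha):
    """Log-probability of count y under a negative binomial with mean mu
    and dispersion alpha (variance = mu + alpha * mu**2)."""
    r = 1.0 / alpha
    p = r / (r + mu)
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(p) + y * math.log(1.0 - p))

def zinb_logpmf(y, mu, alpha, pi):
    """Zero-inflated NB: with probability pi the count is a structural zero."""
    if y == 0:
        return math.log(pi + (1.0 - pi) * math.exp(nb_logpmf(0, mu, alpha)))
    return math.log(1.0 - pi) + nb_logpmf(y, mu, alpha)

def aic(loglik, k):
    """Akaike information criterion for k free parameters."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik

def fit_by_grid(counts, zero_inflated):
    """Best log-likelihood over a coarse parameter grid (illustration only)."""
    mus = [0.5 * i for i in range(1, 13)]
    alphas = [0.1 * j for j in range(1, 31)]
    # pi = 0.0 reduces the zero-inflated model to the plain NB model.
    pis = [0.05 * k for k in range(0, 19)] if zero_inflated else [0.0]
    best = -math.inf
    for mu in mus:
        for alpha in alphas:
            for pi in pis:
                ll = sum(zinb_logpmf(y, mu, alpha, pi) for y in counts)
                best = max(best, ll)
    return best

# Hypothetical death counts per stratum, with many zeros.
counts = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 3, 4, 5, 6, 8]
ll_nb = fit_by_grid(counts, zero_inflated=False)
ll_zinb = fit_by_grid(counts, zero_inflated=True)
nb_scores = (aic(ll_nb, 2), bic(ll_nb, 2, len(counts)))
zinb_scores = (aic(ll_zinb, 3), bic(ll_zinb, 3, len(counts)))
```

Lower AIC or BIC indicates the preferred model. Because the zero-inflated model has one extra parameter, it is preferred only when its gain in log-likelihood outweighs that penalty.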

Keywords: zero inflated negative binomial death rate, HIV and TB, AIC and BIC, death rate

Procedia PDF Downloads 410
41368 Statistical Model to Examine the Impact of the Inflation Rate and Real Interest Rate on the Bahrain Economy

Authors: Ghada Abo-Zaid

Abstract:

Introduction: Oil is one of the main sources of income in Bahrain. Low oil prices affect economic growth and the investment rate in Bahrain. For example, economic growth was 3.7% in 2012 and fell to 2.9% in 2015. The investment rate was 9.8% in 2012 and fell to 5.9% and -12.1% in 2014 and 2015, respectively. The inflation rate peaked in 2013 at 3.3%. Objectives: The objective is to build statistical models to examine the effect of the interest rate and the inflation rate on economic growth in Bahrain from 2000 to 2018. Methods: This study is based on 18 years of data, and a multiple regression model is used for the analysis. All missing data are omitted from the analysis. Results: A regression model is used to examine the association between the gross national product (GNP), the inflation rate, and the real interest rate. We found that (i) an increase in the real interest rate decreases the GNP, and (ii) an increase in the inflation rate does not affect economic growth in Bahrain, since the average inflation rate was almost 2%, which is considered low. Conclusion: There is a positive impact of the real interest rate on the GNP in Bahrain, while the inflation rate does not show any negative influence on the GNP, as it was not large enough to negatively affect the economic growth rate in Bahrain.
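A multiple regression of the kind described can be sketched as follows. The function names and the numbers are hypothetical and chosen only so that the fit is easy to check by hand; they are not Bahrain data. The coefficients are obtained by solving the normal equations with plain Gauss-Jordan elimination.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Ordinary least squares with an intercept.
    rows: one tuple of predictor values per observation.
    Returns [intercept, coefficient_1, coefficient_2, ...]."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(xi[a] * xi[b] for xi in x) for b in range(k)] for a in range(k)]
    xty = [sum(xi[a] * yi for xi, yi in zip(x, y)) for a in range(k)]
    return solve(xtx, xty)

# Hypothetical (inflation rate, real interest rate) pairs and a response
# generated exactly as 1 + 2 * inflation - 3 * interest.
predictors = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
growth = [1 + 2 * a - 3 * b for a, b in predictors]
intercept, b_inflation, b_interest = ols(predictors, growth)
```

The sign of each fitted coefficient is what supports statements such as "an increase in the real interest rate decreases the GNP"; judging significance would additionally require standard errors, which this sketch omits.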

Keywords: gross national product, egypt, regression model, interest rate

Procedia PDF Downloads 141
41367 A Crowdsourced Homeless Data Collection System And Its Econometric Analysis: Strengthening Inclusive Public Administration Policies

Authors: Praniil Nagaraj

Abstract:

This paper proposes a method to collect homeless data using crowdsourcing and presents an approach to analyze the data, demonstrating its potential to strengthen existing and future policies aimed at promoting socio-economic equilibrium. The 2022 Annual Homeless Assessment Report (AHAR) to Congress highlighted alarming statistics, emphasizing the need for effective decision-making and budget allocation within local planning bodies known as Continuums of Care (CoC). This paper's contributions can be categorized into three main areas. Firstly, a unique method for collecting homeless data is introduced, utilizing a user-friendly smartphone app (currently available for Android). The app enables the general public to quickly record information about homeless individuals, including the number of people and details about their living conditions. The collected data, including date, time, and location, is anonymized and securely transmitted to the cloud. It is anticipated that an increasing number of users motivated to contribute to society will adopt the app, thus expanding the data collection efforts. Duplicate data is addressed through simple classification methods, and historical data is utilized to fill in missing information. The second contribution of this paper is the description of data analysis techniques applied to the collected data. By combining this new data with existing information, statistical regression analysis is employed to gain insights into various aspects, such as distinguishing between unsheltered and sheltered homeless populations, as well as examining their correlation with factors like unemployment rates, housing affordability, and labor demand. Initial data is collected in San Francisco, while pre-existing information is drawn from three cities: San Francisco, New York City, and Washington D.C., facilitating the conduction of simulations. The third contribution focuses on demonstrating the practical implications of the data processing results. 
The challenges faced by key stakeholders, including charitable organizations and local city governments, are taken into consideration. Two case studies are presented as examples. The first case study explores improving the efficiency of food and necessities distribution, as well as medical assistance, driven by charitable organizations. The second case study examines the correlation between micro-geographic budget expenditure by local city governments and homeless information to justify budget allocation and expenditures. The ultimate objective of this endeavor is to enable the continuous enhancement of the quality of life for the underprivileged. It is hoped that through increased crowdsourcing of data from the public, the Generosity Curve and the Need Curve will intersect, leading to a better world for all.

Keywords: crowdsourcing, homelessness, socio-economic policies, statistical regression

Procedia PDF Downloads 69
41366 Dispersion Rate of Spilled Oil in Water Column under Non-Breaking Water Waves

Authors: Hanifeh Imanian, Morteza Kolahdoozan

Abstract:

The purpose of this study is to present a mathematical expression for calculating the dispersion rate of spilled oil in the water column under non-breaking waves. To this end, a multiphase numerical model is applied in which the wave and oil phases are computed concurrently and whose hydraulic calculations have been shown to be accurate. More than 200 oil spill scenarios in wavy waters were simulated using the multiphase numerical model, and the outcomes were collected in a database. The recorded results were investigated to identify the major parameters affecting vertical oil dispersion, and six parameters were identified as the main independent factors. Furthermore, statistical tests were conducted to identify any relationship between the dependent variable (dispersed oil mass in the water column) and the independent variables (water wave specifications, comprising wave height, length, and period, and spilled oil characteristics, comprising density, viscosity, and spilled oil mass). Finally, a mathematical-statistical relationship is proposed to predict dispersed oil in marine waters. To verify the proposed relationship, a laboratory case available in the literature was selected. The oil mass rate penetrating the water body computed by the statistical regression showed good agreement with the experimental data. The validated mathematical-statistical expression is a useful tool for predicting oil dispersion in oil spill events in marine areas.

Keywords: dispersion, marine environment, mathematical-statistical relationship, oil spill

Procedia PDF Downloads 218
41365 Retail Strategy to Reduce Waste Keeping High Profit Utilizing Taylor's Law in Point-of-Sales Data

Authors: Gen Sakoda, Hideki Takayasu, Misako Takayasu

Abstract:

Waste reduction is a fundamental problem for sustainability. Methods for waste reduction with point-of-sales (POS) data are proposed, drawing on a recent econophysics study of a statistical property of POS data. Concretely, a non-stationary time series analysis method based on the particle filter is developed, which accounts for the abnormal fluctuation scaling known as Taylor's law. This method is extended to handle sales data that are incomplete because of stock-outs by introducing maximum likelihood estimation for censored data. A method for determining the optimal stock level by pricing the cost of waste reduction is also proposed. This study focuses on examining the methods for large sales numbers, where Taylor's law is evident. Numerical analysis using aggregated POS data shows that the methods effectively reduce food waste while maintaining a high profit for large sales numbers. Moreover, pricing the cost of waste reduction reveals that a small profit loss yields a substantial waste reduction, especially when the proportionality constant of Taylor's law is small. Specifically, a profit loss of around 1% halves disposal when this constant is 0.12, which is the actual value for the processed food items used in this research. The methods provide practical and effective solutions for reducing waste while keeping a high profit, especially with large sales numbers.

Keywords: food waste reduction, particle filter, point-of-sales, sustainable development goals, Taylor's law, time series analysis

Procedia PDF Downloads 114
41364 Impact of Instagram Food Bloggers on Consumer (Generation Z) Decision Making Process in Islamabad. Pakistan

Authors: Tabinda Sadiq, Tehmina Ashfaq Qazi, Hoor Shumail

Abstract:

Recently, the advent of emerging technology has created a new generation of restaurant marketing. This study explores the aspects that influence customers' decision-making process in selecting a restaurant after reading food bloggers' reviews online. The motivation behind this research is to investigate the correlation between the credibility of the source and consumers' attitude toward restaurant visits. Drawing on source credibility theory, the researcher collected the data by distributing a survey questionnaire through Google Forms. A non-probability purposive sampling technique was used to collect data. The questionnaire used a pre-developed and validated scale by Ohanian to measure the relationship. Data were collected from 250 respondents in order to investigate the influence of food bloggers on Gen Z's decision-making process. SPSS version 26 was used for statistical testing and data analysis. The findings of the survey revealed a moderate positive correlation between the variables. It can therefore be concluded that food bloggers do have an impact on Generation Z's decision-making process.

Keywords: credibility, decision making, food bloggers, generation z, e-wom

Procedia PDF Downloads 54
41363 Using Multi-Level Analysis to Identify Future Trends in Small Device Digital Communication Examinations

Authors: Mark A. Spooner

Abstract:

The growth of technological advances in the digital communications industry has dictated the way forensic examination laboratories receive, analyze, and report on digital evidence. This study looks at trends over the past five years in a medium-sized digital forensics lab that examines small communications devices (e.g., cellular telephones, tablets, thumb drives). As the budgets of law enforcement and homeland security organizations shrink, many agencies are being asked to perform more examinations with fewer resources. Applying multi-level statistical analysis to five years of examination data, this research shows a trend of increasing technological demand. The research then extrapolates the current data with the model created and finds that continued exponential growth of these demands is well within the parameters defined earlier in the research.

Keywords: digital forensics, forensic examination, small device, trends

Procedia PDF Downloads 184
41362 Statistical Tools for SFRA Diagnosis in Power Transformers

Authors: Rahul Srivastava, Priti Pundir, Y. R. Sood, Rajnish Shrivastava

Abstract:

For interpreting the signatures of sweep frequency response analysis (SFRA) of transformers, various statistical techniques serve as effective tools for either phase-to-phase comparison or sister-unit comparison. In this paper, alongside a discussion of SFRA, several statistical techniques, namely the cross correlation coefficient (CCF), root square error (RSQ), comparative standard deviation (CSD), absolute difference, mean square error (MSE), and min-max ratio (MM), are presented through several case studies. These methods require the sample data size and the spot frequencies of the SFRA signatures being compared. The techniques are based on power signal processing tools that can simplify the results, and limits can be set for the severity of faults occurring in the transformer due to short-circuit forces or ageing. The advantages of using statistical techniques to analyze SFRA results are demonstrated through several case studies, from which the state of the transformer is determined.
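Four of the indicators named above can be sketched for two magnitude traces sampled at the same spot frequencies. The abstract does not give the paper's exact formulas, so the definitions below are assumptions: a mean-centred (Pearson) correlation for CCF, the mean of absolute differences for the absolute difference, and the ratio of summed point-wise minima to summed point-wise maxima for the min-max ratio. RSQ and CSD are left out because their definitions vary between sources. The traces are hypothetical positive magnitudes.

```python
import math

def ccf(x, y):
    """Cross correlation coefficient (assumed mean-centred Pearson form)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def mse(x, y):
    """Mean square error between the two traces."""
    return sum((b - a) ** 2 for a, b in zip(x, y)) / len(x)

def dabs(x, y):
    """Absolute difference, averaged over the sample points."""
    return sum(abs(b - a) for a, b in zip(x, y)) / len(x)

def mm_ratio(x, y):
    """Min-max ratio: 1.0 for identical traces, smaller as they diverge.
    Assumes positive magnitudes."""
    return (sum(min(a, b) for a, b in zip(x, y))
            / sum(max(a, b) for a, b in zip(x, y)))

# Hypothetical reference and measured magnitudes at four spot frequencies.
reference = [1.0, 2.0, 4.0, 3.0]
measured = [1.0, 2.0, 4.0, 5.0]
indicators = {
    "CCF": ccf(reference, measured),
    "MSE": mse(reference, measured),
    "DABS": dabs(reference, measured),
    "MM": mm_ratio(reference, measured),
}
```

For a healthy unit compared against itself, CCF and MM sit at 1 and the error measures at 0; severity limits of the kind the abstract mentions would be thresholds on how far the indicators depart from those values.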

Keywords: absolute difference (DABS), cross correlation coefficient (CCF), mean square error (MSE), min-max ratio (MM-ratio), root square error (RSQ), standard deviation (CSD), sweep frequency response analysis (SFRA)

Procedia PDF Downloads 681
41361 A Relationship Extraction Method from Literary Fiction Considering Korean Linguistic Features

Authors: Hee-Jeong Ahn, Kee-Won Kim, Seung-Hoon Kim

Abstract:

Knowledge of the relationships between characters can help readers understand the overall story or plot of a work of literary fiction. In this paper, we present a method for extracting specific relationships between characters from Korean literary fiction. Generally, methods for extracting relationships between characters in text are statistical or computational methods based on the sentence distance between characters, without considering Korean linguistic features. Furthermore, it is difficult to extract directed relationships, such as one-sided love, from text, because such methods consider only the weight of a relationship and not its direction. Therefore, in order to identify specific relationships between characters, we propose a statistical method that considers linguistic features, such as syntactic patterns and speech verbs in Korean. The result of our method is represented by a weighted directed graph of the relationships between the characters. Furthermore, we expect that the proposed method could be applied to the analysis of relationships between characters in other content, such as movies or TV dramas.

Keywords: data mining, Korean linguistic feature, literary fiction, relationship extraction

Procedia PDF Downloads 359
41360 Self-Efficacy Perceptions of Pre-Service Art and Music Teachers towards the Use of Information and Communication Technologies

Authors: Agah Tugrul Korucu

Abstract:

Information and communication technologies have become an important part of our daily lives, with significant investments in technology in the 21st century. As information technology has spread, individuals have become more willing to design and implement computer-related activities, and computer self-efficacy is a main factor in carrying out these activities successfully. The level of self-efficacy is a significant factor in determining how individuals act in events, situations, and difficult processes. It is observed that individuals with a higher perception of computer self-efficacy overcome problems related to computer use more easily. Therefore, this study aimed to examine the self-efficacy perceptions of pre-service art and music teachers towards the use of information and communication technologies in terms of different variables. The research group consists of 60 pre-service teachers studying in the Art and Music department of the Ahmet Keleşoğlu Faculty of Education at Necmettin Erbakan University. The data collection tools were a "personal information form", developed by the researcher to collect demographic data, and "the perception scale related to self-efficacy of informational technology". The scale is a 5-point Likert-type scale consisting of 27 items. The Kaiser-Meyer-Olkin (KMO) sampling adequacy value was found to be 0.959, and the Cronbach alpha reliability coefficient of the scale was found to be 0.97. A statistical software package (SPSS 21.0) was used to analyze the collected data; descriptive statistics, t-tests, and analysis of variance were used as statistical techniques.
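The Cronbach alpha reliability coefficient reported above follows from the item variances and the variance of the total score. A minimal sketch using the standard formula is shown here; the function name and the response matrix are hypothetical and bear no relation to the study's data.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))."""
    k = len(responses[0])
    item_vars = [variance([r[i] for r in responses]) for i in range(k)]
    total_var = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical 5-point Likert answers: 6 respondents, 4 items.
answers = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 5, 5, 4],
    [3, 3, 2, 3],
    [1, 2, 1, 1],
    [4, 4, 5, 4],
]
alpha = cronbach_alpha(answers)
```

Values close to 1, such as the 0.97 reported for the 27-item scale, indicate that the items vary together and the scale is internally consistent.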

Keywords: self-efficacy perceptions, teacher candidate, information and communication technologies, art teacher

Procedia PDF Downloads 307
41359 Use of Sentinel-2 Data to Monitor Plant Density and Establishment Rate of Winter Wheat Fields

Authors: Bing-Bing E. Goh

Abstract:

Plant counting is a labour-intensive and time-consuming task for farmers. However, plant density is an important indicator for farmers when making decisions on subsequent field management. This study evaluates the potential of Sentinel-2 images, using statistical analysis, to retrieve plant density information for monitoring, especially during the critical period at the beginning of March. The model was calibrated with in-situ data from 19 winter wheat fields in the Republic of Ireland during the 2019-2020 crop growing season. The model for plant density resulted in R2 = 0.77, RMSECV = 103 and NRMSE = 14%. This study has shown the potential of using Sentinel-2 to estimate plant density and quantify plant establishment in order to monitor crop progress effectively and ensure proper field management.
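The three accuracy figures quoted above can be computed as sketched below. Two caveats apply. First, NRMSE is assumed here to be RMSE divided by the mean of the observations; the abstract does not say whether the mean or the range was used. Second, RMSECV is a cross-validated RMSE, whereas this sketch evaluates a single set of predictions and does not perform cross-validation. The numbers are hypothetical.

```python
import math

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root mean square error, in the units of the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def nrmse(observed, predicted):
    """RMSE normalised by the mean observation (an assumed convention)."""
    return rmse(observed, predicted) / (sum(observed) / len(observed))

# Hypothetical observed and predicted plant densities.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
scores = (r_squared(observed, predicted),
          rmse(observed, predicted),
          nrmse(observed, predicted))
```

Reporting NRMSE alongside RMSE makes the error comparable across fields or seasons with very different plant densities.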

Keywords: winter wheat, remote sensing, crop monitoring, multivariate analysis

Procedia PDF Downloads 143
41358 Effect of Electronic Banking on the Performance of Deposit Money Banks in Nigeria: Using ATM and Mobile Phone as a Case Study

Authors: Charity Ifunanya Osakwe, Victoria Ogochuchukwu Obi-Nwosu, Chima Kenneth Anachedo

Abstract:

The study investigates how automated teller machines (ATMs) and mobile banking affect deposit money banks in the Nigerian economy. The study used time series data obtained from the Central Bank of Nigeria Statistical Bulletin for 2009 to 2021. The Central Bank of Nigeria (CBN) data on automated teller machines and mobile phones were used as proxies for electronic banking, while total deposits in banks served as a proxy for the performance of deposit money banks. The analysis was done using the ordinary least squares econometric technique with the aid of the EViews statistical package. The results show that the automated teller machine has a positive and significant effect on the total deposits of deposit money banks in Nigeria and that making use of deposits of deposit money banks in Nigeria. The study concluded that e-banking has increased customers' access to banking and has also created room for banks to expand their operations to more customers. The study recommends that banks in Nigeria prioritize the expansion and maintenance of ATM networks and continue to invest in and develop more mobile banking services.

Keywords: electronic, banking, automated teller machines, mobile, deposit

Procedia PDF Downloads 35
41357 A User Identification Technique to Access Big Data Using Cloud Services

Authors: A. R. Manu, V. K. Agrawal, K. N. Balasubramanya Murthy

Abstract:

Authentication is required in stored database systems so that only authorized users can access the data and related cloud infrastructures. This paper proposes an authentication technique using multi-factor and multi-dimensional authentication system with multi-level security. The proposed technique is likely to be more robust as the probability of breaking the password is extremely low. This framework uses a multi-modal biometric approach and SMS to enforce additional security measures with the conventional Login/password system. The robustness of the technique is demonstrated mathematically using a statistical analysis. This work presents the authentication system along with the user authentication architecture diagram, activity diagrams, data flow diagrams, sequence diagrams, and algorithms.

Keywords: design, implementation algorithms, performance, biometric approach

Procedia PDF Downloads 455
41356 Detection of Change Points in Earthquakes Data: A Bayesian Approach

Authors: F. A. Al-Awadhi, D. Al-Hulail

Abstract:

In this study, we applied a Bayesian hierarchical model to detect single and multiple change points in daily earthquake body wave magnitude (Mb). Change point analysis is used in both backward (off-line) and forward (on-line) statistical research; in this study, the backward approach is used. Different types of change parameters are considered (mean, variance, or both). The posterior model and the conditional distributions for single and multiple change points are derived and implemented using BUGS software. The model is applicable to any set of data. The sensitivity of the model is tested using different prior and likelihood functions. Using the Mb data, we concluded that between January 2002 and December 2003, three changes occurred in the mean magnitude of Mb in Kuwait and its vicinity.
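The idea of a posterior over the change location can be illustrated with a much simpler model than the hierarchical one fitted in BUGS. The sketch below makes strong assumptions that are not taken from the paper: a single change in the mean, Gaussian noise with a known standard deviation `sigma`, flat priors on the two segment means, and a uniform prior on the change location. Under those assumptions the segment means integrate out in closed form, so no MCMC is needed. The series is a toy example, not earthquake data.

```python
import math

def changepoint_posterior(x, sigma):
    """Posterior probability that the mean changes after the k-th observation,
    for k = 1 .. len(x) - 1 (single change, known sigma, flat priors)."""
    n = len(x)

    def ss(segment):
        # Sum of squared deviations from the segment mean.
        m = sum(segment) / len(segment)
        return sum((v - m) ** 2 for v in segment)

    log_post = {}
    for k in range(1, n):
        # Marginal likelihood with both segment means integrated out;
        # terms that do not depend on k are dropped.
        log_post[k] = (-(ss(x[:k]) + ss(x[k:])) / (2 * sigma ** 2)
                       - 0.5 * math.log(k * (n - k)))
    top = max(log_post.values())
    weights = {k: math.exp(v - top) for k, v in log_post.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Toy series: ten values around 0 followed by ten values around 3.
noise = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.1, 0.0, -0.1]
series = noise + [3.0 + v for v in noise]
posterior = changepoint_posterior(series, sigma=1.0)
most_likely = max(posterior, key=posterior.get)
```

Multiple change points and changes in variance, as considered in the study, require a richer model, which is where sampling-based software such as BUGS becomes necessary.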

Keywords: multiple change points, Markov Chain Monte Carlo, earthquake magnitude, hierarchical Bayesian model

Procedia PDF Downloads 440
41355 Studying the Effects of Job Training on Employees Efficiency: A Case Study of University Employees, Qom, Iran

Authors: Seyfollah Fazlollahi, Ahmad Bayan Memar

Abstract:

Background: A review of manpower planning includes a training analysis based on job descriptions and job specifications, which looks carefully at training from the points of view of the company, its various departments, and its personnel. This may reveal weaknesses in some departments and, as a result, a need for staff training. Purpose: The aim of this research is to investigate the effects of training on employees' efficiency in different aspects of work. Methodology: This is a descriptive survey study. The statistical population was 85 official employees of the University of Qom, Iran. Seventy of these individuals were selected by random sampling using the Krejcie and Morgan table. The instrument used in this study was a questionnaire comprising 22 questions. Results: The data analysis indicates that the majority of respondents had a positive attitude towards training programs, whether on the job or off the job. They believed that training programs improved their behavior, which leads to high efficiency in their job. The data support the main hypothesis that training has positive effects on job performance and efficiency. Conclusion: This study and other related research indicate that training (on the job and off the job) plays a positive and effective role in human development and in employees' efficiency. Employees become acquainted with the different tasks of a job. Group cooperation, creativity, and innovation are reinforced. Training builds job skills and increases knowledge and information about a job. It also increases technical and conceptual human skills, which are important in an organization. Other benefits include workers' increased motivation toward their job, a stronger spirit of coordination, good human relations, and good contact with clients.
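The sample of 70 drawn from a population of 85 is consistent with the widely used Krejcie and Morgan sample-size formula, assuming that is the table the abstract's "Morgan&Gorki table" refers to. The sketch below uses the formula's usual constants (chi-square of 3.841 for one degree of freedom at 95% confidence, population proportion 0.5, margin of error 0.05); the function name is hypothetical.

```python
def krejcie_morgan(population, chi2=3.841, p=0.5, d=0.05):
    """Required sample size for a finite population:
    s = chi2 * N * p * (1 - p) / (d**2 * (N - 1) + chi2 * p * (1 - p))."""
    numerator = chi2 * population * p * (1 - p)
    denominator = d ** 2 * (population - 1) + chi2 * p * (1 - p)
    return round(numerator / denominator)

sample_for_85 = krejcie_morgan(85)
```

For small populations the required sample is a large share of the population, which is why 70 of the 85 employees had to be surveyed.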

Keywords: training, work efficiency, employee, human relation, job satisfaction

Procedia PDF Downloads 184
41354 Identification of Hepatocellular Carcinoma Using Supervised Learning Algorithms

Authors: Sagri Sharma

Abstract:

Analyzing diseases that involve multiple factors increases the complexity of the problem, and the development of frameworks for such analysis is therefore a topic of intense research. Due to the interdependence of the various parameters, traditional methodologies have not been very effective. Consequently, newer methodologies are being sought to deal with the problem. Supervised learning algorithms are commonly used to make predictions on previously unseen data. These algorithms are used in fields ranging from image analysis to protein structure and function prediction; they are trained on a known dataset to produce a predictor model that generates reasonable predictions for the response to new data. Gene expression profiles generated by DNA analysis experiments can be quite complex, since these experiments can involve hypotheses concerning entire genomes. A well-known machine learning algorithm, the Support Vector Machine, is therefore applied to analyze the expression levels of thousands of genes simultaneously in a timely, automated, and cost-effective way. The objectives of the presented work are to develop a methodology for identifying genes relevant to Hepatocellular Carcinoma (HCC) from a gene expression dataset using supervised learning algorithms and statistical evaluations, and to develop a predictive framework that can perform classification tasks on new, unseen data.

Keywords: artificial intelligence, biomarker, gene expression datasets, hepatocellular carcinoma, machine learning, supervised learning algorithms, support vector machine

Procedia PDF Downloads 415
41353 Analysis of Expression Data Using Unsupervised Techniques

Authors: M. A. I Perera, C. R. Wijesinghe, A. R. Weerasinghe

Abstract:

This study was conducted to review and identify the unsupervised techniques that can be employed to analyze gene expression data in order to better identify tumor subtypes. Identifying subtypes of cancer helps to improve the efficacy and reduce the toxicity of treatments by providing clues for finding targeted therapeutics. The process of gene expression data analysis is described in three steps: preprocessing, clustering, and cluster validation. Feature selection is important because genomic data are high-dimensional, with a large number of features compared to samples. Hierarchical clustering and K-means are often used in the analysis of gene expression data. Several cluster validation techniques are used to validate the clusters. Heatmaps are an effective external validation method that allows the identified classes to be compared with clinical variables and analyzed visually.
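The clustering step can be illustrated with a bare-bones K-means (Lloyd's algorithm). This is a sketch under simplifying assumptions: the points are hypothetical two-feature expression profiles, the initial centroids are sampled at random from the data, and no preprocessing, feature selection, or cluster validation is performed.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        # Recompute each centroid as the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return labels, centroids

# Hypothetical samples forming two well-separated groups.
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(samples, k=2)
```

With real expression data the number of clusters is not known in advance, which is why the validation step the abstract describes matters as much as the clustering itself.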

Keywords: cancer subtypes, gene expression data analysis, clustering, cluster validation

Procedia PDF Downloads 131
41352 An Infinite Mixture Model for Modelling Stutter Ratio in Forensic Data Analysis

Authors: M. A. C. S. Sampath Fernando, James M. Curran, Renate Meyer

Abstract:

Forensic DNA analysis has received much attention over the last three decades due to its usefulness in human identification. The statistical interpretation of DNA evidence is recognised as one of the most mature fields in forensic science. Peak heights in an electropherogram (EPG) are approximately proportional to the amount of template DNA in the original sample being tested. A stutter is a minor peak in an EPG that is not an allele of a potential contributor; it is considered an artefact presumed to have arisen from miscopying or slippage during PCR. Stutter peaks are mostly analysed in terms of the stutter ratio, which is calculated relative to the corresponding parent allele height. Analysis of mixture profiles has always been problematic in evidence interpretation, especially in the presence of PCR artefacts like stutters. Unlike binary and semi-continuous models, continuous models assign a probability (as a continuous weight) to each possible genotype combination and make far greater use of continuous peak height information, resulting in more efficient and reliable interpretations. A sound methodology for distinguishing between stutters and real alleles is therefore essential for the accuracy of the interpretation, and any such method has to focus on modelling stutter peaks. Bayesian nonparametric methods provide increased flexibility in applied statistical modelling. Mixture models are frequently employed as fundamental data analysis tools in clustering and classification and assume unidentified heterogeneous sources for the data. In model-based clustering, each unknown source is reflected by a cluster, and the clusters are modelled using parametric models. Specifying the number of components in finite mixture models, however, is difficult in practice, even though the calculations are relatively simple. 
Infinite mixture models, in contrast, do not require the user to specify the number of components. Instead, a Dirichlet process, which is an infinite-dimensional generalization of the Dirichlet distribution, is used to deal with the unknown number of components. The Chinese restaurant process (CRP), the stick-breaking process, and the Pólya urn scheme are frequently used as Dirichlet priors in Bayesian mixture models. In this study, we illustrate an infinite mixture of simple linear regression models for modelling stutter ratio and introduce some modifications to overcome weaknesses associated with the CRP.
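How the Chinese restaurant process lets the number of components grow with the data can be seen by simulating it. The sketch below draws table (cluster) assignments from the standard CRP with concentration parameter `alpha`; it shows only the prior over partitions and includes neither the regression components nor the modifications proposed in the study. The function name and parameter values are illustrative.

```python
import random

def crp(n, alpha, seed=0):
    """Seat n customers by the Chinese restaurant process.
    Customer i (0-based) joins existing table j with probability
    counts[j] / (i + alpha) and opens a new table with probability
    alpha / (i + alpha). Returns (seating, counts)."""
    rng = random.Random(seed)
    counts = []    # number of customers at each table
    seating = []   # table index chosen by each customer
    for i in range(n):
        u = rng.random() * (i + alpha)
        acc = 0.0
        table = len(counts)  # default: open a new table
        for j, c in enumerate(counts):
            acc += c
            if u < acc:
                table = j
                break
        if table == len(counts):
            counts.append(0)
        counts[table] += 1
        seating.append(table)
    return seating, counts

seating, counts = crp(50, alpha=1.0, seed=1)
n_clusters = len(counts)
```

Larger values of `alpha` tend to open more tables, so the concentration parameter plays the role that a fixed component count plays in a finite mixture.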

Keywords: Chinese restaurant process, Dirichlet prior, infinite mixture model, PCR stutter

Procedia PDF Downloads 309
41351 Extraction of Compound Words in Malay Sentences Using Linguistic and Statistical Approaches

Authors: Zamri Abu Bakar Zamri, Normaly Kamal Ismail Normaly, Mohd Izani Mohamed Rawi Izani

Abstract:

Malay noun compounds are phrases that consist of two or more nouns. The key characteristic of noun compounds is their frequent occurrence within text. Therefore, extracting these noun compounds is essential for several domains of research such as Information Retrieval, Sentiment Analysis and Question Answering. Many research efforts have addressed the extraction of Malay noun compounds using linguistic and statistical approaches. Most of the existing methods have concentrated on the extraction of bi-gram noun+noun compounds. However, extracting noun+verb, noun+adjective and noun+prepositional compounds is challenging because it is difficult to select an appropriate method that gives effective results. Thus, there is still room for improvement in the effectiveness of compound word extraction. This study therefore proposes a combination of a linguistic approach and statistical measures to enhance the extraction of compound words. Several preprocessing steps are involved, including normalization, tokenization, and stemming. The linguistic approach used in this study is Part-of-Speech (POS) tagging. In addition, a new linguistic pattern for named entities, built from a list of Malay named entities, is used to enhance the linguistic approach to noun compound recognition. The proposed statistical measures consist of NC-value, NTC-value and NLC-value.

Keywords: Compound Word, Noun Compound, Linguistic Approach, Statistical Approach

Procedia PDF Downloads 328
41350 Evaluation of Diagnosis Performance Based on Pairwise Model Construction and Filtered Data

Authors: Hyun-Woo Cho

Abstract:

Timely and intelligent production monitoring and diagnosis of industrial processes are important for quality and safety. In contrast to the monitoring task, fault diagnosis is the task of finding the process variables responsible for a specific fault in the process. It can help process operators investigate and eliminate root causes more effectively and efficiently. This work focuses on combining a nonlinear statistical technique with a preprocessing method in order to implement practical real-time fault identification schemes for data-rich cases. To compare its performance with that of existing identification schemes, a case study on a benchmark process was performed in several scenarios. The results showed that the proposed fault identification scheme produced more reliable diagnosis results than linear methods. In addition, the filtering step improved the identification results for complicated processes with massive data sets.

Keywords: diagnosis, filtering, nonlinear statistical techniques, process monitoring

Procedia PDF Downloads 223
41349 Dissecting Big Trajectory Data to Analyse Road Network Travel Efficiency

Authors: Rania Alshikhe, Vinita Jindal

Abstract:

Digital innovation has played a crucial role in managing smart transportation. For this purpose, big trajectory data collected from travelling vehicles, such as taxis fitted with global positioning system (GPS)-enabled devices, can be utilized. Such data offer an unprecedented opportunity to trace the movements of vehicles at fine spatiotemporal granularity. This paper explores big trajectory data to measure the travel efficiency of road networks across an entire city using the proposed statistical travel efficiency measure (STEM). Further, it identifies the causes of low travel efficiency using the proposed least square approximation network-based causality exploration (LANCE). Finally, the resulting data analysis reveals the causes of low travel efficiency, along with the road segments that need to be optimized to improve traffic conditions and thus minimize the average travel time from a given point A to a point B in the road network. The results show that the proposed approach outperforms the baseline algorithms for measuring the travel efficiency of the road network.

Keywords: GPS trajectory, road network, taxi trips, digital map, big data, STEM, LANCE

Procedia PDF Downloads 145
41348 Attributes That Influence Respondents When Choosing a Mate in Internet Dating Sites: An Innovative Matching Algorithm

Authors: Moti Zwilling, Srečko Natek

Abstract:

This paper presents an innovative predictive analytics approach for finding the best match between two consumers who seek a partner on internet sites. The methodology is based on an analysis of consumer preferences and involves data mining and machine learning search techniques. The study is composed of two parts. The first part uses descriptive statistics to examine the correlations between a set of parameters collected from men and women who intend to meet each other through social media, usually on the internet. In this part, several hypotheses were examined and statistical analyses were carried out. The results show a strong correlation between the affiliated attributes of men and women with regard to how they present themselves on social media such as "Facebook". One interesting finding is that most respondents have a strong desire to develop a serious relationship. In the second part, the authors used common data mining algorithms to search for and classify the most important and effective attributes that affect the response rate of the other side. The results show that personal presentation and educational background are the most effective attributes for eliciting a positive attitude towards one's profile from the other party.

Keywords: dating sites, social networks, machine learning, decision trees, data mining

Procedia PDF Downloads 280
41347 Assessment of the Effects of Urban Development on Urban Heat Islands and Community Perception in Semi-Arid Climates: Integrating Remote Sensing, GIS Tools, and Social Analysis - A Case Study of the Aures Region (Khanchela), Algeria

Authors: Amina Naidja, Zedira Khammar, Ines Soltani

Abstract:

This study investigates the impact of urban development on the urban heat island (UHI) effect in the semi-arid Aures region of Algeria, integrating remote sensing data with statistical analysis and community surveys to examine the interconnected environmental and social dynamics. Using Landsat 8 satellite imagery, temporal variations in the Normalized Difference Vegetation Index (NDVI), Normalized Difference Built-up Index (NDBI), and land use/land cover (LULC) changes are analyzed to understand patterns of urbanization and environmental transformation. These environmental metrics are correlated with land surface temperature (LST) data derived from remote sensing to quantify the UHI effect. To incorporate the social dimension, a structured questionnaire survey is conducted among residents in selected urban areas. The survey assesses community perceptions of urban heat, its impacts on daily life, health concerns, and coping strategies. Statistical analysis is employed to analyze survey responses, identifying correlations between demographic factors, socioeconomic status, and perceived heat stress. Preliminary findings reveal significant correlations between built-up areas (NDBI) and higher LST, indicating the contribution of urbanization to local warming. Conversely, areas with higher vegetation cover (NDVI) exhibit lower LST, highlighting the cooling effect of green spaces. Social survey results provide insights into how UHI affects different demographic groups, with vulnerable populations experiencing greater heat-related challenges. By integrating remote sensing analysis with statistical modeling and community surveys, this study offers a comprehensive understanding of the environmental and social implications of urban development in semi-arid climates. The findings contribute to evidence-based urban planning strategies that prioritize environmental sustainability and social well-being. 
Future research should focus on policy recommendations and community engagement initiatives to mitigate UHI impacts and promote climate-resilient urban development.
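
The two spectral indices named above are standard normalized differences of surface reflectance bands. The sketch below shows the usual definitions; for Landsat 8 the red, near-infrared and shortwave-infrared 1 bands are bands 4, 5 and 6. The reflectance values in the usage lines are hypothetical, and the abstract does not state the exact processing chain the authors used.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return normalized_difference(nir, red)


def ndbi(swir1, nir):
    """Normalized Difference Built-up Index: (SWIR1 - NIR) / (SWIR1 + NIR)."""
    return normalized_difference(swir1, nir)


# Hypothetical reflectance values for a vegetated pixel and a built-up pixel.
vegetated = {"red": 0.05, "nir": 0.45, "swir1": 0.20}
built_up = {"red": 0.20, "nir": 0.25, "swir1": 0.35}

for name, px in (("vegetated", vegetated), ("built-up", built_up)):
    print(name, round(ndvi(px["nir"], px["red"]), 3),
          round(ndbi(px["swir1"], px["nir"]), 3))
```

Both indices lie between -1 and 1. Dense vegetation pushes NDVI towards 1, while built-up surfaces tend to give positive NDBI, which is why the two are expected to correlate with LST in opposite directions.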

Keywords: urban heat island, remote sensing, social analysis, NDVI, NDBI, LST, community perception

Procedia PDF Downloads 20
41346 Coastal Environment: Statistical Analysis and Geomorphic Impact on Urban Tourism in Lagos, Portugal

Authors: Magdalena Kuleta

Abstract:

Ponta de Piedade (37°05′ N, 08°40′ W) is an area in the southern part of the Lagos municipality that includes an abrasive and accumulative type of coastline. It is one of the main tourist destinations of the city. The dynamic growth in the attractiveness of the coast is related to the expansion of new tourism infrastructure and urban tourism products. These products are transportation, sightseeing and entertainment in the form of boat trips, and each type of excursion corresponds to a different product. This progress also brings many risks, associated primarily with cliff landslides. Natural conditions affecting the coast have a large impact on the evolution of urban tourism management. Based on observation, statistical analysis and a survey method, the author compares the six-year period from 2012 to 2016 in terms of the number of tourists, the number and diversity of attractions, the most frequently chosen products and infrastructure changes in the city. The methodology is based on data from Turismo Portugal and the tourist company Days of Adventure. The main result indicates the importance of income from coastal tourism for the city's development and how it influences the marketing and promotion of urban tourism in Lagos.

Keywords: geomorphology of the coast in Lagos, market and promotion, quality of tourism service, urban tourism products

Procedia PDF Downloads 299
41345 Drug Therapy Problem and Its Contributing Factors among Pediatric Patients with Infectious Diseases Admitted to Jimma University Medical Center, South West Ethiopia: Prospective Observational Study

Authors: Desalegn Feyissa Desu

Abstract:

Drug therapy problems are a significant challenge to providing high-quality health care for patients. They are associated with morbidity, mortality, increased hospital stay, and reduced quality of life. Moreover, pediatric patients are quite susceptible to drug therapy problems. This study therefore aimed to assess drug therapy problems and their contributing factors among pediatric patients diagnosed with infectious disease and admitted to the pediatric ward of Jimma University Medical Center. A prospective observational study was conducted among pediatric patients with infectious disease admitted from April 1 to June 30, 2018. Drug therapy problems were identified using Cipolle and Strand's drug-related problem classification method. Written informed consent was obtained after the purpose of the study had been explained. Patient-specific data were collected using a structured questionnaire. Data were entered into EpiData version 4.0.2 and then exported to statistical software package version 21.0 for analysis. To identify predictors of drug therapy problem occurrence, multiple stepwise backward logistic regression analysis was performed. The 95% CI was used to show the precision of the estimates, and statistical significance was considered at p-value < 0.05. A total of 304 pediatric patients were included in the study. Of these, 226 (74.3%) patients had at least one drug therapy problem during their hospital stay. A total of 356 drug therapy problems were identified among these 226 patients. Non-compliance (28.65%) and dose too low (27.53%) were the most common types of drug-related problem, while disease comorbidity [AOR=3.39, 95% CI=(1.89-6.08)], polypharmacy [AOR=3.16, 95% CI=(1.61-6.20)] and a hospital stay of more than six days [AOR=3.37, 95% CI=(1.71-6.64)] were independent predictors of drug therapy problem occurrence. Drug therapy problems were common in pediatric patients with infectious disease in the study area. The presence of comorbidity, polypharmacy and prolonged hospital stay were the predictors of drug therapy problems in the study area. Therefore, to overcome the significant gaps in pediatric pharmaceutical care, clinical pharmacists, pediatricians, and other health care professionals have to work in collaboration.
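
As a reading aid for the adjusted odds ratios (AOR) above, the sketch below shows how an odds ratio and its Wald 95% CI relate to a logistic regression coefficient: OR = exp(beta) and CI = exp(beta ± 1.96·SE). It assumes the reported intervals are Wald intervals, which the abstract does not state; under that assumption the AOR should sit at the geometric mean of its CI bounds. The function names are illustrative, and only the three reported triples come from the abstract.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def implied_log_scale(lower, upper, z=1.96):
    """Recover the coefficient and standard error implied by a Wald CI."""
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# (AOR, CI lower, CI upper) as reported in the abstract.
reported = {
    "comorbidity": (3.39, 1.89, 6.08),
    "polypharmacy": (3.16, 1.61, 6.20),
    "stay > 6 days": (3.37, 1.71, 6.64),
}

for name, (aor, lower, upper) in reported.items():
    beta, se = implied_log_scale(lower, upper)
    implied_or, _, _ = odds_ratio_ci(beta, se)
    print(f"{name}: reported AOR {aor}, implied by CI {implied_or:.2f}")
```

For all three predictors the AOR implied by the interval matches the reported value to two decimals, so the figures are internally consistent with Wald intervals on the log-odds scale.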

Keywords: drug therapy problem, pediatric, infectious disease, Ethiopia

Procedia PDF Downloads 138
41344 Anxiety and Depression in Caregivers of Autistic Children

Authors: Mou Juliet Rebeiro, S. M. Abul Kalam Azad

Abstract:

This study was carried out to examine anxiety and depression in caregivers of autistic children. The objectives of the research were to assess depression and anxiety among caregivers of autistic children and to explore the caregivers' experience. For this purpose, the research was conducted on a sample of 39 caregivers of autistic children. Participants were recruited from a special school. To collect data, each caregiver was administered a questionnaire comprising scales to measure anxiety and depression, and some responses were collected through interviews based on a topic guide. The quantitative data were analyzed using statistical analysis, and the qualitative data were analyzed thematically. The mean anxiety score (55.85) and the mean depression score (108.33) are above the cutoff points. The results showed that anxiety and depression are clinically present in caregivers of autistic children. Most of the caregivers experienced behavioral, emotional, cognitive and social problems in their child that are linked with anxiety and depression.

Keywords: anxiety, autism, caregiver, depression

Procedia PDF Downloads 281
41343 Risk of Heatstroke Occurring in Indoor Built Environment Determined with Nationwide Sports and Health Database and Meteorological Outdoor Data

Authors: Go Iwashita

Abstract:

The paper uses large-scale statistical data to describe how the frequency of heatstroke occurring in the indoor built environment is related to the outdoor thermal environment. Nationwide heatstroke accident data were obtained from the National Agency for the Advancement of Sports and Health (NAASH). The meteorological database of the Japanese Meteorological Agency supplied 1-hour average temperature, humidity, wind speed, solar radiation, and so forth. Each heatstroke data point from the NAASH database was linked to the meteorological data point acquired from the meteorological station nearest to where the heatstroke accident occurred. This analysis was performed for a 10-year period (2005–2014). During the 10-year period, 3,819 cases of heatstroke were reported in the NAASH database for the investigated secondary/high schools of the nine representative Japanese cities. Heatstroke most commonly occurred in the outdoor schoolyard at a wet-bulb globe temperature (WBGT) of 31°C and in the indoor gymnasium during athletic club activities at a WBGT > 31°C. The accident ratio (number of accidents during each club activity divided by the club's population) was highest in the gymnasium during female badminton club activities. Although badminton is played in a gymnasium, these WBGT results show that the risk level during badminton under hot and humid conditions is equal to that of baseball or rugby played in the schoolyard. Apart from sports, a high risk of heatstroke was observed in school buildings during cultural activities. Based on the above WBGT results, the risk level for the indoor environment under hot and humid conditions would be equal to that for the outdoor environment. Therefore, control measures against hot and humid indoor conditions, such as installing air conditioning, are needed not only in schools but also in residences.
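
For readers unfamiliar with the two quantities used above, the sketch below gives the standard WBGT weighting (0.7, 0.2 and 0.1 on natural wet-bulb, globe and air temperature outdoors with solar load; 0.7 and 0.3 indoors) and the accident ratio as defined in the abstract. How the author derived WBGT from the hourly meteorological data is not stated, so this is only the textbook definition; the temperatures and club counts in the usage lines are hypothetical.

```python
def wbgt_outdoor(t_nwb, t_globe, t_air):
    """WBGT with solar load: 0.7*natural wet-bulb + 0.2*globe + 0.1*air (deg C)."""
    return 0.7 * t_nwb + 0.2 * t_globe + 0.1 * t_air


def wbgt_indoor(t_nwb, t_globe):
    """WBGT without solar load: 0.7*natural wet-bulb + 0.3*globe (deg C)."""
    return 0.7 * t_nwb + 0.3 * t_globe


def accident_ratio(n_accidents, club_population):
    """Accidents during a club activity divided by the club's population."""
    if club_population <= 0:
        raise ValueError("club_population must be positive")
    return n_accidents / club_population


# Hypothetical readings for a schoolyard and a gymnasium on a hot day.
print(round(wbgt_outdoor(28.0, 45.0, 33.0), 1))
print(round(wbgt_indoor(28.0, 35.0), 1))
# Hypothetical club: 3 accidents among 1,200 members.
print(accident_ratio(3, 1200))
```

Because the wet-bulb term carries 70% of the weight, a humid gymnasium can reach a WBGT comparable to a sunlit schoolyard even without direct solar radiation.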

Keywords: accidents in schools, club activity, gymnasium, heatstroke

Procedia PDF Downloads 202
41342 The Influence of the Vocational Teachers Empowerment toward the Vocational High Schools’ Performance Based on the Education National Standards of Indonesia

Authors: Abdul Haris Setiawan

Abstract:

Teacher empowerment is one of the important factors considered to contribute significantly to the achievement of national education goals. This study was conducted to determine the influence of vocational teacher empowerment on the performance of vocational high schools based on the Education National Standards of Indonesia. The population of the study was all vocational teachers at the State Vocational High Schools in Surakarta, Central Java Province, Indonesia. The sample was drawn using proportional random sampling. This study used quantitative descriptive statistical analysis techniques. The data were collected using questionnaires and checked with analysis requirement tests. The data were then processed using regression analysis between the independent and dependent variables to determine the effect and the regression equation. The results of the study found that the level of vocational high schools' performance based on the Education National Standards of Indonesia was 74.29%, which is in the high category; the level of vocational teacher empowerment was 76.20%, which is also in the high category; and there was a positive influence of vocational teacher empowerment on vocational high schools' performance based on the Education National Standards of Indonesia, with a correlation coefficient of 0.886, a contribution of 78.50%, and the regression equation Y = 79.431 + 0.534 X.
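
The reported figures can be checked against each other with a few lines of arithmetic. In simple linear regression the share of variance explained equals the squared correlation coefficient, so the reported contribution is consistent with r = 0.886 if it is read as a coefficient of determination; that reading is an assumption, since the abstract does not define "contribution". The input score used with `predict_performance` is hypothetical.

```python
R = 0.886            # reported correlation coefficient
INTERCEPT = 79.431   # reported regression intercept
SLOPE = 0.534        # reported regression slope


def explained_variance_percent(r):
    """Coefficient of determination (r squared) expressed as a percentage."""
    return r ** 2 * 100


def predict_performance(empowerment_score):
    """Apply the reported equation Y = 79.431 + 0.534 X."""
    return INTERCEPT + SLOPE * empowerment_score


print(round(explained_variance_percent(R), 2))
# Hypothetical empowerment score of 100 on the questionnaire scale.
print(round(predict_performance(100), 3))
```

The slope means that each additional point of teacher empowerment is associated with an increase of 0.534 points in the school performance score.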

Keywords: vocational teachers, empowerment, vocational high school, the education national standards

Procedia PDF Downloads 381
41341 Data Mining Algorithms Analysis: Case Study of Price Predictions of Lands

Authors: Julio Albuja, David Zaldumbide

Abstract:

Data analysis is an important step before making a financial decision. The aim of this work is to analyze the factors that influence the final price of houses using data mining algorithms. To the best of our knowledge, previous work was reviewed only to compare results. Before the data set was used, the Z-transformation was applied to standardize the data to the same scale. The data were then classified into two groups so that they could be visualized in a readable format. A decision tree was built, and the data are displayed graphically so that the results and the influence of each factor are easy to see. The definitions of these methods are described, as well as the results. Finally, conclusions and recommendations based on the results are presented, making it easier to apply these algorithms to a customized data set.
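
The Z-transformation mentioned above is usually the z-score: subtract the mean of a feature and divide by its standard deviation, so that every feature has mean 0 and standard deviation 1. A minimal sketch follows, assuming the population standard deviation is used; the abstract does not say which variant the authors applied, and the price values are hypothetical.

```python
import statistics


def z_transform(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Hypothetical land prices in arbitrary currency units.
prices = [100, 200, 300]
print([round(z, 3) for z in z_transform(prices)])
```

Standardizing first keeps features measured in large units, such as area or price, from dominating distance-based steps or plots. Decision trees themselves are insensitive to this rescaling, so it mainly helps comparison and visualization.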

Keywords: algorithms, data, decision tree, transformation

Procedia PDF Downloads 356