Search results for: statistical data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25738

Search results for: statistical data

25498 Direct Translation vs. Pivot Language Translation for Persian-Spanish Low-Resourced Statistical Machine Translation System

Authors: Benyamin Ahmadnia, Javier Serrano

Abstract:

In this paper we compare two different approaches for translating from Persian to Spanish, as a language pair with scarce parallel corpus. The first approach involves direct transfer using an statistical machine translation system, which is available for this language pair. The second approach involves translation through English, as a pivot language, which has more translation resources and more advanced translation systems available. The results show that, it is possible to achieve better translation quality using English as a pivot language in either approach outperforms direct translation from Persian to Spanish. Our best result is the pivot system which scores higher than direct translation by (1.12) BLEU points.

Keywords: statistical machine translation, direct translation approach, pivot language translation approach, parallel corpus

Procedia PDF Downloads 459
25497 Copula-Based Estimation of Direct and Indirect Effects in Path Analysis Model

Authors: Alam Ali, Ashok Kumar Pathak

Abstract:

Path analysis is a statistical technique used to evaluate the strength of the direct and indirect effects of variables. One or more structural regression equations are used to estimate a series of parameters in order to find the better fit of data. Sometimes, exogenous variables do not show a significant strength of their direct and indirect effect when the assumption of classical regression (ordinary least squares (OLS)) are violated by the nature of the data. The main motive of this article is to investigate the efficacy of the copula-based regression approach over the classical regression approach and calculate the direct and indirect effects of variables when data violates the OLS assumption and variables are linked through an elliptical copula. We perform this study using a well-organized numerical scheme. Finally, a real data application is also presented to demonstrate the performance of the superiority of the copula approach.

Keywords: path analysis, copula-based regression models, direct and indirect effects, k-fold cross validation technique

Procedia PDF Downloads 46
25496 Wind Power Forecast Error Simulation Model

Authors: Josip Vasilj, Petar Sarajcev, Damir Jakus

Abstract:

One of the major difficulties introduced with wind power penetration is the inherent uncertainty in production originating from uncertain wind conditions. This uncertainty impacts many different aspects of power system operation, especially the balancing power requirements. For this reason, in power system development planing, it is necessary to evaluate the potential uncertainty in future wind power generation. For this purpose, simulation models are required, reproducing the performance of wind power forecasts. This paper presents a wind power forecast error simulation models which are based on the stochastic process simulation. Proposed models capture the most important statistical parameters recognized in wind power forecast error time series. Furthermore, two distinct models are presented based on data availability. First model uses wind speed measurements on potential or existing wind power plant locations, while the seconds model uses statistical distribution of wind speeds.

Keywords: wind power, uncertainty, stochastic process, Monte Carlo simulation

Procedia PDF Downloads 451
25495 Predictive Analytics for Theory Building

Authors: Ho-Won Jung, Donghun Lee, Hyung-Jin Kim

Abstract:

Predictive analytics (data analysis) uses a subset of measurements (the features, predictor, or independent variable) to predict another measurement (the outcome, target, or dependent variable) on a single person or unit. It applies empirical methods in statistics, operations research, and machine learning to predict the future, or otherwise unknown events or outcome on a single or person or unit, based on patterns in data. Most analyses of metabolic syndrome are not predictive analytics but statistical explanatory studies that build a proposed model (theory building) and then validate metabolic syndrome predictors hypothesized (theory testing). A proposed theoretical model forms with causal hypotheses that specify how and why certain empirical phenomena occur. Predictive analytics and explanatory modeling have their own territories in analysis. However, predictive analytics can perform vital roles in explanatory studies, i.e., scientific activities such as theory building, theory testing, and relevance assessment. In the context, this study is to demonstrate how to use our predictive analytics to support theory building (i.e., hypothesis generation). For the purpose, this study utilized a big data predictive analytics platform TM based on a co-occurrence graph. The co-occurrence graph is depicted with nodes (e.g., items in a basket) and arcs (direct connections between two nodes), where items in a basket are fully connected. A cluster is a collection of fully connected items, where the specific group of items has co-occurred in several rows in a data set. Clusters can be ranked using importance metrics, such as node size (number of items), frequency, surprise (observed frequency vs. expected), among others. The size of a graph can be represented by the numbers of nodes and arcs. Since the size of a co-occurrence graph does not depend directly on the number of observations (transactions), huge amounts of transactions can be represented and processed efficiently. For a demonstration, a total of 13,254 metabolic syndrome training data is plugged into the analytics platform to generate rules (potential hypotheses). Each observation includes 31 predictors, for example, associated with sociodemographic, habits, and activities. Some are intentionally included to get predictive analytics insights on variable selection such as cancer examination, house type, and vaccination. The platform automatically generates plausible hypotheses (rules) without statistical modeling. Then the rules are validated with an external testing dataset including 4,090 observations. Results as a kind of inductive reasoning show potential hypotheses extracted as a set of association rules. Most statistical models generate just one estimated equation. On the other hand, a set of rules (many estimated equations from a statistical perspective) in this study may imply heterogeneity in a population (i.e., different subpopulations with unique features are aggregated). Next step of theory development, i.e., theory testing, statistically tests whether a proposed theoretical model is a plausible explanation of a phenomenon interested in. If hypotheses generated are tested statistically with several thousand observations, most of the variables will become significant as the p-values approach zero. Thus, theory validation needs statistical methods utilizing a part of observations such as bootstrap resampling with an appropriate sample size.

Keywords: explanatory modeling, metabolic syndrome, predictive analytics, theory building

Procedia PDF Downloads 244
25494 Comparative Study to Evaluate Chronological Age and Dental Age in North Indian Population Using Cameriere Method

Authors: Ranjitkumar Patil

Abstract:

Age estimation has its importance in forensic dentistry. Dental age estimation has emerged as an alternative to skeletal age determination. The methods based on stages of tooth formation, as appreciated on radiographs, seems to be more appropriate in the assessment of age than those based on skeletal development. The study was done to evaluate dental age in north Indian population using Cameriere’s method. Aims/Objectives: The study was conducted to assess the dental age of North Indian children using Cameriere’smethodand to compare the chronological age and dental age for validation of the Cameriere’smethod in the north Indian population. A comparative study of 02 year duration on the OPG (using PLANMECA Promax 3D) data of 497 individuals with age ranging from 5 to 15 years was done based on simple random technique ethical approval obtained from the institutional ethical committee. The data was obtained based on inclusion and exclusion criteria was analyzed by a software for dental age estimation. Statistical analysis: Student’s t test was used to compare the morphological variables of males with those of females and to compare observed age with estimated age. Regression formula was also calculated. Results: Present study was a comparative study of 497 subjects with a distribution between male and female, with their dental age assessed by using Panoramic radiograph, following the method described by Cameriere, which is widely accepted. Statistical analysis in our study indicated that gender does not have a significant influence on age estimation. (R2= 0.787). Conclusion: This infers that cameriere’s method can be effectively applied in north Indianpopulation.

Keywords: Forensic, Chronological Age, Dental Age, Skeletal Age

Procedia PDF Downloads 64
25493 The Influence of Housing Choice Vouchers on the Private Rental Market

Authors: Randy D. Colon

Abstract:

Through a freedom of information request, data pertaining to Housing Choice Voucher (HCV) households has been obtained from the Chicago Housing Authority, including rent price and number of bedrooms per HCV household, community area, and zip code from 2013 to the first quarter of 2018. Similar data pertaining to the private rental market will be obtained through public records found through the United States Department of Housing and Urban Development. The datasets will be analyzed through statistical and mapping software to investigate the potential link between HCV households and distorted rent prices. Quantitative data will be supplemented by qualitative data to investigate the lived experience of Chicago residents. Qualitative data will be collected at community meetings in the Chicago Englewood neighborhood through participation in neighborhood meetings and informal interviews with residents and community leaders. The qualitative data will be used to gain insight on the lived experience of community leaders and residents of the Englewood neighborhood in relation to housing, the rental market, and HCV. While there is an abundance of quantitative data on this subject, this qualitative data is necessary to capture the lived experience of local residents effected by a changing rental market. This topic reflects concerns voiced by members of the Englewood community, and this study aims to keep the community relevant in its findings.

Keywords: Chicago, housing, housing choice voucher program, housing subsidies, rental market

Procedia PDF Downloads 83
25492 Materialized View Effect on Query Performance

Authors: Yusuf Ziya Ayık, Ferhat Kahveci

Abstract:

Currently, database management systems have various tools such as backup and maintenance, and also provide statistical information such as resource usage and security. In terms of query performance, this paper covers query optimization, views, indexed tables, pre-computation materialized view, query performance analysis in which query plan alternatives can be created and the least costly one selected to optimize a query. Indexes and views can be created for related table columns. The literature review of this study showed that, in the course of time, despite the growing capabilities of the database management system, only database administrators are aware of the need for dealing with archival and transactional data types differently. These data may be constantly changing data used in everyday life, and also may be from the completed questionnaire whose data input was completed. For both types of data, the database uses its capabilities; but as shown in the findings section, instead of repeating similar heavy calculations which are carrying out same results with the same query over a survey results, using materialized view results can be in a more simple way. In this study, this performance difference was observed quantitatively considering the cost of the query.

Keywords: cost of query, database management systems, materialized view, query performance

Procedia PDF Downloads 250
25491 Anxiety and Depression in Caregivers of Autistic Children

Authors: Mou Juliet Rebeiro, S. M. Abul Kalam Azad

Abstract:

This study was carried out to see the anxiety and depression in caregivers of autistic children. The objectives of the research were to assess depression and anxiety among caregivers of autistic children and to find out the experience of caregivers. For this purpose, the research was conducted on a sample of 39 caregivers of autistic children. Participants were taken from a special school. To collect data for this study each of the caregivers were administered questionnaire comprising scales to measure anxiety and depression and some responses of the participants were taken through interview based on a topic guide. Obtained quantitative data were analyzed by using statistical analysis and qualitative data were analyzed according to themes. Mean of the anxiety score (55.85) and depression score (108.33) is above the cutoff point. Results showed that anxiety and depression is clinically present in caregivers of autistic children. Most of the caregivers experienced behavior, emotional, cognitive and social problems of their child that is linked with anxiety and depression.

Keywords: anxiety, autism, caregiver, depression

Procedia PDF Downloads 267
25490 Automatic Tuning for a Systemic Model of Banking Originated Losses (SYMBOL) Tool on Multicore

Authors: Ronal Muresano, Andrea Pagano

Abstract:

Nowadays, the mathematical/statistical applications are developed with more complexity and accuracy. However, these precisions and complexities have brought as result that applications need more computational power in order to be executed faster. In this sense, the multicore environments are playing an important role to improve and to optimize the execution time of these applications. These environments allow us the inclusion of more parallelism inside the node. However, to take advantage of this parallelism is not an easy task, because we have to deal with some problems such as: cores communications, data locality, memory sizes (cache and RAM), synchronizations, data dependencies on the model, etc. These issues are becoming more important when we wish to improve the application’s performance and scalability. Hence, this paper describes an optimization method developed for Systemic Model of Banking Originated Losses (SYMBOL) tool developed by the European Commission, which is based on analyzing the application's weakness in order to exploit the advantages of the multicore. All these improvements are done in an automatic and transparent manner with the aim of improving the performance metrics of our tool. Finally, experimental evaluations show the effectiveness of our new optimized version, in which we have achieved a considerable improvement on the execution time. The time has been reduced around 96% for the best case tested, between the original serial version and the automatic parallel version.

Keywords: algorithm optimization, bank failures, OpenMP, parallel techniques, statistical tool

Procedia PDF Downloads 344
25489 Time of Week Intensity Estimation from Interval Censored Data with Application to Police Patrol Planning

Authors: Jiahao Tian, Michael D. Porter

Abstract:

Law enforcement agencies are tasked with crime prevention and crime reduction under limited resources. Having an accurate temporal estimate of the crime rate would be valuable to achieve such a goal. However, estimation is usually complicated by the interval-censored nature of crime data. We cast the problem of intensity estimation as a Poisson regression using an EM algorithm to estimate the parameters. Two special penalties are added that provide smoothness over the time of day and day of the week. This approach presented here provides accurate intensity estimates and can also uncover day-of-week clusters that share the same intensity patterns. Anticipating where and when crimes might occur is a key element to successful policing strategies. However, this task is complicated by the presence of interval-censored data. The censored data refers to the type of data that the event time is only known to lie within an interval instead of being observed exactly. This type of data is prevailing in the field of criminology because of the absence of victims for certain types of crime. Despite its importance, the research in temporal analysis of crime has lagged behind the spatial component. Inspired by the success of solving crime-related problems with a statistical approach, we propose a statistical model for the temporal intensity estimation of crime with censored data. The model is built on Poisson regression and has special penalty terms added to the likelihood. An EM algorithm was derived to obtain maximum likelihood estimates, and the resulting model shows superior performance to the competing model. Our research is in line with the smart policing initiative (SPI) proposed by the Bureau Justice of Assistance (BJA) as an effort to support law enforcement agencies in building evidence-based, data-driven law enforcement tactics. The goal is to identify strategic approaches that are effective in crime prevention and reduction. In our case, we allow agencies to deploy their resources for a relatively short period of time to achieve the maximum level of crime reduction. By analyzing a particular area within cities where data are available, our proposed approach could not only provide an accurate estimate of intensities for the time unit considered but a time-variation crime incidence pattern. Both will be helpful in the allocation of limited resources by either improving the existing patrol plan with the understanding of the discovery of the day of week cluster or supporting extra resources available.

Keywords: cluster detection, EM algorithm, interval censoring, intensity estimation

Procedia PDF Downloads 41
25488 Different Data-Driven Bivariate Statistical Approaches to Landslide Susceptibility Mapping (Uzundere, Erzurum, Turkey)

Authors: Azimollah Aleshzadeh, Enver Vural Yavuz

Abstract:

The main goal of this study is to produce landslide susceptibility maps using different data-driven bivariate statistical approaches; namely, entropy weight method (EWM), evidence belief function (EBF), and information content model (ICM), at Uzundere county, Erzurum province, in the north-eastern part of Turkey. Past landslide occurrences were identified and mapped from an interpretation of high-resolution satellite images, and earlier reports as well as by carrying out field surveys. In total, 42 landslide incidence polygons were mapped using ArcGIS 10.4.1 software and randomly split into a construction dataset 70 % (30 landslide incidences) for building the EWM, EBF, and ICM models and the remaining 30 % (12 landslides incidences) were used for verification purposes. Twelve layers of landslide-predisposing parameters were prepared, including total surface radiation, maximum relief, soil groups, standard curvature, distance to stream/river sites, distance to the road network, surface roughness, land use pattern, engineering geological rock group, topographical elevation, the orientation of slope, and terrain slope gradient. The relationships between the landslide-predisposing parameters and the landslide inventory map were determined using different statistical models (EWM, EBF, and ICM). The model results were validated with landslide incidences, which were not used during the model construction. In addition, receiver operating characteristic curves were applied, and the area under the curve (AUC) was determined for the different susceptibility maps using the success (construction data) and prediction (verification data) rate curves. The results revealed that the AUC for success rates are 0.7055, 0.7221, and 0.7368, while the prediction rates are 0.6811, 0.6997, and 0.7105 for EWM, EBF, and ICM models, respectively. Consequently, landslide susceptibility maps were classified into five susceptibility classes, including very low, low, moderate, high, and very high. Additionally, the portion of construction and verification landslides incidences in high and very high landslide susceptibility classes in each map was determined. The results showed that the EWM, EBF, and ICM models produced satisfactory accuracy. The obtained landslide susceptibility maps may be useful for future natural hazard mitigation studies and planning purposes for environmental protection.

Keywords: entropy weight method, evidence belief function, information content model, landslide susceptibility mapping

Procedia PDF Downloads 104
25487 Imputation of Incomplete Large-Scale Monitoring Count Data via Penalized Estimation

Authors: Mohamed Dakki, Genevieve Robin, Marie Suet, Abdeljebbar Qninba, Mohamed A. El Agbani, Asmâa Ouassou, Rhimou El Hamoumi, Hichem Azafzaf, Sami Rebah, Claudia Feltrup-Azafzaf, Nafouel Hamouda, Wed a.L. Ibrahim, Hosni H. Asran, Amr A. Elhady, Haitham Ibrahim, Khaled Etayeb, Essam Bouras, Almokhtar Saied, Ashrof Glidan, Bakar M. Habib, Mohamed S. Sayoud, Nadjiba Bendjedda, Laura Dami, Clemence Deschamps, Elie Gaget, Jean-Yves Mondain-Monval, Pierre Defos Du Rau

Abstract:

In biodiversity monitoring, large datasets are becoming more and more widely available and are increasingly used globally to estimate species trends and con- servation status. These large-scale datasets challenge existing statistical analysis methods, many of which are not adapted to their size, incompleteness and heterogeneity. The development of scalable methods to impute missing data in incomplete large-scale monitoring datasets is crucial to balance sampling in time or space and thus better inform conservation policies. We developed a new method based on penalized Poisson models to impute and analyse incomplete monitoring data in a large-scale framework. The method al- lows parameterization of (a) space and time factors, (b) the main effects of predic- tor covariates, as well as (c) space–time interactions. It also benefits from robust statistical and computational capability in large-scale settings. The method was tested extensively on both simulated and real-life waterbird data, with the findings revealing that it outperforms six existing methods in terms of missing data imputation errors. Applying the method to 16 waterbird species, we estimated their long-term trends for the first time at the entire North African scale, a region where monitoring data suffer from many gaps in space and time series. This new approach opens promising perspectives to increase the accuracy of species-abundance trend estimations. We made it freely available in the r package ‘lori’ (https://CRAN.R-project.org/package=lori) and recommend its use for large- scale count data, particularly in citizen science monitoring programmes.

Keywords: biodiversity monitoring, high-dimensional statistics, incomplete count data, missing data imputation, waterbird trends in North-Africa

Procedia PDF Downloads 117
25486 Processing Big Data: An Approach Using Feature Selection

Authors: Nikat Parveen, M. Ananthi

Abstract:

Big data is one of the emerging technology, which collects the data from various sensors and those data will be used in many fields. Data retrieval is one of the major issue where there is a need to extract the exact data as per the need. In this paper, large amount of data set is processed by using the feature selection. Feature selection helps to choose the data which are actually needed to process and execute the task. The key value is the one which helps to point out exact data available in the storage space. Here the available data is streamed and R-Center is proposed to achieve this task.

Keywords: big data, key value, feature selection, retrieval, performance

Procedia PDF Downloads 308
25485 A Comparative Study to Evaluate Chronological Age and Dental Age in the North Indian Population Using Cameriere's Method

Authors: Ranjitkumar Patil

Abstract:

Age estimation has importance in forensic dentistry. Dental age estimation has emerged as an alternative to skeletal age determination. The methods based on stages of tooth formation, as appreciated on radiographs, seem to be more appropriate in the assessment of age than those based on skeletal development. The study was done to evaluate dental age in the north Indian population using Cameriere’s method. Aims/Objectives: The study was conducted to assess the dental age of North Indian children using Cameriere’s method and to compare the chronological age and dental age for validation of the Cameriere’s method in the north Indian population. A comparative study of 02-year duration on the OPG (using PLANMECA Promax 3D) data of 497 individuals with ages ranging from 5 to 15 years was done based on simple random technique ethical approval obtained from institutional ethical committee. The data was obtained based on inclusion and exclusion criteria and was analyzed by software for dental age estimation. Statistical analysis: The student’s t-test was used to compare the morphological variables of males with those of females and to compare observed age with estimated age. The regression formula was also calculated. Results: Present study was a comparative study of 497 subjects with a distribution between males and females, with their dental age assessed by using a Panoramic radiograph, following the method described by Cameriere, which is widely accepted. Statistical analysis in our study indicated that gender does not have a significant influence on age estimation. (R2= 0.787). Conclusion: This infers that Cameriere’s method can be effectively applied to the north Indian population.

Keywords: forensic, dental age, skeletal age, chronological age, Cameriere’s method

Procedia PDF Downloads 93
25484 Statistical Analysis of Rainfall Change over the Blue Nile Basin

Authors: Hany Mustafa, Mahmoud Roushdi, Khaled Kheireldin

Abstract:

Rainfall variability is an important feature of semi-arid climates. Climate change is very likely to increase the frequency, magnitude, and variability of extreme weather events such as droughts, floods, and storms. The Blue Nile Basin is facing extreme climate change-related events such as floods and droughts and its possible impacts on ecosystem, livelihood, agriculture, livestock, and biodiversity are expected. Rainfall variability is a threat to food production in the Blue Nile Basin countries. This study investigates the long-term variations and trends of seasonal and annual precipitation over the Blue Nile Basin for 102-year period (1901-2002). Six statistical trend analysis of precipitation was performed with nonparametric Mann-Kendall test and Sen's slope estimator. On the other hands, four statistical absolute homogeneity tests: Standard Normal Homogeneity Test, Buishand Range test, Pettitt test and the Von Neumann ratio test were applied to test the homogeneity of the rainfall data, using XLSTAT software, which results of p-valueless than alpha=0.05, were significant. The percentages of significant trends obtained for each parameter in the different seasons are presented. The study recommends adaptation strategies to be streamlined to relevant policies, enhancing local farmers’ adaptive capacity for facing future climate change effects.

Keywords: Blue Nile basin, climate change, Mann-Kendall test, trend analysis

Procedia PDF Downloads 503
25483 Radar Signal Detection Using Neural Networks in Log-Normal Clutter for Multiple Targets Situations

Authors: Boudemagh Naime

Abstract:

Automatic radar detection requires some methods of adapting to variations in the background clutter in order to control their false alarm rate. The problem becomes more complicated in non-Gaussian environment. In fact, the conventional approach in real time applications requires a complex statistical modeling and much computational operations. To overcome these constraints, we propose another approach based on artificial neural network (ANN-CMLD-CFAR) using a Back Propagation (BP) training algorithm. The considered environment follows a log-normal distribution in the presence of multiple Rayleigh-targets. To evaluate the performances of the considered detector, several situations, such as scale parameter and the number of interferes targets, have been investigated. The simulation results show that the ANN-CMLD-CFAR processor outperforms the conventional statistical one.

Keywords: radat detection, ANN-CMLD-CFAR, log-normal clutter, statistical modelling

Procedia PDF Downloads 334
25482 Geostatistical and Geochemical Study of the Aquifer System Waters Complex Terminal in the Valley of Oued Righ-Arid Area Algeria

Authors: Asma Bettahar, Imed Eddine Nezli, Sameh Habes

Abstract:

Groundwater resources in the Oued Righ valley are represented like the parts of the eastern basin of the Algerian Sahara, superposed by two major aquifers: the Intercalary Continental (IC) and the Terminal Complex (TC). From a qualitative point of view, various studies have highlighted that the waters of this region showed excessive mineralization, including the waters of the terminal complex (EC Avg equal 5854.61 S/cm) .The present article is a statistical approach by two multi methods various complementary (ACP, CAH), applied to the analytical data of multilayered aquifer waters Terminal Complex of the Oued Righ valley. The approach is to establish a correlation between the chemical composition of water and the lithological nature of different aquifer levels formations, and predict possible connection between groundwater’s layers. The results show that the mineralization of water is from geological origin. They concern the composition of the layers that make up the complex terminal.

Keywords: complex terminal, mineralization, oued righ, statistical approach

Procedia PDF Downloads 359
25481 Statistical Randomness Testing of Some Second Round Candidate Algorithms of CAESAR Competition

Authors: Fatih Sulak, Betül A. Özdemir, Beyza Bozdemir

Abstract:

In order to improve symmetric key research, several competitions had been arranged by organizations like National Institute of Standards and Technology (NIST) and International Association for Cryptologic Research (IACR). In recent years, the importance of authenticated encryption has rapidly increased because of the necessity of simultaneously enabling integrity, confidentiality and authenticity. Therefore, at January 2013, IACR announced the Competition for Authenticated Encryption: Security, Applicability, and Robustness (CAESAR Competition) which will select secure and efficient algorithms for authenticated encryption. Cryptographic algorithms are anticipated to behave like random mappings; hence, it is important to apply statistical randomness tests to the outputs of the algorithms. In this work, the statistical randomness tests in the NIST Test Suite and the other recently designed randomness tests are applied to six second round algorithms of the CAESAR Competition. It is observed that AEGIS achieves randomness after 3 rounds, Ascon permutation function achieves randomness after 1 round, Joltik encryption function achieves randomness after 9 rounds, Morus state update function achieves randomness after 3 rounds, Pi-cipher achieves randomness after 1 round, and Tiaoxin achieves randomness after 1 round.

Keywords: authenticated encryption, CAESAR competition, NIST test suite, statistical randomness tests

Procedia PDF Downloads 293
25480 Statistical Models and Time Series Forecasting on Crime Data in Nepal

Authors: Dila Ram Bhandari

Abstract:

Throughout the 20th century, new governments were created where identities such as ethnic, religious, linguistic, caste, communal, tribal, and others played a part in the development of constitutions and the legal system of victim and criminal justice. Acute issues with extremism, poverty, environmental degradation, cybercrimes, human rights violations, crime against, and victimization of both individuals and groups have recently plagued South Asian nations. Everyday massive number of crimes are steadfast, these frequent crimes have made the lives of common citizens restless. Crimes are one of the major threats to society and also for civilization. Crime is a bone of contention that can create a societal disturbance. The old-style crime solving practices are unable to live up to the requirement of existing crime situations. Crime analysis is one of the most important activities of the majority of intelligent and law enforcement organizations all over the world. The South Asia region lacks such a regional coordination mechanism, unlike central Asia of Asia Pacific regions, to facilitate criminal intelligence sharing and operational coordination related to organized crime, including illicit drug trafficking and money laundering. There have been numerous conversations in recent years about using data mining technology to combat crime and terrorism. The Data Detective program from Sentient as a software company, uses data mining techniques to support the police (Sentient, 2017). The goals of this internship are to test out several predictive model solutions and choose the most effective and promising one. First, extensive literature reviews on data mining, crime analysis, and crime data mining were conducted. Sentient offered a 7-year archive of crime statistics that were daily aggregated to produce a univariate dataset. Moreover, a daily incidence type aggregation was performed to produce a multivariate dataset. Each solution's forecast period lasted seven days. Statistical models and neural network models were the two main groups into which the experiments were split. For the crime data, neural networks fared better than statistical models. This study gives a general review of the applied statistics and neural network models. A detailed image of each model's performance on the available data and generalizability is provided by a comparative analysis of all the models on a comparable dataset. Obviously, the studies demonstrated that, in comparison to other models, Gated Recurrent Units (GRU) produced greater prediction. The crime records of 2005-2019 which was collected from Nepal Police headquarter and analysed by R programming. In conclusion, gated recurrent unit implementation could give benefit to police in predicting crime. Hence, time series analysis using GRU could be a prospective additional feature in Data Detective.

Keywords: time series analysis, forecasting, ARIMA, machine learning

Procedia PDF Downloads 131
25479 Develop a Conceptual Data Model of Geotechnical Risk Assessment in Underground Coal Mining Using a Cloud-Based Machine Learning Platform

Authors: Reza Mohammadzadeh

Abstract:

The major challenges in geotechnical engineering in underground spaces arise from uncertainties and different probabilities. The collection, collation, and collaboration of existing data to incorporate them in analysis and design for given prospect evaluation would be a reliable, practical problem solving method under uncertainty. Machine learning (ML) is a subfield of artificial intelligence in statistical science which applies different techniques (e.g., Regression, neural networks, support vector machines, decision trees, random forests, genetic programming, etc.) on data to automatically learn and improve from them without being explicitly programmed and make decisions and predictions. In this paper, a conceptual database schema of geotechnical risks in underground coal mining based on a cloud system architecture has been designed. A new approach of risk assessment using a three-dimensional risk matrix supported by the level of knowledge (LoK) has been proposed in this model. Subsequently, the model workflow methodology stages have been described. In order to train data and LoK models deployment, an ML platform has been implemented. IBM Watson Studio, as a leading data science tool and data-driven cloud integration ML platform, is employed in this study. As a Use case, a data set of geotechnical hazards and risk assessment in underground coal mining were prepared to demonstrate the performance of the model, and accordingly, the results have been outlined.

Keywords: data model, geotechnical risks, machine learning, underground coal mining

Procedia PDF Downloads 242
25478 Degumming of Eri Silk Fabric with Ionic Liquid

Authors: Shweta K. Vyas, Rakesh Musale, Sanjeev R. Shukla

Abstract:

Eri silk is a non mulberry silk which is obtained without killing the silkworms and hence it is also known as Ahmisa silk. In the present study, the results on degumming of eri silk with alkaline peroxide have been compared with those obtained by using ionic liquid (IL) 1-Butyl-3-methylimidazolium chloride [BMIM]Cl. Experiments were designed to find out the optimum processing parameters for degumming of eri silk by response surface methodology. The statistical software, Design-Expert 6.0 was used for regression analysis and graphical analysis of the responses obtained by running the set of designed experiments. Analysis of variance (ANOVA) was used to estimate the statistical parameters. The polynomial equation of quadratic order was employed to fit the experimental data. The quality and model terms were evaluated by F-test. Three dimensional surface plots were prepared to study the effect of variables on different responses. The optimum conditions for IL treatment were selected from predicted combinations and the experiments were repeated under these conditions to determine the reproducibility.

Keywords: silk degumming, ionic liquid, response surface methodology, ANOVA

Procedia PDF Downloads 560
25477 The Beta-Fisher Snedecor Distribution with Applications to Cancer Remission Data

Authors: K. A. Adepoju, O. I. Shittu, A. U. Chukwu

Abstract:

In this paper, a new four-parameter generalized version of the Fisher Snedecor distribution called Beta- F distribution is introduced. The comprehensive account of the statistical properties of the new distributions was considered. Formal expressions for the cumulative density function, moments, moment generating function and maximum likelihood estimation, as well as its Fisher information, were obtained. The flexibility of this distribution as well as its robustness using cancer remission time data was demonstrated. The new distribution can be used in most applications where the assumption underlying the use of other lifetime distributions is violated.

Keywords: fisher-snedecor distribution, beta-f distribution, outlier, maximum likelihood method

Procedia PDF Downloads 314
25476 Statistical Characteristics of Distribution of Radiation-Induced Defects under Random Generation

Authors: P. Selyshchev

Abstract:

We consider fluctuations of defects density taking into account their interaction. Stochastic field of displacement generation rate gives random defect distribution. We determinate statistical characteristics (mean and dispersion) of random field of point defect distribution as function of defect generation parameters, temperature and properties of irradiated crystal.

Keywords: irradiation, primary defects, interaction, fluctuations

Procedia PDF Downloads 299
25475 Advances in Artificial intelligence Using Speech Recognition

Authors: Khaled M. Alhawiti

Abstract:

This research study aims to present a retrospective study about speech recognition systems and artificial intelligence. Speech recognition has become one of the widely used technologies, as it offers great opportunity to interact and communicate with automated machines. Precisely, it can be affirmed that speech recognition facilitates its users and helps them to perform their daily routine tasks, in a more convenient and effective manner. This research intends to present the illustration of recent technological advancements, which are associated with artificial intelligence. Recent researches have revealed the fact that speech recognition is found to be the utmost issue, which affects the decoding of speech. In order to overcome these issues, different statistical models were developed by the researchers. Some of the most prominent statistical models include acoustic model (AM), language model (LM), lexicon model, and hidden Markov models (HMM). The research will help in understanding all of these statistical models of speech recognition. Researchers have also formulated different decoding methods, which are being utilized for realistic decoding tasks and constrained artificial languages. These decoding methods include pattern recognition, acoustic phonetic, and artificial intelligence. It has been recognized that artificial intelligence is the most efficient and reliable methods, which are being used in speech recognition.

Keywords: speech recognition, acoustic phonetic, artificial intelligence, hidden markov models (HMM), statistical models of speech recognition, human machine performance

Procedia PDF Downloads 445
25474 A Crowdsourced Homeless Data Collection System and Its Econometric Analysis

Authors: Praniil Nagaraj

Abstract:

This paper proposes a method to collect homeless data using crowdsourcing and presents an approach to analyze the data, demonstrating its potential to strengthen existing and future policies aimed at promoting socio-economic equilibrium. The 2022 Annual Homeless Assessment Report (AHAR) to Congress highlighted alarming statistics, emphasizing the need for effective decision-making and budget allocation within local planning bodies known as Continuums of Care (CoC). This paper's contributions can be categorized into three main areas. Firstly, a unique method for collecting homeless data is introduced, utilizing a user-friendly smartphone app (currently available for Android). The app enables the general public to quickly record information about homeless individuals, including the number of people and details about their living conditions. The collected data, including date, time, and location, is anonymized and securely transmitted to the cloud. It is anticipated that an increasing number of users motivated to contribute to society will adopt the app, thus expanding the data collection efforts. Duplicate data is addressed through simple classification methods, and historical data is utilized to fill in missing information. The second contribution of this paper is the description of data analysis techniques applied to the collected data. By combining this new data with existing information, statistical regression analysis is employed to gain insights into various aspects, such as distinguishing between unsheltered and sheltered homeless populations, as well as examining their correlation with factors like unemployment rates, housing affordability, and labor demand. Initial data is collected in San Francisco, while pre-existing information is drawn from three cities: San Francisco, New York City, and Washington D.C., facilitating the conduction of simulations. The third contribution focuses on demonstrating the practical implications of the data processing results. The challenges faced by key stakeholders, including charitable organizations and local city governments, are taken into consideration. Two case studies are presented as examples. The first case study explores improving the efficiency of food and necessities distribution, as well as medical assistance, driven by charitable organizations. The second case study examines the correlation between micro-geographic budget expenditure by local city governments and homeless information to justify budget allocation and expenditures. The ultimate objective of this endeavor is to enable the continuous enhancement of the quality of life for the underprivileged. It is hoped that through increased crowdsourcing of data from the public, the Generosity Curve and the Need Curve will intersect, leading to a better world for all.

Keywords: crowdsourcing, homelessness, socio-economic policies, statistical analysis

Procedia PDF Downloads 32
25473 Genetic Variation of Autosomal STR Loci from Unrelated Individual in Iraq

Authors: H. Imad, Q. Cheah, J. Mohammad, O. Aamera

Abstract:

The aim of this study is twofold. One is to determine the genetic structure of Iraq population and the second objective of the study was to evaluate the importance of these loci for forensic genetic purposes. FTA® Technology (FTA™ paper DNA extraction) utilized to extract DNA. Twenty STR loci and Amelogenin including D3S1358, D13S317, PentaE, D16S539, D18S51, D2S1338, CSF1PO, Penta D, THO1, vWA, D21S11, D7S820, TPOX, D8S1179, FGA, D2S1338, D5S818, D6S1043, D12S391, D19S433, and Amelogenin amplified by using power plex21® kit. PCR products detected by genetic analyzer 3730xL then data analyzed by PowerStatsV1.2. Based on the allelic frequencies, several statistical parameters of genetic and forensic efficiency have been estimated. This includes the homozygosity and heterozygosity, effective number of alleles (n), the polymorphism information content (PIC), the power of discrimination (DP), and the power of exclusion (PE). The power of discrimination values for all tested loci was from 75% to 96% therefore, those loci can be safely used to establish a DNA-based database for Iraq population.

Keywords: autosomal STR, genetic variation, Middle and South of Iraq, statistical parameters

Procedia PDF Downloads 359
25472 Spatial Analysis of the Impact of City Developments Degradation of Green Space in Urban Fringe Eastern City of Yogyakarta Year 2005-2010

Authors: Pebri Nurhayati, Rozanah Ahlam Fadiyah

Abstract:

In the development of the city often use rural areas that can not be separated from the change in land use that lead to the degradation of urban green space in the city fringe. In the long run, the degradation of green open space this can impact on the decline of ecological, psychological and public health. Therefore, this research aims to (1) determine the relationship between the parameters of the degradation rate of urban development with green space, (2) develop a spatial model of the impact of urban development on the degradation of green open space with remote sensing techniques and Geographical Information Systems in an integrated manner. This research is a descriptive research with data collection techniques of observation and secondary data . In the data analysis, to interpret the direction of urban development and degradation of green open space is required in 2005-2010 ASTER image with NDVI. Of interpretation will generate two maps, namely maps and map development built land degradation green open space. Secondary data related to the rate of population growth, the level of accessibility, and the main activities of each city map is processed into a population growth rate, the level of accessibility maps, and map the main activities of the town. Each map is used as a parameter to map the degradation of green space and analyzed by non-parametric statistical analysis using Crosstab thus obtained value of C (coefficient contingency). C values were then compared with the Cmaximum to determine the relationship. From this research will be obtained in the form of modeling spatial map of the City Development Impact Degradation Green Space in Urban Fringe eastern city of Yogyakarta 2005-2010. In addition, this research also generate statistical analysis of the test results of each parameter to the degradation of green open space in the Urban Fringe eastern city of Yogyakarta 2005-2010.

Keywords: spatial analysis, urban development, degradation of green space, urban fringe

Procedia PDF Downloads 281
25471 Transport Related Air Pollution Modeling Using Artificial Neural Network

Authors: K. D. Sharma, M. Parida, S. S. Jain, Anju Saini, V. K. Katiyar

Abstract:

Air quality models form one of the most important components of an urban air quality management plan. Various statistical modeling techniques (regression, multiple regression and time series analysis) have been used to predict air pollution concentrations in the urban environment. These models calculate pollution concentrations due to observed traffic, meteorological and pollution data after an appropriate relationship has been obtained empirically between these parameters. Artificial neural network (ANN) is increasingly used as an alternative tool for modeling the pollutants from vehicular traffic particularly in urban areas. In the present paper, an attempt has been made to model traffic air pollution, specifically CO concentration using neural networks. In case of CO concentration, two scenarios were considered. First, with only classified traffic volume input and the second with both classified traffic volume and meteorological variables. The results showed that CO concentration can be predicted with good accuracy using artificial neural network (ANN).

Keywords: air quality management, artificial neural network, meteorological variables, statistical modeling

Procedia PDF Downloads 489
25470 Analyzing the Relationship between the Spatial Characteristics of Cultural Structure, Activities, and the Tourism Demand

Authors: Deniz Karagöz

Abstract:

This study is attempt to comprehend the relationship between the spatial characteristics of cultural structure, activities and the tourism demand in Turkey. The analysis divided into four parts. The first part consisted of a cultural structure and cultural activity (CSCA) index provided by principal component analysis. The analysis determined four distinct dimensions, namely, cultural activity/structure, accessing culture, consumption, and cultural management. The exploratory spatial data analysis employed to determine the spatial models of cultural structure and cultural activities in 81 provinces in Turkey. Global Moran I indices is used to ascertain the cultural activities and the structural clusters. Finally, the relationship between the cultural activities/cultural structure and tourism demand was analyzed. The raw/original data of the study official databases. The data on the cultural structure and activities gathered from the Turkish Statistical Institute and the data related to the tourism demand was provided by the Republic of Turkey Ministry of Culture and Tourism.

Keywords: cultural activities, cultural structure, spatial characteristics, tourism demand, Turkey

Procedia PDF Downloads 518
25469 Prediction of Marijuana Use among Iranian Early Youth: an Application of Integrative Model of Behavioral Prediction

Authors: Mehdi Mirzaei Alavijeh, Farzad Jalilian

Abstract:

Background: Marijuana is the most widely used illicit drug worldwide, especially among adolescents and young adults, which can cause numerous complications. The aim of this study was to determine the pattern, motivation use, and factors related to marijuana use among Iranian youths based on the integrative model of behavioral prediction Methods: A cross-sectional study was conducted among 174 youths marijuana user in Kermanshah County and Isfahan County, during summer 2014 which was selected with the convenience sampling for participation in this study. A self-reporting questionnaire was applied for collecting data. Data were analyzed by SPSS version 21 using bivariate correlations and linear regression statistical tests. Results: The mean marijuana use of respondents was 4.60 times at during week [95% CI: 4.06, 5.15]. Linear regression statistical showed, the structures of integrative model of behavioral prediction accounted for 36% of the variation in the outcome measure of the marijuana use at during week (R2 = 36% & P < 0.001); and among them attitude, marijuana refuse, and subjective norms were a stronger predictors. Conclusion: Comprehensive health education and prevention programs need to emphasize on cognitive factors that predict youth’s health-related behaviors. Based on our findings it seems, designing educational and behavioral intervention for reducing positive belief about marijuana, marijuana self-efficacy refuse promotion and reduce subjective norms encourage marijuana use has an effective potential to protect youths marijuana use.

Keywords: marijuana, youth, integrative model of behavioral prediction, Iran

Procedia PDF Downloads 530