Search results for: statistical data
25498 Direct Translation vs. Pivot Language Translation for Persian-Spanish Low-Resourced Statistical Machine Translation System
Authors: Benyamin Ahmadnia, Javier Serrano
Abstract:
In this paper, we compare two different approaches for translating from Persian to Spanish, a language pair with a scarce parallel corpus. The first approach involves direct transfer using a statistical machine translation system available for this language pair. The second approach involves translation through English as a pivot language, which has more translation resources and more advanced translation systems available. The results show that better translation quality can be achieved using English as a pivot language: the pivot approach outperforms direct translation from Persian to Spanish. Our best pivot system scores 1.12 BLEU points higher than direct translation.
Keywords: statistical machine translation, direct translation approach, pivot language translation approach, parallel corpus
Procedia PDF Downloads 459
25497 Copula-Based Estimation of Direct and Indirect Effects in Path Analysis Model
Authors: Alam Ali, Ashok Kumar Pathak
Abstract:
Path analysis is a statistical technique used to evaluate the strength of the direct and indirect effects of variables. One or more structural regression equations are used to estimate a series of parameters in order to find the best fit to the data. Sometimes, exogenous variables do not show a significant strength of their direct and indirect effects when the assumptions of classical regression (ordinary least squares (OLS)) are violated by the nature of the data. The main aim of this article is to investigate the efficacy of the copula-based regression approach over the classical regression approach, and to calculate the direct and indirect effects of variables when the data violate the OLS assumptions and the variables are linked through an elliptical copula. We perform this study using a well-organized numerical scheme. Finally, a real data application is presented to demonstrate the superiority of the copula approach.
Keywords: path analysis, copula-based regression models, direct and indirect effects, k-fold cross validation technique
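As background for the classical (OLS) baseline this abstract compares against, the sketch below computes direct and indirect effects in a minimal mediation path model x → m → y. The data, coefficients, and variable names are illustrative assumptions, not from the paper.

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients, with an intercept prepended."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)                              # exogenous variable
m = 0.5 * x + rng.normal(scale=0.3, size=n)         # mediator
y = 0.7 * m + 0.2 * x + rng.normal(scale=0.3, size=n)

a = ols(m, x[:, None])[1]                 # path x -> m
coefs = ols(y, np.column_stack([m, x]))
b, direct = coefs[1], coefs[2]            # path m -> y, and direct x -> y
indirect = a * b                          # indirect effect of x on y via m
total = direct + indirect
```

The indirect effect is the product of the path coefficients along the mediated route; a copula-based variant would replace the OLS fits when the normality/linearity assumptions fail.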
Procedia PDF Downloads 46
25496 Wind Power Forecast Error Simulation Model
Authors: Josip Vasilj, Petar Sarajcev, Damir Jakus
Abstract:
One of the major difficulties introduced by wind power penetration is the inherent uncertainty in production originating from uncertain wind conditions. This uncertainty impacts many different aspects of power system operation, especially the balancing power requirements. For this reason, in power system development planning, it is necessary to evaluate the potential uncertainty in future wind power generation. For this purpose, simulation models are required that reproduce the performance of wind power forecasts. This paper presents wind power forecast error simulation models based on stochastic process simulation. The proposed models capture the most important statistical parameters recognized in wind power forecast error time series. Furthermore, two distinct models are presented based on data availability. The first model uses wind speed measurements at potential or existing wind power plant locations, while the second model uses the statistical distribution of wind speeds.
Keywords: wind power, uncertainty, stochastic process, Monte Carlo simulation
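A minimal sketch of the general idea, simulating forecast error as a temporally autocorrelated stochastic process. An AR(1) process stands in for the paper's model, and the parameter values (phi, sigma) are illustrative assumptions only.

```python
import numpy as np

def simulate_forecast_error(n_steps, phi=0.9, sigma=0.05, seed=None):
    """Simulate a wind power forecast error series as an AR(1) process.

    phi controls the temporal autocorrelation of the error,
    sigma the innovation standard deviation (illustrative values).
    """
    rng = np.random.default_rng(seed)
    err = np.zeros(n_steps)
    for t in range(1, n_steps):
        err[t] = phi * err[t - 1] + rng.normal(scale=sigma)
    return err

errors = simulate_forecast_error(10_000, seed=0)
# Stationary std of an AR(1) process: sigma / sqrt(1 - phi^2)
theoretical_std = 0.05 / np.sqrt(1 - 0.9**2)
```

Monte Carlo runs of such a series can then be fed into balancing-requirement studies; matching the empirical autocorrelation and error distribution is what the paper's two models formalize.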
Procedia PDF Downloads 451
25495 Predictive Analytics for Theory Building
Authors: Ho-Won Jung, Donghun Lee, Hyung-Jin Kim
Abstract:
Predictive analytics (data analysis) uses a subset of measurements (the features, predictors, or independent variables) to predict another measurement (the outcome, target, or dependent variable) for a single person or unit. It applies empirical methods from statistics, operations research, and machine learning to predict future or otherwise unknown events or outcomes for a single person or unit, based on patterns in data. Most analyses of metabolic syndrome are not predictive analytics but statistical explanatory studies that build a proposed model (theory building) and then validate the hypothesized metabolic syndrome predictors (theory testing). A proposed theoretical model is formed from causal hypotheses that specify how and why certain empirical phenomena occur. Predictive analytics and explanatory modeling have their own territories in analysis. However, predictive analytics can perform vital roles in explanatory studies, i.e., scientific activities such as theory building, theory testing, and relevance assessment. In this context, this study demonstrates how to use our predictive analytics to support theory building (i.e., hypothesis generation). For this purpose, the study utilized a big data predictive analytics platform based on a co-occurrence graph. The co-occurrence graph is depicted with nodes (e.g., items in a basket) and arcs (direct connections between two nodes), where items in a basket are fully connected. A cluster is a collection of fully connected items, where the specific group of items has co-occurred in several rows of a data set. Clusters can be ranked using importance metrics such as node size (number of items), frequency, and surprise (observed frequency vs. expected), among others. The size of a graph can be represented by the numbers of nodes and arcs. Since the size of a co-occurrence graph does not depend directly on the number of observations (transactions), huge amounts of transactions can be represented and processed efficiently.
For a demonstration, a total of 13,254 metabolic syndrome training observations were loaded into the analytics platform to generate rules (potential hypotheses). Each observation includes 31 predictors, for example, those associated with sociodemographics, habits, and activities. Some are intentionally included to gain predictive analytics insights on variable selection, such as cancer examination, house type, and vaccination. The platform automatically generates plausible hypotheses (rules) without statistical modeling. The rules are then validated with an external testing dataset of 4,090 observations. The results, as a kind of inductive reasoning, show potential hypotheses extracted as a set of association rules. Most statistical models generate just one estimated equation. A set of rules (many estimated equations, from a statistical perspective), on the other hand, may imply heterogeneity in a population (i.e., different subpopulations with unique features are aggregated). The next step of theory development, theory testing, statistically tests whether a proposed theoretical model is a plausible explanation of the phenomenon of interest. If the generated hypotheses are tested statistically with several thousand observations, most of the variables will become significant as the p-values approach zero. Thus, theory validation needs statistical methods that utilize a subset of the observations, such as bootstrap resampling with an appropriate sample size.
Keywords: explanatory modeling, metabolic syndrome, predictive analytics, theory building
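The node/arc/surprise construction described above can be sketched with standard-library counters. The baskets and item names below are hypothetical stand-ins for the metabolic syndrome predictors, not the study's data.

```python
from collections import Counter
from itertools import combinations

baskets = [
    {"obesity", "hypertension", "low_activity"},
    {"obesity", "hypertension"},
    {"obesity", "smoking"},
    {"hypertension", "low_activity"},
]

# Nodes: item frequencies; arcs: co-occurrence counts of item pairs
# (items within a basket are fully connected).
node_freq = Counter(item for basket in baskets for item in basket)
arc_freq = Counter(
    frozenset(pair)
    for basket in baskets
    for pair in combinations(sorted(basket), 2)
)

def surprise(pair, n_baskets=len(baskets)):
    """Observed co-occurrence vs. expected under independence."""
    a, b = pair
    expected = node_freq[a] * node_freq[b] / n_baskets
    return arc_freq[frozenset(pair)] / expected
```

A surprise well above 1 flags a pair (or, generalized, a cluster) co-occurring more often than chance, i.e., a candidate hypothesis for the theory-testing stage.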
Procedia PDF Downloads 244
25494 Comparative Study to Evaluate Chronological Age and Dental Age in North Indian Population Using Cameriere Method
Authors: Ranjitkumar Patil
Abstract:
Age estimation has its importance in forensic dentistry. Dental age estimation has emerged as an alternative to skeletal age determination. The methods based on stages of tooth formation, as appreciated on radiographs, seem to be more appropriate in the assessment of age than those based on skeletal development. The study was done to evaluate dental age in the north Indian population using Cameriere's method. Aims/Objectives: The study was conducted to assess the dental age of north Indian children using Cameriere's method and to compare the chronological age and dental age for validation of Cameriere's method in the north Indian population. A comparative study of two years' duration was carried out on the OPG (using PLANMECA Promax 3D) data of 497 individuals with ages ranging from 5 to 15 years, based on a simple random sampling technique; ethical approval was obtained from the institutional ethics committee. The data, obtained based on inclusion and exclusion criteria, were analyzed by software for dental age estimation. Statistical analysis: Student's t-test was used to compare the morphological variables of males with those of females and to compare observed age with estimated age. A regression formula was also calculated. Results: The present study was a comparative study of 497 subjects, distributed between males and females, whose dental age was assessed using panoramic radiographs, following the widely accepted method described by Cameriere. Statistical analysis in our study indicated that gender does not have a significant influence on age estimation (R² = 0.787). Conclusion: This infers that Cameriere's method can be effectively applied in the north Indian population.
Keywords: forensic, chronological age, dental age, skeletal age
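Cameriere-style dental age estimation fits a linear regression of chronological age on gender and tooth-development measurements. The sketch below shows that fitting step only; the predictors, coefficients, and synthetic data are illustrative assumptions, not Cameriere's published formula or this study's values.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
gender = rng.integers(0, 2, n)       # 0 = female, 1 = male (illustrative)
x5 = rng.uniform(0, 1, n)            # normalized open-apex measurement (illustrative)
n_closed = rng.integers(0, 8, n)     # teeth with closed apices (illustrative)

# Synthetic "true" ages under an assumed linear model, noise added
age = 9.0 + 0.4 * gender + 1.5 * x5 + 0.65 * n_closed \
      + rng.normal(scale=0.5, size=n)

# Fit the regression formula: age ~ intercept + gender + x5 + n_closed
X = np.column_stack([np.ones(n), gender, x5, n_closed])
coef, *_ = np.linalg.lstsq(X, age, rcond=None)

def estimate_age(g, x, nc):
    """Dental age predicted from the fitted coefficients."""
    return float(coef @ np.array([1.0, g, x, nc]))
```

Validation then compares `estimate_age` output against chronological age with a paired t-test, as the abstract describes.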
Procedia PDF Downloads 64
25493 The Influence of Housing Choice Vouchers on the Private Rental Market
Authors: Randy D. Colon
Abstract:
Through a freedom of information request, data pertaining to Housing Choice Voucher (HCV) households have been obtained from the Chicago Housing Authority, including rent price and number of bedrooms per HCV household, community area, and zip code from 2013 to the first quarter of 2018. Similar data pertaining to the private rental market will be obtained through public records found through the United States Department of Housing and Urban Development. The datasets will be analyzed through statistical and mapping software to investigate the potential link between HCV households and distorted rent prices. Quantitative data will be supplemented by qualitative data to investigate the lived experience of Chicago residents. Qualitative data will be collected at community meetings in the Chicago Englewood neighborhood through participation in neighborhood meetings and informal interviews with residents and community leaders. The qualitative data will be used to gain insight into the lived experience of community leaders and residents of the Englewood neighborhood in relation to housing, the rental market, and HCVs. While there is an abundance of quantitative data on this subject, this qualitative data is necessary to capture the lived experience of local residents affected by a changing rental market. This topic reflects concerns voiced by members of the Englewood community, and this study aims to keep the community relevant in its findings.
Keywords: Chicago, housing, housing choice voucher program, housing subsidies, rental market
Procedia PDF Downloads 83
25492 Materialized View Effect on Query Performance
Authors: Yusuf Ziya Ayık, Ferhat Kahveci
Abstract:
Currently, database management systems provide various tools such as backup and maintenance, as well as statistical information such as resource usage and security. In terms of query performance, this paper covers query optimization, views, indexed tables, pre-computed materialized views, and query performance analysis, in which query plan alternatives can be created and the least costly one selected to optimize a query. Indexes and views can be created on related table columns. The literature review of this study showed that, over time, despite the growing capabilities of database management systems, only database administrators are aware of the need to treat archival and transactional data types differently. These data may be constantly changing data used in everyday life, or data from a completed questionnaire whose data input is finished. For both types of data, the database uses its capabilities; but as shown in the findings section, instead of repeating similar heavy calculations that produce the same results for the same query over survey results, using materialized view results is far simpler. In this study, this performance difference was observed quantitatively by considering the cost of the query.
Keywords: cost of query, database management systems, materialized view, query performance
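The idea of a materialized view on archival survey data can be sketched with SQLite (which has no native materialized views, so the aggregate is precomputed into a summary table once and read from there). The table and column names are hypothetical, not from the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE survey (question_id INTEGER, answer INTEGER)")
cur.executemany(
    "INSERT INTO survey VALUES (?, ?)",
    [(q, (q * 7 + i) % 5) for q in range(1, 4) for i in range(1000)],
)

# "Materialized view": the heavy aggregate is computed once and stored.
cur.execute(
    "CREATE TABLE survey_summary AS "
    "SELECT question_id, COUNT(*) AS n, AVG(answer) AS avg_answer "
    "FROM survey GROUP BY question_id"
)
conn.commit()

# Subsequent reads hit the small precomputed table, not the base table.
rows = cur.execute(
    "SELECT question_id, n, avg_answer FROM survey_summary "
    "ORDER BY question_id"
).fetchall()
```

Since the completed-questionnaire data never change, the summary never needs refreshing, which is exactly the archival case the abstract argues for.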
Procedia PDF Downloads 250
25491 Anxiety and Depression in Caregivers of Autistic Children
Authors: Mou Juliet Rebeiro, S. M. Abul Kalam Azad
Abstract:
This study was carried out to examine anxiety and depression in caregivers of autistic children. The objectives of the research were to assess depression and anxiety among caregivers of autistic children and to find out about the experience of caregivers. For this purpose, the research was conducted on a sample of 39 caregivers of autistic children. Participants were recruited from a special school. To collect data for this study, each caregiver was administered a questionnaire comprising scales to measure anxiety and depression, and some responses of the participants were taken through interviews based on a topic guide. The quantitative data obtained were analyzed using statistical analysis, and the qualitative data were analyzed according to themes. The mean anxiety score (55.85) and mean depression score (108.33) are above the cutoff points. Results showed that anxiety and depression are clinically present in caregivers of autistic children. Most of the caregivers experienced behavioral, emotional, cognitive, and social problems of their child that are linked with anxiety and depression.
Keywords: anxiety, autism, caregiver, depression
Procedia PDF Downloads 267
25490 Automatic Tuning for a Systemic Model of Banking Originated Losses (SYMBOL) Tool on Multicore
Authors: Ronal Muresano, Andrea Pagano
Abstract:
Nowadays, mathematical/statistical applications are developed with more complexity and accuracy. However, this precision and complexity mean that applications need more computational power in order to be executed faster. In this sense, multicore environments are playing an important role in improving and optimizing the execution time of these applications. These environments allow the inclusion of more parallelism inside the node. However, taking advantage of this parallelism is not an easy task, because we have to deal with problems such as core communications, data locality, memory sizes (cache and RAM), synchronizations, data dependencies in the model, etc. These issues become more important when we wish to improve the application's performance and scalability. Hence, this paper describes an optimization method developed for the Systemic Model of Banking Originated Losses (SYMBOL) tool developed by the European Commission, which is based on analyzing the application's weaknesses in order to exploit the advantages of the multicore. All these improvements are done in an automatic and transparent manner, with the aim of improving the performance metrics of our tool. Finally, experimental evaluations show the effectiveness of our new optimized version, in which we have achieved a considerable improvement in execution time. The time has been reduced by around 96% for the best case tested, between the original serial version and the automatic parallel version.
Keywords: algorithm optimization, bank failures, OpenMP, parallel techniques, statistical tool
Procedia PDF Downloads 344
25489 Time of Week Intensity Estimation from Interval Censored Data with Application to Police Patrol Planning
Authors: Jiahao Tian, Michael D. Porter
Abstract:
Law enforcement agencies are tasked with crime prevention and crime reduction under limited resources. Having an accurate temporal estimate of the crime rate would be valuable in achieving such a goal. However, estimation is usually complicated by the interval-censored nature of crime data. We cast the problem of intensity estimation as a Poisson regression, using an EM algorithm to estimate the parameters. Two special penalties are added that provide smoothness over the time of day and the day of the week. The approach presented here provides accurate intensity estimates and can also uncover day-of-week clusters that share the same intensity patterns. Anticipating where and when crimes might occur is a key element of successful policing strategies. However, this task is complicated by the presence of interval-censored data, i.e., data for which the event time is known only to lie within an interval instead of being observed exactly. This type of data is prevalent in the field of criminology because of the absence of victims for certain types of crime. Despite its importance, research on the temporal analysis of crime has lagged behind the spatial component. Inspired by the success of solving crime-related problems with a statistical approach, we propose a statistical model for the temporal intensity estimation of crime with censored data. The model is built on Poisson regression and has special penalty terms added to the likelihood. An EM algorithm was derived to obtain maximum likelihood estimates, and the resulting model shows superior performance to the competing model. Our research is in line with the Smart Policing Initiative (SPI) proposed by the Bureau of Justice Assistance (BJA) as an effort to support law enforcement agencies in building evidence-based, data-driven law enforcement tactics. The goal is to identify strategic approaches that are effective in crime prevention and reduction.
In our case, we allow agencies to deploy their resources for a relatively short period of time to achieve the maximum level of crime reduction. By analyzing a particular area within cities where data are available, our proposed approach could provide not only an accurate estimate of intensities for the time unit considered but also a time-varying crime incidence pattern. Both will be helpful in the allocation of limited resources, either by improving the existing patrol plan with an understanding of the discovered day-of-week clusters or by directing any extra resources available.
Keywords: cluster detection, EM algorithm, interval censoring, intensity estimation
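A stripped-down sketch of the EM scheme for interval-censored counts: the E-step allocates each interval's total count across its bins in proportion to the current intensities, and the M-step averages over exposures. The smoothness penalties from the paper are omitted, and the toy intervals are hypothetical; without the penalties, only the interval totals are identifiable (which is what the invariants below check).

```python
import numpy as np

def em_intensity(intervals, n_bins, n_iter=200):
    """Estimate per-bin Poisson intensities from interval-censored counts.

    intervals: list of (start_bin, end_bin, count), end exclusive.
    The paper's smoothness penalties are omitted for brevity.
    """
    lam = np.ones(n_bins)
    exposure = np.zeros(n_bins)        # how often each bin was observable
    for s, e, _ in intervals:
        exposure[s:e] += 1
    for _ in range(n_iter):
        expected = np.zeros(n_bins)
        for s, e, c in intervals:
            w = lam[s:e] / lam[s:e].sum()    # E-step: allocate count c
            expected[s:e] += c * w
        lam = expected / np.maximum(exposure, 1)  # M-step: average
    return lam

# Two overlapping reporting windows, each seen twice, totals only
intervals = [(0, 2, 30), (1, 3, 50), (0, 2, 30), (1, 3, 50)]
lam = em_intensity(intervals, n_bins=3)
```

With the roughness penalties added to the M-step, adjacent hours borrow strength from each other, which is what makes the per-hour intensities in the paper identifiable and smooth.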
Procedia PDF Downloads 41
25488 Different Data-Driven Bivariate Statistical Approaches to Landslide Susceptibility Mapping (Uzundere, Erzurum, Turkey)
Authors: Azimollah Aleshzadeh, Enver Vural Yavuz
Abstract:
The main goal of this study is to produce landslide susceptibility maps using different data-driven bivariate statistical approaches, namely the entropy weight method (EWM), evidence belief function (EBF), and information content model (ICM), for Uzundere county, Erzurum province, in the north-eastern part of Turkey. Past landslide occurrences were identified and mapped from an interpretation of high-resolution satellite images and earlier reports, as well as by carrying out field surveys. In total, 42 landslide incidence polygons were mapped using ArcGIS 10.4.1 software and randomly split into a construction dataset of 70% (30 landslide incidences) for building the EWM, EBF, and ICM models, while the remaining 30% (12 landslide incidences) were used for verification purposes. Twelve layers of landslide-predisposing parameters were prepared, including total surface radiation, maximum relief, soil groups, standard curvature, distance to stream/river sites, distance to the road network, surface roughness, land use pattern, engineering geological rock group, topographical elevation, orientation of slope, and terrain slope gradient. The relationships between the landslide-predisposing parameters and the landslide inventory map were determined using the different statistical models (EWM, EBF, and ICM). The model results were validated with landslide incidences that were not used during model construction. In addition, receiver operating characteristic curves were applied, and the area under the curve (AUC) was determined for the different susceptibility maps using the success (construction data) and prediction (verification data) rate curves. The results revealed that the AUCs for the success rates are 0.7055, 0.7221, and 0.7368, while the prediction rates are 0.6811, 0.6997, and 0.7105 for the EWM, EBF, and ICM models, respectively.
Consequently, the landslide susceptibility maps were classified into five susceptibility classes: very low, low, moderate, high, and very high. Additionally, the proportion of construction and verification landslide incidences in the high and very high landslide susceptibility classes in each map was determined. The results showed that the EWM, EBF, and ICM models produced satisfactory accuracy. The obtained landslide susceptibility maps may be useful for future natural hazard mitigation studies and planning purposes for environmental protection.
Keywords: entropy weight method, evidence belief function, information content model, landslide susceptibility mapping
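The information content model assigns each class of a predisposing parameter a weight of ln(landslide density in the class / overall landslide density); summing weights across layers gives the susceptibility score. The sketch below uses hypothetical slope-gradient classes, not the study's data.

```python
import numpy as np

def information_content(class_cells, class_landslides):
    """Information value per predictor class:
    ln( (landslides_i / cells_i) / (total_landslides / total_cells) ).
    Positive values indicate classes more prone than average."""
    cells = np.asarray(class_cells, dtype=float)
    slides = np.asarray(class_landslides, dtype=float)
    overall = slides.sum() / cells.sum()
    with np.errstate(divide="ignore"):
        return np.log((slides / cells) / overall)

# Hypothetical slope-gradient classes: (raster cells, landslide cells)
cells = [4000, 3000, 2000, 1000]
slides = [5, 15, 30, 50]
iv = information_content(cells, slides)
```

EWM and EBF follow the same class-wise bivariate pattern but with entropy-based weights and belief/disbelief masses, respectively.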
Procedia PDF Downloads 104
25487 Imputation of Incomplete Large-Scale Monitoring Count Data via Penalized Estimation
Authors: Mohamed Dakki, Genevieve Robin, Marie Suet, Abdeljebbar Qninba, Mohamed A. El Agbani, Asmâa Ouassou, Rhimou El Hamoumi, Hichem Azafzaf, Sami Rebah, Claudia Feltrup-Azafzaf, Nafouel Hamouda, Wed a.L. Ibrahim, Hosni H. Asran, Amr A. Elhady, Haitham Ibrahim, Khaled Etayeb, Essam Bouras, Almokhtar Saied, Ashrof Glidan, Bakar M. Habib, Mohamed S. Sayoud, Nadjiba Bendjedda, Laura Dami, Clemence Deschamps, Elie Gaget, Jean-Yves Mondain-Monval, Pierre Defos Du Rau
Abstract:
In biodiversity monitoring, large datasets are becoming more and more widely available and are increasingly used globally to estimate species trends and conservation status. These large-scale datasets challenge existing statistical analysis methods, many of which are not adapted to their size, incompleteness, and heterogeneity. The development of scalable methods to impute missing data in incomplete large-scale monitoring datasets is crucial to balance sampling in time or space and thus better inform conservation policies. We developed a new method based on penalized Poisson models to impute and analyse incomplete monitoring data in a large-scale framework. The method allows parameterization of (a) space and time factors, (b) the main effects of predictor covariates, as well as (c) space-time interactions. It also benefits from robust statistical and computational capability in large-scale settings. The method was tested extensively on both simulated and real-life waterbird data, with the findings revealing that it outperforms six existing methods in terms of missing data imputation errors. Applying the method to 16 waterbird species, we estimated their long-term trends for the first time at the entire North African scale, a region where monitoring data suffer from many gaps in space and time series. This new approach opens promising perspectives to increase the accuracy of species-abundance trend estimations. We made it freely available in the R package 'lori' (https://CRAN.R-project.org/package=lori) and recommend its use for large-scale count data, particularly in citizen science monitoring programmes.
Keywords: biodiversity monitoring, high-dimensional statistics, incomplete count data, missing data imputation, waterbird trends in North Africa
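As a simplified stand-in for the penalized model (not the 'lori' algorithm itself), the sketch below fits a rank-1 Poisson model lambda[site, year] = a[site] * b[year] to the observed cells of a site-by-year count matrix by alternating closed-form updates, then fills the missing cells. The toy matrix is hypothetical; the actual method adds covariates, interactions, and penalties on top of such main effects.

```python
import numpy as np

def impute_counts(y, n_iter=100):
    """Impute NaN entries of a count matrix with a rank-1 Poisson model
    lambda[i, j] = a[i] * b[j], fit on observed cells only by
    alternating maximum-likelihood updates."""
    y = np.asarray(y, dtype=float)
    obs = ~np.isnan(y)
    a = np.ones(y.shape[0])
    b = np.ones(y.shape[1])
    for _ in range(n_iter):
        a = np.where(obs, y, 0).sum(1) / (obs * b).sum(1)
        b = np.where(obs, y, 0).sum(0) / (obs * a[:, None]).sum(0)
    return np.where(obs, y, np.outer(a, b))

# Toy site-by-year waterbird counts with one missing survey (NaN)
y = np.array([
    [10.0, 20.0, 30.0],
    [ 1.0,  2.0, np.nan],
    [ 5.0, 10.0, 15.0],
])
filled = impute_counts(y)
```

Because the observed counts here are exactly proportional across sites, the model recovers the missing cell as 3.0; real data would only be approximated, which is where the penalization matters.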
Procedia PDF Downloads 117
25486 Processing Big Data: An Approach Using Feature Selection
Authors: Nikat Parveen, M. Ananthi
Abstract:
Big data is one of the emerging technologies; data are collected from various sensors and used in many fields. Data retrieval is a major issue, as there is a need to extract exactly the data required. In this paper, a large data set is processed using feature selection. Feature selection helps to choose the data that are actually needed to process and execute the task. The key value is what helps to point out the exact data available in the storage space. Here, the available data are streamed, and R-Center is proposed to achieve this task.
Keywords: big data, key value, feature selection, retrieval, performance
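A minimal filter-style feature selection sketch: score each column and keep the top k, here using variance as the score so near-constant sensor channels are dropped before downstream processing. The data and scoring rule are illustrative, not the paper's R-Center method.

```python
import numpy as np

def select_features(X, k):
    """Keep the k highest-variance columns (a simple filter method)."""
    scores = X.var(axis=0)
    keep = np.sort(np.argsort(scores)[-k:])   # indices of top-k columns
    return keep, X[:, keep]

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([
    rng.normal(scale=3.0, size=n),    # informative, high variance
    rng.normal(scale=0.01, size=n),   # near-constant channel
    rng.normal(scale=2.0, size=n),    # informative
    np.full(n, 7.0),                  # dead sensor, constant reading
])
kept_idx, X_reduced = select_features(X, k=2)
```

In a streaming setting the column scores can be maintained incrementally, so selection cost stays independent of the total volume stored.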
Procedia PDF Downloads 308
25485 A Comparative Study to Evaluate Chronological Age and Dental Age in the North Indian Population Using Cameriere's Method
Authors: Ranjitkumar Patil
Abstract:
Age estimation has importance in forensic dentistry. Dental age estimation has emerged as an alternative to skeletal age determination. The methods based on stages of tooth formation, as appreciated on radiographs, seem to be more appropriate in the assessment of age than those based on skeletal development. The study was done to evaluate dental age in the north Indian population using Cameriere's method. Aims/Objectives: The study was conducted to assess the dental age of North Indian children using Cameriere's method and to compare the chronological age and dental age for validation of Cameriere's method in the north Indian population. A comparative study of two years' duration was carried out on the OPG (using PLANMECA Promax 3D) data of 497 individuals with ages ranging from 5 to 15 years, based on a simple random sampling technique; ethical approval was obtained from the institutional ethics committee. The data were obtained based on inclusion and exclusion criteria and analyzed by software for dental age estimation. Statistical analysis: The Student's t-test was used to compare the morphological variables of males with those of females and to compare observed age with estimated age. The regression formula was also calculated. Results: The present study was a comparative study of 497 subjects, distributed between males and females, with their dental age assessed using a panoramic radiograph, following the widely accepted method described by Cameriere. Statistical analysis in our study indicated that gender does not have a significant influence on age estimation (R² = 0.787). Conclusion: This infers that Cameriere's method can be effectively applied to the north Indian population.
Keywords: forensic, dental age, skeletal age, chronological age, Cameriere's method
Procedia PDF Downloads 93
25484 Statistical Analysis of Rainfall Change over the Blue Nile Basin
Authors: Hany Mustafa, Mahmoud Roushdi, Khaled Kheireldin
Abstract:
Rainfall variability is an important feature of semi-arid climates. Climate change is very likely to increase the frequency, magnitude, and variability of extreme weather events such as droughts, floods, and storms. The Blue Nile Basin is facing extreme climate change-related events such as floods and droughts, and their possible impacts on ecosystems, livelihoods, agriculture, livestock, and biodiversity are expected. Rainfall variability is a threat to food production in the Blue Nile Basin countries. This study investigates the long-term variations and trends of seasonal and annual precipitation over the Blue Nile Basin for a 102-year period (1901-2002). Statistical trend analysis of precipitation was performed with the nonparametric Mann-Kendall test and Sen's slope estimator. In addition, four statistical absolute homogeneity tests, the Standard Normal Homogeneity Test, the Buishand range test, the Pettitt test, and the von Neumann ratio test, were applied to test the homogeneity of the rainfall data using XLSTAT software; results with p-values less than alpha = 0.05 were significant. The percentages of significant trends obtained for each parameter in the different seasons are presented. The study recommends that adaptation strategies be streamlined into relevant policies, enhancing local farmers' adaptive capacity for facing future climate change effects.
Keywords: Blue Nile basin, climate change, Mann-Kendall test, trend analysis
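The Mann-Kendall test and Sen's slope estimator used above can be sketched in a few lines. This version omits the tie correction to the variance for brevity (real rainfall series usually need it).

```python
import math

def mann_kendall(x):
    """Mann-Kendall trend test (no tie correction) and Sen's slope."""
    n = len(x)
    # S statistic: net count of increasing vs. decreasing pairs
    s = sum(
        (x[j] > x[i]) - (x[j] < x[i])
        for i in range(n - 1) for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal p-value
    # Sen's slope: median of all pairwise slopes
    slopes = sorted(
        (x[j] - x[i]) / (j - i)
        for i in range(n - 1) for j in range(i + 1, n)
    )
    sen = slopes[len(slopes) // 2]
    return s, z, p, sen
```

A significant positive z with a positive Sen's slope indicates an upward precipitation trend for that season, which is how the percentages of significant trends in the study are tallied.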
Procedia PDF Downloads 503
25483 Radar Signal Detection Using Neural Networks in Log-Normal Clutter for Multiple Targets Situations
Authors: Boudemagh Naime
Abstract:
Automatic radar detection requires methods of adapting to variations in the background clutter in order to control the false alarm rate. The problem becomes more complicated in a non-Gaussian environment. In fact, the conventional approach in real-time applications requires complex statistical modeling and many computational operations. To overcome these constraints, we propose another approach based on an artificial neural network (ANN-CMLD-CFAR) using a back-propagation (BP) training algorithm. The considered environment follows a log-normal distribution in the presence of multiple Rayleigh targets. To evaluate the performance of the considered detector, several situations, such as the scale parameter and the number of interfering targets, have been investigated. The simulation results show that the ANN-CMLD-CFAR processor outperforms the conventional statistical one.
Keywords: radar detection, ANN-CMLD-CFAR, log-normal clutter, statistical modelling
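For context on what the ANN replaces, here is a sketch of the basic cell-averaging CFAR baseline (not the censored-mean CMLD variant or the paper's detector): each cell is compared with a threshold scaled from the mean of its training cells, with guard cells excluded. The window sizes, scale factor, and clutter model are illustrative.

```python
import numpy as np

def ca_cfar(power, n_train=8, n_guard=2, scale=4.0):
    """Cell-averaging CFAR: flag cells exceeding scale * local noise
    estimate, where noise is the mean of n_train training cells split
    on both sides of the cell under test (guard cells excluded)."""
    n = len(power)
    half = n_train // 2
    hits = np.zeros(n, dtype=bool)
    for i in range(half + n_guard, n - half - n_guard):
        left = power[i - n_guard - half : i - n_guard]
        right = power[i + n_guard + 1 : i + n_guard + 1 + half]
        noise = np.mean(np.concatenate([left, right]))
        hits[i] = power[i] > scale * noise
    return hits

rng = np.random.default_rng(3)
power = rng.exponential(scale=1.0, size=200)  # Rayleigh-envelope noise power
power[100] += 40.0                            # inject one strong target
hits = ca_cfar(power)
```

CMLD censors the largest training cells before averaging so that nearby interfering targets do not inflate the threshold; the abstract's contribution is learning that adaptation with a neural network instead of deriving it analytically for log-normal clutter.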
Procedia PDF Downloads 334
25482 Geostatistical and Geochemical Study of the Aquifer System Waters Complex Terminal in the Valley of Oued Righ-Arid Area Algeria
Authors: Asma Bettahar, Imed Eddine Nezli, Sameh Habes
Abstract:
Groundwater resources in the Oued Righ valley, like those in the rest of the eastern basin of the Algerian Sahara, are represented by two major superposed aquifers: the Intercalary Continental (IC) and the Terminal Complex (TC). From a qualitative point of view, various studies have highlighted that the waters of this region show excessive mineralization, including the waters of the Terminal Complex (average EC equal to 5854.61 µS/cm). The present article is a statistical approach using two complementary multivariate methods (principal component analysis (PCA) and agglomerative hierarchical clustering (AHC)), applied to the analytical data of the waters of the multilayered Terminal Complex aquifer of the Oued Righ valley. The approach is to establish a correlation between the chemical composition of the water and the lithological nature of the formations of the different aquifer levels, and to predict possible connections between groundwater layers. The results show that the mineralization of the water is of geological origin. It relates to the composition of the layers that make up the Terminal Complex.
Keywords: complex terminal, mineralization, oued righ, statistical approach
Procedia PDF Downloads 359
25481 Statistical Randomness Testing of Some Second Round Candidate Algorithms of CAESAR Competition
Authors: Fatih Sulak, Betül A. Özdemir, Beyza Bozdemir
Abstract:
In order to advance symmetric key research, several competitions have been arranged by organizations like the National Institute of Standards and Technology (NIST) and the International Association for Cryptologic Research (IACR). In recent years, the importance of authenticated encryption has rapidly increased because of the necessity of simultaneously enabling integrity, confidentiality, and authenticity. Therefore, in January 2013, IACR announced the Competition for Authenticated Encryption: Security, Applicability, and Robustness (CAESAR Competition), which will select secure and efficient algorithms for authenticated encryption. Cryptographic algorithms are expected to behave like random mappings; hence, it is important to apply statistical randomness tests to the outputs of the algorithms. In this work, the statistical randomness tests in the NIST Test Suite and other recently designed randomness tests are applied to six second-round algorithms of the CAESAR Competition. It is observed that AEGIS achieves randomness after 3 rounds, the Ascon permutation function achieves randomness after 1 round, the Joltik encryption function achieves randomness after 9 rounds, the Morus state update function achieves randomness after 3 rounds, Pi-cipher achieves randomness after 1 round, and Tiaoxin achieves randomness after 1 round.
Keywords: authenticated encryption, CAESAR competition, NIST test suite, statistical randomness tests
Procedia PDF Downloads 293
25480 Statistical Models and Time Series Forecasting on Crime Data in Nepal
Authors: Dila Ram Bhandari
Abstract:
Throughout the 20th century, new governments were created where identities such as ethnic, religious, linguistic, caste, communal, tribal, and others played a part in the development of constitutions and the legal system of victim and criminal justice. Acute issues with extremism, poverty, environmental degradation, cybercrimes, human rights violations, crime against, and victimization of both individuals and groups have recently plagued South Asian nations. Everyday massive number of crimes are steadfast, these frequent crimes have made the lives of common citizens restless. Crimes are one of the major threats to society and also for civilization. Crime is a bone of contention that can create a societal disturbance. The old-style crime solving practices are unable to live up to the requirement of existing crime situations. Crime analysis is one of the most important activities of the majority of intelligent and law enforcement organizations all over the world. The South Asia region lacks such a regional coordination mechanism, unlike central Asia of Asia Pacific regions, to facilitate criminal intelligence sharing and operational coordination related to organized crime, including illicit drug trafficking and money laundering. There have been numerous conversations in recent years about using data mining technology to combat crime and terrorism. The Data Detective program from Sentient as a software company, uses data mining techniques to support the police (Sentient, 2017). The goals of this internship are to test out several predictive model solutions and choose the most effective and promising one. First, extensive literature reviews on data mining, crime analysis, and crime data mining were conducted. Sentient offered a 7-year archive of crime statistics that were daily aggregated to produce a univariate dataset. Moreover, a daily incidence type aggregation was performed to produce a multivariate dataset. Each solution's forecast period lasted seven days. 
The experiments were split into two main groups: statistical models and neural network models. For the crime data, the neural networks fared better than the statistical models. This study gives a general review of the applied statistical and neural network models. A comparative analysis of all the models on a comparable dataset provides a detailed picture of each model's performance on the available data and of its generalizability. The experiments demonstrated that, in comparison with the other models, Gated Recurrent Units (GRU) produced better predictions. The crime records of 2005-2019 were collected from the Nepal Police headquarters and analysed in R. In conclusion, a gated recurrent unit implementation could help the police predict crime. Hence, time series analysis using GRU could be a prospective additional feature in Data Detective.Keywords: time series analysis, forecasting, ARIMA, machine learning
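As an illustrative sketch of the GRU approach mentioned above (our own minimal NumPy example, not the Data Detective implementation; all weights and window values are hypothetical), a single gated-recurrent-unit step over a univariate daily crime-count window looks like this:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, params):
    """One GRU step (Cho et al. formulation): update gate z, reset gate r,
    candidate state h_tilde, then interpolation between old and new state."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x + Uz @ h + bz)               # update gate
    r = sigmoid(Wr @ x + Ur @ h + br)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)   # candidate state
    return (1.0 - z) * h + z * h_tilde              # new hidden state

# Hypothetical setup: 1 input feature (daily crime count), 4 hidden units.
rng = np.random.default_rng(0)
n_in, n_hid = 1, 4
params = (rng.normal(size=(n_hid, n_in)), rng.normal(size=(n_hid, n_hid)), np.zeros(n_hid),
          rng.normal(size=(n_hid, n_in)), rng.normal(size=(n_hid, n_hid)), np.zeros(n_hid),
          rng.normal(size=(n_hid, n_in)), rng.normal(size=(n_hid, n_hid)), np.zeros(n_hid))

h = np.zeros(n_hid)
for count in [3.0, 5.0, 2.0, 4.0]:  # a toy 4-day crime-count window
    h = gru_step(np.array([count]), h, params)
```

In a full forecaster, the final hidden state `h` would be passed to a dense output layer predicting the next seven days of counts.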
Procedia PDF Downloads 131
25479 Develop a Conceptual Data Model of Geotechnical Risk Assessment in Underground Coal Mining Using a Cloud-Based Machine Learning Platform
Authors: Reza Mohammadzadeh
Abstract:
The major challenges of geotechnical engineering in underground spaces arise from uncertainties and differing probabilities. The collection, collation, and collaborative use of existing data, incorporated into analysis and design for a given prospect evaluation, would be a reliable and practical problem-solving method under uncertainty. Machine learning (ML) is a subfield of artificial intelligence in statistical science that applies different techniques (e.g., regression, neural networks, support vector machines, decision trees, random forests, genetic programming, etc.) to data in order to learn and improve from them automatically, without being explicitly programmed, and to make decisions and predictions. In this paper, a conceptual database schema of geotechnical risks in underground coal mining, based on a cloud system architecture, has been designed. A new approach to risk assessment using a three-dimensional risk matrix supported by the level of knowledge (LoK) is proposed in this model. Subsequently, the stages of the model's workflow methodology are described. For data training and deployment of the LoK models, an ML platform has been implemented. IBM Watson Studio, a leading data science tool and data-driven cloud-integrated ML platform, is employed in this study. As a use case, a data set of geotechnical hazards and risk assessments in underground coal mining was prepared to demonstrate the performance of the model, and the results are outlined accordingly.Keywords: data model, geotechnical risks, machine learning, underground coal mining
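The abstract does not spell out how the three-dimensional risk matrix combines with the level of knowledge (LoK), so the following is a hypothetical scoring sketch of the general idea: a classical likelihood-consequence product inflated when knowledge is poor. The function name, rating scales, and inflation rule are our assumptions, not the paper's scheme:

```python
def risk_score(likelihood, consequence, lok, lok_levels=3):
    """Hypothetical three-dimensional risk score: the classical
    likelihood x consequence product is inflated when the level of
    knowledge (LoK) is low, reflecting greater uncertainty.
    likelihood, consequence: integer ratings 1..5
    lok: integer level of knowledge, 1 (poor) .. lok_levels (good)."""
    base = likelihood * consequence
    uncertainty_factor = 1.0 + (lok_levels - lok) / lok_levels
    return base * uncertainty_factor

# The same hazard scored with good versus poor knowledge:
well_known = risk_score(3, 4, lok=3)    # no uncertainty inflation
poorly_known = risk_score(3, 4, lok=1)  # inflated by low LoK
```

The point of the third axis is visible in the result: identical likelihood-consequence ratings yield a higher risk score when the level of knowledge is low.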
Procedia PDF Downloads 242
25478 Degumming of Eri Silk Fabric with Ionic Liquid
Authors: Shweta K. Vyas, Rakesh Musale, Sanjeev R. Shukla
Abstract:
Eri silk is a non-mulberry silk which is obtained without killing the silkworms, and hence it is also known as Ahimsa silk. In the present study, the results of degumming eri silk with alkaline peroxide are compared with those obtained using the ionic liquid (IL) 1-butyl-3-methylimidazolium chloride [BMIM]Cl. Experiments were designed to find the optimum processing parameters for degumming eri silk by response surface methodology. The statistical software Design-Expert 6.0 was used for regression and graphical analysis of the responses obtained by running the set of designed experiments. Analysis of variance (ANOVA) was used to estimate the statistical parameters. A polynomial equation of quadratic order was employed to fit the experimental data. The model and its quality terms were evaluated by the F-test. Three-dimensional surface plots were prepared to study the effect of the variables on different responses. The optimum conditions for the IL treatment were selected from the predicted combinations, and the experiments were repeated under these conditions to determine reproducibility.Keywords: silk degumming, ionic liquid, response surface methodology, ANOVA
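The quadratic response surface model referred to above amounts to an ordinary least-squares fit of a second-order polynomial. This is a generic RSM illustration with synthetic two-factor data (the factor names and coefficients are hypothetical, not the paper's Design-Expert analysis):

```python
import numpy as np

def design(x1, x2):
    """Second-order (quadratic) RSM design matrix for two coded factors:
    y = b0 + b1*x1 + b2*x2 + b11*x1^2 + b22*x2^2 + b12*x1*x2."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

def fit_quadratic_surface(x1, x2, y):
    coef, *_ = np.linalg.lstsq(design(x1, x2), y, rcond=None)
    return coef

# Synthetic check: e.g. x1 = IL concentration, x2 = temperature (coded),
# y = weight loss (%). Data generated from known coefficients is recovered.
rng = np.random.default_rng(1)
x1 = rng.uniform(-1, 1, 30)
x2 = rng.uniform(-1, 1, 30)
true = np.array([20.0, 3.0, -1.5, -2.0, -0.5, 1.0])
y = design(x1, x2) @ true
coef = fit_quadratic_surface(x1, x2, y)
```

In a real RSM workflow each fitted term would then be screened with an F-test, as the abstract describes.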
Procedia PDF Downloads 560
25477 The Beta-Fisher Snedecor Distribution with Applications to Cancer Remission Data
Authors: K. A. Adepoju, O. I. Shittu, A. U. Chukwu
Abstract:
In this paper, a new four-parameter generalized version of the Fisher-Snedecor distribution, called the Beta-F distribution, is introduced. A comprehensive account of the statistical properties of the new distribution is given. Formal expressions for the cumulative distribution function, moments, moment generating function, and maximum likelihood estimates, as well as the Fisher information, are obtained. The flexibility of this distribution, as well as its robustness, is demonstrated using cancer remission time data. The new distribution can be used in most applications where the assumptions underlying the use of other lifetime distributions are violated.Keywords: fisher-snedecor distribution, beta-f distribution, outlier, maximum likelihood method
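The standard beta-G construction behind such generalizations defines the new CDF as the regularized incomplete beta function evaluated at the baseline CDF. A minimal sketch of a Beta-F CDF under that assumption (the paper's exact parameterization may differ) using SciPy:

```python
from scipy.special import betainc
from scipy.stats import f as fisher_f

def beta_f_cdf(x, a, b, d1, d2):
    """CDF of a beta-generated Fisher-Snedecor ("Beta-F") variable:
    F_BetaF(x) = I_{G(x)}(a, b), where G is the F(d1, d2) CDF and I is
    the regularized incomplete beta function. Standard beta-G sketch;
    not necessarily the authors' exact four-parameter form."""
    return betainc(a, b, fisher_f.cdf(x, d1, d2))
```

A quick sanity check of the construction: with a = b = 1 the incomplete beta is the identity, so the Beta-F CDF collapses back to the ordinary F CDF.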
Procedia PDF Downloads 314
25476 Statistical Characteristics of Distribution of Radiation-Induced Defects under Random Generation
Authors: P. Selyshchev
Abstract:
We consider fluctuations of defect density, taking defect interaction into account. A stochastic field of displacement generation rates gives a random defect distribution. We determine the statistical characteristics (mean and dispersion) of the random field of point-defect distribution as functions of the defect generation parameters, the temperature, and the properties of the irradiated crystal.Keywords: irradiation, primary defects, interaction, fluctuations
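The role of the mean and dispersion can be illustrated with a toy Monte Carlo (our illustration, not the authors' model): under purely random, non-interacting (Poisson) generation the dispersion equals the mean, so a variance-to-mean ratio away from unity would signal the defect interaction considered in the paper:

```python
import numpy as np

rng = np.random.default_rng(42)
rate = 5.0       # mean number of defects generated per cell (arbitrary units)
cells = 200_000  # number of independent crystal sub-volumes sampled

# Non-interacting random generation: counts are Poisson-distributed.
counts = rng.poisson(rate, size=cells)
mean = counts.mean()
dispersion = counts.var()
ratio = dispersion / mean  # ~1 for non-interacting defects
```

Interaction between defects would push this ratio away from 1 (sub- or super-Poissonian fluctuations), which is exactly the deviation the analytical mean-and-dispersion results characterize.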
Procedia PDF Downloads 299
25475 Advances in Artificial Intelligence Using Speech Recognition
Authors: Khaled M. Alhawiti
Abstract:
This research study aims to present a retrospective study of speech recognition systems and artificial intelligence. Speech recognition has become one of the most widely used technologies, as it offers a great opportunity to interact and communicate with automated machines. Precisely, it can be affirmed that speech recognition facilitates its users and helps them perform their daily routine tasks in a more convenient and effective manner. This research presents an illustration of recent technological advancements associated with artificial intelligence. Recent research has revealed that decoding is the foremost issue affecting speech recognition. To overcome this issue, different statistical models have been developed by researchers. Some of the most prominent statistical models include the acoustic model (AM), the language model (LM), the lexicon model, and hidden Markov models (HMM). This research will help in understanding all of these statistical models of speech recognition. Researchers have also formulated different decoding methods, which are utilized for realistic decoding tasks and constrained artificial languages. These decoding methods include pattern recognition, acoustic-phonetic, and artificial intelligence approaches. Artificial intelligence has been recognized as the most efficient and reliable of the methods used in speech recognition.Keywords: speech recognition, acoustic phonetic, artificial intelligence, hidden markov models (HMM), statistical models of speech recognition, human machine performance
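HMM decoding of the kind mentioned above is typically carried out with the Viterbi algorithm. A self-contained sketch with a hypothetical two-state model and a tiny symbol alphabet (all probabilities are illustrative, not from any real acoustic model):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Viterbi decoding for a discrete HMM: returns the most likely
    hidden state sequence for the observations and its probability."""
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Best predecessor state for s at time t.
            prob, prev = max((V[t - 1][p] * trans_p[p][s], p) for p in states)
            V[t][s] = prob * emit_p[s][obs[t]]
            back[t][s] = prev
    prob, last = max((V[-1][s], s) for s in states)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):  # backtrack
        path.insert(0, back[t][path[0]])
    return path, prob

# Hypothetical two-phone model over symbols "a"/"b" (illustrative numbers):
states = ("ph1", "ph2")
start_p = {"ph1": 0.6, "ph2": 0.4}
trans_p = {"ph1": {"ph1": 0.7, "ph2": 0.3}, "ph2": {"ph1": 0.4, "ph2": 0.6}}
emit_p = {"ph1": {"a": 0.5, "b": 0.5}, "ph2": {"a": 0.1, "b": 0.9}}
path, prob = viterbi(["a", "b", "b"], states, start_p, trans_p, emit_p)
```

In a real recognizer the transition probabilities would come from the language model and the emission probabilities from the acoustic model, exactly the components the abstract enumerates.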
Procedia PDF Downloads 445
25474 A Crowdsourced Homeless Data Collection System and Its Econometric Analysis
Authors: Praniil Nagaraj
Abstract:
This paper proposes a method to collect homeless data using crowdsourcing and presents an approach to analyzing the data, demonstrating its potential to strengthen existing and future policies aimed at promoting socio-economic equilibrium. The 2022 Annual Homeless Assessment Report (AHAR) to Congress highlighted alarming statistics, emphasizing the need for effective decision-making and budget allocation within the local planning bodies known as Continuums of Care (CoC). This paper's contributions fall into three main areas. First, a unique method for collecting homeless data is introduced, utilizing a user-friendly smartphone app (currently available for Android). The app enables the general public to quickly record information about homeless individuals, including the number of people and details about their living conditions. The collected data, including date, time, and location, is anonymized and securely transmitted to the cloud. It is anticipated that an increasing number of users motivated to contribute to society will adopt the app, thus expanding the data collection efforts. Duplicate records are addressed through simple classification methods, and historical data is utilized to fill in missing information. The second contribution of this paper is the description of the data analysis techniques applied to the collected data. By combining this new data with existing information, statistical regression analysis is employed to gain insights into various aspects, such as distinguishing between unsheltered and sheltered homeless populations, as well as examining their correlation with factors like unemployment rates, housing affordability, and labor demand. Initial data is collected in San Francisco, while pre-existing information is drawn from three cities: San Francisco, New York City, and Washington D.C., facilitating simulations. The third contribution focuses on demonstrating the practical implications of the data processing results.
The challenges faced by key stakeholders, including charitable organizations and local city governments, are taken into consideration. Two case studies are presented as examples. The first case study explores improving the efficiency of food and necessities distribution, as well as medical assistance, driven by charitable organizations. The second case study examines the correlation between micro-geographic budget expenditure by local city governments and homeless information to justify budget allocation and expenditures. The ultimate objective of this endeavor is to enable the continuous enhancement of the quality of life for the underprivileged. It is hoped that through increased crowdsourcing of data from the public, the Generosity Curve and the Need Curve will intersect, leading to a better world for all.Keywords: crowdsourcing, homelessness, socio-economic policies, statistical analysis
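The regression step described above can be sketched with ordinary least squares. The predictor names and all figures below are synthetic placeholders, not AHAR or app-collected data:

```python
import numpy as np

def ols_fit(X, y):
    """Ordinary least squares with an intercept column prepended."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef  # [intercept, b_unemployment, b_affordability]

# Synthetic city-level observations (placeholders, not real data):
rng = np.random.default_rng(7)
unemployment = rng.uniform(3.0, 10.0, 50)   # percent
affordability = rng.uniform(0.2, 0.8, 50)   # housing-cost index
homeless = 100.0 + 40.0 * unemployment + 300.0 * affordability

beta = ols_fit(np.column_stack([unemployment, affordability]), homeless)
```

With noise-free synthetic data the known coefficients are recovered exactly; with real crowdsourced counts, the fitted coefficients would quantify the correlations with unemployment and housing affordability that the abstract targets.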
Procedia PDF Downloads 32
25473 Genetic Variation of Autosomal STR Loci from Unrelated Individuals in Iraq
Authors: H. Imad, Q. Cheah, J. Mohammad, O. Aamera
Abstract:
The aim of this study is twofold: first, to determine the genetic structure of the Iraqi population, and second, to evaluate the importance of these loci for forensic genetic purposes. FTA® technology (FTA™ paper DNA extraction) was utilized to extract DNA. Twenty STR loci and Amelogenin, including D3S1358, D13S317, Penta E, D16S539, D18S51, D2S1338, CSF1PO, Penta D, THO1, vWA, D21S11, D7S820, TPOX, D8S1179, FGA, D2S1338, D5S818, D6S1043, D12S391, D19S433, and Amelogenin, were amplified using the PowerPlex® 21 kit. PCR products were detected on a 3730xl genetic analyzer, and the data were analyzed with PowerStats v1.2. Based on the allelic frequencies, several statistical parameters of genetic and forensic efficiency were estimated, including the homozygosity and heterozygosity, the effective number of alleles (n), the polymorphism information content (PIC), the power of discrimination (DP), and the power of exclusion (PE). The power of discrimination values for all tested loci ranged from 75% to 96%; therefore, these loci can safely be used to establish a DNA-based database for the Iraqi population.Keywords: autosomal STR, genetic variation, Middle and South of Iraq, statistical parameters
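Two of the forensic efficiency parameters named above, expected heterozygosity and polymorphism information content (PIC), follow directly from allele frequencies. A minimal sketch using the standard Botstein PIC formula (the example frequencies are hypothetical, not the Iraqi data):

```python
def heterozygosity(p):
    """Expected heterozygosity from allele frequencies p (summing to 1)."""
    return 1.0 - sum(pi ** 2 for pi in p)

def pic(p):
    """Polymorphism information content (Botstein et al. formula):
    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    n = len(p)
    return (1.0 - sum(pi ** 2 for pi in p)
            - sum(2.0 * p[i] ** 2 * p[j] ** 2
                  for i in range(n) for j in range(i + 1, n)))

# Hypothetical locus with two equally frequent alleles:
freqs = [0.5, 0.5]
het = heterozygosity(freqs)  # 0.5
v = pic(freqs)               # 0.375
```

In practice these are computed per locus from the observed allele counts, and loci with high PIC and power of discrimination are preferred for a forensic database.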
Procedia PDF Downloads 359
25472 Spatial Analysis of the Impact of City Development on the Degradation of Green Space in the Urban Fringe East of the City of Yogyakarta, 2005-2010
Authors: Pebri Nurhayati, Rozanah Ahlam Fadiyah
Abstract:
Urban development often consumes rural areas, and the accompanying change in land use leads to the degradation of urban green space on the city fringe. In the long run, this degradation of green open space can contribute to declines in ecological quality and in psychological and public health. Therefore, this research aims to (1) determine the relationship between the parameters of urban development and the rate of green space degradation, and (2) develop a spatial model of the impact of urban development on the degradation of green open space using remote sensing techniques and Geographical Information Systems in an integrated manner. This is a descriptive study with data collected through observation and from secondary sources. In the data analysis, ASTER imagery from 2005-2010 with NDVI is required to interpret the direction of urban development and the degradation of green open space. The interpretation generates two maps: a map of built-up land development and a map of green open space degradation. Secondary data on the population growth rate, the level of accessibility, and the main activities of each city are processed into a population growth rate map, an accessibility map, and a map of the town's main activities. Each map is used as a parameter of green space degradation and analyzed with non-parametric crosstab statistics, yielding the value of C (the contingency coefficient). The C values are then compared with C_max to determine the strength of the relationship. This research yields a spatial model map of the impact of city development on green space degradation in the urban fringe east of the city of Yogyakarta, 2005-2010.
In addition, this research generates statistical test results for each parameter of green open space degradation in the urban fringe east of the city of Yogyakarta, 2005-2010.Keywords: spatial analysis, urban development, degradation of green space, urban fringe
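The crosstab statistic used above, the contingency coefficient C compared against its maximum, can be computed as follows; the 2x2 table is a made-up example, not the Yogyakarta data:

```python
import numpy as np

def contingency_coefficient(table):
    """Pearson contingency coefficient C = sqrt(chi2 / (chi2 + n)) and
    C_max = sqrt((k - 1) / k), where k is the smaller table dimension.
    A ratio C / C_max near 1 indicates a strong association."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape)
    return np.sqrt(chi2 / (chi2 + n)), np.sqrt((k - 1) / k)

# Hypothetical crosstab, e.g. accessibility level vs. degradation class:
c, c_max = contingency_coefficient([[10, 20], [20, 10]])
```

Comparing `c` with `c_max` normalizes for table size, which is why the abstract compares C against C_max rather than interpreting C alone.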
Procedia PDF Downloads 281
25471 Transport Related Air Pollution Modeling Using Artificial Neural Network
Authors: K. D. Sharma, M. Parida, S. S. Jain, Anju Saini, V. K. Katiyar
Abstract:
Air quality models form one of the most important components of an urban air quality management plan. Various statistical modeling techniques (regression, multiple regression, and time series analysis) have been used to predict air pollution concentrations in the urban environment. These models calculate pollution concentrations from observed traffic, meteorological, and pollution data after an appropriate relationship has been obtained empirically between these parameters. Artificial neural networks (ANN) are increasingly used as an alternative tool for modeling pollutants from vehicular traffic, particularly in urban areas. In the present paper, an attempt has been made to model traffic-related air pollution, specifically CO concentration, using neural networks. For CO concentration, two scenarios were considered: first, with only classified traffic volume as input, and second, with both classified traffic volume and meteorological variables. The results showed that CO concentration can be predicted with good accuracy using an artificial neural network (ANN).Keywords: air quality management, artificial neural network, meteorological variables, statistical modeling
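The ANN setup described above (classified traffic volumes plus meteorological variables in, CO concentration out) can be sketched as a one-hidden-layer forward pass; the layer sizes, weights, and input vector are illustrative assumptions, not the paper's trained network:

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One-hidden-layer feedforward network: tanh hidden layer,
    linear output, as commonly used for pollutant regression."""
    hidden = np.tanh(W1 @ x + b1)
    return W2 @ hidden + b2

# Hypothetical scenario-2 inputs: 3 classified traffic volumes plus
# 2 meteorological variables (wind speed, temperature); one CO output.
rng = np.random.default_rng(3)
n_in, n_hid = 5, 8
W1, b1 = rng.normal(size=(n_hid, n_in)), np.zeros(n_hid)
W2, b2 = rng.normal(size=(1, n_hid)), np.zeros(1)

x = np.array([120.0, 45.0, 10.0, 2.5, 18.0])  # toy input vector
co_pred = mlp_forward(x, W1, b1, W2, b2)
```

In the first scenario the input vector would simply omit the two meteorological entries; training would then fit the weights to observed CO concentrations by backpropagation.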
Procedia PDF Downloads 489
25470 Analyzing the Relationship between the Spatial Characteristics of Cultural Structure, Activities, and the Tourism Demand
Authors: Deniz Karagöz
Abstract:
This study attempts to comprehend the relationship between the spatial characteristics of cultural structure and activities and the tourism demand in Turkey. The analysis is divided into four parts. The first part constructs a cultural structure and cultural activity (CSCA) index by principal component analysis. The analysis determined four distinct dimensions, namely cultural activity/structure, access to culture, cultural consumption, and cultural management. Exploratory spatial data analysis is employed to determine the spatial patterns of cultural structure and cultural activities across the 81 provinces of Turkey, and global Moran's I indices are used to ascertain clusters of cultural activity and structure. Finally, the relationship between cultural activities/cultural structure and tourism demand is analyzed. The raw data of the study come from official databases: the data on cultural structure and activities were gathered from the Turkish Statistical Institute, and the data related to tourism demand were provided by the Republic of Turkey Ministry of Culture and Tourism.Keywords: cultural activities, cultural structure, spatial characteristics, tourism demand, Turkey
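The global Moran's I statistic used above has a direct matrix form; the following sketch uses a hypothetical four-province line with rook adjacency, not the actual 81-province Turkish data:

```python
import numpy as np

def morans_i(x, w):
    """Global Moran's I for values x and spatial weight matrix w:
    I = (n / sum(w)) * sum_ij w_ij (x_i - xbar)(x_j - xbar)
                     / sum_i (x_i - xbar)^2."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = len(x)
    d = x - x.mean()
    return (n / w.sum()) * (d @ w @ d) / (d @ d)

# Hypothetical four provinces on a line (rook adjacency) with a smoothly
# increasing cultural-activity index, i.e. positive spatial autocorrelation:
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
i_stat = morans_i([1.0, 2.0, 3.0, 4.0], w)
```

Values of I significantly above its expectation of -1/(n-1) indicate clustering of similar values, which is how the provincial cultural clusters in the study would be detected.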
Procedia PDF Downloads 518
25469 Prediction of Marijuana Use among Iranian Early Youth: an Application of Integrative Model of Behavioral Prediction
Authors: Mehdi Mirzaei Alavijeh, Farzad Jalilian
Abstract:
Background: Marijuana is the most widely used illicit drug worldwide, especially among adolescents and young adults, and its use can cause numerous complications. The aim of this study was to determine the pattern of use, the motivations for use, and the factors related to marijuana use among Iranian youths, based on the integrative model of behavioral prediction. Methods: A cross-sectional study was conducted among 174 young marijuana users in Kermanshah County and Isfahan County during summer 2014, selected by convenience sampling for participation in this study. A self-report questionnaire was used for data collection. Data were analyzed with SPSS version 21 using bivariate correlations and linear regression. Results: The mean marijuana use of respondents was 4.60 times per week [95% CI: 4.06, 5.15]. Linear regression showed that the constructs of the integrative model of behavioral prediction accounted for 36% of the variation in marijuana use per week (R2 = 36%, P < 0.001); among them, attitude, marijuana refusal self-efficacy, and subjective norms were the stronger predictors. Conclusion: Comprehensive health education and prevention programs need to emphasize the cognitive factors that predict youths' health-related behaviors. Based on our findings, it seems that designing educational and behavioral interventions to reduce positive beliefs about marijuana, promote marijuana refusal self-efficacy, and reduce subjective norms that encourage marijuana use has effective potential to protect youths from marijuana use.Keywords: marijuana, youth, integrative model of behavioral prediction, Iran
Procedia PDF Downloads 530