Search results for: classification of big data actors
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8153

Search results for: classification of big data actors

7343 Main Control Factors of Fluid Loss in Drilling and Completion in Shunbei Oilfield by Unmanned Intervention Algorithm

Authors: Peng Zhang, Lihui Zheng, Xiangchun Wang, Xiaopan Kou

Abstract:

Quantitative research on the main control factors of lost circulation has few considerations and single data source. Using Unmanned Intervention Algorithm to find the main control factors of lost circulation adopts all measurable parameters. The degree of lost circulation is characterized by the loss rate as the objective function. Geological, engineering and fluid data are used as layers, and 27 factors such as wellhead coordinates and Weight on Bit (WOB) used as dimensions. Data classification is implemented to determine function independent variables. The mathematical equation of loss rate and 27 influencing factors is established by multiple regression method, and the undetermined coefficient method is used to solve the undetermined coefficient of the equation. Only three factors in t-test are greater than the test value 40, and the F-test value is 96.557%, indicating that the correlation of the model is good. The funnel viscosity, final shear force and drilling time were selected as the main control factors by elimination method, contribution rate method and functional method. The calculated values of the two wells used for verification differ from the actual values by -3.036 m3/h and -2.374 m3/h, with errors of 7.21% and 6.35%. The influence of engineering factors on the loss rate is greater than that of funnel viscosity and final shear force, and the influence of the three factors is less than that of geological factors. The best combination of funnel viscosity, final shear force and drilling time is obtained through quantitative calculation. The minimum loss rate of lost circulation wells in Shunbei area is 10 m3/h. It can be seen that man-made main control factors can only slow down the leakage, but cannot fundamentally eliminate it. This is more in line with the characteristics of karst caves and fractures in Shunbei fault solution oil and gas reservoir.

Keywords: Drilling fluid, loss rate, main controlling factors, Unmanned Intervention Algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 399
7342 Assessment of Urban Heat Island through Remote Sensing in Nagpur Urban Area Using Landsat 7 ETM+ Satellite Images

Authors: Meenal Surawar, Rajashree Kotharkar

Abstract:

Urban Heat Island (UHI) is found more pronounced as a prominent urban environmental concern in developing cities. To study the UHI effect in the Indian context, the Nagpur urban area has been explored in this paper using Landsat 7 ETM+ satellite images through Remote Sensing and GIS techniques. This paper intends to study the effect of LU/LC pattern on daytime Land Surface Temperature (LST) variation, contributing UHI formation within the Nagpur Urban area. Supervised LU/LC area classification was carried to study urban Change detection using ENVI 5. Change detection has been studied by carrying Normalized Difference Vegetation Index (NDVI) to understand the proportion of vegetative cover with respect to built-up ratio. Detection of spectral radiance from the thermal band of satellite images was processed to calibrate LST. Specific representative areas on the basis of urban built-up and vegetation classification were selected for observation of point LST. The entire Nagpur urban area shows that, as building density increases with decrease in vegetation cover, LST increases, thereby causing the UHI effect. UHI intensity has gradually increased by 0.7°C from 2000 to 2006; however, a drastic increase has been observed with difference of 1.8°C during the period 2006 to 2013. Within the Nagpur urban area, the UHI effect was formed due to increase in building density and decrease in vegetative cover.

Keywords: Land use, land cover, land surface temperature, remote sensing, urban heat island.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2607
7341 Towards a Competence Management Approach Based on Continuous Improvement

Authors: N. Sefiani, C. Fikri Benbrahim, A. Boumane, K. Reklaoui

Abstract:

Nowadays, the reflection on competence management is the basic for new competitive strategies. It is considered as the core of the problems of the global supply chain. It interact a variety of actors: information, physical and activities flows, etc. Even though competence management is seen as the key factor for any business success, the existing approaches demonstrate the deficiencies and limitations of the competence concept. This research has two objectives: The first is to make a contribution by focusing on the development of a competence approach, based on continuous improvement. It allows the enterprise to spot key competencies, mobilize them in order to serve its strategic objectives and to develop future competencies. The second is to propose a method to evaluate the Level of Collective Competence. The approach was confirmed through an application carried out at an automotive company.

Keywords: Competencies, approach, continuous improvement, collective competence level, performance indicator.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1526
7340 Summarizing Data Sets for Data Mining by Using Statistical Methods in Coastal Engineering

Authors: Yunus Doğan, Ahmet Durap

Abstract:

Coastal regions are the one of the most commonly used places by the natural balance and the growing population. In coastal engineering, the most valuable data is wave behaviors. The amount of this data becomes very big because of observations that take place for periods of hours, days and months. In this study, some statistical methods such as the wave spectrum analysis methods and the standard statistical methods have been used. The goal of this study is the discovery profiles of the different coast areas by using these statistical methods, and thus, obtaining an instance based data set from the big data to analysis by using data mining algorithms. In the experimental studies, the six sample data sets about the wave behaviors obtained by 20 minutes of observations from Mersin Bay in Turkey and converted to an instance based form, while different clustering techniques in data mining algorithms were used to discover similar coastal places. Moreover, this study discusses that this summarization approach can be used in other branches collecting big data such as medicine.

Keywords: Clustering algorithms, coastal engineering, data mining, data summarization, statistical methods.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1243
7339 Dimensional Modeling of HIV Data Using Open Source

Authors: Charles D. Otine, Samuel B. Kucel, Lena Trojer

Abstract:

Selecting the data modeling technique for an information system is determined by the objective of the resultant data model. Dimensional modeling is the preferred modeling technique for data destined for data warehouses and data mining, presenting data models that ease analysis and queries which are in contrast with entity relationship modeling. The establishment of data warehouses as components of information system landscapes in many organizations has subsequently led to the development of dimensional modeling. This has been significantly more developed and reported for the commercial database management systems as compared to the open sources thereby making it less affordable for those in resource constrained settings. This paper presents dimensional modeling of HIV patient information using open source modeling tools. It aims to take advantage of the fact that the most affected regions by the HIV virus are also heavily resource constrained (sub-Saharan Africa) whereas having large quantities of HIV data. Two HIV data source systems were studied to identify appropriate dimensions and facts these were then modeled using two open source dimensional modeling tools. Use of open source would reduce the software costs for dimensional modeling and in turn make data warehousing and data mining more feasible even for those in resource constrained settings but with data available.

Keywords: About Database, Data Mining, Data warehouse, Dimensional Modeling, Open Source.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1958
7338 Comparative Evaluation of Accuracy of Selected Machine Learning Classification Techniques for Diagnosis of Cancer: A Data Mining Approach

Authors: Rajvir Kaur, Jeewani Anupama Ginige

Abstract:

With recent trends in Big Data and advancements in Information and Communication Technologies, the healthcare industry is at the stage of its transition from clinician oriented to technology oriented. Many people around the world die of cancer because the diagnosis of disease was not done at an early stage. Nowadays, the computational methods in the form of Machine Learning (ML) are used to develop automated decision support systems that can diagnose cancer with high confidence in a timely manner. This paper aims to carry out the comparative evaluation of a selected set of ML classifiers on two existing datasets: breast cancer and cervical cancer. The ML classifiers compared in this study are Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression, Ensemble (Bagged Tree) and Artificial Neural Networks (ANN). The evaluation is carried out based on standard evaluation metrics Precision (P), Recall (R), F1-score and Accuracy. The experimental results based on the evaluation metrics show that ANN showed the highest-level accuracy (99.4%) when tested with breast cancer dataset. On the other hand, when these ML classifiers are tested with the cervical cancer dataset, Ensemble (Bagged Tree) technique gave better accuracy (93.1%) in comparison to other classifiers.

Keywords: Artificial neural networks, breast cancer, cancer dataset, classifiers, cervical cancer, F-score, logistic regression, machine learning, precision, recall, support vector machine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1551
7337 Efficient Lossless Compression of Weather Radar Data

Authors: Wei-hua Ai, Wei Yan, Xiang Li

Abstract:

Data compression is used operationally to reduce bandwidth and storage requirements. An efficient method for achieving lossless weather radar data compression is presented. The characteristics of the data are taken into account and the optical linear prediction is used for the PPI images in the weather radar data in the proposed method. The next PPI image is identical to the current one and a dramatic reduction in source entropy is achieved by using the prediction algorithm. Some lossless compression methods are used to compress the predicted data. Experimental results show that for the weather radar data, the method proposed in this paper outperforms the other methods.

Keywords: Lossless compression, weather radar data, optical linear prediction, PPI image

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2256
7336 Conceptualizing the Knowledge to Manage and Utilize Data Assets in the Context of Digitization: Case Studies of Multinational Industrial Enterprises

Authors: Martin Böhmer, Agatha Dabrowski, Boris Otto

Abstract:

The trend of digitization significantly changes the role of data for enterprises. Data turn from an enabler to an intangible organizational asset that requires management and qualifies as a tradeable good. The idea of a networked economy has gained momentum in the data domain as collaborative approaches for data management emerge. Traditional organizational knowledge consequently needs to be extended by comprehensive knowledge about data. The knowledge about data is vital for organizations to ensure that data quality requirements are met and data can be effectively utilized and sovereignly governed. As this specific knowledge has been paid little attention to so far by academics, the aim of the research presented in this paper is to conceptualize it by proposing a “data knowledge model”. Relevant model entities have been identified based on a design science research (DSR) approach that iteratively integrates insights of various industry case studies and literature research.

Keywords: Data management, digitization, Industry 4.0, knowledge engineering, metamodel.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1456
7335 Direct Democracy and Social Contract in Ancient Athens

Authors: Nicholas Kyriazis, Emmanouil Marios L. Economou, Jr, Loukas Zachilas

Abstract:

In the present essay, a model of choice by actors is analysedby utilizing the theory of chaos to explain how change comes about. Then, by using ancient and modern sources of literature, the theory of the social contract is analysed as a historical phenomenon that first appeared during the period of Classical Greece. Then, based on the findings of this analysis, the practice of direct democracy and public choice in ancient Athens is analysed, through two historical cases: Eubulus and Lycurgus political program in the second half of the 4th century. The main finding of this research is that these policies can be interpreted as an implementation of a social contract, through which citizens were taking decisions based on rational choice according to economic considerations.

Keywords: Chaos theory, public choice, social contract, 4th century BC. Athens.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2398
7334 A Methodology for Data Migration between Different Database Management Systems

Authors: Bogdan Walek, Cyril Klimes

Abstract:

In present days the area of data migration is very topical. Current tools for data migration in the area of relational database have several disadvantages that are presented in this paper. We propose a methodology for data migration of the database tables and their data between various types of relational database systems (RDBMS). The proposed methodology contains an expert system. The expert system contains a knowledge base that is composed of IFTHEN rules and based on the input data suggests appropriate data types of columns of database tables. The proposed tool, which contains an expert system, also includes the possibility of optimizing the data types in the target RDBMS database tables based on processed data of the source RDBMS database tables. The proposed expert system is shown on data migration of selected database of the source RDBMS to the target RDBMS.

Keywords: Expert system, fuzzy, data migration, database, relational database, data type, relational database management system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3491
7333 Government of Ghana’s Budget: Its Functions, Coverage, Classification, and Integration with Chart of Accounts

Authors: Mohammed Sani Abdulai

Abstract:

Government budgets are the primary instruments for formulating and implementing a country’s fiscal policy objectives, development priorities, and the overall socio-economic aspirations of its people. Thus, in this paper, the author examined the Government of Ghana’s budgets with respect to their functions, coverage, classifications, and integration with the country’s chart of accounts. The author did so by amalgamating the research findings of extant literature with (a) the operational and procedural guidelines underpinning the formulation and execution of the government’s budgets; (b) the recommendations made by various development partners and thinktanks on reforming the country’s budgeting processes and procedures; and (c) the lessons Ghana could learn from the budget reform efforts of other countries. By way of research findings, the paper showed that the Government of Ghana’s budgets in terms of function are both eclectic and multidimensional. On coverage, the paper showed that the country’s budgets duly cover the revenues and expenditures of the general government (i.e., both the central and sub-national governments). Finally, on classifications, the paper noted with delight the Government of Ghana’s effort in providing classificatory codes to both its national development agenda and such international development goals as the AU’s Agenda 2063 and the UN’s Sustainable Development Goals. However, the paper found some significant lapses that require a complete overhaul and structuring on the integrations of its budget classifications with its chart of accounts. Thus, the paper concluded with a detailed examination of the challenges confronting the country’s current chart of accounts and recommendations for addressing them.

Keywords: Budget, budgetary transactions, budgetary governance, Chart of Accounts, classification, composition, coverage, Public Financial Management.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 514
7332 Comparative Analysis of Machine Learning Tools: A Review

Authors: S. Sarumathi, M. Vaishnavi, S. Geetha, P. Ranjetha

Abstract:

Machine learning is a new and exciting area of artificial intelligence nowadays. Machine learning is the most valuable, time, supervised, and cost-effective approach. It is not a narrow learning approach; it also includes a wide range of methods and techniques that can be applied to a wide range of complex realworld problems and time domains. Biological image classification, adaptive testing, computer vision, natural language processing, object detection, cancer detection, face recognition, handwriting recognition, speech recognition, and many other applications of machine learning are widely used in research, industry, and government. Every day, more data are generated, and conventional machine learning techniques are becoming obsolete as users move to distributed and real-time operations. By providing fundamental knowledge of machine learning tools and research opportunities in the field, the aim of this article is to serve as both a comprehensive overview and a guide. A diverse set of machine learning resources is demonstrated and contrasted with the key features in this survey.

Keywords: Artificial intelligence, machine learning, deep learning, machine learning algorithms, machine learning tools.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1846
7331 Process of Revitalization of the City Centers in Poland: The Problem of Cooperation between Sectors

Authors: Ewa M. Boryczka

Abstract:

Contemporary city is a subject to rapid economic and social changes. Therefore, it requires an active policy designed to meet the diverse needs of their residents, build competitive position and capacity to compete with other cities. Competitiveness of cities depends largely on their resources but also to a large extent, on the policies and performance of local authorities. Cooperation with social sector also plays an important role, as it affects the use of resources and builds an advantage over other cities. The subject of this article is city's contemporary problems of development with particular emphasis on central areas. This issue is a starting point for reflection on the process of urban regeneration in medium size cities in Poland, as well as cooperation between various actors and their roles in the revitalization processes of Polish cities' centers.

Keywords: City, cooperation between sectors, crisis of city centers, revitalization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1651
7330 Classification of Precipitation Types Detected in Malaysia

Authors: K. Badron, A. F. Ismail, A. L. Asnawi, N. F. A. Malik, S. Z. Abidin, S. Dzulkifly

Abstract:

The occurrences of precipitation, also commonly referred as rain, in the form of "convective" and "stratiform" have been identified to exist worldwide. In this study, the radar return echoes or known as reflectivity values acquired from radar scans have been exploited in the process of classifying the type of rain endured. The investigation use radar data from Malaysian Meteorology Department (MMD). It is possible to discriminate the types of rain experienced in tropical region by observing the vertical characteristics of the rain structure. .Heavy rain in tropical region profoundly affects radiowave signals, causing transmission interference and signal fading. Required wireless system fade margin depends on the type of rain. Information relating to the two mentioned types of rain is critical for the system engineers and researchers in their endeavour to improve the reliability of communication links. This paper highlights the quantification of percentage occurrences over one year period in 2009.

Keywords: Stratiform, convective, tropical region, attenuation radar reflectivity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1443
7329 Opening up Government Datasets for Big Data Analysis to Support Policy Decisions

Authors: K. Hardy, A. Maurushat

Abstract:

Policy makers are increasingly looking to make evidence-based decisions. Evidence-based decisions have historically used rigorous methodologies of empirical studies by research institutes, as well as less reliable immediate survey/polls often with limited sample sizes. As we move into the era of Big Data analytics, policy makers are looking to different methodologies to deliver reliable empirics in real-time. The question is not why did these people do this for the last 10 years, but why are these people doing this now, and if the this is undesirable, and how can we have an impact to promote change immediately. Big data analytics rely heavily on government data that has been released in to the public domain. The open data movement promises greater productivity and more efficient delivery of services; however, Australian government agencies remain reluctant to release their data to the general public. This paper considers the barriers to releasing government data as open data, and how these barriers might be overcome.

Keywords: Big data, open data, productivity, transparency.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1633
7328 Exploring the Spatial Characteristics of Mortality Map: A Statistical Area Perspective

Authors: Jung-Hong Hong, Jing-Cen Yang, Cai-Yu Ou

Abstract:

The analysis of geographic inequality heavily relies on the use of location-enabled statistical data and quantitative measures to present the spatial patterns of the selected phenomena and analyze their differences. To protect the privacy of individual instance and link to administrative units, point-based datasets are spatially aggregated to area-based statistical datasets, where only the overall status for the selected levels of spatial units is used for decision making. The partition of the spatial units thus has dominant influence on the outcomes of the analyzed results, well known as the Modifiable Areal Unit Problem (MAUP). A new spatial reference framework, the Taiwan Geographical Statistical Classification (TGSC), was recently introduced in Taiwan based on the spatial partition principles of homogeneous consideration of the number of population and households. Comparing to the outcomes of the traditional township units, TGSC provides additional levels of spatial units with finer granularity for presenting spatial phenomena and enables domain experts to select appropriate dissemination level for publishing statistical data. This paper compares the results of respectively using TGSC and township unit on the mortality data and examines the spatial characteristics of their outcomes. For the mortality data between the period of January 1st, 2008 and December 31st, 2010 of the Taitung County, the all-cause age-standardized death rate (ASDR) ranges from 571 to 1757 per 100,000 persons, whereas the 2nd dissemination area (TGSC) shows greater variation, ranged from 0 to 2222 per 100,000. The finer granularity of spatial units of TGSC clearly provides better outcomes for identifying and evaluating the geographic inequality and can be further analyzed with the statistical measures from other perspectives (e.g., population, area, environment.). The management and analysis of the statistical data referring to the TGSC in this research is strongly supported by the use of Geographic Information System (GIS) technology. An integrated workflow that consists of the tasks of the processing of death certificates, the geocoding of street address, the quality assurance of geocoded results, the automatic calculation of statistic measures, the standardized encoding of measures and the geo-visualization of statistical outcomes is developed. This paper also introduces a set of auxiliary measures from a geographic distribution perspective to further examine the hidden spatial characteristics of mortality data and justify the analyzed results. With the common statistical area framework like TGSC, the preliminary results demonstrate promising potential for developing a web-based statistical service that can effectively access domain statistical data and present the analyzed outcomes in meaningful ways to avoid wrong decision making.

Keywords: Mortality map, spatial patterns, statistical area, variation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 989
7327 Geosynthetic Reinforced Unpaved Road: Literature Study and Design Example

Authors: D. Jayalakshmi, S. Bhosale

Abstract:

This paper, in its first part, presents the state-of-the-art literature of design approaches for geosynthetic reinforced unpaved roads. The literature starting since 1970 and the critical appraisal of flexible pavement design by Giroud and Han (2004) and Jonathan Fannin (2006) is presented. The design example is illustrated for Indian conditions. The example emphasizes the results computed by Giroud and Han's (2004) design method with the Indian road congress guidelines by IRC SP 72 -2015. The input data considered are related to the subgrade soil condition of Maharashtra State in India. The unified soil classification of the subgrade soil is inorganic clay with high plasticity (CH), which is expansive with a California bearing ratio (CBR) of 2% to 3%. The example exhibits the unreinforced case and geotextile as reinforcement by varying the rut depth from 25 mm to 100 mm. The present result reveals the base thickness for the unreinforced case from the IRC design catalogs is in good agreement with Giroud and Han (2004) approach for a range of 75 mm to 100 mm rut depth. Since Giroud and Han (2004) method is applicable for both reinforced and unreinforced cases, for the same data with appropriate Nc factor, for the same rut depth, the base thickness for the reinforced case has arrived for the Indian condition. From this trial, for the CBR of 2%, the base thickness reduction due to geotextile inclusion is 35%. For the CBR range of 2% to 5% with different stiffness in geosynthetics, the reduction in base course thickness will be evaluated, and the validation will be executed by the full-scale accelerated pavement testing set up at the College of Engineering Pune (COE), India.

Keywords: Base thickness, design approach, equation, full scale accelerated pavement set up, Indian condition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 653
7326 Forthcoming Big Data on Smart Buildings and Cities: An Experimental Study on Correlations among Urban Data

Authors: Yu-Mi Song, Sung-Ah Kim, Dongyoun Shin

Abstract:

Cities are complex systems of diverse and inter-tangled activities. These activities and their complex interrelationships create diverse urban phenomena. And such urban phenomena have considerable influences on the lives of citizens. This research aimed to develop a method to reveal the causes and effects among diverse urban elements in order to enable better understanding of urban activities and, therefrom, to make better urban planning strategies. Specifically, this study was conducted to solve a data-recommendation problem found on a Korean public data homepage. First, a correlation analysis was conducted to find the correlations among random urban data. Then, based on the results of that correlation analysis, the weighted data network of each urban data was provided to people. It is expected that the weights of urban data thereby obtained will provide us with insights into cities and show us how diverse urban activities influence each other and induce feedback.

Keywords: Big data, correlation analysis, data recommendation system, urban data network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1104
7325 Markov Random Field-Based Segmentation Algorithm for Detection of Land Cover Changes Using Uninhabited Aerial Vehicle Synthetic Aperture Radar Polarimetric Images

Authors: Mehrnoosh Omati, Mahmod Reza Sahebi

Abstract:

The information on land use/land cover changing plays an essential role for environmental assessment, planning and management in regional development. Remotely sensed imagery is widely used for providing information in many change detection applications. Polarimetric Synthetic aperture radar (PolSAR) image, with the discrimination capability between different scattering mechanisms, is a powerful tool for environmental monitoring applications. This paper proposes a new boundary-based segmentation algorithm as a fundamental step for land cover change detection. In this method, first, two PolSAR images are segmented using integration of marker-controlled watershed algorithm and coupled Markov random field (MRF). Then, object-based classification is performed to determine changed/no changed image objects. Compared with pixel-based support vector machine (SVM) classifier, this novel segmentation algorithm significantly reduces the speckle effect in PolSAR images and improves the accuracy of binary classification in object-based level. The experimental results on Uninhabited Aerial Vehicle Synthetic Aperture Radar (UAVSAR) polarimetric images show a 3% and 6% improvement in overall accuracy and kappa coefficient, respectively. Also, the proposed method can correctly distinguish homogeneous image parcels.

Keywords: Coupled Markov random field, environment, object-based analysis, Polarimetric SAR images.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 862
7324 Determination of Water Pollution and Water Quality with Decision Trees

Authors: Çiğdem Bakır, Mecit Yüzkat

Abstract:

With the increasing emphasis on water quality worldwide, the search for and expanding the market for new and intelligent monitoring systems has increased. The current method is the laboratory process, where samples are taken from bodies of water, and tests are carried out in laboratories. This method is time-consuming, a waste of manpower and uneconomical. To solve this problem, we used machine learning methods to detect water pollution in our study. We created decision trees with the Orange3 software used in the study and tried to determine all the factors that cause water pollution. An automatic prediction model based on water quality was developed by taking many model inputs such as water temperature, pH, transparency, conductivity, dissolved oxygen, and ammonia nitrogen with machine learning methods. The proposed approach consists of three stages: Preprocessing of the data used, feature detection and classification. We tried to determine the success of our study with different accuracy metrics and the results were presented comparatively. In addition, we achieved approximately 98% success with the decision tree.

Keywords: Decision tree, water quality, water pollution, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 258
7323 Identification of Spam Keywords Using Hierarchical Category in C2C E-commerce

Authors: Shao Bo Cheng, Yong-Jin Han, Se Young Park, Seong-Bae Park

Abstract:

Consumer-to-Consumer (C2C) E-commerce has been growing at a very high speed in recent years. Since identical or nearly-same kinds of products compete one another by relying on keyword search in C2C E-commerce, some sellers describe their products with spam keywords that are popular but are not related to their products. Though such products get more chances to be retrieved and selected by consumers than those without spam keywords, the spam keywords mislead the consumers and waste their time. This problem has been reported in many commercial services like ebay and taobao, but there have been little research to solve this problem. As a solution to this problem, this paper proposes a method to classify whether keywords of a product are spam or not. The proposed method assumes that a keyword for a given product is more reliable if the keyword is observed commonly in specifications of products which are the same or the same kind as the given product. This is because that a hierarchical category of a product in general determined precisely by a seller of the product and so is the specification of the product. Since higher layers of the hierarchical category represent more general kinds of products, a reliable degree is differently determined according to the layers. Hence, reliable degrees from different layers of a hierarchical category become features for keywords and they are used together with features only from specifications for classification of the keywords. Support Vector Machines are adopted as a basic classifier using the features, since it is powerful, and widely used in many classification tasks. In the experiments, the proposed method is evaluated with a golden standard dataset from Yi-han-wang, a Chinese C2C E-commerce, and is compared with a baseline method that does not consider the hierarchical category. The experimental results show that the proposed method outperforms the baseline in F1-measure, which proves that spam keywords are effectively identified by a hierarchical category in C2C E-commerce.

Keywords: Spam Keyword, E-commerce, keyword features, spam filtering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2507
7322 Classifying Students for E-Learning in Information Technology Course Using ANN

Authors: S. Areerachakul, N. Ployong, S. Na Songkla

Abstract:

This research’s objective is to select the model with most accurate value by using Neural Network Technique as a way to filter potential students who enroll in IT course by Electronic learning at Suan Suanadha Rajabhat University. It is designed to help students selecting the appropriate courses by themselves. The result showed that the most accurate model was 100 Folds Cross-validation which had 73.58% points of accuracy.

Keywords: Artificial neural network, classification, students.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1496
7321 Comparison of Imputation Techniques for Efficient Prediction of Software Fault Proneness in Classes

Authors: Geeta Sikka, Arvinder Kaur Takkar, Moin Uddin

Abstract:

Missing data is a persistent problem in almost all areas of empirical research. The missing data must be treated very carefully, as data plays a fundamental role in every analysis. Improper treatment can distort the analysis or generate biased results. In this paper, we compare and contrast various imputation techniques on missing data sets and make an empirical evaluation of these methods so as to construct quality software models. Our empirical study is based on NASA-s two public dataset. KC4 and KC1. The actual data sets of 125 cases and 2107 cases respectively, without any missing values were considered. The data set is used to create Missing at Random (MAR) data Listwise Deletion(LD), Mean Substitution(MS), Interpolation, Regression with an error term and Expectation-Maximization (EM) approaches were used to compare the effects of the various techniques.

Keywords: Missing data, Imputation, Missing Data Techniques.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1666
7320 Effective Density for the Classification of Transport Activity Centers

Authors: Dubbale Daniel A., Tsutsumi J.

Abstract:

This research work takes a different approach in the discussion of urban form impacts on transport planning and auto dependency. Concentrated density represented by effective density explains auto dependency better than the conventional density and it is proved to be a realistic density representative for the urban transportation analysis. Model analysis reveals that effective density is influenced by the shopping accessibility index as well as job density factor. It is also combined with the job access variable to classify four levels of Transport Activity Centers (TACs) in Okinawa, Japan. Trip attraction capacity and levels of the newly classified TACs was found agreeable with the amount of daily trips attracted to each center. The trip attraction data set was drawn from a 2007 Okinawa personal trip survey. This research suggests a planning methodology which guides logical transport supply routes and concentrated local development schemes.

Keywords: Effective density, urban form, auto-dependency, transport activity centers

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1514
7319 IoT Device Cost Effective Storage Architecture and Real-Time Data Analysis/Data Privacy Framework

Authors: Femi Elegbeleye, Seani Rananga

Abstract:

This paper focused on cost effective storage architecture using fog and cloud data storage gateway, and presented the design of the framework for the data privacy model and data analytics framework on a real-time analysis when using machine learning method. The paper began with the system analysis, system architecture and its component design, as well as the overall system operations. Several results obtained from this study on data privacy models show that when two or more data privacy models are integrated via a fog storage gateway, we often have more secure data. Our main focus in the study is to design a framework for the data privacy model, data storage, and real-time analytics. This paper also shows the major system components and their framework specification. And lastly, the overall research system architecture was shown, including its structure, and its interrelationships.

Keywords: IoT, fog storage, cloud storage, data analysis, data privacy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 241
7318 Arterial Stiffness Detection Depending on Neural Network Classification of the Multi- Input Parameters

Authors: Firas Salih, Luban Hameed, Afaf Kamil, Armin Bolz

Abstract:

Diagnostic and detection of the arterial stiffness is very important; which gives indication of the associated increased risk of cardiovascular diseases. To make a cheap and easy method for general screening technique to avoid the future cardiovascular complexes , due to the rising of the arterial stiffness ; a proposed algorithm depending on photoplethysmogram to be used. The photoplethysmograph signals would be processed in MATLAB. The signal will be filtered, baseline wandering removed, peaks and valleys detected and normalization of the signals should be achieved .The area under the catacrotic phase of the photoplethysmogram pulse curve is calculated using trapezoidal algorithm ; then will used in cooperation with other parameters such as age, height, blood pressure in neural network for arterial stiffness detection. The Neural network were implemented with sensitivity of 80%, accuracy 85% and specificity of 90% were got from the patients data. It is concluded that neural network can detect the arterial STIFFNESS depending on risk factor parameters.

Keywords: Arterial stiffness, area under the catacrotic phase of the photoplethysmograph pulse, neural network

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1651
7317 Semantic Indexing Approach of a Corpora Based On Ontology

Authors: Mohammed Erritali

Abstract:

The growth in the volume of text data such as books and articles in libraries for centuries has imposed to establish effective mechanisms to locate them. Early techniques such as abstraction, indexing and the use of classification categories have marked the birth of a new field of research called "Information Retrieval". Information Retrieval (IR) can be defined as the task of defining models and systems whose purpose is to facilitate access to a set of documents in electronic form (corpus) to allow a user to find the relevant ones for him, that is to say, the contents which matches with the information needs of the user. This paper presents a new semantic indexing approach of a documentary corpus. The indexing process starts first by a term weighting phase to determine the importance of these terms in the documents. Then the use of a thesaurus like Wordnet allows moving to the conceptual level. Each candidate concept is evaluated by determining its level of representation of the document, that is to say, the importance of the concept in relation to other concepts of the document. Finally, the semantic index is constructed by attaching to each concept of the ontology, the documents of the corpus in which these concepts are found.

Keywords: Semantic, indexing, corpora, WordNet, ontology.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1367
7316 Towards an Integrated Proposal for Performance Measurement Indicators (Financial and Operational) in Advanced Production Practices

Authors: José A. D. Machuca, Bernabé Escobar-Pérez, Pedro Garrido Vega, Darkys E. Lujan García

Abstract:

Starting with an analysis of the financial and operational indicators that can be found in the specialised literature, this study aims to contribute to improvements in the performance measurement systems used when the unit of analysis is the manufacturing plant. For this a search was done in the highest impact Journals of Production and Operations Management and Management Accounting , with the aim of determining the financial and operational indicators used to evaluate performance when Advanced Production Practices have been implemented, more specifically when the practices implemented are Total Quality Management, JIT/Lean Manufacturing and Total Productive Maintenance. This has enabled us to obtain a classification of the two types of indicators based on how much each is used. For the financial indicators we have also prepared a proposal that can be adapted to manufacturing plants- accounting features. In the near future we will propose a model that links practices implementation with financial and operational indicators and these two last with each other. We aim to will test this model empirically with the data obtained in the High Performance Manufacturing Project.

Keywords: Advanced Production Practices, Financial Indicators, Non-Financial Indicators

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1505
7315 Mining Image Features in an Automatic Two-Dimensional Shape Recognition System

Authors: R. A. Salam, M.A. Rodrigues

Abstract:

The number of features required to represent an image can be very huge. Using all available features to recognize objects can suffer from curse dimensionality. Feature selection and extraction is the pre-processing step of image mining. Main issues in analyzing images is the effective identification of features and another one is extracting them. The mining problem that has been focused is the grouping of features for different shapes. Experiments have been conducted by using shape outline as the features. Shape outline readings are put through normalization and dimensionality reduction process using an eigenvector based method to produce a new set of readings. After this pre-processing step data will be grouped through their shapes. Through statistical analysis, these readings together with peak measures a robust classification and recognition process is achieved. Tests showed that the suggested methods are able to automatically recognize objects through their shapes. Finally, experiments also demonstrate the system invariance to rotation, translation, scale, reflection and to a small degree of distortion.

Keywords: Image mining, feature selection, shape recognition, peak measures.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1457
7314 Quantification of E-Waste: A Case Study in Federal University of Espírito Santo, Brazil

Authors: Andressa S. T. Gomes, Luiza A. Souza, Luciana H. Yamane, Renato R. Siman

Abstract:

The segregation of waste of electrical and electronic equipment (WEEE) in the generating source, its characterization (quali-quantitative) and identification of origin, besides being integral parts of classification reports, are crucial steps to the success of its integrated management. The aim of this paper was to count WEEE generation at the Federal University of Espírito Santo (UFES), Brazil, as well as to define sources, temporary storage sites, main transportations routes and destinations, the most generated WEEE and its recycling potential. Quantification of WEEE generated at the University in the years between 2010 and 2015 was performed using data analysis provided by UFES’s sector of assets management. EEE and WEEE flow in the campuses information were obtained through questionnaires applied to the University workers. It was recorded 6028 WEEEs units of data processing equipment disposed by the university between 2010 and 2015. Among these waste, the most generated were CRT screens, desktops, keyboards and printers. Furthermore, it was observed that these WEEEs are temporarily stored in inappropriate places at the University campuses. In general, these WEEE units are donated to NGOs of the city, or sold through auctions (2010 and 2013). As for recycling potential, from the primary processing and further sale of printed circuit boards (PCB) from the computers, the amount collected could reach U$ 27,839.23. The results highlight the importance of a WEEE management policy at the University.

Keywords: Solid waste, waste of electric and electronic equipment, waste management, institutional generation of solid waste.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1565