Search results for: training data condensation.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7973

6983 Machine Learning Facing Behavioral Noise Problem in an Imbalanced Data Using One Side Behavioral Noise Reduction: Application to a Fraud Detection

Authors: Salma El Hajjami, Jamal Malki, Alain Bouju, Mohammed Berrada

Abstract:

With the expansion of machine learning and data mining in the context of Big Data analytics, a common problem affecting data is class imbalance, that is, an unequal distribution of instances across classes. This problem is present in many real-world applications such as fraud detection, network intrusion detection, and medical diagnostics. In these cases, negatively labeled instances are significantly more numerous than positively labeled ones. When this difference is too large, a learning system may struggle, since such systems are typically designed for relatively balanced class distributions. Another important problem that usually accompanies imbalanced data is overlap between instances of the two classes, commonly referred to as noise or overlapping data. In this article, we propose an approach called One Side Behavioral Noise Reduction (OSBNR), which deals with class imbalance in the presence of a high noise level. OSBNR consists of two steps. First, cluster analysis is applied to group similar minority-class instances into several behavior clusters. Second, majority-class instances that overlap with the minority-class behavior clusters are considered behavioral noise and are eliminated. Experiments carried out on a representative public dataset confirm that the proposed approach is efficient for handling class imbalance in the presence of noise.
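
The two OSBNR steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: plain k-means stands in for the unspecified cluster analysis, and a majority instance is treated as overlapping a behavior cluster when it lies within the distance from that cluster's centroid to its farthest minority member. The function names are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means (Lloyd's algorithm); returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


def one_side_noise_reduction(minority, majority, k):
    """Step 1: cluster the minority class; step 2: drop majority points inside any cluster.

    Assumption: a cluster's region is the ball around its centroid whose radius is
    the distance to the cluster's farthest minority member.
    """
    centroids, labels = kmeans(minority, k)
    radii = []
    for c, centre in enumerate(centroids):
        members = [p for p, label in zip(minority, labels) if label == c]
        radii.append(max((math.dist(p, centre) for p in members), default=0.0))
    # Majority instances outside every behavior cluster are kept; the rest are "noise".
    return [q for q in majority
            if all(math.dist(q, centre) > r for centre, r in zip(centroids, radii))]
```

Only majority instances are removed; the minority class is left untouched, which is what makes the reduction one-sided.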

Keywords: Machine learning, Imbalanced data, Data mining, Big data.

PDF Downloads: 1101
6982 Content Based Sampling over Transactional Data Streams

Authors: Mansour Tarafdar, Mohammad Saniee Abade

Abstract:

This paper investigates the problem of sampling from transactional data streams. We introduce CFISDS, a content-based sampling algorithm that works on a landmark window model of data streams and preserves a more informative sample in the sample space. The algorithm is based on closed frequent itemset mining: it first initializes a concept lattice from the initial data, then updates the lattice structure using an incremental mechanism that inserts, updates, and deletes lattice nodes in batches. The algorithm extracts the final samples on user demand. Experimental results show the accuracy of CFISDS on synthetic and real datasets, although CFISDS is not faster than existing sampling algorithms such as Z and DSS.
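
Closed frequent itemset mining, on which CFISDS is based, can be illustrated for a single batch of transactions with the brute-force sketch below. It assumes transactions are Python sets and an absolute support threshold; it does not reproduce CFISDS's incremental lattice maintenance or its sampling step, and the function name is hypothetical.

```python
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all closed frequent itemsets (brute force).

    An itemset is closed if no proper superset has the same support.
    """
    items = sorted(set().union(*transactions))
    support = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            count = sum(1 for t in transactions if set(combo) <= t)
            if count >= min_support:
                support[frozenset(combo)] = count
                found = True
        if not found:
            break  # anti-monotonicity: no larger itemset can be frequent either
    # A superset with equal support is itself frequent, so checking frequent sets suffices.
    return {s: n for s, n in support.items()
            if not any(s < other and n == m for other, m in support.items())}
```

With transactions `[{"a","b","c"}, {"a","b"}, {"a","c"}, {"a"}]` and a minimum support of 2, `{b}` is dropped in favour of `{a, b}` because both occur in exactly the same transactions.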

Keywords: Sampling, data streams, closed frequent item set mining.

PDF Downloads: 1688
6981 An Automatic Tool for Checking Consistency between Data Flow Diagrams (DFDs)

Authors: Rosziati Ibrahim, Siow Yen Yen

Abstract:

The system development life cycle (SDLC) is a process used during the development of any system. The SDLC consists of four main phases: analysis, design, implementation, and testing. During the analysis phase, a context diagram and data flow diagrams are used to produce the process model of a system. Consistency between the context diagram and lower-level data flow diagrams is very important for a smooth development process. However, manually checking this consistency with a checklist is time-consuming. At the same time, the limits of human ability to detect errors are one of the factors that affect the correctness and balancing of the diagrams. This paper presents a tool that automates the consistency check between Data Flow Diagrams (DFDs) based on the rules of DFDs. The tool serves two purposes: as an editor for drawing the diagrams and as a checker for verifying the correctness of the drawn diagrams. The consistency check from the context diagram to lower-level data flow diagrams is built into the tool to overcome the problems of manual checking.
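
The tool's rules are not listed in the abstract, but one standard DFD consistency rule is balancing: every data flow between an external entity and the single process of the context diagram must reappear, with the same entity, direction, and name, in the lower-level diagram. A minimal sketch of that check, assuming diagrams are given as lists of (source, target, name) flows and the set of external entities is known (all names are hypothetical, not the tool's API):

```python
from typing import Iterable, NamedTuple


class Flow(NamedTuple):
    source: str
    target: str
    name: str


def boundary_flows(flows: Iterable[Flow], entities: set) -> set:
    """Flows crossing the system boundary, keyed by (direction, entity, flow name)."""
    result = set()
    for f in flows:
        if f.source in entities and f.target not in entities:
            result.add(("in", f.source, f.name))
        elif f.target in entities and f.source not in entities:
            result.add(("out", f.target, f.name))
        # Flows between two processes (or two entities) are internal and ignored.
    return result


def check_balancing(context_flows, level0_flows, entities):
    """Compare the boundary flows of the context diagram and the level-0 DFD."""
    parent = boundary_flows(context_flows, entities)
    child = boundary_flows(level0_flows, entities)
    return {"missing_in_child": parent - child, "extra_in_child": child - parent}
```

The diagrams are balanced when both returned sets are empty; any entry points at a flow that was added, dropped, or renamed during decomposition.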

Keywords: Data Flow Diagram, Context Diagram, Consistency Check, Syntax and Semantic Rules

PDF Downloads: 3417
6980 Understanding Innovation by Analyzing the Pillars of the Global Competitiveness Index

Authors: Ujjwala Bhand, Mridula Goel

Abstract:

The Global Competitiveness Index (GCI), prepared by the World Economic Forum, has become a benchmark for studying the competitiveness of countries and for understanding the factors that enable competitiveness. Innovation is a key pillar of competitiveness and has the unique property of enabling exponential economic growth. This paper analyzes how the pillars comprising the Global Competitiveness Index affect innovation and whether GDP growth can directly affect a country’s innovation outcomes. The key objective of the study is to identify areas on which governments of developing countries can focus policies and programs to improve their country’s innovativeness. We compiled a panel data set for top innovating countries and the large emerging economies known as BRICS from 2007-08 to 2014-15 in order to find the significant factors that affect innovation. The results of the regression analysis suggest that governments should make policies to improve labor market efficiency, establish sophisticated business networks, provide basic health and primary education to their people, and strengthen the quality of higher education and training services in the economy. The achievements of smaller economies in innovation suggest that concerted government efforts can counter any size-related disadvantage and can in fact provide greater flexibility and speed in encouraging innovation.

Keywords: Innovation, Global Competitiveness Index, BRICS, economic growth.

PDF Downloads: 1025
6979 Surrogate based Evolutionary Algorithm for Design Optimization

Authors: Maumita Bhattacharya

Abstract:

Optimization is often a critical issue in most system design problems. Evolutionary Algorithms are population-based, stochastic search techniques widely used as efficient global optimizers. However, finding optimal solutions to complex, high-dimensional, multimodal problems often requires computationally expensive function evaluations, which can make the search practically prohibitive. The Dynamic Approximate Fitness based Hybrid EA (DAFHEA) model presented in our earlier work [14] reduced computation time through the controlled use of meta-models that partially replace actual function evaluations with approximate ones. However, the underlying assumption in DAFHEA is that the training samples for the meta-model are generated from a single uniform model. Situations such as model formation involving variable input dimensions and noisy data cannot be covered by this assumption. In this paper, we present an enhanced version of DAFHEA that incorporates a multiple-model based learning approach for the SVM approximator. DAFHEA-II (the enhanced version of the DAFHEA framework) also overcomes the high computational expense of the additional clustering required by the original DAFHEA framework. The proposed framework has been tested on several benchmark functions, and the empirical results illustrate the advantages of the proposed technique.
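
The general idea of controlled meta-model use can be sketched as an evolutionary loop in which a cheap approximator pre-screens offspring and only the most promising few receive a true, expensive evaluation. This is a generic illustration, not DAFHEA or DAFHEA-II: the paper uses an SVM approximator, which is swapped here for a k-nearest-neighbour average over the archive of truly evaluated points to keep the example dependency-free, and the sphere function stands in for an expensive objective. All parameter values are assumptions.

```python
import math
import random


def sphere(x):
    """Stand-in for an expensive objective (to be minimised)."""
    return sum(v * v for v in x)


def knn_surrogate(archive, x, k=3):
    """Approximate fitness: mean true fitness of the k archived points nearest to x."""
    nearest = sorted(archive, key=lambda item: math.dist(item[0], tuple(x)))[:k]
    return sum(f for _, f in nearest) / len(nearest)


def surrogate_ea(f, dim, pop_size=10, generations=30, true_evals_per_gen=3,
                 sigma=0.3, seed=1):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    archive = [(tuple(x), f(x)) for x in pop]  # every true evaluation is kept
    n_true = len(archive)
    for _ in range(generations):
        # Gaussian mutation of randomly chosen parents.
        children = [[v + rng.gauss(0, sigma) for v in rng.choice(pop)]
                    for _ in range(pop_size)]
        # Pre-screen with the cheap surrogate; only the best few get a true evaluation.
        children.sort(key=lambda c: knn_surrogate(archive, c))
        for c in children[:true_evals_per_gen]:
            archive.append((tuple(c), f(c)))
            n_true += 1
        # Elitist survivor selection among truly evaluated points only.
        archive.sort(key=lambda item: item[1])
        pop = [list(x) for x, _ in archive[:pop_size]]
    best_x, best_f = archive[0]
    return best_x, best_f, n_true
```

With the defaults, the loop performs 10 + 30 × 3 = 100 true evaluations instead of the 310 a surrogate-free version with the same population and generation counts would need; the saving is bought at the risk of the surrogate misranking children.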

Keywords: Evolutionary algorithm, Fitness function, Optimization, Meta-model, Stochastic method.

PDF Downloads: 1557
6978 Real-Time Implementation of STANAG 4539 High-Speed HF Modem

Authors: S. Saraç, F. Kara, C. Vural

Abstract:

High-frequency (HF) communications have been used by military organizations for more than 90 years. The possibility of very long-range communication without the need for advanced equipment makes HF a convenient and inexpensive alternative to satellite communications. Despite these advantages, voice and data transmission over HF is challenging, because the HF channel generally suffers from Doppler shift and spread, multipath, co-channel interference, and many other sources of noise. All of these effects must be taken into account when constructing an HF data modem. STANAG 4539 is a NATO standard for high-speed data transmission over HF. It allows data rates of up to 12800 bps over a 3 kHz HF channel. In this work, an efficient implementation of STANAG 4539 on a single Texas Instruments TMS320C6747 DSP chip is described. The state-of-the-art algorithms used in the receiver and the efficiency of the implementation enable real-time, high-speed data and digitized voice transmission over poor HF channels.

Keywords: High frequency, modem, STANAG 4539.

PDF Downloads: 5309
6977 Detection Efficient Enterprises via Data Envelopment Analysis

Authors: S. Turkan

Abstract:

In this paper, data on Turkey’s Top 500 Industrial Enterprises for 2014 are analyzed using data envelopment analysis. Data envelopment analysis is used to detect efficient decision-making units, such as universities, hospitals, and schools, from their inputs and outputs. The decision-making units in this study are enterprises. To detect efficient enterprises, financial ratios related to the productivity of enterprises are selected as inputs and outputs. Efficient enterprises with predominantly foreign-owned capital are detected via a super-efficiency model. According to the results, Mercedes-Benz is the most efficient enterprise with predominantly foreign-owned capital in Turkey.
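
For intuition, the single-input, single-output case of DEA has a closed form. The sketch below assumes constant returns to scale (the CCR model), where the linear program reduces to each unit's output/input ratio divided by the best ratio in the sample; the study itself uses the BCC model with several financial ratios, which needs a full linear program. Super-efficiency (Andersen-Petersen) removes the evaluated unit from the reference set, so efficient units can score above 1 and be ranked. Unit names and figures are hypothetical.

```python
def ratios(units):
    """Output/input ratio for each unit; units maps name -> (input, output)."""
    return {name: y / x for name, (x, y) in units.items()}


def ccr_efficiency(units):
    """CCR efficiency with one input and one output: own ratio / best ratio."""
    r = ratios(units)
    best = max(r.values())
    return {name: v / best for name, v in r.items()}


def super_efficiency(units):
    """Andersen-Petersen super-efficiency: benchmark each unit only against the others.

    Requires at least two units.
    """
    r = ratios(units)
    return {name: v / max(w for other, w in r.items() if other != name)
            for name, v in r.items()}
```

With units `{"A": (2, 4), "B": (4, 4), "C": (1, 3)}`, unit C is CCR-efficient (score 1.0), and its super-efficiency of 1.5 says it could use 50% more input and still be as productive as the best of A and B.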

Keywords: Data envelopment analysis, super efficiency, financial ratios, BCC model.

PDF Downloads: 859
6976 Fusion of ETM+ Multispectral and Panchromatic Texture for Remote Sensing Classification

Authors: Mahesh Pal

Abstract:

This paper proposes using ETM+ multispectral data and the panchromatic band, as well as texture features derived from the panchromatic band, for land cover classification. Four texture features, one 'internal texture' and three GLCM-based textures (correlation, entropy, and inverse difference moment), were used in combination with ETM+ multispectral data. Two data sets combining multispectral data, the panchromatic band, and its texture were used, and the results were compared with those obtained using multispectral data alone. A decision tree classifier, with and without boosting, was used to classify the different datasets. Results from this study suggest that the dataset consisting of the panchromatic band, its four texture features, and the multispectral data increased classification accuracy by about 2%. In comparison, a boosted decision tree increased classification accuracy by about 3% with the same dataset.

Keywords: Internal texture; GLCM; decision tree; boosting; classification accuracy.

PDF Downloads: 1716
6975 A Formal Approach for Instructional Design Integrated with Data Visualization for Learning Analytics

Authors: Douglas A. Menezes, Isabel D. Nunes, Ulrich Schiel

Abstract:

Most Virtual Learning Environments do not provide mechanisms supporting the integrated planning, construction, and follow-up of an Instructional Design informed by Learning Analytics results. This work presents an authoring tool responsible for constructing the structure of an Instructional Design (ID) without the data being altered during the execution of the course. The visual interface highlights the critical situations in this ID, serving as a support tool for course follow-up and for possible improvements, which can be made during the course's execution or when planning a new edition of the course. The ID model is based on High-Level Petri Nets, and the visualization forms are determined by the specific kind of data generated by an e-course: a population of students generating sequentially dependent data.
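
High-level Petri nets extend ordinary place/transition nets with typed tokens and guards. The firing rule they share can be sketched as follows: a transition is enabled when each input place holds at least as many tokens as its arc weight, and firing moves tokens from inputs to outputs. The places and transitions used with it (students moving from enrolled to submitted to graded) are hypothetical and are not the authors' ID model.

```python
from collections import Counter


class PetriNet:
    """Minimal place/transition net; a marking is a Counter of tokens per place."""

    def __init__(self, transitions):
        # transitions: name -> (input arc weights, output arc weights), both Counters
        self.transitions = transitions

    def enabled(self, marking, t):
        inputs, _ = self.transitions[t]
        return all(marking[place] >= w for place, w in inputs.items())

    def fire(self, marking, t):
        """Return the new marking after firing t (the old marking is not modified)."""
        if not self.enabled(marking, t):
            raise ValueError(f"transition {t!r} is not enabled")
        inputs, outputs = self.transitions[t]
        new = Counter(marking)
        new.subtract(inputs)   # consume tokens from input places
        new.update(outputs)    # produce tokens in output places
        return +new            # unary + drops places left with zero tokens
```

For example, with `submit` taking one token from `enrolled` to `submitted` and `grade` taking one from `submitted` to `graded`, firing `submit` on `Counter(enrolled=3)` yields `Counter(enrolled=2, submitted=1)`, after which `grade` becomes enabled.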

Keywords: Educational data visualization, high-level petri nets, instructional design, learning analytics.

PDF Downloads: 828
6974 Visual Text Analytics Technologies for Real-Time Big Data: Chronological Evolution and Issues

Authors: Siti Azrina B. A. Aziz, Siti Hafizah A. Hamid

Abstract:

New approaches to analyzing and visualizing data streams in real time are important for enabling decision makers to make prompt decisions. Financial market trading and surveillance, large-scale emergency response, and crowd control are some example scenarios that require real-time analytics and data visualization. This situation has led to the development of techniques and tools that support humans in analyzing the source data. With the emergence of Big Data and social media, new techniques and tools are required to process streaming data. Today, a range of tools implementing some of these functionalities is available. In this paper, we present a chronological evaluation of technologies supporting real-time analytics and visualization of data streams. Based on research papers published from 2002 to 2014, we gathered general information, main techniques, challenges, and open issues. Techniques for streaming text visualization are identified in chronological order based on the Text Visualization Browser. This paper aims to review the evolution of streaming text visualization techniques and tools, and to discuss the problems and challenges of each identified tool.

Keywords: Information visualization, visual analytics, text mining, visual text analytics tools, big data visualization.

PDF Downloads: 977
6973 Churn Prediction for Telecommunication Industry Using Artificial Neural Networks

Authors: Ulas Vural, M. Ergun Okay, E. Mesut Yildiz

Abstract:

Telecommunication service providers demand accurate and precise prediction of customer churn probabilities to increase the effectiveness of their customer relationship services. The large amount of customer data owned by service providers is suitable for analysis by machine learning methods. In this study, customer expenditure data are analyzed using an artificial neural network (ANN). The ANN model is applied to data from customers with different billing durations. The proposed model successfully predicts churn probabilities with 83% accuracy using only three months of expenditure data, and the prediction accuracy increases to 89% when nine months of data are used. The experiments also show that the accuracy of the ANN model increases with an extended feature set that includes information on changes in bill amounts.
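
The abstract does not detail the feature encoding, so the following is only a hypothetical sketch of how a fixed billing window could be extended with month-to-month changes in bill amounts before being fed to an ANN; the function name and window handling are assumptions.

```python
def churn_features(monthly_bills, months, with_changes=True):
    """Hypothetical feature vector: last `months` bill amounts (+ month-to-month changes)."""
    if len(monthly_bills) < months:
        raise ValueError("not enough billing history for the requested window")
    window = list(monthly_bills[-months:])
    features = list(window)
    if with_changes:
        # Extended feature set: change in bill amount between consecutive months.
        features += [later - earlier for earlier, later in zip(window, window[1:])]
    return features
```

For the three- and nine-month settings reported in the study, `months` would be 3 or 9; for bills `[100, 120, 90, 95]` and a three-month window, the extended vector is `[120, 90, 95, -30, 5]`.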

Keywords: Customer relationship management, churn prediction, telecom industry, deep learning, Artificial Neural Networks, ANN.

PDF Downloads: 726
6972 A Technical Perspective on Roadway Safety in Eastern Province: Data Evaluation and Spatial Analysis

Authors: Muhammad Farhan, Sayed Faruque, Amr Mohammed, Sami Osman, Omar Al-Jabari, Abdul Almojil

Abstract:

Saudi Arabia has seen a drastic increase in traffic-related crashes in recent years. With a population of over 29 million, Saudi Arabia is considered a fast-growing, emerging economy. Rapid population increase and economic growth have resulted in rapid expansion of transportation infrastructure, which has led to an increase in road crashes. The Saudi Ministry of Interior reported more than 7,000 people killed and 68,000 injured in 2011, ranking Saudi Arabia among the worst countries worldwide in traffic safety. Traffic safety issues in the country also cause distress to road users and an economic loss exceeding 3.7 billion Euros annually. With this in view, researchers in Saudi Arabia are investigating ways to improve traffic safety conditions in the country. This paper presents a multilevel approach to collecting the traffic safety data required for traffic safety studies in the region. Two highway corridors connecting the cities of Dammam and Khobar, the 39-kilometre King Fahd Highway and the 42-kilometre Gulf Cooperation Council Highway, were selected as the study area. The traffic data collected included traffic counts, crash data, travel time data, and speed data. The collected data were analysed using a geographic information system (GIS) to evaluate any correlation. Further research is needed to investigate the effectiveness of traffic safety data collected in a concerted effort.

Keywords: Crash Data, Data Collection, Traffic Safety.

PDF Downloads: 2330
6971 Machine Scoring Model Using Data Mining Techniques

Authors: Wimalin S. Laosiritaworn, Pongsak Holimchayachotikul

Abstract:

This article proposes a methodology for computer numerical control (CNC) machine scoring. The case study company is a manufacturer of hard disk drive parts in Thailand. In this company, samples of parts manufactured by CNC machines are usually taken at random for quality inspection. These inspection data are used to decide whether to shut down a machine that tends to produce parts that are out of specification. A large amount of data is produced in this process, and data mining can be a very useful technique for analyzing it. In this research, data mining techniques were used to construct a machine scoring model called the 'machine priority assessment model (MPAM)'. This model helps to ensure that machines with a higher risk of producing defective parts are inspected before those with a lower risk. If defect-prone machines are identified sooner, defective parts and rework can be reduced, improving overall productivity. The results show that the proposed method can be successfully implemented and that approximately 351,000 baht of opportunity cost could have been saved in the case study company.

Keywords: Computer Numerical Control, Data Mining, Hard Disk Drive.

PDF Downloads: 1376
6970 School-Based Intervention for Academic Achievement: Targeting Cognitive, Motivational and Affective Factors

Authors: Joan Antony

Abstract:

The outcome of any learning process should target three goals: propelling the underachiever’s engagement in the learning process, enhancing the drive to achieve, and modifying attitudes and beliefs about his/her capabilities. An intervention study with a three-pronged approach, incorporating self-regulatory training that targets three categories of strategies (cognitive, metacognitive, and motivational), was designed using a before-and-after control-experimental group design. The training process was evaluated through pre- and post-intervention measures obtained from three indices of measurement: academic scores based on grades in school examinations and comprehension tests; affective variable scores and levels of strategy use obtained from responses to scales and questionnaires; and content analysis of subjective responses to open-ended probes. The evaluation relied on three sources: students, teachers, and parents. The t-test results for the experimental and control groups on the pre- and post-intervention measurements indicate a significant increase in comprehension task performance for the experimental group. Though no statistically significant difference was found in school examination scores for the experimental group, there was a considerable decline in performance for the control group. Analysis of covariance (ANCOVA) was applied to the scores obtained on affective variables, namely self-esteem, personal achievement goals, personal ego goals, personal task goals, and locus of control. The experimental group showed an increase in personal achievement goals and personal ego goals compared with the control group. Responses given by the experimental group to the open-ended probes on causal attributions indicated a considerable shift from external to internal causes from the pre- to the post-intervention stage.
ANCOVA results revealed significantly higher use of learning strategies, including mental learning strategies, behavioral learning strategies, and self-regulatory strategies, as well as an improvement in study orientation encompassing study habits and study attitudes, among the experimental group students. Parents and teachers reported a significant, progressive transformation towards constructive engagement with study material and self-imposed regulation. The implications of this study are three-fold: first, strategy training (cognitive, metacognitive, and motivational) should be embedded into the daily classroom routine; second, scaffolding by teachers through curriculum-based activities will eventually enable students to rely more on their own judgements of effective strategy use; third, enhanced confidence will radiate to the affective aspects, with enduring effects on other domains of life as well. The cyclic interaction between utilizing one’s resources, managing effort, and regulating emotions forms the foundation for academic achievement.

Keywords: Academic achievement, cognitive strategies, metacognitive strategies, motivational strategies.

PDF Downloads: 433
6969 The Impact of Seasonality on Rainfall Patterns: A Case Study

Authors: Priti Kaushik, Randhir Singh Baghel, Somil Khandelwal

Abstract:

This study uses whole-year data from Rajasthan, India, at the meteorological divisional level to analyze and evaluate long-term spatiotemporal trends in rainfall, and examines the data from each of the thirteen tehsils in the Jaipur district to see how the rainfall pattern has changed over the last 10 years. Daily rainfall data from the Indian Meteorological Department (IMD) in Jaipur are available for the years 2012 through 2021. We mainly focus on tehsil-wise comparisons of the data in the Jaipur district, Rajasthan, India. The analysis also shows that July and August always receive more rainfall than any other month. Rainfall usually starts to rise around week 25 and peaks in week 32 or 33. The results show that, on several occasions, 2017 saw the least rainfall in the 10-year span. Between 2012 and 2021, the greatest rainfall occurred in 2013, 2019, and 2020.

Keywords: Data analysis, extreme events, rainfall, descriptive case studies, precipitation temperature.

PDF Downloads: 145
6968 Enhance the Power of Sentiment Analysis

Authors: Yu Zhang, Pedro Desouza

Abstract:

Since big data has become substantially more accessible and manageable thanks to powerful tools for handling unstructured data, people are eager to mine information from social media resources that could not be handled in the past. Sentiment analysis, a novel branch of text mining, has in the last decade become increasingly important in marketing analysis, customer risk prediction, and other fields. Scientists and researchers have undertaken significant work in creating and improving sentiment models. In this paper, we present an approach to selecting appropriate classifiers based on the features and qualities of data sources, comparing the performance of five classifiers on three popular social media data sources: Twitter, Amazon Customer Reviews, and Movie Reviews. We introduce a couple of innovative models that outperform traditional sentiment classifiers on these data sources and provide insights on how to further improve the predictive power of sentiment analysis. The modeling and testing work was done using R and the Greenplum in-database analytic tools.

Keywords: Sentiment Analysis, Social Media, Twitter, Amazon, Data Mining, Machine Learning, Text Mining.

PDF Downloads: 3492
6967 Lineup Optimization Model of Basketball Players Based on the Prediction of Recursive Neural Networks

Authors: Wang Yichen, Haruka Yamashita

Abstract:

In recent years, data-driven decision making in sports, such as selecting the players for a game and choosing game strategy based on analysis of accumulated sports data, has been widely attempted. In the NBA, where the world's top players gather, teams analyze data using various statistical techniques in order to win games. However, it is difficult to analyze game data for each play, such as ball tracking or player motion, because the game situation changes rapidly and the data structure is complicated. Therefore, an analysis method for real-time game play data is needed. In this research, we propose an analytical model for "determining the optimal lineup composition" using real-time play data, a task considered difficult for all coaches. Because replacing the entire lineup is too complicated, and the practical questions in replacing players are "whether or not the lineup should be changed" and "whether or not a Small Ball lineup should be adopted", we propose an analytical model for the optimal player selection problem based on Small Ball lineups. In basketball, scoring data can be accumulated for each play, indicating a player's contribution to the game, and these scoring data can be treated as time series data. To compare the importance of players in different situations and lineups, we combine an RNN (Recurrent Neural Network) model, which can analyze time series data, with an NN (Neural Network) model, which can analyze the situation on the court, to build a score prediction model. This model is capable of identifying the current optimal lineup for different situations. In this research, we collected all accumulated NBA data from 2019-2020, and we apply the method to actual basketball play data to verify the reliability of the proposed model.

Keywords: Recurrent Neural Network, players lineup, basketball data, decision making model.

PDF Downloads: 780
6966 New Multisensor Data Fusion Method Based on Probabilistic Grids Representation

Authors: Zhichao Zhao, Yi Liu, Shunping Xiao

Abstract:

A new data fusion method called the joint probability density matrix (JPDM) is proposed, which can associate and fuse measurements from spatially distributed heterogeneous sensors to identify the real target in a surveillance region. Using a probabilistic grid representation, we numerically combine the uncertainty regions of all the measurements in a general framework. The NP-hard multisensor data fusion problem is thereby converted into a peak-picking problem on the grid map. Unlike most existing data fusion methods, the JPDM method does not need association processing and does not lead to combinatorial explosion. Its convergence to the CRLB as the grid size diminishes has been proved. Simulation results are presented to illustrate the effectiveness of the proposed technique.
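
The grid-based idea can be sketched for a simplified case. Assumptions, not taken from the paper: a single target, range-only sensors with independent Gaussian range noise, and a unit grid; the per-cell likelihoods of all measurements are multiplied, normalized into a joint density over the grid, and the peak cell is taken as the target estimate. The paper's heterogeneous sensor models and uncertainty regions are not reproduced, and the function names are hypothetical.

```python
import math


def range_likelihood(sensor, measured_range, sigma, point):
    """Gaussian likelihood of one range measurement if the target were at `point`."""
    predicted = math.dist(sensor, point)
    return math.exp(-0.5 * ((measured_range - predicted) / sigma) ** 2)


def fuse_on_grid(measurements, width, height, step=1.0):
    """Build a normalised joint density over grid cells and return (peak, grid).

    measurements: list of (sensor_xy, measured_range, sigma) tuples.
    """
    xs = [i * step for i in range(int(width / step) + 1)]
    ys = [j * step for j in range(int(height / step) + 1)]
    grid = {}
    for x in xs:
        for y in ys:
            score = 1.0
            for sensor, r, sigma in measurements:  # product = independent sensors
                score *= range_likelihood(sensor, r, sigma, (x, y))
            grid[(x, y)] = score
    total = sum(grid.values())
    grid = {cell: v / total for cell, v in grid.items()}
    peak = max(grid, key=grid.get)  # peak picking on the fused grid
    return peak, grid
```

Shrinking `step` refines the estimate at the cost of more cells, which mirrors the abstract's point about convergence as the grid size diminishes.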

Keywords: Cramer-Rao lower bound (CRLB), data fusion, probabilistic grids, joint probability density matrix, localization, sensor network.

PDF Downloads: 1779
6965 Sampled-Data Model Predictive Tracking Control for Mobile Robot

Authors: Wookyong Kwon, Sangmoon Lee

Abstract:

In this paper, a sampled-data model predictive tracking control method is presented for mobile robots modeled as constrained continuous-time linear parameter varying (LPV) systems. The sampled-data predictive controller is designed using a linear matrix inequality (LMI) approach. Based on the input delay approach, a controller design condition is derived by constructing a new Lyapunov function. Finally, a numerical example is given to demonstrate the effectiveness of the presented method.

Keywords: Model predictive control, sampled-data control, linear parameter varying systems, LPV.

PDF Downloads: 1243
6964 Talent Management through Integration of Talent Value Chain and Human Capital Analytics Approaches

Authors: Wuttigrai Ngamsirijit

Abstract:

Talent management in today’s organizations has become data-driven due to the demand for objective human resource decision making and the development of analytics technologies. HR managers face obstacles in exploiting data and information to make effective talent management decisions. These include process-based data and records; insufficient human capital measures and metrics; a lack of capability in strategic data modeling; and the time it takes to add up numbers and make decisions. This paper proposes a framework for talent management that integrates the talent value chain and human capital analytics approaches. It encompasses key data, measures, and metrics regarding strategic talent management decisions along the organizational and talent value chains. Moreover, specific predictive and prescriptive models incorporating these data and information are recommended to help managers understand the state of talent, the gaps in managing talent and the organization, and the ways to develop optimized talent strategies.

Keywords: Decision making, human capital analytics, talent management, talent value chain.

PDF Downloads: 930
6963 Enabling Factors towards Safety Improvement for Industrialised Building System (IBS)

Authors: Nasyairi Mat Nasir, Zulhabri Ismail, Faridah Ismail, Sharifah Nur Aina Syed Alwee, Masnizan Che Mat

Abstract:

The use of the Industrialised Building System (IBS) in the construction industry will lead to safer site conditions, since a minimum number of workers is required on site and IBS involves timely material delivery, systematic component storage, and reduced construction material and waste. These matters are being promoted in the Construction Industry Master Plan (CIMP 2006-2015). However, the enabling factors of IBS that will foster a safer working environment are unclear; on that basis, this research was conducted. The purpose of this paper is to identify and discuss the relevant factors towards safety improvement for IBS. A quantitative study was conducted through questionnaire surveys of 314 construction companies. The target group was Grade 5 to Grade 7 contractors registered with the Construction Industry Development Board (CIDB) that specialise in IBS. The findings revealed seven factors linked to the safety improvement of IBS construction sites in Malaysia: historical, economic, psychological, technical, procedural, organisational, and environmental factors. The psychological factor ranked highest and was the most crucial factor contributing to a safer IBS construction site; it included self-awareness and the influence of workmates' behaviour. It was followed by organisational factors, where project management style encourages safety efforts. Among the procedural factors, training was found to be one of the significant factors in improving the safety culture of IBS construction sites. Another important finding, forming part of the environmental factor, was the storage of IBS components, where proper layout planning can contribute to safer site conditions. In conclusion, improving the safety of IBS construction sites requires well-trained and skilled workers for IBS projects, so proper training is essential and should be emphasised.

Keywords: Enabling Factors, Industrialised Building System, Safety Improvement.

6962 Enhancing K-Means Algorithm with Initial Cluster Centers Derived from Data Partitioning along the Data Axis with the Highest Variance

Authors: S. Deelers, S. Auwatanamongkol

Abstract:

In this paper, we propose an algorithm to compute initial cluster centers for K-means clustering. The data in a cell is partitioned by a cutting plane that divides the cell into two smaller cells. The plane is perpendicular to the data axis with the highest variance and is designed to reduce the sum of squared errors of the two cells as much as possible, while keeping the two cells as far apart as possible. Cells are partitioned one at a time until the number of cells equals the predefined number of clusters, K. The centers of the K cells become the initial cluster centers for K-means. The experimental results suggest that the proposed algorithm is effective and converges to better clustering results than the random initialization method. The research also indicated that the proposed algorithm greatly improves the likelihood that every cluster contains some data.

Keywords: Clustering algorithm, K-means algorithm, Data partitioning, Initial cluster centers.
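The cell-splitting procedure described in the abstract can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the cell chosen for the next split is assumed to be the one with the largest sum of squared errors (SSE), and the cut along the highest-variance axis is chosen only to minimize the two halves' combined SSE, so the abstract's additional goal of keeping the two cells far apart is not modelled. The function names (`initial_centers`, `split_cell`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    dims = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(dims)]


def sse(points: List[Point]) -> float:
    # Sum of squared Euclidean distances from each point to the cell centroid
    c = centroid(points)
    return sum(sum((p[j] - c[j]) ** 2 for j in range(len(c))) for p in points)


def variance(points: List[Point], axis: int) -> float:
    mean = sum(p[axis] for p in points) / len(points)
    return sum((p[axis] - mean) ** 2 for p in points) / len(points)


def split_cell(cell: List[Point]) -> Tuple[List[Point], List[Point]]:
    # The cutting plane is perpendicular to the axis with the highest variance
    axis = max(range(len(cell[0])), key=lambda a: variance(cell, a))
    ordered = sorted(cell, key=lambda p: p[axis])
    # Try every cut position along that axis; keep the one with the lowest total SSE
    best = min(range(1, len(ordered)),
               key=lambda i: sse(ordered[:i]) + sse(ordered[i:]))
    return ordered[:best], ordered[best:]


def initial_centers(data: List[Point], k: int) -> List[List[float]]:
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of points")
    cells: List[List[Point]] = [list(data)]
    while len(cells) < k:
        # Assumption: split the cell with the largest SSE (it must hold >= 2 points)
        target = max((c for c in cells if len(c) > 1), key=sse)
        cells.remove(target)
        cells.extend(split_cell(target))
    # The centers of the K cells seed K-means
    return [centroid(c) for c in cells]
```

The returned centers can then seed any standard K-means implementation; scikit-learn's `KMeans`, for instance, accepts an array of initial centers through its `init` parameter. The exhaustive cut search costs O(n²) per split and is kept only for readability; prefix sums over the sorted points would make each split linear after sorting.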

6961 Methodology of Turkey’s National Geographic Information System Integration Project

Authors: Buse A. Ataç, Doğan K. Cenan, Arda Çetinkaya, Naz D. Şahin, Köksal Sanlı, Zeynep Koç, Akın Kısa

Abstract:

With reliable spatial data and capabilities for interpretation and querying, Geographical Information Systems (GIS) make significant contributions to the work of scientists, planners and practitioners. GIS have received great attention in today's digital world, growing rapidly and becoming increasingly efficient to use. Access to and use of current and accurate geographical data, the most important component of a GIS, has become a necessity for sustainable and economic development. This project aims to enable the sharing of data collected by public institutions and organizations on a web-based platform. Within the scope of the project, the INSPIRE (Infrastructure for Spatial Information in the European Community) data specifications are used as a road-map. In this context, Turkey's National Geographic Information System (TUCBS) Integration Project supports the sharing of spatial data among 61 pilot public institutions in compliance with defined national standards. This paper, prepared by members of the TUCBS Integration Project team, explains the technical process with a detailed methodology. The main technical processes of the project are Geographic Data Analysis, Geographic Data Harmonization (Standardization), Web Service Creation (WMS, WFS), and Metadata Creation-Publication. The paper describes in detail the integration process carried out so that the data produced by the 61 institutions can be shared through the National Geographic Data Portal (GEOPORTAL).

Keywords: Data specification, geoportal, GIS, INSPIRE, TUCBS, Turkey’s National Geographic Information System.
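To make the Web Service Creation step more concrete, the sketch below builds the standard OGC request URLs a client could send to such published services: a WMS 1.3.0 `GetMap` request and a WFS 2.0.0 `GetFeature` request. It is illustrative only; the endpoint URL, layer name, feature type and bounding box used with it are hypothetical placeholders, not values from the TUCBS project or GEOPORTAL, and the helper names are assumptions.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def wms_getmap_url(endpoint: str, layer: str, bbox: Sequence[float],
                   width: int = 800, height: int = 600,
                   crs: str = "EPSG:3857", fmt: str = "image/png") -> str:
    """Build a WMS 1.3.0 GetMap request URL for a single layer."""
    params = {
        "SERVICE": "WMS",      # lets multi-service (OWS) endpoints dispatch the request
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",          # an empty value selects the layer's default style
        "CRS": crs,
        # For EPSG:3857 the axis order is easting, northing: minx, miny, maxx, maxy
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return f"{endpoint}?{urlencode(params)}"


def wfs_getfeature_url(endpoint: str, type_name: str,
                       count: Optional[int] = None) -> str:
    """Build a WFS 2.0.0 GetFeature request URL, optionally limiting the feature count."""
    params = {"SERVICE": "WFS", "VERSION": "2.0.0",
              "REQUEST": "GetFeature", "TYPENAMES": type_name}
    if count is not None:
        params["COUNT"] = str(count)
    return f"{endpoint}?{urlencode(params)}"
```

WMS returns a rendered map image, while WFS returns the harmonized features themselves, which is why a data-sharing portal typically publishes both for the same dataset.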

6960 Exploring SSD Suitable Allocation Schemes in Compliance with Workload Patterns

Authors: Jae Young Park, Hwansu Jung, Jong Tae Kim

Abstract:

For Solid-State Drive (SSD) performance, how well data accesses are parallelized is an important factor. SSD parallelization is affected by the allocation scheme, which is directly connected to SSD performance. Representative allocation schemes include dynamic allocation and static allocation. Dynamic allocation is more adaptive in exploiting write-operation parallelism, while static allocation is better at exploiting read-operation parallelism. Therefore, it is hard to select the appropriate allocation scheme when the workload mixes read and write operations. We simulated several mixed data patterns and analyzed the results to guide the choice of scheme for better performance. The results show that static allocation is more suitable when the data arrival interval is long enough for prior operations to finish and the workload is continuously read-intensive. Dynamic allocation performs best for write performance and random data patterns.

Keywords: Dynamic allocation, NAND Flash based SSD, SSD parallelism, static allocation.
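The trade-off between the two schemes can be illustrated with a deliberately simplified model, which is our assumption and not the paper's simulator: each channel serves one page operation at a time with a fixed latency, all requests in a burst arrive at time zero, and garbage collection and way-level interleaving are ignored. Static allocation fixes a page's channel from its logical page number (LPN); dynamic allocation sends each write to whichever channel becomes free first. The function names and timing values are hypothetical.

```python
from typing import Dict, List, Tuple


def write_pages(lpns: List[int], n_channels: int, t_prog: float,
                dynamic: bool) -> Tuple[float, Dict[int, int]]:
    """Return (write makespan, LPN -> channel placement) for a burst of writes."""
    free_at = [0.0] * n_channels  # time at which each channel becomes idle
    placement: Dict[int, int] = {}
    for lpn in lpns:
        if dynamic:
            # Dynamic allocation: choose the channel that frees up earliest
            ch = min(range(n_channels), key=lambda c: free_at[c])
        else:
            # Static allocation: the channel is a fixed function of the LPN
            ch = lpn % n_channels
        # All requests arrive at t = 0, so each one simply queues behind the last
        free_at[ch] += t_prog
        placement[lpn] = ch
    return max(free_at), placement


def read_pages(lpns: List[int], placement: Dict[int, int],
               n_channels: int, t_read: float) -> float:
    """Return the makespan of a read burst; each page is read from the channel it was written to."""
    busy = [0.0] * n_channels
    for lpn in lpns:
        busy[placement[lpn]] += t_read
    return max(busy)
```

For example, four writes to LPNs 0, 4, 8 and 12 on four channels take 4.0 time units under static allocation (all map to channel 0) but 1.0 under dynamic allocation. Conversely, after writing LPNs in the order 0, 2, 1, 3 on two channels, reading LPNs 0 and 1 takes 1.0 unit with static placement but 2.0 with dynamic placement, because dynamic allocation put both pages on channel 0. This mirrors the abstract's observation that dynamic allocation favours writes and static allocation favours sequential reads.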

6959 WebAppShield: An Approach Exploiting Machine Learning to Detect SQLi Attacks in an Application Layer in Run-Time

Authors: Ahmed Abdulla Ashlam, Atta Badii, Frederic Stahl

Abstract:

In recent years, SQL injection (SQLi) attacks have been identified as prevalent against web applications. They compromise network security and user data, leading to considerable losses of money and data every year. This paper presents a method that uses machine learning classification algorithms to filter login inputs, classifying them as "SQLi" or "Non-SQLi", thus increasing the reliability and accuracy of deciding whether an operation is an attack or a valid operation. A method implemented as a Web-App was developed for auto-generated data replication, providing a twin of the targeted data structure. A shield against SQLi attacks (WebAppShield) was developed that verifies all users and prevents attackers from entering or accessing the database, letting through only inputs that the machine learning module predicts as "Non-SQLi". A special login form with its own instance of data validation was also developed; this verification process secures the web application from its early stages. The system has been tested and validated, and up to 99% of SQLi attacks were prevented.

Keywords: SQL injection, attacks, web application, accuracy, database, WebAppShield.
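The gating idea, where only inputs classified as "Non-SQLi" are allowed to reach the database, can be sketched as follows. This is an illustrative sketch, not WebAppShield itself: the trained machine learning module is replaced by a hypothetical regular-expression stand-in (`toy_predict`), which any real classifier with the same "SQLi"/"Non-SQLi" output could replace, and the `users` table is a hypothetical demo schema. As defence in depth, accepted inputs are still passed to SQLite as bound parameters rather than concatenated into the query.

```python
import re
import sqlite3
from typing import Callable

# Hypothetical stand-in for the trained ML classifier: flags a few classic SQLi markers
# (quotes, comment sequences, statement separators, common SQL keywords).
_SQLI_MARKERS = re.compile(r"('|--|;|/\*|\b(?:or|and|union|select|drop)\b)", re.IGNORECASE)


def toy_predict(text: str) -> str:
    return "SQLi" if _SQLI_MARKERS.search(text) else "Non-SQLi"


def shielded_login(conn: sqlite3.Connection, username: str, password: str,
                   predict: Callable[[str], str] = toy_predict) -> bool:
    """Return True only if both inputs pass the classifier and the credentials match."""
    # Gate: any input classified as an attack is rejected before touching the database
    if any(predict(field) == "SQLi" for field in (username, password)):
        return False
    # Defence in depth: accepted inputs are bound as parameters, never concatenated into SQL
    row = conn.execute(
        "SELECT 1 FROM users WHERE username = ? AND password = ?",
        (username, password),
    ).fetchone()
    return row is not None
```

A keyword regex is far cruder than a trained classifier and will both miss obfuscated attacks and reject some legitimate inputs; it is used here only to keep the sketch self-contained. The demo schema compares plaintext passwords purely for brevity; a real deployment would store and compare salted password hashes.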

6958 Adaptive Kernel Principal Analysis for Online Feature Extraction

Authors: Mingtao Ding, Zheng Tian, Haixia Xu

Abstract:

The batch nature of standard kernel principal component analysis (KPCA) limits its use in numerous applications, especially those involving dynamic or large-scale data. In this paper, an efficient adaptive approach is presented for the online extraction of kernel principal components (KPC). The contribution of this paper has two parts. First, the kernel covariance matrix is correctly updated to adapt to the changing characteristics of the data. Second, the KPC are recursively formulated to overcome the batch nature of standard KPCA. This formulation is derived from the recursive eigen-decomposition of the kernel covariance matrix and indicates the KPC variation caused by new data. The proposed method not only alleviates the sub-optimality of the KPCA method for non-stationary data, but also maintains constant update speed and memory usage as the data size increases. Experiments on simulated data and real applications demonstrate that our approach yields improvements in both computational speed and approximation accuracy.

Keywords: adaptive method, kernel principal component analysis, online extraction, recursive algorithm
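For reference, the batch computation that the adaptive method is designed to avoid can be sketched in plain Python: build the full n × n kernel matrix, center it in feature space, and extract the leading eigenvector (here by power iteration). This is standard batch KPCA, not the paper's recursive update; the RBF kernel, its `gamma` value, the start vector and the iteration count are illustrative assumptions. Because every new sample changes the centered matrix and forces the whole computation to be repeated, memory grows quadratically with the number of samples, which is the limitation the abstract targets.

```python
import math
from typing import List, Sequence


def rbf(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def centered_kernel(X: List[Sequence[float]], gamma: float) -> List[List[float]]:
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    means = [sum(row) / n for row in K]  # row means equal column means (K is symmetric)
    grand = sum(means) / n
    # Kc = K - 1K - K1 + 1K1: centering the mapped data in feature space
    return [[K[i][j] - means[i] - means[j] + grand for j in range(n)] for i in range(n)]


def first_kernel_component(X: List[Sequence[float]], gamma: float = 1.0,
                           iters: int = 200) -> List[float]:
    """Project the training points onto the leading kernel principal component."""
    Kc = centered_kernel(X, gamma)
    n = len(Kc)
    # Arbitrary start vector; it must not be orthogonal to the leading eigenvector
    v = [1.0 + 0.01 * i for i in range(n)]
    lam = 0.0
    for _ in range(iters):
        # Power iteration: v <- Kc v / ||Kc v||; the norm converges to the top eigenvalue
        w = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in w))
        if lam == 0.0:
            raise ValueError("centered kernel matrix is zero")
        v = [x / lam for x in w]
    # Scale coefficients so that lam * ||alpha||^2 = 1 (unit-norm component in feature space)
    alpha = [x / math.sqrt(lam) for x in v]
    return [sum(Kc[i][j] * alpha[j] for j in range(n)) for i in range(n)]
```

With two well-separated clusters as input, the first component separates them: points in the same cluster receive nearly equal projections, the two clusters receive opposite signs, and the projections sum to zero because the kernel matrix is centered.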

6957 Proposing an Efficient Method for Frequent Pattern Mining

Authors: Vaibhav Kant Singh, Vijay Shah, Yogendra Kumar Jain, Anupam Shukla, A.S. Thoke, Vinay Kumar Singh, Chhaya Dule, Vivek Parganiha

Abstract:

Data mining is the extraction of knowledge from large sets of data generated by various data processing activities. Frequent pattern mining is a very important task in data mining. Previous approaches to generating frequent itemsets generally adopt candidate generation and pruning techniques to achieve this objective. This paper shows how the different approaches accomplish frequent pattern mining, along with the complexity each one requires. It also examines a hardware approach based on cache coherence to improve the efficiency of this process. Data mining helps generate support systems for Management, Bioinformatics, Biotechnology, Medical Science, Statistics, Mathematics, Banking, Networking and other computer-related applications. This paper proposes using both the upward and downward closure properties to extract frequent itemsets, which reduces the total number of scans required to generate candidate sets.

Keywords: Data Mining, Candidate Sets, Frequent Item set, Pruning.
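The candidate generation and pruning baseline that the abstract refers to can be sketched as a classic level-wise (Apriori-style) search. This sketch uses only the downward closure property (every subset of a frequent itemset must itself be frequent) to prune candidates; it does not reproduce the paper's combined use of upward and downward closure, whose details the abstract does not give. Treating `min_support` as an absolute transaction count is an assumption of this sketch.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def apriori(transactions: List[Iterable[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every frequent itemset with its support count."""
    tsets = [frozenset(t) for t in transactions]

    def count(candidates):
        # One pass over the transaction database per level
        counts = dict.fromkeys(candidates, 0)
        for t in tsets:
            for c in counts:
                if c <= t:
                    counts[c] += 1
        return counts

    def keep_frequent(counts):
        return {c: n for c, n in counts.items() if n >= min_support}

    level = keep_frequent(count({frozenset([i]) for t in tsets for i in t}))
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (downward closure): drop candidates with an infrequent (k-1)-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = keep_frequent(count(candidates))
        frequent.update(level)
        k += 1
    return frequent
```

Each pass of the `while` loop costs one scan of the transaction database, which is the quantity the proposed method aims to reduce; pruning before counting keeps the number of candidates checked in each scan small.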

6956 Danger Theory and Intelligent Data Processing

Authors: Anjum Iqbal, Mohd Aizaini Maarof

Abstract:

Artificial Immune System (AIS) is a relatively new paradigm for intelligent computation. The inspiration for AIS is derived from the natural Immune System (IS). Classically, it is believed that the IS strives to discriminate between self and non-self, and most existing AIS research is based on this approach. Danger Theory (DT) argues against this approach and proposes that the IS fights danger-producing elements and tolerates others. As computational researchers, we are not concerned with the arguments among immunologists, but try to extract novel abstractions from them for intelligent computation. This paper aims to follow DT inspiration for intelligent data processing, an approach that may open a new avenue in intelligent processing. The data used are system call data, which are potentially significant in intrusion detection applications.

Keywords: artificial immune system, danger theory, intelligent processing, system calls

6955 Using Artificial Neural Network to Forecast Groundwater Depth in Union County Well

Authors: Zahra Ghadampour, Gholamreza Rakhshandehroo

Abstract:

A concern that researchers usually face in different applications of Artificial Neural Networks (ANN) is determining the size of the effective domain in a time series. In this paper, a trial-and-error method was applied to a groundwater depth time series from an observation well in Union County, New Jersey, U.S., to determine the size of the effective domain. Domains of 20, 40, 60, 80, 100, and 120 preceding days were examined, and 80 days was selected as the effective domain length. Data sets for the different domains were fed to a feed-forward back-propagation ANN with one hidden layer, and the groundwater depths were forecast. The Root Mean Square Error (RMSE) and the correlation factor (R²) between estimated and observed groundwater depths were determined for all domains. In general, the groundwater depth forecast improved, as evidenced by lower RMSEs and higher R² values, as the domain length increased from 20 to 120 days. However, 80 days was selected as the effective domain because the improvement beyond that length was less than 1%. Groundwater depths forecast from measured daily data (set #1) and from data averaged over the effective domain (set #2) were compared. It was postulated that the greater accuracy of the measured daily data was the reason for the better forecast, with a lower RMSE (0.1027 m compared to 0.255 m), in set #1. However, the input data in this set were 80 times larger than those in set #2, a factor that may increase the computational effort unpredictably. It was concluded that averaging the 80 daily values may successfully reduce the size of the input data sets considerably while maintaining the effective information in the data set.

Keywords: Neural networks, groundwater depth, forecast.
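The two input representations compared in the study can be sketched as follows: set #1 feeds the network the full window of preceding daily depths, while set #2 feeds a single value averaged over the same window, so set #1 has `window` times as many inputs per sample (80 times for the selected 80-day domain). The ANN itself is omitted, the function names are hypothetical, and the RMSE helper uses the standard definition rather than any code from the paper.

```python
import math
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], window: int,
                 averaged: bool = False) -> Tuple[List[List[float]], List[float]]:
    """Pair each day's depth (target) with the `window` preceding daily depths (inputs)."""
    inputs: List[List[float]] = []
    targets: List[float] = []
    for t in range(window, len(series)):
        past = list(series[t - window:t])
        # Set #2 collapses the window to its mean; set #1 keeps every daily value
        inputs.append([sum(past) / window] if averaged else past)
        targets.append(series[t])
    return inputs, targets


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and forecast depths."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))
```

Both representations share the same targets, so their forecasts can be compared directly with `rmse`; only the input width, and hence the network's input layer and training cost, differs.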

6954 Reasons for Doing Jobs outside the Household and Difficulties Faced by the Working Women of Bangladesh

Authors: Md. Sayeed Akhter, Md. Akhtar Hossain Mazumder, Syeda Afreena Mamun

Abstract:

Bangladesh is a patriarchal and male-dominated country. Traditional, cultural, social, and religious values and practices have reinforced the lower status accorded to women in society and have limited their opportunities for education, technical and vocational training, and involvement in earning activities outside their households. Since independence, a growing number of women have taken jobs outside their households. This study attempts to find out the reasons for engaging in earning activities outside the household and the difficulties faced by upper- and lower-class working women in Bangladesh. Descriptive techniques were used to address the objectives and research questions of the study. A survey was conducted among women working in Rajshahi city, Bangladesh, using face-to-face interviews to collect data. The findings illustrate that most upper-class working women took jobs because they wanted to utilize their education and to bring solvency to the family, and they spend their income on meeting the needs of all family members. On the other hand, most lower-class working women engaged in earning activities outside their households because they wanted to bring solvency to their families, and they spend their income on household expenditure. Women in both classes worried about their children because they had to stay at their workplaces for long hours. Therefore, day-care centers should be established near their workplaces for their children.

Keywords: Working Women, Reasons for Doing Jobs, Working Environment, Difficulties Faced.
