Search results for: data infrastructure
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25643

Search results for: data infrastructure

23783 A Novel Heuristic for Analysis of Large Datasets by Selecting Wrapper-Based Features

Authors: Bushra Zafar, Usman Qamar

Abstract:

Large data sample size and dimensions render the effectiveness of conventional data mining methodologies. A data mining technique are important tools for collection of knowledgeable information from variety of databases and provides supervised learning in the form of classification to design models to describe vital data classes while structure of the classifier is based on class attribute. Classification efficiency and accuracy are often influenced to great extent by noisy and undesirable features in real application data sets. The inherent natures of data set greatly masks its quality analysis and leave us with quite few practical approaches to use. To our knowledge first time, we present a new approach for investigation of structure and quality of datasets by providing a targeted analysis of localization of noisy and irrelevant features of data sets. Machine learning is based primarily on feature selection as pre-processing step which offers us to select few features from number of features as a subset by reducing the space according to certain evaluation criterion. The primary objective of this study is to trim down the scope of the given data sample by searching a small set of important features which may results into good classification performance. For this purpose, a heuristic for wrapper-based feature selection using genetic algorithm and for discriminative feature selection an external classifier are used. Selection of feature based on its number of occurrence in the chosen chromosomes. Sample dataset has been used to demonstrate proposed idea effectively. A proposed method has improved average accuracy of different datasets is about 95%. Experimental results illustrate that proposed algorithm increases the accuracy of prediction of different diseases.

Keywords: data mining, generic algorithm, KNN algorithms, wrapper based feature selection

Procedia PDF Downloads 304
23782 Improve Student Performance Prediction Using Majority Vote Ensemble Model for Higher Education

Authors: Wade Ghribi, Abdelmoty M. Ahmed, Ahmed Said Badawy, Belgacem Bouallegue

Abstract:

In higher education institutions, the most pressing priority is to improve student performance and retention. Large volumes of student data are used in Educational Data Mining techniques to find new hidden information from students' learning behavior, particularly to uncover the early symptom of at-risk pupils. On the other hand, data with noise, outliers, and irrelevant information may provide incorrect conclusions. By identifying features of students' data that have the potential to improve performance prediction results, comparing and identifying the most appropriate ensemble learning technique after preprocessing the data, and optimizing the hyperparameters, this paper aims to develop a reliable students' performance prediction model for Higher Education Institutions. Data was gathered from two different systems: a student information system and an e-learning system for undergraduate students in the College of Computer Science of a Saudi Arabian State University. The cases of 4413 students were used in this article. The process includes data collection, data integration, data preprocessing (such as cleaning, normalization, and transformation), feature selection, pattern extraction, and, finally, model optimization and assessment. Random Forest, Bagging, Stacking, Majority Vote, and two types of Boosting techniques, AdaBoost and XGBoost, are ensemble learning approaches, whereas Decision Tree, Support Vector Machine, and Artificial Neural Network are supervised learning techniques. Hyperparameters for ensemble learning systems will be fine-tuned to provide enhanced performance and optimal output. The findings imply that combining features of students' behavior from e-learning and students' information systems using Majority Vote produced better outcomes than the other ensemble techniques.

Keywords: educational data mining, student performance prediction, e-learning, classification, ensemble learning, higher education

Procedia PDF Downloads 92
23781 Foundation of the Information Model for Connected-Cars

Authors: Hae-Won Seo, Yong-Gu Lee

Abstract:

Recent progress in the next generation of automobile technology is geared towards incorporating information technology into cars. Collectively called smart cars are bringing intelligence to cars that provides comfort, convenience and safety. A branch of smart cars is connected-car system. The key concept in connected-cars is the sharing of driving information among cars through decentralized manner enabling collective intelligence. This paper proposes a foundation of the information model that is necessary to define the driving information for smart-cars. Road conditions are modeled through a unique data structure that unambiguously represent the time variant traffics in the streets. Additionally, the modeled data structure is exemplified in a navigational scenario and usage using UML. Optimal driving route searching is also discussed using the proposed data structure in a dynamically changing road conditions.

Keywords: connected-car, data modeling, route planning, navigation system

Procedia PDF Downloads 360
23780 A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data

Authors: Mais Nijim, Rama Devi Chennuboyina, Waseem Al Aqqad

Abstract:

Advances in spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data we acquire through satellites, radars and sensors consists of important geographical information that can be used for remote sensing applications such as region planning, disaster management. Spatial data classification and object recognition are important tasks for many applications. However, classifying objects and identifying them manually from images is a difficult task. Object recognition is often considered as a classification problem, this task can be performed using machine-learning techniques. Despite of many machine-learning algorithms, the classification is done using supervised classifiers such as Support Vector Machines (SVM) as the area of interest is known. We proposed a classification method, which considers neighboring pixels in a region for feature extraction and it evaluates classifications precisely according to neighboring classes for semantic interpretation of region of interest (ROI). A dataset has been created for training and testing purpose; we generated the attributes by considering pixel intensity values and mean values of reflectance. We demonstrated the benefits of using knowledge discovery and data-mining techniques, which can be on image data for accurate information extraction and classification from high spatial resolution remote sensing imagery.

Keywords: remote sensing, object recognition, classification, data mining, waterbody identification, feature extraction

Procedia PDF Downloads 323
23779 Automated Multisensory Data Collection System for Continuous Monitoring of Refrigerating Appliances Recycling Plants

Authors: Georgii Emelianov, Mikhail Polikarpov, Fabian Hübner, Jochen Deuse, Jochen Schiemann

Abstract:

Recycling refrigerating appliances plays a major role in protecting the Earth's atmosphere from ozone depletion and emissions of greenhouse gases. The performance of refrigerator recycling plants in terms of material retention is the subject of strict environmental certifications and is reviewed periodically through specialized audits. The continuous collection of Refrigerator data required for the input-output analysis is still mostly manual, error-prone, and not digitalized. In this paper, we propose an automated data collection system for recycling plants in order to deduce expected material contents in individual end-of-life refrigerating appliances. The system utilizes laser scanner measurements and optical data to extract attributes of individual refrigerators by applying transfer learning with pre-trained vision models and optical character recognition. Based on Recognized features, the system automatically provides material categories and target values of contained material masses, especially foaming and cooling agents. The presented data collection system paves the way for continuous performance monitoring and efficient control of refrigerator recycling plants.

Keywords: automation, data collection, performance monitoring, recycling, refrigerators

Procedia PDF Downloads 148
23778 Sales Patterns Clustering Analysis on Seasonal Product Sales Data

Authors: Soojin Kim, Jiwon Yang, Sungzoon Cho

Abstract:

As a seasonal product is only in demand for a short time, inventory management is critical to profits. Both markdowns and stockouts decrease the return on perishable products; therefore, researchers have been interested in the distribution of seasonal products with the aim of maximizing profits. In this study, we propose a data-driven seasonal product sales pattern analysis method for individual retail outlets based on observed sales data clustering; the proposed method helps in determining distribution strategies.

Keywords: clustering, distribution, sales pattern, seasonal product

Procedia PDF Downloads 580
23777 New Requirements of the Fifth Dimension of War: Planning of Cyber Operation Capabilities

Authors: Mehmet Kargaci

Abstract:

Transformation of technology and strategy has been the main factor for the evolution of war. In addition to land, maritime, air and space domains, cyberspace has become the fifth domain with emerge of internet. The current security environment has become more complex and uncertain than ever before. Moreover, warfare has evaluated from conventional to irregular, asymmetric and hybrid war. Weak actors such as terrorist organizations and non-state actors has increasingly conducted cyber-attacks against strong adversaries. Besides, states has developed cyber capabilities in order to defense critical infrastructure regarding the cyber threats. Cyber warfare will be key in future security environment. Although what to do has been placed in operational plans, how to do has lacked and ignored as to cyber defense and attack. The purpose of the article is to put forward a model for how to conduct cyber capabilities in a conventional war. First, cyber operations capabilities will be discussed. Second put forward the necessities of cyberspace environment and develop a model for how to plan an operation using cyber operation capabilities, finally the assessment of the applicability of cyber operation capabilities and offers will be presented.

Keywords: cyber war, cyber threats, cyber operation capabilities, operation planning

Procedia PDF Downloads 321
23776 Artificial Neural Network Based Model for Detecting Attacks in Smart Grid Cloud

Authors: Sandeep Mehmi, Harsh Verma, A. L. Sangal

Abstract:

Ever since the idea of using computing services as commodity that can be delivered like other utilities e.g. electric and telephone has been floated, the scientific fraternity has diverted their research towards a new area called utility computing. New paradigms like cluster computing and grid computing came into existence while edging closer to utility computing. With the advent of internet the demand of anytime, anywhere access of the resources that could be provisioned dynamically as a service, gave rise to the next generation computing paradigm known as cloud computing. Today, cloud computing has become one of the most aggressively growing computer paradigm, resulting in growing rate of applications in area of IT outsourcing. Besides catering the computational and storage demands, cloud computing has economically benefitted almost all the fields, education, research, entertainment, medical, banking, military operations, weather forecasting, business and finance to name a few. Smart grid is another discipline that direly needs to be benefitted from the cloud computing advantages. Smart grid system is a new technology that has revolutionized the power sector by automating the transmission and distribution system and integration of smart devices. Cloud based smart grid can fulfill the storage requirement of unstructured and uncorrelated data generated by smart sensors as well as computational needs for self-healing, load balancing and demand response features. But, security issues such as confidentiality, integrity, availability, accountability and privacy need to be resolved for the development of smart grid cloud. In recent years, a number of intrusion prevention techniques have been proposed in the cloud, but hackers/intruders still manage to bypass the security of the cloud. Therefore, precise intrusion detection systems need to be developed in order to secure the critical information infrastructure like smart grid cloud. Considering the success of artificial neural networks in building robust intrusion detection, this research proposes an artificial neural network based model for detecting attacks in smart grid cloud.

Keywords: artificial neural networks, cloud computing, intrusion detection systems, security issues, smart grid

Procedia PDF Downloads 304
23775 The Spatial Potential of the Croatian Adriatic Area for the Development of an Indigenous Form of Cruising Tourism - Mini Croatian Cruiser

Authors: Srećko Favro, Dora Mužinić

Abstract:

The eastern coast of the Adriatic Sea has been a significant part of the most important traffic corridors since Antiquity due to its position as the deepest indented bay of the Mediterranean and numerous bays on the coast and is-lands. The central place throughout history was occupied by the central part - Split-Dalmatia County, with its center in Antica in Salona and later in Split. Nowadays, in addition to its traffic and economic importance, this area is also important for tourism, an area where Croatia develops its economy and realizing its economic growth. Nautical tourism is the most important form of the tourist economic sector that uses the geographical features of the Croatian Adriatic water area and achieves the greatest growth based on tour-ist trends in the world (coronavirus - separation from the masses, adventure tourism - own arrangements) and thus opens up the possibility of develop-ment for other parts of the tourist economy. This will be described in the ex-ample of the business of the Split-Dalmatia County shipping company from Krilo Jesenice, which operates as a mini-cruising service provider, the lead-ing form of cruising in Croatia. The advantages that this type of tourism provides to travelers in terms of customized itineraries, high-quality services, an intimate atmosphere, and a unique experience through familiarization with local culture and tradition will be considered. Through direct primary research and analysis of available secondary research data, an attempt will be made to show how traditional Croatian mini cruisers manage to stand out in a competitive tourist environment. Their impact on the local economy, sus-tainability, and environmental protection will be considered, as well as how they are integrated into the tourist offer of other destinations in Croatia. In addition, the challenges and opportunities that arise in the maintenance and development of traditional Croatian mini cruisers will be discussed, includ-ing issues such as infrastructure, staff training, and market trends.

Keywords: croatia, adriatic, cruising, nautical tourism, mini cruise

Procedia PDF Downloads 53
23774 Probability Sampling in Matched Case-Control Study in Drug Abuse

Authors: Surya R. Niraula, Devendra B Chhetry, Girish K. Singh, S. Nagesh, Frederick A. Connell

Abstract:

Background: Although random sampling is generally considered to be the gold standard for population-based research, the majority of drug abuse research is based on non-random sampling despite the well-known limitations of this kind of sampling. Method: We compared the statistical properties of two surveys of drug abuse in the same community: one using snowball sampling of drug users who then identified “friend controls” and the other using a random sample of non-drug users (controls) who then identified “friend cases.” Models to predict drug abuse based on risk factors were developed for each data set using conditional logistic regression. We compared the precision of each model using bootstrapping method and the predictive properties of each model using receiver operating characteristics (ROC) curves. Results: Analysis of 100 random bootstrap samples drawn from the snowball-sample data set showed a wide variation in the standard errors of the beta coefficients of the predictive model, none of which achieved statistical significance. One the other hand, bootstrap analysis of the random-sample data set showed less variation, and did not change the significance of the predictors at the 5% level when compared to the non-bootstrap analysis. Comparison of the area under the ROC curves using the model derived from the random-sample data set was similar when fitted to either data set (0.93, for random-sample data vs. 0.91 for snowball-sample data, p=0.35); however, when the model derived from the snowball-sample data set was fitted to each of the data sets, the areas under the curve were significantly different (0.98 vs. 0.83, p < .001). Conclusion: The proposed method of random sampling of controls appears to be superior from a statistical perspective to snowball sampling and may represent a viable alternative to snowball sampling.

Keywords: drug abuse, matched case-control study, non-probability sampling, probability sampling

Procedia PDF Downloads 480
23773 Transport Hubs as Loci of Multi-Layer Ecosystems of Innovation: Case Study of Airports

Authors: Carolyn Hatch, Laurent Simon

Abstract:

Urban mobility and the transportation industry are undergoing a transformation, shifting from an auto production-consumption model that has dominated since the early 20th century towards new forms of personal and shared multi-modality [1]. This is shaped by key forces such as climate change, which has induced a shift in production and consumption patterns and efforts to decarbonize and improve transport services through, for instance, the integration of vehicle automation, electrification and mobility sharing [2]. Advanced innovation practices and platforms for experimentation and validation of new mobility products and services that are increasingly complex and multi-stakeholder-oriented are shaping this new world of mobility. Transportation hubs – such as airports - are emblematic of these disruptive forces playing out in the mobility industry. Airports are emerging as the core of innovation ecosystems on and around contemporary mobility issues, and increasingly recognized as complex public/private nodes operating in many societal dimensions [3,4]. These include urban development, sustainability transitions, digital experimentation, customer experience, infrastructure development and data exploitation (for instance, airports generate massive and often untapped data flows, with significant potential for use, commercialization and social benefit). Yet airport innovation practices have not been well documented in the innovation literature. This paper addresses this gap by proposing a model of airport innovation that aims to equip airport stakeholders to respond to these new and complex innovation needs in practice. The methodology involves: 1 – a literature review bringing together key research and theory on airport innovation management, open innovation and innovation ecosystems in order to evaluate airport practices through an innovation lens; 2 – an international benchmarking of leading airports and their innovation practices, including such examples as Aéroports de Paris, Schipol in Amsterdam, Changi in Singapore, and others; and 3 – semi-structured interviews with airport managers on key aspects of organizational practice, facilitated through a close partnership with the Airport Council International (ACI), a major stakeholder in this research project. Preliminary results find that the most successful airports are those that have shifted to a multi-stakeholder, platform ecosystem model of innovation. The recent entrance of new actors in airports (Google, Amazon, Accor, Vinci, Airbnb and others) have forced the opening of organizational boundaries to share and exchange knowledge with a broader set of ecosystem players. This has also led to new forms of governance and intermediation by airport actors to connect complex, highly distributed knowledge, along with new kinds of inter-organizational collaboration, co-creation and collective ideation processes. Leading airports in the case study have demonstrated a unique capacity to force traditionally siloed activities to “think together”, “explore together” and “act together”, to share data, contribute expertise and pioneer new governance approaches and collaborative practices. In so doing, they have successfully integrated these many disruptive change pathways and forced their implementation and coordination towards innovative mobility outcomes, with positive societal, environmental and economic impacts. This research has implications for: 1 - innovation theory, 2 - urban and transport policy, and 3 - organizational practice - within the mobility industry and across the economy.

Keywords: airport management, ecosystem, innovation, mobility, platform, transport hubs

Procedia PDF Downloads 168
23772 Impact of Node Density and Transmission Range on the Performance of OLSR and DSDV Routing Protocols in VANET City Scenarios

Authors: Yassine Meraihi, Dalila Acheli, Rabah Meraihi

Abstract:

Vehicular Ad hoc Network (VANET) is a special case of Mobile Ad hoc Network (MANET) used to establish communications and exchange information among nearby vehicles and between vehicles and nearby fixed infrastructure. VANET is seen as a promising technology used to provide safety, efficiency, assistance and comfort to the road users. Routing is an important issue in Vehicular Ad Hoc Network to find and maintain communication between vehicles due to the highly dynamic topology, frequently disconnected network and mobility constraints. This paper evaluates the performance of two most popular proactive routing protocols OLSR and DSDV in real city traffic scenario on the basis of three metrics namely Packet delivery ratio, throughput and average end to end delay by varying vehicles density and transmission range.

Keywords: DSDV, OLSR, quality of service, routing protocols, VANET

Procedia PDF Downloads 455
23771 Evaluating the Effectiveness of Science Teacher Training Programme in National Colleges of Education: a Preliminary Study, Perceptions of Prospective Teachers

Authors: A. S. V Polgampala, F. Huang

Abstract:

This is an overview of what is entailed in an evaluation and issues to be aware of when class observation is being done. This study examined the effects of evaluating teaching practice of a 7-day ‘block teaching’ session in a pre -service science teacher training program at a reputed National College of Education in Sri Lanka. Effects were assessed in three areas: evaluation of the training process, evaluation of the training impact, and evaluation of the training procedure. Data for this study were collected by class observation of 18 teachers during 9th February to 16th of 2017. Prospective teachers of science teaching, the participants of the study were evaluated based on newly introduced format by the NIE. The data collected was analyzed qualitatively using the Miles and Huberman procedure for analyzing qualitative data: data reduction, data display and conclusion drawing/verification. It was observed that the trainees showed their confidence in teaching those competencies and skills. Teacher educators’ dissatisfaction has been a great impact on evaluation process.

Keywords: evaluation, perceptions & perspectives, pre-service, science teachering

Procedia PDF Downloads 299
23770 Detecting Venomous Files in IDS Using an Approach Based on Data Mining Algorithm

Authors: Sukhleen Kaur

Abstract:

In security groundwork, Intrusion Detection System (IDS) has become an important component. The IDS has received increasing attention in recent years. IDS is one of the effective way to detect different kinds of attacks and malicious codes in a network and help us to secure the network. Data mining techniques can be implemented to IDS, which analyses the large amount of data and gives better results. Data mining can contribute to improving intrusion detection by adding a level of focus to anomaly detection. So far the study has been carried out on finding the attacks but this paper detects the malicious files. Some intruders do not attack directly, but they hide some harmful code inside the files or may corrupt those file and attack the system. These files are detected according to some defined parameters which will form two lists of files as normal files and harmful files. After that data mining will be performed. In this paper a hybrid classifier has been used via Naive Bayes and Ripper classification methods. The results show how the uploaded file in the database will be tested against the parameters and then it is characterised as either normal or harmful file and after that the mining is performed. Moreover, when a user tries to mine on harmful file it will generate an exception that mining cannot be made on corrupted or harmful files.

Keywords: data mining, association, classification, clustering, decision tree, intrusion detection system, misuse detection, anomaly detection, naive Bayes, ripper

Procedia PDF Downloads 401
23769 Generalized Approach to Linear Data Transformation

Authors: Abhijith Asok

Abstract:

This paper presents a generalized approach for the simple linear data transformation, Y=bX, through an integration of multidimensional coordinate geometry, vector space theory and polygonal geometry. The scaling is performed by adding an additional ’Dummy Dimension’ to the n-dimensional data, which helps plot two dimensional component-wise straight lines on pairs of dimensions. The end result is a set of scaled extensions of observations in any of the 2n spatial divisions, where n is the total number of applicable dimensions/dataset variables, created by shifting the n-dimensional plane along the ’Dummy Axis’. The derived scaling factor was found to be dependent on the coordinates of the common point of origin for diverging straight lines and the plane of extension, chosen on and perpendicular to the ’Dummy Axis’, respectively. This result indicates the geometrical interpretation of a linear data transformation and hence, opportunities for a more informed choice of the factor ’b’, based on a better choice of these coordinate values. The paper follows on to identify the effect of this transformation on certain popular distance metrics, wherein for many, the distance metric retained the same scaling factor as that of the features.

Keywords: data transformation, dummy dimension, linear transformation, scaling

Procedia PDF Downloads 286
23768 A Study on Bicycle Riding Behavior on Bike-Only Road

Authors: Hyeon Jong Yoo, Jae Hwan Yang, Dong Kyu Kim

Abstract:

Recently, riding a bicycle is recommended as an eco-friendly means of transportation. In Seoul, the mayor has secured budget for extending bicycle infrastructure. As bicycle rental centers are adopted in places, more citizens are using bike rental service on bike-only roads for leisure. However, most of the citizens are not experienced in riding bicycles. They usually do not wear helmets, keep the balance of bicycle riding, and pay attention to nearby occasions. Therefore, in this study, bicycles on Han-river bike-only road were tracked, and their behaviors were analyzed in order to show how dangerously beginner riders are riding. The number of conflicts is calculated to evaluate riding safety on the most crowded bike-only road. As a result, conflicts between beginner riders and fast-driving skilled drivers are frequently observed especially at night, and on weekends. In conclusion, it is suggested that the government should acknowledge citizens the fact that bikes are vehicles and adopt a test for bike driving.

Keywords: bicycles, safety, bike-only road, policy proposal

Procedia PDF Downloads 344
23767 Using Learning Apps in the Classroom

Authors: Janet C. Read

Abstract:

UClan set collaboration with Lingokids to assess the Lingokids learning app's impact on learning outcomes in classrooms in the UK for children with ages ranging from 3 to 5 years. Data gathered during the controlled study with 69 children includes attitudinal data, engagement, and learning scores. Data shows that children enjoyment while learning was higher among those children using the game-based app compared to those children using other traditional methods. It’s worth pointing out that engagement when using the learning app was significantly higher than other traditional methods among older children. According to existing literature, there is a direct correlation between engagement, motivation, and learning. Therefore, this study provides relevant data points to conclude that Lingokids learning app serves its purpose of encouraging learning through playful and interactive content. That being said, we believe that learning outcomes should be assessed with a wider range of methods in further studies. Likewise, it would be beneficial to assess the level of usability and playability of the app in order to evaluate the learning app from other angles.

Keywords: learning app, learning outcomes, rapid test activity, Smileyometer, early childhood education, innovative pedagogy

Procedia PDF Downloads 58
23766 Road Safety in the Great Britain: An Exploratory Data Analysis

Authors: Jatin Kumar Choudhary, Naren Rayala, Abbas Eslami Kiasari, Fahimeh Jafari

Abstract:

The Great Britain has one of the safest road networks in the world. However, the consequences of any death or serious injury are devastating for loved ones, as well as for those who help the severely injured. This paper aims to analyse the Great Britain's road safety situation and show the response measures for areas where the total damage caused by accidents can be significantly and quickly reduced. In this paper, we do an exploratory data analysis using STATS19 data. For the past 30 years, the UK has had a good record in reducing fatalities. The UK ranked third based on the number of road deaths per million inhabitants. There were around 165,000 accidents reported in the Great Britain in 2009 and it has been decreasing every year until 2019 which is under 120,000. The government continues to scale back road deaths empowering responsible road users by identifying and prosecuting the parameters that make the roads less safe.

Keywords: road safety, data analysis, openstreetmap, feature expanding.

Procedia PDF Downloads 118
23765 Intrusion Detection System Using Linear Discriminant Analysis

Authors: Zyad Elkhadir, Khalid Chougdali, Mohammed Benattou

Abstract:

Most of the existing intrusion detection systems works on quantitative network traffic data with many irrelevant and redundant features, which makes detection process more time’s consuming and inaccurate. A several feature extraction methods, such as linear discriminant analysis (LDA), have been proposed. However, LDA suffers from the small sample size (SSS) problem which occurs when the number of the training samples is small compared with the samples dimension. Hence, classical LDA cannot be applied directly for high dimensional data such as network traffic data. In this paper, we propose two solutions to solve SSS problem for LDA and apply them to a network IDS. The first method, reduce the original dimension data using principal component analysis (PCA) and then apply LDA. In the second solution, we propose to use the pseudo inverse to avoid singularity of within-class scatter matrix due to SSS problem. After that, the KNN algorithm is used for classification process. We have chosen two known datasets KDDcup99 and NSLKDD for testing the proposed approaches. Results showed that the classification accuracy of (PCA+LDA) method outperforms clearly the pseudo inverse LDA method when we have large training data.

Keywords: LDA, Pseudoinverse, PCA, IDS, NSL-KDD, KDDcup99

Procedia PDF Downloads 216
23764 On Flow Consolidation Modelling in Urban Congested Areas

Authors: Serban Stere, Stefan Burciu

Abstract:

The challenging and continuously growing competition in the urban freight transport market emphasizes the need for optimal planning of transportation processes in terms of identifying the solution of consolidating traffic flows in congested urban areas. The aim of the present paper is to present the mathematical framework and propose a methodology of combining urban traffic flows between the distribution centers located at the boundary of a congested urban area. The three scenarios regarding traffic flow between consolidation centers that are taken into consideration in the paper are based on the same characteristics of traffic flows. The scenarios differ in terms of the accessibility of the four consolidation centers given by the infrastructure, the connections between them, and the possibility of consolidating traffic flows for one or multiple destinations. Also, synthetical indicators will allow us to compare the scenarios considered and chose the indicated for our distribution system.

Keywords: distribution system, single and multiple destinations, urban consolidation centers, traffic flow consolidation schemes

Procedia PDF Downloads 144
23763 Improving Library Service Quality in Local City of Indonesia

Authors: Prima Fithri, Afri Adnan, Verra Syahmer

Abstract:

Library as a public service should be able to provide excellent and quality service. The criteria that should be available in the library is having the collection which relevant, actual and reliable, qualified and professional employee, delivery system that prompt and appropriate as well as supported by proper infrastructure. The aim of this study is to show the performance as an effort to provide quality of services that appropriate with the needs and desires of user. Then, in this research has been carried out the calculation of the gap between the perceptions and expectations of user about the services of the library. The Sevqual and QFD methods are used in this study. Servqual method for measuring the value of the gap that occurs in the dimensions of service quality and QFD method for determine priority repairment that need to be done to improve the quality of services that occur in the dimensions of service quality. From 97 questionaires, shows that value of the gap that occurs in the dimensions of service quality using by Servqual is 27.7% dimensions of responsiveness. It show how much user expectations are not met by the quality of existing services. Construction of the library and standard library becomes priority improvements that need to be done to improve the quality of service that occurs in the dimensions of service quality using the QFD.

Keywords: library, service quality, service quality, QFD

Procedia PDF Downloads 552
23762 Studies of Rule Induction by STRIM from the Decision Table with Contaminated Attribute Values from Missing Data and Noise — in the Case of Critical Dataset Size —

Authors: Tetsuro Saeki, Yuichi Kato, Shoutarou Mizuno

Abstract:

STRIM (Statistical Test Rule Induction Method) has been proposed as a method to effectively induct if-then rules from the decision table which is considered as a sample set obtained from the population of interest. Its usefulness has been confirmed by simulation experiments specifying rules in advance, and by comparison with conventional methods. However, scope for future development remains before STRIM can be applied to the analysis of real-world data sets. The first requirement is to determine the size of the dataset needed for inducting true rules, since finding statistically significant rules is the core of the method. The second is to examine the capacity of rule induction from datasets with contaminated attribute values created by missing data and noise, since real-world datasets usually contain such contaminated data. This paper examines the first problem theoretically, in connection with the rule length. The second problem is then examined in a simulation experiment, utilizing the critical size of dataset derived from the first step. The experimental results show that STRIM is highly robust in the analysis of datasets with contaminated attribute values, and hence is applicable to realworld data.

Keywords: rule induction, decision table, missing data, noise

Procedia PDF Downloads 379
23761 A Review on Modeling and Optimization of Integration of Renewable Energy Resources (RER) for Minimum Energy Cost, Minimum CO₂ Emissions and Sustainable Development, in Recent Years

Authors: M. M. Wagh, V. V. Kulkarni

Abstract:

The rising economic activities, growing population and improving living standards of world have led to a steady growth in its appetite for quality and quantity of energy services. As the economy expands the electricity demand is going to grow further, increasing the challenges of the more generation and stresses on the utility grids. Appropriate energy model will help in proper utilization of the locally available renewable energy sources such as solar, wind, biomass, small hydro etc. to integrate in the available grid, reducing the investments in energy infrastructure. Further to these new technologies like smart grids, decentralized energy planning, energy management practices, energy efficiency are emerging. In this paper, the attempt has been made to study and review the recent energy planning models, energy forecasting models, and renewable energy integration models. In addition, various modeling techniques and tools are reviewed and discussed.

Keywords: energy modeling, integration of renewable energy, energy modeling tools, energy modeling techniques

Procedia PDF Downloads 325
23760 Machine Learning Strategies for Data Extraction from Unstructured Documents in Financial Services

Authors: Delphine Vendryes, Dushyanth Sekhar, Baojia Tong, Matthew Theisen, Chester Curme

Abstract:

Much of the data that inform the decisions of governments, corporations and individuals are harvested from unstructured documents. Data extraction is defined here as a process that turns non-machine-readable information into a machine-readable format that can be stored, for instance, in a database. In financial services, introducing more automation in data extraction pipelines is a major challenge. Information sought by financial data consumers is often buried within vast bodies of unstructured documents, which have historically required thorough manual extraction. Automated solutions provide faster access to non-machine-readable datasets, in a context where untimely information quickly becomes irrelevant. Data quality standards cannot be compromised, so automation requires high data integrity. This multifaceted task is broken down into smaller steps: ingestion, table parsing (detection and structure recognition), text analysis (entity detection and disambiguation), schema-based record extraction, user feedback incorporation. Selected intermediary steps are phrased as machine learning problems. Solutions leveraging cutting-edge approaches from the fields of computer vision (e.g. table detection) and natural language processing (e.g. entity detection and disambiguation) are proposed.

Keywords: computer vision, entity recognition, finance, information retrieval, machine learning, natural language processing

Procedia PDF Downloads 95
23759 Regression Approach for Optimal Purchase of Hosts Cluster in Fixed Fund for Hadoop Big Data Platform

Authors: Haitao Yang, Jianming Lv, Fei Xu, Xintong Wang, Yilin Huang, Lanting Xia, Xuewu Zhu

Abstract:

Given a fixed fund, purchasing fewer hosts of higher capability or inversely more of lower capability is a must-be-made trade-off in practices for building a Hadoop big data platform. An exploratory study is presented for a Housing Big Data Platform project (HBDP), where typical big data computing is with SQL queries of aggregate, join, and space-time condition selections executed upon massive data from more than 10 million housing units. In HBDP, an empirical formula was introduced to predict the performance of host clusters potential for the intended typical big data computing, and it was shaped via a regression approach. With this empirical formula, it is easy to suggest an optimal cluster configuration. The investigation was based on a typical Hadoop computing ecosystem HDFS+Hive+Spark. A proper metric was raised to measure the performance of Hadoop clusters in HBDP, which was tested and compared with its predicted counterpart, on executing three kinds of typical SQL query tasks. Tests were conducted with respect to factors of CPU benchmark, memory size, virtual host division, and the number of element physical host in cluster. The research has been applied to practical cluster procurement for housing big data computing.

Keywords: Hadoop platform planning, optimal cluster scheme at fixed-fund, performance predicting formula, typical SQL query tasks

Procedia PDF Downloads 216
23758 Model Predictive Controller for Pasteurization Process

Authors: Tesfaye Alamirew Dessie

Abstract:

Our study focuses on developing a Model Predictive Controller (MPC) and evaluating it against a traditional PID for a pasteurization process. Utilizing system identification from the experimental data, the dynamics of the pasteurization process were calculated. Using best fit with data validation, residual, and stability analysis, the quality of several model architectures was evaluated. The validation data fit the auto-regressive with exogenous input (ARX322) model of the pasteurization process by roughly 80.37 percent. The ARX322 model structure was used to create MPC and PID control techniques. After comparing controller performance based on settling time, overshoot percentage, and stability analysis, it was found that MPC controllers outperform PID for those parameters.

Keywords: MPC, PID, ARX, pasteurization

Procedia PDF Downloads 144
23757 Experimental Study on Floating Breakwater Anchored by Piles

Authors: Yessi Nirwana Kurniadi, Nira Yunita Permata

Abstract:

Coastline is vulnerable to coastal erosion which damage infrastructure and buildings. Floating breakwaters are applied in order to minimize material cost but still can reduce wave height. In this paper, we investigated floating breakwater anchored by piles based on experimental study in the laboratory with model scale 1:8. Two type of floating model were tested with several combination wave height, wave period and surface water elevation to determined transmission coefficient. This experimental study proved that floating breakwater with piles can prevent wave height up to 27 cm. The physical model shows that ratio of depth to wave length is less than 0.6 and ratio of model width to wave length is less than 0.3. It is confirmed that if those ratio are less than those value, the transmission coefficient is 0.5. The result also showed that the first type model of floating breakwater can reduce wave height by 60.4 % while the second one can reduce up to 55.56 %.

Keywords: floating breakwater, experimental study, pile, transimission coefficient

Procedia PDF Downloads 518
23756 Point Estimation for the Type II Generalized Logistic Distribution Based on Progressively Censored Data

Authors: Rana Rimawi, Ayman Baklizi

Abstract:

Skewed distributions are important models that are frequently used in applications. Generalized distributions form a class of skewed distributions and gain widespread use in applications because of their flexibility in data analysis. More specifically, the Generalized Logistic Distribution with its different types has received considerable attention recently. In this study, based on progressively type-II censored data, we will consider point estimation in type II Generalized Logistic Distribution (Type II GLD). We will develop several estimators for its unknown parameters, including maximum likelihood estimators (MLE), Bayes estimators and linear estimators (BLUE). The estimators will be compared using simulation based on the criteria of bias and Mean square error (MSE). An illustrative example of a real data set will be given.

Keywords: point estimation, type II generalized logistic distribution, progressive censoring, maximum likelihood estimation

Procedia PDF Downloads 186
23755 Small and Medium-Sized Enterprises, Flash Flooding and Organisational Resilience Capacity: Qualitative Findings on Implications of the Catastrophic 2017 Flash Flood Event in Mandra, Greece

Authors: Antonis Skouloudis, Georgios Deligiannakis, Panagiotis Vouros, Konstantinos Evangelinos, Loannis Nikolaou

Abstract:

On November 15th, 2017, a catastrophic flash flood devastated the city of Mandra in Central Greece, resulting in 24 fatalities and extensive damages to the built environment and infrastructure. It was Greece's deadliest and most destructive flood event for the past 40 years. In this paper, we examine the consequences of this event too small and medium-sized enterprises (SMEs) operating in Mandra during the flood event, which were affected by the floodwaters to varying extents. In this context, we conducted semi-structured interviews with business owners-managers of 45 SMEs located in flood inundated areas and are still active nowadays, based on an interview guide that spanned 27 topics. The topics pertained to the disaster experience of the business and business owners-managers, knowledge and attitudes towards climate change and extreme weather, aspects of disaster preparedness and related assistance needs. Our findings reveal that the vast majority of the affected businesses experienced heavy damages in equipment and infrastructure or total destruction, which resulted in business interruption from several weeks up to several months. Assistance from relatives or friends helped for the damage repairs and business recovery, while state compensations were deemed insufficient compared to the extent of the damages. Most interviewees pinpoint flooding as one of the most critical risks, and many connect it with the climate crisis. However, they are either not willing or unable to apply property-level prevention measures in their businesses due to cost considerations or complex and cumbersome bureaucratic processes. In all cases, the business owners are fully aware of the flood hazard implications, and since the recovery from the event, they have engaged in basic mitigation measures and contingency plans in case of future flood events. Such plans include insurance contracts whenever possible (as the vast majority of the affected SMEs were uninsured at the time of the 2017 event) as well as simple relocations of critical equipment within their property. The study offers fruitful insights on latent drivers and barriers of SMEs' resilience capacity to flash flooding. In this respect, findings such as ours, highlighting tensions that underpin behavioral responses and experiences, can feed into a) bottom-up approaches for devising actionable and practical guidelines, manuals and/or standards on business preparedness to flooding, and, ultimately, b) policy-making for an enabling environment towards a flood-resilient SME sector.

Keywords: flash flood, small and medium-sized enterprises, organizational resilience capacity, disaster preparedness, qualitative study

Procedia PDF Downloads 121
23754 Omni: Data Science Platform for Evaluate Performance of a LoRaWAN Network

Authors: Emanuele A. Solagna, Ricardo S, Tozetto, Roberto dos S. Rabello

Abstract:

Nowadays, physical processes are becoming digitized by the evolution of communication, sensing and storage technologies which promote the development of smart cities. The evolution of this technology has generated multiple challenges related to the generation of big data and the active participation of electronic devices in society. Thus, devices can send information that is captured and processed over large areas, but there is no guarantee that all the obtained data amount will be effectively stored and correctly persisted. Because, depending on the technology which is used, there are parameters that has huge influence on the full delivery of information. This article aims to characterize the project, currently under development, of a platform that based on data science will perform a performance and effectiveness evaluation of an industrial network that implements LoRaWAN technology considering its main parameters configuration relating these parameters to the information loss.

Keywords: Internet of Things, LoRa, LoRaWAN, smart cities

Procedia PDF Downloads 132