World Academy of Science, Engineering and Technology
Computer and Information Engineering
Online ISSN : 1307-6892
2028 Performance Assessment of Multi-Level Ensemble for Multi-Class Problems
Authors: Rodolfo Lorbieski, Silvia Modesto Nassar
Abstract:
Many supervised machine learning tasks require decision making across numerous different classes. Multi-class classification has several applications, such as face recognition, text recognition and medical diagnostics. The objective of this article is to analyze an adapted Stacking method for multi-class problems, which combines ensembles within the ensemble itself. For this purpose, a training scheme similar to Stacking was used, but with three levels: the final decision-maker (level 2) performs its training by combining the outputs of a pair of meta-classifiers (level 1) from the tree-based and Bayesian families, which are in turn trained by pairs of base classifiers (level 0) of the same family. This strategy seeks to promote diversity among the ensembles forming the level-2 meta-classifier. Three performance measures were used: (1) accuracy, (2) area under the ROC curve, and (3) time, for three factors: (a) datasets, (b) experiments and (c) levels. To compare the factors, a three-way ANOVA test was run for each performance measure, considering 5 datasets by 25 experiments by 3 levels. A triple interaction between factors was observed only for time. Accuracy and area under the ROC curve presented similar results, showing a double interaction between level and experiment, as well as for the dataset factor. It was concluded that level 2 had an average performance above the other levels and that the proposed method is especially efficient for multi-class problems when compared to binary problems.
Keywords: stacking, multi-layers, ensemble, multi-class
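A minimal sketch of the three-level idea, assuming scikit-learn; the tree-based and Bayesian pairings, the final estimators, and the iris dataset are illustrative stand-ins rather than the authors' exact configuration:

```python
# Sketch of a three-level stacked ensemble (assumes scikit-learn).
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier, RandomForestClassifier, ExtraTreesClassifier
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)  # a small multi-class problem

# Level 0 -> level 1: one meta-classifier per family (tree-based, Bayesian),
# each trained on a pair of base classifiers of the same family.
tree_meta = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
                ("et", ExtraTreesClassifier(n_estimators=50, random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000))
bayes_meta = StackingClassifier(
    estimators=[("gnb", GaussianNB()), ("mnb", MultinomialNB())],
    final_estimator=LogisticRegression(max_iter=1000))

# Level 2: the final decision-maker combines the two level-1 meta-classifiers.
level2 = StackingClassifier(
    estimators=[("trees", tree_meta), ("bayes", bayes_meta)],
    final_estimator=LogisticRegression(max_iter=1000))

print("CV accuracy:", cross_val_score(level2, X, y, cv=5).mean())
```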
Procedia PDF Downloads 269
2027 Machine Learning Approach for Stress Detection Using Wireless Physical Activity Tracker
Authors: B. Padmaja, V. V. Rama Prasad, K. V. N. Sunitha, E. Krishna Rao Patro
Abstract:
Stress is a psychological condition that reduces the quality of sleep and affects every facet of life. Constant exposure to stress is detrimental not only to the mind but also to the body. To cope with stress, however, one must first identify it. This paper provides an effective method for cognitive stress level detection using data from a Fitbit physical activity tracker. This device gathers people's daily data on food, weight, sleep, heart rate, and physical activity. In this paper, four major stressors (physical activity, sleep pattern, working hours, and change in heart rate) are used to assess the stress levels of individuals. The main aim of this system is to use a machine learning approach to stress detection with the help of smartphone sensor technology. The effect of each stressor is first evaluated individually using logistic regression; a combined model is then built and assessed using variants of ordinal logistic regression: logit, probit and complementary log-log. The quality of each model is evaluated using the Akaike Information Criterion (AIC), and probit is assessed as the more suitable model for our dataset. The system is evaluated in a real-time environment using data from adults working in IT and other sectors in India. The novelty of this work lies in making the stress detection system as unobtrusive as possible for its users.
Keywords: physical activity tracker, sleep pattern, working hours, heart rate, smartphone sensor
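A hedged sketch of the model-comparison step, assuming statsmodels (OrderedModel supports logit and probit links out of the box; a complementary log-log link would need a custom distribution). The synthetic stressor data is invented for illustration:

```python
# Sketch of comparing ordinal-regression links by AIC (assumes statsmodels >= 0.12).
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "steps": rng.normal(8000, 2500, n),       # physical activity
    "sleep_hours": rng.normal(6.5, 1.2, n),
    "work_hours": rng.normal(8.5, 1.5, n),
    "hr_change": rng.normal(5, 3, n),         # change in resting heart rate
})
latent = (-0.0002 * df["steps"] - 0.4 * df["sleep_hours"]
          + 0.3 * df["work_hours"] + 0.2 * df["hr_change"]
          + rng.normal(0, 1, n))
# Three ordered stress levels derived from the latent score.
df["stress"] = pd.cut(latent, bins=3, labels=["low", "medium", "high"])

X = df[["steps", "sleep_hours", "work_hours", "hr_change"]]
for distr in ("logit", "probit"):             # cloglog: custom distr needed
    res = OrderedModel(df["stress"], X, distr=distr).fit(method="bfgs", disp=False)
    print(distr, "AIC:", res.aic)
# The link with the lower AIC is preferred; the paper reports probit.
```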
Procedia PDF Downloads 256
2026 Parameter Tuning of Complex Systems Modeled in Agent Based Modeling and Simulation
Authors: Rabia Korkmaz Tan, Şebnem Bora
Abstract:
The major problem encountered when modeling complex systems with agent-based modeling and simulation techniques is the existence of large parameter spaces. A complex system model cannot be expected to reflect the whole of the real system, but by specifying the most appropriate parameters, the actual system can be represented by the model under certain conditions. A review of studies conducted in recent years shows that there are few studies on the parameter tuning problem in agent-based simulations, and that these studies have focused on tuning the parameters of a single model. In this study, a parameter tuning approach is proposed that uses metaheuristic algorithms such as the Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Artificial Bee Colony (ABC), and Firefly Algorithm (FA). With this hybrid structure, the parameter tuning problems of models from different fields were solved. The proposed approach was tested on two different models, and its performance on the different problems was compared. The simulations and results reveal that the proposed approach outperforms existing parameter tuning studies.
Keywords: parameter tuning, agent based modeling and simulation, metaheuristic algorithms, complex systems
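As an illustration of the tuning loop, a minimal particle swarm optimizer in plain NumPy; the fitness function is a hypothetical stand-in for running the agent-based simulation and scoring its deviation from observed system behaviour:

```python
# Minimal PSO for model-parameter tuning (pure NumPy).
import numpy as np

def fitness(params):
    # Hypothetical error surface; replace with |simulation(params) - target|.
    return np.sum((params - np.array([0.3, 1.7, -0.5])) ** 2)

def pso(dim=3, n_particles=20, iters=100, lo=-5.0, hi=5.0,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lo, hi, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Inertia + cognitive pull to personal best + social pull to global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([fitness(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

best, err = pso()
print("tuned parameters:", best, "error:", err)
```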
Procedia PDF Downloads 226
2025 Myanmar Character Recognition Using Eight Direction Chain Code Frequency Features
Authors: Kyi Pyar Zaw, Zin Mar Kyu
Abstract:
Character recognition is the process of converting a text image file into an editable and searchable text file. Feature extraction is the heart of any character recognition system, and the recognition rate may be low or high depending on the extracted features. In this paper, 25 features per character are used. There are three basic steps in character recognition: character segmentation, feature extraction and classification. In the segmentation step, a horizontal cropping method is used for line segmentation and a vertical cropping method for character segmentation. In the feature extraction step, features are extracted in two ways. First, 8 features are extracted from the entire input character using eight-direction chain code frequency extraction. Second, the input character is divided into 16 blocks; for each block, 8 feature values are obtained through the same eight-direction chain code frequency extraction method, and their sum is defined as one feature for that block, giving 16 block features. The number-of-holes feature is used to cluster similar characters. With these features, almost all common Myanmar characters can be recognized across various font sizes. All 25 features are used in both the training and testing phases. In the classification step, characters are classified by matching all features of the input character against the already trained character features.
Keywords: chain code frequency, character recognition, feature extraction, features matching, segmentation
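A sketch of the chain-code feature computation under stated assumptions: the boundary is already traced into an ordered pixel list, and hole counting (the 25th feature) is omitted:

```python
# Eight-direction (Freeman) chain-code frequency features:
# 8 whole-character features plus 16 block sums; holes would make 25.
import numpy as np

# Direction codes 0..7 for neighbour offsets (dx, dy), counter-clockwise.
DIRS = {(1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
        (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7}

def chain_code_hist(boundary):
    """boundary: ordered list of (x, y) pixels around the character."""
    hist = np.zeros(8)
    for (x0, y0), (x1, y1) in zip(boundary, boundary[1:]):
        hist[DIRS[(x1 - x0, y1 - y0)]] += 1
    return hist

def block_features(boundary, width, height, grid=4):
    """One feature per block: the sum of its 8 direction counts, which
    equals the number of chain-code moves starting inside the block."""
    feats = np.zeros(grid * grid)
    for (x0, y0), _ in zip(boundary, boundary[1:]):
        bx = min(x0 * grid // width, grid - 1)
        by = min(y0 * grid // height, grid - 1)
        feats[by * grid + bx] += 1
    return feats

# Toy boundary of a 4x4 square "character" (clockwise walk).
square = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3),
          (2, 3), (1, 3), (0, 3), (0, 2), (0, 1), (0, 0)]
print(chain_code_hist(square))          # 8 global features
print(block_features(square, 4, 4))     # 16 block features
```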
Procedia PDF Downloads 320
2024 SENSE-SEAT: Improving Creativity and Productivity through the Redesign of a Multisensory Technological Office Chair
Authors: Fernando Miguel Campos, Carlos Ferreira, João Pestana, Pedro Campos, Nils Ehrenberg, Wojciech Hydzik
Abstract:
The current trend of organizations offering their workers open-office spaces and co-working offices is aimed at stimulating teamwork and collaboration. However, this does not always hold, as these kinds of spaces bring other challenges that compromise workers' productivity and creativity. We present an approach for improving creativity and productivity in the workspace by redesigning an office chair to incorporate subtle technological elements that help users focus, relax and be more productive and creative. This sheds light on how we can better design interactive furniture for such popular contexts, as we develop this new chair through a multidisciplinary approach combining ergonomics, interior design, interaction design, hardware and software engineering, and psychology.
Keywords: creativity, co-working, ergonomics, human-computer interaction, interaction, interactive furniture, productivity
Procedia PDF Downloads 330
2023 Object Detection in Digital Images under Non-Standardized Conditions Using Illumination and Shadow Filtering
Authors: Waqqas-ur-Rehman Butt, Martin Servin, Marion Pause
Abstract:
In recent years, object detection has gained much attention and become a very active research area in the field of computer vision. Robust detection of object boundaries in an image is demanded in numerous applications of human-computer interaction and automated surveillance systems. Many methods and approaches have been developed for automatic object detection in various fields, such as automotive, quality control management and environmental services. Unfortunately, to the best of our knowledge, object detection under uneven illumination with shadow consideration has not been well solved yet; this problem is also one of the major hurdles keeping object detection methods from practical application. This paper presents an approach to automatic object detection in images under non-standardized environmental conditions. A key challenge is how to detect the object, particularly under uneven illumination. Because image capturing conditions vary, the algorithms need to account for a variety of possible environmental factors, as colour information, lighting and shadows vary from image to image. Existing methods mostly fail to produce appropriate results due to variation in colour information, lighting effects, threshold specifications, histogram dependencies and colour ranges. To overcome these limitations, we propose an object detection algorithm, with pre-processing methods, that reduces the interference caused by shadow and illumination effects without fixed parameters. We use the YCrCb colour model without any specific colour ranges or predefined threshold values. The segmented object regions are further classified using morphological operations (erosion and dilation) and contours. The proposed approach was applied to a large image data set acquired under various environmental conditions for wood stack detection. Experiments show the promising results of the proposed approach in comparison with existing methods.
Keywords: image processing, illumination equalization, shadow filtering, object detection
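A minimal sketch of such a pipeline, assuming OpenCV; equalizing luminance and thresholding the Cr channel with Otsu (so no hard-coded ranges) and the input file name are illustrative choices, not the paper's exact algorithm:

```python
# Shadow/illumination-tolerant segmentation sketch in YCrCb (assumes OpenCV).
import cv2
import numpy as np

img = cv2.imread("wood_stack.jpg")                # hypothetical input image
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)

# Illumination equalization on the luminance channel softens uneven lighting.
ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
y, cr, cb = cv2.split(ycrcb)

# Otsu picks the chroma threshold per image, so no value is hard-coded.
_, mask = cv2.threshold(cr, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Morphological clean-up: erosion removes shadow speckle, dilation restores
# object extent; contours then give candidate object regions.
kernel = np.ones((5, 5), np.uint8)
mask = cv2.dilate(cv2.erode(mask, kernel), kernel)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
regions = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 500]
print(len(regions), "candidate object regions")
```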
Procedia PDF Downloads 216
2022 Lip Localization Technique for Myanmar Consonants Recognition Based on Lip Movements
Authors: Thein Thein, Kalyar Myo San
Abstract:
Lip reading systems are one of the supportive technologies for hearing-impaired people, elderly people and non-native speakers. For normal-hearing persons in noisy environments, or in conditions where the audio signal is not available, lip reading techniques can be used to increase their understanding of spoken language. Hearing-impaired persons have used lip reading techniques as important tools to find out what was said by other people without hearing their voices. Visual speech information is therefore important and has become an active research area. Using visual information from lip movements can improve the accuracy and robustness of a speech recognition system, and the need for lip reading systems is ever increasing for every language. However, recognizing lip movement is a difficult task because the region of interest (ROI) is nonlinear and noisy. Therefore, this paper proposes a method to detect the accurate lip shape and to localize lip movement for automatic lip tracking, using a combination of the Otsu global thresholding technique and the Moore neighborhood tracing algorithm. The proposed method achieves accurate lip localization and tracking, which is useful for speech recognition. In this work, experiments are carried out on automatically localizing the lip shape for Myanmar consonants using only visual information from lip movements, which is useful for visual speech processing of the Myanmar language.
Keywords: lip reading, lip localization, lip tracking, Moore neighborhood tracing algorithm
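A sketch of the two-stage boundary step, combining OpenCV's Otsu thresholding with a hand-rolled Moore neighbourhood tracer (simplified stopping criterion); the mouth ROI crop is assumed to be done upstream:

```python
# Otsu global thresholding + Moore neighbourhood boundary tracing (sketch).
import cv2

# 8 neighbours of (row, col) in clockwise order, starting from the west.
CLOCKWISE = [(0, -1), (-1, -1), (-1, 0), (-1, 1),
             (0, 1), (1, 1), (1, 0), (1, -1)]

def moore_trace(mask):
    rows, cols = mask.shape
    start = None
    for r in range(rows):                      # raster scan for a start pixel
        for c in range(cols):
            if mask[r, c]:
                start = (r, c)
                break
        if start:
            break
    if start is None:
        return []
    boundary, cur = [start], start
    prev = (start[0], start[1] - 1)            # we "entered" from the west
    while True:
        i = CLOCKWISE.index((prev[0] - cur[0], prev[1] - cur[1]))
        nxt = None
        for k in range(1, 9):                  # walk clockwise from prev
            j = (i + k) % 8
            r, c = cur[0] + CLOCKWISE[j][0], cur[1] + CLOCKWISE[j][1]
            if 0 <= r < rows and 0 <= c < cols and mask[r, c]:
                nxt = (r, c)
                prev = (cur[0] + CLOCKWISE[(j - 1) % 8][0],
                        cur[1] + CLOCKWISE[(j - 1) % 8][1])
                break
        if nxt is None or nxt == start:        # isolated pixel or full loop
            break
        boundary.append(nxt)
        cur = nxt
    return boundary

roi = cv2.imread("mouth_roi.png", cv2.IMREAD_GRAYSCALE)  # hypothetical crop
_, binary = cv2.threshold(roi, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
lip_boundary = moore_trace(binary > 0)
print("boundary length:", len(lip_boundary))
```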
Procedia PDF Downloads 352
2021 RGB Color Based Real Time Traffic Sign Detection and Feature Extraction System
Authors: Kay Thinzar Phu, Lwin Lwin Oo
Abstract:
In intelligent transport systems and advanced driver assistance systems, the development of a real-time traffic sign detection and recognition (TSDR) system plays an important part in current research. There are many challenges in developing a real-time TSDR system, due to motion artifacts, variable lighting and weather conditions, and the varied situations of traffic signs. Researchers have already proposed various methods to mitigate these challenges. The aim of the proposed research is to develop an efficient and effective TSDR system that works in real time. The system uses an adaptive thresholding method based on RGB color for traffic sign detection and new features for traffic sign recognition. RGB color thresholding is used to detect blue and yellow traffic sign regions, and shape identification then decides whether a candidate region is a traffic sign or not. Lastly, new features such as termination points, bifurcation points, and 90° angles are extracted from the validated image. The system uses a Myanmar traffic sign dataset.
Keywords: adaptive thresholding based on RGB color, blue color detection, feature extraction, yellow color detection
Procedia PDF Downloads 313
2020 Lecture Video Indexing and Retrieval Using Topic Keywords
Authors: B. J. Sandesh, Saurabha Jirgi, S. Vidya, Prakash Eljer, Gowri Srinivasa
Abstract:
In this paper, we propose a framework to help users search for and retrieve the portions of a lecture video that interest them. This is achieved by temporally segmenting and indexing the lecture video using topic keywords. For this purpose, we use text transcribed from the video together with documents relevant to the video topic extracted from the web. The keywords for indexing are found by applying non-negative matrix factorization (NMF) topic modeling to the web documents. Our technique first creates indices on the transcribed documents using the topic keywords, and these are mapped to the video to find the start and end times of the portions of the video for a particular topic. This time information is stored in the index table along with the topic keyword, which is used to retrieve the specific portions of the video for a user's query.
Keywords: video indexing and retrieval, lecture videos, content based video search, multimodal indexing
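A minimal sketch of the keyword-extraction step, assuming scikit-learn; the four-sentence corpus stands in for the web documents:

```python
# Extracting indexing keywords with NMF topic modeling (assumes scikit-learn).
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "gradient descent minimizes a loss function by iterative updates",
    "stochastic gradient descent uses mini batches of training data",
    "convolutional networks apply filters to images for feature maps",
    "pooling layers downsample feature maps in convolutional networks",
]
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)

nmf = NMF(n_components=2, random_state=0)       # 2 topics for the toy corpus
W = nmf.fit_transform(X)                        # document-topic weights
terms = tfidf.get_feature_names_out()
for t, comp in enumerate(nmf.components_):
    top = comp.argsort()[-4:][::-1]             # 4 strongest terms per topic
    print(f"topic {t} keywords:", [terms[i] for i in top])
print("dominant topic per document:", W.argmax(axis=1))
# The per-topic keywords are then matched against transcript segments to
# assign start/end times in the index table.
```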
Procedia PDF Downloads 250
2019 An Integrated Lightweight Naïve Bayes Based Webpage Classification Service for Smartphone Browsers
Authors: Mayank Gupta, Siba Prasad Samal, Vasu Kakkirala
Abstract:
The internet world and its priorities have changed considerably in the last decade. Browsing on smartphones has increased manifold and is set to grow much more. Users spend considerable time browsing different websites, which gives a great deal of insight into their preferences. Instead of presenting plain information, classifying different aspects of browsing such as bookmarks, history, and the download manager into useful categories would improve and enhance the user's experience. Most classification solutions are server-side, which involves maintaining servers and other heavy resources; they have security constraints and may miss contextual data during classification. On-device classification solves many such problems, but the challenge is achieving classification accuracy under resource constraints. On-device classification can be much more useful for personalization, reducing dependency on cloud connectivity, and better privacy/security. This approach provides more relevant results than current standalone solutions because it uses the content rendered by the browser, which is customized by the content provider based on the user's profile. This paper proposes a Naive Bayes based lightweight classification engine targeted at resource-constrained devices. Our solution integrates with the web browser, which in turn triggers the classification algorithm. Whenever a user browses a webpage, the solution extracts DOM tree data from the browser's rendering engine. This DOM data is dynamic, contextual and secure data that cannot be replicated. The proposal extracts different features of the webpage, which are fed to an algorithm that classifies the page into multiple categories. A Naive Bayes based engine is chosen for its inherent advantage of using limited resources compared with other classification algorithms such as Support Vector Machines and Neural Networks: Naive Bayes classification requires a small memory footprint and little computation, suitable for a smartphone environment. The solution can also partition the model into multiple chunks, which reduces memory usage because the complete model need not be loaded at once. Classification of webpages through the integrated engine is faster, more relevant and more energy efficient than other standalone on-device solutions. The classification engine has been tested on Samsung Z3 Tizen hardware, integrated into the Tizen Browser, which uses the Chromium rendering engine. For this solution, an extensive dataset was sourced from dmoztools.net and cleaned; the cleaned dataset has 227.5K webpages divided into 8 generic categories ('education', 'games', 'health', 'entertainment', 'news', 'shopping', 'sports', 'travel'). Our browser-integrated solution resulted in 15% less memory usage (due to the partition method) and 24% less power consumption in comparison with a standalone solution. The solution used 70% of the dataset for training the data model and the remaining 30% for testing. An average accuracy of ~96.3% is achieved across the above-mentioned 8 categories. The engine can be further extended to suggest dynamic tags and to apply the classification to other use cases that enhance the browsing experience.
Keywords: chromium, lightweight engine, mobile computing, Naive Bayes, Tizen, web browser, webpage classification
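A hedged sketch of the classification core (multinomial Naive Bayes over text counts with a 70/30 split, as in the paper's evaluation); the snippets stand in for DOM-extracted page text, and the model-partitioning and DOM-extraction machinery is omitted:

```python
# On-device-style webpage classifier sketch (assumes scikit-learn).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

pages = ["latest football scores and league table",
         "breaking news on the election results",
         "buy discounted shoes free shipping",
         "live cricket match updates and highlights",
         "cart checkout payment options",
         "headline coverage of world events"]
labels = ["sports", "news", "shopping", "sports", "shopping", "news"]

vec = CountVectorizer()
X = vec.fit_transform(pages)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3,
                                          random_state=0)
# Naive Bayes keeps a small footprint: only word counts and class priors.
clf = MultinomialNB().fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```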
Procedia PDF Downloads 163
2018 Development of the Academic Model to Predict Student Success at VUT-FSASEC Using Decision Trees
Authors: Langa Hendrick Musawenkosi, Twala Bhekisipho
Abstract:
The success or failure of students is a concern for every academic institution, college, university, government, and for students themselves. Several approaches have been researched to address this concern. In this paper, the view is held that when a student enters a university, college or other academic institution, he or she enters an academic environment. The academic environment is a unique concept used to develop a solution for making predictions effectively. This paper presents a model to determine the propensity of a student to succeed or fail at the French South African Schneider Electric Education Center (FSASEC) at the Vaal University of Technology (VUT). The decision tree algorithm is used to implement the model at FSASEC.
Keywords: FSASEC, academic environment model, decision trees, k-nearest neighbor, machine learning, popularity index, support vector machine
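A minimal sketch of the decision-tree model, assuming scikit-learn; the feature names and records are hypothetical stand-ins for FSASEC admission data:

```python
# Decision-tree pass/fail predictor sketch (assumes scikit-learn, pandas).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "entry_math":    [55, 72, 40, 90, 65, 48, 80, 35],
    "entry_science": [60, 75, 45, 85, 70, 50, 78, 38],
    "attendance":    [0.7, 0.9, 0.5, 0.95, 0.8, 0.6, 0.9, 0.4],
    "passed":        [0, 1, 0, 1, 1, 0, 1, 0],
})
X, y = data.drop(columns="passed"), data["passed"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X_tr, y_tr)
print(export_text(tree, feature_names=list(X.columns)))  # readable rules
print("held-out accuracy:", tree.score(X_te, y_te))
```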
Procedia PDF Downloads 200
2017 A Study on the Waiting Time for the First Employment of Arts Graduates in Sri Lanka
Authors: Imali T. Jayamanne, K. P. Asoka Ramanayake
Abstract:
The transition from tertiary education to employment is one of the challenges many fresh university graduates face after graduation. The transition period, or waiting time to obtain first employment, varies with socio-economic factors and the general characteristics of a graduate. Compared to graduates in other fields of study, Arts graduates in Sri Lanka have to wait a long time to find their first employment. The objective of this study is to identify the determinants of the transition from higher education to employment for these graduates using survival models. The study is based on a survey conducted in 2016 on a stratified random sample of Arts graduates from Sri Lankan universities who had graduated in 2012. Among the 469 responses, 36 (8%) waiting times were interval censored and 13 (3%) were right censored. Waiting time for first employment varied between zero and 51 months. Initially, the log-rank and Gehan-Wilcoxon tests were performed to identify the significant factors. Gender, ethnicity, GCE Advanced Level English grade, civil status, university, class received, degree type, sector of first employment, type of first employment and the educational qualifications required for the first employment were significant at 10%. The Cox proportional hazards model was fitted to model the waiting time for first employment with these significant factors. All factors except ethnicity and type of employment were significant at 5%. However, since the proportional hazards assumption was violated, the lognormal accelerated failure time (AFT) model was fitted instead. The same factors were significant in the AFT model as in the Cox proportional hazards model.
Keywords: AFT model, first employment, proportional hazard, survey design, waiting time
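A sketch of the two survival fits, assuming the lifelines package; columns and records are invented, and the interval-censored observations would need lifelines' interval-censoring variants rather than the plain fits shown:

```python
# Cox PH and lognormal AFT fits on waiting-time data (assumes lifelines).
import pandas as pd
from lifelines import CoxPHFitter, LogNormalAFTFitter

df = pd.DataFrame({
    "months":  [3, 12, 24, 6, 51, 18, 9, 30],   # waiting time in months
    "event":   [1, 1, 1, 1, 0, 1, 1, 1],        # 0 = right censored
    "female":  [1, 0, 1, 1, 0, 1, 0, 1],
    "english_grade": [2, 1, 3, 2, 3, 1, 2, 3],
})

cox = CoxPHFitter().fit(df, duration_col="months", event_col="event")
cox.print_summary()
# If cox.check_assumptions(df) flags non-proportional hazards, fall back
# to the lognormal accelerated failure time model:
aft = LogNormalAFTFitter().fit(df, duration_col="months", event_col="event")
aft.print_summary()
```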
Procedia PDF Downloads 312
2016 Parallelization of Random Accessible Progressive Streaming of Compressed 3D Models over Web
Authors: Aayushi Somani, Siba P. Samal
Abstract:
Three-dimensional (3D) meshes are data structures which store geometric information about an object or scene, generally in the form of vertices and edges. Current laser scanning and other geometric data acquisition technologies capture high-resolution samples, which lead to high-resolution meshes. While high-resolution meshes give better-quality rendering and hence are often preferred, the processing as well as the storage of 3D meshes is resource-intensive. At the same time, web applications for data processing have become ubiquitous owing to their accessibility. For 3D meshes, the advancement of 3D web technologies such as WebGL and WebVR has enabled high-fidelity rendering of huge meshes. However, there is a gap in the ability to stream huge meshes to native client and browser applications due to high network latency, and there is an inherent delay in loading WebGL pages with large and complex models. The focus of our work is to identify the challenges faced when such meshes are streamed into and processed on hand-held devices with limited resources. One solution conventionally used in the graphics community to alleviate resource limitations is mesh compression. Our approach is a two-step method for random-accessible progressive compression and its parallel implementation. The first step partitions the original mesh into multiple sub-meshes, and we then invoke data parallelism on these sub-meshes for compression. Subsequent threaded decompression logic is implemented inside the web browser engine by modifying the WebGL implementation in the open-source Chromium engine. This concept can be used to revolutionize the way e-commerce and virtual reality technology work on consumer electronic devices: objects can be compressed on the server and transmitted over the network, and progressive decompression can be performed and rendered on the client device. The multiple views currently used on e-commerce sites for viewing the same product from different angles can be replaced by a single progressive model for a smoother user experience. The approach can also be used in WebVR for widely used activities such as virtual reality shopping, watching movies and playing games. Our experiments and comparison with existing techniques show encouraging results in terms of latency (compressed size is ~10-15% of the original mesh), processing time (20-22% increase over a serial implementation) and quality of user experience in the web browser.
Keywords: 3D compression, 3D mesh, 3D web, chromium, client-server architecture, e-commerce, level of details, parallelization, progressive compression, WebGL, WebVR
Procedia PDF Downloads 170
2015 An Enhanced Digital Forensic Model for Internet of Things Forensic
Authors: Tina Wu, Andrew Martin
Abstract:
The expansion of the Internet of Things (IoT) brings a new level of threat. Attacks on IoT are already being used by criminals to form botnets, launch Distributed Denial of Service (DDoS) attacks and distribute malware. This opens a whole new digital forensic arena in which forensic methodologies must be developed in order to investigate IoT-related crimes. However, existing proposed IoT forensic models are still premature, requiring further improvement and validation; many lack detail on the acquisition and analysis phases. This paper proposes an enhanced theoretical IoT digital forensic model focused on identifying and acquiring the main sources of evidence in a methodical way. In addition, it presents a theoretical acquisition framework of the different stages required to acquire evidence from IoT devices.
Keywords: acquisition, Internet of Things, model, zoning
Procedia PDF Downloads 271
2014 Hash Based Block Matching for Digital Evidence Image Files from Forensic Software Tools
Abstract:
Internet use, intelligent communication tools, and social media have all become an integral part of our daily life as a result of rapid developments in information technology. However, this widespread use increases crimes committed in the digital environment. Therefore, digital forensics, which deals with the various crimes committed in the digital environment, has become an important research topic. Investigating digital evidence such as computers, cell phones, hard disks and DVDs, and reporting whether it contains any crime-related elements, is within the scope of digital forensics. Many software and hardware tools have been developed for use in the digital evidence acquisition process. Today, the most widely used digital evidence investigation tools are based on the principle of finding all the data on the digital evidence that match specified criteria and presenting them to the investigator (e.g., text files, or files starting with the letter A). Digital forensics experts then carry out data analysis to figure out whether these data are related to a potential crime. Examination of a 1 TB hard disk may take hours or even days, depending on the expertise and experience of the examiner, and because the process depends on the examiner's experience, the overall result may vary between cases, with some elements overlooked. In this study, a hash-based matching and digital evidence evaluation method is proposed that aims to automatically classify evidence containing criminal elements, thereby shortening the digital evidence examination process and preventing human errors.
Keywords: block matching, digital evidence, hash list, evaluation of digital evidence
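A minimal sketch of the proposed idea, hashing fixed-size blocks of a raw evidence image and matching them against a known-bad hash list; the block size and the inline digest are assumptions:

```python
# Hash-based block matching over a raw evidence image (pure stdlib).
import hashlib

BLOCK_SIZE = 4096  # bytes; a sector-aligned block size is an assumption

def block_hashes(path, block_size=BLOCK_SIZE):
    """Yield (offset, sha256-hex) for every block of the evidence file."""
    with open(path, "rb") as f:
        offset = 0
        while True:
            block = f.read(block_size)
            if not block:
                break
            yield offset, hashlib.sha256(block).hexdigest()
            offset += len(block)

def match_evidence(image_path, known_bad):
    """Return offsets whose block digest appears in the known-bad set."""
    return [off for off, digest in block_hashes(image_path)
            if digest in known_bad]

# In practice known_bad is loaded from a curated hash list (one digest per
# line); a placeholder digest is used here purely for illustration.
known_bad = {"0" * 64}
hits = match_evidence("evidence.dd", known_bad)
print(f"{len(hits)} blocks matched the hash list")
```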
Procedia PDF Downloads 255
2013 Design of Two-Channel Quadrature Mirror Filter Banks Using a Transformation Approach
Authors: Ju-Hong Lee, Yi-Lin Shieh
Abstract:
Two-dimensional (2-D) quadrature mirror filter (QMF) banks have been widely considered for high-quality coding of image and video data at low bit rates. For subband coding, a 2-D QMF bank is required to have an exactly linear-phase response without magnitude distortion, i.e., the perfect reconstruction (PR) characteristic. The design problem of 2-D QMF banks with the PR characteristic has been considered in the literature for many years. This paper presents a transformation approach for designing 2-D two-channel QMF banks. Under a suitable one-dimensional (1-D) to two-dimensional (2-D) transformation with a specified decimation/interpolation matrix, the analysis and synthesis filters of the QMF bank are composed of 1-D causal and stable digital allpass filters (DAFs) and possess the 2-D doubly complementary half-band (DC-HB) property. This reduces the design problem of the two-channel QMF bank to finding the real coefficients of the 1-D recursive DAFs. The design problem is formulated as a minimax phase approximation for the 1-D DAFs, and a novel objective function is derived for the 1-D minimax phase approximation. As a result, minimizing the objective function can be solved simply by using the well-known weighted least-squares (WLS) algorithm in the minimax (L∞) optimal sense. The novelty of the proposed design method is that the design procedure is very simple, and the designed 2-D QMF bank achieves a perfect magnitude response and a satisfactory phase response. Simulation results show that the proposed design method provides much better design performance with much less design complexity compared with existing techniques.
Keywords: quincunx QMF bank, doubly complementary filter, digital allpass filter, WLS algorithm
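For reference, the textbook 1-D doubly complementary pair built from stable allpass filters, which is the building block that the 1-D-to-2-D transformation maps onto the 2-D DC-HB structure (a standard identity, not reproduced from the paper):

```latex
% 1-D doubly complementary pair from causal stable allpass filters
H_0(z) = \tfrac{1}{2}\,\bigl[A_0(z) + A_1(z)\bigr], \qquad
H_1(z) = \tfrac{1}{2}\,\bigl[A_0(z) - A_1(z)\bigr]
% Since |A_0(e^{j\omega})| = |A_1(e^{j\omega})| = 1, the pair is both
% power-complementary and allpass-complementary (hence "doubly"):
\bigl|H_0(e^{j\omega})\bigr|^2 + \bigl|H_1(e^{j\omega})\bigr|^2 = 1, \qquad
\bigl|H_0(e^{j\omega}) + H_1(e^{j\omega})\bigr| = 1
```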
Procedia PDF Downloads 225
2012 Secure Optimized Ingress Filtering in Future Internet Communication
Authors: Bander Alzahrani, Mohammed Alreshoodi
Abstract:
Information-centric networking (ICN), in architectures such as the Publish-Subscribe Internet Technology (PURSUIT), has been proposed as a new networking model aiming to replace the current end-centric networking model of the Internet. This emerging model focuses on what is being exchanged rather than on which network entities are exchanging information, which allows control plane functions such as routing and host location to be specified according to the content items. The forwarding plane of the PURSUIT ICN architecture uses a simple and light mechanism based on Bloom filter technology to forward packets. Although this forwarding scheme solves many of today's Internet problems, such as routing table growth and scalability issues, it is vulnerable to brute-force attacks, which are a starting point for distributed denial-of-service (DDoS) attacks. In this work, we design and analyze a novel source-routing and information delivery technique that keeps the simplicity of Bloom filter-based forwarding while being able to deter attacks such as denial-of-service attacks at the ingress of the network. To achieve this, special forwarding nodes called Edge-FW are attached directly to end-user nodes and perform a security test for maliciously injected random packets at the ingress of the path, preventing brute-force attacks at an early stage. In this technique, a core entity of the PURSUIT ICN architecture called the topology manager, which is responsible for finding the shortest path and creating the forwarding identifier (FId), uses a cryptographically secure hash function to create a 64-bit hash, h, over the formed FId, included in the packet for authentication purposes. Our proposal prevents the attacker from injecting packets carrying random FIds with a high filling factor ρ by optimizing and reducing the maximum allowed filling factor ρm in the network. We optimize the FId to the minimum possible filling factor, with ρ ≤ ρm, while still supporting longer delivery trees, so network scalability is not affected by the chosen ρm. With this scheme, the filling factor of a legitimate FId never exceeds ρm, while the filling factor of illegitimate FIds cannot exceed the chosen small value of ρm either. Therefore, injecting a packet containing an FId with a large filling factor, to achieve a higher attack probability, is no longer possible. Preliminary analysis indicates that with the designed scheme, the forwarding function can detect and prevent malicious activities such as DDoS attacks at an early stage and with very high probability.
Keywords: forwarding identifier, filling factor, information centric network, topology manager
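A toy sketch of Bloom-filter forwarding identifiers with a filling-factor cap at the Edge-FW; the filter width, bits per link, cap value, and keyed-hash construction are all assumptions, not the paper's parameters:

```python
# Bloom-filter FIds with a filling-factor cap and a truncated auth hash.
import hashlib
import secrets

M, K, RHO_M = 256, 5, 0.4          # filter width, bits per link, max fill
SECRET = b"topology-manager-key"   # assumed shared with the topology manager

def link_id():
    """A random k-bit link identifier inside an m-bit filter."""
    bits = 0
    while bin(bits).count("1") < K:
        bits |= 1 << secrets.randbelow(M)
    return bits

def build_fid(path_links):
    fid = 0
    for lid in path_links:          # OR the link IDs along the path
        fid |= lid
    return fid

def fill_factor(fid):
    return bin(fid).count("1") / M

def auth_tag(fid):
    # 64-bit hash over the FId, mirroring the paper's authentication step.
    return hashlib.sha256(SECRET + fid.to_bytes(M // 8, "big")).digest()[:8]

# Edge-FW ingress test: reject over-filled or unauthenticated FIds.
fid = build_fid([link_id() for _ in range(6)])
tag = auth_tag(fid)
ok = fill_factor(fid) <= RHO_M and tag == auth_tag(fid)
print(f"fill={fill_factor(fid):.2f}, accepted={ok}")

attack_fid = (1 << M) - 1           # brute-force all-ones FId
print("attack accepted:", fill_factor(attack_fid) <= RHO_M)  # False
```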
Procedia PDF Downloads 154
2011 Digital Image Forensics: Discovering the History of Digital Images
Authors: Gurinder Singh, Kulbir Singh
Abstract:
Digital multimedia content such as images, video, and audio can be tampered with easily due to the availability of powerful editing software. Multimedia forensics is devoted to analyzing such content using various digital forensic techniques in order to validate its authenticity. Digital image forensics investigates the reliability of digital images by analyzing the integrity of the data and by reconstructing the historical information of an image related to its acquisition phase. In this paper, a survey of forgery detection is carried out, considering the most recent and promising digital image forensic techniques.
Keywords: computer forensics, multimedia forensics, image ballistics, camera source identification, forgery detection
Procedia PDF Downloads 247
2010 Extracting Opinions from Big Data of Indonesian Customer Reviews Using Hadoop MapReduce
Authors: Veronica S. Moertini, Vinsensius Kevin, Gede Karya
Abstract:
Customer reviews are collected by many kinds of e-commerce websites selling products, services, hotel rooms, tickets and so on. Each website collects its own customer reviews, which can be crawled, collected and stored as big data. Text analysis techniques can then be used to analyze these data and produce summarized information, such as customer opinions. These opinions can be published by independent service-provider websites and used to help customers choose the most suitable products or services. As the opinions are extracted from big data of reviews originating from many websites, the results are expected to be more trusted and accurate. Indonesian customers write reviews in the Indonesian language, which comes with its own structure and uniqueness. We found that most reviews are expressed in "daily language", which is informal, does not follow correct grammar, and contains many abbreviations, slang and non-formal words. Hadoop is an emerging platform for storing and analyzing big data in distributed systems. A Hadoop cluster consists of master and slave nodes/computers operating in a network, and comes with the Hadoop distributed file system (HDFS) and the MapReduce framework for parallel computation. However, MapReduce is inefficient for iterative computations; specifically, the cost of reading/writing data (I/O cost) is high. Given this fact, we conclude that MapReduce is best suited to "one-pass" computations. In this research, we develop an efficient technique for extracting or mining opinions from big data of Indonesian reviews, based on MapReduce with one-pass computation. In designing the algorithm, we avoid iterative computation and instead adopt a "look-up table" technique. The stages of the proposed technique are: (1) crawling the review data from websites; (2) cleaning the raw reviews and finding root words; (3) computing the frequency of meaningful opinion words; (4) analyzing customer sentiments towards defined objects. Experiments evaluating the performance of the technique were conducted on a Hadoop cluster with 14 slave nodes. The results show that the proposed technique (stages 2 to 4) discovers useful opinions, processes big data efficiently, and is scalable.
Keywords: big data analysis, Hadoop MapReduce, analyzing text data, mining Indonesian reviews
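A sketch of the one-pass counting stage as an mrjob MapReduce job; the stopword and root-word tables are tiny illustrative stand-ins for the paper's Indonesian cleaning dictionaries:

```python
# One-pass opinion-word counting as a MapReduce job (assumes mrjob).
from mrjob.job import MRJob

STOPWORDS = {"yang", "dan", "di", "ini", "itu"}      # tiny illustrative list
ROOTS = {"bagusnya": "bagus", "murahnya": "murah"}   # slang/affix -> root

class MROpinionCount(MRJob):
    def mapper(self, _, line):
        for token in line.lower().split():
            token = ROOTS.get(token, token)          # look-up table, no iteration
            if token not in STOPWORDS:
                yield token, 1

    def reducer(self, word, counts):
        yield word, sum(counts)

if __name__ == "__main__":
    MROpinionCount.run()
# Run locally:    python opinion_count.py reviews.txt
# On the cluster: python opinion_count.py -r hadoop hdfs:///reviews/
```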
Procedia PDF Downloads 201
2009 Principal Components Updates via Matrix Perturbations
Authors: Aiman Elragig, Hanan Dreiwi, Dung Ly, Idriss Elmabrook
Abstract:
This paper highlights a new approach to online principal components analysis (OPCA). Given a data matrix X ∈ ℝ^(m×n), we characterize the online updates of its covariance as a matrix perturbation problem. Up to the principal components, it turns out that online updates of the batch PCA can be captured by a symmetric matrix perturbation of the batch covariance matrix. We show that as n → n0 ≫ 1, the batch covariance and its update become almost identical. Finally, we utilize our new setup of online updates to find a bound on the angle distance between the principal components of X and those of its update.
Keywords: online data updates, covariance matrix, online principal component analysis, matrix perturbation
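A NumPy sketch of the rank-one (Welford-style) covariance update behind the perturbation view, checked against recomputing the batch covariance from scratch:

```python
# Rank-one covariance update per streaming sample, then PC comparison.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))          # n0 = 200 rows seen so far
x_new = rng.normal(size=4)             # one streaming sample

n = X.shape[0]
mu = X.mean(axis=0)
S = (X - mu).T @ (X - mu)              # scatter matrix of the batch

# Online update: mean first, then a symmetric rank-one scatter perturbation.
mu_new = mu + (x_new - mu) / (n + 1)
S_new = S + np.outer(x_new - mu, x_new - mu_new)
C_new = S_new / n                      # sample covariance of n+1 rows (ddof=1)

# Verify against recomputing from scratch (deviation is float round-off).
C_batch = np.cov(np.vstack([X, x_new]), rowvar=False)
print("max deviation:", np.abs(C_new - C_batch).max())

# Angle distance between the leading principal components, old vs updated.
_, V_old = np.linalg.eigh(np.cov(X, rowvar=False))
_, V_new = np.linalg.eigh(C_new)
angle = np.arccos(abs(V_old[:, -1] @ V_new[:, -1]))
print("angle between leading PCs (rad):", angle)
```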
Procedia PDF Downloads 195
2008 Generation of Knowledge with Self-Learning Methods for Ophthalmic Data
Authors: Klaus Peter Scherer, Daniel Knöll, Constantin Rieder
Abstract:
Problem and Purpose: Intelligent systems are available and helpful for supporting the human decision process, especially when complex surgical eye interventions are necessary and must be performed. Normally, such a decision support system consists of a knowledge-based module, which is responsible for the real assistance power, provided by explanation and logical reasoning processes. The interview-based acquisition and generation of this complex knowledge is crucial, because there are many correlations between the complex parameters. In this project, (semi-)automated self-learning methods are therefore researched and developed to enhance the quality of such a decision support system. Methods: For ophthalmic data sets of real hospital patients, advanced data mining procedures are very helpful. In particular, subgroup analysis methods are developed, extended and used to analyze and discover the correlations and conditional dependencies within the structured patient data. After finding causal dependencies, a ranking must be performed to generate rule-based representations. For this, anonymized patient data are transformed into a special machine language format. The imported data serve as input to conditional probability algorithms that calculate the parameter distributions with respect to a given goal parameter. Results: Using knowledge discovery methods, operation- and patient-related correlations were produced. New knowledge was generated by finding causal relations between the operational equipment, the medical instances and patient-specific history through a dependency ranking process. After transformation into association rules, logic-based representations were available for the clinical experts to evaluate the new knowledge. The structured data sets cover about 80 parameters as characteristic features per patient. For extended patient groups (100, 300, 500), both single-target and multi-target values were set for the subgroup analysis, so the newly generated hypotheses could be interpreted with regard to their dependency on, or independence from, the number of patients. Conclusions: The aim and advantage of such a semi-automated self-learning process is the extension of the knowledge base by finding new parameter correlations. The discovered knowledge is transformed into association rules and serves as a rule-based representation of the knowledge in the knowledge base. Moreover, more than one goal parameter of interest can be considered by the semi-automated learning process. With ranking procedures, the strongest premises and conjunctively associated conditions can be found for concluding the goal parameter of interest. In this way, knowledge hidden in structured tables or lists can be extracted as a rule-based representation. This is real assistance power for communication with the clinical experts.
Keywords: expert system, knowledge-based support, ophthalmic decision support, self-learning methods
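A toy sketch of the subgroup-analysis core, ranking feature conditions by how far they shift the conditional probability of the goal parameter; column names are hypothetical stand-ins for the ~80 ophthalmic parameters:

```python
# Rank single-condition premises by conditional probability lift (pandas).
import pandas as pd

df = pd.DataFrame({
    "instrument":   ["A", "A", "B", "B", "A", "B", "A", "B"],
    "lens_type":    ["soft", "hard", "soft", "soft", "hard", "hard", "soft", "soft"],
    "complication": [0, 1, 0, 0, 1, 1, 0, 0],   # the goal parameter
})

p_goal = df["complication"].mean()               # baseline P(goal)
rows = []
for col in ("instrument", "lens_type"):
    for value, group in df.groupby(col):
        p_cond = group["complication"].mean()    # P(goal | premise)
        rows.append((f"{col} = {value}", p_cond, p_cond - p_goal))

ranking = pd.DataFrame(rows, columns=["premise", "P(goal|premise)", "lift"])
print(ranking.sort_values("lift", ascending=False))
# The strongest premises (largest lift) become antecedents of association
# rules such as "lens_type = hard -> complication".
```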
Procedia PDF Downloads 253
2007 Particle Swarm Optimization Algorithm vs. Genetic Algorithm for Image Watermarking Based Discrete Wavelet Transform
Authors: Omaima N. Ahmad AL-Allaf
Abstract:
Over communication networks, images can easily be copied and distributed illegally, so copyright protection for authors and owners is necessary. Digital watermarking techniques therefore play an important role as a solution to ownership and authenticity problems. Digital image watermarking techniques hide watermarks inside images to achieve copyright protection and prevent illegal copying. Watermarks need to be robust to attacks while maintaining data quality. In this paper we discuss two approaches to image watermarking: the first is based on Particle Swarm Optimization (PSO) and the second on a Genetic Algorithm (GA). The discrete wavelet transform (DWT) is used with both approaches in the embedding process to transform the cover image. Both PSO and GA use the correlation coefficient to find the high-energy coefficients of the original image at which to hide the watermark bits. Many experiments were conducted for the two approaches with different values of the PSO and GA parameters. In the experiments, the PSO approach obtained better results, with PSNR equal to 53 and MSE equal to 0.0039, whereas the GA approach obtained PSNR of 50.5 and MSE of 0.0048, using a population size of 100, 150 iterations and 3×3 blocks. From these results, we note that a small block size can affect the quality of PSO/GA-based image watermarking because it increases the search area in the watermarked image. Better PSO results were obtained with a swarm size of 100.
Keywords: image watermarking, genetic algorithm, particle swarm optimization, discrete wavelet transform
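A sketch of DWT-domain embedding plus the PSNR/MSE measures used in the comparison, assuming PyWavelets; the additive rule and strength alpha are illustrative, where the paper's PSO/GA would instead select the coefficients:

```python
# DWT-domain watermark embedding and PSNR/MSE quality check (assumes pywt).
import numpy as np
import pywt

rng = np.random.default_rng(0)
cover = rng.integers(0, 256, (64, 64)).astype(float)   # stand-in image
watermark_bits = rng.integers(0, 2, 32)

cA, (cH, cV, cD) = pywt.dwt2(cover, "haar")
flat = cD.flatten()
idx = np.argsort(-np.abs(flat))[:watermark_bits.size]  # high-energy coeffs
alpha = 2.0                                            # embedding strength
flat[idx] += alpha * (2 * watermark_bits - 1)          # +/- alpha per bit
cD_marked = flat.reshape(cD.shape)

marked = pywt.idwt2((cA, (cH, cV, cD_marked)), "haar")
mse = np.mean((cover - marked) ** 2)
psnr = 10 * np.log10(255.0 ** 2 / mse)
print(f"MSE={mse:.4f}, PSNR={psnr:.2f} dB")
```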
Procedia PDF Downloads 226
2006 E-Government Development in Nigeria, 'Bank Verification No': An Anti-Corruption Tool
Authors: Ernest C. Nwadinobi, Amanda Peart, Carl Adams
Abstract:
Leading countries such as the USA, the UK and some European countries have moved their focus beyond merely developing e-government platforms and the electronic services that give citizens and customers access to information; they have also made significant back-office changes to accommodate the electronic services they provide. E-government has moved from simply providing electronic information to citizens and customers towards serving their needs. In developing countries like Nigeria, the enablement of e-government is being used as an anti-corruption tool. The introduction of the Bank Verification Number (BVN) scheme by the Central Bank of Nigeria has helped the government not only save money but also protect customers' transactions and enhance confidence in the banking sector. This has helped curtail the high rate of cyber and financial crime that had been part of the system. The use of the BVN as an anti-corruption tool in Nigeria came at a time when there was a need for openness, accountability and discipline, after years of treasury looting and recklessness in handling finances. As there has been no defined method for measuring the strength or success of e-government development, in this case the BVN, in Nigeria, progress will remain at the same level. The implementation strategy of the BVN in Nigeria has mostly been a quick-fix, quick-win solution; in fact, there is little or no evidence of a framework for e-government. Like the leading countries, Nigeria needs proper implementation of strategy and a framework, especially a customer-oriented process that accommodates every administrative body of government institutions, including private business, rather than focusing on an inflexible organisational structure. The development of e-government must have a strategy and framework in order to work, and this strategy must encompass all public administration rather than being limited to individual bodies or organizations. A defined framework or monitoring method must be put in place to help evaluate and benchmark government progress in e-government, following consistent concepts and principles. Through critical analysis of the existing methods, this paper identifies areas that must be included in the existing approach in order to channel e-government development towards its defined strategic objectives.
Keywords: Bank Verification No (BVN), quick-fix, anti-corruption, quick-win
Procedia PDF Downloads 160
2005 Data Mining Algorithms Analysis: Case Study of Price Predictions of Lands
Authors: Julio Albuja, David Zaldumbide
Abstract:
Data analysis is an important step before making financial decisions. The aim of this work is to analyze the factors that influence the final price of houses using data mining algorithms. To the best of our knowledge, previous work has merely compared results. Furthermore, before using the data set, the Z-transformation was applied to standardize the data to the same range. The data were then classified into two groups to visualize them in a readable format. A decision tree was built, and graphical results are displayed in which the results and the factors' influence are easy to see. The definitions of these methods are described, along with descriptions of the results. Finally, conclusions and recommendations are presented based on the results of our research, making it easier to apply these algorithms to a customized data set.
Keywords: algorithms, data, decision tree, transformation
Procedia PDF Downloads 374
2004 Database Playlists: Croatia's Popular Music in the Mirror of Collective Memory
Authors: Diana Grguric, Robert Svetlacic, Vladimir Simovic
Abstract:
This scientific research analytically explores database playlists by studying memory culture through Croatian popular radio music. The research is based on analysis of databases developed from the playlists of ten Croatian radio stations. The Croatian songs played on Statehood Day from 2008 to 2013 are analyzed in order to gain insight into their (memory) potential in terms of storing, interpreting and presenting a national identity. The research starts from the general assumption that popular music is an efficient identifier, transmitter and promoter of national identity. The aim of the scientific analysis of the databases was to reveal the specific titles of Croatian popular songs that participate in marking memories, and to analyze their symbolic capital, in order to gain insight into the popular-music experience of the past and to develop a new method of scientifically grounded analysis of specific databases.
Keywords: specific databases, popular radio music, collective memory, national identity
Procedia PDF Downloads 356
2003 [Keynote Talk]: Animation of Objects on the Website by Application of CSS3 Language
Authors: Vladimir Simovic, Matija Varga, Robert Svetlacic
Abstract:
This scientific work analytically explores and demonstrates techniques for animating objects and geometric characters in the CSS3 language by applying proper formatting and positioning of elements. The paper presents examples of optimal application of the CSS3 descriptive language when generating general web animations (e.g., billiards and the movement of geometric characters). It analytically presents optimal development and animation design using the frames within which the animated objects move. The originally developed content is based on upgrading existing CSS3 descriptive-language animations with more complex syntax and project-oriented work. The purpose of the developed animations is to provide an overview of the interactive design features of the CSS3 descriptive language for computer games and for animating important analytical data in web views. It is analytically demonstrated that CSS3, as a descriptive language, allows the insertion of various multimedia elements into websites for public and internal sites.
Keywords: web animation recording, KML GML HTML5 forms, Cascading Style Sheets 3, Google Earth Professional
Procedia PDF Downloads 335
2002 Pallet Tracking and Cost Optimization of the Flow of Goods in Logistics Operations by Serial Shipping Container Code
Authors: Dominika Crnjac Milic, Martina Martinovic, Vladimir Simovic
Abstract:
The case study presented in this paper shows the implementation of information technology (IT) and the Serial Shipping Container Code (SSCC) in a Croatian company that deals with logistics operations and provides logistics services in the cold chain segment. This company is aware of the sensitivity of the goods entrusted to it by the users of its services, as well as of the importance of speed and accuracy in providing logistics services. To that end, it has implemented and used the latest IT to ensure the highest standard of logistics services for its customers. Seeking efficiency and optimization of supply chain management while maintaining a high level of product quality, today's users of outsourced logistics services are open to the implementation of new IT products that ultimately deliver savings. By analysing the positive results and the difficulties that arise when using this technology, we aim to provide insight into the potential of this approach for logistics service providers.
Keywords: logistics operations, serial shipping container code, information technology, cost optimization
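For illustration, the GS1 mod-10 check digit that closes an 18-digit SSCC (standard GS1 arithmetic; the example digits are made up):

```python
# GS1 mod-10 check digit for an SSCC-18: weights 3 and 1 alternate from the
# right of the 17 data digits, and the check digit completes the sum to a
# multiple of 10.
def sscc_check_digit(first17: str) -> int:
    assert len(first17) == 17 and first17.isdigit()
    total = sum(int(d) * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(first17)))
    return (10 - total % 10) % 10

body = "00123456789012345"   # extension digit + company prefix + serial ref
sscc = body + str(sscc_check_digit(body))
print("SSCC-18:", sscc)
```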
Procedia PDF Downloads 360
2001 An Incremental Refinement Approach to a Development of Dynamic Host Configuration Protocol (DHCP) Using Event-B
Authors: Rajaa Filali, Mohamed Bouhdadi
Abstract:
This paper presents an incremental development of the Dynamic Host Configuration Protocol (DHCP) in Event-B. DHCP is a widely used communication protocol which provides a standard mechanism for obtaining configuration parameters. The specification is performed in a stepwise manner and verified through a series of refinements. The Event-B formal method uses the Rodin platform for modeling and verifying properties of the protocol such as safety, liveness and deadlock freedom. To model and verify the protocol, we use the Event-B formal technique, which provides an accessible and rigorous development method. This interaction between modeling and proving reduces complexity and helps eliminate misunderstandings, inconsistencies and specification gaps.
Keywords: DHCP protocol, Event-B, refinement, proof obligation, Rodin
Procedia PDF Downloads 228
2000 An Intrusion Detection System Based on K-Means, K-Medoids and Support Vector Clustering Using Ensemble
Authors: A. Mohammadpour, Ebrahim Najafi Kajabad, Ghazale Ipakchi
Abstract:
Presently, the security of computer networks is rising in importance, and many studies have been conducted in this field. With the penetration of the Internet into different fields, much needs to be done to secure industrial and non-industrial networks. Firewalls, appropriate intrusion detection systems (IDS), encryption protocols for sending and receiving information, and the use of authentication certificates are among the things to be considered for system security. The aim of the present study is to combine the outcomes of several algorithms, reducing IDS errors in a way that improves system security without imposing additional overload on the system. Based on the obtained results, we can also detect the number and percentage of further sub-attacks. Running the proposed system, which combines multi-algorithm outcomes, and comparing it with the proposed single-algorithm methods, we observed a 78.64% attack detection rate, an improvement of 3.14% over the individual algorithms.
Keywords: intrusion detection systems, clustering, k-means, k-medoids, SV clustering, ensemble
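A sketch of the ensemble step, aligning the labels of several clusterings with the Hungarian method and majority-voting per record; k-means stands in for all members, where k-medoids and support vector clustering would be further entries in the list:

```python
# Clustering ensemble: label alignment + majority vote (sklearn + scipy).
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
members = [KMeans(n_clusters=3, n_init=10, random_state=s).fit_predict(X)
           for s in (0, 1, 2)]

def align(ref, other, k=3):
    """Relabel `other` so its clusters best overlap the reference labels."""
    cost = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            cost[i, j] = -np.sum((ref == i) & (other == j))  # negate overlap
    _, cols = linear_sum_assignment(cost)                    # Hungarian method
    mapping = {old: new for new, old in enumerate(cols)}
    return np.array([mapping[lab] for lab in other])

aligned = [members[0]] + [align(members[0], m) for m in members[1:]]
votes = np.stack(aligned)                                    # shape (3, 300)
consensus = np.apply_along_axis(
    lambda v: np.bincount(v, minlength=3).argmax(), 0, votes)
print("consensus labels for first 10 records:", consensus[:10])
```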
Procedia PDF Downloads 221
1999 Task Scheduling on Parallel System Using Genetic Algorithm
Authors: Jasbir Singh Gill, Baljit Singh
Abstract:
Scheduling and mapping an application task graph onto a multiprocessor parallel system is considered one of the most crucial and critical NP-complete problems, and many genetic algorithms have been proposed to solve it. In this paper, two genetic-approach-based algorithms have been designed and developed, with and without task duplication. The proposed algorithms work with two fitness functions. The first, task fitness, is used to minimize the total finish time of the schedule (schedule length), while the second, process fitness, is concerned with allocating tasks to the most efficient processors from the list of available processors (load balance). The proposed genetic-based algorithms have been experimentally implemented and evaluated against other state-of-the-art, popular and widely used algorithms.
Keywords: parallel computing, task scheduling, task duplication, genetic algorithm
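A compact sketch of such a genetic scheduler with the two fitness criteria blended into one score; precedence constraints and task duplication from the paper are omitted for brevity:

```python
# GA task scheduler sketch: a chromosome assigns each task to a processor.
import random

TASK_COST = [4, 2, 7, 3, 5, 6, 2, 8]   # illustrative task execution times
N_PROC, POP, GENS = 3, 30, 200
random.seed(1)

def loads(chrom):
    per_proc = [0.0] * N_PROC
    for task, proc in enumerate(chrom):
        per_proc[proc] += TASK_COST[task]
    return per_proc

def fitness(chrom):
    per_proc = loads(chrom)
    makespan = max(per_proc)                 # task-fitness target
    imbalance = makespan - min(per_proc)     # process-fitness target
    return makespan + 0.5 * imbalance        # lower is better

def crossover(a, b):
    cut = random.randrange(1, len(a))        # single-point crossover
    return a[:cut] + b[cut:]

def mutate(chrom, rate=0.1):
    return [random.randrange(N_PROC) if random.random() < rate else g
            for g in chrom]

pop = [[random.randrange(N_PROC) for _ in TASK_COST] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness)
    elite = pop[:POP // 2]                   # truncation selection
    children = [mutate(crossover(*random.sample(elite, 2)))
                for _ in range(POP - len(elite))]
    pop = elite + children

best = min(pop, key=fitness)
print("assignment:", best, "makespan:", max(loads(best)))
```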
Procedia PDF Downloads 349