Search results for: Large Data

8547 Vehicular Ad Hoc Network

Abstract:

A Vehicular Ad-Hoc Network (VANET) is a mobile ad-hoc network that provides connectivity between moving devices and fixed equipment. Such devices, fitted to vehicles, provide safety for passengers. Recent research in traffic management shows wide scope for designing new methodologies that extend wireless sensor network and ad-hoc network principles to the development of VANET technology. This paper provides a broad research view of the VANET and MANET concepts so that researchers can contribute better optimization techniques for developing effective and fast automation techniques for large-scale data exchange in these complex networks.

Keywords: Ad-Hoc, MANET, Sensors, Security, VANET

Downloads 4501

8546 Identifying Corporate Managerial Topics with Web Pages

Authors: Juan Llopis, Reyes Gonzalez, Jose Gasco

Abstract:

This paper has as its main aim to analyse how corporate web pages can become an essential tool in order to detect strategic trends by firms or sectors, and even a primary source for benchmarking. This technique has made it possible to identify the key issues in the strategic management of the most excellent large Spanish firms and also to describe trends in their long-range planning, a way of working that can be generalised to any country or firm group. More precisely, two objectives were sought. The first one consisted in showing the way in which corporate websites make it possible to obtain direct information about the strategic variables which can define firms. This tool is dynamic (since web pages are constantly updated) as well as direct and reliable, since the information comes from the firm itself, not from comments of third parties (such as journalists, academicians, consultants...). When this information is analysed for a group of firms, one can observe their characteristics in terms of both managerial tasks and business management. As for the second objective, the methodology proposed served to describe the corporate profile of the large Spanish enterprises included in the Ibex35 (the Ibex35 or Iberia Index is the reference index in the Spanish Stock Exchange and gathers periodically the 35 most outstanding Spanish firms). An attempt is therefore made to define the long-range planning that would be characteristic of the largest Spanish firms.

Keywords: Web Pages, Strategic Management, Corporate Description, Large Firms, Spain.

Downloads 1576

8545 The Impact of System and Data Quality on Organizational Success in the Kingdom of Bahrain

Authors: Amal M. Alrayes

Abstract:

Data and system quality play a central role in organizational success, and the quality of any existing information system has a major influence on the effectiveness of overall system performance. Given the importance of system and data quality to an organization, it is relevant to highlight their effect on organizational performance in the Kingdom of Bahrain. This research aims to discover whether system quality and data quality are related, and to study the impact of system and data quality on organizational success. A theoretical model based on previous research is used to show the relationship between data and system quality and organizational impact. We hypothesize, first, that system quality is positively associated with organizational impact; second, that system quality is positively associated with data quality; and finally, that data quality is positively associated with organizational impact. A questionnaire survey was conducted among public and private organizations in the Kingdom of Bahrain. The results show that there is a strong association between data and system quality, which affects organizational success.

Keywords: Data quality, performance, system quality.

Downloads 2118

8544 Relative Navigation with Laser-Based Intermittent Measurement for Formation Flying Satellites

Authors: Jongwoo Lee, Dae-Eun Kang, Sang-Young Park

Abstract:

This study presents a precise relative navigation method for satellites flying in formation using laser-based intermittent measurement data. The measurement data for the relative navigation between two satellites consist of a relative distance measured by a laser instrument and relative attitude angles measured by attitude determination. The relative navigation solutions are estimated by both the extended Kalman filter (EKF) and the unscented Kalman filter (UKF). The solutions estimated by the EKF may become inaccurate or even diverge as the measurement outage time gets longer, because the EKF relies on a linearization approach. However, this study shows that the UKF with appropriate scaling parameters provides stable and accurate relative navigation solutions despite long measurement outage times and large initial errors, compared to the relative navigation solutions of the EKF. Various navigation results have been analyzed by adjusting the scaling parameters of the UKF.

Keywords: Satellite relative navigation, laser-based measurement, intermittent measurement, unscented Kalman filter.

Downloads 1101

8543 Integration of Multi-Source Data to Monitor Coral Biodiversity

Authors: K. Jitkue, W. Srisang, C. Yaiprasert, K. Jaroensutasinee, M. Jaroensutasinee

Abstract:

This study aims at using multi-source data to monitor coral biodiversity and coral bleaching. We used the coral reefs at the Racha Islands, Phuket, as the study area. There were three sources of data: coral diversity, sensor-based data and satellite data.

Keywords: Coral reefs, Remote sensing, Sea surface temperature, Satellite imagery.

Downloads 1553

8542 Pose-Dependency of Machine Tool Structures: Appearance, Consequences, and Challenges for Lightweight Large-Scale Machines

Authors: S. Apprich, F. Wulle, A. Lechler, A. Pott, A. Verl

Abstract:

Large-scale machine tools for the manufacturing of large work pieces, e.g. blades, casings or gears for wind turbines, feature pose-dependent dynamic behavior. Small structural damping coefficients lead to long decay times for structural vibrations that have negative impacts on the production process. Typically, these vibrations are handled by increasing the stiffness of the structure by adding mass. This is counterproductive to the needs of sustainable manufacturing as it leads to higher resource consumption both in material and in energy. Recent research activities have led to higher resource efficiency by radical mass reduction that is based on control-integrated active vibration avoidance and damping methods. These control methods depend on information describing the dynamic behavior of the controlled machine tools in order to tune the avoidance or reduction method parameters according to the current state of the machine. This paper presents the appearance, consequences and challenges of the pose-dependent dynamic behavior of lightweight large-scale machine tool structures in production. It starts with the theoretical introduction of the challenges of lightweight machine tool structures resulting from reduced stiffness. The statement of the pose-dependent dynamic behavior is corroborated by the results of the experimental modal analysis of a lightweight test structure. Afterwards, the consequences of the pose-dependent dynamic behavior of lightweight machine tool structures for the use of active control and vibration reduction methods are explained. Based on the state of the art of pose-dependent dynamic machine tool models and the modal investigation of an FE-model of the lightweight test structure, the criteria for a pose-dependent model for use in vibration reduction are derived. The description of the approach for a general pose-dependent model of the dynamic behavior of large lightweight machine tools that provides the necessary input to the aforementioned vibration avoidance and reduction methods to properly tackle machine vibrations is the outlook of the paper.

Keywords: Dynamic behavior, lightweight, machine tool, pose-dependency.

Downloads 2844

8541 Decision Support System Based on Data Warehouse

Authors: Yang Bao, LuJing Zhang

Abstract:

A typical Intelligent Decision Support System is built on four components: Data Warehouse, Online Analytical Processing, Data Mining, and model-based Decision Support. Such a system is called a Decision Support System Based on Data Warehouse (DSSBDW). This approach takes ETL, OLAP and DM as its means of implementation, and integrates traditional model-driven DSS and data-driven DSS into a whole. This paper analyzes the DSSBDW architecture and DW model, and discusses the following key issues: ETL design and realization; metadata management using XML; and SQL implementation, performance optimization, and data mapping in OLAP. Lastly, it illustrates the design principles and methods of the DW in DSSBDW.

Keywords: Decision Support System, Data Warehouse, Data Mining.

Downloads 3862

8540 Stability Enhancement of a Large-Scale Power System Using Power System Stabilizer Based on Adaptive Neuro Fuzzy Inference System

Authors: Agung Budi Muljono, I Made Ginarsa, I Made Ari Nrartha

Abstract:

A large-scale power system (LSPS) consists of two or more sub-systems connected by inter-connecting transmission. The loading pattern on an LSPS changes from time to time and varies depending on consumer need. A serious instability problem appears in an LSPS due to load fluctuation at all of the buses. An adaptive neuro-fuzzy inference system (ANFIS)-based power system stabilizer (PSS) is presented to address the stability problem and to enhance the stability of an LSPS. ANFIS control is used because it is more effective than Mamdani fuzzy control in terms of computation. Simulation results show that the presented PSS is able to maintain stability by decreasing the peak overshoot to −2.56 × 10⁻⁵ pu for rotor speed deviation Δω2−3. The presented PSS also brings the settling time to 3.78 s for the local mode oscillation. Furthermore, the presented PSS is able to improve the peak overshoot and settling time of Δω3−9 to −0.868 × 10⁻⁵ pu and 3.50 s, respectively, for the inter-area oscillation.

Keywords: ANFIS, large-scale, power system, PSS, stability enhancement.

Downloads 1194

8539 Methodology of Realization for Supervisor and Simulator Dedicated to a Semiconductor Research and Production Factory

Authors: Hanane Ondella, Pierre Ladet, David Ferrand, Pat Sloan

Abstract:

In the micro- and nanotechnology industry, the «clean rooms» dedicated to chip manufacturing are equipped with the most sophisticated equipment and tools. They use a large number of resources according to strict specifications for optimum operation and results. The distribution of «utilities» to production is ensured by teams who use a supervision tool. Studies show the value of controlling the various production and/or distribution parameters in real time through a reliable and effective supervision tool. This document looks at a large part of the functions that the supervisor must provide, with complementary functionalities to assist diagnosis and simulation, which prove very useful in our case, where the supervised installations are complex and constantly evolving.

Keywords: Control-Command, evolution, non regression, performances, real time, simulation, supervision.

Downloads 1260

8538 A New History Based Method to Handle the Recurring Concept Shifts in Data Streams

Authors: Hossein Morshedlou, Ahmad Abdollahzade Barforoush

Abstract:

Recent developments in storage technology and networking architectures have made it possible for broad areas of applications to rely on data streams for quick response and accurate decision making. Data streams are generated from real-world events, so it is logical that the associations among the occurrences of these events also exist among the concepts of data streams. Extracting these hidden associations can be useful for predicting subsequent concepts in concept-shifting data streams. In this paper we present a new method for learning associations among the concepts of a data stream and predicting what the next concept will be. Knowing the next concept makes an informed update of the data model possible. The results of the conducted experiments show that the proposed method is suitable for classification of concept-shifting data streams.
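
The idea of learning associations among recurring concepts can be illustrated with a minimal sketch. It assumes a simple first-order history (counting which concept followed which), which is an illustrative stand-in and not necessarily the method proposed in the paper; the class name `ConceptHistory` and its methods are hypothetical.

```python
from collections import Counter, defaultdict


class ConceptHistory:
    """Hypothetical first-order history of concept shifts in a stream."""

    def __init__(self):
        # transitions[a][b] counts how often concept b followed concept a.
        self.transitions = defaultdict(Counter)
        self.previous = None

    def observe(self, concept):
        """Record the concept currently active in the stream."""
        if self.previous is not None and concept != self.previous:
            self.transitions[self.previous][concept] += 1
        self.previous = concept

    def predict_next(self):
        """Return the concept that most often followed the current one."""
        counts = self.transitions.get(self.previous)
        if not counts:
            return None  # no shift away from this concept has been seen yet
        return counts.most_common(1)[0][0]


history = ConceptHistory()
for concept in ["A", "B", "A", "B", "A", "C", "A"]:
    history.observe(concept)
predicted = history.predict_next()
```

With a prediction in hand, a classifier trained earlier for the predicted concept could be loaded before the shift is fully observed, which is one way to read the "informed update" mentioned in the abstract.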

Keywords: Data Stream, Classification, Concept Shift, History.

Downloads 1278

8537 Incremental Learning of Independent Topic Analysis

Authors: Takahiro Nishigaki, Katsumi Nitta, Takashi Onoda

Abstract:

In this paper, we present a method of applying Independent Topic Analysis (ITA) to a growing volume of document data. The volume of document data has been increasing since the spread of the Internet. ITA was presented as one method for analyzing document data. It extracts independent topics from the document data by using Independent Component Analysis (ICA), a technique from signal processing. However, it is difficult to apply ITA to a growing volume of document data, because ITA must use all of the document data, so its temporal and spatial cost is very high. Therefore, we present Incremental ITA, which extracts independent topics from a growing volume of document data. Incremental ITA updates the independent topics when document data is added, starting from the independent topics extracted from the previous data. We show the results of applying Incremental ITA to benchmark datasets.

Keywords: Text mining, topic extraction, independent, incremental, independent component analysis.

Downloads 1058

8536 An efficient Activity Network Reduction Algorithm based on the Label Correcting Tracing Algorithm

Authors: Weng Ming Chu

Abstract:

When faced with stochastic networks with an uncertain duration for their activities, the securing of network completion time becomes problematical, not only because of the non-identical pdf of duration for each node, but also because of the interdependence of network paths. As evidenced by Adlakha & Kulkarni [1], many methods and algorithms have been put forward in an attempt to resolve this issue, but most have encountered the same large-size network problem. Therefore, in this research, we focus on network reduction through a Series/Parallel combined mechanism. Our suggested algorithm, named the Activity Network Reduction Algorithm (ANRA), can efficiently transform a large-size network into an S/P Irreducible Network (SPIN). SPIN can enhance stochastic network analysis, as well as serve as the judgment of symmetry for the Graph Theory.
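
Series/parallel reduction in general can be sketched as follows. This is a plain textbook-style reduction on an acyclic directed multigraph, written under that assumption, and it is not the ANRA algorithm itself; the function name `reduce_series_parallel` and the edge-list representation are hypothetical.

```python
from collections import Counter


def reduce_series_parallel(edges, source, sink):
    """Apply series and parallel reductions until none applies.

    edges is a list of (tail, head) pairs of an acyclic directed multigraph.
    Returns the sorted edge list of the irreducible network.
    """
    multi = Counter(edges)  # multiset of edges: (tail, head) -> multiplicity
    changed = True
    while changed:
        changed = False
        # Parallel reduction: merge duplicate edges between the same nodes.
        for edge, count in list(multi.items()):
            if count > 1:
                multi[edge] = 1
                changed = True
        # Series reduction: bypass an inner node with one in-edge and one out-edge.
        nodes = {u for u, _ in multi} | {v for _, v in multi}
        for node in nodes:
            if node in (source, sink):
                continue
            incoming = [e for e in multi if e[1] == node]
            outgoing = [e for e in multi if e[0] == node]
            if len(incoming) == 1 and len(outgoing) == 1:
                tail, head = incoming[0][0], outgoing[0][1]
                del multi[incoming[0]]
                del multi[outgoing[0]]
                multi[(tail, head)] += 1
                changed = True
                break  # the graph changed, so rescan from the start
    return sorted(multi.elements())


# Two parallel two-step paths collapse to a single edge.
diamond = [("s", "a"), ("a", "t"), ("s", "b"), ("b", "t")]
reduced_diamond = reduce_series_parallel(diamond, "s", "t")

# A bridge ("interdictive") graph admits no series or parallel reduction.
bridge = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]
reduced_bridge = reduce_series_parallel(bridge, "s", "t")
```

In a stochastic setting each reduction would also combine the duration distributions of the merged activities (a sum for series, a maximum for parallel); the sketch tracks topology only.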

Keywords: Series/Parallel network, Stochastic network, Network reduction, Interdictive Graph, Complexity Index.

Downloads 1379

8535 Powerful Tool to Expand Business Intelligence: Text Mining

Authors: Li Gao, Elizabeth Chang, Song Han

Abstract:

With the extensive inclusion of documents, especially text, in business systems, data mining does not cover the full scope of Business Intelligence. Data mining cannot deliver its impact on extracting useful details from the large collection of unstructured and semi-structured written materials based on natural languages. The most pressing issue is to draw the potential business intelligence from text. In order to gain competitive advantages for the business, it is necessary to develop a new powerful tool, text mining, to expand the scope of business intelligence. In this paper, we will work out the strong points of text mining in extracting business intelligence from the huge amount of textual information sources within business systems. We will apply text mining to each stage of Business Intelligence systems to show that text mining is a powerful tool for expanding the scope of BI. After reviewing basic definitions and some related technologies, we will discuss their relationship and benefits to text mining. Some examples and applications of text mining will also be given. The motivation is to develop a new approach to effective and efficient textual information analysis. Thus we can expand the scope of Business Intelligence using the powerful tool, text mining.

Keywords: Business intelligence, document warehouse, text mining.

Downloads 2660

8534 Evolutionary Feature Selection for Text Documents using the SVM

Authors: Daniel I. Morariu, Lucian N. Vintan, Volker Tresp

Abstract:

Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step, the documents are typically represented as large sparse vectors. When training classifiers on large collections of documents, both the time and memory restrictions can be quite prohibitive. This justifies the application of feature selection methods to reduce the dimensionality of the document-representation vector. In this paper, we present three feature selection methods: Information Gain, Support Vector Machine feature selection (called SVM_FS) and Genetic Algorithm with SVM (called GA_SVM). We show that the best results were obtained with the GA_SVM method for a relatively small dimension of the feature vector.
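
Of the three methods named, Information Gain is the simplest to illustrate. The sketch below uses the standard definition (class entropy minus the entropy conditioned on a term's presence) on a toy corpus; the helper names and the document representation as `(set_of_terms, label)` pairs are assumptions made for illustration, not the paper's implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(docs, term):
    """Reduction in class entropy from knowing whether `term` occurs."""
    labels = [label for _, label in docs]
    present = [label for terms, label in docs if term in terms]
    absent = [label for terms, label in docs if term not in terms]
    n = len(docs)
    conditional = (len(present) / n) * entropy(present) + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def top_k_terms(docs, k):
    """Keep the k terms with the highest information gain (ties broken by name)."""
    vocabulary = set().union(*(terms for terms, _ in docs))
    return sorted(vocabulary, key=lambda t: (-information_gain(docs, t), t))[:k]


# Toy corpus: each document is (set of terms, class label).
docs = [
    ({"goal", "match"}, "sport"),
    ({"goal", "team"}, "sport"),
    ({"vote", "team"}, "politics"),
    ({"vote", "law"}, "politics"),
]
selected = top_k_terms(docs, 2)
```

Here "goal" and "vote" separate the two classes perfectly, while "team" occurs equally in both and carries no information, so truncating the vocabulary to the top-ranked terms shrinks the document vectors without discarding the discriminative features.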

Keywords: Feature Selection, Learning with Kernels, Support Vector Machine, Genetic Algorithm, and Classification.

Downloads 1706

8533 A Framework for Data Mining Based Multi-Agent: An Application to Spatial Data

Authors: H. Baazaoui Zghal, S. Faiz, H. Ben Ghezala

Abstract:

Data mining is an extraordinarily demanding field referring to the extraction of implicit knowledge and relationships that are not explicitly stored in databases. A wide variety of data mining methods have been introduced (classification, characterization, generalization...). Each one of these methods includes more than one algorithm. A data mining system involves different user categories, which means that the user's behavior must be a component of the system. The problem at this level is to know which algorithm of which method to employ for an exploratory end, which one for a decisional end, and how they can collaborate and communicate. The agent paradigm presents a new way of conceiving and realizing data mining systems. The purpose is to combine different data mining algorithms to prepare elements for decision-makers, benefiting from the possibilities offered by multi-agent systems. In this paper the agent framework for data mining is introduced, and its overall architecture and functionality are presented. The validation is made on spatial data. Principal results will be presented.

Keywords: Databases, data mining, multi-agent, spatial datamart.

Downloads 2045

8532 Feature Selection Methods for an Improved SVM Classifier

Authors: Daniel Morariu, Lucian N. Vintan, Volker Tresp

Abstract:

Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step, the documents are typically represented as large sparse vectors. When training classifiers on large collections of documents, both the time and memory restrictions can be quite prohibitive. This justifies the application of feature selection methods to reduce the dimensionality of the document-representation vector. In this paper, three feature selection methods are evaluated: Random Selection, Information Gain (IG) and Support Vector Machine feature selection (called SVM_FS). We show that the best results were obtained with the SVM_FS method for a relatively small dimension of the feature vector. We also present a novel method to better correlate the SVM kernel's parameters (Polynomial or Gaussian kernel).

Keywords: Feature Selection, Learning with Kernels, Support Vector Machine, and Classification.

Downloads 1829

8531 Assessing the Effect of Grid Connection of Large-Scale Wind Farms on Power System Small-Signal Angular Stability

Authors: Wenjuan Du, Jingtian Bi, Tong Wang, Haifeng Wang

Abstract:

Grid connection of a large-scale wind farm affects power system small-signal angular stability in two aspects. Firstly, connection of the wind farm changes the load flow and configuration of a power system. Secondly, dynamic interaction is introduced between the wind farm and the synchronous generators (SGs) in the power system. This paper proposes a method to assess the two aspects of the effect of the wind farm on power system small-signal angular stability. The effect of the change of load flow/system configuration brought about by the wind farm can be examined separately by displacing wind farms with constant power sources; the effect of the dynamic interaction of the wind farm with the SGs can then also be computed individually. Thus, a clearer picture and better understanding of the power system small-signal angular stability as affected by grid connection of the large-scale wind farm are provided. In the paper, an example power system with grid connection of a wind farm is presented to demonstrate the proposed approach.

Keywords: power system small-signal angular stability, power system low-frequency oscillations, electromechanical oscillation modes, wind farms, double fed induction generator (DFIG)

Downloads 1819

8530 Latent Topic Based Medical Data Classification

Authors: Jian-hua Yeh, Shi-yi Kuo

Abstract:

This paper discusses the classification process for medical data. We use the data from ACM KDDCup 2008 to demonstrate our classification process based on latent topic discovery. In this data set, the target set and outliers are quite different in their nature: the target set makes up only 0.6% of the total, while the outliers make up 99.4% of the data set. We use this data set as an example to show how we dealt with this extremely biased data set with latent topic discovery and noise reduction techniques. Our experiment faces two major challenges: (1) extremely distributed outliers, and (2) positive samples are far fewer than negative ones. We propose a process flow to deal with these issues and obtain a best AUC result of 0.98.
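
Because the classes are so unbalanced, plain accuracy says little (labelling everything as an outlier would already score 99.4%), which is why a ranking measure such as AUC is reported. The following minimal sketch computes AUC by its pairwise definition; the function name `auc` and the toy scores are illustrative assumptions and are unrelated to the paper's data.

```python
def auc(scores, labels):
    """Area under the ROC curve via its pairwise definition.

    AUC is the probability that a randomly chosen positive sample receives
    a higher score than a randomly chosen negative one (ties count as half).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: two positives, two negatives, one mis-ranked pair.
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The double loop is quadratic in the sample count; for a data set of realistic size a rank-based formulation would be the usual choice, but the value it produces is the same.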

Keywords: classification, latent topics, outlier adjustment, feature scaling

Downloads 1642

8529 Improving Fault Resilience and Reconstruction of Overlay Multicast Tree Using Leaving Time of Participants

Authors: Bhed Bahadur Bista

Abstract:

Network layer multicast, i.e. IP multicast, even after many years of research, development and standardization, is not deployed on a large scale due to both technical (e.g. upgrading of routers) and political (e.g. policy making and negotiation) issues. Researchers looked for alternatives and proposed application/overlay multicast, where multicast functions are handled by end hosts, not network layer routers. Member hosts wishing to receive multicast data form a multicast delivery tree. The intermediate hosts in the tree also act as routers, i.e. they forward data to the lower hosts in the tree. Unlike IP multicast, where a router cannot leave the tree until all members below it leave, in overlay multicast any member can leave the tree at any time, thus disjoining the tree and disrupting the data dissemination. All the disrupted hosts have to rejoin the tree. This characteristic of overlay multicast makes the multicast tree unstable and causes data loss and rejoin overhead. In this paper, we propose that each node sets its leaving time from the tree and sends a join request to a number of nodes in the tree. The nodes in the tree will reject the request if their leaving time is earlier than that of the requesting node; otherwise they will accept the request. The node can join at one of the accepting nodes. This makes the tree more stable, as the nodes join the tree according to their leaving time, with the earliest-leaving nodes at the leaves of the tree. Some intermediate nodes may not follow their leaving time and may leave earlier, thus disrupting the tree. For this, we propose a proactive recovery mechanism so that disrupted nodes can rejoin the tree at predetermined nodes immediately. We have shown by simulation that there is less overhead when joining the multicast tree and that the recovery time of the disrupted nodes is much less than in previous works.
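
The accept/reject rule described above can be sketched in a few lines. The function names, the dictionary of leaving times, and in particular the tie-breaking rule for choosing among accepting nodes are assumptions made for illustration; the abstract only states that the node joins at one of the accepting nodes.

```python
def accepting_nodes(tree_leave_times, requester_leave_time):
    """Nodes that accept the join request.

    A tree node rejects the request if its own leaving time is earlier
    than the requester's; otherwise it accepts.
    """
    return [
        node
        for node, leave_time in tree_leave_times.items()
        if leave_time >= requester_leave_time
    ]


def choose_parent(tree_leave_times, requester_leave_time):
    """Pick one accepting node as the parent.

    Assumption for this sketch only: prefer the accepting node that leaves
    soonest, which keeps longer-lived nodes free for later joiners.
    """
    accepted = accepting_nodes(tree_leave_times, requester_leave_time)
    if not accepted:
        return None
    return min(accepted, key=lambda node: tree_leave_times[node])


# Hypothetical leaving times (seconds from now) of nodes already in the tree.
tree = {"root": 100, "a": 40, "b": 70}
accepted = accepting_nodes(tree, requester_leave_time=50)
parent = choose_parent(tree, requester_leave_time=50)
```

Since a parent never plans to leave before its child, a node that honours its announced leaving time departs only after its whole subtree has gone, which is what places the earliest-leaving nodes at the leaves.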

Keywords: Network layer multicast, Fault Resilience, IP multicast

Downloads 1387

8528 A Parallel Algorithm for 2-D Cylindrical Geometry Transport Equation with Interface Corrections

Authors: Wei Jun-xia, Yuan Guang-wei, Yang Shu-lin, Shen Wei-dong

Abstract:

In order to make the conventional implicit algorithm applicable on large-scale parallel computers, an interface prediction and correction scheme for the discontinuous finite element method is presented to solve time-dependent neutron transport equations in 2-D cylindrical geometry. Domain decomposition is adopted in the computational domain. The numerical experiments show that our parallel algorithm with explicit prediction and implicit correction has good precision, parallelism and simplicity. In particular, it can reach perfect speedup even on hundreds of processors for large-scale problems.

Keywords: Transport Equation, Discontinuous Finite Element, Domain Decomposition, Interface Prediction And Correction

Downloads 1665

8527 Dynamic Features Selection for Heart Disease Classification

Authors: Walid MOUDANI

Abstract:

The healthcare environment is generally perceived as being information rich yet knowledge poor: there is a lack of effective analysis tools to discover hidden relationships and trends in data. In fact, valuable knowledge can be discovered by applying data mining techniques in healthcare systems. In this study, a methodology is presented for extracting significant patterns from Coronary Heart Disease warehouses for the prediction of heart attack, which unfortunately continues to be a leading cause of mortality worldwide. For this purpose, we propose to dynamically enumerate the optimal subsets of the reduced features of high interest by using the rough sets technique combined with dynamic programming. We then propose to validate the classification using a Random Forest (RF) decision tree to identify the risky heart disease cases. This work is based on a large amount of data collected from several clinical institutions based on the medical profile of patients. Moreover, the experts' knowledge in this field has been taken into consideration in order to define the disease and its risk factors, and to establish significant knowledge relationships among the medical factors. A computer-aided system is developed for this purpose based on a population of 525 adults. The performance of the proposed model is analyzed and evaluated based on a set of benchmark techniques applied to this classification problem.

Keywords: Multi-Classifier Decisions Tree, Features Reduction, Dynamic Programming, Rough Sets.

Downloads 2532

8526 Data Transformation Services (DTS): Creating Data Mart by Consolidating Multi-Source Enterprise Operational Data

Authors: J. D. D. Daniel, K. N. Goh, S. M. Yusop

Abstract:

Trends in business intelligence, e-commerce and remote access make it necessary and practical to store data in different ways on multiple systems with different operating systems. As businesses evolve and grow, they require efficient computerized solutions to perform data updates and to access data from diverse enterprise business applications. The objective of this paper is to demonstrate the capability of DTS [1] as a database solution for automatic data transfer and update in solving business problems. This DTS package was developed for a business selling a variety of plants, which eventually expanded into commercial supply and landscaping. Dimensional data modeling is used in the DTS package to extract, transform and load data from heterogeneous database systems such as MySQL, Microsoft Access and Oracle, consolidating it into a Data Mart residing in SQL Server. The data transfer from the various databases is scheduled to run automatically every quarter of the year to support efficient sales analysis. DTS is therefore an attractive solution for automatic data transfer and update that meets today's business needs.

Keywords: Data Transformation Services (DTS), Object Linking and Embedding Database (OLEDB), Data Mart, Online Analytical Processing (OLAP), Online Transactional Processing (OLTP).

Downloads 2038

8525 Extraction of Data from Web Pages: A Vision Based Approach

Authors: P. S. Hiremath, Siddu P. Algur

Abstract:

With the explosive growth of information sources available on the World Wide Web, it has become increasingly difficult to identify the relevant pieces of information, since web pages are often cluttered with irrelevant content like advertisements, navigation panels, copyright notices etc., surrounding the main content of the web page. Hence, tools for the mining of data regions, data records and data items need to be developed in order to provide value-added services. Currently available automatic techniques to mine data regions from web pages are still unsatisfactory because of their poor performance and tag-dependence. In this paper a novel method to extract data items from web pages automatically is proposed. It comprises two steps: (1) identification and extraction of the data regions based on visual clues; (2) identification of data records and extraction of data items from a data region. For step 1, a novel and more effective method is proposed, which finds the data regions formed by all types of tags using visual clues. For step 2, a more effective method, namely Extraction of Data Items from web Pages (EDIP), is adopted to mine data items. The EDIP technique is a list-based approach in which the list is a linear data structure. The proposed technique is able to mine non-contiguous data records and can correctly identify data regions, irrespective of the type of tag in which they are bound. Our experimental results show that the proposed technique performs better than the existing techniques.

Keywords: Web data records, web data regions, web mining.

Downloads 1901

8524 Visual-Graphical Methods for Exploring Longitudinal Data

Authors: H. W. Ker

Abstract:

Longitudinal data typically exhibit changes over time, nonlinear growth patterns, between-subjects variability, and within-subject errors with heteroscedasticity and dependence. Exploring such data is therefore more complicated than exploring cross-sectional data. The purpose of this paper is to organize and integrate various visual-graphical techniques for exploring longitudinal data. By applying the proposed methods, investigators can address research questions that include characterizing or describing the growth patterns at both group and individual level, identifying unusual subjects and the time points where important changes occur, selecting suitable statistical models, and suggesting possible within-error variance.

Keywords: Data exploration, exploratory analysis, HLMs/LMEs, longitudinal data, visual-graphical methods.

Downloads 2094

8523 A Materialized Approach to the Integration of XML Documents: the OSIX System

Authors: H. Ahmad, S. Kermanshahani, A. Simonet, M. Simonet

Abstract:

The data exchanged on the Web are of a different nature from those treated by classical database management systems; these data are called semi-structured data since they do not have a regular and static structure like data found in a relational database; their schema is dynamic and may contain missing data or types. Therefore, the need has arisen to develop further techniques and algorithms to exploit and integrate such data and to extract relevant information for the user. In this paper we present the system OSIX (Osiris based System for Integration of XML Sources). This system has a Data Warehouse model designed for the integration of semi-structured data and more precisely for the integration of XML documents. The architecture of OSIX relies on the Osiris system, a DL-based model designed for the representation and management of databases and knowledge bases. Osiris is a view-based data model whose indexing system supports semantic query optimization. We show that query processing on an XML source is optimized by the indexing approach proposed by Osiris.

Keywords: Data integration, semi-structured data, views, XML.

Downloads 1590

8522 Information Retrieval: A Comparative Study of Textual Indexing Using an Oriented Object Database (db4o) and the Inverted File

Authors: Mohammed Erritali

Abstract:

For centuries, the growth in the volume of text data such as books and articles in libraries has made it necessary to establish effective mechanisms to locate them. Early techniques such as abstraction, indexing and the use of classification categories marked the birth of a new field of research called "Information Retrieval". Information Retrieval (IR) can be defined as the task of defining models and systems whose purpose is to facilitate access to a set of documents in electronic form (corpus) so that a user can find the relevant ones, that is to say, the content that matches the user's information needs. Most information retrieval models use a specific data structure to index a corpus, which is called an "inverted file" or "reverse index". This inverted file collects information on all terms over the corpus documents, specifying the identifiers of the documents that contain the term in question, the frequency of each term in the documents of the corpus, the positions of the occurrences of the word, and so on. In this paper we use an object-oriented database (db4o) instead of the inverted file; that is to say, instead of searching for a term in the inverted file, we search for it in the db4o database. The purpose of this work is to make a comparative study to see whether object-oriented databases can compete with the inverted index in terms of access speed and resource consumption when using a large volume of data.
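As a rough illustration of the inverted file described in this abstract, the following Python sketch maps each term to the documents that contain it, together with the positions of its occurrences (from which the per-document frequency follows). The function name `build_inverted_index`, the whitespace tokenization, and the two-document sample corpus are assumptions made for illustration; they are not taken from the paper, and the sketch does not involve db4o.

```python
from collections import defaultdict


def build_inverted_index(corpus):
    """Build a term -> {doc_id: [positions]} mapping.

    `corpus` is a dict of doc_id -> text. Hypothetical helper for illustration.
    """
    index = defaultdict(dict)
    for doc_id, text in corpus.items():
        # Assumption: lowercase whitespace tokenization; real IR systems
        # normally apply richer analysis (stemming, stop words, etc.).
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return dict(index)


# Hypothetical sample corpus.
corpus = {
    1: "information retrieval uses an inverted file",
    2: "an inverted file indexes a corpus",
}

index = build_inverted_index(corpus)

# Postings for one term: which documents contain it, and where.
postings = index["inverted"]

# Term frequency per document is the number of recorded positions.
frequency = {doc_id: len(positions) for doc_id, positions in postings.items()}
```

Looking up a term is then a single dictionary access, which is the operation the paper compares against a lookup in the db4o database.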

Keywords: Information Retrieval, indexation, oriented object database (db4o), inverted file.

Downloads 1734

8521 Artificial Neural Network-Based Short-Term Load Forecasting for Mymensingh Area of Bangladesh

Authors: S. M. Anowarul Haque, Md. Asiful Islam

Abstract:

Electrical load forecasting is considered to be one of the most indispensable parts of a modern-day electrical power system. To ensure a reliable and efficient supply of electric energy, special emphasis should be put on the predictive feature of electricity supply. Artificial Neural Network-based approaches have emerged as a significant area of interest for electric load forecasting research. This paper proposes an Artificial Neural Network model based on the particle swarm optimization algorithm for improved electric load forecasting for Mymensingh, Bangladesh. The forecasting model is developed and simulated in the MATLAB environment with a large number of training datasets. The model is trained on eight input parameters, including historical load and weather data. The predicted load data are then compared with an available dataset for validation. The proposed neural network model proves to be more reliable in terms of day-wise load forecasting for Mymensingh, Bangladesh.

Keywords: Load forecasting, artificial neural network, particle swarm optimization.

Downloads 686

8520 Embedding a Large Amount of Information Using High Secure Neural Based Steganography Algorithm

Authors: Nameer N. EL-Emam

Abstract:

In this paper, we construct and implement a new Steganography algorithm based on a learning system to hide a large amount of information in a color BMP image. We use adaptive image filtering and adaptive non-uniform image segmentation with bit replacement on the appropriate pixels. These pixels are selected randomly rather than sequentially by using a new concept defined by main cases with sub cases for each byte in one pixel. Following the design steps, we derived 16 main cases with their sub cases that cover all aspects of embedding the input information in a color bitmap image. Four layers of security are proposed to make it difficult to break the encryption of the input information and to confuse steganalysis. A learning system based on a neural network is introduced at the fourth layer of security. This layer is used to increase the difficulty of statistical attacks. Our results against statistical and visual attacks are discussed before and after using the learning system, and we compare them with the previous Steganography algorithm. We show that our algorithm can efficiently embed a large amount of information, reaching 75% of the image size (replacing at most 18 bits for each pixel), with high quality of the output.
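The 75% figure quoted above can be checked with simple arithmetic, sketched below in Python. The sketch assumes a 24-bit color BMP (three 8-bit channels per pixel), which the abstract does not state explicitly; the helper `max_payload_bytes` and the 512 x 512 example size are hypothetical and only illustrate the capacity bound, not the embedding algorithm itself.

```python
# Assumption: 24-bit color BMP, i.e. 3 bytes (R, G, B) per pixel.
BITS_PER_PIXEL = 24

# From the abstract: at most 18 bits are replaced in each pixel.
MAX_REPLACED_BITS_PER_PIXEL = 18

# Fraction of the pixel data that can carry hidden information.
capacity_ratio = MAX_REPLACED_BITS_PER_PIXEL / BITS_PER_PIXEL


def max_payload_bytes(width, height):
    """Upper bound on hidden payload (bytes) for a width x height image.

    Hypothetical helper: it ignores the BMP header and any per-image
    overhead the actual algorithm may need.
    """
    total_bits = width * height * MAX_REPLACED_BITS_PER_PIXEL
    return total_bits // 8


# Example with an assumed 512 x 512 cover image.
example_payload = max_payload_bytes(512, 512)
```

Under the 24-bit assumption, 18 replaced bits out of 24 per pixel gives exactly the 75% of the image size reported in the abstract.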

Keywords: Adaptive image segmentation, hiding with high capacity, hiding with high security, neural networks, Steganography.

Downloads 1989

8519 Co-Articulation between Consonant and Vowel in Cantonese Syllables

Authors: Wai-Sum Lee

Abstract:

This study investigates C-V and V-C co-articulation in Cantonese monosyllables of the CV, VC or CVC structure, with C = one of the three stop consonants [p, t, k] and V = one of the three corner vowels [i, a, u]. Five repetitions of each test syllable on a randomized list were elicited from young adult Cantonese speakers in their early 20s. A research tool, EMA AG500, was used to record the synchronized audio signals and articulatory data at three different locations of the tongue (tongue tip, tongue middle, and tongue back) and the positions of the upper and lower lips during the test syllables. The main findings based on the articulatory data collected from two male Cantonese speakers are as follows: (i) For the syllable-initial [p-], strong co-articulation is observed when [p-] precedes the high vowel [i] or [u], but not the low vowel [a]. As for the syllable-final [-p], it is strongly co-articulated with the preceding vowel, even when the vowel is [a]. (ii) The co-articulation between the initial [t-] and the following vowel of any type is weak. In the syllable-final position, the degree of co-articulatory resistance of [-t] is also large when following the vowel [u], but [-t] is largely co-articulated with the preceding vowel when the vowel is [i] or [a]. (iii) The strength of co-articulation differs when the initial [k-] precedes different types of vowel. Co-articulation is stronger between [k-] and [i] than between [k-] and [u], and is much reduced between [k-] and [a]. However, in the syllable-final position, there is strong co-articulation between [-k] and the preceding vowel [a]. (iv) Among the three types of stop consonants in the syllable-initial position, the decreasing degree of co-articulatory resistance (CR) is [t-] > [k-] > [p-], and the degree of CR is reduced during all three types of stop in the syllable-final position.
In general, the data on co-articulation between consonant and vowel in the Cantonese monosyllables are similar to those in other languages reported in previous studies.

Keywords: Cantonese, co-articulation, consonant, vowel.

Downloads 1123

8518 Data-Driven Decision-Making in Digital Entrepreneurship

Authors: Abeba Nigussie Turi, Xiangming Samuel Li

Abstract:

Data-driven business models are more typical for established businesses than for early-stage startups that strive to penetrate a market. This paper provides an extensive discussion of the principles of data analytics for early-stage digital entrepreneurial businesses. Here, we develop a data-driven decision-making (DDDM) framework that applies to startups prone to multifaceted barriers such as poor data access and technical and financial constraints, to name a few. The startup DDDM framework proposed in this paper is novel in its form, encompassing startup data analytics enablers and metrics that align with startups' business models, ranging from customer-centric product development to servitization, which is the future of modern digital entrepreneurship.

Keywords: Startup data analytics, data-driven decision-making, data acquisition, data generation, digital entrepreneurship.

Downloads 827