Search results for: symbolic data analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 41315

41225 A Review on Existing Challenges of Data Mining and Future Research Perspectives

Authors: Hema Bhardwaj, D. Srinivasa Rao

Abstract:

Technology for analysing, processing, and extracting meaningful data from enormous and complicated datasets can be termed as "big data." The technique of big data mining and big data analysis is extremely helpful for business movements such as making decisions, building organisational plans, researching the market efficiently, improving sales, etc., because typical management tools cannot handle such complicated datasets. Special computational and statistical issues, such as measurement errors, noise accumulation, spurious correlation, and storage and scalability limitations, are brought on by big data. These unique problems call for new computational and statistical paradigms. This research paper offers an overview of the literature on big data mining, its process, along with problems and difficulties, with a focus on the unique characteristics of big data. Organizations have several difficulties when undertaking data mining, which has an impact on their decision-making. Every day, terabytes of data are produced, yet only around 1% of that data is really analyzed. The idea of the mining and analysis of data and knowledge discovery techniques that have recently been created with practical application systems is presented in this study. This article's conclusion also includes a list of issues and difficulties for further research in the area. The report discusses the management's main big data and data mining challenges.

Keywords: big data, data mining, data analysis, knowledge discovery techniques, data mining challenges

Procedia PDF Downloads 96
41224 Data and Spatial Analysis for Economy and Education of 28 E.U. Member-States for 2014

Authors: Alexiou Dimitra, Fragkaki Maria

Abstract:

The objective of the paper is to study geographic, economic and educational variables and their contribution to determining the position of each member state among the EU-28 countries, based on the values of seven variables as given by Eurostat. The data analysis methods of Multiple Factorial Correspondence Analysis (MFCA), Principal Component Analysis and Factor Analysis have been used. The cross-tabulation tables of data consist of the values of seven variables for the 28 countries for 2014. The data are processed using the CHIC Analysis V 1.1 software package. The results of this program using MFCA and Ascending Hierarchical Classification are given in numerical and graphical form. For comparison, the Factor procedure of the statistical package IBM SPSS 20 has been applied to the same data. The numerical and graphical results, presented with tables and graphs, demonstrate the agreement between the two methods. The most important result is the study of the relation between the 28 countries and the position of each country in groups or clouds, which are formed according to the values of the corresponding variables.

Keywords: Multiple Factorial Correspondence Analysis, Principal Component Analysis, Factor Analysis, E.U.-28 countries, Statistical package IBM SPSS 20, CHIC Analysis V 1.1 Software, Eurostat.eu Statistics

Procedia PDF Downloads 501
41223 Survey on Arabic Sentiment Analysis in Twitter

Authors: Sarah O. Alhumoud, Mawaheb I. Altuwaijri, Tarfa M. Albuhairi, Wejdan M. Alohaideb

Abstract:

Large-scale data stream analysis has recently become one of the important business and research priorities. Social networks like Twitter and other micro-blogging platforms hold an enormous amount of data that is large in volume, velocity and variety. Extracting valuable information and trends out of these data would aid better understanding and decision-making. Multiple analysis techniques are deployed for English content. However, Arabic is one of the languages that produce a large amount of data over social networks and yet is among the least analyzed. This paper is a survey of research efforts to analyze Arabic content on Twitter, focusing on the tools and methods used to extract sentiment from that content.

Keywords: big data, social networks, sentiment analysis, twitter

Procedia PDF Downloads 561
41222 Data Mining Meets Educational Analysis: Opportunities and Challenges for Research

Authors: Carla Silva

Abstract:

Recent developments in information and communication technology enable us to acquire, collect, and analyse data in various fields of socioeconomic and technological systems. Along with the increase of economic globalization and the evolution of information technology, data mining has become an important approach for economic data analysis. As a result, there has been a critical need for automated approaches to the effective and efficient use of massive amounts of educational data, in order to support institutions in strategic planning and investment decision-making. In this article, we address data from several different perspectives and define the data applied to the sciences. Many believe that 'big data' will transform business, government, and other aspects of the economy. We discuss how new data may impact educational policy and educational research. Large-scale administrative data sets and proprietary private sector data can greatly improve the way we measure, track, and describe educational activity and educational impact. We also consider whether the big data predictive modeling tools that have emerged in statistics and computer science may prove useful in educational research and, furthermore, in economics. Finally, we highlight a number of challenges and opportunities for future research.

Keywords: data mining, research analysis, investment decision-making, educational research

Procedia PDF Downloads 343
41221 Distinguishing Substance from Spectacle in Violent Extremist Propaganda through Frame Analysis

Authors: John Hardy

Abstract:

Over the last decade, the world has witnessed an unprecedented rise in the quality and availability of violent extremist propaganda. This phenomenon has been fueled primarily by three interrelated trends: rapid adoption of online content mediums by creators of violent extremist propaganda, increasing sophistication of violent extremist content production, and greater coordination of content and action across violent extremist organizations. In particular, the self-styled ‘Islamic State’ attracted widespread attention from its supporters and detractors alike by mixing shocking video and imagery content in with substantive ideological and political content. Although this practice was widely condemned for its brutality, it proved to be effective at engaging with a variety of international audiences and encouraging potential supporters to seek further information. The reasons for the noteworthy success of this kind of shock-value propaganda content remain unclear, despite many governments’ attempts to produce counterpropaganda. This study examines violent extremist propaganda distributed by five terrorist organizations between 2010 and 2016, using material released by the ‎Al Hayat Media Center of the Islamic State, Boko Haram, Al Qaeda, Al Qaeda in the Arabian Peninsula, and Al Qaeda in the Islamic Maghreb. The time period covers all issues of the infamous publications Inspire and Dabiq, as well as the most shocking video content released by the Islamic State and its affiliates. The study uses frame analysis to distinguish thematic from symbolic content in violent extremist propaganda by contrasting the ways that substantive ideology issues were framed against the use of symbols and violence to garner attention and to stylize propaganda. 
The results demonstrate that thematic content focuses significantly on diagnostic frames, which explain violent extremist groups’ causes, and prognostic frames, which propose solutions to addressing or rectifying the cause shared by groups and their sympathizers. Conversely, symbolic violence is primarily stylistic and rarely linked to thematic issues or motivational framing. Frame analysis provides a useful preliminary tool in disentangling substantive ideological and political content from stylistic brutality in violent extremist propaganda. This provides governments and researchers a method for better understanding the framing and content used to design narratives and propaganda materials used to promote violent extremism around the world. Increased capacity to process and understand violent extremist narratives will further enable governments and non-governmental organizations to develop effective counternarratives which promote non-violent solutions to extremists’ grievances.

Keywords: countering violent extremism, counternarratives, frame analysis, propaganda, terrorism, violent extremism

Procedia PDF Downloads 166
41220 Analysis of Big Data

Authors: Sandeep Sharma, Sarabjit Singh

Abstract:

With user demand and the growth of large volumes of free data, it is becoming more challenging for storage solutions to protect, store, and retrieve data. The day is not far off when storage companies and organizations will start saying 'no' to storing our valuable data, or will start charging a huge amount for its storage and protection. On the other hand, given environmental conditions, it is becoming challenging to establish and maintain new data warehouses and data centers while guarding against global warming threats. The challenge of small data is over; the big challenge now is how to manage the exponential growth of data. In this paper, we analyze the growth trend of big data and its future implications. We also focus on the impact of unstructured data on various concerns and suggest some possible remedies to streamline big data.

Keywords: big data, unstructured data, volume, variety, velocity

Procedia PDF Downloads 534
41219 Femicide in the News: Jewish and Arab Victims and Culprits in the Israeli Hebrew Media

Authors: Ina Filkobski, Eran Shor

Abstract:

This article explores how newspapers cover murder of women by family members and intimate partners. Three major Israeli newspapers were compared in order to analyse the coverage of Jewish and Arab victims and culprits and to examine whether and in what ways the media contribute to the construction of symbolic boundaries between minority and dominant social groups. A sample of some 459 articles that were published between 2013 and 2015 was studied using a systematic qualitative content analysis. Our findings suggest that the treatment of murder cases by the media varies according to the ethnicity of both victims and culprits. The murder of Jews by family members or intimate partners was framed as a shocking and unusual event, a result of the individual personality or pathology of the culprit. Conversely, when Arabs were the killers, murders were often explained by focusing on the culture of the ethnic group, described as traditional, violent, and patriarchal. In two-thirds of the cases in which Arabs were involved, so-called ‘honor killing’ or other cultural explanations were proposed as the motive for the murder. This was often the case even before a suspect was detected, while police investigation was at its very early stages, and often despite forceful denials from victims’ families. In case of Jewish culprits, more than half of the articles in our sample suggested mental disorder to explain the acts and cultural explanations were almost entirely absent. Beyond the emphasis on psychological vs. cultural explanations, newspaper articles also tend to provide much more detail about Jewish culprits than about Arabs. Such detailed examinations convey a desire to make sense of the event by understanding the supposedly unique and unorthodox nature of the killer. The detailed accounts were usually absent from the reports on Arab killers. 
Thus, even if reports do not explicitly offer cultural motivations for the murder, the fact that reports often remain laconic leaves people to draw their own conclusions, which would then be likely based on existing cognitive scripts and previous reports on family murders among Arabs. Such treatment contributes to the notion that Arab and Muslim cultures, religions, and nationalities are essentially misogynistic and adhere to norms of honor and shame that are radically different from those of modern societies, such as the Jewish-Israeli one. Murder within the family is one of the most dramatic occurrences in the social world, and in societies that see themselves as modern it is a taboo; an ultimate signifier of danger. We suggest that representations of murder provide a valuable prism for examining the construction of group boundaries. Our analysis, therefore, contributes to the scholarly effort to understand the creation and reinforcement of symbolic boundaries between ‘society’ and its ‘others’ by systematically tracing the media constructions of ‘otherness’. While our analysis focuses on Israel, studies on the United States, Canada, and various European countries with ethnically and racially heterogeneous populations, make it clear that the stigmatisation and exclusion of visible, religious, and language minorities are not unique to the Israeli case.

Keywords: comparative study of media coverage of minority and majority groups, construction of symbolic group boundaries, murder of women by family members and intimate partners, Israel, Jews, Arabs

Procedia PDF Downloads 162
41218 Symbolic Computation via Gröbner Basis

Authors: Haohao Wang

Abstract:

The purpose of this paper is to find elimination ideals via Gröbner bases. We first introduce the concept of Gröbner bases, and then we provide computational algorithms with applications to curves and surfaces.
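
As an illustration of elimination (a standard textbook example, not taken from the paper itself), consider the plane curve parametrized by x = t^2, y = t^3. Assuming lexicographic order with t > x > y, the reduced Gröbner basis of the ideal generated by the parametrization contains exactly one polynomial free of t, and that polynomial generates the elimination ideal, giving the implicit equation of the curve:

```latex
\begin{aligned}
I &= \langle\, t^2 - x,\; t^3 - y \,\rangle \subset k[t, x, y],
  \qquad \text{lex order } t > x > y, \\
G &= \{\, t^2 - x,\; tx - y,\; ty - x^2,\; x^3 - y^2 \,\}, \\
I \cap k[x, y] &= \langle\, x^3 - y^2 \,\rangle .
\end{aligned}
```

Substituting x = t^2 and y = t^3 into x^3 - y^2 gives t^6 - t^6 = 0, which confirms that every point of the parametrized curve satisfies the implicit equation.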

Keywords: curves, surfaces, Gröbner basis, elimination

Procedia PDF Downloads 290
41217 Pattern Recognition Using Feature Based Die-Map Clustering in the Semiconductor Manufacturing Process

Authors: Seung Hwan Park, Cheng-Sool Park, Jun Seok Kim, Youngji Yoo, Daewoong An, Jun-Geol Baek

Abstract:

As big data analysis becomes more important, yield prediction using data from the semiconductor process is essential. In general, yield prediction and analysis of the causes of failure are closely related. The purpose of this study is to analyze patterns that affect the final test results using die-map-based clustering. Much research has been conducted using die data from the semiconductor test process. However, such analysis is limited because the test data are less directly related to the final test results. Therefore, this study proposes a framework for analysis through clustering using more detailed data than existing die data. The study consists of three phases. In the first phase, a die map is created from fail bit data in each sub-area of a die. In the second phase, clustering is performed using the map data. The third phase is to find patterns that affect the final test result. Finally, the proposed three phases are applied to actual industrial data, and the experimental results show the potential for field application.
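
The three phases can be pictured with a minimal sketch. The abstract does not name the clustering algorithm or the die-map layout, so everything here is an assumption made for illustration: a 2x2 grid of sub-areas per die, plain k-means with fixed starting centroids, and invented fail-bit counts and final test outcomes (`die_maps`, `final_test`).

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's k-means with caller-supplied starting centroids (deterministic)."""
    centroids = [list(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        assignment = [
            min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])))
            for point in points
        ]
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Phase 1: hypothetical die maps, fail-bit counts on a 2x2 grid of sub-areas.
die_maps = {
    "die_A": [[9, 8], [1, 0]],   # failures concentrated in the top row
    "die_B": [[10, 7], [0, 1]],
    "die_C": [[0, 1], [9, 10]],  # failures concentrated in the bottom row
    "die_D": [[1, 0], [8, 9]],
}
# Hypothetical final test outcomes, invented for this sketch.
final_test = {"die_A": "fail", "die_B": "fail", "die_C": "pass", "die_D": "pass"}

names = list(die_maps)
features = [[count for row in die_maps[name] for count in row] for name in names]

# Phase 2: cluster the flattened die maps.
clusters = kmeans(features, [features[0], features[2]])

# Phase 3: relate each cluster to the final test result.
fail_rate = {}
for cluster in set(clusters):
    members = [n for n, c in zip(names, clusters) if c == cluster]
    fail_rate[cluster] = sum(final_test[n] == "fail" for n in members) / len(members)
```

Flattening each die map into a feature vector keeps the sketch short; a real feature-extraction step would likely encode spatial structure more carefully.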

Keywords: die-map clustering, feature extraction, pattern recognition, semiconductor manufacturing process

Procedia PDF Downloads 390
41216 Turkey in Minds: Cognitive and Social Representation of "East" and "West"

Authors: Feyzan Tuzkaya, Nihan S. Soylu, Caglar Solak, Mehmet Peker, Hilal Peker, Kemal Ozeralp, Ceren Mete, Ezgi Mehmetoglu, Mehmet Karasu, Cihan Elci, Ece Akca, Melek Goregenli

Abstract:

Perception, evaluation and representation of the environment have been the subject of many disciplines, including psychology, geography and architecture. In the environmental and social psychology literature, several pieces of evidence suggest that cognitive representations of a place consist not only of geographic items but also of social and cultural ones. Mental representations of a residential area or a country are influenced and determined by socio-demographics and by the physical and social context. Thus, all mental representations of a given place are also social representations. Cognitive maps are the main and most common instruments used to identify spatial images and the difference between physical and subjective environments. The aim of the current study is to investigate the mental and social representations of Turkey in university students’ minds. Data were collected from 249 university students from different departments (i.e. psychology, geography, history, tourism) of Ege University. Participants were asked to draw sketch maps of Turkey as they pictured it in their minds. According to the results, the cognitive maps showed geographic aspects of Turkey as well as the symbolic, cultural and political reality of Turkey. That is to say, these maps contained many symbolic and verbal items related to criticism of social and cultural problems, ongoing ethnic and political conflicts, and the current political agenda of Turkey. Additionally, one of the main differentiations in these representations appeared between the East and the West of Turkey, and the representations of the East and the West varied according to participants’ cultural background, their ethnic values, and where they were born. The results of the study are discussed from an environmental and social psychological perspective, considering the cultural and social values of Turkey and the current political circumstances of the country.

Keywords: cognitive maps, East, West, politics, social representations, Turkey

Procedia PDF Downloads 396
41215 The Economic Limitations of Defining Data Ownership Rights

Authors: Kacper Tomasz Kröber-Mulawa

Abstract:

This paper will address the topic of data ownership from an economic perspective, and examples of economic limitations of data property rights will be provided, which have been identified using methods and approaches of economic analysis of law. To properly build a background for the economic focus, in the beginning a short perspective of data and data ownership in the EU’s legal system will be provided. It will include a short introduction to its political and social importance and highlight relevant viewpoints. This will stress the importance of a Single Market for data but also far-reaching regulations of data governance and privacy (including the distinction of personal and non-personal data, data held by public bodies and private businesses). The main discussion of this paper will build upon the briefly referred to legal basis as well as methods and approaches of economic analysis of law.

Keywords: antitrust, data, data ownership, digital economy, property rights

Procedia PDF Downloads 65
41214 Iot Device Cost Effective Storage Architecture and Real-Time Data Analysis/Data Privacy Framework

Authors: Femi Elegbeleye, Omobayo Esan, Muienge Mbodila, Patrick Bowe

Abstract:

This paper focuses on a cost-effective storage architecture using a fog and cloud data storage gateway, and presents the design of a framework for the data privacy model and a data analytics framework for real-time analysis using machine learning methods. The paper begins with the system analysis, the system architecture and its component design, and the overall system operations. The results obtained on the data privacy model show that combining two or more data privacy models tends to give stronger privacy for the data. The results also show that a fog storage gateway has several advantages over traditional cloud storage: fog has reduced latency/delay, low bandwidth consumption, and low energy usage when compared with cloud storage; therefore, fog storage will help to lessen excessive cost. The paper dwells on the system descriptions, with a focus on the research design and framework design for the data privacy model, data storage, and real-time analytics. It also shows the major system components and their framework specification. Lastly, the overall research system architecture is shown, with its structure and interrelationships.

Keywords: IoT, fog, cloud, data analysis, data privacy

Procedia PDF Downloads 86
41213 Cloud Design for Storing Large Amount of Data

Authors: M. Strémy, P. Závacký, P. Cuninka, M. Juhás

Abstract:

The main goal of this paper is to introduce our design of a private cloud for storing large amounts of data, especially pictures, and to provide a good technological backend for data analysis based on parallel processing and business intelligence. We have tested hypervisors, cloud management tools, storage for storing all data, and Hadoop to provide data analysis on unstructured data. High availability, virtual network management, logical separation of projects, and rapid deployment of physical servers to our environment were also needed.

Keywords: cloud, glusterfs, hadoop, juju, kvm, maas, openstack, virtualization

Procedia PDF Downloads 343
41212 Data Integration with Geographic Information System Tools for Rural Environmental Monitoring

Authors: Tamas Jancso, Andrea Podor, Eva Nagyne Hajnal, Peter Udvardy, Gabor Nagy, Attila Varga, Meng Qingyan

Abstract:

The paper deals with the conditions and circumstances of integrating remotely sensed data for rural environmental monitoring purposes. The main task is to make decisions during the integration process when we have data sources with different resolutions, locations, spectral channels, and dimensions. In order to have exact knowledge about the integration and data fusion possibilities, it is necessary to know the properties (metadata) that characterize the data. The paper explains the joining of these data sources using their attribute data through a sample project. The resulting product will be used for rural environmental analysis.

Keywords: remote sensing, GIS, metadata, integration, environmental analysis

Procedia PDF Downloads 107
41211 From CBGB to F21: The Ramone's Band T-Shirt and Its Representations in the Mainstream Culture

Authors: Cláudia Pereira, Lívia Boeschenstein

Abstract:

This article aims to present an analysis of rock band T-shirts as an element that claims a certain identity in modern-contemporary culture. The work focuses on T-shirts that display the name, related elements and the logo of the punk band the Ramones, because of its strong presence in the collective mind over the last decades. As we shall see, it is possible to observe a symbolic transition away from the object's original cultural place. At first, it was a piece of clothing that belonged to a specific subculture, and then it became just a generic item diluted by the mainstream. This symbolic transition is significant in many ways and will be discussed further. For the analysis, we begin with a brief introduction to the history of the band, followed by a study of vintage rock band T-shirts and their meanings. From there, we turn to a historical contextualization of band T-shirts as a subcultural item and to their redefinition after their appropriation by the mainstream. To guide this reasoning, we draw on theories of style, subcultures and youth culture, and of material culture from an anthropological perspective. In addition, we use theories and concepts of social representations in order to understand the ways in which the Ramones T-shirt is used as a representative element of a fashionable style. This T-shirt, after being resignified by standardization and mass consumption, no longer symbolizes the punk movement, its behavioral motivations and original policies. It also has little to do with the rage of the working-class suburbs of London or New York. It seems to be a mute and vague sign of a restricted rebellion, foreseen and framed, establishing a stylistic contrast to the designer clothes and good behavior expected by the establishment.
It is an item that composes a specific style available on the market, but at the same time it is accepted by the mainstream and provides a subcultural association that has some prestige in society. Another perspective is that of a resignification loop. In the same way that punk resignified conventional goods according to its own social standards, fashion resignifies what was said to be an object of a subculture and absorbs it into its own mass-culture standards. Therefore, outsiders to the punk phenomenon wearing Ramones T-shirts can be perceived negatively by subcultural members, but at the same time are well received by those who are partially unaware of or completely outside the subcultural context. For the general public, the Ramones logo print happens to be appreciated as a diffuse allusion to a punk style, since its original meaning has been entirely neutralized.

Keywords: social representations, subcultures, material culture, punk

Procedia PDF Downloads 372
41210 Origins of the Tattoo: Decoding the Ancient Meanings of Terrestrial Body Art to Establish a Connection between the Natural World and Humans Today

Authors: Sangeet Anand

Abstract:

Body art and tattooing have been practiced as a form of self-expression for centuries, and this study analyzes the pertinence of tattoo culture in our everyday lives and ancient past. Individuals of different cultures represent ideas, practices, and elements of their cultures through symbolic representation. These symbols come in all shapes and sizes and can be as simple as the makeup you put on every day or as permanent as a tattoo. In the long run, individuals who choose to display art on their bodies are seeking to express their individuality. In addition, these visuals are ultimately a reflection of what our own cultures deem beautiful, important, and powerful to the human eye. They make us known to the world and give us a plausible identity in an ever-changing world. We have lived through and seen a rise in hippie culture today. The bodily decoration displayed by this fad has made it seem as though body art is a relatively new visual language. Quite to the contrary, it is not. Through cultural symbolic exploration, we can answer key questions that have been raised for centuries. Through careful, in-depth interviews, this study takes a broad subject matter, art and symbolism, and builds it into a deeper philosophical connection between the world and its past. The basic methodologies used in this sociocultural study include interview questionnaires and textual analysis, which encompass a subject and interviewer as well as source material. The major findings of this study include a distinct connection between cultural heritage and the day-to-day likings of an individual. The participant studied during this project demonstrated a clear passion for hobbies that were practiced even by her ancestors. We can conclude, through these findings, that there is a deeper cultural connection between modern-day humans, the first humans, and the surrounding environments.
Our symbols today are a direct reflection of the elements of nature that our human ancestors were exposed to, and, through cultural acceptance, we can adorn ourselves with these representations to help others identify our pasts. Body art embraces the different aspects of different cultures and holds significance, tells stories, and persists, even as the human population rapidly integrates. With this pattern, our human descendants will continue to represent their cultures and identities in the future. Body art is an integral element in understanding how and why people identify with certain aspects of life over others, and it broadens the scope for conducting more cross-cultural analysis.

Keywords: natural, symbolism, tattoo, terrestrial

Procedia PDF Downloads 97
41209 Analysis of Genomics Big Data in Cloud Computing Using Fuzzy Logic

Authors: Mohammad Vahed, Ana Sadeghitohidi, Majid Vahed, Hiroki Takahashi

Abstract:

In the genomics field, huge amounts of data have been produced by next-generation sequencers (NGS). Data volumes are growing very rapidly, as it is postulated that more than one billion bases will be produced per year in 2020. The growth rate of produced data is much faster than Moore's law in computer technology. This makes it more difficult to deal with genomics data, for example storing data, searching for information, and finding hidden information. An analysis platform for genomics big data therefore needs to be developed. Newly developed cloud computing enables us to deal with big data more efficiently. Hadoop is one of the frameworks for distributed computing and underlies Big Data as a Service (BDaaS). Although many services have adopted this technology, e.g. Amazon, there are few applications in the biology field. Here, we propose a new algorithm to deal more efficiently with genomics big data, e.g. sequencing data. Our algorithm consists of two parts. First, BDaaS is applied to handle the data more efficiently. Second, a hybrid method of MapReduce and fuzzy logic is applied for data processing. This step can be parallelized in implementation. Our algorithm has great potential in the computational analysis of genomics big data, e.g. de novo genome assembly and sequence similarity search. We will discuss our algorithm and its feasibility.
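
The abstract does not say how MapReduce and fuzzy logic are combined, so the following is only one way to picture such a hybrid, not the authors' algorithm. In this sketch the map step grades each read's GC fraction against three triangular fuzzy sets, and the reduce step sums the membership degrees per set. The fuzzy sets, the function names, and the four reads are all hypothetical.

```python
from collections import defaultdict

def triangular(x, left, peak, right):
    """Triangular fuzzy membership: 1 at `peak`, falling to 0 at `left` and `right`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical fuzzy sets over the GC fraction of a read (0.0 to 1.0).
FUZZY_SETS = {
    "low_gc": (-0.5, 0.0, 0.5),
    "medium_gc": (0.0, 0.5, 1.0),
    "high_gc": (0.5, 1.0, 1.5),
}

def map_read(read):
    """Map step: emit (fuzzy label, membership degree) pairs for one read."""
    gc_fraction = sum(base in "GC" for base in read) / len(read)
    for label, (left, peak, right) in FUZZY_SETS.items():
        degree = triangular(gc_fraction, left, peak, right)
        if degree > 0:
            yield label, degree

def shuffle(pairs):
    """Shuffle step: group emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_degrees(grouped):
    """Reduce step: total membership degree per fuzzy label."""
    return {key: sum(values) for key, values in grouped.items()}

# Hypothetical reads, invented for this sketch.
reads = ["GGCC", "ATAT", "ATGC", "AGTT"]
pairs = [pair for read in reads for pair in map_read(read)]
totals = reduce_degrees(shuffle(pairs))
```

Because `map_read` looks at one read at a time, the map step has no shared state and could be distributed across workers, which is the property the abstract refers to when it says the step can be parallelized.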

Keywords: big data, fuzzy logic, MapReduce, Hadoop, cloud computing

Procedia PDF Downloads 286
41208 Analysis of Different Classification Techniques Using WEKA for Diabetic Disease

Authors: Usama Ahmed

Abstract:

Data mining is the process of analyzing data in order to predict helpful information. It is a field of research that solves various types of problems. In data mining, classification is an important technique for classifying different kinds of data. Diabetes is one of the most common diseases. This paper implements different classification techniques using the Waikato Environment for Knowledge Analysis (WEKA) on a diabetes dataset and finds which algorithm is most suitable. The best classification algorithm on the diabetic data is Naïve Bayes. The accuracy of Naïve Bayes is 76.31%, and it takes 0.06 seconds to build the model.
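
To make the classification step concrete, here is a minimal Gaussian Naïve Bayes sketch in plain Python. It is not the WEKA implementation and does not reproduce the paper's experiment: the Gaussian likelihood, the two features, and the six training records are assumptions made purely for illustration.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(rows), means, variances)
    return model

def predict(model, row):
    """Return the class with the highest log-posterior for one record."""
    def log_posterior(prior, means, variances):
        score = math.log(prior)
        for x, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_posterior(*model[label]))

# Hypothetical toy records: (plasma glucose, body-mass index), invented for this sketch.
train_rows = [(85, 22.0), (90, 24.5), (95, 23.1), (150, 33.0), (165, 35.2), (170, 31.8)]
train_labels = ["negative", "negative", "negative", "positive", "positive", "positive"]

model = fit_gaussian_nb(train_rows, train_labels)
prediction_low = predict(model, (88, 23.0))
prediction_high = predict(model, (160, 34.0))
```

Working in log space avoids underflow when many features are multiplied together; the "naïve" part is the assumption that features are independent given the class, which is why the per-feature terms are simply added.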

Keywords: data mining, classification, diabetes, WEKA

Procedia PDF Downloads 134
41207 Estimation of Missing Values in Aggregate Level Spatial Data

Authors: Amitha Puranik, V. S. Binu, Seena Biju

Abstract:

Missing data is a common problem in spatial analysis, especially at the aggregate level. Missing values can occur in a covariate, in the response variable, or in both at a given location. Many missing data techniques are available to estimate missing values, but not all of these methods can be applied to spatial data, since the data are autocorrelated. Hence there is a need to develop a method that estimates the missing values in both the response variable and the covariates in spatial data by taking the spatial autocorrelation into account. The present study aims to develop a model to estimate the missing data points at the aggregate level in spatial data by accounting for (a) spatial autocorrelation of the response variable, (b) spatial autocorrelation of the covariates, and (c) correlation between the covariates and the response variable. Estimating the missing values of spatial data requires a model that explicitly accounts for the spatial autocorrelation. The proposed model not only accounts for spatial autocorrelation but also utilizes the correlation that exists between covariates, within covariates, and between the response variable and covariates. Precise estimation of the missing data points in spatial data will increase the precision of the estimated effects of independent variables on the response variable in spatial regression analysis.
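
The simplest way to use spatial autocorrelation for imputation is to borrow from adjacent areas. The sketch below fills a missing value with the mean of its observed neighbours. It is a crude stand-in, not the model proposed in the abstract, which also exploits correlation with covariates. The region names, adjacency lists, and values are hypothetical.

```python
def impute_from_neighbours(values, neighbours):
    """Fill each missing value (None) with the mean of its observed neighbours."""
    filled = dict(values)
    for region, value in values.items():
        if value is None:
            observed = [values[n] for n in neighbours[region] if values[n] is not None]
            if observed:  # leave the gap if no neighbour has an observed value
                filled[region] = sum(observed) / len(observed)
    return filled

# Hypothetical aggregate-level response values; region B is missing.
values = {"A": 10.0, "B": None, "C": 14.0, "D": 12.0}
# Hypothetical adjacency structure between the four regions.
neighbours = {
    "A": ["B", "D"],
    "B": ["A", "C", "D"],
    "C": ["B", "D"],
    "D": ["A", "B", "C"],
}

filled = impute_from_neighbours(values, neighbours)
```

Only originally observed values feed the estimate, so the result does not depend on the order in which regions are visited.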

Keywords: spatial regression, missing data estimation, spatial autocorrelation, simulation analysis

Procedia PDF Downloads 364
41206 Fuzzy Optimization Multi-Objective Clustering Ensemble Model for Multi-Source Data Analysis

Authors: C. B. Le, V. N. Pham

Abstract:

In modern data analysis, multi-source data appears more and more in real applications. Multi-source data clustering has emerged as an important issue in the data mining and machine learning community. Different data sources provide information about different data. Therefore, multi-source data linking is essential to improve clustering performance. However, in practice multi-source data is often heterogeneous, uncertain, and large. This is considered a major challenge of multi-source data. An ensemble is a versatile machine learning model in which learning techniques can work in parallel on big data. Clustering ensemble has been shown to outperform any standard clustering algorithm in terms of accuracy and robustness. However, most traditional clustering ensemble approaches are based on a single-objective function and single-source data. This paper proposes a new clustering ensemble method for multi-source data analysis. The fuzzy optimized multi-objective clustering ensemble method is called FOMOCE. Firstly, a clustering ensemble mathematical model based on the structure of multi-objective clustering function, multi-source data, and dark knowledge is introduced. Then, rules for extracting dark knowledge from the input data, clustering algorithms, and base clusterings are designed and applied. Finally, a clustering ensemble algorithm is proposed for multi-source data analysis. The experiments were performed on the standard sample data set. The experimental results demonstrate the superior performance of the FOMOCE method compared to the existing clustering ensemble methods and multi-source clustering methods.

Keywords: clustering ensemble, multi-source, multi-objective, fuzzy clustering

Procedia PDF Downloads 172
41205 Analysis of Cyber Activities of Potential Business Customers Using Neo4j Graph Databases

Authors: Suglo Tohari Luri

Abstract:

Data analysis is an important aspect of business performance. With the application of artificial intelligence within databases, selecting a suitable database engine for an application design is also crucial for business data analysis. The application of business intelligence (BI) software to graph databases such as Neo4j has proved highly effective for customer data analysis. Yet it remains a concern that not all business organizations have Neo4j BI software available for customer data analysis. Further, those with the BI software often lack personnel with the requisite expertise to use it effectively with the Neo4j database. The purpose of this research is to demonstrate how Neo4j program code alone can be applied to the analysis of e-commerce website customer visits. Because the Neo4j database engine is optimized for handling and managing data relationships, and can support high-performance, scalable systems for connected data nodes, business owners who advertise their products on websites that use Neo4j as a database will be able to determine the number of visitors and learn which products are visited at routine intervals, supporting the necessary decision making. It will also help identify the best customer segments for specific goods so that advertising on those websites can place more emphasis on them.

Keywords: data, engine, intelligence, customer, neo4j, database

Procedia PDF Downloads 185
41204 Analysis and Forecasting of Bitcoin Price Using Exogenous Data

Authors: J-C. Leneveu, A. Chereau, L. Mansart, T. Mesbah, M. Wyka

Abstract:

Extracting and interpreting information from big data will remain a key challenge for years to come in several sectors, such as finance. Currently, numerous methods (such as technical analysis) are used to try to understand and anticipate market behavior, with mixed results, because it still seems impossible to predict a financial trend exactly. The growth and diversity of data available on the Internet represent a great opportunity for the financial world. Indeed, alongside standard financial data, it is possible to focus on exogenous data to take more macroeconomic factors into account. Coupling the interpretation of these data with standard methods could yield more precise trend predictions. In this paper, in order to observe the influence of exogenous data on price independently of the other usual effects occurring in classical markets, behaviors of Bitcoin users are introduced into a model reconstituting Bitcoin value, which is elaborated and tested for prediction purposes.

Keywords: big data, bitcoin, data mining, social network, financial trends, exogenous data, global economy, behavioral finance

Procedia PDF Downloads 348
41203 Estimating the Life-Distribution Parameters of Weibull-Life PV Systems Utilizing Non-Parametric Analysis

Authors: Saleem Z. Ramadan

Abstract:

In this paper, a model is proposed to determine the life distribution parameters of the useful life region for the PV system utilizing a combination of non-parametric and linear regression analysis for the failure data of these systems. Results showed that this method is dependable for analyzing failure time data for such reliable systems when the data is scarce.
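One common way to combine a non-parametric estimate with linear regression for Weibull data is median rank regression, sketched below. The abstract does not state the exact procedure used in the paper, so this is an assumed stand-in rather than the author's method: Bernard's approximation supplies the non-parametric failure probabilities, and a least-squares line through the linearized Weibull CDF, ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta), gives the shape `beta` and scale `eta`. The function name and the failure times are hypothetical.

```python
import math


def weibull_mrr(failure_times):
    """Estimate Weibull shape (beta) and scale (eta) by median rank regression."""
    t = sorted(failure_times)
    n = len(t)
    xs, ys = [], []
    for i, ti in enumerate(t, start=1):
        f = (i - 0.3) / (n + 0.4)  # Bernard's median rank approximation
        xs.append(math.log(ti))
        ys.append(math.log(-math.log(1.0 - f)))
    # Ordinary least squares for y = beta * x + intercept.
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta = sxy / sxx
    intercept = my - beta * mx
    eta = math.exp(-intercept / beta)  # because intercept = -beta * ln(eta)
    return beta, eta


# Hypothetical failure times (hours) for a small sample.
beta, eta = weibull_mrr([410.0, 620.0, 780.0, 950.0, 1300.0])
print(f"shape = {beta:.2f}, scale = {eta:.0f} h")
```

Because the fit needs only ranked failure times, it remains usable with the small samples the abstract refers to, although the estimates become correspondingly uncertain.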

Keywords: masking, bathtub model, reliability, non-parametric analysis, useful life

Procedia PDF Downloads 548
41202 The Extent of Big Data Analysis by the External Auditors

Authors: Iyad Ismail, Fathilatul Abdul Hamid

Abstract:

This research investigates the extent of big data analysis by external auditors. The paper adopts grounded theory as a framework for conducting a series of semi-structured interviews with eighteen external auditors. The findings cover the availability of big data and the extent to which big data analysis is used by external auditors in the Gaza Strip, Palestine. The study's outcomes lead to a series of auditing procedures intended to improve external auditing techniques and, in turn, the quality of the audit process. The research is also important for auditing firms, as it gives insight into their mechanisms and identifies the most important strategies for achieving competitive audit quality. The results aim to guide academic and professional auditing institutions in developing techniques that enable external auditors to carry out big data analysis. This paper provides appropriate information for the decision-making process and a source of future information which affects technological auditing.

Keywords: big data analysis, external auditors, audit reliance, internal audit function

Procedia PDF Downloads 57
41201 Enhance the Power of Sentiment Analysis

Authors: Yu Zhang, Pedro Desouza

Abstract:

Since big data has become substantially more accessible and manageable due to the development of powerful tools for dealing with unstructured data, people are eager to mine information from social media resources that could not be handled in the past. Sentiment analysis, as a novel branch of text mining, has in the last decade become increasingly important in marketing analysis, customer risk prediction, and other fields. Scientists and researchers have undertaken significant work in creating and improving their sentiment models. In this paper, we present a concept of selecting appropriate classifiers based on the features and qualities of data sources by comparing the performance of five classifiers on three popular social media data sources: Twitter, Amazon Customer Reviews, and Movie Reviews. We introduce a couple of innovative models that outperform traditional sentiment classifiers for these data sources, and provide insights on how to further improve the predictive power of sentiment analysis. The modelling and testing work was done in R and Greenplum in-database analytic tools.

Keywords: sentiment analysis, social media, Twitter, Amazon, data mining, machine learning, text mining

Procedia PDF Downloads 338
41200 Analysis of Cooperative Learning Behavior Based on the Data of Students' Movement

Authors: Wang Lin, Li Zhiqiang

Abstract:

The purpose of this paper is to analyze cooperative learning behavior patterns based on data of students' movement. The study first reviews cooperative learning theory and its research status, and briefly introduces the k-means clustering algorithm. It then uses the clustering algorithm and mathematical statistics to analyze the activity rhythm of individual students and groups in different functional areas, according to movement data provided by 10 first-year graduate students. It also focuses on the analysis of students' behavior in the learning area and explores the regularities of cooperative learning behavior. The results show that the cooperative learning behavior analysis method based on movement data proposed in this paper is feasible. From the results of the data analysis, the characteristics of students' behavior and their cooperative learning behavior patterns could be identified.
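As a rough illustration of the k-means step mentioned above, here is a minimal sketch in plain Python. The abstract does not describe the features or the initialization the authors used; the (x, y) positions, the choice of k = 2, and the initial centroids below are hypothetical.

```python
def kmeans(points, centroids, iterations=20):
    """Basic k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Index of the centroid with the smallest squared Euclidean distance to p.
            nearest = min(
                range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters


# Hypothetical (x, y) positions recorded in two functional areas.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, [points[0], points[3]])
print(centroids)
print(clusters)
```

In practice the result depends on the initial centroids, so the algorithm is usually run from several starting points and the best partition is kept.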

Keywords: behavior pattern, cooperative learning, data analyze, k-means clustering algorithm

Procedia PDF Downloads 172
41199 Big Data Analysis with Rhipe

Authors: Byung Ho Jung, Ji Eun Shin, Dong Hoon Lim

Abstract:

Rhipe, which integrates R with the Hadoop environment, makes it possible to process and analyze massive amounts of data in a distributed processing environment. In this paper, we implemented multiple regression analysis using Rhipe on actual data of various sizes. Experimental results comparing the performance of our Rhipe implementation with the stats and biglm packages available on bigmemory showed that Rhipe was faster than the other packages, owing to parallel processing in which the number of map tasks increases as the size of the data increases. We also compared the computing speeds of the pseudo-distributed and fully-distributed modes for configuring a Hadoop cluster. The results showed that fully-distributed mode was faster than pseudo-distributed mode, and that fully-distributed mode became faster as the number of data nodes increased.
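The general idea behind fitting a regression in a map/reduce setting can be sketched as follows: each map task computes the partial sums X'X and X'y for its own chunk, and the reduce step adds them up before the normal equations are solved once. This is a plain-Python illustration of that pattern, not Rhipe or Hadoop code and not necessarily how the authors implemented it; the function names and the toy data are assumptions.

```python
def map_chunk(rows):
    """Map step: partial X'X and X'y for one chunk; each row is (x1, ..., xp, y)."""
    d = len(rows[0])  # intercept plus p predictors
    xtx = [[0.0] * d for _ in range(d)]
    xty = [0.0] * d
    for *features, y in rows:
        x = [1.0] + features  # prepend the intercept term
        for i in range(d):
            xty[i] += x[i] * y
            for j in range(d):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def reduce_parts(parts):
    """Reduce step: element-wise sum of the partial matrices and vectors."""
    parts = list(parts)
    d = len(parts[0][1])
    xtx = [[sum(p[0][i][j] for p in parts) for j in range(d)] for i in range(d)]
    xty = [sum(p[1][i] for p in parts) for i in range(d)]
    return xtx, xty


def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


# Toy data generated from y = 1 + 2*x1 + 3*x2, split into two chunks.
chunks = [
    [(0.0, 0.0, 1.0), (1.0, 0.0, 3.0), (0.0, 1.0, 4.0)],
    [(1.0, 1.0, 6.0), (2.0, 1.0, 8.0), (1.0, 2.0, 9.0)],
]
coefficients = solve(*reduce_parts(map_chunk(chunk) for chunk in chunks))
print(coefficients)  # intercept, then one coefficient per predictor
```

Only the small (p+1) by (p+1) summaries travel between tasks, which is why adding map tasks as the data grows can keep the run time down.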

Keywords: big data, Hadoop, Parallel regression analysis, R, Rhipe

Procedia PDF Downloads 489
41198 What the Future Holds for Social Media Data Analysis

Authors: P. Wlodarczak, J. Soar, M. Ally

Abstract:

The dramatic rise in the use of Social Media (SM) platforms such as Facebook and Twitter provides access to an unprecedented amount of user data. Users may post reviews on products and services they bought, write about their interests, share ideas, or give their opinions and views on political issues. There is growing interest among organisations in analysing SM data to detect new trends, obtain user opinions on their products and services, or find out about their online reputations. A recent research trend in SM analysis is making predictions based on sentiment analysis of SM. Often, indicators derived from historic SM data are represented as time series and correlated with a variety of real-world phenomena such as the outcome of elections, the development of financial indicators, box office revenue, and disease outbreaks. This paper examines the current state of research in the area of SM mining and predictive analysis and gives an overview of the analysis methods using opinion mining and machine learning techniques.

Keywords: social media, text mining, knowledge discovery, predictive analysis, machine learning

Procedia PDF Downloads 415
41197 The Role of Ideophones: Phonological and Morphological Characteristics in Literature

Authors: Cristina Bahón Arnaiz

Abstract:

Many Asian languages, such as Korean and Japanese, are well known for their wide use of sound symbolic words, or ideophones. This distinctive characteristic greatly enriches their lexicons. Ideophones are a class of words that utilize sound symbolism to express aspects, states, emotions, or conditions that can be experienced through the senses, such as shape, color, smell, action, or movement. Ideophones have very particular characteristics in terms of sound symbolism and morphology, which distinguish them from other words. The phonological characteristics of ideophones are vowel ablaut, or vowel gradation, and consonant mutation. In the case of Korean, there are light vowels and dark vowels, and the meaning changes slightly depending on the type of vowel used. Consonant mutation, also known as consonant ablaut, contributes to the level of intensity, emphasis, and volume of an expression. In addition to these phonological characteristics, there is one main morphological singularity, reduplication, which carries the meaning of continuity, repetition, intensity, emphasis, and plurality. All these characteristics play an important role in both linguistics and literature, as they enhance the meaning of what is being expressed with great semantic detail, expressiveness, and rhythm. After presenting the phonological and morphological characteristics of Korean ideophones, this study analyzes the ideophones used in a single paragraph of a Korean novel, which add rich yet subtle detail to the meaning of the words and heighten the expressiveness and rhythm of the text. The results of this analysis demonstrate the important role that ideophones play in literature.

Keywords: ideophones, mimetic words, phonomimes, phenomimes, psychomimes, sound symbolism

Procedia PDF Downloads 136
41196 Detection Efficient Enterprises via Data Envelopment Analysis

Authors: S. Turkan

Abstract:

In this paper, data on Turkey’s Top 500 Industrial Enterprises for 2014 were analyzed using data envelopment analysis. Data envelopment analysis is used to detect efficient decision-making units, such as universities, hospitals, and schools, by using inputs and outputs. The decision-making units in this study are enterprises. To detect efficient enterprises, financial ratios related to enterprise productivity are selected as inputs and outputs. The efficient enterprises with predominantly foreign-owned capital are detected via the super efficiency model. According to the results, Mercedes-Benz is the most efficient enterprise with predominantly foreign-owned capital in Turkey.

Keywords: data envelopment analysis, super efficiency, logistic regression, financial ratios

Procedia PDF Downloads 315