Search results for: exploratory data analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 42679

42559 A Multivariate Exploratory Data Analysis of a Crisis Text Messaging Service in Order to Analyse the Impact of the COVID-19 Pandemic on Mental Health in Ireland

Authors: Hamda Ajmal, Karen Young, Ruth Melia, John Bogue, Mary O'Sullivan, Jim Duggan, Hannah Wood

Abstract:

The Covid-19 pandemic led to a range of public health mitigation strategies in order to suppress the SARS-CoV-2 virus. The drastic changes in everyday life due to lockdowns had the potential for a significant negative impact on public mental health, and a key public health goal is to now assess the evidence from available Irish datasets to provide useful insights on this issue. Text-50808 is an online text-based mental health support service, established in Ireland in 2020, and can provide a measure of revealed distress and mental health concerns across the population. The aim of this study is to explore statistical associations between public mental health in Ireland and the Covid-19 pandemic. Uniquely, this study combines two measures of emotional wellbeing in Ireland: (1) weekly text volume at Text-50808, and (2) emotional wellbeing indicators reported by respondents of the Amárach public opinion survey, carried out on behalf of the Department of Health, Ireland. For this analysis, a multivariate graphical exploratory data analysis (EDA) was performed on the Text-50808 dataset dated from 15th June 2020 to 30th June 2021. This was followed by time-series analysis of key mental health indicators including: (1) the percentage of daily/weekly texts at Text-50808 that mention Covid-19 related issues; (2) the weekly percentage of people experiencing anxiety, boredom, enjoyment, happiness, worry, fear and stress in Amárach survey; and Covid-19 related factors: (3) daily new Covid-19 case numbers; (4) daily stringency index capturing the effect of government non-pharmaceutical interventions (NPIs) in Ireland. The cross-correlation function was applied to measure the relationship between the different time series. EDA of the Text-50808 dataset reveals significant peaks in the volume of texts on days prior to level 3 lockdown and level 5 lockdown in October 2020, and full level 5 lockdown in December 2020. 
A significantly high positive correlation was observed between the percentage of texts at Text-50808 that reported Covid-19 related issues and the percentage of respondents experiencing anxiety, worry and boredom (at a lag of 1 week) in the Amárach survey data. There is a significant negative correlation between the percentage of texts with Covid-19 related issues and the percentage of respondents experiencing happiness in the Amárach survey. The daily percentage of texts at Text-50808 that reported Covid-19 related issues has a weak positive correlation with daily new Covid-19 cases in Ireland at a lag of 10 days and with the daily stringency index of NPIs in Ireland at a lag of 2 days. The sudden peaks in text volume at Text-50808 immediately prior to new restrictions in Ireland indicate an association between a rise in mental health concerns and the announcement of new restrictions. There is also a high correlation between emotional wellbeing variables in the Amárach dataset and the number of weekly texts at Text-50808, and this confirms that Text-50808 reflects overall public sentiment. This analysis confirms the benefits of the texting service as a community surveillance tool for mental health in the population. This initial EDA will be extended to use multivariate modeling to predict the effect of additional Covid-19 related factors on public mental health in Ireland.
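The lagged correlations described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names and the series are hypothetical, the data are synthetic, and each lag is correlated over its own overlapping window, which is a simplification of the usual sample cross-correlation function.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cross_correlation(x, y, max_lag):
    """Return {lag: r}, where r correlates x[t] with y[t + lag] for lag >= 0."""
    result = {}
    for lag in range(max_lag + 1):
        xs = x[: len(x) - lag]  # drop the last `lag` points of x
        ys = y[lag:]            # drop the first `lag` points of y
        result[lag] = pearson(xs, ys)
    return result

# Synthetic illustrative series only: y repeats x one step later.
x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]
y = [0.0] + x[:-1]

ccf = cross_correlation(x, y, max_lag=3)
best_lag = max(ccf, key=ccf.get)  # lag with the strongest positive correlation
```

Here the strongest correlation falls at a lag of one step because the second series was constructed as a one-step delayed copy of the first; with real weekly or daily series, the lag at which the correlation peaks is read off in the same way.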

Keywords: COVID-19 pandemic, data analysis, digital health, mental health, public health

Procedia PDF Downloads 146
42558 Data Mining Algorithms Analysis: Case Study of Price Predictions of Lands

Authors: Julio Albuja, David Zaldumbide

Abstract:

Data analysis is an important step before making a decision about money. The aim of this work is to analyze the factors that influence the final price of houses through data mining algorithms. To the best of our knowledge, previous work was carried out only to compare results. Before the data set was used, the Z-transformation was applied to standardize the data to a common scale. The data was then classified into two groups so that it could be visualized in a readable format. A decision tree was built, and the data is displayed graphically so that the results and the influence of each factor are easy to see. The definitions of these methods are described, as well as the descriptions of the results. Finally, conclusions and recommendations related to the results of our research are presented, making it easier to apply these algorithms to a customized data set.
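The standardization step mentioned in this abstract can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: the column names and values are hypothetical, and the population standard deviation is used. Note that a Z-transformation puts columns on a common scale (mean 0, standard deviation 1) rather than into a fixed range.

```python
import statistics

def z_transform(values):
    """Standardize values to mean 0 and population standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mean) / std for v in values]

# Hypothetical columns on very different scales (not data from the paper).
area_m2 = [120.0, 200.0, 150.0, 330.0]
price = [45_000.0, 80_000.0, 52_000.0, 140_000.0]

area_z = z_transform(area_m2)
price_z = z_transform(price)
```

After the transformation both columns are expressed in standard deviations from their own mean, so a distance-based or tree-based method is not dominated by the column with the larger raw magnitude.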

Keywords: algorithms, data, decision tree, transformation

Procedia PDF Downloads 377
42557 Correlation Analysis of Energy Use, Architectural Design and Residential Lifestyle in Japan Smart Community

Authors: Tran Le Na, Didit Novianto, Yoshiaki Ushifusa, Weijun Gao

Abstract:

This paper introduces the characteristics of Japanese residential lifestyle and Japanese architectural housing design, and summarizes the results of an analysis of energy use in 12 households in electric-only multi-dwellings in Higashida Smart Community, Kitakyushu, Japan. Using hourly and daily load data collected from smart meters, we explore correlations of energy use in households according to different levels of architectural characteristics and lifestyle, following three factors: Space (Living room, Kitchen, Bedroom, Bathroom), Time (daytime and night time, weekdays and weekend) and User (Elderly, Parents, Kids). The energy consumption reports showed how the essential demand of households responds to these variable factors. From that exploratory analysis, we can define the role of housing equipment layout and spatial layout in residential housing design. Likewise, determining preferred spaces and time use can help to optimize energy consumption in households. This paper contributes to the application of the Smart Home Energy Management System in Smart Community in Japan and provides useful experience for other countries.

Keywords: smart community, energy efficiency, architectural housing design, residential lifestyle

Procedia PDF Downloads 206
42556 Validation of the Career Motivation Scale among Chinese University and Vocational College Teachers

Authors: Wei Zhang, Lifen Zhao

Abstract:

The present study aims to translate and validate the Career Motivation Scale among Chinese university and vocational college teachers. Exploratory factor analysis supported a three-factor structure that was consistent with the original structure of career motivation: career insight, career identity, and career resilience. Confirmatory factor analysis showed that a second-order three-factor model with correlated measurement errors best fit the data. Configural, metric, and scalar invariance models were tested, demonstrating that the Chinese version of the Career Motivation Scale did not differ across groups of school type, educational level, and working years in current institutions. The concurrent validity of the Chinese Career Motivation Scale was confirmed by its significant correlations with work engagement, career adaptability, career satisfaction, job crafting, and intention to quit. The results of the study indicated that the Chinese Career Motivation Scale was a valid and reliable measure of career motivation among university and vocational college teachers in China.

Keywords: career motivation scale, Chinese University, vocational college teachers, measurement invariance, validation

Procedia PDF Downloads 136
42555 A Modular Framework for Enabling Analysis for Educators with Different Levels of Data Mining Skills

Authors: Kyle De Freitas, Margaret Bernard

Abstract:

Enabling data mining analysis among a wider audience of educators is an active area of research within the educational data mining (EDM) community. The paper proposes a framework for developing an environment that caters for educators who have little technical data mining skills as well as for more advanced users with some data mining expertise. This framework architecture was developed through the review of the strengths and weaknesses of existing models in the literature. The proposed framework provides a modular architecture for future researchers to focus on the development of specific areas within the EDM process. Finally, the paper also highlights a strategy of enabling analysis through either the use of predefined questions or a guided data mining process and highlights how the developed questions and analysis conducted can be reused and extended over time.

Keywords: educational data mining, learning management system, learning analytics, EDM framework

Procedia PDF Downloads 330
42554 System Dietadhoc® - A Fusion of Human-Centred Design and Agile Development for the Explainability of AI Techniques Based on Nutritional and Clinical Data

Authors: Michelangelo Sofo, Giuseppe Labianca

Abstract:

In recent years, the scientific community's interest in the exploratory analysis of biomedical data has increased exponentially. In the research field of nutritional biologists, the curative process, based on the analysis of clinical data, is a very delicate operation because there are multiple solutions for the management of food-related pathologies (for example, intolerances and allergies, management of cholesterol metabolism, diabetic pathologies, arterial hypertension, obesity, and breathing and sleep problems). In this research work, a system was therefore created that is capable of evaluating various dietary regimes for specific patient pathologies. The system is founded on a mathematical-numerical model and has been tailored to the real working needs of an expert in human nutrition using human-centered design (ISO 9241-210); it therefore keeps in step with continuous scientific progress in the field and evolves through the experience of managed clinical cases (machine learning process). DietAdhoc® is a decision support system for nutrition specialists, covering patients of both sexes (from 18 years of age) and developed with an agile methodology. Its task consists in drawing up the biomedical and clinical profile of the specific patient by applying two algorithmic optimization approaches to nutritional data and a symbolic solution, obtained by transforming the relational database underlying the system into a deductive database. For all three solution approaches, particular emphasis has been given to the explainability of the suggested clinical decisions through flexible and customizable user interfaces. Furthermore, the system has multiple software modules based on time series and visual analytics techniques that make it possible to evaluate the complete picture of the situation and the evolution of the diet assigned for specific pathologies.

Keywords: medical decision support, physiological data extraction, data driven diagnosis, human centered AI, symbiotic AI paradigm

Procedia PDF Downloads 31
42553 AI-Based Technologies in International Arbitration: An Exploratory Study on the Practicability of Applying AI Tools in International Arbitration

Authors: Annabelle Onyefulu-Kingston

Abstract:

One of the major purposes of AI today is to evaluate and analyze millions of micro and macro data in order to determine what is relevant in a particular case and proffer it in an adequate manner. Microdata, as far as it relates to AI in international arbitration, is the millions of key issues specifically mentioned by either one or both parties or by their counsels, arbitrators, or arbitral tribunals in arbitral proceedings. These can include the qualifications of expert witnesses and the admissibility of evidence, amongst others. Macro data, on the other hand, refers to data derived from the resolution of the dispute and, consequently, the final and binding award. Notable examples include the rationale of the award and the specific and general damages awarded, amongst others. This paper aims to critically evaluate and analyze the possibility of technological inclusion in international arbitration. This research will employ the qualitative method by evaluating existing literature on the consequence of applying AI to both micro and macro data in international arbitration, and how this can be of assistance to parties, counsels, and arbitrators.

Keywords: AI-based technologies, algorithms, arbitrators, international arbitration

Procedia PDF Downloads 103
42552 Understanding Mathematics Achievements among U. S. Middle School Students: A Bayesian Multilevel Modeling Analysis with Informative Priors

Authors: Jing Yuan, Hongwei Yang

Abstract:

This paper aims to understand U.S. middle school students’ mathematics achievements by examining relevant student and school-level predictors. Through a variance component analysis, the study first identifies evidence supporting the use of multilevel modeling. Then, a multilevel analysis is performed under Bayesian statistical inference where prior information is incorporated into the modeling process. During the analysis, independent variables are entered sequentially in the order of theoretical importance to create a hierarchy of models. By evaluating each model using Bayesian fit indices, a best-fit and most parsimonious model is selected where Bayesian statistical inference is performed for the purpose of result interpretation and discussion. The primary dataset for Bayesian modeling is derived from the Program for International Student Assessment (PISA) in 2012, with a secondary PISA dataset from 2003 analyzed under the traditional ordinary least squares method to provide the information needed to specify informative priors for a subset of the model parameters. The dependent variable is a composite measure of mathematics literacy, calculated from an exploratory factor analysis of all five PISA 2012 mathematics achievement plausible values, for which multiple pieces of evidence are found supporting data unidimensionality. The independent variables include demographic variables and content-specific variables: mathematics efficacy, teacher-student ratio, proportion of girls in the school, etc. Finally, the entire analysis is performed using the MCMCpack and MCMCglmm packages in R.

Keywords: Bayesian multilevel modeling, mathematics education, PISA, multilevel

Procedia PDF Downloads 339
42551 Storyboarding for VR: Towards A Conceptual Framework for Transitioning Traditional Storyboarded Narrative Sequences to Immersive 3D VR Experiences

Authors: Sorin Oancea

Abstract:

More than half a century after Ivan Sutherland’s seminal essay, ‘The Ultimate Display’ (1965), 3D Virtual Reality is still an emergent and exploratory medium in terms of its narrative potential, production methodology, and market penetration. Traditionally positioned in front of the screen/canvas as a ‘window-on-the-world’, the storyboarder and animation director transcend the medium and its narrative reality entirely while designing a linear cinematic sequence. This paper proposes a gradual transition from the traditional linear sequence design process based on a transcendent position of the storyboarder and animation director to an increasingly immersed one characterized by a sense of unmediated presence and immanence. Employing a qualitative analysis of the current exploratory storyboarding processes for 3D VR, this research uses a practice-based methodology based on producing a short-form 3D VR narrative experience to derive its findings. The original contribution to knowledge is charting an empirically derived conceptual framework for VR storyboarding and animation directing, with the documented reflective and reflexive process as a map for directorial transitioning between converging mediums by articulating the new VR lexical categories and expounding links to allied performative arts, such as film and theatre.

Keywords: storyboarding, immersive, virtual reality, transitioning

Procedia PDF Downloads 105
42550 Quantile Coherence Analysis: Application to Precipitation Data

Authors: Yaeji Lim, Hee-Seok Oh

Abstract:

Coherence analysis measures the linear time-invariant relationship between two data sets and has been studied in various fields such as signal processing, engineering, and medical science. However, classical coherence analysis tends to be sensitive to outliers and focuses only on the mean relationship. In this paper, we generalize the cross periodogram to the quantile cross periodogram, which provides a richer picture of the inter-relationship between two data sets. This is a general version of the Laplace cross periodogram. We prove its asymptotic distribution under the long range process and compare it with ordinary coherence through numerical examples. We also present a real data example to confirm the usefulness of quantile coherence analysis.

Keywords: coherence, cross periodogram, spectrum, quantile

Procedia PDF Downloads 395
42549 Investigating Real Ship Accidents with Descriptive Analysis in Turkey

Authors: İsmail Karaca, Ömer Söner

Abstract:

The use of advanced methods has been increasing day by day in the maritime sector, which is one of the sectors least affected by the COVID-19 pandemic. In particular, advanced methods are used in the investigation of marine accidents with the aim of minimizing accidents. This research aimed to conduct an exploratory statistical analysis of particular ship accidents in the Transport Safety Investigation Center of Turkey database. A total of 46 ship accidents, which occurred between 2010 and 2018, were selected from the database. In addition to the availability of a reliable and comprehensive database, taking advantage of robust statistical models for investigation is critical to improving the safety of ships. Thus, descriptive analysis has been used in the research to identify causes and conditional factors related to different types of ship accidents. The research outcomes underline the fact that environmental factors and the day and night ratio have a great influence on ship safety.

Keywords: descriptive analysis, maritime industry, maritime safety, ship accident statistics

Procedia PDF Downloads 142
42548 Modeling and Statistical Analysis of a Soap Production Mix in Bejoy Manufacturing Industry, Anambra State, Nigeria

Authors: Okolie Chukwulozie Paul, Iwenofu Chinwe Onyedika, Sinebe Jude Ebieladoh, M. C. Nwosu

Abstract:

The research work is based on the statistical analysis of the processing data. The aim is to analyze the data statistically and to generate a design model for the production mix of soap manufacturing products at the Bejoy manufacturing company, Nkpologwu, Aguata Local Government Area, Anambra State, Nigeria. The statistical analysis shows the correlation of the data. The t-test, partial correlation, and bivariate correlation were used to understand what the data portray. The design model developed was used to model the production yield, and the correlation of the variables shows that R² is 98.7%. The results confirm that the data are fit for further analysis and modeling, as shown by the correlation and the R-squared.
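As a reminder of what an R² figure such as the one reported here measures, the following sketch fits a one-predictor least-squares line and computes the share of variance it explains. It is only an illustration: the abstract does not specify the model beyond a General Linear Model, and the variable names and numbers below are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

def r_squared(x, y, slope, intercept):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    my = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Hypothetical input quantity vs. production yield (not data from the paper).
raw_material = [1.0, 2.0, 3.0, 4.0, 5.0]
yield_units = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(raw_material, yield_units)
r2 = r_squared(raw_material, yield_units, slope, intercept)
```

With a single predictor, R² equals the square of the Pearson correlation between the two variables, which is why a very high correlation and a very high R² tend to be reported together.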

Keywords: General Linear Model, correlation, variables, Pearson, significance, T-test, soap, production mix, statistics

Procedia PDF Downloads 451
42547 COVID-19 Case: A Definition of Infodemia through Online Italian Journalism

Authors: Concetta Papapicco

Abstract:

The spread of the new coronavirus (COVID-19), in addition to becoming a global phenomenon following the declaration of a pandemic, has generated excessive access to information, sometimes not thoroughly screened, which makes it difficult to navigate a given topic because reliable sources are hard to find. As a result, there is a high level of contagion, understood as the spread of the virus but also as the spread of information in a viral and harmful way, which prompted the World Health Organization to coin the term 'Infodemia' to give a name to the phenomenon of excessive information. With the neologism 'Infodemia', the World Health Organization (WHO) wanted, in these days when fear of the coronavirus is raging, to point out what is perhaps the greatest danger to global society in the age of social media. This phenomenon is the distortion of reality in the rumble of echoes and comments of the global community on real or often invented facts. The general purpose of the exploratory study is to investigate how the coronavirus situation is described in journalistic communication. Taking La Repubblica online as the reference news publication, the specific objective of the research is to understand the way in which journalistic communication describes the spread of the COVID-19 virus, the spread of contagion, and the restrictive measures of social distancing in the Italian context. The study starts from the hypothesis that if the circulation of information helps to create a social representation of the phenomenon, the excessive accessibility of sources of information (Infodemia) can be modulated by 'how' the phenomenon is described by journalists. The methodology proposed in the exploratory study is a quanti-qualitative (mixed) method. A Content Analysis with the SketchEngine software is carried out first. In support of the Content Analysis, a Diatextual Analysis was carried out.
The Diatextual Analysis is a qualitative analysis useful for detecting Subjectivity, Argumentativity, and Mode in the analyzed texts, that is, the online articles of La Repubblica on the topic of coronavirus. The research focuses mainly on 'Mode', or 'how' the events related to coronavirus are described in the online articles of La Repubblica about the COVID-19 phenomenon. The results show the presence of contrasting views of the COVID-19 situation in Italy.

Keywords: coronavirus, Italian infodemia, La Repubblica online, mixed method

Procedia PDF Downloads 127
42546 Saving Energy at a Wastewater Treatment Plant through Electrical and Production Data Analysis

Authors: Adriano Araujo Carvalho, Arturo Alatrista Corrales

Abstract:

This paper intends to show how the analysis of electrical energy consumption and production data was used to find opportunities to save energy at the Taboada wastewater treatment plant in Callao, Peru. To access the data, independent data networks were used for both electrical and process instruments. The data were analyzed under an ISO 50001 energy audit, which considered Energy Performance Indexes for each process and a step-by-step guide presented in this text. Thanks to this methodology and to data mining techniques applied to information gathered through electronic multimeters (conveniently placed on substation switchboards connected to a cloud network), it was possible to identify thoroughly the performance of each process and thus to reveal saving opportunities that were previously hidden. The data analysis brought both cost and energy reductions, allowing the plant to save significant resources and to be certified under ISO 50001.

Keywords: energy and production data analysis, energy management, ISO 50001, wastewater treatment plant energy analysis

Procedia PDF Downloads 201
42545 MyAds: A Social Adaptive System for Online Advertisment from Hypotheses to Implementation

Authors: Dana A. Al Qudah, Alexandra I. Critea, Rizik M. H. Al Sayyed, Amer Obeidah

Abstract:

Online advertisement is one of the major incomes for many companies; it has a role in the overall business flow and affects the consumer behavior directly. Unfortunately most users tend to block their ads or ignore them. MyAds is a social adaptive hypermedia system for online advertising and its main goal is to explore how to make online ads more acceptable. In order to achieve such a goal, various technologies and techniques are used. This paper presents a theoretical framework as well as the system architecture for MyAds that was designed based on a set of hypotheses and an exploratory study. The system then was implemented and a pilot experiment was conducted to validate it. The main outcomes suggest that the system has provided personalized ads for users. The main implications suggest that the system can be used for further testing and validating.

Keywords: adaptive hypermedia, e-advertisement, social, hypotheses, exploratory study, framework

Procedia PDF Downloads 415
42544 Prompt Design for Code Generation in Data Analysis Using Large Language Models

Authors: Lu Song Ma Li Zhi

Abstract:

With the rapid advancement of artificial intelligence technology, large language models (LLMs) have become a milestone in the field of natural language processing, demonstrating remarkable capabilities in semantic understanding, intelligent question answering, and text generation. These models are gradually penetrating various industries, particularly showcasing significant application potential in the data analysis domain. However, retraining or fine-tuning these models requires substantial computational resources and ample downstream task datasets, which poses a significant challenge for many enterprises and research institutions. Without modifying the internal parameters of the large models, prompt engineering techniques can rapidly adapt these models to new domains. This paper proposes a prompt design strategy aimed at leveraging the capabilities of large language models to automate the generation of data analysis code. By carefully designing prompts, data analysis requirements can be described in natural language, which the large language model can then understand and convert into executable data analysis code, thereby greatly enhancing the efficiency and convenience of data analysis. This strategy not only lowers the threshold for using large models but also significantly improves the accuracy and efficiency of data analysis. Our approach includes requirements for the precision of natural language descriptions, coverage of diverse data analysis needs, and mechanisms for immediate feedback and adjustment. Experimental results show that with this prompt design strategy, large language models perform exceptionally well in multiple data analysis tasks, generating high-quality code and significantly shortening the data analysis cycle. This method provides an efficient and convenient tool for the data analysis field and demonstrates the enormous potential of large language models in practical applications.
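The kind of prompt design this abstract describes (a precise natural-language requirement, an explicit description of the data, and a feedback step) might look like the sketch below. Everything in it is hypothetical: the template wording, the function names, and the example columns are assumptions for illustration, and no model is actually called.

```python
from string import Template

# Hypothetical prompt template; the wording is an assumption, not the paper's.
PROMPT = Template(
    "You are a data analysis assistant.\n"
    "Dataset columns: $columns\n"
    "Task: $task\n"
    "Constraints: return only runnable Python code; do not invent columns.\n"
)

def build_prompt(columns, task):
    """Fill the template with the dataset schema and the analysis request."""
    return PROMPT.substitute(columns=", ".join(columns), task=task.strip())

def refine_prompt(prompt, error_message):
    """Append execution feedback so a follow-up request can correct the code."""
    return (
        prompt
        + "The previous code failed with: "
        + error_message
        + "\nPlease return a corrected version.\n"
    )

prompt = build_prompt(["date", "sales"], "  Plot monthly total sales.  ")
retry = refine_prompt(prompt, "KeyError: 'month'")
```

Keeping the schema and constraints in a fixed template and passing only the task text as a variable is one simple way to make requests reproducible; the feedback function shows how an execution error could be fed back for adjustment.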

Keywords: large language models, prompt design, data analysis, code generation

Procedia PDF Downloads 48
42543 Career Anchors and Domain Specialization in Management Education: A Deviation Analysis

Authors: Santosh Kumar Sharma

Abstract:

In view of management education with special reference to India, it has been noted that students have deviations between their career anchors and domain of specialization. As a consequence, they face problems in their summer internships and placements in the corporate sector. Eventually, they either change their career track or leave the management profession, which is a serious concern from the perspective of human capital. However, there is no substantial literature in the given context. Therefore, the present study contributes to the global discourse of management education and its spillover effect on human resource management. The objective of the present study is to analyze the deviation between career anchors and domain specialization with reference to management education in India. The present study is exploratory in nature, wherein data has been collected from a significant number of post-graduate students who are pursuing management education from a premium business school in India, followed by descriptive analysis. The present research contributes to the professional development of management students from the perspective of human capital, which is eventually related to various factors of the Indian economy.

Keywords: India, management education, domain specialization, placements

Procedia PDF Downloads 91
42542 A Method of Detecting the Difference in Two States of Brain Using Statistical Analysis of EEG Raw Data

Authors: Digvijaysingh S. Bana, Kiran R. Trivedi

Abstract:

This paper introduces various methods that use the alpha wave to detect the difference between two states of the brain. One healthy subject participated in the experiment. EEG was measured on the forehead above the eye (FP1 position), with the reference and ground electrodes on an ear clip. The data samples are obtained in the form of EEG raw data. Each reading lasts one minute. Various tests are performed on the alpha band EEG raw data. The readings are taken at different times over the entire day. The statistical analysis is carried out on the EEG sample data in the form of various tests.

Keywords: electroencephalogram(EEG), biometrics, authentication, EEG raw data

Procedia PDF Downloads 469
42541 Urban Ecotourism Development in Borderlands: An Exploratory Study of Xishuangbanna Dai Autonomous Prefecture, China

Authors: Min Liu, Thanapauge Chamaratana

Abstract:

Integrating ecotourism into urban borderlands holds significant potential for promoting sustainable development, enhancing cross-border cooperation, and preserving cultural and natural heritage. This study aims to evaluate the current status and strategic measures for sustainable ecotourism development in the border urban areas of Xishuangbanna, leveraging the unique opportunities and challenges presented by its policy and geographical location. Employing a qualitative research approach, the exploratory study utilizes documentary research, observation, and in-depth interviews with 20 key stakeholders, including local government officials, tourism operators, community members, and tourists. Content analysis is conducted to interpret the collected data. The findings reveal that Xishuangbanna holds significant potential for ecotourism due to its rich biodiversity, cultural heritage, and strategic location along the Belt and Road Initiative route. The integration of ecotourism can drive economic growth, create employment opportunities, and foster a deeper appreciation for conservation efforts. By promoting ecotourism practices, the region can attract environmentally conscious travelers, thereby contributing to global sustainability goals. However, challenges such as inadequate infrastructure, limited community involvement, and environmental concerns are also identified. The study recommends enhancing ecotourism development in urban borderlands through integrated planning, stakeholder collaboration, and sustainable practices. These measures are essential to ensure long-term benefits for both the local community and the environment. Moreover, the study underscores the importance of a holistic approach to ecotourism development, which balances economic, social, and environmental priorities to achieve sustainable outcomes for urban borderlands.

Keywords: ecotourism, sustainable tourism, urban, borderland

Procedia PDF Downloads 34
42540 Application of Blockchain Technology in Geological Field

Authors: Mengdi Zhang, Zhenji Gao, Ning Kang, Rongmei Liu

Abstract:

Management and application of geological big data is an important part of China's national big data strategy. With the implementation of a national big data strategy, geological big data management becomes more and more critical. At present, there are still many technological barriers, as well as conceptual confusion, in many aspects of geological big data management and application, such as data sharing, intellectual property protection, and application technology. Therefore, it’s a key task to make better use of new technologies for deeper delving and wider application of geological big data. In this paper, we first briefly introduce the basic principle of blockchain technology and then analyze the application dilemma of geological data. Based on this analysis, we bring forward some feasible patterns and scenarios for blockchain application in geological big data and put forward several suggestions for future work in geological big data management.

Keywords: blockchain, intellectual property protection, geological data, big data management

Procedia PDF Downloads 97
42539 A Review on Existing Challenges of Data Mining and Future Research Perspectives

Authors: Hema Bhardwaj, D. Srinivasa Rao

Abstract:

Technology for analysing, processing, and extracting meaningful data from enormous and complicated datasets can be termed "big data." Big data mining and analysis techniques are extremely helpful for business activities such as making decisions, building organisational plans, researching the market efficiently, and improving sales, because typical management tools cannot handle such complicated datasets. Big data brings special computational and statistical issues, such as measurement errors, noise accumulation, spurious correlation, and storage and scalability limitations. These unique problems call for new computational and statistical paradigms. This research paper offers an overview of the literature on big data mining and its process, along with its problems and difficulties, with a focus on the unique characteristics of big data. Organizations face several difficulties when undertaking data mining, which has an impact on their decision-making. Every day, terabytes of data are produced, yet only around 1% of that data is actually analyzed. This study presents the ideas behind data mining and analysis and the knowledge discovery techniques recently developed alongside practical application systems. The article's conclusion also includes a list of issues and difficulties for further research in the area. The report discusses the main big data and data mining challenges faced by management.

Keywords: big data, data mining, data analysis, knowledge discovery techniques, data mining challenges

Procedia PDF Downloads 114
42538 Truthful or Untruthful Social Media Posts: Applying Statement Analysis to Decode online Deception

Authors: Christa L. Arnold, Margaret C. Stewart

Abstract:

This research shares the results of an exploratory study examining Statement Analysis (SA) to detect deception in online truthful and untruthful social media posts. Applying SA, a law enforcement methodology used on criminal interview statements, this research analyzes what is stated to assist in evaluating deceptive written information. Preliminary findings reveal qualitative and quantitative nuances for SA in online deception detection and uncover insights regarding digital deceptive behavior. Thus far, findings reveal that truthful statements tend to differ from untruthful statements in both content and quality.

Keywords: deception detection, online deception, social media content, statement analysis

Procedia PDF Downloads 68
42537 Field Environment Sensing and Modeling for Pears towards Precision Agriculture

Authors: Tatsuya Yamazaki, Kazuya Miyakawa, Tomohiko Sugiyama, Toshitaka Iwatani

Abstract:

The introduction of sensor technologies into agriculture is a necessary step to realize Precision Agriculture. Although sensing methodologies themselves have become prevalent owing to the miniaturization and falling cost of sensors, analyzing and understanding the sensing data remains difficult. Targeting ’Le Lectier’ pears, which are particular to Niigata in Japan, cultivation environmental data have been collected at pear fields by eight sorts of sensors: field temperature, field humidity, rain gauge, soil water potential, soil temperature, soil moisture, inner-bag temperature, and inner-bag humidity sensors. The inner-bag temperature and humidity sensors measure the environment inside the fruit bag used for pre-harvest bagging of pears. In this experiment, three kinds of fruit bags were used for the pre-harvest bagging. After more than 100 days of continuous measurement, volumes of sensing data have been collected. Firstly, correlation analysis among the sensing data measured by the respective sensors reveals that one sensor can replace another, so that more efficient and cost-saving sensing systems can be proposed to pear farmers. Secondly, differences in the characteristics and performance of the three kinds of fruit bags are clarified by the inner-bag environmental measurements. Statistical analysis shows that the characteristics and performance of the bags differ significantly from each other. Lastly, a relational model between the sensing data and the pear outlook quality is established using a Structural Equation Model (SEM). Here, the pear outlook quality relates to the presence of stains, blobs, scratches, and so on caused by physiological impairment or disease. Conceptually, SEM is a combination of exploratory factor analysis and multiple regression. By using SEM, a model is constructed to connect independent and dependent variables.
The proposed SEM model relates the measured sensing data and the pear outlook quality determined on the basis of farmer judgement. In particular, it is found that the inner-bag humidity variable relatively affects the pear outlook quality. Therefore, inner-bag humidity sensing might help the farmers to control the pear outlook quality. These results are supported by a large quantity of inner-bag humidity data measured over the years 2014, 2015, and 2016. The experimental and analytical results in this research contribute to spreading Precision Agriculture technologies among the farmers growing ’Le Lectier’.

Keywords: precision agriculture, pre-harvest bagging, sensor fusion, structural equation model

Procedia PDF Downloads 318
42536 EFL Teacher Cognition and Learner Autonomy: An Exploratory Study into Algerian Teachers’ Understanding of Learner Autonomy

Authors: Linda Ghout

Abstract:

The main aim of the present case study was to explore EFL teachers’ understanding of learner autonomy. Thus, it sought to uncover how teachers at the Department of English, University of Béjaia, Algeria view the process of language learning, their learners’ roles, their own roles and their practices to promote learner autonomy. For data collection, firstly, a questionnaire was designed and administered to all the teachers in the department. Secondly, interviews were conducted with some volunteers for the sake of clarifying emerging issues and digging deeper into some of the teachers’ answers to the questionnaire. The analysis revealed interesting data pertaining to the teachers’ cognition and its effects on their teaching practices. With regard to their views of language learning, it seems that the participants hold discrete views which are in opposition to the principles of learner autonomy. The teachers seemed to have limited knowledge of the characteristics of autonomous learners and autonomy-based methodology. When it comes to teachers’ practices to promote autonomy in their classes, the majority reported that the most effective way is to ask students to search for information on their own. However, in defining their roles in the EFL learning process, most of the respondents claimed that teachers should play the role of facilitators.

Keywords: English, learner autonomy, learning process, teacher cognition

Procedia PDF Downloads 392
42535 Data and Spatial Analysis for Economy and Education of 28 E.U. Member-States for 2014

Authors: Alexiou Dimitra, Fragkaki Maria

Abstract:

The objective of the paper is to study geographic, economic and educational variables and their contribution to determining the position of each member-state among the EU-28 countries, based on the values of seven variables as given by Eurostat. The Data Analysis methods of Multiple Factorial Correspondence Analysis (MFCA), Principal Component Analysis, and Factor Analysis have been used. The cross-tabulation tables of data consist of the values of seven variables for the 28 countries for 2014. The data are processed using the CHIC Analysis V 1.1 software package. The results of this program using MFCA and Ascending Hierarchical Classification are given in arithmetic and graphical form. For comparison, the Factor procedure of the statistical package IBM SPSS 20 has been applied to the same data. The numerical and graphical results, presented with tables and graphs, demonstrate the agreement between the two methods. The most important result is the study of the relation between the 28 countries and the position of each country in groups or clouds, which are formed according to the values of the corresponding variables.

Keywords: Multiple Factorial Correspondence Analysis, Principal Component Analysis, Factor Analysis, E.U.-28 countries, Statistical package IBM SPSS 20, CHIC Analysis V 1.1 Software, Eurostat.eu Statistics

Procedia PDF Downloads 516
42534 Analysis of Big Data

Authors: Sandeep Sharma, Sarabjit Singh

Abstract:

In response to user demand and the growth of large volumes of free data, storage solutions are finding it increasingly challenging to protect, store, and retrieve data. The day is not far off when storage companies and organizations will start saying 'no' to storing our valuable data, or will start charging a huge amount for its storage and protection. On the other hand, given environmental conditions and the threat of global warming, it is becoming challenging to establish and maintain new data warehouses and data centers. The challenge of small data is over; the big challenge now is how to manage the exponential growth of data. In this paper, we analyze the growth trend of big data and its future implications. We also focus on the impact of unstructured data on various concerns and suggest some possible remedies to streamline big data.

Keywords: big data, unstructured data, volume, variety, velocity

Procedia PDF Downloads 552
42533 Survey on Arabic Sentiment Analysis in Twitter

Authors: Sarah O. Alhumoud, Mawaheb I. Altuwaijri, Tarfa M. Albuhairi, Wejdan M. Alohaideb

Abstract:

Large-scale data stream analysis has recently become one of the important business and research priorities. Social networks like Twitter and other micro-blogging platforms hold an enormous amount of data that is large in volume, velocity and variety. Extracting valuable information and trends from these data would aid better understanding and decision-making. Multiple analysis techniques are deployed for English content. However, Arabic, one of the languages that produces a large amount of data over social networks, is among the least analyzed. This paper is a survey of the research efforts to analyze Arabic content on Twitter, focusing on the tools and methods used to extract sentiment from that content.

Keywords: big data, social networks, sentiment analysis, twitter

Procedia PDF Downloads 585
42532 Data Mining Meets Educational Analysis: Opportunities and Challenges for Research

Authors: Carla Silva

Abstract:

Recent developments in information and communication technology enable us to acquire, collect, and analyse data in various fields of socioeconomic-technological systems. Along with the increase of economic globalization and the evolution of information technology, data mining has become an important approach for economic data analysis. As a result, there has been a critical need for automated approaches to the effective and efficient use of massive amounts of educational data, in order to support institutions in strategic planning and investment decision-making. In this article, we will address data from several different perspectives and define the applied data to sciences. Many believe that 'big data' will transform business, government, and other aspects of the economy. We discuss how new data may impact educational policy and educational research. Large-scale administrative data sets and proprietary private sector data can greatly improve the way we measure, track, and describe educational activity and educational impact. We also consider whether the big data predictive modeling tools that have emerged in statistics and computer science may prove useful in educational research and, furthermore, in economics. Finally, we highlight a number of challenges and opportunities for future research.

Keywords: data mining, research analysis, investment decision-making, educational research

Procedia PDF Downloads 362
42531 Philippine English: An Exploratory Mixed-Methods Inquiry on Digital Immigrants and Digital Natives' Variety

Authors: Lesley Karen Penera

Abstract:

Despite the countless studies that have investigated Philippine English for a myriad of reasons, none is known to have probed its grammatical features as used in a technology-driven linguistic landscape by two generations in the digital age. Propelled by the assumption of an emerging Philippine English variety, this paper determined the grammatical features that characterize the digital native-immigrants’ Philippine English. It also ascertained whether mistake or deviation instigated the use of the features, and established this variety’s level of comprehensibility. This exploratory mixed-methods inquiry employed qualitative and quantitative data drawn from a social networking site, the digital native-immigrant group, and the comprehensibility-raters, who were selected through non-random purposive sampling. The study yields 8 grammatical features, mostly deemed results of deviation, yet the texts characterized by such features were mostly rated with excellent comprehensibility. This substantiates some of the grammatical features identified in earlier studies, provides evidence that the digital groups’ Philippine English is not bound by the standard of syntactic accuracy, and corroborates the assertion that language is a manipulable instrument fashioned to satisfy users’ need for successful communication in actual instances of English use beyond the walls of any university where the variety is cultivated. The same could also be rationalized by some respondents’ position that grammar and accuracy are less vital than one’s facility to communicate effectively.

Keywords: comprehensibility, deviation, digital immigrants, digital natives, mistake, Philippine English variety

Procedia PDF Downloads 164
42530 Consumer Behavior Towards Online Shopping in Kuwait: A Quantitative Analysis

Authors: Mitra Arami

Abstract:

The main objective of this paper is to identify the factors that influence Kuwaiti consumers’ behavior towards online shopping. A survey was conducted among B2C e-commerce customers using a structured self-administered questionnaire. The findings of this study show that B2C e-commerce customer behavior in Kuwait is strongly influenced by customer entertainment but weakly influenced by customer trust. While the overall research project involves exploratory research using mixed methods, the focus of this paper is on a quantitative analysis of responses obtained from a survey of Kuwaiti customers, with the design of the questionnaire instrument being based on the findings of a qualitative analysis. The main findings of the analysis include a list of key factors that affect Kuwaiti online shoppers, and quantitative indications of the relative strengths of the various relationships. This study provides a basis for further research and more in-depth studies of the scope of online shopping in Kuwait, especially the influence of hedonic and utilitarian motivations on user engagement.

Keywords: e-commerce, online shopping, customer behavior, quantitative analysis, Kuwait

Procedia PDF Downloads 385