Search results for: Twitter data clustering
24533 Multivariate Assessment of Mathematics Test Scores of Students in Qatar
Authors: Ali Rashash Alzahrani, Elizabeth Stojanovski
Abstract:
Data on various aspects of education are regularly collected at institutional and government levels. In Australia, for example, students at various levels of schooling sit numeracy and literacy examinations as part of NAPLAN testing, enabling longitudinal assessment of such data as well as comparisons between schools and states within Australia. Another internationally collected source of educational data is the PISA study, which collects data from several countries when students are approximately 15 years of age and enables comparisons of performance in science, mathematics and reading between countries, as well as rankings of countries based on performance in these standardised tests. In addition to student and school outcomes based on the PISA tests, the study collects a wealth of other data, including parental demographics and the teaching strategies used by educators. Overall, an abundance of educational data is available that has the potential to help improve educational attainment and the teaching of content, and thereby learning outcomes. A multivariate assessment of such data allows multiple variables to be considered simultaneously and will be used in the present study to help develop profiles of students based on mathematics performance, using data obtained from the PISA study.
Keywords: cluster analysis, education, mathematics, profiles
24532 A Risk Assessment Tool for the Contamination of Aflatoxins on Dried Figs Based on Machine Learning Algorithms
Authors: Kottaridi Klimentia, Demopoulos Vasilis, Sidiropoulos Anastasios, Ihara Diego, Nikolaidis Vasileios, Antonopoulos Dimitrios
Abstract:
Aflatoxins are highly poisonous and carcinogenic compounds produced by species of the genus Aspergillus that can infect a variety of agricultural foods, including dried figs. Biological and environmental factors, such as the population, pathogenicity, and aflatoxigenic capacity of the strains, and the topography, soil, and climate parameters of the fig orchards, are believed to have a strong effect on aflatoxin levels. Existing methods for aflatoxin detection and measurement, such as high-performance liquid chromatography (HPLC) and enzyme-linked immunosorbent assay (ELISA), can provide accurate results, but the procedures are usually time-consuming, sample-destructive, and expensive. Predicting aflatoxin levels prior to harvest is useful for minimizing the health and financial impact of a contaminated crop. Consequently, there is interest in developing a tool that predicts aflatoxin levels from topography and soil analysis data of fig orchards. This paper describes the development, using machine learning methods, of a risk assessment tool for aflatoxin contamination of dried figs based on the location and altitude of the fig orchards, the population of Aspergillus spp. in the soil, and soil parameters such as pH, saturation percentage (SP), electrical conductivity (EC), organic matter, particle size distribution (sand, silt, clay), the concentrations of exchangeable cations (Ca, Mg, K, Na), extractable P, and trace elements (B, Fe, Mn, Zn and Cu).
In particular, our proposed method integrates three machine learning techniques, namely dimensionality reduction of the original dataset (principal component analysis), metric learning (Mahalanobis metric for clustering), and the k-nearest neighbors (KNN) algorithm, into an enhanced model with a mean Pearson correlation coefficient (PCC) of 0.85 between observed and predicted values.
Keywords: aflatoxins, Aspergillus spp., dried figs, k-nearest neighbors, machine learning, prediction
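To make the prediction stage concrete, here is a minimal NumPy sketch of k-nearest-neighbour regression under a Mahalanobis distance, scored with the Pearson correlation coefficient. It is not the authors' model: the PCA step is omitted, the metric matrix is assumed to be the inverse covariance of the training features rather than one learned with MMC, and the data, feature count, and k are hypothetical.

```python
import numpy as np

def mahalanobis_knn_predict(X_train, y_train, X_test, k=5):
    """Predict each target as the mean of its k nearest training targets.

    Distances use M = inverse covariance of X_train, a stand-in (assumption)
    for a learned Mahalanobis metric such as MMC.
    """
    cov = np.cov(X_train, rowvar=False)
    M = np.linalg.pinv(cov)  # pseudo-inverse guards against a singular covariance
    preds = []
    for x in X_test:
        diff = X_train - x                                # shape (n_train, d)
        d2 = np.einsum("ij,jk,ik->i", diff, M, diff)      # squared Mahalanobis distances
        nearest = np.argsort(d2)[:k]
        preds.append(y_train[nearest].mean())
    return np.array(preds)

def pearson_cc(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Hypothetical synthetic data: 4 "soil" features, target driven by two of them.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=200)
pred = mahalanobis_knn_predict(X[:150], y[:150], X[150:], k=5)
print(round(pearson_cc(y[150:], pred), 2))
```

Swapping `M` for a learned metric matrix would leave the neighbour search and averaging unchanged, which is one reason metric learning combines easily with KNN.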
24531 Analysis of Non-Uniform Characteristics of Small Underwater Targets Based on Clustering
Authors: Tianyang Xu
Abstract:
Small underwater targets generally have non-centrosymmetric geometry, so under active sonar detection the acoustic scattering field of the target is spatially inhomogeneous. To address this, this paper takes a hemispherical cylindrical shell as the research object, considers the angular continuity implicit in the echo characteristics, and proposes a cluster-driven method for studying the non-uniform characteristics of target echoes with respect to angle. First, target echo features are extracted and feature vectors are constructed. Second, the t-SNE algorithm is used to strengthen the internal relationships among the feature vectors in a low-dimensional feature space and to construct a visual feature space. Finally, cluster analysis is used to extract the implicit angular relationships between echo features under unsupervised conditions. Reconstructions of the local geometric structure of the target corresponding to the different categories show that the method can effectively divide the angular intervals of the target's local structures according to its natural acoustic scattering characteristics.
Keywords: underwater target, non-uniform characteristics, cluster-driven method, acoustic scattering characteristics
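The last step, turning unsupervised cluster labels into angular intervals, can be sketched in NumPy as follows. This is a simplified illustration rather than the paper's pipeline: it clusters hypothetical per-angle feature vectors with plain k-means (farthest-first initialisation), skips the t-SNE embedding, and then merges consecutive aspect angles that share a label into intervals.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-first initialisation; one label per row."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def angle_intervals(angles, labels):
    """Group consecutive angles with the same cluster label into (start, end, label)."""
    intervals = []
    start = 0
    for i in range(1, len(angles) + 1):
        if i == len(angles) or labels[i] != labels[start]:
            intervals.append((angles[start], angles[i - 1], int(labels[start])))
            start = i
    return intervals

# Hypothetical echo features: two scattering regimes over 0-180 degrees.
angles = np.arange(0, 181, 5)
features = np.where(angles[:, None] < 90, [1.0, 0.0], [0.0, 1.0])
features = features + 0.01 * np.random.default_rng(1).normal(size=features.shape)
labels = kmeans(features, k=2)
for lo, hi, lab in angle_intervals(angles, labels):
    print(f"{lo}-{hi} deg -> cluster {lab}")
```

Because the intervals are built from runs of identical labels along the angle axis, the angular continuity the abstract mentions is what makes the cluster labels interpretable as geometric regions.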
24530 Dataset Quality Index: Development of Composite Indicator Based on Standard Data Quality Indicators
Authors: Sakda Loetpiparwanich, Preecha Vichitthamaros
Abstract:
Nowadays, poor data quality is considered one of the major costs of a data project. Data projects with data quality awareness devote substantial time to data quality processes, while data projects without such awareness suffer negative impacts on financial resources, efficiency, productivity, and credibility. One of the most time-consuming processes is defining data quality expectations and measurements, because expectations differ according to the purpose of each data project. This is especially true of big data projects, which may involve many datasets and stakeholders and therefore require long discussions to define quality expectations and measurements. Therefore, this study aimed to develop meaningful indicators that describe the overall data quality of each dataset, enabling quick comparison and prioritization. The objectives of this study were to: (1) develop practical data quality indicators and measurements, (2) develop data quality dimensions based on statistical characteristics, and (3) develop a composite indicator that describes the overall data quality of each dataset. The sample consisted of more than 500 datasets from public sources, obtained by random sampling. After the datasets were collected, the Dataset Quality Index (SDQI) was developed in five steps. First, standard data quality expectations were defined. Second, indicators that can be measured directly on the data within datasets were identified. Third, the indicators were aggregated into dimensions using factor analysis. Next, the indicators and dimensions were weighted by data preparation effort and usability. Finally, the dimensions were aggregated into the composite indicator. The results showed the following. (1) The developed indicators and measurements comprised ten indicators. (2) Based on statistical characteristics, the ten indicators could be reduced to four dimensions.
(3) The composite indicator, the SDQI, can describe the overall quality of each dataset and can separate datasets into three levels: Good Quality, Acceptable Quality, and Poor Quality. In conclusion, the SDQI provides an overall, meaningfully composed description of data quality within datasets. The SDQI can be used to assess all data in a data project, for effort estimation, and for prioritization. The SDQI also works well with Agile methods: it can be used for assessment in the first sprint, and after the initial evaluation has been passed, more specific data quality indicators can be added in subsequent sprints.
Keywords: data quality, dataset quality, data quality management, composite indicator, factor analysis, principal component analysis
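The indicator-to-dimension-to-composite chain can be sketched in plain Python. Everything numeric here is hypothetical: the six illustrative indicators (the study used ten), their grouping into four dimensions, the weights, and the cut-offs for the three quality levels are placeholders, because the abstract does not report the factor loadings, effort-based weights, or thresholds behind the SDQI.

```python
# Hypothetical indicator scores for one dataset, each scaled to [0, 1].
indicators = {
    "completeness": 0.95, "uniqueness": 0.90,
    "validity": 0.80, "consistency": 0.70,
    "timeliness": 0.60, "readability": 0.85,
}

# Hypothetical grouping into dimensions with within-dimension weights
# (in the study, the grouping came from factor analysis).
dimensions = {
    "integrity": {"completeness": 0.6, "uniqueness": 0.4},
    "conformity": {"validity": 0.5, "consistency": 0.5},
    "currency": {"timeliness": 1.0},
    "usability": {"readability": 1.0},
}

# Hypothetical dimension weights (e.g. reflecting preparation effort); they sum to 1.
dimension_weights = {"integrity": 0.3, "conformity": 0.3, "currency": 0.2, "usability": 0.2}

def dimension_scores(indicators, dimensions):
    """Weighted mean of the indicator scores within each dimension."""
    return {
        dim: sum(w * indicators[name] for name, w in members.items()) / sum(members.values())
        for dim, members in dimensions.items()
    }

def composite_index(dim_scores, weights):
    """Weighted sum of dimension scores (weights assumed to sum to 1)."""
    return sum(weights[d] * s for d, s in dim_scores.items())

def quality_level(score, good=0.8, acceptable=0.6):
    """Map a composite score to three levels; the cut-offs are placeholders."""
    if score >= good:
        return "Good Quality"
    if score >= acceptable:
        return "Acceptable Quality"
    return "Poor Quality"

dims = dimension_scores(indicators, dimensions)
score = composite_index(dims, dimension_weights)
print(f"SDQI = {score:.3f} -> {quality_level(score)}")
```

Keeping the dimension layer explicit means a low composite score can be traced back to the dimension, and then the indicator, responsible for it, which supports the prioritization use described above.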
24529 Predictive Analysis for Big Data: Extension of Classification and Regression Trees Algorithm
Authors: Ameur Abdelkader, Abed Bouarfa Hafida
Abstract:
Since its inception, predictive analysis has revolutionized the IT industry through its robustness and its support for decision-making. It involves applying a set of data processing techniques and algorithms to create predictive models. Its principle is to find relationships between explanatory variables and predicted variables: past occurrences are exploited to predict unknown outcomes. With the advent of big data, many studies have suggested using predictive analytics to process and analyze big data. Nevertheless, these efforts have been held back by the limits of classical predictive analysis methods when faced with large amounts of data. Indeed, because of its volume, nature (semi-structured or unstructured), and variety, big data cannot be analyzed efficiently with classical predictive analysis methods. The authors attribute this weakness to the fact that classical predictive analysis algorithms do not support parallelized and distributed computation. In this paper, we propose to extend the predictive analysis algorithm Classification And Regression Trees (CART) to adapt it for big data analysis. The major changes to the algorithm are presented, and an extended version is then defined so that it can be applied to huge quantities of data.
Keywords: predictive analysis, big data, predictive analysis algorithms, CART algorithm
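The abstract does not detail the authors' extension, so the following is only a generic sketch of one common way to parallelize CART split selection, not necessarily the proposed method. Each data partition computes class counts per candidate threshold for a numeric feature (a map step), the partial counts are summed (a reduce step), and the threshold with the lowest weighted Gini impurity is chosen. The partitions and thresholds are hypothetical, and the "workers" run sequentially here.

```python
from collections import Counter

def gini(counts):
    """Gini impurity of a class-count mapping."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def partial_stats(partition, thresholds):
    """Map step: per-threshold class counts for x <= t and x > t in one partition."""
    stats = {}
    for t in thresholds:
        left, right = Counter(), Counter()
        for x, label in partition:
            (left if x <= t else right)[label] += 1
        stats[t] = (left, right)
    return stats

def merge_stats(all_stats):
    """Reduce step: sum the per-partition counts for every threshold."""
    merged = {}
    for stats in all_stats:
        for t, (left, right) in stats.items():
            m_left, m_right = merged.setdefault(t, (Counter(), Counter()))
            m_left.update(left)
            m_right.update(right)
    return merged

def best_split(merged):
    """Threshold minimising the weighted Gini impurity of the two child nodes."""
    def weighted_gini(t):
        left, right = merged[t]
        n_l, n_r = sum(left.values()), sum(right.values())
        return (n_l * gini(left) + n_r * gini(right)) / (n_l + n_r)
    return min(merged, key=weighted_gini)

# Hypothetical (feature value, class) records spread over three partitions.
partitions = [
    [(1.0, "a"), (2.0, "a"), (7.0, "b")],
    [(1.5, "a"), (8.0, "b")],
    [(3.0, "a"), (9.0, "b"), (6.5, "b")],
]
thresholds = [2.0, 3.0, 5.0, 7.0]
merged = merge_stats(partial_stats(p, thresholds) for p in partitions)
print(best_split(merged))
```

Only small count tables travel between workers, never the raw records, which is what lets split selection scale with the number of partitions.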
24528 Canopy Temperature Acquired from Daytime and Nighttime Aerial Data as an Indicator of Trees’ Health Status
Authors: Agata Zakrzewska, Dominik Kopeć, Adrian Ochtyra
Abstract:
The growing number of new cameras, sensors, and research methods allows broader application of thermal data in remote sensing studies of vegetation. The aim of this research was to check whether thermal infrared data in the 3.6-4.9 μm spectral range, obtained during the day and at night, can be used to assess the health condition of selected deciduous tree species in an urban environment. For this purpose, research was carried out in the city center of Warsaw (Poland) in 2020. During airborne data acquisition, thermal data, laser scanning data, and orthophoto images were collected. Synchronously with the airborne data, ground reference data were obtained for 617 trees of the studied species (Acer platanoides, Acer pseudoplatanus, Aesculus hippocastanum, Tilia cordata, and Tilia × euchlora) in different states of health. The results were as follows: (i) healthy trees were cooler than trees in poor condition and dying trees in both the daytime and nighttime data; (ii) the difference in mean canopy temperature between healthy and dying trees was 1.06 °C in the nighttime data and 3.28 °C in the daytime data; (iii) condition classes differed significantly in both the daytime and nighttime thermal data, but only in the daytime data did all condition classes differ significantly from each other. In conclusion, aerial thermal data can be considered an alternative to hyperspectral data for assessing the health condition of trees in an urban environment, especially data obtained during the day, which differentiate condition classes better than data obtained at night. A method based on the fusion of thermal infrared and laser scanning data could be a quick and efficient way to identify trees in poor health that should be visually checked in the field.
Keywords: middle wave infrared, thermal imagery, tree discoloration, urban trees
24527 Analyzing the Efficiency of Initiatives Taken against Disinformation during Election Campaigns: Case Study of Young Voters
Authors: Fatima-Zohra Ghedir
Abstract:
Social media platforms have been actively working on solutions and have combined their efforts with the media, policymakers, educators, and researchers to protect citizens and prevent interference in information, political discourse, and elections. Facebook, for instance, deleted fake accounts, implemented algorithms to detect fake accounts and fake content, partnered with news agencies to fact-check content manually, and changed the display of its news feed. Twitter and Instagram regularly communicate about their efforts and notify their users of improvements and safety guidelines. More funds have been allocated to media literacy programs to empower citizens in preparation for the coming elections. This paper investigates the efficiency of these initiatives and analyzes the metrics used to measure their success or failure. It also aims to determine the segments of the population most prone to falling into disinformation traps during elections despite the measures taken over the last four years, and examines the groups that were positively impacted by these measures. The paper relies on both desk and field methodologies. A survey was administered to French students aged between 17 and 29, and semi-guided interviews were conducted with a similar audience. The analysis of the survey and interviews shows that respondents were exposed to the initiatives described above and are aware that disinformation issues exist. However, they do not understand what disinformation really entails. For instance, most of them consider disinformation synonymous with the opposing point of view, without taking into account the truthfulness of the content. Moreover, they still consume and believe information shared by their friends and family, with little questioning of how their close ones get informed.
Keywords: democratic elections, disinformation, foreign interference, social media, success metrics
24526 End to End Monitoring in Oracle Fusion Middleware for Data Verification
Authors: Syed Kashif Ali, Usman Javaid, Abdullah Chohan
Abstract:
In large enterprises, multiple departments use different kinds of information systems and databases according to their needs. These systems are independent and heterogeneous, and sharing information between them is not an easy task. Middleware technologies have made data sharing between systems much easier. However, monitoring the exchange of data between source and target systems for verification purposes is often complex or impossible for the maintenance department because of security and access-privilege restrictions on those systems. In this paper, we present our experience with an end-to-end data monitoring approach at the middleware level, implemented in Oracle BPEL, for data verification without the help of any monitoring tool.
Keywords: service level agreement, SOA, BPEL, oracle fusion middleware, web service monitoring
24525 WiFi Data Offloading: Bundling Method in a Canvas Business Model
Authors: Majid Mokhtarnia, Alireza Amini
Abstract:
Mobile operators face increasing data traffic as a critical issue, and a vital responsibility of operators is to handle this trend in a way that creates added value. This paper addresses a bundling method within a Canvas business model for a WiFi Data Offloading (WDO) strategy, which may affect some elements of the model. In the proposed method, subscribers are sold data packages, some of which include a given volume of WiFi-offloaded data free of charge. The paper analyses this method in terms of attractiveness and profitability. The results demonstrate that the quality of the WDO implementation strongly affects the final result, and the analysis helps the decision maker make the best decision.
Keywords: bundling, canvas business model, telecommunication, WiFi data offloading
24524 Distributed Perceptually Important Point Identification for Time Series Data Mining
Authors: Tak-Chung Fu, Ying-Kit Hung, Fu-Lai Chung
Abstract:
In the field of time series data mining, the Perceptually Important Point (PIP) identification process was first introduced in 2001. It was originally designed for financial time series pattern matching and was later found suitable for time series dimensionality reduction and representation. Its strength lies in preserving the overall shape of a time series by identifying its salient points. With the rise of Big Data, time series account for a major proportion of data, especially data generated by sensors in Internet of Things (IoT) environments. Given the nature of PIP identification and its successful applications, it is worth further exploring the opportunity to apply PIP to time series ‘Big Data’. However, the performance of PIP identification has always been considered a limitation when dealing with ‘Big’ time series data. In this paper, two distributed versions of PIP identification based on the Specialized Binary (SB) Tree are proposed. The proposed approaches remove the bottleneck of running the PIP identification process on a standalone computer, and the distributed versions achieve improvements in speed.
Keywords: distributed computing, performance analysis, Perceptually Important Point identification, time series data mining
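For reference, the basic sequential PIP procedure can be sketched in plain Python, using vertical distance as the importance measure (one of several distance measures used with PIP; the choice here is an assumption). It starts from the first and last points and repeatedly adds the point farthest from the line joining its neighbouring PIPs. This naive version rescans every segment on each pass, which illustrates why performance becomes a concern on long series; the SB-tree and the distributed versions proposed in the paper are not shown.

```python
def pip_identify(series, n_points):
    """Return sorted indices of the first n_points Perceptually Important Points.

    Starts with the first and last points, then repeatedly adds the point with
    the largest vertical distance to the straight line joining its two
    neighbouring PIPs.
    """
    n = len(series)
    if n_points >= n:
        return list(range(n))
    pips = [0, n - 1]
    while len(pips) < n_points:
        best_idx, best_dist = None, -1.0
        for left, right in zip(pips, pips[1:]):
            for i in range(left + 1, right):
                # Value at position i on the line from (left, y_left) to (right, y_right).
                interp = series[left] + (series[right] - series[left]) * (i - left) / (right - left)
                dist = abs(series[i] - interp)
                if dist > best_dist:
                    best_idx, best_dist = i, dist
        pips.append(best_idx)
        pips.sort()
    return pips

# Hypothetical price-like series.
series = [0, 1, 5, 2, 1, 3, 0]
print(pip_identify(series, 4))
```

Because points are added in order of importance, truncating the sequence at any length gives a coarser representation of the same shape, which is what makes PIP usable for dimensionality reduction.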
24523 From News Breakers to News Followers: The Influence of Facebook on the Coverage of the January 2010 Crisis in Jos
Authors: T. Obateru, Samuel Olaniran
Abstract:
In an era when the new media afford easy packaging and dissemination of information, social media have become a popular avenue for sharing information, for good or ill. It is evident that the traditional role of journalists as ‘news breakers’ is fast being eroded. People now share information about events via social media such as Facebook and Twitter, to the point that journalists themselves now get leads on events from such sources. Beyond the access to information provided by the new media lies the erosion of the gatekeeping role of journalists, who by their training and calling are supposed to handle information responsibly. Thus, sensitive information that journalists would normally filter is shared indiscriminately by social media activists. This was the experience of journalists in Jos, Plateau State, in January 2010, when another of the recurring ethnoreligious crises that have engulfed the state resulted in widespread killing, vandalism, looting, and displacement. In what is considered one of the high points of crises in the state, journalists covering the crisis also relied on some of these sources to get their bearings on the violence. This paper examined the role of Facebook in the work of journalists who covered the 2010 crisis. Taking a gatekeeping perspective, it interrogated the extent to which Facebook affected their professional duty, positively or negatively, vis-à-vis the peace journalism model. A questionnaire survey was used to elicit information from 50 journalists who covered the crisis. The paper revealed that the dissemination of hate information via mobile phones and social media, especially Facebook, aggravated the crisis. Journalists became news followers rather than news breakers because many of them were kept on their toes by information circulated on Facebook, much of which was inaccurate or false.
It recommended that journalists remain true to their calling by upholding their ‘gatekeeping’ role of disseminating only accurate and responsible information if they are to remain the main source of credible information on which their audience relies.
Keywords: crisis, ethnoreligious, Facebook, journalists
24522 Times2D: A Time-Frequency Method for Time Series Forecasting
Authors: Reza Nematirad, Anil Pahwa, Balasubramaniam Natarajan
Abstract:
Time series data consist of successive data points collected over a period of time. Accurate prediction of future values is essential for informed decision-making in many real-world applications, including electricity load forecasting, lifetime estimation of industrial machinery, traffic planning, weather prediction, and the stock market. Because of this critical relevance and wide application, time series forecasting has attracted considerable interest in recent years. However, the proliferation of sensors and IoT devices, real-time monitoring systems, and high-frequency trading data introduces intricate temporal variations, rapid changes, noise, and non-linearities, making time series forecasting more challenging. Classical methods such as Autoregressive Integrated Moving Average (ARIMA) and Exponential Smoothing aim to extract predefined temporal variations, such as trends and seasonality. While these methods are effective at capturing well-defined seasonal patterns and trends, they often struggle with the more complex, non-linear patterns present in real-world time series. In recent years, deep learning has made significant contributions to time series forecasting. Recurrent Neural Networks (RNNs) and their variants, such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs), have been widely adopted for modeling sequential data. However, they often suffer from locality, making it difficult to capture local trends and rapid fluctuations. Convolutional Neural Networks (CNNs), particularly Temporal Convolutional Networks (TCNs), use convolutional layers to capture temporal dependencies by applying convolutional filters along the temporal dimension. Despite their advantages, TCNs struggle to capture relationships between distant time points because of the locality of one-dimensional convolution kernels.
Transformers have revolutionized time series forecasting with their powerful attention mechanisms, effectively capturing long-term dependencies and relationships between distant time points. However, the attention mechanism may struggle to discern dependencies directly from scattered time points because of intricate temporal patterns. Finally, Multi-Layer Perceptrons (MLPs) have also been employed, with models such as N-BEATS and LightTS demonstrating success. Despite this, MLPs often face high volatility and computational complexity in long-horizon forecasting. To address intricate temporal variations in time series data, this study introduces Times2D, a novel framework that integrates 2D spectrogram and derivative heatmap techniques in parallel. The spectrogram focuses on the frequency domain, capturing periodicity, while the derivative patterns emphasize the time domain, highlighting sharp fluctuations and turning points. This 2D transformation makes it possible to use powerful computer vision techniques to capture various intricate temporal variations. To evaluate Times2D, extensive experiments were conducted on standard time series datasets, comparing it under the same modeling conditions with various state-of-the-art algorithms, including DLinear (2023), TimesNet (2023), Non-stationary Transformer (2022), PatchTST (2023), N-HiTS (2023), Crossformer (2023), MICN (2023), LightTS (2022), FEDformer (2022), FiLM (2022), SCINet (2022a), Autoformer (2021), and Informer (2021). The initial results demonstrated that Times2D achieves consistent state-of-the-art performance in both short-term and long-term forecasting tasks.
Furthermore, the generality of the Times2D framework allows it to be applied to various tasks such as time series imputation, clustering, classification, and anomaly detection, offering potential benefits in any domain that involves sequential data analysis.
Keywords: derivative patterns, spectrogram, time series forecasting, Times2D, 2D representation
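The two kinds of 2D input described above can be illustrated with NumPy. This is a hedged sketch, not the Times2D implementation: the spectrogram is a plain Hann-windowed short-time Fourier transform magnitude, and the "derivative heatmap" is approximated by stacking the series with its first- and second-order differences; the window length, hop size, and derivative orders are assumptions.

```python
import numpy as np

def spectrogram(x, win=64, hop=16):
    """Magnitude STFT: rows are frequency bins, columns are time frames (Hann window)."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(win)
    starts = range(0, len(x) - win + 1, hop)
    frames = np.stack([x[s:s + win] * window for s in starts])  # (n_frames, win)
    return np.abs(np.fft.rfft(frames, axis=1)).T                 # (win // 2 + 1, n_frames)

def derivative_map(x, max_order=2):
    """Stack the series with its k-th order differences, k = 1..max_order.

    Each difference row is left-padded with k zeros so all rows keep len(x) columns.
    """
    x = np.asarray(x, dtype=float)
    rows = [x]
    for k in range(1, max_order + 1):
        d = np.diff(x, n=k)                               # k-th order difference
        rows.append(np.concatenate([np.zeros(k), d]))     # pad to original length
    return np.stack(rows)

t = np.arange(256)
x = np.sin(2 * np.pi * t / 32)        # hypothetical series with a 32-sample period
S = spectrogram(x, win=64, hop=16)    # frequency-domain "image"
D = derivative_map(x, max_order=2)    # time-domain derivative "image"
print(S.shape, D.shape)
```

Either array could then be treated as a single-channel image for a 2D vision backbone; how Times2D combines the two branches is not specified in the abstract.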
24521 Analysing Techniques for Fusing Multimodal Data in Predictive Scenarios Using Convolutional Neural Networks
Authors: Philipp Ruf, Massiwa Chabbi, Christoph Reich, Djaffar Ould-Abdeslam
Abstract:
In recent years, convolutional neural networks (CNNs) have demonstrated high performance in image analysis, but often only structured data are available for a specific problem. By interpreting structured data as images, CNNs can effectively learn and extract valuable insights from tabular data, leading to improved predictive accuracy and uncovering hidden patterns that may not be apparent in traditional structured data analysis. Applying a single neural network to multimodal data, i.e., both structured and unstructured information, can yield significant advantages in time complexity and energy efficiency. Converting structured data into images and merging them with existing visual material offers a promising way to apply CNNs to multimodal datasets, which often occur in medical contexts. Using suitable preprocessing techniques, structured data are transformed into image representations in which the respective features are expressed as different formations of colors and shapes. In an additional step, these representations are fused with existing images to incorporate both types of information. The final image is then analyzed using a CNN.
Keywords: CNN, image processing, tabular data, mixed dataset, data transformation, multimodal fusion
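One simple way to realise the tabular-to-image step and the fusion step is sketched below in NumPy. The encoding is an assumption for illustration: min-max scaled features are laid out as pixel intensities on a square grid, which is then upsampled by nearest neighbour and appended to an existing image as an extra channel. The colour-and-shape encoding described in the abstract is not reproduced, and the table and image are hypothetical.

```python
import math
import numpy as np

def row_to_image(row, mins, maxs):
    """Min-max scale one tabular record and lay it out on a square grayscale grid."""
    span = np.where(maxs > mins, maxs - mins, 1.0)        # avoid division by zero
    scaled = (np.asarray(row, dtype=float) - mins) / span
    side = math.ceil(math.sqrt(len(scaled)))
    grid = np.zeros(side * side)                          # unused cells stay 0
    grid[: len(scaled)] = np.clip(scaled, 0.0, 1.0)
    return grid.reshape(side, side)

def fuse_with_image(feature_grid, image):
    """Upsample the grid to the image size (nearest neighbour) and append it as a channel."""
    h, w = image.shape[:2]
    side = feature_grid.shape[0]
    rows = np.arange(h) * side // h
    cols = np.arange(w) * side // w
    upsampled = feature_grid[rows[:, None], cols[None, :]]
    channels = image if image.ndim == 3 else image[:, :, None]
    return np.concatenate([channels, upsampled[:, :, None]], axis=2)

table = np.array([[1, 10, 100, 5, 0],
                  [3, 20, 300, 5, 1]])        # hypothetical 5-feature records
mins, maxs = table.min(axis=0), table.max(axis=0)
grid = row_to_image(table[1], mins, maxs)     # 3 x 3 grid
scan = np.zeros((6, 6, 3))                    # stand-in for an existing RGB image
fused = fuse_with_image(grid, scan)
print(fused.shape)
```

The fused array has one more channel than the original image, so a standard CNN only needs its first convolution adjusted to accept the extra input channel.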
24520 Knowledge Discovery and Data Mining Techniques in Textile Industry
Authors: Filiz Ersoz, Taner Ersoz, Erkin Guler
Abstract:
This paper addresses issues in the textile industry using data mining techniques. Data mining was applied to data on the stitching of garment products obtained from a textile company. The data were analyzed using the CHAID algorithm, the CART algorithm, regression analysis, and artificial neural networks. Classification-based analyses were used, and a decision model for production per person and the variables affecting production was derived with this approach. The results show that as daily working time increases, production per person decreases. In addition, the relationship between total daily working time and production per person is negative, and this was the strongest relationship observed for production per person.
Keywords: data mining, textile production, decision trees, classification
24519 Investigation of Delivery of Triple Play Data in GE-PON Fiber to the Home Network
Authors: Ashima Anurag Sharma
Abstract:
Optical fiber-based networks can deliver performance that supports the increasing demand for high-speed connections. One of the technologies to emerge in recent years is the Passive Optical Network (PON). This research paper aims to show the simultaneous delivery of triple play services (data, voice, and video). A comparison between various data rates is presented. It is demonstrated that as the data rate increases, the number of users that can be supported decreases because of the increase in bit error rate.
Keywords: BER, PON, TDMPON, GPON, CWDM, OLT, ONT
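The trade-off between data rate, number of users, and BER can be made concrete with two standard textbook relations, assuming on-off keying with Gaussian receiver noise and an ideal 1:N power splitter (the abstract does not state which models the simulations used). Here $I_1$ and $I_0$ are the mean received signal levels for ones and zeros, $\sigma_1$ and $\sigma_0$ the corresponding noise standard deviations, and $N$ the split ratio:

$$
\mathrm{BER} = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{Q}{\sqrt{2}}\right), \qquad Q = \frac{I_1 - I_0}{\sigma_1 + \sigma_0}, \qquad L_{\text{split}} \approx 10\log_{10} N \ \text{dB}
$$

A Q factor of about 6 corresponds to a BER of roughly $10^{-9}$. Raising the bit rate widens the receiver bandwidth and hence the noise, which lowers $Q$; each doubling of $N$ costs about 3 dB of power budget. Both effects shrink the number of users that can be served at a given BER target.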
24518 Microarray Gene Expression Data Dimensionality Reduction Using PCA
Authors: Fuad M. Alkoot
Abstract:
Different experimental technologies, such as microarrays and sequencing, have been proposed to generate high-resolution genetic data in order to understand the complex dynamic interactions between complex diseases and the biological system components of genes and gene products. However, the generated samples have very high dimensionality, reaching thousands of features, which hinders attempts to design a classifier system that can identify diseases from such data. Additionally, the high overlap in the class distributions makes the task more difficult. The data we experiment with were generated for the identification of autism. They include 142 samples, which is small compared with the large dimension of the data. Classifier systems trained on these data yield very low classification rates, almost equivalent to guessing. We aim to reduce the data dimension and make the data more suitable for classification. Here, we experiment with applying a multistage PCA to the genetic data to reduce its dimensionality. Results show a significant improvement in classification rates, which increases the possibility of building an automated system for autism detection.
Keywords: PCA, gene expression, dimensionality reduction, classification, autism
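A single PCA stage for data shaped like this (far fewer samples than genes) can be sketched in NumPy via the thin SVD of the centred data matrix, which avoids forming a genes-by-genes covariance matrix. The abstract does not define the multistage scheme, so only one stage is shown; the synthetic matrix size (142 samples by 5,000 genes) and the number of components are assumptions for illustration.

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Project samples onto the top principal components via a thin SVD.

    For an n x p matrix with n << p, the thin SVD of the centred data avoids
    building the p x p covariance matrix.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # U: n x k, S: k, Vt: k x p, with k = min(n, p); S is sorted in descending order.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                        # principal axes as rows
    scores = Xc @ components.T                            # n x n_components projections
    explained = S[:n_components] ** 2 / np.sum(S ** 2)    # explained-variance ratios
    return scores, components, explained, mean

rng = np.random.default_rng(42)
X = rng.normal(size=(142, 5000))      # hypothetical: 142 samples x 5,000 genes
scores, components, explained, mean = pca_fit_transform(X, n_components=10)
print(scores.shape, round(float(explained.sum()), 3))
```

New samples would be projected with `(x_new - mean) @ components.T`, so the training mean and axes must be kept for use at classification time.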
24517 Spatial Analysis and Determinants of Number of Antenatal Health Care Visits Among Pregnant Women in Ethiopia: Application of Spatial Multilevel Count Regression Models
Authors: Muluwerk Ayele Derebe
Abstract:
Background: Antenatal care (ANC) is an essential element in the continuum of reproductive health care for preventing avoidable pregnancy-related morbidity and mortality. Objective: The aim of this study is to assess the spatial pattern and predictors of ANC visits in Ethiopia. Method: This study used data on 7,174 pregnant women aged 15-49 years from the 2016 Ethiopian Demographic and Health Survey, a nationwide community-based cross-sectional survey. Spatial analysis was done using the Getis-Ord Gi* statistic to identify hot and cold spot areas of ANC visits. Multilevel models fitted with the glmmTMB package in R and adjusted for spatial effects were used. Spatial multilevel count regression was conducted to identify predictors of ANC visits among pregnant women, and the proportional change in variance was computed to uncover the effects of individual- and community-level factors on ANC visits. Results: The distribution of ANC visits was spatially clustered (Moran’s I = 0.271, p < 0.001; ICC = 0.497, p < 0.001). The areas with the highest spatial outliers of ANC visits were found in the Amhara (South Wollo, West Gojjam, North Shewa), Oromia (West Arsi and East Hararghe), Tigray (Central Tigray), and Benishangul-Gumuz (Asosa and Metekel) regions. The data had excess zeros (34.6%) and were over-dispersed. The expected number of ANC visits among pregnant women with pregnancy complications was higher (coefficient = 0.7868; ARR = 2.1964, 95% CI: 1.8605, 2.5928, p < 0.0001) than among pregnant women without pregnancy complications. The expected number of ANC visits for a pregnant woman living in a rural area was 3.4057 times that of a pregnant woman living in an urban area (coefficient = 1.2254; ARR = 3.4057, 95% CI: 2.1462, 5.4041, p < 0.0001). The study also found dissimilar clusters, in which clusters with low mean counts of ANC visits were surrounded by clusters with higher mean counts, when other variables were held constant.
Conclusion: This study found that the number of ANC visits in Ethiopia had a spatial pattern associated with socioeconomic, demographic, and geographic risk factors. Spatial clustering of ANC visits exists in all regions of Ethiopia. At the individual level, the mother's age, religion, mother's education, husband's education, mother's occupation, husband's occupation, signs of pregnancy complications, wealth index, and marital status were strongly associated with the number of ANC visits. At the community level, place of residence, region, mother's age, sex of the household head, signs of pregnancy complications, and distance to a health facility were strongly associated with the number of ANC visits.
Keywords: Ethiopia, ANC, spatial, multilevel, zero-inflated Poisson
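The global Moran's I reported above can be computed as below. This NumPy sketch assumes a hypothetical row of six areas with binary contiguity weights; the study's own spatial weights and the significance test behind its p-value are not shown, and neither is the Getis-Ord Gi* hot-spot statistic.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z the centred values.

    weights is an n x n spatial weight matrix with zero diagonal
    (w_ij > 0 when locations i and j are neighbours); S0 is the sum of all weights.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    n = len(x)
    return (n / w.sum()) * (z @ w @ z) / (z @ z)

# Hypothetical: six areas along a line; adjacent areas are neighbours.
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
clustered = [1, 1, 1, 5, 5, 5]      # similar values next to each other
alternating = [1, 5, 1, 5, 1, 5]    # dissimilar neighbours
print(morans_i(clustered, W), morans_i(alternating, W))
```

Positive values indicate that neighbouring areas have similar counts (spatial clustering, as reported for ANC visits), while negative values indicate that neighbours tend to differ.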
24516 Analysis of Relationship between Social Media Conversation and Mainstream Coverage to Mobilize Social Movement
Authors: Sakulsri Srisaracam
Abstract:
Social media has become an important source of information for the public and for media professionals. Some social issues raised on social media are picked up by journalists and reported on other platforms. This relationship between social media and mainstream media can sometimes drive public debate or stimulate social movements. The question examined here is in what situations social media conversations can raise awareness and stimulate change on public issues. This study addresses the communication patterns through which social media conversations bring covert issues into mainstream media and lead to social advocacy movements. Methodologically, the findings are based on a content analysis of Facebook, Twitter, news websites and television media reports on three case studies: saving Bryde's whale, protests against a government proposal to downsize the Office of Knowledge Management and Development in Thailand, and a dengue fever campaign. These case studies were chosen because they represent issues to which most members of the public pay little attention, but on which social media conversations stimulated public debate and calls to action. The study found that: 1) Collective social media conversations can stimulate public debate and encourage change at three levels: awareness, public debate, and action for policy and social change. The level reached depends on the communication patterns of online users and on media coverage. 2) Communication patterns have to be designed to combine social media conversations, online opinion leaders, mainstream media coverage and calls to both online and offline action in order to motivate social change. These results suggest that social media is a powerful platform for collective communication and for setting the agenda on public issues for mainstream media. However, for social change to succeed, social media should be used to move online movements offline as well.
Keywords: public issues, mainstream media, social media, social movement
Procedia PDF Downloads 280
24515 Data Science-Based Key Factor Analysis and Risk Prediction of Diabetic
Authors: Fei Gao, Rodolfo C. Raga Jr.
Abstract:
This research proposal aims to ascertain the major risk factors for diabetes and to design a predictive model for risk assessment. The project aims to improve the early detection and management of diabetes using data science techniques, which may improve patient outcomes and healthcare efficiency. Using the Diabetes Health Indicators Dataset from Kaggle as the research data, the phase relation values of each attribute were used to analyze and select the attributes that might influence the examinee's survival probability. We compare and evaluate eight machine learning algorithms. Our investigation begins with comprehensive data preprocessing, including feature engineering and dimensionality reduction, aimed at enhancing data quality. The dataset, comprising health indicators and medical data, serves as the foundation for training and testing these algorithms. A rigorous cross-validation process is applied, and performance is assessed using five key metrics: accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC). After analyzing the data characteristics, we investigate their impact on the likelihood of diabetes and develop corresponding risk indicators.
Keywords: diabetes, risk factors, predictive model, risk assessment, data science techniques, early detection, data analysis, Kaggle
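The abstract names the five evaluation metrics but not the eight algorithms or the tooling. A minimal, dependency-free Python sketch of those metrics for a binary diabetic / non-diabetic outcome, using made-up counts and scores; AUC-ROC is computed with the pairwise (Mann-Whitney) formulation rather than by tracing the curve:

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class ConfusionCounts:
    tp: int  # diabetic, predicted diabetic
    fp: int  # non-diabetic, predicted diabetic
    tn: int  # non-diabetic, predicted non-diabetic
    fn: int  # diabetic, predicted non-diabetic


def threshold_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from a binary confusion matrix."""
    total = c.tp + c.fp + c.tn + c.fn
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive outscores a random
    negative (ties count one half); O(P*N), fine for illustration."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(threshold_metrics(ConfusionCounts(tp=40, fp=10, tn=45, fn=5)))
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

In a cross-validation setting, these functions would be applied to each held-out fold and the results averaged.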
Procedia PDF Downloads 73
24514 A Methodology to Integrate Data in the Company Based on the Semantic Standard in the Context of Industry 4.0
Authors: Chang Qin, Daham Mustafa, Abderrahmane Khiat, Pierre Bienert, Paulo Zanini
Abstract:
Companies today face many challenges in the process of digital transformation, which can be a complex and costly undertaking. Digital transformation involves collecting and analyzing large amounts of data, which creates challenges around data management and governance. Integrating data from multiple systems and technologies is a further challenge. Despite these difficulties, companies still pursue digitalization because, by embracing advanced technologies, they can improve efficiency, quality, decision-making, and customer experience while also creating new business models and revenue streams. This paper focuses on the problem of data stored in silos with different schemas and structures. Conventional approaches to this problem use data warehousing, data integration tools, data standardization, and business intelligence tools. However, these approaches focus primarily on the syntax and structure of the data and neglect semantic modeling and semantic standardization, which are essential for achieving data interoperability. Here, the challenge of data silos in Industry 4.0 is addressed by developing a semantic modeling approach compliant with Asset Administration Shell (AAS) models, an efficient communication standard in Industry 4.0. The paper shows how the approach can facilitate data mapping and semantic lifting according to existing industry standards such as ECLASS and other industrial dictionaries. It also uses Asset Administration Shell technology to model and map the company's data, and a knowledge graph for data storage and exploration.
Keywords: data interoperability in industry 4.0, digital integration, industrial dictionary, semantic modeling
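The core idea of semantic lifting can be illustrated without the AAS metamodel: columns from different silos that mean the same thing are mapped to one shared semantic identifier, and each record becomes subject-predicate-object triples that a knowledge graph can store. A minimal Python sketch; the URNs, column names and the `lift` helper are hypothetical placeholders, and this is not the paper's AAS/ECLASS implementation:

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, object]

# Hypothetical placeholder identifiers; real ECLASS properties use their own
# IRDIs, which are not reproduced here.
SEMANTIC_MAP: Dict[str, str] = {
    "temp_c": "urn:example:property:temperature",      # column name in silo A
    "Temperatur": "urn:example:property:temperature",  # column name in silo B
    "rpm": "urn:example:property:rotational-speed",
    "Drehzahl": "urn:example:property:rotational-speed",
}


def lift(asset_id: str, record: Dict[str, object],
         mapping: Dict[str, str]) -> Tuple[List[Triple], List[str]]:
    """Turn one silo record into (asset, property, value) triples.

    Columns without a semantic mapping are returned separately so they can be
    reviewed instead of being silently dropped.
    """
    triples: List[Triple] = []
    unmapped: List[str] = []
    for column, value in record.items():
        semantic_id = mapping.get(column)
        if semantic_id is None:
            unmapped.append(column)
        else:
            triples.append((asset_id, semantic_id, value))
    return triples, unmapped


silo_a = lift("urn:example:asset:pump-1", {"temp_c": 71.5, "rpm": 1450}, SEMANTIC_MAP)
silo_b = lift("urn:example:asset:pump-2",
              {"Temperatur": 69.0, "Drehzahl": 1500, "Kommentar": "ok"}, SEMANTIC_MAP)
print(silo_a)
print(silo_b)
```

Both silos end up using the same two property identifiers, which is what allows their data to be queried together once loaded into a graph store.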
Procedia PDF Downloads 93
24513 Big Data Analytics and Data Security in the Cloud via Fully Homomorphic Encryption
Authors: Waziri Victor Onomza, John K. Alhassan, Idris Ismaila, Noel Dogonyaro Moses
Abstract:
This paper addresses the problem of building secure computational services that operate on encrypted information in cloud computing without decrypting it. Such a computational encryption model could enhance the privacy, confidentiality, and availability of users' big data. The cryptographic model applied to compute on the encrypted data is a Fully Homomorphic Encryption scheme. We present, at a high level, computational processes based on number theory and algebra that can be integrated into and leveraged in cloud computing, together with the detailed mathematical concepts underlying fully homomorphic encryption models. This contribution supports a full implementation of a cryptographic security algorithm for big data analytics.
Keywords: big data analytics, security, privacy, bootstrapping, homomorphic, homomorphic encryption scheme
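The abstract does not give the scheme's construction, and a real FHE scheme (lattice-based, with bootstrapping) is far too large for a short example. As a much simpler stand-in that still shows computation on ciphertexts, here is a toy Python sketch of the Paillier cryptosystem, which is only additively homomorphic (not fully homomorphic) and, with these tiny primes, completely insecure:

```python
from math import gcd
from typing import Tuple


def keygen(p: int, q: int) -> Tuple[int, int, int]:
    """Return (n, lam, mu) for Paillier with generator g = n + 1.

    p and q must be distinct primes; tiny values are used here for readability
    and provide no security.
    """
    n = p * q
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+)
    return n, lam, mu


def encrypt(n: int, m: int, r: int) -> int:
    """c = g^m * r^n mod n^2, with g = n + 1 and r coprime to n."""
    n2 = n * n
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, lam: int, mu: int, c: int) -> int:
    """m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_ciphertexts(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the hidden plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


n, lam, mu = keygen(17, 19)
c_sum = add_ciphertexts(n, encrypt(n, 5, r=2), encrypt(n, 7, r=3))
print(decrypt(n, lam, mu, c_sum))  # prints 12; the sum was computed without decrypting
```

The party computing `add_ciphertexts` needs only the public modulus `n`; only the key holder can decrypt. Fully homomorphic schemes extend this idea to both addition and multiplication.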
Procedia PDF Downloads 376
24512 Longevity of Soybean Seeds Submitted to Different Mechanized Harvesting Conditions
Authors: Rute Faria, Digo Moraes, Amanda Santos, Dione Morais, Maria Sartori
Abstract:
Seed vigor is a fundamental component of good performance throughout the soybean production process. Seeds mechanically damaged at harvest are more susceptible to fungal and insect attack during storage, which invariably reduces their vigor in the field, compromising stand uniformity and final stand performance. Harvesters, even the most modern ones, can cause irreversible damage to the seeds when not properly adjusted or operated, even compromising their commercialization. Efficient harvest control is therefore necessary to guarantee a good-quality final product. In this work, the damage caused by two different harvesters (one rented and one owned) operating at two travel speeds (4 and 8 km/h) was evaluated. The design was completely randomized in a 2 x 2 factorial arrangement with four replications. To evaluate physiological quality, seed germination and vigor tests were carried out over a period of six months. Multivariate analysis by Principal Component Analysis (PCA) and clustering showed that the leased machine caused less immediate damage to the seeds, but after six months of storage the vigor of its seeds declined more than that of seeds harvested by the owned machine, indicating that the leased machine ultimately caused more damage to the seeds.
Keywords: Glycine max (L.), cluster analysis, PCA, vigor
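The abstract describes a completely randomized design with a 2 x 2 factorial arrangement (harvester x travel speed) and four replications, i.e. 16 experimental units. A minimal Python sketch of how such a layout can be randomized; the factor labels follow the abstract, while the plot numbering and the seed are hypothetical:

```python
import itertools
import random
from typing import Dict, List, Optional, Sequence


def randomize_crd(factors: Dict[str, Sequence], replications: int,
                  seed: Optional[int] = None) -> List[Dict]:
    """Randomly assign every factor combination, replicated, to numbered plots."""
    names = list(factors)
    combos = list(itertools.product(*factors.values()))  # 2 x 2 = 4 treatments
    units = [combo for combo in combos for _ in range(replications)]
    random.Random(seed).shuffle(units)  # complete randomization: no blocking
    return [{"plot": i + 1, **dict(zip(names, combo))} for i, combo in enumerate(units)]


layout = randomize_crd({"harvester": ["rented", "owned"], "speed_km_h": [4, 8]},
                       replications=4, seed=2024)
for unit in layout:
    print(unit)
```

Because there is no blocking, every unit has the same chance of receiving any treatment, which is what distinguishes a completely randomized design from a randomized block design.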
Procedia PDF Downloads 254
24511 Identification of Text Domains and Register Variation through the Analysis of Lexical Distribution in a Bangla Mass Media Text Corpus
Authors: Mahul Bhattacharyya, Niladri Sekhar Dash
Abstract:
The present research paper is an experimental attempt to investigate the nature of register variation in three major text domains, namely social, cultural, and political texts, drawn from a corpus of Bangla printed mass media texts. The study uses a moderate-sized corpus of Bangla mass media text containing nearly one million words collected from different media sources such as newspapers, magazines, advertisements, and periodicals. Analysis of the corpus data reveals that each text has certain lexical properties that not only control its identity but also mark its uniqueness across domains. First, the subject domains of the texts are classified along two parameters, namely 'Genre' and 'Text Type'. Next, empirical investigations are made to understand how the domains differ from each other in lexical properties, covering both function and content words. Here, comparative-cum-contrastive matching of lexical load across domains is carried out through word frequency counts to track how domain-specific words and terms may serve as decisive indicators for specifying textual contexts and subject domains. The study shows that the common lexical stock that percolates across all text domains is unreliable for this purpose, as its lexicological identity has no bearing on specifying subject domains. Therefore, language users need to anchor on certain domain-specific lexical items to recognize a text as belonging to a specific text domain. The findings confirm that texts belonging to different subject domains in the Bangla news text corpus clearly differ in lexical load, lexical choice, lexical clustering, and lexical collocation.
In fact, based on these parameters, along with some statistical calculations, it is possible to classify mass media texts into different types and to mark their relation to the domains to which they actually belong. The advantage of this analysis lies in properly identifying the linguistic factors that give language users better insight into the methods they employ in text comprehension, and in constructing a systemic framework for designing text identification strategies for language learners. The availability of a huge amount of Bangla media text data helps in reaching accurate conclusions with a certain degree of reliability and authenticity. This kind of corpus-based analysis is highly relevant for a resource-poor language like Bangla, as no attempt has yet been made to understand how the structure and texture of Bangla mass media texts vary due to the linguistic and extra-linguistic constraints actively operating in specific text domains. Since mass media language is assumed to be the most 'recent representation' of actual language use, this study is expected to show how Bangla news texts reflect the thoughts of the society and how they strongly impact the thought process of the speech community.
Keywords: Bangla, corpus, discourse, domains, lexical choice, mass media, register, variation
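Comparative matching of lexical load through word frequency counts can be sketched as follows: compute each word's relative frequency per domain and flag words that are markedly more frequent in one domain than in any other, while evenly shared words (the common lexical stock) drop out. A minimal Python sketch; the ratio threshold of 2.0 and the English toy tokens (standing in for Bangla words) are hypothetical, and the authors' own statistical calculations are not specified in the abstract:

```python
from collections import Counter
from typing import Dict, List, Sequence


def relative_frequencies(tokens: Sequence[str]) -> Dict[str, float]:
    """Relative frequency of each word type in a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}


def domain_markers(domain_tokens: Dict[str, Sequence[str]],
                   min_ratio: float = 2.0) -> Dict[str, List[str]]:
    """Words whose relative frequency in one domain is at least `min_ratio` times
    their highest relative frequency in any other domain (or absent elsewhere)."""
    freqs = {d: relative_frequencies(toks) for d, toks in domain_tokens.items()}
    vocabulary = set().union(*freqs.values())
    markers: Dict[str, List[str]] = {d: [] for d in freqs}
    for word in vocabulary:
        for d, f in freqs.items():
            own = f.get(word, 0.0)
            other = max((g.get(word, 0.0) for e, g in freqs.items() if e != d), default=0.0)
            if own > 0 and (other == 0 or own / other >= min_ratio):
                markers[d].append(word)
    return {d: sorted(words) for d, words in markers.items()}


corpus = {
    "political": "the election the minister vote".split(),
    "cultural": "the festival the music song".split(),
    "social": "the school the family health".split(),
}
print(domain_markers(corpus))  # "the" is shared evenly, so it marks no domain
```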
Procedia PDF Downloads 173
24510 Protecting Privacy and Data Security in Online Business
Authors: Bilquis Ferdousi
Abstract:
With the exponential growth of online business, threats to consumers' privacy and data security have become a serious challenge. This literature-review-based study seeks a better understanding of those threats and of the legislative measures taken to address them. Research shows that people are increasingly engaging in online business using different digital devices and platforms, although this practice varies across age groups. Threats to consumers' privacy and data security are a serious obstacle to building consumer trust in online businesses. Some legislative measures have been taken at the federal and state levels to protect consumers' privacy and data security. The study was based on an extensive review of the current literature on protecting consumers' privacy and data security and on the legislative measures that have been taken.
Keywords: privacy, data security, legislation, online business
Procedia PDF Downloads 104
24509 An Analysis of Privacy and Security for Internet of Things Applications
Authors: Dhananjay Singh, M. Abdullah-Al-Wadud
Abstract:
The Internet of Things is the concept of a large-scale ecosystem of wireless actuators. The actuators are the things in the IoT: those that contribute or produce data for the ecosystem. However, ubiquitous data collection, data security, privacy preservation, large-volume data processing, and intelligent analytics are among the key challenges for IoT technologies. To address the security requirements, challenges and threats in the IoT, we discuss a message authentication mechanism for IoT applications. Finally, we discuss a data encryption mechanism for authenticating messages before they are propagated into IoT networks.
Keywords: Internet of Things (IoT), message authentication, privacy, security
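The abstract does not specify the message authentication mechanism. As one common choice, a minimal Python sketch using HMAC-SHA256 from the standard `hmac` module, assuming a symmetric key already shared between the device and the receiver (key provisioning is out of scope); the key and the sensor payload are hypothetical:

```python
import hashlib
import hmac


def tag_message(key: bytes, payload: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag that the sender attaches to the payload."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify_message(key: bytes, payload: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time to resist timing attacks."""
    return hmac.compare_digest(tag_message(key, payload), tag)


shared_key = b"pre-provisioned-device-key"  # hypothetical shared secret
reading = b'{"sensor": "s-17", "temp_c": 21.5}'
tag = tag_message(shared_key, reading)

print(verify_message(shared_key, reading, tag))                            # authentic
print(verify_message(shared_key, reading.replace(b"21.5", b"99.9"), tag))  # tampered
```

A MAC alone does not stop replayed messages or hide their content; in practice a counter or timestamp is usually included in the authenticated payload, and confidentiality still requires encryption.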
Procedia PDF Downloads 381
24508 Logistic Model Tree and Expectation-Maximization for Pollen Recognition and Grouping
Authors: Endrick Barnacin, Jean-Luc Henry, Jack Molinié, Jimmy Nagau, Hélène Delatte, Gérard Lebreton
Abstract:
Palynology is a field of interest for many disciplines. It has multiple applications, such as chronological dating, climatology, allergy treatment, and even honey characterization. Unfortunately, analyzing a pollen slide is a complicated and time-consuming task that requires experts in the field, who are becoming increasingly rare due to economic and social conditions. Automating this task is therefore a necessity. Pollen slide analysis is mainly a visual process, as it is carried out with the naked eye. For this reason, a primary method for automating palynology is digital image processing, which has the lowest cost and relatively good accuracy in pollen retrieval. In this work, we propose a system that combines recognition and grouping of pollen. It uses a Logistic Model Tree to classify pollen species already known to the system while detecting any unknown species. The unknown pollen species are then divided into groups using a cluster-based approach. Success rates for the recognition of known species have been achieved, and automated clustering seems to be a promising approach.
Keywords: pollen recognition, logistic model tree, expectation-maximization, local binary pattern
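The keywords name local binary patterns (LBP) as the image descriptor. A minimal pure-Python sketch of the basic 3 x 3 LBP operator and the 256-bin histogram that could serve as a texture feature vector for a classifier such as a logistic model tree; the neighbour ordering and the >= comparison are one common convention and may differ from the authors' settings (for example uniform or multi-radius LBP):

```python
from typing import List, Sequence

Image = Sequence[Sequence[int]]

# Neighbour offsets, clockwise from the top-left pixel; bit i corresponds to offset i.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image: Image, r: int, c: int) -> int:
    """Basic 8-neighbour LBP code of pixel (r, c): bit set where neighbour >= centre."""
    centre = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if image[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(image: Image) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels (borders skipped)."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


patch = [[9, 9, 9],
         [0, 5, 0],
         [0, 0, 0]]
print(lbp_code(patch, 1, 1))  # only the three top neighbours are >= 5
```

Histograms like this are usually normalized before classification so that images of different sizes are comparable.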
Procedia PDF Downloads 180
24507 Cognitive Science Based Scheduling in Grid Environment
Authors: N. D. Iswarya, M. A. Maluk Mohamed, N. Vijaya
Abstract:
A grid is an infrastructure that allows large, distributed data from multiple locations to be deployed toward a common goal. Scheduling data-intensive applications is challenging because the data sets are very large. Only two solutions exist for this issue. First, a computation that requires huge data sets can be transferred to the data site. Second, the required data sets can be transferred to the computation site. The first option is not possible here, because the servers are storage/data servers with little or no computational capability; hence, the second option is explored further. During scheduling, transferring huge data sets from one site to another requires considerable network bandwidth. To mitigate this issue, this work incorporates cognitive science into scheduling. Cognitive science is the study of the human brain and its related activities, and current research focuses mainly on incorporating it into various computational modeling techniques. In this work, the problem-solving approach of the human brain is studied and incorporated into data-intensive scheduling in grid environments. A cognitive engine (CE) is designed and deployed at various grid sites. The intelligent agents in the CE analyze requests and build the knowledge base. Depending on the link capacity, a decision is made whether to transfer the data sets or to partition them. The agents also predict the next request so that the requesting site can be served with data sets in advance, reducing data availability time and data transfer time. The replica catalog and metadata catalog created by the agents assist in the decision-making process.
Keywords: data grid, grid workflow scheduling, cognitive artificial intelligence
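The abstract states that the cognitive engine decides, depending on link capacity, whether to transfer a data set or to partition it, but it does not give the rule. A minimal Python sketch of one plausible rule, with a hypothetical deadline parameter and the assumption that partitions can be fetched in parallel from different replica sites at the same link rate; it is not the authors' agent logic:

```python
import math
from typing import Tuple


def transfer_seconds(size_gb: float, link_mbps: float) -> float:
    """Idealised transfer time: gigabytes -> megabits (decimal units) over the link."""
    return size_gb * 8000.0 / link_mbps


def plan_transfer(size_gb: float, link_mbps: float, deadline_s: float) -> Tuple[str, int]:
    """Transfer the whole data set if it fits the deadline, otherwise partition it.

    Assumes (hypothetically) that partitions can be fetched in parallel from
    different replica sites at the same link rate.
    """
    if link_mbps <= 0 or deadline_s <= 0:
        raise ValueError("link capacity and deadline must be positive")
    seconds = transfer_seconds(size_gb, link_mbps)
    if seconds <= deadline_s:
        return ("transfer", 1)
    return ("partition", math.ceil(seconds / deadline_s))


print(plan_transfer(1, 100, 120))   # about 80 s estimated: whole transfer
print(plan_transfer(10, 100, 120))  # about 800 s estimated: split into parts
```

A real scheduler would also account for protocol overhead, current link load and replica locations recorded in the catalogs.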
Procedia PDF Downloads 392
24506 Heritage and Tourism in the Era of Big Data: Analysis of Chinese Cultural Tourism in Catalonia
Authors: Xinge Liao, Francesc Xavier Roige Ventura, Dolores Sanchez Aguilera
Abstract:
With the development of the Internet, the study of tourism behavior has rapidly expanded from the traditional physical market to the online market. Data on the Internet changes dynamically, and new data appear all the time. In recent years, large volumes of data have been generated by sources such as forums and blogs, which have expanded over time and space; together they constitute large-scale Internet data, known as Big Data. This data of technological origin, derived from the use of devices and the activity of multiple users, is becoming an important source for studying geography and tourist behavior. The study focuses on cultural heritage tourism practices in the context of Big Data, exploring the characteristics and behavior of Chinese tourists in relation to the cultural heritage of Catalonia. Geographical information, destination image, and perceptions in user-generated content will be studied through analysis of data from Weibo, the largest microblogging social network in China. By analyzing the behavior of heritage tourists in the Big Data environment, this study aims to understand the practices (activities, motivations, perceptions) of cultural tourists, and thereby their needs and preferences, in order to better guide the sustainable development of tourism at heritage sites.
Keywords: Barcelona, Big Data, Catalonia, cultural heritage, Chinese tourism market, tourists’ behavior
Procedia PDF Downloads 137
24505 Towards A Framework for Using Open Data for Accountability: A Case Study of A Program to Reduce Corruption
Authors: Darusalam, Jorish Hulstijn, Marijn Janssen
Abstract:
The media has revealed a variety of corruption cases in regional and local governments all over the world. Many governments have pursued anti-corruption reforms and created systems of checks and balances. Citizens face three types of corruption: administrative corruption, collusion and extortion. Accountability is one of the benchmarks for building transparent government. The public sector is required to report the results of the programs it has implemented so that citizens can judge whether the institution has been working economically, efficiently and effectively. Open Data offers solutions for implementing good governance in organizations that want to be more transparent, and it can create transparency and accountability to the community. The objective of this paper is to build a framework of open data for accountability to combat corruption. The paper investigates the relationship between open data and accountability as part of anti-corruption initiatives, and the impact of open data implementation on public organizations.
Keywords: open data, accountability, anti-corruption, framework
Procedia PDF Downloads 336
24504 Syndromic Surveillance Framework Using Tweets Data Analytics
Authors: David Ming Liu, Benjamin Hirsch, Bashir Aden
Abstract:
Syndromic surveillance aims to detect or predict disease outbreaks through the analysis of medical data sources. Using social media data such as tweets for syndromic surveillance has become increasingly popular, aided by open platforms for data collection and by the short-text and mobile geolocation features of microblogging. In this paper, a Syndromic Surveillance Framework with a machine learning kernel based on tweet data analytics is presented. Influenza and three cities of the United Arab Emirates (Abu Dhabi, Al Ain and Dubai) are used as the test disease and trial areas. Hospital case data provided by the Health Authority of Abu Dhabi (HAAD) are used for correlation. In our model, a Latent Dirichlet allocation (LDA) engine is adapted for supervised classification, and N-fold cross-validation confusion matrices are given as the simulation results, with an overall system recall of 85.595%.
Keywords: Syndromic surveillance, Tweets, Machine Learning, data mining, Latent Dirichlet allocation (LDA), Influenza
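The abstract uses HAAD hospital case data for correlation but does not say which measure. A minimal Python sketch, assuming a Pearson correlation between weekly counts of influenza-classified tweets and hospital cases, with an optional lag to check whether tweets lead admissions; the series in the usage example are made-up numbers, not study data:

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def lagged_r(tweets: Sequence[float], cases: Sequence[float], lag: int) -> float:
    """Correlate tweet counts in period t with hospital cases in period t + lag
    (both series are assumed to cover the same periods)."""
    if lag < 0 or lag >= len(tweets):
        raise ValueError("lag must be in [0, len(tweets))")
    return pearson_r(tweets[: len(tweets) - lag], cases[lag:])


# Made-up weekly counts, for illustration only.
weekly_flu_tweets = [12, 30, 55, 41, 20, 9]
weekly_cases = [3, 5, 14, 26, 19, 8]
for lag in range(3):
    print(lag, round(lagged_r(weekly_flu_tweets, weekly_cases, lag), 3))
```

A peak correlation at a positive lag would suggest that tweet activity precedes hospital cases, which is the property that makes such a framework useful for early warning.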
Procedia PDF Downloads 115