Search results for: cardio data analysis
42059 Big Data Analysis with RHadoop
Authors: Ji Eun Shin, Byung Ho Jung, Dong Hoon Lim
Abstract:
It is almost impossible to store or analyze big data increasing exponentially with traditional technologies. Hadoop is a new technology to make that possible. R programming language is by far the most popular statistical tool for big data analysis based on distributed processing with Hadoop technology. With RHadoop that integrates R and Hadoop environment, we implemented parallel multiple regression analysis with different sizes of actual data. Experimental results showed our RHadoop system was much faster as the number of data nodes increases. We also compared the performance of our RHadoop with lm function and big lm packages available on big memory. The results showed that our RHadoop was faster than other packages owing to paralleling processing with increasing the number of map tasks as the size of data increases.Keywords: big data, Hadoop, parallel regression analysis, R, RHadoop
Procedia PDF Downloads 43742058 Effectiveness of Simulation Resuscitation Training to Improve Self-Efficacy of Physicians and Nurses at Aga Khan University Hospital in Advanced Cardiac Life Support Courses Quasi-Experimental Study Design
Authors: Salima R. Rajwani, Tazeen Ali, Rubina Barolia, Yasmin Parpio, Nasreen Alwani, Salima B. Virani
Abstract:
Introduction: Nurses and physicians have a critical role in initiating lifesaving interventions during cardiac arrest. It is important that timely delivery of high quality Cardio Pulmonary Resuscitation (CPR) with advanced resuscitation skills and management of cardiac arrhythmias is a key dimension of code during cardiac arrest. It will decrease the chances of patient survival if the healthcare professionals are unable to initiate CPR timely. Moreover, traditional training will not prepare physicians and nurses at a competent level and their knowledge level declines over a period of time. In this regard, simulation training has been proven to be effective in promoting resuscitation skills. Simulation teaching learning strategy improves knowledge level, and skills performance during resuscitation through experiential learning without compromising patient safety in real clinical situations. The purpose of the study is to evaluate the effectiveness of simulation training in Advanced Cardiac Life Support Courses by using the selfefficacy tool. Methods: The study design is a quantitative research design and non-randomized quasi-experimental study design. The study examined the effectiveness of simulation through self-efficacy in two instructional methods; one is Medium Fidelity Simulation (MFS) and second is Traditional Training Method (TTM). The sample size was 220. Data was compiled by using the SPSS tool. The standardized simulation based training increases self-efficacy, knowledge, and skills and improves the management of patients in actual resuscitation. Results: 153 students participated in study; CG: n = 77 and EG: n = 77. The comparison was done between arms in pre and post-test. (F value was 1.69, p value is <0.195 and df was 1). There was no significant difference between arms in the pre and post-test. The interaction between arms was observed and there was no significant difference in interaction between arms in the pre and post-test. (F value was 0.298, p value is <0.586 and df is 1. However, the results showed self-efficacy scores were significantly higher within experimental group in post-test in advanced cardiac life support resuscitation courses as compared to Traditional Training Method (TTM) and had overall (p <0.0001) and F value was 143.316 (mean score was 45.01 and SD was 9.29) verses pre-test result showed (mean score was 31.15 and SD was 12.76) as compared to TTM in post-test (mean score was 29.68 and SD was 14.12) verses pre-test result showed (mean score was 42.33 and SD was 11.39). Conclusion: The standardized simulation-based training was conducted in the safe learning environment in Advanced Cardiac Life Suport Courses and physicians and nurses benefited from self-confidence, early identification of life-threatening scenarios, early initiation of CPR, and provides high-quality CPR, timely administration of medication and defibrillation, appropriate airway management, rhythm analysis and interpretation, and Return of Spontaneous Circulation (ROSC), team dynamics, debriefing, and teaching and learning strategies that will improve the patient survival in actual resuscitation.Keywords: advanced cardiac life support, cardio pulmonary resuscitation, return of spontaneous circulation, simulation
Procedia PDF Downloads 8042057 Incremental Learning of Independent Topic Analysis
Authors: Takahiro Nishigaki, Katsumi Nitta, Takashi Onoda
Abstract:
In this paper, we present a method of applying Independent Topic Analysis (ITA) to increasing the number of document data. The number of document data has been increasing since the spread of the Internet. ITA was presented as one method to analyze the document data. ITA is a method for extracting the independent topics from the document data by using the Independent Component Analysis (ICA). ICA is a technique in the signal processing; however, it is difficult to apply the ITA to increasing number of document data. Because ITA must use the all document data so temporal and spatial cost is very high. Therefore, we present Incremental ITA which extracts the independent topics from increasing number of document data. Incremental ITA is a method of updating the independent topics when the document data is added after extracted the independent topics from a just previous the data. In addition, Incremental ITA updates the independent topics when the document data is added. And we show the result applied Incremental ITA to benchmark datasets.Keywords: text mining, topic extraction, independent, incremental, independent component analysis
Procedia PDF Downloads 30942056 A Review of Spatial Analysis as a Geographic Information Management Tool
Authors: Chidiebere C. Agoha, Armstong C. Awuzie, Chukwuebuka N. Onwubuariri, Joy O. Njoku
Abstract:
Spatial analysis is a field of study that utilizes geographic or spatial information to understand and analyze patterns, relationships, and trends in data. It is characterized by the use of geographic or spatial information, which allows for the analysis of data in the context of its location and surroundings. It is different from non-spatial or aspatial techniques, which do not consider the geographic context and may not provide as complete of an understanding of the data. Spatial analysis is applied in a variety of fields, which includes urban planning, environmental science, geosciences, epidemiology, marketing, to gain insights and make decisions about complex spatial problems. This review paper explores definitions of spatial analysis from various sources, including examples of its application and different analysis techniques such as Buffer analysis, interpolation, and Kernel density analysis (multi-distance spatial cluster analysis). It also contrasts spatial analysis with non-spatial analysis.Keywords: aspatial technique, buffer analysis, epidemiology, interpolation
Procedia PDF Downloads 31842055 Conception of a Regulated, Dynamic and Intelligent Sewerage in Ostrevent
Authors: Rabaa Tlili Yaakoubi, Hind Nakouri, Olivier Blanpain
Abstract:
The current tools for real time management of sewer systems are based on two software tools: the software of weather forecast and the software of hydraulic simulation. The use of the first ones is an important cause of imprecision and uncertainty, the use of the second requires temporal important steps of decision because of their need in times of calculation. This way of proceeding fact that the obtained results are generally different from those waited. The major idea of the CARDIO project is to change the basic paradigm by approaching the problem by the "automatic" face rather than by that "hydrology". The objective is to make possible the realization of a large number of simulations at very short times (a few seconds) allowing to take place weather forecasts by using directly the real time meditative pluviometric data. The aim is to reach a system where the decision-making is realized from reliable data and where the correction of the error is permanent. A first model of control laws was realized and tested with different return-period rainfalls. The gains obtained in rejecting volume vary from 40 to 100%. The development of a new algorithm was then used to optimize calculation time and thus to overcome the subsequent combinatorial problem in our first approach. Finally, this new algorithm was tested with 16- year-rainfall series. The obtained gains are 60% of total volume rejected to the natural environment and of 80 % in the number of discharges.Keywords: RTC, paradigm, optimization, automation
Procedia PDF Downloads 28442054 On Pooling Different Levels of Data in Estimating Parameters of Continuous Meta-Analysis
Authors: N. R. N. Idris, S. Baharom
Abstract:
A meta-analysis may be performed using aggregate data (AD) or an individual patient data (IPD). In practice, studies may be available at both IPD and AD level. In this situation, both the IPD and AD should be utilised in order to maximize the available information. Statistical advantages of combining the studies from different level have not been fully explored. This study aims to quantify the statistical benefits of including available IPD when conducting a conventional summary-level meta-analysis. Simulated meta-analysis were used to assess the influence of the levels of data on overall meta-analysis estimates based on IPD-only, AD-only and the combination of IPD and AD (mixed data, MD), under different study scenario. The percentage relative bias (PRB), root mean-square-error (RMSE) and coverage probability were used to assess the efficiency of the overall estimates. The results demonstrate that available IPD should always be included in a conventional meta-analysis using summary level data as they would significantly increased the accuracy of the estimates. On the other hand, if more than 80% of the available data are at IPD level, including the AD does not provide significant differences in terms of accuracy of the estimates. Additionally, combining the IPD and AD has moderating effects on the biasness of the estimates of the treatment effects as the IPD tends to overestimate the treatment effects, while the AD has the tendency to produce underestimated effect estimates. These results may provide some guide in deciding if significant benefit is gained by pooling the two levels of data when conducting meta-analysis.Keywords: aggregate data, combined-level data, individual patient data, meta-analysis
Procedia PDF Downloads 37542053 Multidimensional Item Response Theory Models for Practical Application in Large Tests Designed to Measure Multiple Constructs
Authors: Maria Fernanda Ordoñez Martinez, Alvaro Mauricio Montenegro
Abstract:
This work presents a statistical methodology for measuring and founding constructs in Latent Semantic Analysis. This approach uses the qualities of Factor Analysis in binary data with interpretations present on Item Response Theory. More precisely, we propose initially reducing dimensionality with specific use of Principal Component Analysis for the linguistic data and then, producing axes of groups made from a clustering analysis of the semantic data. This approach allows the user to give meaning to previous clusters and found the real latent structure presented by data. The methodology is applied in a set of real semantic data presenting impressive results for the coherence, speed and precision.Keywords: semantic analysis, factorial analysis, dimension reduction, penalized logistic regression
Procedia PDF Downloads 44342052 Collision Theory Based Sentiment Detection Using Discourse Analysis in Hadoop
Authors: Anuta Mukherjee, Saswati Mukherjee
Abstract:
Data is growing everyday. Social networking sites such as Twitter are becoming an integral part of our daily lives, contributing a large increase in the growth of data. It is a rich source especially for sentiment detection or mining since people often express honest opinion through tweets. However, although sentiment analysis is a well-researched topic in text, this analysis using Twitter data poses additional challenges since these are unstructured data with abbreviations and without a strict grammatical correctness. We have employed collision theory to achieve sentiment analysis in Twitter data. We have also incorporated discourse analysis in the collision theory based model to detect accurate sentiment from tweets. We have also used the retweet field to assign weights to certain tweets and obtained the overall weightage of a topic provided in the form of a query. Hadoop has been exploited for speed. Our experiments show effective results.Keywords: sentiment analysis, twitter, collision theory, discourse analysis
Procedia PDF Downloads 53542051 Total Arterial Coronary Revascularization with Aorto-Bifemoral Bipopliteal Bypass: A Case Report
Authors: Nuruddin Mohammod Zahangir, Syed Tanvir Ahmady, Firoz Ahmed, Mainul Kabir, Tamjid Mohammad Najmus Sakib Khan, Nazmul Hossain, Niaz Ahmed, Madhava Janardhan Naik
Abstract:
The management of combined Coronary Artery Disease and Peripheral Vascular Disease is a challenge and brings with it numerous clinical dilemmas.The 56 year old gentleman presented to our department with significant triple vessel disease with occluded lower end of aorta just before bifurcation and bilateral superficial femoral arteries. Operation was done on 11.03.14. The The Left Internal Mammary Artery (LIMA) and the Right Internal Mammary Artery (RIMA) were harvested in skeletonized manner. The free RIMA was then anastomosed with LIMA to make LIMA-RIMA Y. Cardio Pulmonary Bypass was then established and coronary artery bypass grafts performed. LIMA was anastomosed to the Left Anterior Descending artery. RIMA was anastomosed to Posterior Descending Artery, 1st and 2nd Obtuse Marginal arteries in a sequential manner. Abdomen was opened by midline incision. The infrarenal aorta exposed and was found to be severely diseased. A Vascular Clamp was applied infrarenally, aortotomy done and limited endarterectomy performed. An end-to-side anastomosis was done with upper end of PTFE synthetic Y-graft (14/7 mm) to the infarenal Aorta and the Clamp released. Good flow noted in both limbs of the graft. Patient was then slowly weaned off from Cardio Pulmonary Bypass without difficulty. The distal two limbs of the Y graft were passed to the groin through retroperitoneal tunnels and anastomosed end-to-side with the common femoral arteries. Saphenous vein was interposed between common femoral and popliteal arteries bilaterally through subfascial tunnels in both thigh. On 12th postoperative day he was discharged from hospital in good general condition. Follow up after 3 months of operation the patient is doing good and free of chest pain and claudication pain.Keywords: total arterial, coronary revascularization, aorto-bifemoral bypass, bifemoro-bipopliteal bypass
Procedia PDF Downloads 47242050 Analysis of Expression Data Using Unsupervised Techniques
Authors: M. A. I Perera, C. R. Wijesinghe, A. R. Weerasinghe
Abstract:
his study was conducted to review and identify the unsupervised techniques that can be employed to analyze gene expression data in order to identify better subtypes of tumors. Identifying subtypes of cancer help in improving the efficacy and reducing the toxicity of the treatments by identifying clues to find target therapeutics. Process of gene expression data analysis described under three steps as preprocessing, clustering, and cluster validation. Feature selection is important since the genomic data are high dimensional with a large number of features compared to samples. Hierarchical clustering and K Means are often used in the analysis of gene expression data. There are several cluster validation techniques used in validating the clusters. Heatmaps are an effective external validation method that allows comparing the identified classes with clinical variables and visual analysis of the classes.Keywords: cancer subtypes, gene expression data analysis, clustering, cluster validation
Procedia PDF Downloads 14942049 Data Mining Algorithms Analysis: Case Study of Price Predictions of Lands
Authors: Julio Albuja, David Zaldumbide
Abstract:
Data analysis is an important step before taking a decision about money. The aim of this work is to analyze the factors that influence the final price of the houses through data mining algorithms. To our best knowledge, previous work was researched just to compare results. Furthermore, before using the data of the data set, the Z-Transformation were used to standardize the data in the same range. Hence, the data was classified into two groups to visualize them in a readability format. A decision tree was built, and graphical data is displayed where clearly is easy to see the results and the factors' influence in these graphics. The definitions of these methods are described, as well as the descriptions of the results. Finally, conclusions and recommendations are presented related to the released results that our research showed making it easier to apply these algorithms using a customized data set.Keywords: algorithms, data, decision tree, transformation
Procedia PDF Downloads 37442048 The Association Between Different Body Mass Index Levels And Midterm Surgical Revascularization Outcomes
Authors: Farzad Masoud Kabir, Jamshid Bagheri, Khosro Barkhordari
Abstract:
This historical cohort study included 17,751 patients patients who underwent isolated CABG at our center between 2007 and 2016. The endpoints of this study were all-cause mortality and major adverse cardio-cerebrovascular events (MACCEs), comprising acute coronary syndromes, cerebrovascular accidents, and all-cause mortality at five years. Our findings suggest that preoperative obesity (BMI>30 kg/m2) in patients who survive early after CABG is associated with an increased risk of 5-year all-cause mortality and 5-year MACCEs.Keywords: body mass index, surgical outcomes, midterm, cardiac surgery patients
Procedia PDF Downloads 7742047 A Modular Framework for Enabling Analysis for Educators with Different Levels of Data Mining Skills
Authors: Kyle De Freitas, Margaret Bernard
Abstract:
Enabling data mining analysis among a wider audience of educators is an active area of research within the educational data mining (EDM) community. The paper proposes a framework for developing an environment that caters for educators who have little technical data mining skills as well as for more advanced users with some data mining expertise. This framework architecture was developed through the review of the strengths and weaknesses of existing models in the literature. The proposed framework provides a modular architecture for future researchers to focus on the development of specific areas within the EDM process. Finally, the paper also highlights a strategy of enabling analysis through either the use of predefined questions or a guided data mining process and highlights how the developed questions and analysis conducted can be reused and extended over time.Keywords: educational data mining, learning management system, learning analytics, EDM framework
Procedia PDF Downloads 32642046 Quantile Coherence Analysis: Application to Precipitation Data
Authors: Yaeji Lim, Hee-Seok Oh
Abstract:
The coherence analysis measures the linear time-invariant relationship between two data sets and has been studied various fields such as signal processing, engineering, and medical science. However classical coherence analysis tends to be sensitive to outliers and focuses only on mean relationship. In this paper, we generalized cross periodogram to quantile cross periodogram and provide richer inter-relationship between two data sets. This is a general version of Laplace cross periodogram. We prove its asymptotic distribution under the long range process and compare them with ordinary coherence through numerical examples. We also present real data example to confirm the usefulness of quantile coherence analysis.Keywords: coherence, cross periodogram, spectrum, quantile
Procedia PDF Downloads 39042045 Modeling and Statistical Analysis of a Soap Production Mix in Bejoy Manufacturing Industry, Anambra State, Nigeria
Authors: Okolie Chukwulozie Paul, Iwenofu Chinwe Onyedika, Sinebe Jude Ebieladoh, M. C. Nwosu
Abstract:
The research work is based on the statistical analysis of the processing data. The essence is to analyze the data statistically and to generate a design model for the production mix of soap manufacturing products in Bejoy manufacturing company Nkpologwu, Aguata Local Government Area, Anambra state, Nigeria. The statistical analysis shows the statistical analysis and the correlation of the data. T test, Partial correlation and bi-variate correlation were used to understand what the data portrays. The design model developed was used to model the data production yield and the correlation of the variables show that the R2 is 98.7%. However, the results confirm that the data is fit for further analysis and modeling. This was proved by the correlation and the R-squared.Keywords: General Linear Model, correlation, variables, pearson, significance, T-test, soap, production mix and statistic
Procedia PDF Downloads 44542044 Saving Energy at a Wastewater Treatment Plant through Electrical and Production Data Analysis
Authors: Adriano Araujo Carvalho, Arturo Alatrista Corrales
Abstract:
This paper intends to show how electrical energy consumption and production data analysis were used to find opportunities to save energy at Taboada wastewater treatment plant in Callao, Peru. In order to access the data, it was used independent data networks for both electrical and process instruments, which were taken to analyze under an ISO 50001 energy audit, which considered, thus, Energy Performance Indexes for each process and a step-by-step guide presented in this text. Due to the use of aforementioned methodology and data mining techniques applied on information gathered through electronic multimeters (conveniently placed on substation switchboards connected to a cloud network), it was possible to identify thoroughly the performance of each process and thus, evidence saving opportunities which were previously hidden before. The data analysis brought both costs and energy reduction, allowing the plant to save significant resources and to be certified under ISO 50001.Keywords: energy and production data analysis, energy management, ISO 50001, wastewater treatment plant energy analysis
Procedia PDF Downloads 19342043 Prompt Design for Code Generation in Data Analysis Using Large Language Models
Authors: Lu Song Ma Li Zhi
Abstract:
With the rapid advancement of artificial intelligence technology, large language models (LLMs) have become a milestone in the field of natural language processing, demonstrating remarkable capabilities in semantic understanding, intelligent question answering, and text generation. These models are gradually penetrating various industries, particularly showcasing significant application potential in the data analysis domain. However, retraining or fine-tuning these models requires substantial computational resources and ample downstream task datasets, which poses a significant challenge for many enterprises and research institutions. Without modifying the internal parameters of the large models, prompt engineering techniques can rapidly adapt these models to new domains. This paper proposes a prompt design strategy aimed at leveraging the capabilities of large language models to automate the generation of data analysis code. By carefully designing prompts, data analysis requirements can be described in natural language, which the large language model can then understand and convert into executable data analysis code, thereby greatly enhancing the efficiency and convenience of data analysis. This strategy not only lowers the threshold for using large models but also significantly improves the accuracy and efficiency of data analysis. Our approach includes requirements for the precision of natural language descriptions, coverage of diverse data analysis needs, and mechanisms for immediate feedback and adjustment. Experimental results show that with this prompt design strategy, large language models perform exceptionally well in multiple data analysis tasks, generating high-quality code and significantly shortening the data analysis cycle. This method provides an efficient and convenient tool for the data analysis field and demonstrates the enormous potential of large language models in practical applications.Keywords: large language models, prompt design, data analysis, code generation
Procedia PDF Downloads 3942042 A Method of Detecting the Difference in Two States of Brain Using Statistical Analysis of EEG Raw Data
Authors: Digvijaysingh S. Bana, Kiran R. Trivedi
Abstract:
This paper introduces various methods for the alpha wave to detect the difference between two states of brain. One healthy subject participated in the experiment. EEG was measured on the forehead above the eye (FP1 Position) with reference and ground electrode are on the ear clip. The data samples are obtained in the form of EEG raw data. The time duration of reading is of one minute. Various test are being performed on the alpha band EEG raw data.The readings are performed in different time duration of the entire day. The statistical analysis is being carried out on the EEG sample data in the form of various tests.Keywords: electroencephalogram(EEG), biometrics, authentication, EEG raw data
Procedia PDF Downloads 46442041 Application of Blockchain Technology in Geological Field
Authors: Mengdi Zhang, Zhenji Gao, Ning Kang, Rongmei Liu
Abstract:
Management and application of geological big data is an important part of China's national big data strategy. With the implementation of a national big data strategy, geological big data management becomes more and more critical. At present, there are still a lot of technology barriers as well as cognition chaos in many aspects of geological big data management and application, such as data sharing, intellectual property protection, and application technology. Therefore, it’s a key task to make better use of new technologies for deeper delving and wider application of geological big data. In this paper, we briefly introduce the basic principle of blockchain technology at the beginning and then make an analysis of the application dilemma of geological data. Based on the current analysis, we bring forward some feasible patterns and scenarios for the blockchain application in geological big data and put forward serval suggestions for future work in geological big data management.Keywords: blockchain, intellectual property protection, geological data, big data management
Procedia PDF Downloads 8942040 A Review on Existing Challenges of Data Mining and Future Research Perspectives
Authors: Hema Bhardwaj, D. Srinivasa Rao
Abstract:
Technology for analysing, processing, and extracting meaningful data from enormous and complicated datasets can be termed as "big data." The technique of big data mining and big data analysis is extremely helpful for business movements such as making decisions, building organisational plans, researching the market efficiently, improving sales, etc., because typical management tools cannot handle such complicated datasets. Special computational and statistical issues, such as measurement errors, noise accumulation, spurious correlation, and storage and scalability limitations, are brought on by big data. These unique problems call for new computational and statistical paradigms. This research paper offers an overview of the literature on big data mining, its process, along with problems and difficulties, with a focus on the unique characteristics of big data. Organizations have several difficulties when undertaking data mining, which has an impact on their decision-making. Every day, terabytes of data are produced, yet only around 1% of that data is really analyzed. The idea of the mining and analysis of data and knowledge discovery techniques that have recently been created with practical application systems is presented in this study. This article's conclusion also includes a list of issues and difficulties for further research in the area. The report discusses the management's main big data and data mining challenges.Keywords: big data, data mining, data analysis, knowledge discovery techniques, data mining challenges
Procedia PDF Downloads 11042039 Data and Spatial Analysis for Economy and Education of 28 E.U. Member-States for 2014
Authors: Alexiou Dimitra, Fragkaki Maria
Abstract:
The objective of the paper is the study of geographic, economic and educational variables and their contribution to determine the position of each member-state among the EU-28 countries based on the values of seven variables as given by Eurostat. The Data Analysis methods of Multiple Factorial Correspondence Analysis (MFCA) Principal Component Analysis and Factor Analysis have been used. The cross tabulation tables of data consist of the values of seven variables for the 28 countries for 2014. The data are manipulated using the CHIC Analysis V 1.1 software package. The results of this program using MFCA and Ascending Hierarchical Classification are given in arithmetic and graphical form. For comparison reasons with the same data the Factor procedure of Statistical package IBM SPSS 20 has been used. The numerical and graphical results presented with tables and graphs, demonstrate the agreement between the two methods. The most important result is the study of the relation between the 28 countries and the position of each country in groups or clouds, which are formed according to the values of the corresponding variables.Keywords: Multiple Factorial Correspondence Analysis, Principal Component Analysis, Factor Analysis, E.U.-28 countries, Statistical package IBM SPSS 20, CHIC Analysis V 1.1 Software, Eurostat.eu Statistics
Procedia PDF Downloads 51142038 Analysis of Big Data
Authors: Sandeep Sharma, Sarabjit Singh
Abstract:
As per the user demand and growth trends of large free data the storage solutions are now becoming more challenge-able to protect, store and to retrieve data. The days are not so far when the storage companies and organizations are start saying 'no' to store our valuable data or they will start charging a huge amount for its storage and protection. On the other hand as per the environmental conditions it becomes challenge-able to maintain and establish new data warehouses and data centers to protect global warming threats. A challenge of small data is over now, the challenges are big that how to manage the exponential growth of data. In this paper we have analyzed the growth trend of big data and its future implications. We have also focused on the impact of the unstructured data on various concerns and we have also suggested some possible remedies to streamline big data.Keywords: big data, unstructured data, volume, variety, velocity
Procedia PDF Downloads 54842037 Survey on Arabic Sentiment Analysis in Twitter
Authors: Sarah O. Alhumoud, Mawaheb I. Altuwaijri, Tarfa M. Albuhairi, Wejdan M. Alohaideb
Abstract:
Large-scale data stream analysis has become one of the important business and research priorities lately. Social networks like Twitter and other micro-blogging platforms hold an enormous amount of data that is large in volume, velocity and variety. Extracting valuable information and trends out of these data would aid in a better understanding and decision-making. Multiple analysis techniques are deployed for English content. Moreover, one of the languages that produce a large amount of data over social networks and is least analyzed is the Arabic language. The proposed paper is a survey on the research efforts to analyze the Arabic content in Twitter focusing on the tools and methods used to extract the sentiments for the Arabic content on Twitter.Keywords: big data, social networks, sentiment analysis, twitter
Procedia PDF Downloads 57642036 Data Mining Meets Educational Analysis: Opportunities and Challenges for Research
Authors: Carla Silva
Abstract:
Recent development of information and communication technology enables us to acquire, collect, analyse data in various fields of socioeconomic – technological systems. Along with the increase of economic globalization and the evolution of information technology, data mining has become an important approach for economic data analysis. As a result, there has been a critical need for automated approaches to effective and efficient usage of massive amount of educational data, in order to support institutions to a strategic planning and investment decision-making. In this article, we will address data from several different perspectives and define the applied data to sciences. Many believe that 'big data' will transform business, government, and other aspects of the economy. We discuss how new data may impact educational policy and educational research. Large scale administrative data sets and proprietary private sector data can greatly improve the way we measure, track, and describe educational activity and educational impact. We also consider whether the big data predictive modeling tools that have emerged in statistics and computer science may prove useful in educational and furthermore in economics. Finally, we highlight a number of challenges and opportunities for future research.Keywords: data mining, research analysis, investment decision-making, educational research
Procedia PDF Downloads 35842035 Evaluation of Apolipoprotein Profile in HIV/Aids Subjects in Pre and Post 12 Months Antiretroviral Therapy Using 1.5 NG/ML Troponin Diagnostic Cut-off for Myocardial Infarction in Nauth Nnewi, South Eastern Nigeria
Authors: I. P. Ezeugwunne, C. C. Onyenekwe, J. E. Ahaneku, G. I. Ahaneku
Abstract:
Introduction: It has been reported that acute myocardial infarction (AMI) might occur at 1.5 ng/ml troponin level. HIV infection has been documented to influence antiviral drugs, stimulate the production of proteins that enhance fatty acids synthesis. Information on cardiac status in HIV-infected subjects in Nigeria is scanty. Aim: To evaluate the Apolipoprotein profile of HIV subjects in pre-and-post 12 months of antiretroviral therapy (ART) using 1.5 ng/ml troponin diagnostic cut-off for myocardial infarction (MI) in Nnewi, South Eastern, Nigeria. Methodology: A total of 30 symptomatic HIV subjects without malaria co-infection with a mean age of 40.70 ±10.56 years were randomly recruited for this prospective case-controlled study. Serum apolipoproteins (Apo A1, A2, B, C2,C3 and Apo E), troponin and CD4 counts were measured using standard laboratory methods. Parameters were re-classified based on 1.5 ng/ml troponin diagnostic cut-off for MI. Analysis of variance and student paired t-tests were used for data analyses. Results: paired-wise comparison showed that there were significantly higher levels of CD4 counts, Apo A2, Apo C2, Apo E but lower levels of ApoA1, ApoB and ApoC3 in symptomatic HIV subjects before antiretroviral therapy (ART) when compared with after therapy at p<0.05 respectively. The troponin value was significantly higher amongst the group studied at p<0.05, respectively. Conclusion: The increased values of troponin observed among the groups were higher than the diagnostic cut-off for AMI. This may imply that AMI may occur at any group of studies. But the significant reduction in the serum levels of Apo A2, Apo B, Apo C3, Apo E and a significant increase in serum levels of Apo A1, Apo C2 and blood CD4 counts as the length of therapy lengthened may indicate possible cardio-protective effects of the ART on the heart, which may connote recovery.Keywords: ART, apolipoprotein, HIV, myocardial infarction
Procedia PDF Downloads 16442034 Pattern Recognition Using Feature Based Die-Map Clustering in the Semiconductor Manufacturing Process
Authors: Seung Hwan Park, Cheng-Sool Park, Jun Seok Kim, Youngji Yoo, Daewoong An, Jun-Geol Baek
Abstract:
Depending on the big data analysis becomes important, yield prediction using data from the semiconductor process is essential. In general, yield prediction and analysis of the causes of the failure are closely related. The purpose of this study is to analyze pattern affects the final test results using a die map based clustering. Many researches have been conducted using die data from the semiconductor test process. However, analysis has limitation as the test data is less directly related to the final test results. Therefore, this study proposes a framework for analysis through clustering using more detailed data than existing die data. This study consists of three phases. In the first phase, die map is created through fail bit data in each sub-area of die. In the second phase, clustering using map data is performed. And the third stage is to find patterns that affect final test result. Finally, the proposed three steps are applied to actual industrial data and experimental results showed the potential field application.Keywords: die-map clustering, feature extraction, pattern recognition, semiconductor manufacturing process
Procedia PDF Downloads 40242033 The Economic Limitations of Defining Data Ownership Rights
Authors: Kacper Tomasz Kröber-Mulawa
Abstract:
This paper will address the topic of data ownership from an economic perspective, and examples of economic limitations of data property rights will be provided, which have been identified using methods and approaches of economic analysis of law. To properly build a background for the economic focus, in the beginning a short perspective of data and data ownership in the EU’s legal system will be provided. It will include a short introduction to its political and social importance and highlight relevant viewpoints. This will stress the importance of a Single Market for data but also far-reaching regulations of data governance and privacy (including the distinction of personal and non-personal data, data held by public bodies and private businesses). The main discussion of this paper will build upon the briefly referred to legal basis as well as methods and approaches of economic analysis of law.Keywords: antitrust, data, data ownership, digital economy, property rights
Procedia PDF Downloads 8142032 Iot Device Cost Effective Storage Architecture and Real-Time Data Analysis/Data Privacy Framework
Authors: Femi Elegbeleye, Omobayo Esan, Muienge Mbodila, Patrick Bowe
Abstract:
This paper focused on cost effective storage architecture using fog and cloud data storage gateway and presented the design of the framework for the data privacy model and data analytics framework on a real-time analysis when using machine learning method. The paper began with the system analysis, system architecture and its component design, as well as the overall system operations. The several results obtained from this study on data privacy model shows that when two or more data privacy model is combined we tend to have a more stronger privacy to our data, and when fog storage gateway have several advantages over using the traditional cloud storage, from our result shows fog has reduced latency/delay, low bandwidth consumption, and energy usage when been compare with cloud storage, therefore, fog storage will help to lessen excessive cost. This paper dwelt more on the system descriptions, the researchers focused on the research design and framework design for the data privacy model, data storage, and real-time analytics. This paper also shows the major system components and their framework specification. And lastly, the overall research system architecture was shown, its structure, and its interrelationships.Keywords: IoT, fog, cloud, data analysis, data privacy
Procedia PDF Downloads 9942031 Cloud Design for Storing Large Amount of Data
Authors: M. Strémy, P. Závacký, P. Cuninka, M. Juhás
Abstract:
Main goal of this paper is to introduce our design of private cloud for storing large amount of data, especially pictures, and to provide good technological backend for data analysis based on parallel processing and business intelligence. We have tested hypervisors, cloud management tools, storage for storing all data and Hadoop to provide data analysis on unstructured data. Providing high availability, virtual network management, logical separation of projects and also rapid deployment of physical servers to our environment was also needed.Keywords: cloud, glusterfs, hadoop, juju, kvm, maas, openstack, virtualization
Procedia PDF Downloads 35342030 Data Integration with Geographic Information System Tools for Rural Environmental Monitoring
Authors: Tamas Jancso, Andrea Podor, Eva Nagyne Hajnal, Peter Udvardy, Gabor Nagy, Attila Varga, Meng Qingyan
Abstract:
The paper deals with the conditions and circumstances of integration of remotely sensed data for rural environmental monitoring purposes. The main task is to make decisions during the integration process when we have data sources with different resolution, location, spectral channels, and dimension. In order to have exact knowledge about the integration and data fusion possibilities, it is necessary to know the properties (metadata) that characterize the data. The paper explains the joining of these data sources using their attribute data through a sample project. The resulted product will be used for rural environmental analysis.Keywords: remote sensing, GIS, metadata, integration, environmental analysis
Procedia PDF Downloads 120