Search results for: data encoding
23401 Parallel Fuzzy Rough Support Vector Machine for Data Classification in Cloud Environment
Authors: Arindam Chaudhuri
Abstract:
Classification of data has been actively used for most effective and efficient means of conveying knowledge and information to users. The prima face has always been upon techniques for extracting useful knowledge from data such that returns are maximized. With emergence of huge datasets the existing classification techniques often fail to produce desirable results. The challenge lies in analyzing and understanding characteristics of massive data sets by retrieving useful geometric and statistical patterns. We propose a supervised parallel fuzzy rough support vector machine (PFRSVM) for data classification in cloud environment. The classification is performed by PFRSVM using hyperbolic tangent kernel. The fuzzy rough set model takes care of sensitiveness of noisy samples and handles impreciseness in training samples bringing robustness to results. The membership function is function of center and radius of each class in feature space and is represented with kernel. It plays an important role towards sampling the decision surface. The success of PFRSVM is governed by choosing appropriate parameter values. The training samples are either linear or nonlinear separable. The different input points make unique contributions to decision surface. The algorithm is parallelized with a view to reduce training times. The system is built on support vector machine library using Hadoop implementation of MapReduce. The algorithm is tested on large data sets to check its feasibility and convergence. The performance of classifier is also assessed in terms of number of support vectors. The challenges encountered towards implementing big data classification in machine learning frameworks are also discussed. The experiments are done on the cloud environment available at University of Technology and Management, India. The results are illustrated for Gaussian RBF and Bayesian kernels. The effect of variability in prediction and generalization of PFRSVM is examined with respect to values of parameter C. It effectively resolves outliers’ effects, imbalance and overlapping class problems, normalizes to unseen data and relaxes dependency between features and labels. The average classification accuracy for PFRSVM is better than other classifiers for both Gaussian RBF and Bayesian kernels. The experimental results on both synthetic and real data sets clearly demonstrate the superiority of the proposed technique.Keywords: FRSVM, Hadoop, MapReduce, PFRSVM
Procedia PDF Downloads 48923400 Design and Development of a Computerized Medical Record System for Hospitals in Remote Areas
Authors: Grace Omowunmi Soyebi
Abstract:
A computerized medical record system is a collection of medical information about a person that is stored on a computer. One principal problem of most hospitals in rural areas is using the file management system for keeping records. A lot of time is wasted when a patient visits the hospital, probably in an emergency, and the nurse or attendant has to search through voluminous files before the patient's file can be retrieved, this may cause an unexpected to happen to the patient. This Data Mining application is to be designed using a Structured System Analysis and design method which will help in a well-articulated analysis of the existing file management system, feasibility study, and proper documentation of the Design and Implementation of a Computerized medical record system. This Computerized system will replace the file management system and help to quickly retrieve a patient's record with increased data security, access clinical records for decision-making, and reduce the time range at which a patient gets attended to.Keywords: programming, computing, data, innovation
Procedia PDF Downloads 11723399 Modified CUSUM Algorithm for Gradual Change Detection in a Time Series Data
Authors: Victoria Siriaki Jorry, I. S. Mbalawata, Hayong Shin
Abstract:
The main objective in a change detection problem is to develop algorithms for efficient detection of gradual and/or abrupt changes in the parameter distribution of a process or time series data. In this paper, we present a modified cumulative (MCUSUM) algorithm to detect the start and end of a time-varying linear drift in mean value of a time series data based on likelihood ratio test procedure. The design, implementation and performance of the proposed algorithm for a linear drift detection is evaluated and compared to the existing CUSUM algorithm using different performance measures. An approach to accurately approximate the threshold of the MCUSUM is also provided. Performance of the MCUSUM for gradual change-point detection is compared to that of standard cumulative sum (CUSUM) control chart designed for abrupt shift detection using Monte Carlo Simulations. In terms of the expected time for detection, the MCUSUM procedure is found to have a better performance than a standard CUSUM chart for detection of the gradual change in mean. The algorithm is then applied and tested to a randomly generated time series data with a gradual linear trend in mean to demonstrate its usefulness.Keywords: average run length, CUSUM control chart, gradual change detection, likelihood ratio test
Procedia PDF Downloads 29723398 Contextual Toxicity Detection with Data Augmentation
Authors: Julia Ive, Lucia Specia
Abstract:
Understanding and detecting toxicity is an important problem to support safer human interactions online. Our work focuses on the important problem of contextual toxicity detection, where automated classifiers are tasked with determining whether a short textual segment (usually a sentence) is toxic within its conversational context. We use “toxicity” as an umbrella term to denote a number of variants commonly named in the literature, including hate, abuse, offence, among others. Detecting toxicity in context is a non-trivial problem and has been addressed by very few previous studies. These previous studies have analysed the influence of conversational context in human perception of toxicity in controlled experiments and concluded that humans rarely change their judgements in the presence of context. They have also evaluated contextual detection models based on state-of-the-art Deep Learning and Natural Language Processing (NLP) techniques. Counterintuitively, they reached the general conclusion that computational models tend to suffer performance degradation in the presence of context. We challenge these empirical observations by devising better contextual predictive models that also rely on NLP data augmentation techniques to create larger and better data. In our study, we start by further analysing the human perception of toxicity in conversational data (i.e., tweets), in the absence versus presence of context, in this case, previous tweets in the same conversational thread. We observed that the conclusions of previous work on human perception are mainly due to data issues: The contextual data available does not provide sufficient evidence that context is indeed important (even for humans). The data problem is common in current toxicity datasets: cases labelled as toxic are either obviously toxic (i.e., overt toxicity with swear, racist, etc. words), and thus context does is not needed for a decision, or are ambiguous, vague or unclear even in the presence of context; in addition, the data contains labeling inconsistencies. To address this problem, we propose to automatically generate contextual samples where toxicity is not obvious (i.e., covert cases) without context or where different contexts can lead to different toxicity judgements for the same tweet. We generate toxic and non-toxic utterances conditioned on the context or on target tweets using a range of techniques for controlled text generation(e.g., Generative Adversarial Networks and steering techniques). On the contextual detection models, we posit that their poor performance is due to limitations on both of the data they are trained on (same problems stated above) and the architectures they use, which are not able to leverage context in effective ways. To improve on that, we propose text classification architectures that take the hierarchy of conversational utterances into account. In experiments benchmarking ours against previous models on existing and automatically generated data, we show that both data and architectural choices are very important. Our model achieves substantial performance improvements as compared to the baselines that are non-contextual or contextual but agnostic of the conversation structure.Keywords: contextual toxicity detection, data augmentation, hierarchical text classification models, natural language processing
Procedia PDF Downloads 16923397 Osteoarthritis (OA): A Total Knee Replacement Surgery
Authors: Loveneet Kaur
Abstract:
Introduction: Osteoarthritis (OA) is one of the leading causes of disability, and the knee is the most commonly affected joint in the body. The last resort for treatment of knee OA is Total Knee Replacement (TKR) surgery. Despite numerous advances in prosthetic design, patients do not reach normal function after surgery. Current surgical decisions are made on 2D radiographs and patient interviews. Aims: The aim of this study was to compare knee kinematics pre and post-TKR surgery using computer-animated images of patient-specific models under everyday conditions. Methods: 7 subjects were recruited for the study. Subjects underwent 3D gait analysis during 4 everyday activities and medical imaging of the knee joint pre- and one-month post-surgery. A 3D model was created from each of the scans, and the kinematic gait analysis data was used to animate the images. Results: Improvements were seen in a range of motion in all 4 activities 1-year post-surgery. The preoperative 3D images provide detailed information on the anatomy of the osteoarthritic knee. The postoperative images demonstrate potential future problems associated with the implant. Although not accurate enough to be of clinical use, the animated data can provide valuable insight into what conditions cause damage to both the osteoarthritic and prosthetic knee joints. As the animated data does not require specialist training to view, the images can be utilized across the fields of health professionals and manufacturing in the assessment and treatment of patients pre and post-knee replacement surgery. Future improvements in the collection and processing of data may yield clinically useful data. Conclusion: Although not yet of clinical use, the potential application of 3D animations of the knee joint pre and post-surgery is widespread.Keywords: Orthoporosis, Ortharthritis, knee replacement, TKR
Procedia PDF Downloads 4523396 Time of Week Intensity Estimation from Interval Censored Data with Application to Police Patrol Planning
Authors: Jiahao Tian, Michael D. Porter
Abstract:
Law enforcement agencies are tasked with crime prevention and crime reduction under limited resources. Having an accurate temporal estimate of the crime rate would be valuable to achieve such a goal. However, estimation is usually complicated by the interval-censored nature of crime data. We cast the problem of intensity estimation as a Poisson regression using an EM algorithm to estimate the parameters. Two special penalties are added that provide smoothness over the time of day and day of the week. This approach presented here provides accurate intensity estimates and can also uncover day-of-week clusters that share the same intensity patterns. Anticipating where and when crimes might occur is a key element to successful policing strategies. However, this task is complicated by the presence of interval-censored data. The censored data refers to the type of data that the event time is only known to lie within an interval instead of being observed exactly. This type of data is prevailing in the field of criminology because of the absence of victims for certain types of crime. Despite its importance, the research in temporal analysis of crime has lagged behind the spatial component. Inspired by the success of solving crime-related problems with a statistical approach, we propose a statistical model for the temporal intensity estimation of crime with censored data. The model is built on Poisson regression and has special penalty terms added to the likelihood. An EM algorithm was derived to obtain maximum likelihood estimates, and the resulting model shows superior performance to the competing model. Our research is in line with the smart policing initiative (SPI) proposed by the Bureau Justice of Assistance (BJA) as an effort to support law enforcement agencies in building evidence-based, data-driven law enforcement tactics. The goal is to identify strategic approaches that are effective in crime prevention and reduction. In our case, we allow agencies to deploy their resources for a relatively short period of time to achieve the maximum level of crime reduction. By analyzing a particular area within cities where data are available, our proposed approach could not only provide an accurate estimate of intensities for the time unit considered but a time-variation crime incidence pattern. Both will be helpful in the allocation of limited resources by either improving the existing patrol plan with the understanding of the discovery of the day of week cluster or supporting extra resources available.Keywords: cluster detection, EM algorithm, interval censoring, intensity estimation
Procedia PDF Downloads 6423395 Diversifying from Petroleum Products to Arable Farming as Source of Revenue Generation in Nigeria: A Case Study of Ondo West Local Government
Authors: A. S. Akinbani
Abstract:
Overdependence on petroleum is causing set back in Nigeria economy. Field survey was carried out to assess the profitability and production of selected arable crops in six selected towns and villages of Ondo southwestern. Data were collected from 240 arable crop farmers with the aid of both primary and secondary data. Data were collected with the use of oral interview and structured questionnaires. Data collected were analyzed using both descriptive and inferential statistics. Forty farmers were randomly selected to give a total number of 240 respondents. 84 farmers interviewed had no formal education, 72 had primary education, 50 farmers attained secondary education while 38 attained beyond secondary education. The majority of the farmers hold less than 10 acres of land. The data collected from the field showed that 192 farmers practiced mixed cropping which includes mixtures of yam, cowpea, cocoyam, vegetable, cassava and maize while only 48 farmers practiced monocropping. Among the sampled farmers, 93% agreed that arable production is profitable while 7% disagreed. The findings show that managerial practices that conserve the soil fertility and reduce labor cost such as planting of leguminous crops and herbicide application instead of using hand held hoe for weeding should be encouraged. All the respondents agreed that yam, cowpea, cocoyam, sweet potato, rice, maize and vegetable production will solve the problem of hunger and increase standard of living compared with petroleum product that Nigeria relied on as means of livelihood.Keywords: farmers, arable crop, cocoyam, respondents, maize
Procedia PDF Downloads 24823394 Participation of Students and Lecturers in Social Networking for Teaching and Learning in Public Universities in Rivers State, Nigeria
Authors: Nkeiruka Queendarline Nwaizugbu
Abstract:
The use of social media and mobile devices has become acceptable in virtually all areas of today’s world. Hence, this study is a survey that was carried out to find out if students and lecturers in public universities in Rivers State use social networking for educational purposes. The sample of the study comprised of 240 students and 99 lecturers from the University of Port Harcourt and the Rivers State University of science and Technology. The study had five research questions, two hypotheses and the instrument for data collection was a 4-point Likert-type rating scale questionnaire. The data was analysed using mean, standard deviation and z-test. The findings gotten from the analysed data shows that students participate in social networking using different types of web applications but they hardly use them for educational purposes. Some recommendations were also made.Keywords: internet access, mobile learning, participation, social media, social networking, technology
Procedia PDF Downloads 42223393 Handling Missing Data by Using Expectation-Maximization and Expectation-Maximization with Bootstrapping for Linear Functional Relationship Model
Authors: Adilah Abdul Ghapor, Yong Zulina Zubairi, A. H. M. R. Imon
Abstract:
Missing value problem is common in statistics and has been of interest for years. This article considers two modern techniques in handling missing data for linear functional relationship model (LFRM) namely the Expectation-Maximization (EM) algorithm and Expectation-Maximization with Bootstrapping (EMB) algorithm using three performance indicators; namely the mean absolute error (MAE), root mean square error (RMSE) and estimated biased (EB). In this study, we applied the methods of imputing missing values in two types of LFRM namely the full model of LFRM and in LFRM when the slope is estimated using a nonparametric method. Results of the simulation study suggest that EMB algorithm performs much better than EM algorithm in both models. We also illustrate the applicability of the approach in a real data set.Keywords: expectation-maximization, expectation-maximization with bootstrapping, linear functional relationship model, performance indicators
Procedia PDF Downloads 45423392 A Comparative Study of Environment Risk Assessment Guidelines of Developing and Developed Countries Including Bangladesh
Authors: Syeda Fahria Hoque Mimmi, Aparna Islam
Abstract:
Genetically engineered (GE) plants are the need of time for increased demand for food. A complete set of regulations need to be followed from the development of a GE plant to its release into the environment. The whole regulation system is categorized into separate stages for maintaining the proper biosafety. Environmental risk assessment (ERA) is one of such crucial stages in the whole process. ERA identifies potential risks and their impacts through science-based evaluation where it is done in a case-by-case study. All the countries which deal with GE plants follow specific guidelines to conduct a successful ERA. In this study, ERA guidelines of 4 developing and 4 developed countries, including Bangladesh, were compared. ERA guidelines of countries such as India, Canada, Australia, the European Union, Argentina, Brazil, and the US were considered as a model to conduct the comparison study with Bangladesh. Initially, ten parameters were detected to compare the required data and information among all the guidelines. Surprisingly, an adequate amount of data and information requirements (e.g., if the intended modification/new traits of interest has been achieved or not, the growth habit of GE plants, consequences of any potential gene flow upon the cultivation of GE plants to sexually compatible plant species, potential adverse effects on the human health, etc.) matched between all the countries. However, a few differences in data requirement (e.g., agronomic conventions of non-transformed plants, applicants should clearly describe experimental procedures followed, etc.) were also observed in the study. Moreover, it was found that only a few countries provide instructions on the quality of the data used for ERA. If these similarities are recognized in a more framed manner, then the approval pathway of GE plants can be shared.Keywords: GE plants, ERA, harmonization, ERA guidelines, Information and data requirements
Procedia PDF Downloads 18623391 In-service High School Teachers’ Experiences On Blended Teaching Approach Of Mathematics
Authors: Lukholo Raxangana
Abstract:
Fourth Industrial Revolution (4IR)-era teaching offers in-service mathematics teachers opportunities to use blended approaches to engage learners while teaching mathematics. This study explores in-service high school teachers' experiences with a blended teaching approach to mathematics. This qualitative case study involved eight pre-service teachers from four selected schools in the Sedibeng West District of the Gauteng Province. The study used the community of inquiry model as its analytical framework for data analysis. Data collection was through semi-structured interviews and focus-group discussions to explore in-service teachers' experiences with the influence of blended teaching (BT) on learning mathematics. The study results are the impact of load-shedding, benefits of BT, and perceptions of in-service and hindrances of BT. Based on these findings, the study recommends that further research should focus on developing data-free BT tools to assist during load-shedding, regardless of location.Keywords: bended teaching, teachers, in-service, and mathematics
Procedia PDF Downloads 5623390 Auditory Brainstem Response in Wave VI for the Detection of Learning Disabilities
Authors: Maria Isabel Garcia-Planas, Maria Victoria Garcia-Camba
Abstract:
The use of brain stem auditory evoked potential (BAEP) is a common way to study the auditory function of people, a way to learn the functionality of a part of the brain neuronal groups that intervene in the learning process by studying the behaviour of wave VI. The latest advances in neuroscience have revealed the existence of different brain activity in the learning process that can be highlighted through the use of innocuous, low-cost, and easy-access techniques such as, among others, the BAEP that can help us to detect early possible neurodevelopmental difficulties for their subsequent assessment and cure. To date and to the authors' best knowledge, only the latency data obtained, observing the first to V waves and mainly in the left ear, were taken into account. This work shows that it is essential to take into account both ears; with these latest data, it has been possible had diagnosed more precise some cases than with the previous data had been diagnosed as 'normal' despite showing signs of some alteration that motivated the new consultation to the specialist.Keywords: ear, neurodevelopment, auditory evoked potentials, intervals of normality, learning disabilities
Procedia PDF Downloads 16223389 Quantum Cryptography: Classical Cryptography Algorithms’ Vulnerability State as Quantum Computing Advances
Authors: Tydra Preyear, Victor Clincy
Abstract:
Quantum computing presents many computational advantages over classical computing methods due to the utilization of quantum mechanics. The capability of this computing infrastructure poses threats to standard cryptographic systems such as RSA and AES, which are designed for classical computing environments. This paper discusses the impact that quantum computing has on cryptography, while focusing on the evolution from classical cryptographic concepts to quantum and post-quantum cryptographic concepts. Standard Cryptography is essential for securing data by utilizing encryption and decryption methods, and these methods face vulnerability problems due to the advancement of quantum computing. In order to counter these vulnerabilities, the methods that are proposed are quantum cryptography and post-quantum cryptography. Quantum cryptography uses principles such as the uncertainty principle and photon polarization in order to provide secure data transmission. In addition, the concept of Quantum key distribution is introduced to ensure more secure communication channels by distributing cryptographic keys. There is the emergence of post-quantum cryptography which is used for improving cryptographic algorithms in order to be more secure from attacks by classical and quantum computers. Throughout this exploration, the paper mentions the critical role of the advancement of cryptographic methods to keep data integrity and privacy safe from quantum computing concepts. Future research directions that would be discussed would be more effective cryptographic methods through the advancement of technology.Keywords: quantum computing, quantum cryptography, cryptography, data integrity and privacy
Procedia PDF Downloads 2023388 Intelligent Electric Vehicle Charging System (IEVCS)
Authors: Prateek Saxena, Sanjeev Singh, Julius Roy
Abstract:
The security of the power distribution grid remains a paramount to the utility professionals while enhancing and making it more efficient. The most serious threat to the system can be maintaining the transformers, as the load is ever increasing with the addition of elements like electric vehicles. In this paper, intelligent transformer monitoring and grid management has been proposed. The engineering is done to use the evolving data from the smart meter for grid analytics and diagnostics for preventive maintenance. The two-tier architecture for hardware and software integration is coupled to form a robust system for the smart grid. The proposal also presents interoperable meter standards for easy integration. Distribution transformer analytics based on real-time data benefits utilities preventing outages, protects the revenue loss, improves the return on asset and reduces overall maintenance cost by predictive monitoring.Keywords: electric vehicle charging, transformer monitoring, data analytics, intelligent grid
Procedia PDF Downloads 78923387 Self-Organizing Maps for Credit Card Fraud Detection
Authors: ChunYi Peng, Wei Hsuan CHeng, Shyh Kuang Ueng
Abstract:
This study focuses on the application of self-organizing maps (SOM) technology in analyzing credit card transaction data, aiming to enhance the accuracy and efficiency of fraud detection. Som, as an artificial neural network, is particularly suited for pattern recognition and data classification, making it highly effective for the complex and variable nature of credit card transaction data. By analyzing transaction characteristics with SOM, the research identifies abnormal transaction patterns that could indicate potentially fraudulent activities. Moreover, this study has developed a specialized visualization tool to intuitively present the relationships between SOM analysis outcomes and transaction data, aiding financial institution personnel in quickly identifying and responding to potential fraud, thereby reducing financial losses. Additionally, the research explores the integration of SOM technology with composite intelligent system technologies (including finite state machines, fuzzy logic, and decision trees) to further improve fraud detection accuracy. This multimodal approach provides a comprehensive perspective for identifying and understanding various types of fraud within credit card transactions. In summary, by integrating SOM technology with visualization tools and composite intelligent system technologies, this research offers a more effective method of fraud detection for the financial industry, not only enhancing detection accuracy but also deepening the overall understanding of fraudulent activities.Keywords: self-organizing map technology, fraud detection, information visualization, data analysis, composite intelligent system technologies, decision support technologies
Procedia PDF Downloads 5623386 Robust Recognition of Locomotion Patterns via Data-Driven Machine Learning in the Cloud Environment
Authors: Shinoy Vengaramkode Bhaskaran, Kaushik Sathupadi, Sandesh Achar
Abstract:
Human locomotion recognition is important in a variety of sectors, such as robotics, security, healthcare, fitness tracking and cloud computing. With the increasing pervasiveness of peripheral devices, particularly Inertial Measurement Units (IMUs) sensors, researchers have attempted to exploit these advancements in order to precisely and efficiently identify and categorize human activities. This research paper introduces a state-of-the-art methodology for the recognition of human locomotion patterns in a cloud environment. The methodology is based on a publicly available benchmark dataset. The investigation implements a denoising and windowing strategy to deal with the unprocessed data. Next, feature extraction is adopted to abstract the main cues from the data. The SelectKBest strategy is used to abstract optimal features from the data. Furthermore, state-of-the-art ML classifiers are used to evaluate the performance of the system, including logistic regression, random forest, gradient boosting and SVM have been investigated to accomplish precise locomotion classification. Finally, a detailed comparative analysis of results is presented to reveal the performance of recognition models.Keywords: artificial intelligence, cloud computing, IoT, human locomotion, gradient boosting, random forest, neural networks, body-worn sensors
Procedia PDF Downloads 523385 Feature Selection of Personal Authentication Based on EEG Signal for K-Means Cluster Analysis Using Silhouettes Score
Authors: Jianfeng Hu
Abstract:
Personal authentication based on electroencephalography (EEG) signals is one of the important field for the biometric technology. More and more researchers have used EEG signals as data source for biometric. However, there are some disadvantages for biometrics based on EEG signals. The proposed method employs entropy measures for feature extraction from EEG signals. Four type of entropies measures, sample entropy (SE), fuzzy entropy (FE), approximate entropy (AE) and spectral entropy (PE), were deployed as feature set. In a silhouettes calculation, the distance from each data point in a cluster to all another point within the same cluster and to all other data points in the closest cluster are determined. Thus silhouettes provide a measure of how well a data point was classified when it was assigned to a cluster and the separation between them. This feature renders silhouettes potentially well suited for assessing cluster quality in personal authentication methods. In this study, “silhouettes scores” was used for assessing the cluster quality of k-means clustering algorithm is well suited for comparing the performance of each EEG dataset. The main goals of this study are: (1) to represent each target as a tuple of multiple feature sets, (2) to assign a suitable measure to each feature set, (3) to combine different feature sets, (4) to determine the optimal feature weighting. Using precision/recall evaluations, the effectiveness of feature weighting in clustering was analyzed. EEG data from 22 subjects were collected. Results showed that: (1) It is possible to use fewer electrodes (3-4) for personal authentication. (2) There was the difference between each electrode for personal authentication (p<0.01). (3) There is no significant difference for authentication performance among feature sets (except feature PE). Conclusion: The combination of k-means clustering algorithm and silhouette approach proved to be an accurate method for personal authentication based on EEG signals.Keywords: personal authentication, K-mean clustering, electroencephalogram, EEG, silhouettes
Procedia PDF Downloads 28323384 Developing an Active Leisure Wear Range: A Dilemma for Khanna Enterprises
Authors: Jagriti Mishra, Vasundhara Chaudhary
Abstract:
Introduction: The case highlights various issues and challenges faced by Khanna Enterprises while conceptualizing and execution of launching Active Leisure wear in the domestic market, where different steps involved in the range planning and production have been elaborated. Although Khanna Enterprises was an established company which dealt in the production of knitted and woven garments, they took the risk of launching a new concept- Active Leisure wear for Millennials. Methodology: It is based on primary and secondary research where data collection has been done through survey, in-depth interviews and various reports, forecasts, and journals. Findings: The research through primary and secondary data and execution of active leisure wear substantiated the acceptance, not only by the millennials but also by the generation X. There was a demand of bigger sizes as well as more muted colours. Conclusion: The sales data paved the way for future product development in tune with the strengths of Khanna Enterprises.Keywords: millennials, range planning, production, active leisure wear
Procedia PDF Downloads 20923383 A Review of Data Visualization Best Practices: Lessons for Open Government Data Portals
Authors: Bahareh Ansari
Abstract:
Background: The Open Government Data (OGD) movement in the last decade has encouraged many government organizations around the world to make their data publicly available to advance democratic processes. But current open data platforms have not yet reached to their full potential in supporting all interested parties. To make the data useful and understandable for everyone, scholars suggested that opening the data should be supplemented by visualization. However, different visualizations of the same information can dramatically change an individual’s cognitive and emotional experience in working with the data. This study reviews the data visualization literature to create a list of the methods empirically tested to enhance users’ performance and experience in working with a visualization tool. This list can be used in evaluating the OGD visualization practices and informing the future open data initiatives. Methods: Previous reviews of visualization literature categorized the visualization outcomes into four categories including recall/memorability, insight/comprehension, engagement, and enjoyment. To identify the papers, a search for these outcomes was conducted in the abstract of the publications of top-tier visualization venues including IEEE Transactions for Visualization and Computer Graphics, Computer Graphics, and proceedings of the CHI Conference on Human Factors in Computing Systems. The search results are complemented with a search in the references of the identified articles, and a search for 'open data visualization,' and 'visualization evaluation' keywords in the IEEE explore and ACM digital libraries. Articles are included if they provide empirical evidence through conducting controlled user experiments, or provide a review of these empirical studies. The qualitative synthesis of the studies focuses on identification and classifying the methods, and the conditions under which they are examined to positively affect the visualization outcomes. Findings: The keyword search yields 760 studies, of which 30 are included after the title/abstract review. The classification of the included articles shows five distinct methods: interactive design, aesthetic (artistic) style, storytelling, decorative elements that do not provide extra information including text, image, and embellishment on the graphs), and animation. Studies on decorative elements show consistency on the positive effects of these elements on user engagement and recall but are less consistent in their examination of the user performance. This inconsistency could be attributable to the particular data type or specific design method used in each study. The interactive design studies are consistent in their findings of the positive effect on the outcomes. Storytelling studies show some inconsistencies regarding the design effect on user engagement, enjoyment, recall, and performance, which could be indicative of the specific conditions required for the use of this method. Last two methods, aesthetics and animation, have been less frequent in the included articles, and provide consistent positive results on some of the outcomes. Implications for e-government: Review of the visualization best-practice methods show that each of these methods is beneficial under specific conditions. By using these methods in a potentially beneficial condition, OGD practices can promote a wide range of individuals to involve and work with the government data and ultimately engage in government policy-making procedures.Keywords: best practices, data visualization, literature review, open government data
Procedia PDF Downloads 10423382 Reduced Power Consumption by Randomization for DSI3
Authors: David Levy
Abstract:
The newly released Distributed System Interface 3 (DSI3) Bus Standard specification defines 3 modulation levels from which 16 valid symbols are coded. This structure creates power consumption variations depending on the transmitted data of a factor of more than 2 between minimum and maximum. The power generation unit has to consider therefore the worst case maximum consumption all the time and be built accordingly. This paper proposes a method to reduce both the average current consumption and worst case current consumption. The transmitter randomizes the data using several pseudo-random sequences. It then estimates the energy consumption of the generated frames and selects to transmit the one which consumes the least. The transmitter also prepends the index of the pseudo-random sequence, which is not randomized, to allow the receiver to recover the original data using the correct sequence. We show that in the case that the frame occupies most of the DSI3 synchronization period, we achieve average power consumption reduction by up to 13% and the worst case power consumption is reduced by 17.7%.Keywords: DSI3, energy, power consumption, randomization
Procedia PDF Downloads 53623381 Ensemble-Based SVM Classification Approach for miRNA Prediction
Authors: Sondos M. Hammad, Sherin M. ElGokhy, Mahmoud M. Fahmy, Elsayed A. Sallam
Abstract:
In this paper, an ensemble-based Support Vector Machine (SVM) classification approach is proposed. It is used for miRNA prediction. Three problems, commonly associated with previous approaches, are alleviated. These problems arise due to impose assumptions on the secondary structural of premiRNA, imbalance between the numbers of the laboratory checked miRNAs and the pseudo-hairpins, and finally using a training data set that does not consider all the varieties of samples in different species. We aggregate the predicted outputs of three well-known SVM classifiers; namely, Triplet-SVM, Virgo and Mirident, weighted by their variant features without any structural assumptions. An additional SVM layer is used in aggregating the final output. The proposed approach is trained and then tested with balanced data sets. The results of the proposed approach outperform the three base classifiers. Improved values for the metrics of 88.88% f-score, 92.73% accuracy, 90.64% precision, 96.64% specificity, 87.2% sensitivity, and the area under the ROC curve is 0.91 are achieved.Keywords: MiRNAs, SVM classification, ensemble algorithm, assumption problem, imbalance data
Procedia PDF Downloads 34723380 Quality of Life of Patients on Oral Antiplatelet Therapy in Outpatient Cardiac Department Dr. Hasan Sadikin Central General Hospital Bandung
Authors: Andhiani Sharfina Arnellya, Mochammad Indra Permana, Dika Pramita Destiani, Ellin Febrina
Abstract:
Health Research Data, Ministry of Health of Indonesia in 2007, showed coronary heart disease (CHD) or coronary artery disease (CAD) was the third leading cause of death in Indonesia after hypertension and stroke with 7.2% incidence rate. Antiplatelet is one of the important therapy in management of patients with CHD. In addition to therapeutic effect on patients, quality of life is one aspect of another assessment to see the success of antiplatelet therapy. The purpose of this study was to determine the quality of life of patients on oral antiplatelet therapy in outpatient cardiac department Dr. Hasan Sadikin central general hospital, Bandung, Indonesia. This research is a cross sectional by collecting data through quality of life questionnaire of patients which performed prospectively as primary data and secondary data from medical record of patients. The results of this study showed that 54.3% of patients had a good quality of life, 45% had a moderate quality of life, and 0.7% had a poor quality of life. There are no significant differences in quality of life-based on age, gender, diagnosis, and duration of drug use.Keywords: antiplatelet, quality of life, coronary artery disease, coronary heart disease
Procedia PDF Downloads 32223379 Commissioning of a Flattening Filter Free (FFF) using an Anisotropic Analytical Algorithm (AAA)
Authors: Safiqul Islam, Anamul Haque, Mohammad Amran Hossain
Abstract:
Aim: To compare the dosimetric parameters of the flattened and flattening filter free (FFF) beam and to validate the beam data using anisotropic analytical algorithm (AAA). Materials and Methods: All the dosimetric data’s (i.e. depth dose profiles, profile curves, output factors, penumbra etc.) required for the beam modeling of AAA were acquired using the Blue Phantom RFA for 6 MV, 6 FFF, 10MV & 10FFF. Progressive resolution Optimizer and Dose Volume Optimizer algorithm for VMAT and IMRT were are also configured in the beam model. Beam modeling of the AAA were compared with the measured data sets. Results: Due to the higher and lover energy component in 6FFF and 10 FFF the surface doses are 10 to 15% higher compared to flattened 6 MV and 10 MV beams. FFF beam has a lower mean energy compared to the flattened beam and the beam quality index were 6 MV 0.667, 6FFF 0.629, 10 MV 0.74 and 10 FFF 0.695 respectively. Gamma evaluation with 2% dose and 2 mm distance criteria for the Open Beam, IMRT and VMAT plans were also performed and found a good agreement between the modeled and measured data. Conclusion: We have successfully modeled the AAA algorithm for the flattened and FFF beams and achieved a good agreement with the calculated and measured value.Keywords: commissioning of a Flattening Filter Free (FFF) , using an Anisotropic Analytical Algorithm (AAA), flattened beam, parameters
Procedia PDF Downloads 29823378 Molecular Characterization of Polyploid Bamboo (Dendrocalamus hamiltonii) Using Microsatellite Markers
Authors: Rajendra K. Meena, Maneesh S. Bhandari, Santan Barthwal, Harish S. Ginwal
Abstract:
Microsatellite markers are the most valuable tools for the characterization of plant genetic resources or population genetic analysis. Since it is codominant and allelic markers, utilizing them in polyploid species remained doubtful. In such cases, the microsatellite marker is usually analyzed by treating them as a dominant marker. In the current study, it has been showed that despite losing the advantage of co-dominance, microsatellite markers are still a powerful tool for genotyping of polyploid species because of availability of large number of reproducible alleles per locus. It has been studied by genotyping of 19 subpopulations of Dendrocalamus hamiltonii (hexaploid bamboo species) with 17 polymorphic simple sequence repeat (SSR) primer pairs. Among these, ten primers gave typical banding pattern of microsatellite marker as expected in diploid species, but rest 7 gave an unusual pattern, i.e., more than two bands per locus per genotype. In such case, genotyping data are generally analyzed by considering as dominant markers. In the current study, data were analyzed in both ways as dominant and co-dominant. All the 17 primers were first scored as nonallelic data and analyzed; later, the ten primers giving standard banding patterns were analyzed as allelic data and the results were compared. The UPGMA clustering and genetic structure showed that results obtained with both the data sets are very similar with slight variation, and therefore the SSR marker could be utilized to characterize polyploid species by considering them as a dominant marker. The study is highly useful to widen the scope for SSR markers applications and beneficial to the researchers dealing with polyploid species.Keywords: microsatellite markers, Dendrocalamus hamiltonii, dominant and codominant, polyploids
Procedia PDF Downloads 14123377 Big Data Analysis Approach for Comparison New York Taxi Drivers' Operation Patterns between Workdays and Weekends Focusing on the Revenue Aspect
Authors: Yongqi Dong, Zuo Zhang, Rui Fu, Li Li
Abstract:
The records generated by taxicabs which are equipped with GPS devices is of vital importance for studying human mobility behavior, however, here we are focusing on taxi drivers' operation strategies between workdays and weekends temporally and spatially. We identify a group of valuable characteristics through large scale drivers' behavior in a complex metropolis environment. Based on the daily operations of 31,000 taxi drivers in New York City, we classify drivers into top, ordinary and low-income groups according to their monthly working load, daily income, daily ranking and the variance of the daily rank. Then, we apply big data analysis and visualization methods to compare the different characteristics among top, ordinary and low income drivers in selecting of working time, working area as well as strategies between workdays and weekends. The results verify that top drivers do have special operation tactics to help themselves serve more passengers, travel faster thus make more money per unit time. This research provides new possibilities for fully utilizing the information obtained from urban taxicab data for estimating human behavior, which is not only very useful for individual taxicab driver but also to those policy-makers in city authorities.Keywords: big data, operation strategies, comparison, revenue, temporal, spatial
Procedia PDF Downloads 22623376 Using Morlet Wavelet Filter to Denoising Geoelectric ‘Disturbances’ Map of Moroccan Phosphate Deposit ‘Disturbances’
Authors: Saad Bakkali
Abstract:
Morocco is a major producer of phosphate, with an annual output of 19 million tons and reserves in excess of 35 billion cubic meters. This represents more than 75% of world reserves. Resistivity surveys have been successfully used in the Oulad Abdoun phosphate basin. A Schlumberger resistivity survey over an area of 50 hectares was carried out. A new field procedure based on analytic signal response of resistivity data was tested to deal with the presence of phosphate deposit disturbances. A resistivity map was expected to allow the electrical resistivity signal to be imaged in 2D. 2D wavelet is standard tool in the interpretation of geophysical potential field data. Wavelet transform is particularly suitable in denoising, filtering and analyzing geophysical data singularities. Wavelet transform tools are applied to analysis of a moroccan phosphate deposit ‘disturbances’. Wavelet approach applied to modeling surface phosphate “disturbances” was found to be consistently useful.Keywords: resistivity, Schlumberger, phosphate, wavelet, Morocco
Procedia PDF Downloads 41523375 Imputation of Incomplete Large-Scale Monitoring Count Data via Penalized Estimation
Authors: Mohamed Dakki, Genevieve Robin, Marie Suet, Abdeljebbar Qninba, Mohamed A. El Agbani, Asmâa Ouassou, Rhimou El Hamoumi, Hichem Azafzaf, Sami Rebah, Claudia Feltrup-Azafzaf, Nafouel Hamouda, Wed a.L. Ibrahim, Hosni H. Asran, Amr A. Elhady, Haitham Ibrahim, Khaled Etayeb, Essam Bouras, Almokhtar Saied, Ashrof Glidan, Bakar M. Habib, Mohamed S. Sayoud, Nadjiba Bendjedda, Laura Dami, Clemence Deschamps, Elie Gaget, Jean-Yves Mondain-Monval, Pierre Defos Du Rau
Abstract:
In biodiversity monitoring, large datasets are becoming more and more widely available and are increasingly used globally to estimate species trends and con- servation status. These large-scale datasets challenge existing statistical analysis methods, many of which are not adapted to their size, incompleteness and heterogeneity. The development of scalable methods to impute missing data in incomplete large-scale monitoring datasets is crucial to balance sampling in time or space and thus better inform conservation policies. We developed a new method based on penalized Poisson models to impute and analyse incomplete monitoring data in a large-scale framework. The method al- lows parameterization of (a) space and time factors, (b) the main effects of predic- tor covariates, as well as (c) space–time interactions. It also benefits from robust statistical and computational capability in large-scale settings. The method was tested extensively on both simulated and real-life waterbird data, with the findings revealing that it outperforms six existing methods in terms of missing data imputation errors. Applying the method to 16 waterbird species, we estimated their long-term trends for the first time at the entire North African scale, a region where monitoring data suffer from many gaps in space and time series. This new approach opens promising perspectives to increase the accuracy of species-abundance trend estimations. We made it freely available in the r package ‘lori’ (https://CRAN.R-project.org/package=lori) and recommend its use for large- scale count data, particularly in citizen science monitoring programmes.Keywords: biodiversity monitoring, high-dimensional statistics, incomplete count data, missing data imputation, waterbird trends in North-Africa
Procedia PDF Downloads 15323374 The Role of Cholesterol Oxidase of Mycobacterium tuberculosis in the Down-Regulation of TLR2-Signaling Pathway in Human Macrophages during Infection Process
Authors: Michal Kielbik, Izabela Szulc-Kielbik, Anna Brzostek, Jaroslaw Dziadek, Magdalena Klink
Abstract:
The goal of many research groups in the world is to find new components that are important for survival of mycobacteria in the host cells. Mycobacterium tuberculosis (Mtb) possesses a number of enzymes degrading cholesterol that are considered to be an important factor for its survival and persistence in host macrophages. One of them - cholesterol oxidase (ChoD), although not being essential for cholesterol degradation, is discussed as a virulence compound, however its involvement in macrophages’ response to Mtb is still not sufficiently determined. The recognition of tubercle bacilli antigens by pathogen recognition receptors is crucial for the initiation of the host innate immune response. An important receptor that has been implicated in the recognition and/or uptake of Mtb is Toll-like receptor type 2 (TLR2). Engagement of TLR2 results in the activation and phosphorylation of intracellular signaling proteins including IRAK-1 and -4, TRAF-6, which in turn leads to the activation of target kinases and transcription factors responsible for bactericidal and pro-inflammatory response of macrophages. The aim of these studies was a detailed clarification of the role of Mtb cholesterol oxidase as a virulence factor affecting the TLR2 signaling pathway in human macrophages. As human macrophages the THP-1 differentiated cells were applied. The virulent wild-type Mtb strain (H37Rv), its mutant lacking a functional copy of gene encoding cholesterol oxidase (∆choD), as well as complimented strain (∆choD–choD) were used. We tested the impact of Mtb strains on the expression of TLR2-depended signaling proteins (mRNA level, cytosolic level and phosphorylation status). The cytokine and bactericidal response of THP-1 derived macrophages infected with Mtb strains in relation to TLR2 signaling pathway dependence was also determined. We found that during the 24-hours of infection process the wild-type and complemented Mtb significantly reduced the cytosolic level and phosphorylation status of IRAK-4 and TRAF-6 proteins in macrophages, that was not observed in the case of ΔchoD mutant. Decreasement of TLR2-dependent signaling proteins, induced by wild-type Mtb, was not dependent on the activity of proteasome. Blocking of TLR2 expression, before infection, effectively prevented the induced by wild-type strain reduction of cytosolic level and phosphorylation of IRAK-4. None of the strains affected the surface expression of TLR2. The mRNA level of IRAK-4 and TRAF-6 genes were significantly increased in macrophages 24 hours post-infection with either of tested strains. However, the impact of wild-type Mtb strain on both examined genes was significantly stronger than its ΔchoD mutant. We also found that wild-type strain stimulated macrophages to release high amount of immunosuppressive IL-10, accompanied by low amount of pro-inflammatory IL-8 and bactericidal nitric oxide in comparison to mutant lacking cholesterol oxidase. The influence of wild-type Mtb on this type of macrophages' response strongly dependent on fully active IRAK-1 and IRAK-4 signaling proteins. In conclusion, Mtb using cholesterol oxidase causes the over-activation of TLR2 signaling proteins leading to the reduction of their cytosolic level and activity resulting in the modulation of macrophages response to allow its intracellular survival. Supported by grant: 2014/15/B/NZ6/01565, National Science Center, PolandKeywords: Mycobacterium tuberculosis, cholesterol oxidase, macrophages, TLR2-dependent signaling pathway
Procedia PDF Downloads 41523373 Statistical Investigation Projects: A Way for Pre-Service Mathematics Teachers to Actively Solve a Campus Problem
Authors: Muhammet Şahal, Oğuz Köklü
Abstract:
As statistical thinking and problem-solving processes have become increasingly important, teachers need to be more rigorously prepared with statistical knowledge to teach their students effectively. This study examined preservice mathematics teachers' development of statistical investigation projects using data and exploratory data analysis tools, following a design-based research perspective and statistical investigation cycle. A total of 26 pre-service senior mathematics teachers from a public university in Turkiye participated in the study. They formed groups of 3-4 members voluntarily and worked on their statistical investigation projects for six weeks. The data sources were audio recordings of pre-service teachers' group discussions while working on their projects in class, whole-class video recordings, and each group’s weekly and final reports. As part of the study, we reviewed weekly reports, provided timely feedback specific to each group, and revised the following week's class work based on the groups’ needs and development in their project. We used content analysis to analyze groups’ audio and classroom video recordings. The participants encountered several difficulties, which included formulating a meaningful statistical question in the early phase of the investigation, securing the most suitable data collection strategy, and deciding on the data analysis method appropriate for their statistical questions. The data collection and organization processes were challenging for some groups and revealed the importance of comprehensive planning. Overall, preservice senior mathematics teachers were able to work on a statistical project that contained the formulation of a statistical question, planning, data collection, analysis, and reaching a conclusion holistically, even though they faced challenges because of their lack of experience. The study suggests that preservice senior mathematics teachers have the potential to apply statistical knowledge and techniques in a real-world context, and they could proceed with the project with the support of the researchers. We provided implications for the statistical education of teachers and future research.Keywords: design-based study, pre-service mathematics teachers, statistical investigation projects, statistical model
Procedia PDF Downloads 8123372 The Study on the Tourism Routes to Create Interpretation for Promote Cultural Tourism in Bangnoi Floating Market, Bangkontee District, Samut Songkhram Province, Thailand
Authors: Pornnapat Berndt
Abstract:
The purpose of this research is to study the tourism routes in Bangnoi Floating Market, Bangkhontee District, Samut Songkhram province, Thailand in order to create type and form of interpretation to promote cultural tourism based on local community and visitor requirement. To accomplish the goals and objectives, qualitative research will be applied. The research instruments used are observation, questionnaires, basic interviews, in-depth interviews, focus group, interviewed of key local informants including site visitors. The study also uses both primary data and secondary data. A Statistical Package for Social Sciences (SPSS) was used to analyze the data. Descriptive and inferential statistics such as tables, percentage, mean and standard deviation were used for data analysis and summary. From research result, it is revealed that the local community requirement on types of interpretation conforms to visitors require which need guide post, guide book, etc. with up to date and informally content to present Bangnoi Floating Market which got the most demand score (3.78) considered as most wanted demand.Keywords: interpretation, cultural tourism, tourism route, local community, stakeholders participated
Procedia PDF Downloads 291