Search results for: data infrastructure
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25645

23515 A Review of Methods for Handling Missing Data in the Form of Dropouts in Longitudinal Clinical Trials

Authors: A. Satty, H. Mwambi

Abstract:

Much research based on clinical trial data is characterized by the unavoidable problem of dropout as a result of missing or erroneous values. This paper reviews several techniques for addressing dropout in longitudinal clinical trials. The fundamental concepts of the patterns and mechanisms of dropout are discussed. The study presents five general techniques for handling dropout: (1) deletion methods; (2) imputation-based methods; (3) data augmentation methods; (4) likelihood-based methods; and (5) MNAR-based methods. Under each technique, several commonly used methods are presented, together with a review of the existing literature that examines the effectiveness of these methods in the analysis of incomplete data. Two application examples are presented to study the potential strengths and weaknesses of some of the methods under certain dropout mechanisms, as well as to assess the sensitivity of the modelling assumptions.
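As a rough illustration of the first two families listed in this abstract, the sketch below contrasts complete-case deletion with last observation carried forward (LOCF), a simple single-imputation method chosen here only as a familiar example. The data layout (one list of visit values per subject, with `None` after dropout), the toy values and the function names are assumptions made for illustration; none of them come from the paper.

```python
from typing import List, Optional

Visits = List[Optional[float]]


def complete_cases(subjects: List[Visits]) -> List[Visits]:
    """Deletion: keep only subjects observed at every scheduled visit."""
    return [s for s in subjects if all(v is not None for v in s)]


def locf(visits: Visits) -> Visits:
    """Single imputation: carry the last observed value forward."""
    filled: Visits = []
    last: Optional[float] = None
    for v in visits:
        if v is not None:
            last = v
        filled.append(last)
    return filled


# Hypothetical toy data: three subjects, four scheduled visits each.
subjects = [
    [5.0, 4.5, 4.0, 3.8],
    [6.0, 5.5, None, None],   # drops out after visit 2
    [5.5, None, None, None],  # drops out after visit 1
]

completers = complete_cases(subjects)
imputed = [locf(s) for s in subjects]
```

Here `completers` retains only the first subject, while `imputed` keeps all three but repeats each dropout's last recorded value. Both shortcuts are usually regarded as defensible only under restrictive assumptions about the dropout mechanism, which is the kind of sensitivity the abstract says the application examples assess.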

Keywords: incomplete longitudinal clinical trials, missing at random (MAR), imputation, weighting methods, sensitivity analysis

Procedia PDF Downloads 399
23514 Feedback Preference and Practice of English Majors’ in Pronunciation Instruction

Authors: Claerchille Jhulia Robin

Abstract:

This paper discusses the perspective of ESL learners towards pronunciation instruction. It sought to determine how these learners view the type of feedback their speech teacher gives and its impact on their own classroom practice of providing feedback. This study utilized a quantitative-qualitative approach to the problem. The respondents were Education students majoring in English. A survey questionnaire and interview guide were used for data gathering. The data from the survey was tabulated using frequency count and the data from the interview were then transcribed and analyzed. Results showed that ESL learners favor immediate corrective feedback and they do not find any issue in being corrected in front of their peers. They also practice the same corrective technique in their own classroom.

Keywords: ESL, feedback, learner perspective, pronunciation instruction

Procedia PDF Downloads 215
23513 Automatic Tagging and Accuracy in Assamese Text Data

Authors: Chayanika Hazarika Bordoloi

Abstract:

This paper addresses Assamese, a highly inflectional language. It is one of the national languages of India, yet very little computational research has been done on it. Building a language processing tool for a natural language is not straightforward, as the standards and language representation change at various levels. This paper presents the inflectional suffixes of Assamese verbs and shows how statistical tools, along with linguistic features, can improve tagging accuracy. A conditional random fields (CRF) tool was used to train on and automatically tag the text data; accuracy improved after linguistic features were fed into the training data. Because Assamese is highly inflectional, standardizing its morphology is challenging. Inflectional suffixes are used as a feature of the text data. To analyze the inflections of Assamese word forms, a list was prepared of all possible suffixes that the various categories can take. Assamese words can be classified into inflected classes (noun, pronoun, adjective and verb) and uninflected classes (adverb and particle). The corpus used for this morphological analysis contains a large number of tokens. It is a mixed corpus and has given satisfactory accuracy. The accuracy of the tagger has gradually improved with the modified training data.
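The idea of feeding suffix information to a sequence tagger can be sketched as a per-token feature function of the kind CRF toolkits typically consume. This is a minimal sketch under stated assumptions: the suffix list uses English stand-ins because the paper's Assamese suffix list is not reproduced in the abstract, and the feature names and the `token_features` helper are hypothetical.

```python
from typing import Dict, List, Sequence

# English stand-ins only; the paper's actual Assamese suffix list is not shown here.
SUFFIXES: Sequence[str] = ("ing", "ed", "s")


def longest_suffix(word: str, suffixes: Sequence[str] = SUFFIXES) -> str:
    """Return the longest listed suffix that the word ends with, or ''."""
    matches = [s for s in suffixes if word.endswith(s) and len(word) > len(s)]
    return max(matches, key=len) if matches else ""


def token_features(tokens: List[str], i: int) -> Dict[str, str]:
    """Build a feature dict for token i: the word, its suffix, and its neighbours."""
    word = tokens[i].lower()
    return {
        "word": word,
        "suffix": longest_suffix(word),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


sentence = ["She", "walked", "home"]
features = [token_features(sentence, i) for i in range(len(sentence))]
```

A tagger trained on such dictionaries sees the suffix as an explicit cue rather than having to infer it from the surface form, which is one plausible reading of how the linguistic features improved accuracy.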

Keywords: CRF, morphology, tagging, tagset

Procedia PDF Downloads 180
23512 A Human Activity Recognition System Based on Sensory Data Related to Object Usage

Authors: M. Abdullah, Al-Wadud

Abstract:

Sensor-based activity recognition systems usually take into account which sensors have been activated while an activity is performed. Such a system then combines the conditional probabilities of those sensors to represent different activities and bases its decision on the result. However, information about the sensors that were not activated may also be of great help in deciding which activity has been performed. This paper proposes an approach in which sensory data on both the usage and non-usage of objects are used to classify activities. Experimental results show the promising performance of the proposed method.
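One way to picture the role of non-activated sensors is a Bernoulli-style naive Bayes score, where a silent sensor contributes a factor of 1 - p instead of being ignored. The sketch below is an assumption-laden illustration, not the paper's model: the activities, sensors, probabilities and function names are all invented for the example.

```python
import math
from typing import Dict, Set

# Hypothetical P(sensor fires | activity); values are illustrative only.
SENSOR_PROBS: Dict[str, Dict[str, float]] = {
    "make_tea":    {"cup": 0.80, "kettle": 0.95, "fridge": 0.20},
    "make_cereal": {"cup": 0.60, "kettle": 0.05, "fridge": 0.50},
}
PRIORS: Dict[str, float] = {"make_tea": 0.5, "make_cereal": 0.5}


def log_score(activity: str, fired: Set[str], use_non_usage: bool) -> float:
    """Log-posterior (up to a constant) of an activity given the fired sensors."""
    score = math.log(PRIORS[activity])
    for sensor, p in SENSOR_PROBS[activity].items():
        if sensor in fired:
            score += math.log(p)          # evidence from object usage
        elif use_non_usage:
            score += math.log(1.0 - p)    # evidence from object non-usage
    return score


def classify(fired: Set[str], use_non_usage: bool = True) -> str:
    """Pick the activity with the highest score."""
    return max(PRIORS, key=lambda a: log_score(a, fired, use_non_usage))


usage_only = classify({"cup"}, use_non_usage=False)
with_non_usage = classify({"cup"}, use_non_usage=True)
```

With these made-up numbers, seeing only the cup sensor favours `make_tea` when silent sensors are ignored, but the silent kettle tips the decision to `make_cereal` once non-usage is counted.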

Keywords: Naïve Bayesian-based classification, activity recognition, sensor data, object-usage model

Procedia PDF Downloads 309
23511 Application of Post-Stack and Pre-Stack Seismic Inversion for Prediction of Hydrocarbon Reservoirs in a Persian Gulf Gas Field

Authors: Nastaran Moosavi, Mohammad Mokhtari

Abstract:

Seismic inversion is a technique that has been in use for years; its main goal is to estimate and model the physical characteristics of rocks and fluids. Generally, it combines seismic and well-log data. Seismic inversion can be carried out through different methods; we conducted and compared post-stack and pre-stack seismic inversion on real data from one of the fields in the Persian Gulf. Pre-stack seismic inversion can transform seismic data into rock-physics properties such as P-impedance, S-impedance and density, whereas post-stack seismic inversion can estimate only P-impedance. These parameters can then be used in reservoir identification. Based on the results of inverting the seismic data, a gas reservoir was detected in one of the hydrocarbon oil fields in the south of Iran (Persian Gulf). Comparing the two approaches, it can be concluded that pre-stack seismic inversion provides more reliable and detailed information for the identification and prediction of hydrocarbon reservoirs.

Keywords: density, p-impedance, s-impedance, post-stack seismic inversion, pre-stack seismic inversion

Procedia PDF Downloads 304
23510 A Data-Driven Monitoring Technique Using Combined Anomaly Detectors

Authors: Fouzi Harrou, Ying Sun, Sofiane Khadraoui

Abstract:

Anomaly detection based on Principal Component Analysis (PCA) has been studied intensively and widely applied to multivariate processes with highly cross-correlated process variables. Monitoring metrics such as Hotelling's T² and the Q statistic are usually used in PCA-based monitoring to elucidate the pattern variations in the principal and residual subspaces, respectively. However, these metrics are ill-suited to detecting small faults. In this paper, Exponentially Weighted Moving Average (EWMA) schemes based on the T² and Q statistics, T2-EWMA and Q-EWMA, were developed for detecting faults in the process mean. The performance of the proposed methods was compared with that of the conventional PCA-based fault detection method using synthetic data. The results clearly show the benefit and effectiveness of the proposed methods over the conventional PCA method, especially for detecting small faults in highly correlated multivariate data.
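The reason an EWMA helps with small faults can be sketched on a single monitoring statistic: the recursion z_t = λ·s_t + (1 − λ)·z_{t−1} accumulates a small persistent shift that never crosses a per-sample limit. The code below is a minimal sketch that assumes the statistic (for example a Q value per sample) has already been computed; the stream, the in-control mean and standard deviation, and λ are invented for illustration, and the limit uses the standard asymptotic EWMA control-chart formula rather than anything stated in the abstract.

```python
import math
from typing import List, Sequence


def ewma(stats: Sequence[float], lam: float, start: float) -> List[float]:
    """Apply z_t = lam * s_t + (1 - lam) * z_{t-1}, with z_0 = start."""
    z = start
    smoothed = []
    for s in stats:
        z = lam * s + (1.0 - lam) * z
        smoothed.append(z)
    return smoothed


def ewma_upper_limit(mu0: float, sigma0: float, lam: float, width: float = 3.0) -> float:
    """Asymptotic upper control limit of an EWMA chart."""
    return mu0 + width * sigma0 * math.sqrt(lam / (2.0 - lam))


# Hypothetical statistic stream: a small upward shift begins at index 4.
q_stream = [1.0, 1.1, 0.9, 1.0, 1.6, 1.7, 1.6, 1.8, 1.7, 1.6]
mu0, sigma0, lam = 1.0, 0.3, 0.2   # assumed in-control mean/std and smoothing weight

raw_limit = mu0 + 3.0 * sigma0                 # per-sample 3-sigma limit
raw_alarms = [i for i, s in enumerate(q_stream) if s > raw_limit]

smoothed = ewma(q_stream, lam, start=mu0)
limit = ewma_upper_limit(mu0, sigma0, lam)
ewma_alarms = [i for i, z in enumerate(smoothed) if z > limit]
```

In this toy stream no raw value exceeds the per-sample limit, while the smoothed series crosses its tighter limit a few samples after the shift starts.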

Keywords: data-driven method, process control, anomaly detection, dimensionality reduction

Procedia PDF Downloads 279
23509 Leveraging Power BI for Advanced Geotechnical Data Analysis and Visualization in Mining Projects

Authors: Elaheh Talebi, Fariba Yavari, Lucy Philip, Lesley Town

Abstract:

The mining industry generates vast amounts of data, necessitating robust data management systems and advanced analytics tools to achieve better decision-making processes in the development of mining production and maintaining safety. This paper highlights the advantages of Power BI, a powerful intelligence tool, over traditional Excel-based approaches for effectively managing and harnessing mining data. Power BI enables professionals to connect and integrate multiple data sources, ensuring real-time access to up-to-date information. Its interactive visualizations and dashboards offer an intuitive interface for exploring and analyzing geotechnical data. Advanced analytics is a collection of data analysis techniques to improve decision-making. Leveraging some of the most complex techniques in data science, advanced analytics is used to do everything from detecting data errors and ensuring data accuracy to directing the development of future project phases. However, while Power BI is a robust tool, specific visualizations required by geotechnical engineers may have limitations. This paper studies the capability to use Python or R programming within the Power BI dashboard to enable advanced analytics, additional functionalities, and customized visualizations. This dashboard provides comprehensive tools for analyzing and visualizing key geotechnical data metrics, including spatial representation on maps, field and lab test results, and subsurface rock and soil characteristics. Advanced visualizations like borehole logs and Stereonet were implemented using Python programming within the Power BI dashboard, enhancing the understanding and communication of geotechnical information. Moreover, the dashboard's flexibility allows for the incorporation of additional data and visualizations based on the project scope and available data, such as pit design, rock fall analyses, rock mass characterization, and drone data. 
This further enhances the dashboard's usefulness in future projects, including operation, development, closure, and rehabilitation phases. Additionally, this helps in minimizing the necessity of utilizing multiple software programs in projects. This geotechnical dashboard in Power BI serves as a user-friendly solution for analyzing, visualizing, and communicating both new and historical geotechnical data, aiding in informed decision-making and efficient project management throughout various project stages. Its ability to generate dynamic reports and share them with clients in a collaborative manner further enhances decision-making processes and facilitates effective communication within geotechnical projects in the mining industry.

Keywords: geotechnical data analysis, power BI, visualization, decision-making, mining industry

Procedia PDF Downloads 73
23508 Mangroves in the Douala Area, Cameroon: The Challenges of Open Access Resources for Forest Governance

Authors: Bissonnette Jean-François, Dossa Fabrice

Abstract:

The project focuses on analyzing the spatial and temporal evolution of mangrove forest ecosystems near the city of Douala, Cameroon, in response to increasing human and environmental pressures. The selected study area, located in the Wouri River estuary, has a unique combination of economic importance, and ecological prominence. The study included valuable insights by conducting semi-structured interviews with resource operators and local officials. The thorough analysis of socio-economic data, farmer surveys, and satellite-derived information was carried out utilizing quantitative approaches in Excel and SPSS. Simultaneously, qualitative data was subjected to rigorous classification and correlation with other sources. The use of ArcGIS and CorelDraw facilitated the visual representation of the gradual changes seen in various land cover classifications. The research reveals complex processes that characterize mangrove ecosystems on Manoka and Cape Cameroon Islands. The lack of regulations in urbanization and the continuous growth of infrastructure have led to a significant increase in land conversion, causing negative impacts on natural landscapes and forests. The repeated instances of flooding and coastal erosion have further shaped landscape alterations, fostering the proliferation of water and mudflat areas. The unregulated use of mangrove resources is a significant factor in the degradation of these ecosystems. Activities including the use of wood for smoking and fishing, together with the coastal pollution resulting from the absence of waste collection, have had a significant influence. In addition, forest operators contribute to the degradation of vegetation, hence exacerbating the harmful impact of invasive species on the ecosystem. Strategic interventions are necessary to guarantee the sustainable management of these ecosystems. 
The proposals include advocating for sustainable wood exploitation using appropriate techniques, along with regeneration, and enforcing rules to prevent wood overexploitation. By implementing these measures, the ecological balance can be preserved, safeguarding the long-term viability of these precious ecosystems. On a conceptual level, this paper uses the framework developed by Elinor Ostrom and her colleagues to investigate the consequences of open access resources, where local actors have not been able to enforce measures to prevent overexploitation of mangrove wood resources. Governmental authorities have demonstrated limited capacity to enforce sustainable management of wood resources and have not been able to establish effective relationships with local fishing communities and with communities involved in the purchase of wood. As a result, wood resources in the mangrove areas remain largely accessible, while authorities monitor neither the wood volumes extracted nor the methods of exploitation. There have been only limited and sporadic attempts at forest restoration, with no significant effect on mangrove forest dynamics.

Keywords: Mangroves, forest management, governance, open access resources, Cameroon

Procedia PDF Downloads 40
23507 An Investigation of E-Government by Using GIS and Establishing E-Government in Developing Countries Case Study: Iraq

Authors: Ahmed M. Jamel

Abstract:

Electronic government initiatives, and public participation in them, are among today's indicators of a country's development. After two consecutive wars, Iraq's current position in, for example, the UN's e-government ranking is quite concerning and has not improved in recent years. This work is motivated by the fact that most e-government projects need to handle geographic data on public facilities and resources. Geographical information systems (GIS) provide the most common tools not only for managing spatial data but also for integrating such data with the non-spatial attributes of features. Against this background, this paper proposes that establishing a working GIS in the health sector of Iraq would improve e-government applications. The hospital locations in Erbil are investigated as the case study.

Keywords: e-government, GIS, Iraq, Erbil

Procedia PDF Downloads 374
23506 Evaluation of Classification Algorithms for Diagnosis of Asthma in Iranian Patients

Authors: Taha SamadSoltani, Peyman Rezaei Hachesu, Marjan GhaziSaeedi, Maryam Zolnoori

Abstract:

Introduction: Data mining is defined as a process of finding patterns and relationships in the data in a database in order to build predictive models. The application of data mining has extended to many sectors, including healthcare services. Medical data mining aims to solve real-world problems in the diagnosis and treatment of diseases. It applies various techniques and algorithms that differ in accuracy and precision. The purpose of this study was to apply knowledge discovery and data mining techniques to the diagnosis of asthma based on patient symptoms and history. Method: Data mining comprises several steps, with decisions to be made by the user. It starts with developing an understanding of the scope and the application of previous knowledge in the area and identifying the knowledge discovery (KD) process from the stakeholders' point of view, and it finishes with acting on the discovered knowledge by applying it, integrating it with other systems, and documenting and reporting it. In this study, a stepwise methodology was followed to achieve a logical outcome. Results: The sensitivity, specificity and accuracy of the KNN, SVM, Naïve Bayes, NN, classification tree and CN2 algorithms, and of related similar studies, were evaluated, and ROC curves were plotted to show the performance of the system. Conclusion: The results show that asthma can be diagnosed with an accuracy of approximately ninety percent based on demographic and clinical data. The study also showed that methods based on pattern discovery and data mining have a higher sensitivity than expert and knowledge-based systems. On the other hand, medical guidelines and evidence-based medicine should be the basis of diagnostic methods; it is therefore recommended that machine learning algorithms be used in combination with knowledge-based algorithms.
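For readers unfamiliar with the three reported measures, the sketch below computes sensitivity, specificity and accuracy from binary labels. The labels and predictions are invented toy values (1 = asthma, 0 = no asthma), and `diagnostic_metrics` is a hypothetical helper, not part of the study.

```python
from typing import Dict, Sequence


def diagnostic_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute sensitivity, specificity and accuracy for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),   # share of true cases that were detected
        "specificity": tn / (tn + fp),   # share of non-cases correctly cleared
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical toy labels: 4 patients with asthma, 6 without.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = diagnostic_metrics(y_true, y_pred)
```

The helper assumes both classes occur in `y_true`; otherwise one of the denominators would be zero.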

Keywords: asthma, datamining, classification, machine learning

Procedia PDF Downloads 433
23505 Application of GPRS in Water Quality Monitoring System

Authors: V. Ayishwarya Bharathi, S. M. Hasker, J. Indhu, M. Mohamed Azarudeen, G. Gowthami, R. Vinoth Rajan, N. Vijayarangan

Abstract:

Identifying water quality conditions in a river system from limited observations is an essential task for meeting the goals of environmental management. The traditional method of water quality testing is to collect samples manually and send them to a laboratory for analysis. However, this method can no longer meet the demands of water quality monitoring today. An automatic water quality measurement and reporting system has therefore been developed. In this project, water quality parameters collected by a multi-parameter water quality probe are transmitted to a data processing and monitoring center through the GPRS wireless mobile communication network. The multi-parameter sensor is placed directly above the water level. The monitoring center consists of GPRS and a micro-controller, which monitor the data. The collected data can be monitored at any time. At the pollution control board, the water quality sensor data are monitored on a computer using Visual Basic software. The system collects, transmits and processes water quality parameters automatically, so production efficiency and economic benefit are greatly improved. GPRS technology can perform well in complex environments where poor water quality would otherwise go unmonitored, and it is particularly applicable to automatic data transmission from collection points and from field water analysis equipment.

Keywords: multiparameter sensor, GPRS, visual basic software, RS232

Procedia PDF Downloads 387
23504 Decision Support System in Air Pollution Using Data Mining

Authors: E. Fathallahi Aghdam, V. Hosseini

Abstract:

Environmental pollution is not limited to a specific region or country; that is why sustainable development, as a necessary process for improvement, pays attention to issues such as the destruction of natural resources, the degradation of biological systems, global pollution, and climate change around the world, especially in developing countries. According to the World Health Organization, Tehran (the capital of Iran), a developing city, is one of the most polluted cities in the world in terms of air pollution. In this study, three pollutants, namely particulate matter smaller than 10 microns, nitrogen oxides, and sulfur dioxide, were evaluated in Tehran using data mining techniques and the Crisp approach. Data from 21 air pollution measuring stations in different areas of Tehran were collected from 1999 to 2013. The commercial software Clementine was selected for this study. Using the software, Tehran was divided into distinct clusters in terms of the mentioned pollutants. As a data mining technique, clustering is usually used as a prologue to other analyses; therefore, the similarity of the clusters was evaluated in this study by analyzing local conditions, traffic behavior, and industrial activities. The results of this research can support decision-making systems, help managers improve performance and decision making, and assist in urban studies.

Keywords: data mining, clustering, air pollution, crisp approach

Procedia PDF Downloads 416
23503 Test Suite Optimization Using an Effective Meta-Heuristic BAT Algorithm

Authors: Anuradha Chug, Sunali Gandhi

Abstract:

Regression testing is a very expensive and time-consuming process carried out to ensure the validity of modified software. Because resources are insufficient to re-execute all test cases in a time-constrained environment, efforts are under way to generate test data automatically, without human effort. Many search-based techniques have been proposed to generate efficient, effective and optimized test data so that the overall cost of software testing can be minimized. The generated test data should be able to uncover all potential lapses that exist in the software or product. Inspired by the natural behavior of bats searching for food sources, the current study employed a meta-heuristic, search-based bat algorithm to optimize the test data on the basis of certain parameters without compromising their effectiveness. Mathematical functions that can effectively filter out redundant test data are also applied. As many as 50 Java programs were used to check the effectiveness of the proposed test data generation, and it was found that an 86% saving in testing effort can be achieved using the bat algorithm while covering 100% of the software code for testing. The bat algorithm was found to be more efficient in terms of simplicity and flexibility when the results were compared with other nature-inspired algorithms such as the Firefly Algorithm (FA), the Hill Climbing Algorithm (HC) and Ant Colony Optimization (ACO). The output of this study would be useful to testers, as they can achieve 100% path coverage with a minimum number of test cases.

Keywords: regression testing, test case selection, test case prioritization, genetic algorithm, bat algorithm

Procedia PDF Downloads 356
23502 The System of Uniform Criteria for the Characterization and Evaluation of Elements of Economic Structure: The Territory, Infrastructure, Processes, Technological Chains, the End Products

Authors: Aleksandr A. Gajour, Vladimir G. Merzlikin, Vladimir I. Veselov

Abstract:

This paper analyzes the characteristics of industrial and lifestyle facilities as heat-energy objects that form part of the thermal envelope of the Earth's surface, for inclusion in any database for economic forecasting. An idealized model of the Earth's surface is discussed. This model makes it possible to obtain the energy equivalent for each element of terrain and of the world ocean. An energy efficiency criterion for comfortable human existence is introduced. The dynamics of this criterion make it possible to simulate potential technogenic catastrophes arising from the spontaneous industrial development of certain areas of the Earth. A calculated model is given, with the confirmed forecast of the Gulf Stream freezing in the polar regions in 2011 due to a disturbance of the heat-energy balance in the oil-polluted oceanic subsurface layer. Two opposing trends of human development, under limited and unlimited heat-energy resources, are analyzed.

Keywords: Earth's surface, heat-energy consumption, energy criteria, technogenic catastrophes

Procedia PDF Downloads 387
23501 Modified InVEST for Whatsapp Messages Forensic Triage and Search through Visualization

Authors: Agria Rhamdhan

Abstract:

WhatsApp, the most popular mobile messaging app, has been used as evidence in many criminal cases. Because the use of mobile messaging generates large amounts of data, forensic investigation faces the challenge of handling large data sets. The hardest part of finding important evidence is that current practice relies on tools and techniques that require manual analysis to check all messages. Analyzing large sets of mobile messaging data in this way takes a lot of time and effort. Our work offers a methodology based on forensic triage that reduces large data to manageable sets, making detailed review easier, and then presents the results through interactive visualization that shows important terms, entities and relationships through intelligent ranking using Term Frequency-Inverse Document Frequency (TF-IDF) and the Latent Dirichlet Allocation (LDA) model. By implementing this methodology, investigators can improve both investigation processing time and the accuracy of the results.
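The TF-IDF part of the ranking can be sketched in a few lines: a term scores highly in a message when it is frequent there but rare across the whole set. This is a minimal sketch using one common TF-IDF variant (relative term frequency times log(N/df)); the messages are invented, whitespace tokenization is a simplification, and the paper may use a different weighting.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Score each term per document as (count / doc length) * log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical messages, already lower-cased; tokenized by whitespace for simplicity.
messages = [
    "ok see you later",
    "ok bring the package later",
    "ok the package is ready",
]
scores = tfidf([m.split() for m in messages])

# Terms of the first message, most distinctive first.
ranked_first = sorted(scores[0], key=scores[0].get, reverse=True)
```

A word such as "ok" that appears in every message gets a score of zero, so it drops to the bottom of the ranking, while terms unique to one message rise to the top. The function assumes every message contains at least one token.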

Keywords: forensics, triage, visualization, WhatsApp

Procedia PDF Downloads 155
23500 Low Cost Webcam Camera and GNSS Integration for Updating Home Data Using AI Principles

Authors: Mohkammad Nur Cahyadi, Hepi Hapsari Handayani, Agus Budi Raharjo, Ronny Mardianto, Daud Wahyu Imani, Arizal Bawazir, Luki Adi Triawan

Abstract:

PDAM (local water company) determines customer charges by considering the customer's building or house. Charge determination significantly affects PDAM income and customer costs because the PDAM applies a subsidy policy for customers classified as small households. Periodic updates are needed so that pricing is in line with the target. A thorough customer survey in Surabaya is needed to update customer building data. However, surveys so far have been carried out by deploying officers to survey each PDAM customer one by one. Surveys with this method require a lot of effort and cost. For this reason, this research offers a technology called mobile mapping, a mapping method that is more efficient in terms of time and cost. The tool is also quite simple to use: the device is installed on a car so that it can record the surrounding buildings while the car is moving. Mobile mapping technology generally uses lidar sensors equipped with GNSS, but this technology is costly. To overcome this problem, this research develops low-cost mobile mapping technology using a webcam camera sensor added to the GNSS and IMU sensors. The camera used has specifications of 3MP with a resolution of 720 and a diagonal field of view of 78°. The principle of this invention is to integrate four camera sensors, a GNSS webcam, and GPS to acquire photo data, which is equipped with location data (latitude, longitude) and IMU (roll, pitch, yaw). This device is also equipped with a tripod and a vacuum cleaner to attach to the car's roof so it doesn't fall off while running. The output data from this technology will be analyzed with artificial intelligence to reduce similar data (Cosine Similarity) and then classify building types. Data reduction is used to eliminate similar data and maintain the image that displays the complete house so that it can be processed for later classification of buildings. 
The AI method used is transfer learning by utilizing a trained model named VGG-16. From the analysis of similarity data, it was found that the data reduction reached 50%. Then georeferencing is done using the Google Maps API to get address information according to the coordinates in the data. After that, geographic join is done to link survey data with customer data already owned by PDAM Surya Sembada Surabaya.
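The similarity-based data reduction described in this abstract can be sketched as a pass over per-image feature vectors that drops any frame too similar to the last one kept. Everything specific here is an assumption for illustration: the three-dimensional vectors stand in for features that would come from a model such as VGG-16, the 0.95 threshold is arbitrary, and the rule keeps the first frame of each near-duplicate run rather than selecting the image that shows the complete house.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def drop_near_duplicates(features: List[Sequence[float]], threshold: float = 0.95) -> List[int]:
    """Return indices of frames kept: a frame is kept if it differs enough from the last kept one."""
    kept: List[int] = []
    for i, vec in enumerate(features):
        if not kept or cosine_similarity(features[kept[-1]], vec) < threshold:
            kept.append(i)
    return kept


# Hypothetical feature vectors for five consecutive frames.
frames = [
    [1.0, 0.0, 0.0],
    [0.99, 0.05, 0.0],   # nearly the same view as frame 0
    [0.0, 1.0, 0.0],
    [0.0, 0.98, 0.1],    # nearly the same view as frame 2
    [0.0, 0.0, 1.0],
]
kept_indices = drop_near_duplicates(frames)
```

On this toy input two of the five frames are discarded as near-duplicates, leaving one representative per distinct view for the later classification step.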

Keywords: mobile mapping, GNSS, IMU, similarity, classification

Procedia PDF Downloads 67
23499 An Investigation into the Views of Distant Science Education Students Regarding Teaching Laboratory Work Online

Authors: Abraham Motlhabane

Abstract:

This research analysed the written views of science education students regarding the teaching of laboratory work in online mode. The research adopted a qualitative methodology aimed at investigating small and distinct groups, normally regarded as a single-site study. Qualitative research was used to describe and analyze the phenomena from the students' perspective. This means the research began with assumptions about the world view that use theoretical lenses on research problems, inquiring into the meaning held by individual students. The research was conducted with three groups of students studying for the Postgraduate Certificate in Education, the Bachelor of Education and the honours Bachelor of Education, respectively. In each of the study programmes, the science education module is compulsory. Five science education students from each study programme were purposively selected to participate in this research; therefore, 15 students participated. To analyse the data, the data were first printed, and hard copies were used in the analysis. The data were read several times, and key concepts and ideas were highlighted. Themes and patterns were identified to describe the data. Coding was used as a process of organising and sorting the data. The findings of the study are very diverse; some students are in favour of online laboratory work, whereas others argue that science can only be learnt through hands-on experimentation.

Keywords: online learning, laboratory work, views, perceptions

Procedia PDF Downloads 125
23498 Green Walls and Living Facades: The Portuguese Experience

Authors: Andreia Cortes, Carla Pimentel-Rodrigues, Joao Almeida, Myriam Kanoun-Boule, Carla Carvalho, Antonio Tadeu, Armando Silva-Afonso

Abstract:

The adoption of green infrastructure is nowadays encouraged as an essential measure of urban planning and territorial development whenever it offers a better alternative, or is complementary, to current solutions. Green walls and living facades often provide healthy alternatives to traditional grey infrastructures, offering many benefits for both citizens and cities. Beyond the ability to improve environmental conditions and quality of life, they can augment the energy efficiency of buildings, enhance biodiversity and deliver a range of ecosystem services such as water purification, reduction of the urban heat island effect, improvement of air quality and climate change adaptation. For this communication, a systematic survey of the existing green walls and living facades in Portugal was carried out. Different systems were analyzed and compared in terms of dimensions, constructive solutions, vegetative species, maintenance necessities and environmental aspects.

Keywords: green buildings, green walls, living facades, sustainability construction

Procedia PDF Downloads 408
23497 The Communication Library DIALOG for iFDAQ of the COMPASS Experiment

Authors: Y. Bai, M. Bodlak, V. Frolov, S. Huber, V. Jary, I. Konorov, D. Levit, J. Novy, D. Steffen, O. Subrt, M. Virius

Abstract:

Modern experiments in high energy physics impose great demands on the reliability, the efficiency, and the data rate of Data Acquisition Systems (DAQ). This contribution focuses on the development and deployment of the new communication library DIALOG for the intelligent, FPGA-based Data Acquisition System (iFDAQ) of the COMPASS experiment at CERN. The iFDAQ, which utilizes a hardware event builder, is designed to be able to read out data at the maximum rate of the experiment. The DIALOG library is a communication system for both distributed and mixed environments; it provides a network-transparent inter-process communication layer. Using the high-performance, modern C++ framework Qt and its Qt Network API, the DIALOG library presents an alternative to the previously used DIM library. The DIALOG library was fully incorporated into all processes in the iFDAQ during the 2016 run. From the software point of view, it may be considered a significant improvement of the iFDAQ in comparison with the previous run. To extend the possibilities for debugging, online monitoring of communication among processes via the DIALOG GUI is a desirable feature. In the paper, we present the DIALOG library from several perspectives and discuss it in detail. Moreover, an efficiency measurement and a comparison with the DIM library with respect to the iFDAQ requirements are provided.

Keywords: data acquisition system, DIALOG library, DIM library, FPGA, Qt framework, TCP/IP

Procedia PDF Downloads 304
23496 Mining Scientific Literature to Discover Potential Research Data Sources: An Exploratory Study in the Field of Haemato-Oncology

Authors: A. Anastasiou, K. S. Tingay

Abstract:

Background: Discovering suitable datasets is an important part of health research, particularly for projects working with clinical data from patients organized in cohorts (cohort data), but with the proliferation of so many national and international initiatives, it is becoming increasingly difficult for research teams to locate real world datasets that are most relevant to their project objectives. We present a method for identifying healthcare institutes in the European Union (EU) which may hold haemato-oncology (HO) data. A key enabler of this research was the bibInsight platform, a scientometric data management and analysis system developed by the authors at Swansea University. Method: A PubMed search was conducted using HO clinical terms taken from previous work. The resulting XML file was processed using the bibInsight platform, linking affiliations to the Global Research Identifier Database (GRID). GRID is an international, standardized list of institutions, including the city and country in which the institution exists, as well as a category of the main business type, e.g., Academic, Healthcare, Government, Company. Countries were limited to the 28 current EU members, and institute type to 'Healthcare'. An article was considered valid if at least one author was affiliated with an EU-based healthcare institute. Results: The PubMed search produced 21,310 articles, consisting of 9,885 distinct affiliations with correspondence in GRID. Of these articles, 760 were from EU countries, and 390 of these were healthcare institutes. One affiliation was excluded as being a veterinary hospital. Two EU countries did not have any publications in our analysis dataset. The results were analysed by country and by individual healthcare institute. Networks both within the EU and internationally show institutional collaborations, which may suggest a willingness to share data for research purposes. Geographical mapping can ensure that data has broad population coverage. Collaborations with industry or government may exclude healthcare institutes that may have embargos or additional costs associated with data access. Conclusions: Data reuse is becoming increasingly important both for ensuring the validity of results, and economy of available resources. The ability to identify potential, specific data sources from over twenty thousand articles in less than an hour could assist in improving knowledge of, and access to, data sources. As our method has not yet specified if these healthcare institutes are holding data, or merely publishing on that topic, future work will involve text mining of data-specific concordant terms to identify numbers of participants, demographics, study methodologies, and sub-topics of interest.
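
The validity rule in the Method section (an article counts if at least one author is affiliated with an EU-based healthcare institute) can be illustrated with a minimal Python sketch. The `Affiliation` record, the function names, and the use of two-letter country codes are assumptions made for illustration; they are not the bibInsight or GRID data model.

```python
from dataclasses import dataclass

# Assumption: the 28 EU member states at the time of the study,
# written as ISO 3166-1 alpha-2 codes.
EU28 = {
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR",
    "DE", "GR", "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL",
    "PL", "PT", "RO", "SK", "SI", "ES", "SE", "GB",
}


@dataclass(frozen=True)
class Affiliation:
    """Hypothetical record for an affiliation already linked to an institution list."""
    name: str
    country: str   # two-letter country code of the institution
    category: str  # main business type, e.g. "Academic", "Healthcare"


def is_eu_healthcare(aff):
    """True when the institution is in an EU country and typed 'Healthcare'."""
    return aff.country in EU28 and aff.category == "Healthcare"


def is_valid_article(affiliations):
    """An article is valid if at least one author affiliation qualifies."""
    return any(is_eu_healthcare(a) for a in affiliations)


def articles_per_institute(articles):
    """Count valid articles per qualifying institute.

    `articles` maps an article identifier to its list of affiliations.
    Each institute is counted once per article, even if several authors share it.
    """
    counts = {}
    for affiliations in articles.values():
        for inst in {a for a in affiliations if is_eu_healthcare(a)}:
            counts[inst.name] = counts.get(inst.name, 0) + 1
    return counts
```

Grouping the resulting counts by `country` instead of `name` would give the per-country view mentioned in the Results.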

Keywords: data reuse, data discovery, data linkage, journal articles, text mining

Procedia PDF Downloads 103
23495 Using Data Mining Technique for Scholarship Disbursement

Authors: J. K. Alhassan, S. A. Lawal

Abstract:

This work is on decision tree-based classification for the disbursement of scholarships. A tree-based data mining classification technique is used in order to determine the generic rule to be used to disburse the scholarship. Based on the rules defined from the tree, the system is able to determine the class (status) to which an applicant belongs: Granted or Not Granted. Applicants who fall into the Granted class have successfully acquired a scholarship, while those in the Not Granted class are unsuccessful in the scheme. An algorithm that can be used to classify the applicants based on the rules from the tree-based classification was also developed. Tree-based classification is adopted because of its efficiency, effectiveness, and easy-to-comprehend features. The system was tested with data from the National Information Technology Development Agency (NITDA), Abuja, a parastatal of the Federal Ministry of Communication Technology that is mandated to develop and regulate information technology in Nigeria. The system was found to work according to the specification. It is therefore recommended for all scholarship disbursement organizations.

Keywords: classification, data mining, decision tree, scholarship

Procedia PDF Downloads 353
23494 [Keynote Speech]: Feature Selection and Predictive Modeling of Housing Data Using Random Forest

Authors: Bharatendra Rai

Abstract:

Predictive data analysis and modeling involving machine learning techniques become challenging in the presence of too many explanatory variables or features. The presence of too many features in machine learning is known not only to slow algorithms down but also to decrease model prediction accuracy. This study involves a housing dataset with 79 quantitative and qualitative features that describe various aspects people consider when buying a new house. The Boruta algorithm, which supports feature selection using a wrapper approach built around random forest, is used in this study. This feature selection process leads to 49 confirmed features, which are then used for developing predictive random forest models. The study also explores five different data partitioning ratios, and their impact on model accuracy is captured using the coefficient of determination (R-squared) and root mean square error (RMSE).
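
The two accuracy measures named above can be written out as a short Python sketch. It uses the common definitions RMSE = sqrt(mean((y - ŷ)²)) and R² = 1 - SS_res / SS_tot; whether the study computed R-squared this way or as a squared correlation is not stated in the abstract, so this choice is an assumption.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot
```

Evaluating both functions on the held-out part of each train/test split gives one pair of scores per partitioning ratio, which is the kind of comparison the abstract describes.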

Keywords: housing data, feature selection, random forest, Boruta algorithm, root mean square error

Procedia PDF Downloads 305
23493 Image-Based (RGB) Technique for Estimating Phosphorus Levels of Different Crops

Authors: M. M. Ali, Ahmed Al-Ani, Derek Eamus, Daniel K. Y. Tan

Abstract:

In this glasshouse study, we developed a new image-based non-destructive technique for detecting the leaf P status of different crops such as cotton, tomato and lettuce. Plants were allowed to grow on nutrient media containing different P concentrations, i.e. 0%, 50% and 100% of the recommended P concentration (P0 = no P, L; P1 = 2.5 mL 10 L⁻¹ of P and P2 = 5 mL 10 L⁻¹ of P as NaH2PO4). After 10 weeks of growth, plants were harvested and data on leaf P contents were collected using the standard destructive laboratory method; at the same time, leaf images were collected with a handheld crop image sensor. We calculated the leaf area, leaf perimeter and RGB (red, green and blue) values of these images. These data were then used in a linear discriminant analysis (LDA) to estimate leaf P contents, which successfully classified the plants on the basis of leaf P contents. The data indicated that P deficiency in crop plants can be predicted using the image and morphological data. Our proposed non-destructive imaging method is precise in estimating the P requirements of different crop species.
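
As a rough illustration of the image features mentioned above, the following Python sketch computes a leaf's area (in pixels) and mean RGB values from an image and a leaf mask. The input layout (nested lists of `(r, g, b)` tuples and booleans) and the function name are assumptions for illustration; the abstract does not describe how the sensor images were segmented or processed.

```python
def leaf_features(image, mask):
    """Return (area_in_pixels, (mean_r, mean_g, mean_b)) for the masked leaf.

    image: rows of (r, g, b) tuples.
    mask:  rows of booleans with the same shape; True marks a leaf pixel.
    """
    leaf_pixels = [
        pixel
        for image_row, mask_row in zip(image, mask)
        for pixel, is_leaf in zip(image_row, mask_row)
        if is_leaf
    ]
    if not leaf_pixels:
        raise ValueError("mask selects no leaf pixels")
    area = len(leaf_pixels)
    # Average each colour channel over the leaf pixels only.
    mean_rgb = tuple(sum(p[c] for p in leaf_pixels) / area for c in range(3))
    return area, mean_rgb
```

Features of this kind, one row per leaf, are what a classifier such as LDA would take as input.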

Keywords: image-based techniques, leaf area, leaf P contents, linear discriminant analysis

Procedia PDF Downloads 361
23492 Improved Throttled Load Balancing Approach for Cloud Environment

Authors: Sushant Singh, Anurag Jain, Seema Sabharwal

Abstract:

Cloud computing is advancing rapidly and has already been adopted by a huge set of users. Its ease of use and access from anywhere have made it more attractive than other technologies. This has reduced deployment costs on the user side. It has also allowed big companies to sell their infrastructure to recover the installation cost for the organization. The roots of cloud computing extend from grid computing. Along with the inherited characteristics of its predecessor technologies, it has also inherited the loopholes present in those technologies. Some of these loopholes have recently been identified and corrected, but others are yet to be rectified. Two major areas where scope for improvement still exists are security and performance. The proposed work is devoted to performance enhancement for the users of existing cloud systems by improving the basic throttled mapping approach between tasks and resources. The improved procedure has been tested using the Cloud Analyst simulator. The results are compared with those of the original approach, and the proposed work was found to be one step ahead of existing techniques.
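
For orientation, the basic throttled approach that the work builds on is commonly described as an index table of virtual machines marked available or busy, where each request goes to the first available machine. The Python sketch below illustrates that baseline only; the class and method names are hypothetical, and the improved procedure proposed by the authors is not detailed in the abstract and is not reproduced here.

```python
class ThrottledBalancer:
    """Baseline throttled load balancer: one task per VM at a time."""

    def __init__(self, vm_ids):
        # Index table: VM id -> True when the VM is available.
        self.available = {vm_id: True for vm_id in vm_ids}

    def allocate(self):
        """Return the first available VM and mark it busy, or None if all are busy."""
        for vm_id, is_free in self.available.items():
            if is_free:
                self.available[vm_id] = False
                return vm_id
        return None  # caller is expected to queue the request and retry

    def release(self, vm_id):
        """Mark a VM as available again once its task has finished."""
        if vm_id not in self.available:
            raise KeyError(f"unknown VM: {vm_id}")
        self.available[vm_id] = True
```

Because the table is always scanned from the start, low-index machines tend to receive most of the work, which is one reason variants of this scheme are often proposed.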

Keywords: cloud analyst, cloud computing, load balancing, throttled

Procedia PDF Downloads 235
23491 Design of Visual Repository, Constraint and Process Modeling Tool Based on Eclipse Plug-Ins

Authors: Rushiraj Heshi, Smriti Bhandari

Abstract:

Master Data Management requires the creation of a central repository, the application of constraints on the repository, and the design of processes to manage data. Designing the repository, the constraints on it, and the business processes is a very tedious and time-consuming task for a large enterprise. Hence, visual repository, constraint and process (workflow) modeling is the most critical step in Master Data Management. In this paper, we realize a visual modeling tool for implementing repositories, constraints and processes, based on Eclipse plug-ins using GMF/EMF, which follows the principles of Model Driven Engineering (MDE).

Keywords: EMF, GMF, GEF, repository, constraint, process

Procedia PDF Downloads 478
23490 Modeling the Human Harbor: An Equity Project in New York City, New York USA

Authors: Lauren B. Birney

Abstract:

The envisioned long-term outcome of this three-year research and implementation plan is for 1) teachers and students to design and build their own computational models of real-world environmental-human health phenomena occurring within the context of the “Human Harbor” and 2) project researchers to evaluate the degree to which these integrated Computer Science (CS) education experiences in New York City (NYC) public school classrooms (PreK-12) impact students’ computational-technical skill development, job readiness, career motivations, and measurable abilities to understand, articulate, and solve the underlying phenomena at the center of their models. This effort builds on the partnership’s successes over the past eight years in developing a benchmark Model of restoration-based Science, Technology, Engineering, and Math (STEM) education for urban public schools and achieving relatively broad-based implementation in the nation’s largest public school system. The Billion Oyster Project Curriculum and Community Enterprise for Restoration Science (BOP-CCERS STEM + Computing) curriculum, teacher professional developments, and community engagement programs have reached more than 200 educators and 11,000 students at 124 schools, with 84 waterfront locations and Out of School Time (OST) programs. The BOP-CCERS Partnership is poised to develop a more refined focus on integrating computer science across the STEM domains; teaching industry-aligned computational methods and tools; and explicitly preparing students from the city’s most under-resourced and underrepresented communities for upwardly mobile careers in NYC’s ever-expanding “digital economy,” in which jobs require computational thinking and an increasing percentage require discrete computer science technical skills. Project Objectives include the following: 1. Computational Thinking (CT) Integration: Integrate computational thinking core practices across existing middle/high school BOP-CCERS STEM curriculum as a means of scaffolding toward long term computer science and computational modeling outcomes. 2. Data Science and Data Analytics: Enabling researchers to perform interviews with teachers, students, community members, partners, stakeholders, and Science, Technology, Engineering, and Mathematics (STEM) industry professionals. Collaborative analysis and data collection were also performed. As a centerpiece, the BOP-CCERS partnership will expand to include a dedicated computer science education partner. New York City Department of Education (NYCDOE), Computer Science for All (CS4ALL) NYC will serve as the dedicated Computer Science (CS) lead, advising the consortium on integration and curriculum development, working in tandem. The BOP-CCERS Model™ also validates that with appropriate application of technical infrastructure, intensive teacher professional developments, and curricular scaffolding, socially connected science learning can be mainstreamed in the nation’s largest urban public school system. This is evidenced and substantiated in the initial phases of BOP-CCERS™. The BOP-CCERS™ student curriculum and teacher professional development have been implemented in approximately 24% of NYC public middle schools, reaching more than 250 educators and 11,000 students directly. BOP-CCERS™ is a fully scalable and transferable educational model, adaptable to all American school districts. In all settings of the proposed Phase IV initiative, the primary beneficiary group will be underrepresented NYC public school students who live in high-poverty neighborhoods and are traditionally underrepresented in the STEM fields, including African Americans, Latinos, English language learners, and children from economically disadvantaged households. In particular, BOP-CCERS Phase IV will explicitly prepare underrepresented students for skilled positions within New York City’s expanding digital economy, computer science, computational information systems, and innovative technology sectors.

Keywords: computer science, data science, equity, diversity and inclusion, STEM education

Procedia PDF Downloads 42
23489 Design and Development of Data Visualization in 2D and 3D Space Using Front-End Technologies

Authors: Sourabh Yaduvanshi, Varsha Namdeo, Namrata Yaduvanshi

Abstract:

This study delves into the design and development intricacies of crafting detailed 2D bar charts via d3.js, recognizing its limitations in generating 3D visuals within the DOM. The study combines three.js with d3.js, facilitating a smooth evolution from 2D to immersive 3D representations. This fusion epitomizes the synergy between front-end technologies, expanding horizons in data visualization. Beyond technical expertise, it symbolizes a creative convergence, pushing boundaries in visual representation. The abstract illuminates methodologies, unraveling the intricate integration of this fusion and guiding enthusiasts. It narrates a compelling story of transcending 2D constraints, propelling data visualization into captivating three-dimensional realms, and igniting creativity in front-end visualization endeavors.

Keywords: design, development, front-end technologies, visualization

Procedia PDF Downloads 59
23488 The Classification Performance in Parametric and Nonparametric Discriminant Analysis for a Class-Unbalanced Data of Diabetes Risk Groups

Authors: Lily Ingsrisawang, Tasanee Nacharoen

Abstract:

Introduction: Unbalanced data sets commonly appear in real-world applications. Due to unequal class distribution, many research papers have found that the performance of existing classifiers tends to be biased towards the majority class. The k-nearest neighbors nonparametric discriminant analysis is one method that has been proposed for classifying unbalanced classes with good performance. Hence, the methods of discriminant analysis are of interest to us in investigating misclassification error rates for class-imbalanced data of three diabetes risk groups. Objective: The purpose of this study was to compare the classification performance of parametric and nonparametric discriminant analysis in a three-class classification application of class-imbalanced data of diabetes risk groups. Methods: Data from a health project for 599 staff members in a government hospital in Bangkok were obtained for the classification problem. The staff members were classified into one of three diabetes risk groups: non-risk (90%), risk (5%), and diabetic (5%). The original data, with the variables diabetes risk group, age, gender, cholesterol, and BMI, were analyzed and bootstrapped into 50 and 100 samples, 599 observations per sample, for additional estimation of the misclassification error rate. Each data set was explored for departure from multivariate normality and for equality of the covariance matrices of the three risk groups. Both the original data and the bootstrap samples show non-normality and unequal covariance matrices. The parametric linear discriminant function, the quadratic discriminant function, and the nonparametric k-nearest neighbors discriminant function were applied to the 50 and 100 bootstrap samples and to the original data. In finding the optimal classification rule, the prior probabilities were set to both equal proportions (0.33:0.33:0.33) and unequal proportions, with three choices: (0.90:0.05:0.05), (0.80:0.10:0.10) or (0.70:0.15:0.15). Results: The results from 50 and 100 bootstrap samples indicated that the k-nearest neighbors approach with k = 3 or k = 4 and prior probabilities of {non-risk:risk:diabetic} set to {0.90:0.05:0.05} or {0.80:0.10:0.10} gave the smallest misclassification error rate. Conclusion: The k-nearest neighbors approach is suggested for classifying three-class imbalanced data of diabetes risk groups.
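
To make the role of the prior probabilities concrete, here is a minimal Python sketch of a k-nearest neighbors discriminant rule. It assumes the common formulation in which the posterior for class c is proportional to prior_c × k_c / n_c, where k_c is the number of the k nearest neighbors in class c and n_c is the class size in the training data. The abstract does not state the exact rule or distance used, so the formulation, the Euclidean distance, and all names here are assumptions.

```python
import math
from collections import Counter


def knn_posteriors(train, query, k, priors):
    """Posterior class probabilities for `query`.

    train:  list of (feature_tuple, label) pairs.
    priors: dict mapping each label to its prior probability.
    """
    class_sizes = Counter(label for _, label in train)
    # The k training points closest to the query (Euclidean distance).
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    neighbor_counts = Counter(label for _, label in nearest)
    # Score each class by prior * (share of that class among the neighbors).
    scores = {
        label: priors[label] * neighbor_counts.get(label, 0) / size
        for label, size in class_sizes.items()
    }
    total = sum(scores.values())
    if total == 0:
        raise ValueError("all neighboring classes have zero prior probability")
    return {label: score / total for label, score in scores.items()}


def classify(train, query, k, priors):
    """Assign the label with the largest posterior probability."""
    posteriors = knn_posteriors(train, query, k, priors)
    return max(posteriors, key=posteriors.get)
```

Dividing by the class size keeps the majority class from dominating simply because it has more points nearby, while the priors let the analyst state how common each group is expected to be.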

Keywords: error rate, bootstrap, diabetes risk groups, k-nearest neighbors

Procedia PDF Downloads 422
23487 Business Challenges and Opportunities of Mobile Applications for Equity Trading in India

Authors: Helee Dave

Abstract:

Globalization has helped in the growth and change of the Indian economy to a great extent. The purchasing power of Indians has increased. IT infrastructure has considerably improved in India. There is an increase in the usage of smartphones. Smartphones facilitate all sorts of work nowadays: from getting groceries to planning a tour, everything is just one click away. The same is true of equity trading. Traders in the equity market can now deal with their stocks through mobile applications, eliminating the middleman. Traders have no option but to open a dematerialization account with a bank, which is compulsory irrespective of their mode of transaction, online or offline. India is a young country, with more than 50% of its population below the age of 25 and 65% below the age of 35, and this youth is comfortable with the usage of smartphones. The banking industry is also providing a virtual platform supporting the equity market industry. Yet equity trading through online applications is at an infant stage. This paper primarily attempts to understand the challenges and opportunities faced by equity trading through mobile apps in India.

Keywords: BPO, business process outsourcing, de-materialization account, equity, ITES, information technology enabled services

Procedia PDF Downloads 293
23486 Welding Process Selection for Storage Tank by Integrated Data Envelopment Analysis and Fuzzy Credibility Constrained Programming Approach

Authors: Rahmad Wisnu Wardana, Eakachai Warinsiriruk, Sutep Joy-A-Ka

Abstract:

Selecting the most suitable welding process usually depends on experience or on common practice in similar companies. However, this approach generally ignores many criteria that can affect the selection of a suitable welding process. Therefore, knowledge automation through knowledge-based systems will significantly improve the decision-making process. This research proposes an integrated data envelopment analysis (DEA) and fuzzy credibility constrained programming approach for identifying the best welding process for stainless steel storage tanks in the food and beverage industry. The proposed approach uses fuzzy concepts and a credibility measure to deal with uncertain data from experts' judgment. Furthermore, 12 parameters are used to determine the most appropriate welding process among six competing welding processes.
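
As an illustration of the credibility measure mentioned above, the Python sketch below evaluates Cr{ξ ≤ r} for a triangular fuzzy number (a, b, c) using the standard piecewise formula from credibility theory, and checks a credibility constraint Cr{ξ ≤ r} ≥ α. That expert judgments are modeled as triangular fuzzy numbers is an assumption; the abstract does not specify the membership functions, and the function names are hypothetical.

```python
def credibility_leq(r, a, b, c):
    """Credibility that a triangular fuzzy number (a, b, c) is at most r."""
    if not a < b < c:
        raise ValueError("expected a < b < c")
    if r <= a:
        return 0.0
    if r <= b:
        # Rising side of the triangle: credibility grows from 0 to 0.5.
        return (r - a) / (2 * (b - a))
    if r <= c:
        # Falling side of the triangle: credibility grows from 0.5 to 1.
        return (r - 2 * b + c) / (2 * (c - b))
    return 1.0


def satisfies_credibility_constraint(r, fuzzy_number, alpha):
    """Check the constraint Cr{xi <= r} >= alpha for a confidence level alpha."""
    a, b, c = fuzzy_number
    return credibility_leq(r, a, b, c) >= alpha
```

In a credibility constrained DEA model, constraints with fuzzy inputs or outputs are replaced by checks of this kind at a chosen confidence level α, which turns the fuzzy program into a crisp one.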

Keywords: welding process selection, data envelopment analysis, fuzzy credibility constrained programming, storage tank

Procedia PDF Downloads 151