Search results for: structured and unstructured data
26474 Using Optical Character Recognition to Manage the Unstructured Disaster Data into Smart Disaster Management System
Authors: Dong Seop Lee, Byung Sik Kim
Abstract:
In the 4th Industrial Revolution, various intelligent technologies have been developed in many fields. These artificial intelligence technologies are applied in various services, including disaster management. Disaster information management does not just support disaster work, but it is also the foundation of smart disaster management. Furthermore, it gets historical disaster information using artificial intelligence technology. Disaster information is one of important elements of entire disaster cycle. Disaster information management refers to the act of managing and processing electronic data about disaster cycle from its’ occurrence to progress, response, and plan. However, information about status control, response, recovery from natural and social disaster events, etc. is mainly managed in the structured and unstructured form of reports. Those exist as handouts or hard-copies of reports. Such unstructured form of data is often lost or destroyed due to inefficient management. It is necessary to manage unstructured data for disaster information. In this paper, the Optical Character Recognition approach is used to convert handout, hard-copies, images or reports, which is printed or generated by scanners, etc. into electronic documents. Following that, the converted disaster data is organized into the disaster code system as disaster information. Those data are stored in the disaster database system. Gathering and creating disaster information based on Optical Character Recognition for unstructured data is important element as realm of the smart disaster management. In this paper, Korean characters were improved to over 90% character recognition rate by using upgraded OCR. In the case of character recognition, the recognition rate depends on the fonts, size, and special symbols of character. We improved it through the machine learning algorithm. These converted structured data is managed in a standardized disaster information form connected with the disaster code system. The disaster code system is covered that the structured information is stored and retrieve on entire disaster cycle such as historical disaster progress, damages, response, and recovery. The expected effect of this research will be able to apply it to smart disaster management and decision making by combining artificial intelligence technologies and historical big data.Keywords: disaster information management, unstructured data, optical character recognition, machine learning
Procedia PDF Downloads 12926473 Delaunay Triangulations Efficiency for Conduction-Convection Problems
Authors: Bashar Albaalbaki, Roger E. Khayat
Abstract:
This work is a comparative study on the effect of Delaunay triangulation algorithms on discretization error for conduction-convection conservation problems. A structured triangulation and many unstructured Delaunay triangulations using three popular algorithms for node placement strategies are used. The numerical method employed is the vertex-centered finite volume method. It is found that when the computational domain can be meshed using a structured triangulation, the discretization error is lower for structured triangulations compared to unstructured ones for only low Peclet number values, i.e. when conduction is dominant. However, as the Peclet number is increased and convection becomes more significant, the unstructured triangulations reduce the discretization error. Also, no statistical correlation between triangulation angle extremums and the discretization error is found using 200 samples of randomly generated Delaunay and non-Delaunay triangulations. Thus, the angle extremums cannot be an indicator of the discretization error on their own and need to be combined with other triangulation quality measures, which is the subject of further studies.Keywords: conduction-convection problems, Delaunay triangulation, discretization error, finite volume method
Procedia PDF Downloads 10326472 Resource Framework Descriptors for Interestingness in Data
Authors: C. B. Abhilash, Kavi Mahesh
Abstract:
Human beings are the most advanced species on earth; it's all because of the ability to communicate and share information via human language. In today's world, a huge amount of data is available on the web in text format. This has also resulted in the generation of big data in structured and unstructured formats. In general, the data is in the textual form, which is highly unstructured. To get insights and actionable content from this data, we need to incorporate the concepts of text mining and natural language processing. In our study, we mainly focus on Interesting data through which interesting facts are generated for the knowledge base. The approach is to derive the analytics from the text via the application of natural language processing. Using semantic web Resource framework descriptors (RDF), we generate the triple from the given data and derive the interesting patterns. The methodology also illustrates data integration using the RDF for reliable, interesting patterns.Keywords: RDF, interestingness, knowledge base, semantic data
Procedia PDF Downloads 16226471 A Proposal of Advanced Key Performance Indicators for Assessing Six Performances of Construction Projects
Authors: Wi Sung Yoo, Seung Woo Lee, Youn Kyoung Hur, Sung Hwan Kim
Abstract:
Large-scale construction projects are continuously increasing, and the need for tools to monitor and evaluate the project success is emphasized. At the construction industry level, there are limitations in deriving performance evaluation factors that reflect the diversity of construction sites and systems that can objectively evaluate and manage performance. Additionally, there are difficulties in integrating structured and unstructured data generated at construction sites and deriving improvements. In this study, we propose the Key Performance Indicators (KPIs) to enable performance evaluation that reflects the increased diversity of construction sites and the unstructured data generated, and present a model for measuring performance by the derived indicators. The comprehensive performance of a unit construction site is assessed based on 6 areas (Time, Cost, Quality, Safety, Environment, Productivity) and 26 indicators. We collect performance indicator information from 30 construction sites that meet legal standards and have been successfully performed. And We apply data augmentation and optimization techniques into establishing measurement standards for each indicator. In other words, the KPI for construction site performance evaluation presented in this study provides standards for evaluating performance in six areas using institutional requirement data and document data. This can be expanded to establish a performance evaluation system considering the scale and type of construction project. Also, they are expected to be used as a comprehensive indicator of the construction industry and used as basic data for tracking competitiveness at the national level and establishing policies.Keywords: key performance indicator, performance measurement, structured and unstructured data, data augmentation
Procedia PDF Downloads 4326470 Analysing Techniques for Fusing Multimodal Data in Predictive Scenarios Using Convolutional Neural Networks
Authors: Philipp Ruf, Massiwa Chabbi, Christoph Reich, Djaffar Ould-Abdeslam
Abstract:
In recent years, convolutional neural networks (CNN) have demonstrated high performance in image analysis, but oftentimes, there is only structured data available regarding a specific problem. By interpreting structured data as images, CNNs can effectively learn and extract valuable insights from tabular data, leading to improved predictive accuracy and uncovering hidden patterns that may not be apparent in traditional structured data analysis. In applying a single neural network for analyzing multimodal data, e.g., both structured and unstructured information, significant advantages in terms of time complexity and energy efficiency can be achieved. Converting structured data into images and merging them with existing visual material offers a promising solution for applying CNN in multimodal datasets, as they often occur in a medical context. By employing suitable preprocessing techniques, structured data is transformed into image representations, where the respective features are expressed as different formations of colors and shapes. In an additional step, these representations are fused with existing images to incorporate both types of information. This final image is finally analyzed using a CNN.Keywords: CNN, image processing, tabular data, mixed dataset, data transformation, multimodal fusion
Procedia PDF Downloads 12326469 Implementation of Big Data Concepts Led by the Business Pressures
Authors: Snezana Savoska, Blagoj Ristevski, Violeta Manevska, Zlatko Savoski, Ilija Jolevski
Abstract:
Big data is widely accepted by the pharmaceutical companies as a result of business demands create through legal pressure. Pharmaceutical companies have many legal demands as well as standards’ demands and have to adapt their procedures to the legislation. To manage with these demands, they have to standardize the usage of the current information technology and use the latest software tools. This paper highlights some important aspects of experience with big data projects implementation in a pharmaceutical Macedonian company. These projects made improvements of their business processes by the help of new software tools selected to comply with legal and business demands. They use IT as a strategic tool to obtain competitive advantage on the market and to reengineer the processes towards new Internet economy and quality demands. The company is required to manage vast amounts of structured as well as unstructured data. For these reasons, they implement projects for emerging and appropriate software tools which have to deal with big data concepts accepted in the company.Keywords: big data, unstructured data, SAP ERP, documentum
Procedia PDF Downloads 27126468 Analysis of Big Data
Authors: Sandeep Sharma, Sarabjit Singh
Abstract:
As per the user demand and growth trends of large free data the storage solutions are now becoming more challenge-able to protect, store and to retrieve data. The days are not so far when the storage companies and organizations are start saying 'no' to store our valuable data or they will start charging a huge amount for its storage and protection. On the other hand as per the environmental conditions it becomes challenge-able to maintain and establish new data warehouses and data centers to protect global warming threats. A challenge of small data is over now, the challenges are big that how to manage the exponential growth of data. In this paper we have analyzed the growth trend of big data and its future implications. We have also focused on the impact of the unstructured data on various concerns and we have also suggested some possible remedies to streamline big data.Keywords: big data, unstructured data, volume, variety, velocity
Procedia PDF Downloads 54826467 Research of Data Cleaning Methods Based on Dependency Rules
Authors: Yang Bao, Shi Wei Deng, WangQun Lin
Abstract:
This paper introduces the concept and principle of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of typical cleaning process, puts forward a well scalability and versatility data cleaning framework, in view of data with attribute dependency relation, designs several of violation data discovery algorithms by formal formula, which can obtain inconsistent data to all target columns with condition attribute dependent no matter data is structured (SQL) or unstructured (NoSQL), and gives 6 data cleaning methods based on these algorithms.Keywords: data cleaning, dependency rules, violation data discovery, data repair
Procedia PDF Downloads 56426466 Impact of Pan Pacific's Training Program to Hotel and Restaurant Management (HRM) Practicum Trainees
Authors: Bandojo Paula Maria Noella, Bernardo Bea Samantha B., Del Rosario Hanassa Mae S., Gomez Marian Louise D., Gomez Rome Voltaire M., Reyes Alessa Anne Therese A.
Abstract:
The purpose of this study is to determine if there is a significant difference between the training program of Pan Pacific Hotel to other Five Star Hotels in terms of the technical, professional and personal competencies before and after their training. The theoretical framework of this study is the practicum manual of the University of Santo Tomas College of Tourism and Hospitality Management, Hotel and Restaurant Management Program Practicum Manual. This study was conducted using survey questionnaires that were distributed to 50 respondents. The results showed that there is a significant difference with the level of competencies of the practicum trainee before and after the training regardless if the training is structured or unstructured. Results also showed that the structured training program of Pan Pacific Hotel significantly improved the Technical Competencies in the different departments of the hotel industry. On the other hand, the findings also showed that there is no difference between the structured and unstructured training program in terms of Professional Competencies and Personal Competencies. The proponents concluded the study by providing recommendations to the partner hotels of the University of Santo Tomas College of Tourism and Hospitality Management that there should be a structured training program for the practicum trainees.Keywords: structured and structured training program, practicum trainees, competencies, tourism
Procedia PDF Downloads 36326465 Research on the Aero-Heating Prediction Based on Hybrid Meshes and Hybrid Schemes
Authors: Qiming Zhang, Youda Ye, Qinxue Jiang
Abstract:
Accurate prediction of external flowfield and aero-heating at the wall of hypersonic vehicle is very crucial for the design of aircrafts. Unstructured/hybrid meshes have more powerful advantages than structured meshes in terms of pre-processing, parallel computing and mesh adaptation, so it is imperative to develop high-resolution numerical methods for the calculation of aerothermal environment on unstructured/hybrid meshes. The inviscid flux scheme is one of the most important factors affecting the accuracy of unstructured/ hybrid mesh heat flux calculation. Here, a new hybrid flux scheme is developed and the approach of interface type selection is proposed: i.e. 1) using the exact Riemann scheme solution to calculate the flux on the faces parallel to the wall; 2) employing Sterger-Warming (S-W) scheme to improve the stability of the numerical scheme in other interfaces. The results of the heat flux fit the one observed experimentally and have little dependence on grids, which show great application prospect in unstructured/ hybrid mesh.Keywords: aero-heating prediction, computational fluid dynamics, hybrid meshes, hybrid schemes
Procedia PDF Downloads 25026464 Agile Methodology for Modeling and Design of Data Warehouses -AM4DW-
Authors: Nieto Bernal Wilson, Carmona Suarez Edgar
Abstract:
The organizations have structured and unstructured information in different formats, sources, and systems. Part of these come from ERP under OLTP processing that support the information system, however these organizations in OLAP processing level, presented some deficiencies, part of this problematic lies in that does not exist interesting into extract knowledge from their data sources, as also the absence of operational capabilities to tackle with these kind of projects. Data Warehouse and its applications are considered as non-proprietary tools, which are of great interest to business intelligence, since they are repositories basis for creating models or patterns (behavior of customers, suppliers, products, social networks and genomics) and facilitate corporate decision making and research. The following paper present a structured methodology, simple, inspired from the agile development models as Scrum, XP and AUP. Also the models object relational, spatial data models, and the base line of data modeling under UML and Big data, from this way sought to deliver an agile methodology for the developing of data warehouses, simple and of easy application. The methodology naturally take into account the application of process for the respectively information analysis, visualization and data mining, particularly for patterns generation and derived models from the objects facts structured.Keywords: data warehouse, model data, big data, object fact, object relational fact, process developed data warehouse
Procedia PDF Downloads 41026463 Weighted-Distance Sliding Windows and Cooccurrence Graphs for Supporting Entity-Relationship Discovery in Unstructured Text
Authors: Paolo Fantozzi, Luigi Laura, Umberto Nanni
Abstract:
The problem of Entity relation discovery in structured data, a well covered topic in literature, consists in searching within unstructured sources (typically, text) in order to find connections among entities. These can be a whole dictionary, or a specific collection of named items. In many cases machine learning and/or text mining techniques are used for this goal. These approaches might be unfeasible in computationally challenging problems, such as processing massive data streams. A faster approach consists in collecting the cooccurrences of any two words (entities) in order to create a graph of relations - a cooccurrence graph. Indeed each cooccurrence highlights some grade of semantic correlation between the words because it is more common to have related words close each other than having them in the opposite sides of the text. Some authors have used sliding windows for such problem: they count all the occurrences within a sliding windows running over the whole text. In this paper we generalise such technique, coming up to a Weighted-Distance Sliding Window, where each occurrence of two named items within the window is accounted with a weight depending on the distance between items: a closer distance implies a stronger evidence of a relationship. We develop an experiment in order to support this intuition, by applying this technique to a data set consisting in the text of the Bible, split into verses.Keywords: cooccurrence graph, entity relation graph, unstructured text, weighted distance
Procedia PDF Downloads 15426462 Healthcare Big Data Analytics Using Hadoop
Authors: Chellammal Surianarayanan
Abstract:
Healthcare industry is generating large amounts of data driven by various needs such as record keeping, physician’s prescription, medical imaging, sensor data, Electronic Patient Record(EPR), laboratory, pharmacy, etc. Healthcare data is so big and complex that they cannot be managed by conventional hardware and software. The complexity of healthcare big data arises from large volume of data, the velocity with which the data is accumulated and different varieties such as structured, semi-structured and unstructured nature of data. Despite the complexity of big data, if the trends and patterns that exist within the big data are uncovered and analyzed, higher quality healthcare at lower cost can be provided. Hadoop is an open source software framework for distributed processing of large data sets across clusters of commodity hardware using a simple programming model. The core components of Hadoop include Hadoop Distributed File System which offers way to store large amount of data across multiple machines and MapReduce which offers way to process large data sets with a parallel, distributed algorithm on a cluster. Hadoop ecosystem also includes various other tools such as Hive (a SQL-like query language), Pig (a higher level query language for MapReduce), Hbase(a columnar data store), etc. In this paper an analysis has been done as how healthcare big data can be processed and analyzed using Hadoop ecosystem.Keywords: big data analytics, Hadoop, healthcare data, towards quality healthcare
Procedia PDF Downloads 41326461 Association Rules Mining and NOSQL Oriented Document in Big Data
Authors: Sarra Senhadji, Imene Benzeguimi, Zohra Yagoub
Abstract:
Big Data represents the recent technology of manipulating voluminous and unstructured data sets over multiple sources. Therefore, NOSQL appears to handle the problem of unstructured data. Association rules mining is one of the popular techniques of data mining to extract hidden relationship from transactional databases. The algorithm for finding association dependencies is well-solved with Map Reduce. The goal of our work is to reduce the time of generating of frequent itemsets by using Map Reduce and NOSQL database oriented document. A comparative study is given to evaluate the performances of our algorithm with the classical algorithm Apriori.Keywords: Apriori, Association rules mining, Big Data, Data Mining, Hadoop, MapReduce, MongoDB, NoSQL
Procedia PDF Downloads 16226460 Bio-Mimetic Foot Design for Legged Locomotion over Unstructured Terrain
Authors: Hannah Kolano, Paul Nadan, Jeremy Ryan, Sophia Nielsen
Abstract:
The hooves of goats and other ruminants, or the family Ruminantia, are uniquely structured to adapt to rough terrain. Their hooves possess a hard outer shell and a soft interior that allow them to both conform to uneven surfaces and hook onto prominent features. In an effort to apply this unique mechanism to a robotics context, artificial feet for a hexapedal robot have been designed based on the hooves of ruminants to improve the robot’s ability to traverse unstructured environments such as those found on a rocky planet or asteroid, as well as in earth-based environments such as rubble, caves, and mountainous regions. The feet were manufactured using a combination of 3D printing and polyurethane casting techniques and attached to a commercially available hexapedal robot. The robot was programmed with a terrain-adaptive gait and proved capable of traversing a variety of uneven surfaces and inclines. This development of more adaptable robotic feet allows legged robots to operate in a wider range of environments and expands their possible applications.Keywords: biomimicry, legged locomotion, robotic foot design, ruminant feet, unstructured terrain navigation
Procedia PDF Downloads 12926459 Physical Activity and Academic Achievement: How Physical Activity Should Be Implemented to Enhance Mathematical Achievement and Mathematical Self-Concept
Authors: Laura C. Dapp, Claudia M. Roebers
Abstract:
Being physically active has many benefits for children and adolescents. It is crucial for various aspects of physical and mental health, the development of a healthy self-concept, and also positively influences academic performance and school achievement. In addressing the still incomplete understanding of the link between physical activity (PA) and academic achievement, the current study scrutinized the open issue of how PA has to be implemented to positively affect mathematical outcomes in N = 138 fourth graders. Therefore, the current study distinguished between structured PA (formal, organized, adult-led exercise and deliberate sports practice) and unstructured PA (non-formal, playful, peer-led physically active play and sports activities). Results indicated that especially structured PA has the potential to contribute to mathematical outcomes. Although children spent almost twice as much time engaging in unstructured PA as compared to structured PA, only structured PA was significantly related to mathematical achievement as well as to mathematical self-concept. Furthermore, the pending issue concerning the quantity of PA needed to enhance children’s mathematical achievement was addressed. As to that, results indicated that the amount of time spent in structured PA constitutes a critical factor in accounting for mathematical outcomes, since children engaging in PA for two hours or more a week were shown to be both the ones with the highest mathematical self-concept as well as those attaining the highest mathematical achievement scores. Finally, the present study investigated the indirect effect of PA on mathematical achievement by controlling for the mathematical self-concept as a mediating variable. The results of a maximum likelihood mediation analysis with a 2’000 resampling bootstrapping procedure for the 95% confidence intervals revealed a full mediation, indicating that PA improves mathematical self-concept, which, in turn, positively affects mathematical achievement. Thus, engaging in high amounts of structured PA constitutes an advantageous leisure activity for children and adolescents, not only to enhance their physical health but also to foster their self-concept in a way that is favorable and encouraging for promoting their academic achievement. Note: The content of this abstract is partially based on a paper published elswhere by the authors.Keywords: Academic Achievement, Mathematical Performance, Physical Activity, Self-Concept
Procedia PDF Downloads 11326458 Bayesian Flexibility Modelling of the Conditional Autoregressive Prior in a Disease Mapping Model
Authors: Davies Obaromi, Qin Yongsong, James Ndege, Azeez Adeboye, Akinwumi Odeyemi
Abstract:
The basic model usually used in disease mapping, is the Besag, York and Mollie (BYM) model and which combines the spatially structured and spatially unstructured priors as random effects. Bayesian Conditional Autoregressive (CAR) model is a disease mapping method that is commonly used for smoothening the relative risk of any disease as used in the Besag, York and Mollie (BYM) model. This model (CAR), which is also usually assigned as a prior to one of the spatial random effects in the BYM model, successfully uses information from adjacent sites to improve estimates for individual sites. To our knowledge, there are some unrealistic or counter-intuitive consequences on the posterior covariance matrix of the CAR prior for the spatial random effects. In the conventional BYM (Besag, York and Mollie) model, the spatially structured and the unstructured random components cannot be seen independently, and which challenges the prior definitions for the hyperparameters of the two random effects. Therefore, the main objective of this study is to construct and utilize an extended Bayesian spatial CAR model for studying tuberculosis patterns in the Eastern Cape Province of South Africa, and then compare for flexibility with some existing CAR models. The results of the study revealed the flexibility and robustness of this alternative extended CAR to the commonly used CAR models by comparison, using the deviance information criteria. The extended Bayesian spatial CAR model is proved to be a useful and robust tool for disease modeling and as a prior for the structured spatial random effects because of the inclusion of an extra hyperparameter.Keywords: Besag2, CAR models, disease mapping, INLA, spatial models
Procedia PDF Downloads 28126457 Q-Map: Clinical Concept Mining from Clinical Documents
Authors: Sheikh Shams Azam, Manoj Raju, Venkatesh Pagidimarri, Vamsi Kasivajjala
Abstract:
Over the past decade, there has been a steep rise in the data-driven analysis in major areas of medicine, such as clinical decision support system, survival analysis, patient similarity analysis, image analytics etc. Most of the data in the field are well-structured and available in numerical or categorical formats which can be used for experiments directly. But on the opposite end of the spectrum, there exists a wide expanse of data that is intractable for direct analysis owing to its unstructured nature which can be found in the form of discharge summaries, clinical notes, procedural notes which are in human written narrative format and neither have any relational model nor any standard grammatical structure. An important step in the utilization of these texts for such studies is to transform and process the data to retrieve structured information from the haystack of irrelevant data using information retrieval and data mining techniques. To address this problem, the authors present Q-Map in this paper, which is a simple yet robust system that can sift through massive datasets with unregulated formats to retrieve structured information aggressively and efficiently. It is backed by an effective mining technique which is based on a string matching algorithm that is indexed on curated knowledge sources, that is both fast and configurable. The authors also briefly examine its comparative performance with MetaMap, one of the most reputed tools for medical concepts retrieval and present the advantages the former displays over the latter.Keywords: information retrieval, unified medical language system, syntax based analysis, natural language processing, medical informatics
Procedia PDF Downloads 13326456 Ontology Mapping with R-GNN for IT Infrastructure: Enhancing Ontology Construction and Knowledge Graph Expansion
Authors: Andrey Khalov
Abstract:
The rapid growth of unstructured data necessitates advanced methods for transforming raw information into structured knowledge, particularly in domain-specific contexts such as IT service management and outsourcing. This paper presents a methodology for automatically constructing domain ontologies using the DOLCE framework as the base ontology. The research focuses on expanding ITIL-based ontologies by integrating concepts from ITSMO, followed by the extraction of entities and relationships from domain-specific texts through transformers and statistical methods like formal concept analysis (FCA). In particular, this work introduces an R-GNN-based approach for ontology mapping, enabling more efficient entity extraction and ontology alignment with existing knowledge bases. Additionally, the research explores transfer learning techniques using pre-trained transformer models (e.g., DeBERTa-v3-large) fine-tuned on synthetic datasets generated via large language models such as LLaMA. The resulting ontology, termed IT Ontology (ITO), is evaluated against existing methodologies, highlighting significant improvements in precision and recall. This study advances the field of ontology engineering by automating the extraction, expansion, and refinement of ontologies tailored to the IT domain, thus bridging the gap between unstructured data and actionable knowledge.Keywords: ontology mapping, knowledge graphs, R-GNN, ITIL, NER
Procedia PDF Downloads 1926455 System for Monitoring Marine Turtles Using Unstructured Supplementary Service Data
Authors: Luís Pina
Abstract:
The conservation of marine biodiversity keeps ecosystems in balance and ensures the sustainable use of resources. In this context, technological resources have been used for monitoring marine species to allow biologists to obtain data in real-time. There are different mobile applications developed for data collection for monitoring purposes, but these systems are designed to be utilized only on third-generation (3G) phones or smartphones with Internet access and in rural parts of the developing countries, Internet services and smartphones are scarce. Thus, the objective of this work is to develop a system to monitor marine turtles using Unstructured Supplementary Service Data (USSD), which users can access through basic mobile phones. The system aims to improve the data collection mechanism and enhance the effectiveness of current systems in monitoring sea turtles using any type of mobile device without Internet access. The system will be able to report information related to the biological activities of marine turtles. Also, it will be used as a platform to assist marine conservation entities to receive reports of illegal sales of sea turtles. The system can also be utilized as an educational tool for communities, providing knowledge and allowing the inclusion of communities in the process of monitoring marine turtles. Therefore, this work may contribute with information to decision-making and implementation of contingency plans for marine conservation programs.Keywords: GSM, marine biology, marine turtles, unstructured supplementary service data (USSD)
Procedia PDF Downloads 20626454 Unified Structured Process for Health Analytics
Authors: Supunmali Ahangama, Danny Chiang Choon Poo
Abstract:
Health analytics (HA) is used in healthcare systems for effective decision-making, management, and planning of healthcare and related activities. However, user resistance, the unique position of medical data content, and structure (including heterogeneous and unstructured data) and impromptu HA projects have held up the progress in HA applications. Notably, the accuracy of outcomes depends on the skills and the domain knowledge of the data analyst working on the healthcare data. The success of HA depends on having a sound process model, effective project management and availability of supporting tools. Thus, to overcome these challenges through an effective process model, we propose an HA process model with features from the rational unified process (RUP) model and agile methodology.Keywords: agile methodology, health analytics, unified process model, UML
Procedia PDF Downloads 50626453 A Quantum Leap: Developing Quantum Semi-Structured Complex Numbers to Solve the “Division by Zero” Problem
Authors: Peter Jean-Paul, Shanaz Wahid
Abstract:
The problem of division by zero can be stated as: “what is the value of 0 x 1/0?” This expression has been considered undefined by mathematicians because it can have two equally valid solutions either 0 or 1. Recently semi-structured complex number set was invented to solve “division by zero”. However, whilst the number set had some merits it was considered to have a poor theoretical foundation and did not provide a quality solution to “division by zero”. Moreover, the set lacked consistency in simple algebraic calculations producing contradictory results when dividing by zero. To overcome these issues this research starts by treating the expression " 0 x 1/0" as a quantum mechanical system that produces two tangled results 0 and 1. Dirac Notation (a tool from quantum mechanics) was then used to redefine the unstructured unit p in semi-structured complex numbers so that p represents the superposition of two results (0 and 1) and collapses into a single value when used in algebraic expressions. In the process, this paper describes a new number set called Quantum Semi-structured Complex Numbers that provides a valid solution to the problem of “division by zero”. This research shows that this new set (1) forms a “Field”, (2) can produce consistent results when solving division by zero problems, (3) can be used to accurately describe systems whose mathematical descriptions involve division by zero. This research served to provide a firm foundation for Quantum Semi-structured Complex Numbers and support their practical use.Keywords: division by zero, semi-structured complex numbers, quantum mechanics, Hilbert space, Euclidean space
Procedia PDF Downloads 15726452 An Embarrassingly Simple Semi-supervised Approach to Increase Recall in Online Shopping Domain to Match Structured Data with Unstructured Data
Authors: Sachin Nagargoje
Abstract:
Complete labeled data is often difficult to obtain in a practical scenario. Even if one manages to obtain the data, the quality of the data is always in question. In shopping vertical, offers are the input data, which is given by advertiser with or without a good quality of information. In this paper, an author investigated the possibility of using a very simple Semi-supervised learning approach to increase the recall of unhealthy offers (has badly written Offer Title or partial product details) in shopping vertical domain. The author found that the semisupervised learning method had improved the recall in the Smart Phone category by 30% on A=B testing on 10% traffic and increased the YoY (Year over Year) number of impressions per month by 33% at production. This also made a significant increase in Revenue, but that cannot be publicly disclosed.Keywords: semi-supervised learning, clustering, recall, coverage
Procedia PDF Downloads 12226451 Machine Learning Strategies for Data Extraction from Unstructured Documents in Financial Services
Authors: Delphine Vendryes, Dushyanth Sekhar, Baojia Tong, Matthew Theisen, Chester Curme
Abstract:
Much of the data that inform the decisions of governments, corporations and individuals are harvested from unstructured documents. Data extraction is defined here as a process that turns non-machine-readable information into a machine-readable format that can be stored, for instance, in a database. In financial services, introducing more automation in data extraction pipelines is a major challenge. Information sought by financial data consumers is often buried within vast bodies of unstructured documents, which have historically required thorough manual extraction. Automated solutions provide faster access to non-machine-readable datasets, in a context where untimely information quickly becomes irrelevant. Data quality standards cannot be compromised, so automation requires high data integrity. This multifaceted task is broken down into smaller steps: ingestion, table parsing (detection and structure recognition), text analysis (entity detection and disambiguation), schema-based record extraction, user feedback incorporation. Selected intermediary steps are phrased as machine learning problems. Solutions leveraging cutting-edge approaches from the fields of computer vision (e.g. table detection) and natural language processing (e.g. entity detection and disambiguation) are proposed.Keywords: computer vision, entity recognition, finance, information retrieval, machine learning, natural language processing
Procedia PDF Downloads 11326450 Unsupervised Text Mining Approach to Early Warning System
Authors: Ichihan Tai, Bill Olson, Paul Blessner
Abstract:
Traditional early warning systems that alarm against crisis are generally based on structured or numerical data; therefore, a system that can make predictions based on unstructured textual data, an uncorrelated data source, is a great complement to the traditional early warning systems. The Chicago Board Options Exchange (CBOE) Volatility Index (VIX), commonly referred to as the fear index, measures the cost of insurance against market crash, and spikes in the event of crisis. In this study, news data is consumed for prediction of whether there will be a market-wide crisis by predicting the movement of the fear index, and the historical references to similar events are presented in an unsupervised manner. Topic modeling-based prediction and representation are made based on daily news data between 1990 and 2015 from The Wall Street Journal against VIX index data from CBOE.Keywords: early warning system, knowledge management, market prediction, topic modeling.
Procedia PDF Downloads 33826449 Continuous FAQ Updating for Service Incident Ticket Resolution
Authors: Kohtaroh Miyamoto
Abstract:
As enterprise computing becomes more and more complex, the costs and technical challenges of IT system maintenance and support are increasing rapidly. One popular approach to managing IT system maintenance is to prepare and use an FAQ (Frequently Asked Questions) system to manage and reuse systems knowledge. Such an FAQ system can help reduce the resolution time for each service incident ticket. However, there is a major problem where over time the knowledge in such FAQs tends to become outdated. Much of the knowledge captured in the FAQ requires periodic updates in response to new insights or new trends in the problems addressed in order to maintain its usefulness for problem resolution. These updates require a systematic approach to define the exact portion of the FAQ and its content. Therefore, we are working on a novel method to hierarchically structure the FAQ and automate the updates of its structure and content. We use structured information and the unstructured text information with the timelines of the information in the service incident tickets. We cluster the tickets by structured category information, by keywords, and by keyword modifiers for the unstructured text information. We also calculate an urgency score based on trends, resolution times, and priorities. We carefully studied the tickets of one of our projects over a 2.5-year time period. After the first 6 months, we started to create FAQs and confirmed they improved the resolution times. We continued observing over the next 2 years to assess the ongoing effectiveness of our method for the automatic FAQ updates. We improved the ratio of tickets covered by the FAQ from 32.3% to 68.9% during this time. Also, the average time reduction of ticket resolution was between 31.6% and 43.9%. Subjective analysis showed more than 75% reported that the FAQ system was useful in reducing ticket resolution times.Keywords: FAQ system, resolution time, service incident tickets, IT system maintenance
Procedia PDF Downloads 33826448 A Concept of Data Mining with XML Document
Authors: Akshay Agrawal, Anand K. Srivastava
Abstract:
The increasing amount of XML datasets available to casual users increases the necessity of investigating techniques to extract knowledge from these data. Data mining is widely applied in the database research area in order to extract frequent correlations of values from both structured and semi-structured datasets. The increasing availability of heterogeneous XML sources has raised a number of issues concerning how to represent and manage these semi structured data. In recent years due to the importance of managing these resources and extracting knowledge from them, lots of methods have been proposed in order to represent and cluster them in different ways.Keywords: XML, similarity measure, clustering, cluster quality, semantic clustering
Procedia PDF Downloads 38426447 A Next-Generation Blockchain-Based Data Platform: Leveraging Decentralized Storage and Layer 2 Scaling for Secure Data Management
Authors: Kenneth Harper
Abstract:
The rapid growth of data-driven decision-making across various industries necessitates advanced solutions to ensure data integrity, scalability, and security. This study introduces a decentralized data platform built on blockchain technology to improve data management processes in high-volume environments such as healthcare and financial services. The platform integrates blockchain networks using Cosmos SDK and Polkadot Substrate alongside decentralized storage solutions like IPFS and Filecoin, and coupled with decentralized computing infrastructure built on top of Avalanche. By leveraging advanced consensus mechanisms, we create a scalable, tamper-proof architecture that supports both structured and unstructured data. Key features include secure data ingestion, cryptographic hashing for robust data lineage, and Zero-Knowledge Proof mechanisms that enhance privacy while ensuring compliance with regulatory standards. Additionally, we implement performance optimizations through Layer 2 scaling solutions, including ZK-Rollups, which provide low-latency data access and trustless data verification across a distributed ledger. The findings from this exercise demonstrate significant improvements in data accessibility, reduced operational costs, and enhanced data integrity when tested in real-world scenarios. This platform reference architecture offers a decentralized alternative to traditional centralized data storage models, providing scalability, security, and operational efficiency.Keywords: blockchain, cosmos SDK, decentralized data platform, IPFS, ZK-Rollups
Procedia PDF Downloads 2826446 Using Visualization Techniques to Support Common Clinical Tasks in Clinical Documentation
Authors: Jonah Kenei, Elisha Opiyo
Abstract:
Electronic health records, as a repository of patient information, is nowadays the most commonly used technology to record, store and review patient clinical records and perform other clinical tasks. However, the accurate identification and retrieval of relevant information from clinical records is a difficult task due to the unstructured nature of clinical documents, characterized in particular by a lack of clear structure. Therefore, medical practice is facing a challenge thanks to the rapid growth of health information in electronic health records (EHRs), mostly in narrative text form. As a result, it's becoming important to effectively manage the growing amount of data for a single patient. As a result, there is currently a requirement to visualize electronic health records (EHRs) in a way that aids physicians in clinical tasks and medical decision-making. Leveraging text visualization techniques to unstructured clinical narrative texts is a new area of research that aims to provide better information extraction and retrieval to support clinical decision support in scenarios where data generated continues to grow. Clinical datasets in electronic health records (EHR) offer a lot of potential for training accurate statistical models to classify facets of information which can then be used to improve patient care and outcomes. However, in many clinical note datasets, the unstructured nature of clinical texts is a common problem. This paper examines the very issue of getting raw clinical texts and mapping them into meaningful structures that can support healthcare professionals utilizing narrative texts. Our work is the result of a collaborative design process that was aided by empirical data collected through formal usability testing.Keywords: classification, electronic health records, narrative texts, visualization
Procedia PDF Downloads 11826445 How to Perform Proper Indexing?
Authors: Watheq Mansour, Waleed Bin Owais, Mohammad Basheer Kotit, Khaled Khan
Abstract:
Efficient query processing is one of the utmost requisites in any business environment to satisfy consumer needs. This paper investigates the various types of indexing models, viz. primary, secondary, and multi-level. The investigation is done under the ambit of various types of queries to which each indexing model performs with efficacy. This study also discusses the inherent advantages and disadvantages of each indexing model and how indexing models can be chosen based on a particular environment. This paper also draws parallels between various indexing models and provides recommendations that would help a Database administrator to zero-in on a particular indexing model attributed to the needs and requirements of the production environment. In addition, to satisfy industry and consumer needs attributed to the colossal data generation nowadays, this study has proposed two novel indexing techniques that can be used to index highly unstructured and structured Big Data with efficacy. The study also briefly discusses some best practices that the industry should follow in order to choose an indexing model that is apposite to their prerequisites and requirements.Keywords: indexing, hashing, latent semantic indexing, B-tree
Procedia PDF Downloads 156