Search results for: semantic data profiling
24745 Learning Analytics in a HiFlex Learning Environment
Authors: Matthew Montebello
Abstract:
Student engagement within a virtual learning environment generates masses of data points that can significantly contribute to the learning analytics that lead to decision support. Ideally, similar data is collected during student interaction with a physical learning space, and as a consequence, data is present at a large scale, even in relatively small classes. In this paper, we report of such an occurrence during classes held in a HiFlex modality as we investigate the advantages of adopting such a methodology. We plan to take full advantage of the learner-generated data in an attempt to further enhance the effectiveness of the adopted learning environment. This could shed crucial light on operating modalities that higher education institutions around the world will switch to in a post-COVID era.Keywords: HiFlex, big data in higher education, learning analytics, virtual learning environment
Procedia PDF Downloads 20124744 Li-Fi Technology: Data Transmission through Visible Light
Authors: Shahzad Hassan, Kamran Saeed
Abstract:
People are always in search of Wi-Fi hotspots because Internet is a major demand nowadays. But like all other technologies, there is still room for improvement in the Wi-Fi technology with regards to the speed and quality of connectivity. In order to address these aspects, Harald Haas, a professor at the University of Edinburgh, proposed what we know as the Li-Fi (Light Fidelity). Li-Fi is a new technology in the field of wireless communication to provide connectivity within a network environment. It is a two-way mode of wireless communication using light. Basically, the data is transmitted through Light Emitting Diodes which can vary the intensity of light very fast, even faster than the blink of an eye. From the research and experiments conducted so far, it can be said that Li-Fi can increase the speed and reliability of the transfer of data. This paper pays particular attention on the assessment of the performance of this technology. In other words, it is a 5G technology which uses LED as the medium of data transfer. For coverage within the buildings, Wi-Fi is good but Li-Fi can be considered favorable in situations where large amounts of data are to be transferred in areas with electromagnetic interferences. It brings a lot of data related qualities such as efficiency, security as well as large throughputs to the table of wireless communication. All in all, it can be said that Li-Fi is going to be a future phenomenon where the presence of light will mean access to the Internet as well as speedy data transfer.Keywords: communication, LED, Li-Fi, Wi-Fi
Procedia PDF Downloads 34724743 An Analysis of Humanitarian Data Management of Polish Non-Governmental Organizations in Ukraine Since February 2022 and Its Relevance for Ukrainian Humanitarian Data Ecosystem
Authors: Renata Kurpiewska-Korbut
Abstract:
Making an assumption that the use and sharing of data generated in humanitarian action constitute a core function of humanitarian organizations, the paper analyzes the position of the largest Polish humanitarian non-governmental organizations in the humanitarian data ecosystem in Ukraine and their approach to non-personal and personal data management since February of 2022. Both expert interviews and document analysis of non-profit organizations providing a direct response in the Ukrainian crisis context, i.e., the Polish Humanitarian Action, Caritas, Polish Medical Mission, Polish Red Cross, and the Polish Center for International Aid and the applicability of theoretical perspective of contingency theory – with its central point that the context or specific set of conditions determining the way of behavior and the choice of methods of action – help to examine the significance of data complexity and adaptive approach to data management by relief organizations in the humanitarian supply chain network. The purpose of this study is to determine how the existence of well-established and accurate internal procedures and good practices of using and sharing data (including safeguards for sensitive data) by the surveyed organizations with comparable human and technological capabilities are implemented and adjusted to Ukrainian humanitarian settings and data infrastructure. The study also poses a fundamental question of whether this crisis experience will have a determining effect on their future performance. The obtained finding indicate that Polish humanitarian organizations in Ukraine, which have their own unique code of conduct and effective managerial data practices determined by contingencies, have limited influence on improving the situational awareness of other assistance providers in the data ecosystem despite their attempts to undertake interagency work in the area of data sharing.Keywords: humanitarian data ecosystem, humanitarian data management, polish NGOs, Ukraine
Procedia PDF Downloads 9324742 An Approach for Estimation in Hierarchical Clustered Data Applicable to Rare Diseases
Authors: Daniel C. Bonzo
Abstract:
Practical considerations lead to the use of unit of analysis within subjects, e.g., bleeding episodes or treatment-related adverse events, in rare disease settings. This is coupled with data augmentation techniques such as extrapolation to enlarge the subject base. In general, one can think about extrapolation of data as extending information and conclusions from one estimand to another estimand. This approach induces hierarchichal clustered data with varying cluster sizes. Extrapolation of clinical trial data is being accepted increasingly by regulatory agencies as a means of generating data in diverse situations during drug development process. Under certain circumstances, data can be extrapolated to a different population, a different but related indication, and different but similar product. We consider here the problem of estimation (point and interval) using a mixed-models approach under an extrapolation. It is proposed that estimators (point and interval) be constructed using weighting schemes for the clusters, e.g., equally weighted and with weights proportional to cluster size. Simulated data generated under varying scenarios are then used to evaluate the performance of this approach. In conclusion, the evaluation result showed that the approach is a useful means for improving statistical inference in rare disease settings and thus aids not only signal detection but risk-benefit evaluation as well.Keywords: clustered data, estimand, extrapolation, mixed model
Procedia PDF Downloads 13724741 Identification Strategies for Unknown Victims from Mass Disasters and Unknown Perpetrators from Violent Crime or Terrorist Attacks
Authors: Michael Josef Schwerer
Abstract:
Background: The identification of unknown victims from mass disasters, violent crimes, or terrorist attacks is frequently facilitated through information from missing persons lists, portrait photos, old or recent pictures showing unique characteristics of a person such as scars or tattoos, or simply reference samples from blood relatives for DNA analysis. In contrast, the identification or at least the characterization of an unknown perpetrator from criminal or terrorist actions remains challenging, particularly in the absence of material or data for comparison, such as fingerprints, which had been previously stored in criminal records. In scenarios that result in high levels of destruction of the perpetrator’s corpse, for instance, blast or fire events, the chance for a positive identification using standard techniques is further impaired. Objectives: This study shows the forensic genetic procedures in the Legal Medicine Service of the German Air Force for the identification of unknown individuals, including such cases in which reference samples are not available. Scenarios requiring such efforts predominantly involve aircraft crash investigations, which are routinely carried out by the German Air Force Centre of Aerospace Medicine as one of the Institution’s essential missions. Further, casework by military police or military intelligence is supported based on administrative cooperation. In the talk, data from study projects, as well as examples from real casework, will be demonstrated and discussed with the audience. Methods: Forensic genetic identification in our laboratories involves the analysis of Short Tandem Repeats and Single Nucleotide Polymorphisms in nuclear DNA along with mitochondrial DNA haplotyping. Extended DNA analysis involves phenotypic markers for skin, hair, and eye color together with the investigation of a person’s biogeographic ancestry. Assessment of the biological age of an individual employs CpG-island methylation analysis using bisulfite-converted DNA. Forensic Investigative Genealogy assessment allows the detection of an unknown person’s blood relatives in reference databases. Technically, end-point-PCR, real-time PCR, capillary electrophoresis, pyrosequencing as well as next generation sequencing using flow-cell-based and chip-based systems are used. Results and Discussion: Optimization of DNA extraction from various sources, including difficult matrixes like formalin-fixed, paraffin-embedded tissues, degraded specimens from decomposed bodies or from decedents exposed to blast or fire events, provides soil for successful PCR amplification and subsequent genetic profiling. For cases with extremely low yields of extracted DNA, whole genome preamplification protocols are successfully used, particularly regarding genetic phenotyping. Improved primer design for CpG-methylation analysis, together with validated sampling strategies for the analyzed substrates from, e.g., lymphocyte-rich organs, allows successful biological age estimation even in bodies with highly degraded tissue material. Conclusions: Successful identification of unknown individuals or at least their phenotypic characterization using pigmentation markers together with age-informative methylation profiles, possibly supplemented by family tree search employing Forensic Investigative Genealogy, can be provided in specialized laboratories. However, standard laboratory procedures must be adapted to work with difficult and highly degraded sample materials.Keywords: identification, forensic genetics, phenotypic markers, CPG methylation, biological age estimation, forensic investigative genealogy
Procedia PDF Downloads 5124740 Authorization of Commercial Communication Satellite Grounds for Promoting Turkish Data Relay System
Authors: Celal Dudak, Aslı Utku, Burak Yağlioğlu
Abstract:
Uninterrupted and continuous satellite communication through the whole orbit time is becoming more indispensable every day. Data relay systems are developed and built for various high/low data rate information exchanges like TDRSS of USA and EDRSS of Europe. In these missions, a couple of task-dedicated communication satellites exist. In this regard, for Turkey a data relay system is attempted to be defined exchanging low data rate information (i.e. TTC) for Earth-observing LEO satellites appointing commercial GEO communication satellites all over the world. First, justification of this attempt is given, demonstrating duration enhancements in the link. Discussion of preference of RF communication is, also, given instead of laser communication. Then, preferred communication GEOs – including TURKSAT4A already belonging to Turkey- are given, together with the coverage enhancements through STK simulations and the corresponding link budget. Also, a block diagram of the communication system is given on the LEO satellite.Keywords: communication, GEO satellite, data relay system, coverage
Procedia PDF Downloads 44224739 Development of a Risk Disclosure Index and Examination of Its Determinants: An Empirical Study in Indian Context
Authors: M. V. Shivaani, P. K. Jain, Surendra S. Yadav
Abstract:
Worldwide regulators, practitioners and researchers view risk-disclosure as one of the most important steps that will promote corporate accountability and transparency. Recognizing this growing significance of risk disclosures, the paper first develops a risk disclosure index. Covering 69 risk items/themes, this index is developed by employing thematic content analysis and encompasses three attributes of disclosure: namely, nature (qualitative or quantitative), time horizon (backward-looking or forward-looking) and tone (no impact, positive impact or negative impact). As the focus of study is on substantive rather than symbolic disclosure, content analysis has been carried out manually. The study is based on non-financial companies of Nifty500 index and covers a ten year period from April 1, 2005 to March 31, 2015, thus yielding 3,872 annual reports for analysis. The analysis reveals that (on an average) only about 14% of risk items (i.e. about 10 out 69 risk items studied) are being disclosed by Indian companies. Risk items that are frequently disclosed are mostly macroeconomic in nature and their disclosures tend to be qualitative, forward-looking and conveying both positive and negative aspects of the concerned risk. The second objective of the paper is to gauge the factors that affect the level of disclosures in annual reports. Given the panel nature of data, and possible endogeneity amongst variables, Diff-GMM regression has been applied. The results indicate that age and size of firms have a significant positive impact on disclosure quality, whereas growth rate does not have a significant impact. Further, post-recession period (2009-2015) has witnessed significant improvement in quality of disclosures. In terms of corporate governance variables, board size, board independence, CEO duality, presence of CRO and constitution of risk management committee appear to be significant factors in determining the quality of risk disclosures. It is noteworthy that the study contributes to literature by putting forth a variant to existing disclosure indices that not only captures the quantity but also the quality of disclosures (in terms of semantic attributes). Also, the study is a first of its kind attempt in a prominent emerging market i.e. India. Therefore, this study is expected to facilitate regulators in mandating and regulating risk disclosures and companies in their endeavor to reduce information asymmetry.Keywords: risk disclosure, voluntary disclosures, corporate governance, Diff-GMM
Procedia PDF Downloads 16324738 The Development of Encrypted Near Field Communication Data Exchange Format Transmission in an NFC Passive Tag for Checking the Genuine Product
Authors: Tanawat Hongthai, Dusit Thanapatay
Abstract:
This paper presents the development of encrypted near field communication (NFC) data exchange format transmission in an NFC passive tag for the feasibility of implementing a genuine product authentication. We propose a research encryption and checking the genuine product into four major categories; concept, infrastructure, development and applications. This result shows the passive NFC-forum Type 2 tag can be configured to be compatible with the NFC data exchange format (NDEF), which can be automatically partially data updated when there is NFC field.Keywords: near field communication, NFC data exchange format, checking the genuine product, encrypted NFC
Procedia PDF Downloads 28124737 Data Hiding by Vector Quantization in Color Image
Authors: Yung Gi Wu
Abstract:
With the growing of computer and network, digital data can be spread to anywhere in the world quickly. In addition, digital data can also be copied or tampered easily so that the security issue becomes an important topic in the protection of digital data. Digital watermark is a method to protect the ownership of digital data. Embedding the watermark will influence the quality certainly. In this paper, Vector Quantization (VQ) is used to embed the watermark into the image to fulfill the goal of data hiding. This kind of watermarking is invisible which means that the users will not conscious the existing of embedded watermark even though the embedded image has tiny difference compared to the original image. Meanwhile, VQ needs a lot of computation burden so that we adopt a fast VQ encoding scheme by partial distortion searching (PDS) and mean approximation scheme to speed up the data hiding process. The watermarks we hide to the image could be gray, bi-level and color images. Texts are also can be regarded as watermark to embed. In order to test the robustness of the system, we adopt Photoshop to fulfill sharpen, cropping and altering to check if the extracted watermark is still recognizable. Experimental results demonstrate that the proposed system can resist the above three kinds of tampering in general cases.Keywords: data hiding, vector quantization, watermark, color image
Procedia PDF Downloads 36424736 Anomaly Detection in a Data Center with a Reconstruction Method Using a Multi-Autoencoders Model
Authors: Victor Breux, Jérôme Boutet, Alain Goret, Viviane Cattin
Abstract:
Early detection of anomalies in data centers is important to reduce downtimes and the costs of periodic maintenance. However, there is little research on this topic and even fewer on the fusion of sensor data for the detection of abnormal events. The goal of this paper is to propose a method for anomaly detection in data centers by combining sensor data (temperature, humidity, power) and deep learning models. The model described in the paper uses one autoencoder per sensor to reconstruct the inputs. The auto-encoders contain Long-Short Term Memory (LSTM) layers and are trained using the normal samples of the relevant sensors selected by correlation analysis. The difference signal between the input and its reconstruction is then used to classify the samples using feature extraction and a random forest classifier. The data measured by the sensors of a data center between January 2019 and May 2020 are used to train the model, while the data between June 2020 and May 2021 are used to assess it. Performances of the model are assessed a posteriori through F1-score by comparing detected anomalies with the data center’s history. The proposed model outperforms the state-of-the-art reconstruction method, which uses only one autoencoder taking multivariate sequences and detects an anomaly with a threshold on the reconstruction error, with an F1-score of 83.60% compared to 24.16%.Keywords: anomaly detection, autoencoder, data centers, deep learning
Procedia PDF Downloads 19424735 Integration Process and Analytic Interface of different Environmental Open Data Sets with Java/Oracle and R
Authors: Pavel H. Llamocca, Victoria Lopez
Abstract:
The main objective of our work is the comparative analysis of environmental data from Open Data bases, belonging to different governments. This means that you have to integrate data from various different sources. Nowadays, many governments have the intention of publishing thousands of data sets for people and organizations to use them. In this way, the quantity of applications based on Open Data is increasing. However each government has its own procedures to publish its data, and it causes a variety of formats of data sets because there are no international standards to specify the formats of the data sets from Open Data bases. Due to this variety of formats, we must build a data integration process that is able to put together all kind of formats. There are some software tools developed in order to give support to the integration process, e.g. Data Tamer, Data Wrangler. The problem with these tools is that they need data scientist interaction to take part in the integration process as a final step. In our case we don’t want to depend on a data scientist, because environmental data are usually similar and these processes can be automated by programming. The main idea of our tool is to build Hadoop procedures adapted to data sources per each government in order to achieve an automated integration. Our work focus in environment data like temperature, energy consumption, air quality, solar radiation, speeds of wind, etc. Since 2 years, the government of Madrid is publishing its Open Data bases relative to environment indicators in real time. In the same way, other governments have published Open Data sets relative to the environment (like Andalucia or Bilbao). But all of those data sets have different formats and our solution is able to integrate all of them, furthermore it allows the user to make and visualize some analysis over the real-time data. Once the integration task is done, all the data from any government has the same format and the analysis process can be initiated in a computational better way. So the tool presented in this work has two goals: 1. Integration process; and 2. Graphic and analytic interface. As a first approach, the integration process was developed using Java and Oracle and the graphic and analytic interface with Java (jsp). However, in order to open our software tool, as second approach, we also developed an implementation with R language as mature open source technology. R is a really powerful open source programming language that allows us to process and analyze a huge amount of data with high performance. There are also some R libraries for the building of a graphic interface like shiny. A performance comparison between both implementations was made and no significant differences were found. In addition, our work provides with an Official Real-Time Integrated Data Set about Environment Data in Spain to any developer in order that they can build their own applications.Keywords: open data, R language, data integration, environmental data
Procedia PDF Downloads 31524734 Transforming Data into Knowledge: Mathematical and Statistical Innovations in Data Analytics
Authors: Zahid Ullah, Atlas Khan
Abstract:
The rapid growth of data in various domains has created a pressing need for effective methods to transform this data into meaningful knowledge. In this era of big data, mathematical and statistical innovations play a crucial role in unlocking insights and facilitating informed decision-making in data analytics. This abstract aims to explore the transformative potential of these innovations and their impact on converting raw data into actionable knowledge. Drawing upon a comprehensive review of existing literature, this research investigates the cutting-edge mathematical and statistical techniques that enable the conversion of data into knowledge. By evaluating their underlying principles, strengths, and limitations, we aim to identify the most promising innovations in data analytics. To demonstrate the practical applications of these innovations, real-world datasets will be utilized through case studies or simulations. This empirical approach will showcase how mathematical and statistical innovations can extract patterns, trends, and insights from complex data, enabling evidence-based decision-making across diverse domains. Furthermore, a comparative analysis will be conducted to assess the performance, scalability, interpretability, and adaptability of different innovations. By benchmarking against established techniques, we aim to validate the effectiveness and superiority of the proposed mathematical and statistical innovations in data analytics. Ethical considerations surrounding data analytics, such as privacy, security, bias, and fairness, will be addressed throughout the research. Guidelines and best practices will be developed to ensure the responsible and ethical use of mathematical and statistical innovations in data analytics. The expected contributions of this research include advancements in mathematical and statistical sciences, improved data analysis techniques, enhanced decision-making processes, and practical implications for industries and policymakers. The outcomes will guide the adoption and implementation of mathematical and statistical innovations, empowering stakeholders to transform data into actionable knowledge and drive meaningful outcomes.Keywords: data analytics, mathematical innovations, knowledge extraction, decision-making
Procedia PDF Downloads 7524733 FCNN-MR: A Parallel Instance Selection Method Based on Fast Condensed Nearest Neighbor Rule
Authors: Lu Si, Jie Yu, Shasha Li, Jun Ma, Lei Luo, Qingbo Wu, Yongqi Ma, Zhengji Liu
Abstract:
Instance selection (IS) technique is used to reduce the data size to improve the performance of data mining methods. Recently, to process very large data set, several proposed methods divide the training set into some disjoint subsets and apply IS algorithms independently to each subset. In this paper, we analyze the limitation of these methods and give our viewpoint about how to divide and conquer in IS procedure. Then, based on fast condensed nearest neighbor (FCNN) rule, we propose a large data sets instance selection method with MapReduce framework. Besides ensuring the prediction accuracy and reduction rate, it has two desirable properties: First, it reduces the work load in the aggregation node; Second and most important, it produces the same result with the sequential version, which other parallel methods cannot achieve. We evaluate the performance of FCNN-MR on one small data set and two large data sets. The experimental results show that it is effective and practical.Keywords: instance selection, data reduction, MapReduce, kNN
Procedia PDF Downloads 25424732 Cost Effective Microfabrication Technique for Lab on Chip (LOC) Devices Using Epoxy Polymers
Authors: Charmi Chande, Ravindra Phadke
Abstract:
Microfluidics devices are fabricated by using multiple fabrication methods. Photolithography is one of the common methods wherein SU8 is widely used for making master which in turn is used for making working chip by the process of soft lithography. The high-aspect ratio features of SU-8 makes it suitable to be used as micro moulds for injection moulding, hot embossing, and moulds to form polydimethylsiloxane (PDMS) structures for bioMEMS (Microelectromechanical systems) applications. But due to high cost, difficulty in procuring and need for clean room, restricts the use of this polymer especially in developing countries and small research labs. ‘Bisphenol –A’ based polymers in mixture with curing agent are used in various industries like Paints and coatings, Adhesives, Electrical systems and electronics, Industrial tooling and composites. We present the novel use of ‘Bisphenol – A’ based polymer in fabricating micro channels for Lab On Chip(LOC) devices. The present paper describes the prototype for production of microfluidics chips using range of ‘Bisphenol-A’ based polymers viz. GY 250, ATUL B11, DER 331, DER 330 in mixture with cationic photo initiators. All the steps of chip production were carried out using an inexpensive approach that uses low cost chemicals and equipment. This even excludes the need of clean room. The produced chips using all above mentioned polymers were validated with respect to height and the chip giving least height was selected for further experimentation. The lowest height achieved was 7 micrometers by GY250. The cost of the master fabricated was $ 0.20 and working chip was $. 0.22. The best working chip was used for morphological identification and profiling of microorganisms from environmental samples like soil, marine water and salt water pan sites. The current chip can be adapted for various microbiological screening experiments like biochemical based microbial identification, studying uncultivable microorganisms at single cell/community level.Keywords: bisphenol–A based epoxy, cationic photoinitiators, microfabrication, photolithography
Procedia PDF Downloads 28724731 A Design Framework for an Open Market Platform of Enriched Card-Based Transactional Data for Big Data Analytics and Open Banking
Authors: Trevor Toy, Josef Langerman
Abstract:
Around a quarter of the world’s data is generated by financial with an estimated 708.5 billion global non-cash transactions reached between 2018 and. And with Open Banking still a rapidly developing concept within the financial industry, there is an opportunity to create a secure mechanism for connecting its stakeholders to openly, legitimately and consensually share the data required to enable it. Integration and data sharing of anonymised transactional data are still operated in silos and centralised between the large corporate entities in the ecosystem that have the resources to do so. Smaller fintechs generating data and businesses looking to consume data are largely excluded from the process. Therefore there is a growing demand for accessible transactional data for analytical purposes and also to support the rapid global adoption of Open Banking. The following research has provided a solution framework that aims to provide a secure decentralised marketplace for 1.) data providers to list their transactional data, 2.) data consumers to find and access that data, and 3.) data subjects (the individuals making the transactions that generate the data) to manage and sell the data that relates to themselves. The platform also provides an integrated system for downstream transactional-related data from merchants, enriching the data product available to build a comprehensive view of a data subject’s spending habits. A robust and sustainable data market can be developed by providing a more accessible mechanism for data producers to monetise their data investments and encouraging data subjects to share their data through the same financial incentives. At the centre of the platform is the market mechanism that connects the data providers and their data subjects to the data consumers. This core component of the platform is developed on a decentralised blockchain contract with a market layer that manages transaction, user, pricing, payment, tagging, contract, control, and lineage features that pertain to the user interactions on the platform. One of the platform’s key features is enabling the participation and management of personal data by the individuals from whom the data is being generated. This framework developed a proof-of-concept on the Etheruem blockchain base where an individual can securely manage access to their own personal data and that individual’s identifiable relationship to the card-based transaction data provided by financial institutions. This gives data consumers access to a complete view of transactional spending behaviour in correlation to key demographic information. This platform solution can ultimately support the growth, prosperity, and development of economies, businesses, communities, and individuals by providing accessible and relevant transactional data for big data analytics and open banking.Keywords: big data markets, open banking, blockchain, personal data management
Procedia PDF Downloads 7424730 Experimental Evaluation of Succinct Ternary Tree
Authors: Dmitriy Kuptsov
Abstract:
Tree data structures, such as binary or in general k-ary trees, are essential in computer science. The applications of these data structures can range from data search and retrieval to sorting and ranking algorithms. Naive implementations of these data structures can consume prohibitively large volumes of random access memory limiting their applicability in certain solutions. Thus, in these cases, more advanced representation of these data structures is essential. In this paper we present the design of the compact version of ternary tree data structure and demonstrate the results for the experimental evaluation using static dictionary problem. We compare these results with the results for binary and regular ternary trees. The conducted evaluation study shows that our design, in the best case, consumes up to 12 times less memory (for the dictionary used in our experimental evaluation) than a regular ternary tree and in certain configuration shows performance comparable to regular ternary trees. We have evaluated the performance of the algorithms using both 32 and 64 bit operating systems.Keywords: algorithms, data structures, succinct ternary tree, per- formance evaluation
Procedia PDF Downloads 16024729 Predicting Data Center Resource Usage Using Quantile Regression to Conserve Energy While Fulfilling the Service Level Agreement
Authors: Ahmed I. Alutabi, Naghmeh Dezhabad, Sudhakar Ganti
Abstract:
Data centers have been growing in size and dema nd continuously in the last two decades. Planning for the deployment of resources has been shallow and always resorted to over-provisioning. Data center operators try to maximize the availability of their services by allocating multiple of the needed resources. One resource that has been wasted, with little thought, has been energy. In recent years, programmable resource allocation has paved the way to allow for more efficient and robust data centers. In this work, we examine the predictability of resource usage in a data center environment. We use a number of models that cover a wide spectrum of machine learning categories. Then we establish a framework to guarantee the client service level agreement (SLA). Our results show that using prediction can cut energy loss by up to 55%.Keywords: machine learning, artificial intelligence, prediction, data center, resource allocation, green computing
Procedia PDF Downloads 10924728 Prosperous Digital Image Watermarking Approach by Using DCT-DWT
Authors: Prabhakar C. Dhavale, Meenakshi M. Pawar
Abstract:
In this paper, everyday tons of data is embedded on digital media or distributed over the internet. The data is so distributed that it can easily be replicated without error, putting the rights of their owners at risk. Even when encrypted for distribution, data can easily be decrypted and copied. One way to discourage illegal duplication is to insert information known as watermark, into potentially valuable data in such a way that it is impossible to separate the watermark from the data. These challenges motivated researchers to carry out intense research in the field of watermarking. A watermark is a form, image or text that is impressed onto paper, which provides evidence of its authenticity. Digital watermarking is an extension of the same concept. There are two types of watermarks visible watermark and invisible watermark. In this project, we have concentrated on implementing watermark in image. The main consideration for any watermarking scheme is its robustness to various attacksKeywords: watermarking, digital, DCT-DWT, security
Procedia PDF Downloads 42324727 Machine Learning Data Architecture
Authors: Neerav Kumar, Naumaan Nayyar, Sharath Kashyap
Abstract:
Most companies see an increase in the adoption of machine learning (ML) applications across internal and external-facing use cases. ML applications vend output either in batch or real-time patterns. A complete batch ML pipeline architecture comprises data sourcing, feature engineering, model training, model deployment, model output vending into a data store for downstream application. Due to unclear role expectations, we have observed that scientists specializing in building and optimizing models are investing significant efforts into building the other components of the architecture, which we do not believe is the best use of scientists’ bandwidth. We propose a system architecture created using AWS services that bring industry best practices to managing the workflow and simplifies the process of model deployment and end-to-end data integration for an ML application. This narrows down the scope of scientists’ work to model building and refinement while specialized data engineers take over the deployment, pipeline orchestration, data quality, data permission system, etc. The pipeline infrastructure is built and deployed as code (using terraform, cdk, cloudformation, etc.) which makes it easy to replicate and/or extend the architecture to other models that are used in an organization.Keywords: data pipeline, machine learning, AWS, architecture, batch machine learning
Procedia PDF Downloads 6524726 Methylation Profiling and Validation of Candidate Tissue-Specific Differentially Methylated Regions for Identification of Human Blood, Saliva, Semen and Vaginal Fluid and Its Application in Forensics
Authors: Meenu Joshi, Natalie Naidoo, Farzeen Kader
Abstract:
Identification of body fluids is an essential step in forensic investigation to aid in crime reconstruction. Tissue-specific differentially methylated regions (tDMRs) of the human genome can be targeted to be used as biomarkers to differentiate between body fluids. The present study was undertaken to establish the methylation status of potential tDMRs in blood, semen, saliva, and vaginal fluid by using methylation-specific PCR (MSP) and bisulfite sequencing (BS). The methylation statuses of 3 potential tDMRS in genes ZNF282, PTPRS, and HPCAL1 were analysed in 10 samples of each body fluid. With MSP analysis, the ZNF282, and PTPRS1 tDMR displayed semen-specific hypomethylation while HPCAL1 tDMR showed saliva-specific hypomethylation. With quantitative analysis by BS, the ZNF282 tDMR showed statistically significant difference in overall methylation between semen and all other body fluids as well as at individual CpG sites (p < 0.05). To evaluate the effect of environmental conditions on the stability of methylation profiles of the ZNF282 tDMR, five samples of each body fluid were subjected to five different forensic simulated conditions (dry at room temperature, wet in an exsiccator, outside on the ground, sprayed with alcohol, and sprayed with bleach) for 50 days. Vaginal fluid showed highest DNA recovery under all conditions while semen had least DNA quantity. Under outside on the ground condition, all body fluids except semen showed a decrease in methylation level; however, a significant decrease in methylation level was observed for saliva. A statistical significant difference was observed for saliva and semen (p < 0.05) for outside on the ground condition. No differences in methylation level were observed for the ZNF282 tDMR under all conditions for vaginal fluid samples. Thus, in the present study ZNF282 tDMR has been identified as a novel and stable semen-specific hypomethylation marker.Keywords: body fluids, bisulphite sequencing, forensics, tDMRs, MSP
Procedia PDF Downloads 16324725 Evaluation of Genetic Fidelity and Phytochemical Profiling of Micropropagated Plants of Cephalantheropsis obcordata: An Endangered Medicinal Orchid
Authors: Gargi Prasad, Ashiho A. Mao, Deepu Vijayan, S. Mandal
Abstract:
The main objective of the present study was to optimize and develop an efficient protocol for in vitro propagation of a medicinally important orchid Cephalantheropsis obcordata (Lindl.) Ormerod along with genetic stability analysis of regenerated plants. This plant has been traditionally used in Chinese folk medicine and the decoction of whole plant is known to possess anticancer activity. Nodal segments used as explants were inoculated on Murashige and Skoog (MS) medium supplemented with various concentrations of isopentenyl adenine (2iP). The rooted plants were successfully acclimatized in the greenhouse with 100% survival rate. Inter-simple sequence repeats (ISSR) markers were used to assess the genetic fidelity of in vitro raised plants and the mother plant. It was revealed that monomorphic bands showing the absence of polymorphism in all in vitro raised plantlets analyzed, confirming the genetic uniformity among the regenerants. Phytochemical analysis was done to compare the antioxidant activities and HPLC fingerprinting assay of 80% aqueous ethanol extract of the leaves and stem of in vitro and in vivo grown C. obcordata. The extracts of the plants were examined for their antioxidant activities by using free radical 1, 1-diphenyl-2-picryl hydrazyl (DPPH) scavenging method, 2,2’-azino-bis (3-ethylbenzothiazoline-6-sulfonic acid) (ABTS) radical scavenging ability, reducing power capacity, estimation of total phenolic content, flavonoid content and flavonol content. A simplified method for the detection of ascorbic acid, phenolic acids and flavonoids content was also developed by using reversed phase high-performance liquid chromatography (HPLC). This is the first report on the micropropagation, genetic integrity study and quantitative phytochemical analysis of in vitro regenerated plants of C. obcordata.Keywords: Cephalantheropsis obcordata, genetic fidelity, ISSR markers, HPLC
Procedia PDF Downloads 15724724 A Comparison of Image Data Representations for Local Stereo Matching
Authors: André Smith, Amr Abdel-Dayem
Abstract:
The stereo matching problem, while having been present for several decades, continues to be an active area of research. The goal of this research is to find correspondences between elements found in a set of stereoscopic images. With these pairings, it is possible to infer the distance of objects within a scene, relative to the observer. Advancements in this field have led to experimentations with various techniques, from graph-cut energy minimization to artificial neural networks. At the basis of these techniques is a cost function, which is used to evaluate the likelihood of a particular match between points in each image. While at its core, the cost is based on comparing the image pixel data; there is a general lack of consistency as to what image data representation to use. This paper presents an experimental analysis to compare the effectiveness of more common image data representations. The goal is to determine the effectiveness of these data representations to reduce the cost for the correct correspondence relative to other possible matches.Keywords: colour data, local stereo matching, stereo correspondence, disparity map
Procedia PDF Downloads 37124723 Business-Intelligence Mining of Large Decentralized Multimedia Datasets with a Distributed Multi-Agent System
Authors: Karima Qayumi, Alex Norta
Abstract:
The rapid generation of high volume and a broad variety of data from the application of new technologies pose challenges for the generation of business-intelligence. Most organizations and business owners need to extract data from multiple sources and apply analytical methods for the purposes of developing their business. Therefore, the recently decentralized data management environment is relying on a distributed computing paradigm. While data are stored in highly distributed systems, the implementation of distributed data-mining techniques is a challenge. The aim of this technique is to gather knowledge from every domain and all the datasets stemming from distributed resources. As agent technologies offer significant contributions for managing the complexity of distributed systems, we consider this for next-generation data-mining processes. To demonstrate agent-based business intelligence operations, we use agent-oriented modeling techniques to develop a new artifact for mining massive datasets.Keywords: agent-oriented modeling (AOM), business intelligence model (BIM), distributed data mining (DDM), multi-agent system (MAS)
Procedia PDF Downloads 43224722 Timing and Noise Data Mining Algorithm and Software Tool in Very Large Scale Integration (VLSI) Design
Authors: Qing K. Zhu
Abstract:
Very Large Scale Integration (VLSI) design becomes very complex due to the continuous integration of millions of gates in one chip based on Moore’s law. Designers have encountered numerous report files during design iterations using timing and noise analysis tools. This paper presented our work using data mining techniques combined with HTML tables to extract and represent critical timing/noise data. When we apply this data-mining tool in real applications, the running speed is important. The software employs table look-up techniques in the programming for the reasonable running speed based on performance testing results. We added several advanced features for the application in one industry chip design.Keywords: VLSI design, data mining, big data, HTML forms, web, VLSI, EDA, timing, noise
Procedia PDF Downloads 25424721 Introduction of Electronic Health Records to Improve Data Quality in Emergency Department Operations
Authors: Anuruddha Jagoda, Samiddhi Samarakoon, Anil Jasinghe
Abstract:
In its simplest form, data quality can be defined as 'fitness for use' and it is a concept with multi-dimensions. Emergency Departments(ED) require information to treat patients and on the other hand it is the primary source of information regarding accidents, injuries, emergencies etc. Also, it is the starting point of various patient registries, databases and surveillance systems. This interventional study was carried out to improve data quality at the ED of the National Hospital of Sri Lanka (NHSL) by introducing an e health solution to improve data quality. The NHSL is the premier trauma care centre in Sri Lanka. The study consisted of three components. A research study was conducted to assess the quality of data in relation to selected five dimensions of data quality namely accuracy, completeness, timeliness, legibility and reliability. The intervention was to develop and deploy an electronic emergency department information system (eEDIS). Post assessment of the intervention confirmed that all five dimensions of data quality had improved. The most significant improvements are noticed in accuracy and timeliness dimensions.Keywords: electronic health records, electronic emergency department information system, emergency department, data quality
Procedia PDF Downloads 27624720 Data Presentation of Lane-Changing Events Trajectories Using HighD Dataset
Authors: Basma Khelfa, Antoine Tordeux, Ibrahima Ba
Abstract:
We present a descriptive analysis data of lane-changing events in multi-lane roads. The data are provided from The Highway Drone Dataset (HighD), which are microscopic trajectories in highway. This paper describes and analyses the role of the different parameters and their significance. Thanks to HighD data, we aim to find the most frequent reasons that motivate drivers to change lanes. We used the programming language R for the processing of these data. We analyze the involvement and relationship of different variables of each parameter of the ego vehicle and the four vehicles surrounding it, i.e., distance, speed difference, time gap, and acceleration. This was studied according to the class of the vehicle (car or truck), and according to the maneuver it undertook (overtaking or falling back).Keywords: autonomous driving, physical traffic model, prediction model, statistical learning process
Procedia PDF Downloads 26224719 Evaluation of Golden Beam Data for the Commissioning of 6 and 18 MV Photons Beams in Varian Linear Accelerator
Authors: Shoukat Ali, Abdul Qadir Jandga, Amjad Hussain
Abstract:
Objective: The main purpose of this study is to compare the Percent Depth dose (PDD) and In-plane and cross-plane profiles of Varian Golden beam data to the measured data of 6 and 18 MV photons for the commissioning of Eclipse treatment planning system. Introduction: Commissioning of treatment planning system requires an extensive acquisition of beam data for the clinical use of linear accelerators. Accurate dose delivery require to enter the PDDs, Profiles and dose rate tables for open and wedges fields into treatment planning system, enabling to calculate the MUs and dose distribution. Varian offers a generic set of beam data as a reference data, however not recommend for clinical use. In this study, we compared the generic beam data with the measured beam data to evaluate the reliability of generic beam data to be used for the clinical purpose. Methods and Material: PDDs and Profiles of Open and Wedge fields for different field sizes and at different depths measured as per Varian’s algorithm commissioning guideline. The measurement performed with PTW 3D-scanning water phantom with semi-flex ion chamber and MEPHYSTO software. The online available Varian Golden Beam Data compared with the measured data to evaluate the accuracy of the golden beam data to be used for the commissioning of Eclipse treatment planning system. Results: The deviation between measured vs. golden beam data was in the range of 2% max. In PDDs, the deviation increases more in the deeper depths than the shallower depths. Similarly, profiles have the same trend of increasing deviation at large field sizes and increasing depths. Conclusion: Study shows that the percentage deviation between measured and golden beam data is within the acceptable tolerance and therefore can be used for the commissioning process; however, verification of small subset of acquired data with the golden beam data should be mandatory before clinical use.Keywords: percent depth dose, flatness, symmetry, golden beam data
Procedia PDF Downloads 49024718 Variable-Fidelity Surrogate Modelling with Kriging
Authors: Selvakumar Ulaganathan, Ivo Couckuyt, Francesco Ferranti, Tom Dhaene, Eric Laermans
Abstract:
Variable-fidelity surrogate modelling offers an efficient way to approximate function data available in multiple degrees of accuracy each with varying computational cost. In this paper, a Kriging-based variable-fidelity surrogate modelling approach is introduced to approximate such deterministic data. Initially, individual Kriging surrogate models, which are enhanced with gradient data of different degrees of accuracy, are constructed. Then these Gradient enhanced Kriging surrogate models are strategically coupled using a recursive CoKriging formulation to provide an accurate surrogate model for the highest fidelity data. While, intuitively, gradient data is useful to enhance the accuracy of surrogate models, the primary motivation behind this work is to investigate if it is also worthwhile incorporating gradient data of varying degrees of accuracy.Keywords: Kriging, CoKriging, Surrogate modelling, Variable- fidelity modelling, Gradients
Procedia PDF Downloads 55824717 Robust Barcode Detection with Synthetic-to-Real Data Augmentation
Authors: Xiaoyan Dai, Hsieh Yisan
Abstract:
Barcode processing of captured images is a huge challenge, as different shooting conditions can result in different barcode appearances. This paper proposes a deep learning-based barcode detection using synthetic-to-real data augmentation. We first augment barcodes themselves; we then augment images containing the barcodes to generate a large variety of data that is close to the actual shooting environments. Comparisons with previous works and evaluations with our original data show that this approach achieves state-of-the-art performance in various real images. In addition, the system uses hybrid resolution for barcode “scan” and is applicable to real-time applications.Keywords: barcode detection, data augmentation, deep learning, image-based processing
Procedia PDF Downloads 17324716 The Impact of the Fitness Center Ownership Structure on the Service Quality Perception in the Fitness in Serbia
Authors: Dragan Zivotic, Mirjana Ilic, Aleksandra Perovic, Predrag Gavrilovic
Abstract:
As with the provision of other services, the service quality perception is one of the key factors that the modern manager must pay attention to. Countries in which the state regulation is in transition also have specific features in providing fitness services. Identification of the dimensions in which the most significant different service quality perception between different types of fitness centers, enables managers to profile the offer according to the wishes and expectations of users. The aim of the paper was the comparison of the quality of services perception in the field of fitness in Serbia between three categories of fitness centers: the privately owned centers, the publicly owned centers, and the Public-private partnership centers. For this research 350 respondents of both genders (174 men and 176 women) were interviewed, aged between 18 and 68 years, being beneficiaries of fitness services for at least 1 year. Administered questionnaire with 100 items provided information about the 15 basic areas in which they expressed the service quality perception in the gym. The core sample was composed of 212 service users in private fitness centers, 69 service users in public fitness centers and 69 service users in the public-private partnership. Sub-samples were equal in representation of women and men, as well as by age and length of use of fitness services. The obtained results were subject of univariate analysis with the Kruskal-Wallis non-parametric analysis of variance. Significant differences between the analyzed sub-samples were not found solely in the areas of rapid response and quality outcomes. In the multivariate model, the results were processed by backward stepwise discriminant analysis that extracted 3 areas that maximize the differences between sub-samples: material and technical basis, secondary facilities and coaches. By applying the classification function 93.87% of private centers services users, 62.32% of public centers services users and 85.51% of the public-private partnership centers users of services were correctly classified (total 86.00%). These results allow optimizing the allocation of the necessary resources in profiling offers of a fitness center in order to optimally adjust it to the user’s needs and expectations.Keywords: fitness, quality perception, management, public ownership, private ownership, public-private partnership, discriminative analysis
Procedia PDF Downloads 294