Search results for: maximal data sets
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25281

Search results for: maximal data sets

24531 Bringing Together Student Collaboration and Research Opportunities to Promote Scientific Understanding and Outreach Through a Seismological Community

Authors: Michael Ray Brunt

Abstract:

China has been the site of some of the most significant earthquakes in history; however, earthquake monitoring has long been the provenance of universities and research institutions. The China Digital Seismographic Network was initiated in 1983 and improved significantly during 1992-1993. Data from the CDSN is widely used by government and research institutions, and, generally, this data is not readily accessible to middle and high school students. An educational seismic network in China is needed to provide collaboration and research opportunities for students and engaging students around the country in scientific understanding of earthquake hazards and risks while promoting community awareness. In 2022, the Tsinghua International School (THIS) Seismology Team, made up of enthusiastic students and facilitated by two experienced teachers, was established. As a group, the team’s objective is to install seismographs in schools throughout China, thus creating an educational seismic network that shares data from the THIS Educational Seismic Network (THIS-ESN) and facilitates collaboration. The THIS-ESN initiative will enhance education and outreach in China about earthquake risks and hazards, introduce seismology to a wider audience, stimulate interest in research among students, and develop students’ programming, data collection and analysis skills. It will also encourage and inspire young minds to pursue science, technology, engineering, the arts, and math (STEAM) career fields. The THIS-ESN utilizes small, low-cost RaspberryShake seismographs as a powerful tool linked into a global network, giving schools and the public access to real-time seismic data from across China, increasing earthquake monitoring capabilities in the perspective areas and adding to the available data sets regionally and worldwide helping create a denser seismic network. The RaspberryShake seismograph is compatible with free seismic data viewing platforms such as SWARM, RaspberryShake web programs and mobile apps are designed specifically towards teaching seismology and seismic data interpretation, providing opportunities to enhance understanding. The RaspberryShake is powered by an operating system embedded in the Raspberry Pi, which makes it an easy platform to teach students basic computer communication concepts by utilizing processing tools to investigate, plot, and manipulate data. THIS Seismology Team believes strongly in creating opportunities for committed students to become part of the seismological community by engaging in analysis of real-time scientific data with tangible outcomes. Students will feel proud of the important work they are doing to understand the world around them and become advocates spreading their knowledge back into their homes and communities, helping to improve overall community resilience. We trust that, in studying the results seismograph stations yield, students will not only grasp how subjects like physics and computer science apply in real life, and by spreading information, we hope students across the country can appreciate how and why earthquakes bear on their lives, develop practical skills in STEAM, and engage in the global seismic monitoring effort. By providing such an opportunity to schools across the country, we are confident that we will be an agent of change for society.

Keywords: collaboration, outreach, education, seismology, earthquakes, public awareness, research opportunities

Procedia PDF Downloads 58
24530 Utilization of Informatics to Transform Clinical Data into a Simplified Reporting System to Examine the Analgesic Prescribing Practices of a Single Urban Hospital’s Emergency Department

Authors: Rubaiat S. Ahmed, Jemer Garrido, Sergey M. Motov

Abstract:

Clinical informatics (CI) enables the transformation of data into a systematic organization that improves the quality of care and the generation of positive health outcomes.Innovative technology through informatics that compiles accurate data on analgesic utilization in the emergency department can enhance pain management in this important clinical setting. We aim to establish a simplified reporting system through CI to examine and assess the analgesic prescribing practices in the EDthrough executing a U.S. federal grant project on opioid reduction initiatives. Queried data points of interest from a level-one trauma ED’s electronic medical records were used to create data sets and develop informational/visual reporting dashboards (on Microsoft Excel and Google Sheets) concerning analgesic usage across several pre-defined parameters and performance metrics using CI. The data was then qualitatively analyzed to evaluate ED analgesic prescribing trends by departmental clinicians and leadership. During a 12-month reporting period (Dec. 1, 2020 – Nov. 30, 2021) for the ongoing project, about 41% of all ED patient visits (N = 91,747) were for pain conditions, of which 81.6% received analgesics in the ED and at discharge (D/C). Of those treated with analgesics, 24.3% received opioids compared to 75.7% receiving opioid alternatives in the ED and at D/C, including non-pharmacological modalities. Demographics showed among patients receiving analgesics, 56.7% were aged between 18-64, 51.8% were male, 51.7% were white, and 66.2% had government funded health insurance. Ninety-one percent of all opioids prescribed were in the ED, with intravenous (IV) morphine, IV fentanyl, and morphine sulfate immediate release (MSIR) tablets accounting for 88.0% of ED dispensed opioids. With 9.3% of all opioids prescribed at D/C, MSIR was dispensed 72.1% of the time. Hydrocodone, oxycodone, and tramadol usage to only 10-15% of the time, and hydromorphone at 0%. Of opioid alternatives, non-steroidal anti-inflammatory drugs were utilized 60.3% of the time, 23.5% with local anesthetics and ultrasound-guided nerve blocks, and 7.9% with acetaminophen as the primary non-opioid drug categories prescribed by ED providers. Non-pharmacological analgesia included virtual reality and other modalities. An average of 18.5 ED opioid orders and 1.9 opioid D/C prescriptions per 102.4 daily ED patient visits was observed for the period. Compared to other specialties within our institution, 2.0% of opioid D/C prescriptions are given by ED providers, compared to the national average of 4.8%. Opioid alternatives accounted for 69.7% and 30.3% usage, versus 90.7% and 9.3% for opioids in the ED and D/C, respectively.There is a pressing need for concise, relevant, and reliable clinical data on analgesic utilization for ED providers and leadership to evaluate prescribing practices and make data-driven decisions. Basic computer software can be used to create effective visual reporting dashboards with indicators that convey relevant and timely information in an easy-to-digest manner. We accurately examined our ED's analgesic prescribing practices using CI through dashboard reporting. Such reporting tools can quickly identify key performance indicators and prioritize data to enhance pain management and promote safe prescribing practices in the emergency setting.

Keywords: clinical informatics, dashboards, emergency department, health informatics, healthcare informatics, medical informatics, opioids, pain management, technology

Procedia PDF Downloads 129
24529 Genomic Prediction Reliability Using Haplotypes Defined by Different Methods

Authors: Sohyoung Won, Heebal Kim, Dajeong Lim

Abstract:

Genomic prediction is an effective way to measure the abilities of livestock for breeding based on genomic estimated breeding values, statistically predicted values from genotype data using best linear unbiased prediction (BLUP). Using haplotypes, clusters of linked single nucleotide polymorphisms (SNPs), as markers instead of individual SNPs can improve the reliability of genomic prediction since the probability of a quantitative trait loci to be in strong linkage disequilibrium (LD) with markers is higher. To efficiently use haplotypes in genomic prediction, finding optimal ways to define haplotypes is needed. In this study, 770K SNP chip data was collected from Hanwoo (Korean cattle) population consisted of 2506 cattle. Haplotypes were first defined in three different ways using 770K SNP chip data: haplotypes were defined based on 1) length of haplotypes (bp), 2) the number of SNPs, and 3) k-medoids clustering by LD. To compare the methods in parallel, haplotypes defined by all methods were set to have comparable sizes; in each method, haplotypes defined to have an average number of 5, 10, 20 or 50 SNPs were tested respectively. A modified GBLUP method using haplotype alleles as predictor variables was implemented for testing the prediction reliability of each haplotype set. Also, conventional genomic BLUP (GBLUP) method, which uses individual SNPs were tested to evaluate the performance of the haplotype sets on genomic prediction. Carcass weight was used as the phenotype for testing. As a result, using haplotypes defined by all three methods showed increased reliability compared to conventional GBLUP. There were not many differences in the reliability between different haplotype defining methods. The reliability of genomic prediction was highest when the average number of SNPs per haplotype was 20 in all three methods, implying that haplotypes including around 20 SNPs can be optimal to use as markers for genomic prediction. When the number of alleles generated by each haplotype defining methods was compared, clustering by LD generated the least number of alleles. Using haplotype alleles for genomic prediction showed better performance, suggesting improved accuracy in genomic selection. The number of predictor variables was decreased when the LD-based method was used while all three haplotype defining methods showed similar performances. This suggests that defining haplotypes based on LD can reduce computational costs and allows efficient prediction. Finding optimal ways to define haplotypes and using the haplotype alleles as markers can provide improved performance and efficiency in genomic prediction.

Keywords: best linear unbiased predictor, genomic prediction, haplotype, linkage disequilibrium

Procedia PDF Downloads 131
24528 Most Recent Lifespan Estimate for the Itaipu Hydroelectric Power Plant Computed by Using Borland and Miller Method and Mass Balance in Brazil, Paraguay

Authors: Anderson Braga Mendes

Abstract:

Itaipu Hydroelectric Power Plant is settled on the Paraná River, which is a natural boundary between Brazil and Paraguay; thus, the facility is shared by both countries. Itaipu Power Plant is the biggest hydroelectric generator in the world, and provides clean and renewable electrical energy supply for 17% and 76% of Brazil and Paraguay, respectively. The plant started its generation in 1984. It counts on 20 Francis turbines and has installed capacity of 14,000 MWh. Its historic generation record occurred in 2016 (103,098,366 MWh), and since the beginning of its operation until the last day of 2016 the plant has achieved the sum of 2,415,789,823 MWh. The distinct sedimentologic aspects of the drainage area of Itaipu Power Plant, from its stretch upstream (Porto Primavera and Rosana dams) to downstream (Itaipu dam itself), were taken into account in order to best estimate the increase/decrease in the sediment yield by using data from 2001 to 2016. Such data are collected through a network of 14 automatic sedimentometric stations managed by the company itself and operating in an hourly basis, covering an area of around 136,000 km² (92% of the incremental drainage area of the undertaking). Since 1972, a series of lifespan studies for the Itaipu Power Plant have been made, being first assessed by Sir Hans Albert Einstein, at the time of the feasibility studies for the enterprise. From that date onwards, eight further studies were made through the last 44 years aiming to confer more precision upon the estimates based on more updated data sets. From the analysis of each monitoring station, it was clearly noticed strong increase tendencies in the sediment yield through the last 14 years, mainly in the Iguatemi, Ivaí, São Francisco Falso and Carapá Rivers, the latter situated in Paraguay, whereas the others are utterly in Brazilian territory. Five lifespan scenarios considering different sediment yield tendencies were simulated with the aid of the softwares SEDIMENT and DPOSIT, both developed by the author of the present work. Such softwares thoroughly follow the Borland & Miller methodology (empirical method of area-reduction). The soundest scenario out of the five ones under analysis indicated a lifespan foresight of 168 years, being the reservoir only 1.8% silted by the end of 2016, after 32 years of operation. Besides, the mass balance in the reservoir (water inflows minus outflows) between 1986 and 2016 shows that 2% of the whole Itaipu lake is silted nowadays. Owing to the convergence of both results, which were acquired by using different methodologies and independent input data, it is worth concluding that the mathematical modeling is satisfactory and calibrated, thus assigning credibility to this most recent lifespan estimate.

Keywords: Borland and Miller method, hydroelectricity, Itaipu Power Plant, lifespan, mass balance

Procedia PDF Downloads 261
24527 Vertical and Lateral Vibration Analysis of Conventional Elevator

Authors: Mohammadreza Saviz, Sina Najafian

Abstract:

This paper presents an analytical study of vibration moving elevator and shows the elevator 2D dynamic model to evaluate the vertical and lateral motion. Most elevators applied to tall buildings include compensating ropes to satisfy the balanced rope tension between the car and the counterweight. The elasticity of these ropes and springs of sets that connect cabin to ropes make the elevator car to vibrate. A two-dimensional model is derived to calculate vibrations and displacements. The simulation results were validated by the results of similar works.

Keywords: elevator, vibration, simulation, analytical solution, 2D modeling

Procedia PDF Downloads 293
24526 Comparison of Multivariate Adaptive Regression Splines and Random Forest Regression in Predicting Forced Expiratory Volume in One Second

Authors: P. V. Pramila , V. Mahesh

Abstract:

Pulmonary Function Tests are important non-invasive diagnostic tests to assess respiratory impairments and provides quantifiable measures of lung function. Spirometry is the most frequently used measure of lung function and plays an essential role in the diagnosis and management of pulmonary diseases. However, the test requires considerable patient effort and cooperation, markedly related to the age of patients esulting in incomplete data sets. This paper presents, a nonlinear model built using Multivariate adaptive regression splines and Random forest regression model to predict the missing spirometric features. Random forest based feature selection is used to enhance both the generalization capability and the model interpretability. In the present study, flow-volume data are recorded for N= 198 subjects. The ranked order of feature importance index calculated by the random forests model shows that the spirometric features FVC, FEF 25, PEF,FEF 25-75, FEF50, and the demographic parameter height are the important descriptors. A comparison of performance assessment of both models prove that, the prediction ability of MARS with the `top two ranked features namely the FVC and FEF 25 is higher, yielding a model fit of R2= 0.96 and R2= 0.99 for normal and abnormal subjects. The Root Mean Square Error analysis of the RF model and the MARS model also shows that the latter is capable of predicting the missing values of FEV1 with a notably lower error value of 0.0191 (normal subjects) and 0.0106 (abnormal subjects). It is concluded that combining feature selection with a prediction model provides a minimum subset of predominant features to train the model, yielding better prediction performance. This analysis can assist clinicians with a intelligence support system in the medical diagnosis and improvement of clinical care.

Keywords: FEV, multivariate adaptive regression splines pulmonary function test, random forest

Procedia PDF Downloads 296
24525 Application of Artificial Neural Network Technique for Diagnosing Asthma

Authors: Azadeh Bashiri

Abstract:

Introduction: Lack of proper diagnosis and inadequate treatment of asthma leads to physical and financial complications. This study aimed to use data mining techniques and creating a neural network intelligent system for diagnosis of asthma. Methods: The study population is the patients who had visited one of the Lung Clinics in Tehran. Data were analyzed using the SPSS statistical tool and the chi-square Pearson's coefficient was the basis of decision making for data ranking. The considered neural network is trained using back propagation learning technique. Results: According to the analysis performed by means of SPSS to select the top factors, 13 effective factors were selected, in different performances, data was mixed in various forms, so the different models were made for training the data and testing networks and in all different modes, the network was able to predict correctly 100% of all cases. Conclusion: Using data mining methods before the design structure of system, aimed to reduce the data dimension and the optimum choice of the data, will lead to a more accurate system. Therefore, considering the data mining approaches due to the nature of medical data is necessary.

Keywords: asthma, data mining, Artificial Neural Network, intelligent system

Procedia PDF Downloads 260
24524 Interpreting Privacy Harms from a Non-Economic Perspective

Authors: Christopher Muhawe, Masooda Bashir

Abstract:

With increased Internet Communication Technology(ICT), the virtual world has become the new normal. At the same time, there is an unprecedented collection of massive amounts of data by both private and public entities. Unfortunately, this increase in data collection has been in tandem with an increase in data misuse and data breach. Regrettably, the majority of data breach and data misuse claims have been unsuccessful in the United States courts for the failure of proof of direct injury to physical or economic interests. The requirement to express data privacy harms from an economic or physical stance negates the fact that not all data harms are physical or economic in nature. The challenge is compounded by the fact that data breach harms and risks do not attach immediately. This research will use a descriptive and normative approach to show that not all data harms can be expressed in economic or physical terms. Expressing privacy harms purely from an economic or physical harm perspective negates the fact that data insecurity may result into harms which run counter the functions of privacy in our lives. The promotion of liberty, selfhood, autonomy, promotion of human social relations and the furtherance of the existence of a free society. There is no economic value that can be placed on these functions of privacy. The proposed approach addresses data harms from a psychological and social perspective.

Keywords: data breach and misuse, economic harms, privacy harms, psychological harms

Procedia PDF Downloads 178
24523 Toward a Measure of Appropriateness of User Interfaces Adaptations Solutions

Authors: Abderrahim Siam, Ramdane Maamri, Zaidi Sahnoun

Abstract:

The development of adaptive user interfaces (UI) presents for a long time an important research area in which researcher attempt to call upon the full resources and skills of several disciplines. The adaptive UI community holds a thorough knowledge regarding the adaptation of UIs with users and with contexts of use. Several solutions, models, formalisms, techniques, and mechanisms were proposed to develop adaptive UI. In this paper, we propose an approach based on the fuzzy set theory for modeling the concept of the appropriateness of different solutions of UI adaptation with different situations for which interactive systems have to adapt their UIs.

Keywords: adaptive user interfaces, adaptation solution’s appropriateness, fuzzy sets

Procedia PDF Downloads 467
24522 Machine Learning Analysis of Student Success in Introductory Calculus Based Physics I Course

Authors: Chandra Prayaga, Aaron Wade, Lakshmi Prayaga, Gopi Shankar Mallu

Abstract:

This paper presents the use of machine learning algorithms to predict the success of students in an introductory physics course. Data having 140 rows pertaining to the performance of two batches of students was used. The lack of sufficient data to train robust machine learning models was compensated for by generating synthetic data similar to the real data. CTGAN and CTGAN with Gaussian Copula (Gaussian) were used to generate synthetic data, with the real data as input. To check the similarity between the real data and each synthetic dataset, pair plots were made. The synthetic data was used to train machine learning models using the PyCaret package. For the CTGAN data, the Ada Boost Classifier (ADA) was found to be the ML model with the best fit, whereas the CTGAN with Gaussian Copula yielded Logistic Regression (LR) as the best model. Both models were then tested for accuracy with the real data. ROC-AUC analysis was performed for all the ten classes of the target variable (Grades A, A-, B+, B, B-, C+, C, C-, D, F). The ADA model with CTGAN data showed a mean AUC score of 0.4377, but the LR model with the Gaussian data showed a mean AUC score of 0.6149. ROC-AUC plots were obtained for each Grade value separately. The LR model with Gaussian data showed consistently better AUC scores compared to the ADA model with CTGAN data, except in two cases of the Grade value, C- and A-.

Keywords: machine learning, student success, physics course, grades, synthetic data, CTGAN, gaussian copula CTGAN

Procedia PDF Downloads 29
24521 Fuzzy Decision Making to the Construction Project Management: Glass Facade Selection

Authors: Katarina Rogulj, Ivana Racetin, Jelena Kilic

Abstract:

In this study, the fuzzy logic approach (FLA) was developed for construction project management (CPM) under uncertainty and duality. The focus was on decision making in selecting the type of the glass facade for a residential-commercial building in the main design. The adoption of fuzzy sets was capable of reflecting construction managers’ reliability level over subjective judgments, and thus the robustness of the system can be achieved. An α-cuts method was utilized for discretizing the fuzzy sets in FLA. This method can communicate all uncertain information in the optimization process, taking into account the values of this information. Furthermore, FLA provides in-depth analyses of diverse policy scenarios that are related to various levels of economic aspects when it comes to the construction projects' valid decision making. The developed approach is applied to CPM to demonstrate its applicability. Analyzing the materials of glass facades, variants were defined. The development of the FLA for the CPM included relevant construction projec'ts stakeholders that were involved in the criteria definition to evaluate each variant. Using fuzzy Decision-Making Trial and Evaluation Laboratory Method (DEMATEL) comparison of the glass facade was conducted. This way, a rank, according to the priorities for inclusion into the main design, of variants is obtained. The concept was tested on a residential-commercial building in the city of Rijeka, Croatia. The newly developed methodology was then compared with the existing one. The aim of the research was to define an approach that will improve current judgments and decisions when it comes to the material selection of buildings facade as one of the most important architectural and engineering tasks in the main design. The advantage of the new methodology compared to the old one is that it includes the subjective side of the managers’ decisions, as an inevitable factor in each decision making. The proposed approach can help construction projects managers to identify the desired type of glass facade according to their preference and practical conditions, as well as facilitate in-depth analyses of tradeoffs between economic efficiency and architectural design.

Keywords: construction projects management, DEMATEL, fuzzy logic approach, glass façade selection

Procedia PDF Downloads 125
24520 Data Access, AI Intensity, and Scale Advantages

Authors: Chuping Lo

Abstract:

This paper presents a simple model demonstrating that ceteris paribus countries with lower barriers to accessing global data tend to earn higher incomes than other countries. Therefore, large countries that inherently have greater data resources tend to have higher incomes than smaller countries, such that the former may be more hesitant than the latter to liberalize cross-border data flows to maintain this advantage. Furthermore, countries with higher artificial intelligence (AI) intensity in production technologies tend to benefit more from economies of scale in data aggregation, leading to higher income and more trade as they are better able to utilize global data.

Keywords: digital intensity, digital divide, international trade, scale of economics

Procedia PDF Downloads 53
24519 Secured Transmission and Reserving Space in Images Before Encryption to Embed Data

Authors: G. R. Navaneesh, E. Nagarajan, C. H. Rajam Raju

Abstract:

Nowadays the multimedia data are used to store some secure information. All previous methods allocate a space in image for data embedding purpose after encryption. In this paper, we propose a novel method by reserving space in image with a boundary surrounded before encryption with a traditional RDH algorithm, which makes it easy for the data hider to reversibly embed data in the encrypted images. The proposed method can achieve real time performance, that is, data extraction and image recovery are free of any error. A secure transmission process is also discussed in this paper, which improves the efficiency by ten times compared to other processes as discussed.

Keywords: secure communication, reserving room before encryption, least significant bits, image encryption, reversible data hiding

Procedia PDF Downloads 399
24518 Identity Verification Using k-NN Classifiers and Autistic Genetic Data

Authors: Fuad M. Alkoot

Abstract:

DNA data have been used in forensics for decades. However, current research looks at using the DNA as a biometric identity verification modality. The goal is to improve the speed of identification. We aim at using gene data that was initially used for autism detection to find if and how accurate is this data for identification applications. Mainly our goal is to find if our data preprocessing technique yields data useful as a biometric identification tool. We experiment with using the nearest neighbor classifier to identify subjects. Results show that optimal classification rate is achieved when the test set is corrupted by normally distributed noise with zero mean and standard deviation of 1. The classification rate is close to optimal at higher noise standard deviation reaching 3. This shows that the data can be used for identity verification with high accuracy using a simple classifier such as the k-nearest neighbor (k-NN). 

Keywords: biometrics, genetic data, identity verification, k nearest neighbor

Procedia PDF Downloads 238
24517 A Review on Intelligent Systems for Geoscience

Authors: R Palson Kennedy, P.Kiran Sai

Abstract:

This article introduces machine learning (ML) researchers to the hurdles that geoscience problems present, as well as the opportunities for improvement in both ML and geosciences. This article presents a review from the data life cycle perspective to meet that need. Numerous facets of geosciences present unique difficulties for the study of intelligent systems. Geosciences data is notoriously difficult to analyze since it is frequently unpredictable, intermittent, sparse, multi-resolution, and multi-scale. The first half addresses data science’s essential concepts and theoretical underpinnings, while the second section contains key themes and sharing experiences from current publications focused on each stage of the data life cycle. Finally, themes such as open science, smart data, and team science are considered.

Keywords: Data science, intelligent system, machine learning, big data, data life cycle, recent development, geo science

Procedia PDF Downloads 123
24516 Effect of the Aluminum Fraction “X” on the Laser Wavelengths in GaAs/AlxGa1-xAs Superlattices

Authors: F.Bendahma, S.Bentata

Abstract:

In this paper, we study numerically the eigenstates existing in a GaAs/AlxGa1-xAs superlattice with structural disorder in trimer height barrier (THB). Aluminium concentration x takes at random two different values, one of them appears only in triply and remains inferior to the second in the studied structure. In spite of the presence of disorder, the system exhibits two kinds of sets of propagating states lying below the barrier due to the characteristic structure of the superlattice. This result allows us to note the existence of a single laser emission in trimer and wavelengths are obtained in the mid-infrared.

Keywords: infrared (IR), laser emission, superlattice, trimer

Procedia PDF Downloads 435
24515 Brain Tumor Segmentation Based on Minimum Spanning Tree

Authors: Simeon Mayala, Ida Herdlevær, Jonas Bull Haugsøen, Shamundeeswari Anandan, Sonia Gavasso, Morten Brun

Abstract:

In this paper, we propose a minimum spanning tree-based method for segmenting brain tumors. The proposed method performs interactive segmentation based on the minimum spanning tree without tuning parameters. The steps involve preprocessing, making a graph, constructing a minimum spanning tree, and a newly implemented way of interactively segmenting the region of interest. In the preprocessing step, a Gaussian filter is applied to 2D images to remove the noise. Then, the pixel neighbor graph is weighted by intensity differences and the corresponding minimum spanning tree is constructed. The image is loaded in an interactive window for segmenting the tumor. The region of interest and the background are selected by clicking to split the minimum spanning tree into two trees. One of these trees represents the region of interest and the other represents the background. Finally, the segmentation given by the two trees is visualized. The proposed method was tested by segmenting two different 2D brain T1-weighted magnetic resonance image data sets. The comparison between our results and the standard gold segmentation confirmed the validity of the minimum spanning tree approach. The proposed method is simple to implement and the results indicate that it is accurate and efficient.

Keywords: brain tumor, brain tumor segmentation, minimum spanning tree, segmentation, image processing

Procedia PDF Downloads 110
24514 Data Quality as a Pillar of Data-Driven Organizations: Exploring the Benefits of Data Mesh

Authors: Marc Bachelet, Abhijit Kumar Chatterjee, José Manuel Avila

Abstract:

Data quality is a key component of any data-driven organization. Without data quality, organizations cannot effectively make data-driven decisions, which often leads to poor business performance. Therefore, it is important for an organization to ensure that the data they use is of high quality. This is where the concept of data mesh comes in. Data mesh is an organizational and architectural decentralized approach to data management that can help organizations improve the quality of data. The concept of data mesh was first introduced in 2020. Its purpose is to decentralize data ownership, making it easier for domain experts to manage the data. This can help organizations improve data quality by reducing the reliance on centralized data teams and allowing domain experts to take charge of their data. This paper intends to discuss how a set of elements, including data mesh, are tools capable of increasing data quality. One of the key benefits of data mesh is improved metadata management. In a traditional data architecture, metadata management is typically centralized, which can lead to data silos and poor data quality. With data mesh, metadata is managed in a decentralized manner, ensuring accurate and up-to-date metadata, thereby improving data quality. Another benefit of data mesh is the clarification of roles and responsibilities. In a traditional data architecture, data teams are responsible for managing all aspects of data, which can lead to confusion and ambiguity in responsibilities. With data mesh, domain experts are responsible for managing their own data, which can help provide clarity in roles and responsibilities and improve data quality. Additionally, data mesh can also contribute to a new form of organization that is more agile and adaptable. By decentralizing data ownership, organizations can respond more quickly to changes in their business environment, which in turn can help improve overall performance by allowing better insights into business as an effect of better reports and visualization tools. Monitoring and analytics are also important aspects of data quality. With data mesh, monitoring, and analytics are decentralized, allowing domain experts to monitor and analyze their own data. This will help in identifying and addressing data quality problems in quick time, leading to improved data quality. Data culture is another major aspect of data quality. With data mesh, domain experts are encouraged to take ownership of their data, which can help create a data-driven culture within the organization. This can lead to improved data quality and better business outcomes. Finally, the paper explores the contribution of AI in the coming years. AI can help enhance data quality by automating many data-related tasks, like data cleaning and data validation. By integrating AI into data mesh, organizations can further enhance the quality of their data. The concepts mentioned above are illustrated by AEKIDEN experience feedback. AEKIDEN is an international data-driven consultancy that has successfully implemented a data mesh approach. By sharing their experience, AEKIDEN can help other organizations understand the benefits and challenges of implementing data mesh and improving data quality.

Keywords: data culture, data-driven organization, data mesh, data quality for business success

Procedia PDF Downloads 119
24513 Parameters of Main Stage of Discharge between Artificial Charged Aerosol Cloud and Ground in Presence of Model Hydrometeor Arrays

Authors: D. S. Zhuravkova, A. G. Temnikov, O. S. Belova, L. L. Chernensky, T. K. Gerastenok, I. Y. Kalugina, N. Y. Lysov, A.V. Orlov

Abstract:

Investigation of the discharges from the artificial charged water aerosol clouds in presence of the arrays of the model hydrometeors could help to receive the new data about the peculiarities of the return stroke formation between the thundercloud and the ground when the large volumes of the hail particles participate in the lightning discharge initiation and propagation stimulation. Artificial charged water aerosol clouds of the negative or positive polarity with the potential up to one million volts have been used. Hail has been simulated by the group of the conductive model hydrometeors of the different form. Parameters of the impulse current of the main stage of the discharge between the artificial positively and negatively charged water aerosol clouds and the ground in presence of the model hydrometeors array and of its corresponding electromagnetic radiation have been determined. It was established that the parameters of the array of the model hydrometeors influence on the parameters of the main stage of the discharge between the artificial thundercloud cell and the ground. The maximal values of the main stage current impulse parameters and the electromagnetic radiation registered by the plate antennas have been found for the array of the model hydrometeors of the cylinder revolution form for the negatively charged aerosol cloud and for the array of the hydrometeors of the plate rhombus form for the positively charged aerosol cloud, correspondingly. It was found that parameters of the main stage of the discharge between the artificial charged water aerosol cloud and the ground in presence of the model hydrometeor array of the different considered forms depend on the polarity of the artificial charged aerosol cloud. In average, for all forms of the investigated model hydrometeors arrays, the values of the amplitude and the current rise of the main stage impulse current and the amplitude of the corresponding electromagnetic radiation for the artificial charged aerosol cloud of the positive polarity were in 1.1-1.9 times higher than for the charged aerosol cloud of the negative polarity. Thus, the received results could indicate to the possible more important role of the big volumes of the large hail arrays in the thundercloud on the parameters of the return stroke for the positive lightning.

Keywords: main stage of discharge, hydrometeor form, lightning parameters, negative and positive artificial charged aerosol cloud

Procedia PDF Downloads 246
24512 A Fuzzy Inference System for Predicting Air Traffic Demand Based on Socioeconomic Drivers

Authors: Nur Mohammad Ali, Md. Shafiqul Alam, Jayanta Bhusan Deb, Nowrin Sharmin

Abstract:

The past ten years have seen significant expansion in the aviation sector, which during the previous five years has steadily pushed emerging countries closer to economic independence. It is crucial to accurately forecast the potential demand for air travel to make long-term financial plans. To forecast market demand for low-cost passenger carriers, this study suggests working with low-cost airlines, airports, consultancies, and governmental institutions' strategic planning divisions. The study aims to develop an artificial intelligence-based methods, notably fuzzy inference systems (FIS), to determine the most accurate forecasting technique for domestic low-cost carrier demand in Bangladesh. To give end users real-world applications, the study includes nine variables, two sub-FIS, and one final Mamdani Fuzzy Inference System utilizing a graphical user interface (GUI) made with the app designer tool. The evaluation criteria used in this inquiry included mean square error (MSE), accuracy, precision, sensitivity, and specificity. The effectiveness of the developed air passenger demand prediction FIS is assessed using 240 data sets, and the accuracy, precision, sensitivity, specificity, and MSE values are 90.83%, 91.09%, 90.77%, and 2.09%, respectively.

Keywords: aviation industry, fuzzy inference system, membership function, graphical user interference

Procedia PDF Downloads 55
24511 Big Data Analysis with RHadoop

Authors: Ji Eun Shin, Byung Ho Jung, Dong Hoon Lim

Abstract:

It is almost impossible to store or analyze big data increasing exponentially with traditional technologies. Hadoop is a new technology to make that possible. R programming language is by far the most popular statistical tool for big data analysis based on distributed processing with Hadoop technology. With RHadoop that integrates R and Hadoop environment, we implemented parallel multiple regression analysis with different sizes of actual data. Experimental results showed our RHadoop system was much faster as the number of data nodes increases. We also compared the performance of our RHadoop with lm function and big lm packages available on big memory. The results showed that our RHadoop was faster than other packages owing to paralleling processing with increasing the number of map tasks as the size of data increases.

Keywords: big data, Hadoop, parallel regression analysis, R, RHadoop

Procedia PDF Downloads 421
24510 A Mutually Exclusive Task Generation Method Based on Data Augmentation

Authors: Haojie Wang, Xun Li, Rui Yin

Abstract:

In order to solve the memorization overfitting in the meta-learning MAML algorithm, a method of generating mutually exclusive tasks based on data augmentation is proposed. This method generates a mutex task by corresponding one feature of the data to multiple labels, so that the generated mutex task is inconsistent with the data distribution in the initial dataset. Because generating mutex tasks for all data will produce a large number of invalid data and, in the worst case, lead to exponential growth of computation, this paper also proposes a key data extraction method, that only extracts part of the data to generate the mutex task. The experiments show that the method of generating mutually exclusive tasks can effectively solve the memorization overfitting in the meta-learning MAML algorithm.

Keywords: data augmentation, mutex task generation, meta-learning, text classification.

Procedia PDF Downloads 80
24509 Efficient Positioning of Data Aggregation Point for Wireless Sensor Network

Authors: Sifat Rahman Ahona, Rifat Tasnim, Naima Hassan

Abstract:

Data aggregation is a helpful technique for reducing the data communication overhead in wireless sensor network. One of the important tasks of data aggregation is positioning of the aggregator points. There are a lot of works done on data aggregation. But, efficient positioning of the aggregators points is not focused so much. In this paper, authors are focusing on the positioning or the placement of the aggregation points in wireless sensor network. Authors proposed an algorithm to select the aggregators positions for a scenario where aggregator nodes are more powerful than sensor nodes.

Keywords: aggregation point, data communication, data aggregation, wireless sensor network

Procedia PDF Downloads 144
24508 Comparison of Whole-Body Vibration and Plyometric Exercises on Explosive Power in Non-Athlete Girl Students

Authors: Fereshteh Zarei, Mahdi Kohandel

Abstract:

The aim of this study was investigate and compare plyometric and vibration exercises on muscle explosive power in non-athlete female students. For this purpose, 45 female students from non-athletes selected target then divided in to the three groups, two experimental and one control groups. From all groups were getting pre-tested. Experimental A did whole-body vibration exercises involved standing on one of machine vibration with frequency 30 Hz, amplitude 10 mm and in 5 different postures. Training for each position was 40 seconds with 60 seconds rest between it, and each season 5 seconds was added to duration of each body condition, until time up to 2 minutes for each postures. Exercises were done three times a week for 2 month. Experimental group B did plyometric exercises that include jumping, such as horizontal, vertical, and skipping .They included 10 times repeat for 5 set in each season. Intensity with increasing repetitions and sets were added. At this time, asked from control group that keep a daily activity and avoided strength training, explosive power and. after do exercises by groups we measured factors again. One-way analysis of variance and paired t statistical methods were used to analyze the data. There was significant difference in the amount of explosive power between the control and vibration groups (p=0/048) there was significant difference between the control and plyometric groups (019/0 = p). But between vibration and plyometric groups didn't observe significant difference in the amount of explosive power.

Keywords: vibration, plyometric, exercises, explosive power, non-athlete

Procedia PDF Downloads 437
24507 Spatial Econometric Approaches for Count Data: An Overview and New Directions

Authors: Paula Simões, Isabel Natário

Abstract:

This paper reviews a number of theoretical aspects for implementing an explicit spatial perspective in econometrics for modelling non-continuous data, in general, and count data, in particular. It provides an overview of the several spatial econometric approaches that are available to model data that are collected with reference to location in space, from the classical spatial econometrics approaches to the recent developments on spatial econometrics to model count data, in a Bayesian hierarchical setting. Considerable attention is paid to the inferential framework, necessary for structural consistent spatial econometric count models, incorporating spatial lag autocorrelation, to the corresponding estimation and testing procedures for different assumptions, to the constrains and implications embedded in the various specifications in the literature. This review combines insights from the classical spatial econometrics literature as well as from hierarchical modeling and analysis of spatial data, in order to look for new possible directions on the processing of count data, in a spatial hierarchical Bayesian econometric context.

Keywords: spatial data analysis, spatial econometrics, Bayesian hierarchical models, count data

Procedia PDF Downloads 576
24506 A NoSQL Based Approach for Real-Time Managing of Robotics's Data

Authors: Gueidi Afef, Gharsellaoui Hamza, Ben Ahmed Samir

Abstract:

This paper deals with the secret of the continual progression data that new data management solutions have been emerged: The NoSQL databases. They crossed several areas like personalization, profile management, big data in real-time, content management, catalog, view of customers, mobile applications, internet of things, digital communication and fraud detection. Nowadays, these database management systems are increasing. These systems store data very well and with the trend of big data, a new challenge’s store demands new structures and methods for managing enterprise data. The new intelligent machine in the e-learning sector, thrives on more data, so smart machines can learn more and faster. The robotics are our use case to focus on our test. The implementation of NoSQL for Robotics wrestle all the data they acquire into usable form because with the ordinary type of robotics; we are facing very big limits to manage and find the exact information in real-time. Our original proposed approach was demonstrated by experimental studies and running example used as a use case.

Keywords: NoSQL databases, database management systems, robotics, big data

Procedia PDF Downloads 333
24505 Fuzzy Optimization Multi-Objective Clustering Ensemble Model for Multi-Source Data Analysis

Authors: C. B. Le, V. N. Pham

Abstract:

In modern data analysis, multi-source data appears more and more in real applications. Multi-source data clustering has emerged as a important issue in the data mining and machine learning community. Different data sources provide information about different data. Therefore, multi-source data linking is essential to improve clustering performance. However, in practice multi-source data is often heterogeneous, uncertain, and large. This issue is considered a major challenge from multi-source data. Ensemble is a versatile machine learning model in which learning techniques can work in parallel, with big data. Clustering ensemble has been shown to outperform any standard clustering algorithm in terms of accuracy and robustness. However, most of the traditional clustering ensemble approaches are based on single-objective function and single-source data. This paper proposes a new clustering ensemble method for multi-source data analysis. The fuzzy optimized multi-objective clustering ensemble method is called FOMOCE. Firstly, a clustering ensemble mathematical model based on the structure of multi-objective clustering function, multi-source data, and dark knowledge is introduced. Then, rules for extracting dark knowledge from the input data, clustering algorithms, and base clusterings are designed and applied. Finally, a clustering ensemble algorithm is proposed for multi-source data analysis. The experiments were performed on the standard sample data set. The experimental results demonstrate the superior performance of the FOMOCE method compared to the existing clustering ensemble methods and multi-source clustering methods.

Keywords: clustering ensemble, multi-source, multi-objective, fuzzy clustering

Procedia PDF Downloads 167
24504 Some Results on the Generalized Higher Rank Numerical Ranges

Authors: Mohsen Zahraei

Abstract:

‎In this paper, ‎the notion of ‎rank-k numerical range of rectangular complex matrix polynomials‎ ‎are introduced. ‎Some algebraic and geometrical properties are investigated. ‎Moreover, ‎for ε>0 the notion of Birkhoff-James approximate orthogonality sets for ε-higher ‎rank numerical ranges of rectangular matrix polynomials is also introduced and studied. ‎The proposed definitions yield a natural generalization of the standard higher rank numerical ranges.

Keywords: ‎‎Rank-k numerical range‎, ‎isometry‎, ‎numerical range‎, ‎rectangular matrix polynomials

Procedia PDF Downloads 442
24503 Modeling Activity Pattern Using XGBoost for Mining Smart Card Data

Authors: Eui-Jin Kim, Hasik Lee, Su-Jin Park, Dong-Kyu Kim

Abstract:

Smart-card data are expected to provide information on activity pattern as an alternative to conventional person trip surveys. The focus of this study is to propose a method for training the person trip surveys to supplement the smart-card data that does not contain the purpose of each trip. We selected only available features from smart card data such as spatiotemporal information on the trip and geographic information system (GIS) data near the stations to train the survey data. XGboost, which is state-of-the-art tree-based ensemble classifier, was used to train data from multiple sources. This classifier uses a more regularized model formalization to control the over-fitting and show very fast execution time with well-performance. The validation results showed that proposed method efficiently estimated the trip purpose. GIS data of station and duration of stay at the destination were significant features in modeling trip purpose.

Keywords: activity pattern, data fusion, smart-card, XGboost

Procedia PDF Downloads 227
24502 Modelling Agricultural Commodity Price Volatility with Markov-Switching Regression, Single Regime GARCH and Markov-Switching GARCH Models: Empirical Evidence from South Africa

Authors: Yegnanew A. Shiferaw

Abstract:

Background: commodity price volatility originating from excessive commodity price fluctuation has been a global problem especially after the recent financial crises. Volatility is a measure of risk or uncertainty in financial analysis. It plays a vital role in risk management, portfolio management, and pricing equity. Objectives: the core objective of this paper is to examine the relationship between the prices of agricultural commodities with oil price, gas price, coal price and exchange rate (USD/Rand). In addition, the paper tries to fit an appropriate model that best describes the log return price volatility and estimate Value-at-Risk and expected shortfall. Data and methods: the data used in this study are the daily returns of agricultural commodity prices from 02 January 2007 to 31st October 2016. The data sets consists of the daily returns of agricultural commodity prices namely: white maize, yellow maize, wheat, sunflower, soya, corn, and sorghum. The paper applies the three-state Markov-switching (MS) regression, the standard single-regime GARCH and the two regime Markov-switching GARCH (MS-GARCH) models. Results: to choose the best fit model, the log-likelihood function, Akaike information criterion (AIC), Bayesian information criterion (BIC) and deviance information criterion (DIC) are employed under three distributions for innovations. The results indicate that: (i) the price of agricultural commodities was found to be significantly associated with the price of coal, price of natural gas, price of oil and exchange rate, (ii) for all agricultural commodities except sunflower, k=3 had higher log-likelihood values and lower AIC and BIC values. Thus, the three-state MS regression model outperformed the two-state MS regression model (iii) MS-GARCH(1,1) with generalized error distribution (ged) innovation performs best for white maize and yellow maize; MS-GARCH(1,1) with student-t distribution (std) innovation performs better for sorghum; MS-gjrGARCH(1,1) with ged innovation performs better for wheat, sunflower and soya and MS-GARCH(1,1) with std innovation performs better for corn. In conclusion, this paper provided a practical guide for modelling agricultural commodity prices by MS regression and MS-GARCH processes. This paper can be good as a reference when facing modelling agricultural commodity price problems.

Keywords: commodity prices, MS-GARCH model, MS regression model, South Africa, volatility

Procedia PDF Downloads 190