Search results for: outliers
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 79

Search results for: outliers

19 Communication of Expected Survival Time to Cancer Patients: How It Is Done and How It Should Be Done

Authors: Geir Kirkebøen

Abstract:

Most patients with serious diagnoses want to know their prognosis, in particular their expected survival time. As part of the informed consent process, physicians are legally obligated to communicate such information to patients. However, there is no established (evidence based) ‘best practice’ for how to do this. The two questions explored in this study are: How do physicians communicate expected survival time to patients, and how should it be done? We explored the first, descriptive question in a study with Norwegian oncologists as participants. The study had a scenario and a survey part. In the scenario part, the doctors should imagine that a patient, recently diagnosed with a serious cancer diagnosis, has asked them: ‘How long can I expect to live with such a diagnosis? I want an honest answer from you!’ The doctors should assume that the diagnosis is certain, and that from an extensive recent study they had optimal statistical knowledge, described in detail as a right-skewed survival curve, about how long such patients with this kind of diagnosis could be expected to live. The main finding was that very few of the oncologists would explain to the patient the variation in survival time as described by the survival curve. The majority would not give the patient an answer at all. Of those who gave an answer, the typical answer was that survival time varies a lot, that it is hard to say in a specific case, that we will come back to it later etc. The survey part of the study clearly indicates that the main reason why the oncologists would not deliver the mortality prognosis was discomfort with its uncertainty. The scenario part of the study confirmed this finding. The majority of the oncologists explicitly used the uncertainty, the variation in survival time, as a reason to not give the patient an answer. Many studies show that patients want realistic information about their mortality prognosis, and that they should be given hope. The question then is how to communicate the uncertainty of the prognosis in a realistic and optimistic – hopeful – way. Based on psychological research, our hypothesis is that the best way to do this is by explicitly describing the variation in survival time, the (usually) right skewed survival curve of the prognosis, and emphasize to the patient the (small) possibility of being a ‘lucky outlier’. We tested this hypothesis in two scenario studies with lay people as participants. The data clearly show that people prefer to receive expected survival time as a median value together with explicit information about the survival curve’s right skewedness (e.g., concrete examples of ‘positive outliers’), and that communicating expected survival time this way not only provides people with hope, but also gives them a more realistic understanding compared with the typical way expected survival time is communicated. Our data indicate that it is not the existence of the uncertainty regarding the mortality prognosis that is the problem for patients, but how this uncertainty is, or is not, communicated and explained.

Keywords: cancer patients, decision psychology, doctor-patient communication, mortality prognosis

Procedia PDF Downloads 293
18 Index t-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings

Authors: Gaelle Candel, David Naccache

Abstract:

t-SNE is an embedding method that the data science community has widely used. It helps two main tasks: to display results by coloring items according to the item class or feature value; and for forensic, giving a first overview of the dataset distribution. Two interesting characteristics of t-SNE are the structure preservation property and the answer to the crowding problem, where all neighbors in high dimensional space cannot be represented correctly in low dimensional space. t-SNE preserves the local neighborhood, and similar items are nicely spaced by adjusting to the local density. These two characteristics produce a meaningful representation, where the cluster area is proportional to its size in number, and relationships between clusters are materialized by closeness on the embedding. This algorithm is non-parametric. The transformation from a high to low dimensional space is described but not learned. Two initializations of the algorithm would lead to two different embeddings. In a forensic approach, analysts would like to compare two or more datasets using their embedding. A naive approach would be to embed all datasets together. However, this process is costly as the complexity of t-SNE is quadratic and would be infeasible for too many datasets. Another approach would be to learn a parametric model over an embedding built with a subset of data. While this approach is highly scalable, points could be mapped at the same exact position, making them indistinguishable. This type of model would be unable to adapt to new outliers nor concept drift. This paper presents a methodology to reuse an embedding to create a new one, where cluster positions are preserved. The optimization process minimizes two costs, one relative to the embedding shape and the second relative to the support embedding’ match. The embedding with the support process can be repeated more than once, with the newly obtained embedding. The successive embedding can be used to study the impact of one variable over the dataset distribution or monitor changes over time. This method has the same complexity as t-SNE per embedding, and memory requirements are only doubled. For a dataset of n elements sorted and split into k subsets, the total embedding complexity would be reduced from O(n²) to O(n²=k), and the memory requirement from n² to 2(n=k)², which enables computation on recent laptops. The method showed promising results on a real-world dataset, allowing to observe the birth, evolution, and death of clusters. The proposed approach facilitates identifying significant trends and changes, which empowers the monitoring high dimensional datasets’ dynamics.

Keywords: concept drift, data visualization, dimension reduction, embedding, monitoring, reusability, t-SNE, unsupervised learning

Procedia PDF Downloads 110
17 Parallel Fuzzy Rough Support Vector Machine for Data Classification in Cloud Environment

Authors: Arindam Chaudhuri

Abstract:

Classification of data has been actively used for most effective and efficient means of conveying knowledge and information to users. The prima face has always been upon techniques for extracting useful knowledge from data such that returns are maximized. With emergence of huge datasets the existing classification techniques often fail to produce desirable results. The challenge lies in analyzing and understanding characteristics of massive data sets by retrieving useful geometric and statistical patterns. We propose a supervised parallel fuzzy rough support vector machine (PFRSVM) for data classification in cloud environment. The classification is performed by PFRSVM using hyperbolic tangent kernel. The fuzzy rough set model takes care of sensitiveness of noisy samples and handles impreciseness in training samples bringing robustness to results. The membership function is function of center and radius of each class in feature space and is represented with kernel. It plays an important role towards sampling the decision surface. The success of PFRSVM is governed by choosing appropriate parameter values. The training samples are either linear or nonlinear separable. The different input points make unique contributions to decision surface. The algorithm is parallelized with a view to reduce training times. The system is built on support vector machine library using Hadoop implementation of MapReduce. The algorithm is tested on large data sets to check its feasibility and convergence. The performance of classifier is also assessed in terms of number of support vectors. The challenges encountered towards implementing big data classification in machine learning frameworks are also discussed. The experiments are done on the cloud environment available at University of Technology and Management, India. The results are illustrated for Gaussian RBF and Bayesian kernels. The effect of variability in prediction and generalization of PFRSVM is examined with respect to values of parameter C. It effectively resolves outliers’ effects, imbalance and overlapping class problems, normalizes to unseen data and relaxes dependency between features and labels. The average classification accuracy for PFRSVM is better than other classifiers for both Gaussian RBF and Bayesian kernels. The experimental results on both synthetic and real data sets clearly demonstrate the superiority of the proposed technique.

Keywords: FRSVM, Hadoop, MapReduce, PFRSVM

Procedia PDF Downloads 454
16 Applying GIS Geographic Weighted Regression Analysis to Assess Local Factors Impeding Smallholder Farmers from Participating in Agribusiness Markets: A Case Study of Vihiga County, Western Kenya

Authors: Mwehe Mathenge, Ben G. J. S. Sonneveld, Jacqueline E. W. Broerse

Abstract:

Smallholder farmers are important drivers of agriculture productivity, food security, and poverty reduction in Sub-Saharan Africa. However, they are faced with myriad challenges in their efforts at participating in agribusiness markets. How the geographic explicit factors existing at the local level interact to impede smallholder farmers' decision to participates (or not) in agribusiness markets is not well understood. Deconstructing the spatial complexity of the local environment could provide a deeper insight into how geographically explicit determinants promote or impede resource-poor smallholder farmers from participating in agribusiness. This paper’s objective was to identify, map, and analyze local spatial autocorrelation in factors that impede poor smallholders from participating in agribusiness markets. Data were collected using geocoded researcher-administered survey questionnaires from 392 households in Western Kenya. Three spatial statistics methods in geographic information system (GIS) were used to analyze data -Global Moran’s I, Cluster and Outliers Analysis (Anselin Local Moran’s I), and geographically weighted regression. The results of Global Moran’s I reveal the presence of spatial patterns in the dataset that was not caused by spatial randomness of data. Subsequently, Anselin Local Moran’s I result identified spatially and statistically significant local spatial clustering (hot spots and cold spots) in factors hindering smallholder participation. Finally, the geographically weighted regression results unearthed those specific geographic explicit factors impeding market participation in the study area. The results confirm that geographically explicit factors are indispensable in influencing the smallholder farming decisions, and policymakers should take cognizance of them. Additionally, this research demonstrated how geospatial explicit analysis conducted at the local level, using geographically disaggregated data, could help in identifying households and localities where the most impoverished and resource-poor smallholder households reside. In designing spatially targeted interventions, policymakers could benefit from geospatial analysis methods in understanding complex geographic factors and processes that interact to influence smallholder farmers' decision-making processes and choices.

Keywords: agribusiness markets, GIS, smallholder farmers, spatial statistics, disaggregated spatial data

Procedia PDF Downloads 106
15 Integrating Computer-Aided Manufacturing and Computer-Aided Design for Streamlined Carpentry Production in Ghana

Authors: Benson Tette, Thomas Mensah

Abstract:

As a developing country, Ghana has a high potential to harness the economic value of every industry. Two of the industries that produce below capacity are handicrafts (for instance, carpentry) and information technology (i.e., computer science). To boost production and maintain competitiveness, the carpentry sector in Ghana needs more effective manufacturing procedures that are also more affordable. This issue can be resolved using computer-aided manufacturing (CAM) technology, which automates the fabrication process and decreases the amount of time and labor needed to make wood goods. Yet, the integration of CAM in carpentry-related production is rarely explored. To streamline the manufacturing process, this research investigates the equipment and technology that are currently used in the Ghanaian carpentry sector for automated fabrication. The research looks at the various CAM technologies, such as Computer Numerical Control routers, laser cutters, and plasma cutters, that are accessible to Ghanaian carpenters yet unexplored. We also investigate their potential to enhance the production process. To achieve the objective, 150 carpenters, 15 software engineers, and 10 policymakers were interviewed using structured questionnaires. The responses provided by the 175 respondents were processed to eliminate outliers and omissions were corrected using multiple imputations techniques. The processed responses were analyzed through thematic analysis. The findings showed that adaptation and integration of CAD software with CAM technologies would speed up the design-to-manufacturing process for carpenters. It must be noted that achieving such results entails first; examining the capabilities of current CAD software, then determining what new functions and resources are required to improve the software's suitability for carpentry tasks. Responses from both carpenters and computer scientists showed that it is highly practical and achievable to streamline the design-to-manufacturing process through processes such as modifying and combining CAD software with CAM technology. Making the carpentry-software integration program more useful for carpentry projects would necessitate investigating the capabilities of the current CAD software and identifying additional features in the Ghanaian ecosystem and tools that are required. In conclusion, the Ghanaian carpentry sector has a chance to increase productivity and competitiveness through the integration of CAM technology with CAD software. Carpentry companies may lower labor costs and boost production capacity by automating the fabrication process, giving them a competitive advantage. This study offers implementation-ready and representative recommendations for successful implementation as well as important insights into the equipment and technologies available for automated fabrication in the Ghanaian carpentry sector.

Keywords: carpentry, computer-aided manufacturing (CAM), Ghana, information technology(IT)

Procedia PDF Downloads 58
14 AI-Enabled Smart Contracts for Reliable Traceability in the Industry 4.0

Authors: Harris Niavis, Dimitra Politaki

Abstract:

The manufacturing industry was collecting vast amounts of data for monitoring product quality thanks to the advances in the ICT sector and dedicated IoT infrastructure is deployed to track and trace the production line. However, industries have not yet managed to unleash the full potential of these data due to defective data collection methods and untrusted data storage and sharing. Blockchain is gaining increasing ground as a key technology enabler for Industry 4.0 and the smart manufacturing domain, as it enables the secure storage and exchange of data between stakeholders. On the other hand, AI techniques are more and more used to detect anomalies in batch and time-series data that enable the identification of unusual behaviors. The proposed scheme is based on smart contracts to enable automation and transparency in the data exchange, coupled with anomaly detection algorithms to enable reliable data ingestion in the system. Before sensor measurements are fed to the blockchain component and the smart contracts, the anomaly detection mechanism uniquely combines artificial intelligence models to effectively detect unusual values such as outliers and extreme deviations in data coming from them. Specifically, Autoregressive integrated moving average, Long short-term memory (LSTM) and Dense-based autoencoders, as well as Generative adversarial networks (GAN) models, are used to detect both point and collective anomalies. Towards the goal of preserving the privacy of industries' information, the smart contracts employ techniques to ensure that only anonymized pointers to the actual data are stored on the ledger while sensitive information remains off-chain. In the same spirit, blockchain technology guarantees the security of the data storage through strong cryptography as well as the integrity of the data through the decentralization of the network and the execution of the smart contracts by the majority of the blockchain network actors. The blockchain component of the Data Traceability Software is based on the Hyperledger Fabric framework, which lays the ground for the deployment of smart contracts and APIs to expose the functionality to the end-users. The results of this work demonstrate that such a system can increase the quality of the end-products and the trustworthiness of the monitoring process in the smart manufacturing domain. The proposed AI-enabled data traceability software can be employed by industries to accurately trace and verify records about quality through the entire production chain and take advantage of the multitude of monitoring records in their databases.

Keywords: blockchain, data quality, industry4.0, product quality

Procedia PDF Downloads 145
13 Data Refinement Enhances The Accuracy of Short-Term Traffic Latency Prediction

Authors: Man Fung Ho, Lap So, Jiaqi Zhang, Yuheng Zhao, Huiyang Lu, Tat Shing Choi, K. Y. Michael Wong

Abstract:

Nowadays, a tremendous amount of data is available in the transportation system, enabling the development of various machine learning approaches to make short-term latency predictions. A natural question is then the choice of relevant information to enable accurate predictions. Using traffic data collected from the Taiwan Freeway System, we consider the prediction of short-term latency of a freeway segment with a length of 17 km covering 5 measurement points, each collecting vehicle-by-vehicle data through the electronic toll collection system. The processed data include the past latencies of the freeway segment with different time lags, the traffic conditions of the individual segments (the accumulations, the traffic fluxes, the entrance and exit rates), the total accumulations, and the weekday latency profiles obtained by Gaussian process regression of past data. We arrive at several important conclusions about how data should be refined to obtain accurate predictions, which have implications for future system-wide latency predictions. (1) We find that the prediction of median latency is much more accurate and meaningful than the prediction of average latency, as the latter is plagued by outliers. This is verified by machine-learning prediction using XGBoost that yields a 35% improvement in the mean square error of the 5-minute averaged latencies. (2) We find that the median latency of the segment 15 minutes ago is a very good baseline for performance comparison, and we have evidence that further improvement is achieved by machine learning approaches such as XGBoost and Long Short-Term Memory (LSTM). (3) By analyzing the feature importance score in XGBoost and calculating the mutual information between the inputs and the latencies to be predicted, we identify a sequence of inputs ranked in importance. It confirms that the past latencies are most informative of the predicted latencies, followed by the total accumulation, whereas inputs such as the entrance and exit rates are uninformative. It also confirms that the inputs are much less informative of the average latencies than the median latencies. (4) For predicting the latencies of segments composed of two or three sub-segments, summing up the predicted latencies of each sub-segment is more accurate than the one-step prediction of the whole segment, especially with the latency prediction of the downstream sub-segments trained to anticipate latencies several minutes ahead. The duration of the anticipation time is an increasing function of the traveling time of the upstream segment. The above findings have important implications to predicting the full set of latencies among the various locations in the freeway system.

Keywords: data refinement, machine learning, mutual information, short-term latency prediction

Procedia PDF Downloads 140
12 Quantification of Lawsone and Adulterants in Commercial Henna Products

Authors: Ruchi B. Semwal, Deepak K. Semwal, Thobile A. N. Nkosi, Alvaro M. Viljoen

Abstract:

The use of Lawsonia inermis L. (Lythraeae), commonly known as henna, has many medicinal benefits and is used as a remedy for the treatment of diarrhoea, cancer, inflammation, headache, jaundice and skin diseases in folk medicine. Although widely used for hair dyeing and temporary tattooing, henna body art has popularized over the last 15 years and changed from being a traditional bridal and festival adornment to an exotic fashion accessory. The naphthoquinone, lawsone, is one of the main constituents of the plant and responsible for its dyeing property. Henna leaves typically contain 1.8–1.9% lawsone, which is used as a marker compound for the quality control of henna products. Adulteration of henna with various toxic chemicals such as p-phenylenediamine, p-methylaminophenol, p-aminobenzene and p-toluenodiamine to produce a variety of colours, is very common and has resulted in serious health problems, including allergic reactions. This study aims to assess the quality of henna products collected from different parts of the world by determining the lawsone content, as well as the concentrations of any adulterants present. Ultra high performance liquid chromatography-mass spectrometry (UPLC-MS) was used to determine the lawsone concentrations in 172 henna products. Separation of the chemical constituents was achieved on an Acquity UPLC BEH C18 column using gradient elution (0.1% formic acid and acetonitrile). The results from UPLC-MS revealed that of 172 henna products, 11 contained 1.0-1.8% lawsone, 110 contained 0.1-0.9% lawsone, whereas 51 samples did not contain detectable levels of lawsone. High performance thin layer chromatography was investigated as a cheaper, more rapid technique for the quality control of henna in relation to the lawsone content. The samples were applied using an automatic TLC Sampler 4 (CAMAG) to pre-coated silica plates, which were subsequently developed with acetic acid, acetone and toluene (0.5: 1.0: 8.5 v/v). A Reprostar 3 digital system allowed the images to be captured. The results obtained corresponded to those from UPLC-MS analysis. Vibrational spectroscopy analysis (MIR or NIR) of the powdered henna, followed by chemometric modelling of the data, indicates that this technique shows promise as an alternative quality control method. Principal component analysis (PCA) was used to investigate the data by observing clustering and identifying outliers. Partial least squares (PLS) multivariate calibration models were constructed for the quantification of lawsone. In conclusion, only a few of the samples analysed contain lawsone in high concentrations, indicating that they are of poor quality. Currently, the presence of adulterants that may have been added to enhance the dyeing properties of the products, is being investigated.

Keywords: Lawsonia inermis, paraphenylenediamine, temporary tattooing, lawsone

Procedia PDF Downloads 427
11 The Influence of Operational Changes on Efficiency and Sustainability of Manufacturing Firms

Authors: Dimitrios Kafetzopoulos

Abstract:

Nowadays, companies are more concerned with adopting their own strategies for increased efficiency and sustainability. Dynamic environments are fertile fields for developing operational changes. For this purpose, organizations need to implement an advanced management philosophy that boosts changes to companies’ operation. Changes refer to new applications of knowledge, ideas, methods, and skills that can generate unique capabilities and leverage an organization’s competitiveness. So, in order to survive and compete in the global and niche markets, companies should incorporate the adoption of operational changes into their strategy with regard to their products and their processes. Creating the appropriate culture for changes in terms of products and processes helps companies to gain a sustainable competitive advantage in the market. Thus, the purpose of this study is to investigate the role of both incremental and radical changes into operations of a company, taking into consideration not only product changes but also process changes, and continues by measuring the impact of these two types of changes on business efficiency and sustainability of Greek manufacturing companies. The above discussion leads to the following hypotheses: H1: Radical operational changes have a positive impact on firm efficiency. H2: Incremental operational changes have a positive impact on firm efficiency. H3: Radical operational changes have a positive impact on firm sustainability. H4: Incremental operational changes have a positive impact on firm sustainability. In order to achieve the objectives of the present study, a research study was carried out in Greek manufacturing firms. A total of 380 valid questionnaires were received while a seven-point Likert scale was used to measure all the questionnaire items of the constructs (radical changes, incremental changes, efficiency and sustainability). The constructs of radical and incremental operational changes, each one as one variable, has been subdivided into product and process changes. Non-response bias, common method variance, multicollinearity, multivariate normal distribution and outliers have been checked. Moreover, the unidimensionality, reliability and validity of the latent factors were assessed. Exploratory Factor Analysis and Confirmatory Factor Analysis were applied to check the factorial structure of the constructs and the factor loadings of the items. In order to test the research hypotheses, the SEM technique was applied (maximum likelihood method). The goodness of fit of the basic structural model indicates an acceptable fit of the proposed model. According to the present study findings, radical operational changes and incremental operational changes significantly influence both efficiency and sustainability of Greek manufacturing firms. However, it is in the dimension of radical operational changes, meaning those in process and product, that the most significant contributors to firm efficiency are to be found, while its influence on sustainability is low albeit statistically significant. On the contrary, incremental operational changes influence sustainability more than firms’ efficiency. From the above, it is apparent that the embodiment of the concept of the changes into the products and processes operational practices of a firm has direct and positive consequences for what it achieves from efficiency and sustainability perspective.

Keywords: incremental operational changes, radical operational changes, efficiency, sustainability

Procedia PDF Downloads 98
10 Changes in Geospatial Structure of Households in the Czech Republic: Findings from Population and Housing Census

Authors: Jaroslav Kraus

Abstract:

Spatial information about demographic processes are a standard part of outputs in the Czech Republic. That was also the case of Population and Housing Census which was held on 2011. This is a starting point for a follow up study devoted to two basic types of households: single person households and households of one completed family. Single person households and one family households create more than 80 percent of all households, but the share and spatial structure is in long-term changing. The increase of single households is results of long-term fertility decrease and divorce increase, but also possibility of separate living. There are regions in the Czech Republic with traditional demographic behavior, and regions like capital Prague and some others with changing pattern. Population census is based - according to international standards - on the concept of currently living population. Three types of geospatial approaches will be used for analysis: (i) firstly measures of geographic distribution, (ii) secondly mapping clusters to identify the locations of statistically significant hot spots, cold spots, spatial outliers, and similar features and (iii) finally analyzing pattern approach as a starting point for more in-depth analyses (geospatial regression) in the future will be also applied. For analysis of this type of data, number of households by types should be distinct objects. All events in a meaningful delimited study region (e.g. municipalities) will be included in an analysis. Commonly produced measures of central tendency and spread will include: identification of the location of the center of the point set (by NUTS3 level); identification of the median center and standard distance, weighted standard distance and standard deviational ellipses will be also used. Identifying that clustering exists in census households datasets does not provide a detailed picture of the nature and pattern of clustering but will be helpful to apply simple hot-spot (and cold spot) identification techniques to such datasets. Once the spatial structure of households will be determined, any particular measure of autocorrelation can be constructed by defining a way of measuring the difference between location attribute values. The most widely used measure is Moran’s I that will be applied to municipal units where numerical ratio is calculated. Local statistics arise naturally out of any of the methods for measuring spatial autocorrelation and will be applied to development of localized variants of almost any standard summary statistic. Local Moran’s I will give an indication of household data homogeneity and diversity on a municipal level.

Keywords: census, geo-demography, households, the Czech Republic

Procedia PDF Downloads 77
9 Novel Framework for MIMO-Enhanced Robust Selection of Critical Control Factors in Auto Plastic Injection Moulding Quality Optimization

Authors: Seyed Esmail Seyedi Bariran, Khairul Salleh Mohamed Sahari

Abstract:

Apparent quality defects such as warpage, shrinkage, weld line, etc. are such an irresistible phenomenon in mass production of auto plastic appearance parts. These frequently occurred manufacturing defects should be satisfied concurrently so as to achieve a final product with acceptable quality standards. Determining the significant control factors that simultaneously affect multiple quality characteristics can significantly improve the optimization results by eliminating the deviating effect of the so-called ineffective outliers. Hence, a robust quantitative approach needs to be developed upon which major control factors and their level can be effectively determined to help improve the reliability of the optimal processing parameter design. Hence, the primary objective of current study was to develop a systematic methodology for selection of significant control factors (SCF) relevant to multiple quality optimization of auto plastic appearance part. Auto bumper was used as a specimen with the most identical quality and production characteristics to APAP group. A preliminary failure modes and effect analysis (FMEA) was conducted to nominate a database of pseudo significant significant control factors prior to the optimization phase. Later, CAE simulation Moldflow analysis was implemented to manipulate four rampant plastic injection quality defects concerned with APAP group including warpage deflection, volumetric shrinkage, sink mark and weld line. Furthermore, a step-backward elimination searching method (SESME) has been developed for systematic pre-optimization selection of SCF based on hierarchical orthogonal array design and priority-based one-way analysis of variance (ANOVA). The development of robust parameter design in the second phase was based on DOE module powered by Minitab v.16 statistical software. Based on the F-test (F 0.05, 2, 14) one-way ANOVA results, it was concluded that for warpage deflection, material mixture percentage was the most significant control factor yielding a 58.34% of contribution while for the other three quality defects, melt temperature was the most significant control factor with a 25.32%, 84.25%, and 34.57% contribution for sin mark, shrinkage and weld line strength control. Also, the results on the he least significant control factors meaningfully revealed injection fill time as the least significant factor for both warpage and sink mark with respective 1.69% and 6.12% contribution. On the other hand, for shrinkage and weld line defects, the least significant control factors were holding pressure and mold temperature with a 0.23% and 4.05% overall contribution accordingly.

Keywords: plastic injection moulding, quality optimization, FMEA, ANOVA, SESME, APAP

Procedia PDF Downloads 317
8 The Data Quality Model for the IoT based Real-time Water Quality Monitoring Sensors

Authors: Rabbia Idrees, Ananda Maiti, Saurabh Garg, Muhammad Bilal Amin

Abstract:

IoT devices are the basic building blocks of IoT network that generate enormous volume of real-time and high-speed data to help organizations and companies to take intelligent decisions. To integrate this enormous data from multisource and transfer it to the appropriate client is the fundamental of IoT development. The handling of this huge quantity of devices along with the huge volume of data is very challenging. The IoT devices are battery-powered and resource-constrained and to provide energy efficient communication, these IoT devices go sleep or online/wakeup periodically and a-periodically depending on the traffic loads to reduce energy consumption. Sometime these devices get disconnected due to device battery depletion. If the node is not available in the network, then the IoT network provides incomplete, missing, and inaccurate data. Moreover, many IoT applications, like vehicle tracking and patient tracking require the IoT devices to be mobile. Due to this mobility, If the distance of the device from the sink node become greater than required, the connection is lost. Due to this disconnection other devices join the network for replacing the broken-down and left devices. This make IoT devices dynamic in nature which brings uncertainty and unreliability in the IoT network and hence produce bad quality of data. Due to this dynamic nature of IoT devices we do not know the actual reason of abnormal data. If data are of poor-quality decisions are likely to be unsound. It is highly important to process data and estimate data quality before bringing it to use in IoT applications. In the past many researchers tried to estimate data quality and provided several Machine Learning (ML), stochastic and statistical methods to perform analysis on stored data in the data processing layer, without focusing the challenges and issues arises from the dynamic nature of IoT devices and how it is impacting data quality. A comprehensive review on determining the impact of dynamic nature of IoT devices on data quality is done in this research and presented a data quality model that can deal with this challenge and produce good quality of data. This research presents the data quality model for the sensors monitoring water quality. DBSCAN clustering and weather sensors are used in this research to make data quality model for the sensors monitoring water quality. An extensive study has been done in this research on finding the relationship between the data of weather sensors and sensors monitoring water quality of the lakes and beaches. The detailed theoretical analysis has been presented in this research mentioning correlation between independent data streams of the two sets of sensors. With the help of the analysis and DBSCAN, a data quality model is prepared. This model encompasses five dimensions of data quality: outliers’ detection and removal, completeness, patterns of missing values and checks the accuracy of the data with the help of cluster’s position. At the end, the statistical analysis has been done on the clusters formed as the result of DBSCAN, and consistency is evaluated through Coefficient of Variation (CoV).

Keywords: clustering, data quality, DBSCAN, and Internet of things (IoT)

Procedia PDF Downloads 102
7 A Generalized Framework for Adaptive Machine Learning Deployments in Algorithmic Trading

Authors: Robert Caulk

Abstract:

A generalized framework for adaptive machine learning deployments in algorithmic trading is introduced, tested, and released as open-source code. The presented software aims to test the hypothesis that recent data contains enough information to form a probabilistically favorable short-term price prediction. Further, the framework contains various adaptive machine learning techniques that are geared toward generating profit during strong trends and minimizing losses during trend changes. Results demonstrate that this adaptive machine learning approach is capable of capturing trends and generating profit. The presentation also discusses the importance of defining the parameter space associated with the dynamic training data-set and using the parameter space to identify and remove outliers from prediction data points. Meanwhile, the generalized architecture enables common users to exploit the powerful machinery while focusing on high-level feature engineering and model testing. The presentation also highlights common strengths and weaknesses associated with the presented technique and presents a broad range of well-tested starting points for feature set construction, target setting, and statistical methods for enforcing risk management and maintaining probabilistically favorable entry and exit points. The presentation also describes the end-to-end data processing tools associated with FreqAI, including automatic data fetching, data aggregation, feature engineering, safe and robust data pre-processing, outlier detection, custom machine learning and statistical tools, data post-processing, and adaptive training backtest emulation, and deployment of adaptive training in live environments. Finally, the generalized user interface is also discussed in the presentation. Feature engineering is simplified so that users can seed their feature sets with common indicator libraries (e.g. TA-lib, pandas-ta). The user also feeds data expansion parameters to fill out a large feature set for the model, which can contain as many as 10,000+ features. The presentation describes the various object-oriented programming techniques employed to make FreqAI agnostic to third-party libraries and external data sources. In other words, the back-end is constructed in such a way that users can leverage a broad range of common regression libraries (Catboost, LightGBM, Sklearn, etc) as well as common Neural Network libraries (TensorFlow, PyTorch) without worrying about the logistical complexities associated with data handling and API interactions. The presentation finishes by drawing conclusions about the most important parameters associated with a live deployment of the adaptive learning framework and provides the road map for future development in FreqAI.

Keywords: machine learning, market trend detection, open-source, adaptive learning, parameter space exploration

Procedia PDF Downloads 59
6 Transforming Gender Norms through Play: Qualitative Findings from Primary Schools in Rwanda, Ghana, and Mozambique

Authors: Geetanjali Gill

Abstract:

International non-governmental organizations (INGOs) and development assistance donors have been implementing education projects in Sub-Saharan Africa and the global South that respond to gender-based inequities and that attempt to transform socio-cultural norms for greater gender equality in schools and communities. These efforts are in line with the United Nations Sustainable Development Goal number four, quality education, and goal number five, gender equality. Some INGOs and donors have also championed the use of play-based pedagogies for improved and more gender equal education outcomes. The study used the qualitative methods of life history interviews and focus groups to gauge social norm change amongst male and female adolescents, families, and teachers in primary schools that have been using gender-responsive play-based pedagogies in Rwanda, Ghana, and Mozambique. School administrators and project managers from the INGO Right to Play International were consulted in the selection of two primary schools per country (in both rural and urban contexts), as well as the selection of ten male and ten female students in grades four to six in each school, using specific parameters of social norm adherence. The parents (or guardians) and grandparents of four male and four female students in each school who were determined to be ‘outliers’ in their adherence to social norms were also interviewed. Additionally, sex-specific focus groups were held with thirty-six teachers. The study found that gender-responsive play-based pedagogies positively impactedsocio-cultural norms that shape gender relations amongst adolescents, their families, and teachers. Female and male students who spoke about their beliefs about gender equality in the roles and educational and career aspirations of men/boys and women/girls made linkages to the play-based pedagogies and approaches used by their teachers. Also, the parents and grandparents of these students were aware of generational differences in gender norms, and many were accepting of changed gender norms. Teachers who were effectively implementing gender-responsive play-based pedagogies in their classrooms spoke about changes to their own gender norms and how they were able to influence the gender norms of parents and community members. Life history interviews were found to be well-suited for the examination of changes to socio-cultural norms and gender relations. However, an appropriate framing of questions and topics specific to each target group was instrumental for the collection of data on socio-cultural norms and gender. The findings of this study can spur further academic inquiry of linkages between gender norms and education outcomes. The findings are also relevant for the work of INGOs and donors in the global South and for the development of gender-responsive education policies and programs.

Keywords: education, gender equality, ghana, international development, life histories, mozambique, rwanda, socio-cultural norms, sub-saharan africa, qualitative research

Procedia PDF Downloads 151
5 Assessing Spatial Associations of Mortality Patterns in Municipalities of the Czech Republic

Authors: Jitka Rychtarikova

Abstract:

Regional differences in mortality in the Czech Republic (CR) may be moderate from a broader European perspective, but important discrepancies in life expectancy can be found between smaller territorial units. In this study territorial units are based on Administrative Districts of Municipalities with Extended Powers (MEP). This definition came into force January 1, 2003. There are 205 units and the city of Prague. MEP represents the smallest unit for which mortality patterns based on life tables can be investigated and the Czech Statistical Office has been calculating such life tables (every five-years) since 2004. MEP life tables from 2009-2013 for males and females allowed the investigation of three main life cycles with the use of temporary life expectancies between the exact ages of 0 and 35; 35 and 65; and the life expectancy at exact age 65. The results showed regional survival inequalities primarily in adult and older ages. Consequently, only mortality indicators for adult and elderly population were related to census 2011 unlinked data for the same age groups. The most relevant socio-economic factors taken from the census are: having a partner, educational level and unemployment rate. The unemployment rate was measured for adults aged 35-64 completed years. Exploratory spatial data analysis methods were used to detect regional patterns in spatially contiguous units of MEP. The presence of spatial non-stationarity (spatial autocorrelation) of mortality levels for male and female adults (35-64), and elderly males and females (65+) was tested using global Moran’s I. Spatial autocorrelation of mortality patterns was mapped using local Moran’s I with the intention to depict clusters of low or high mortality and spatial outliers for two age groups (35-64 and 65+). The highest Moran’s I was observed for male temporary life expectancy between exact ages 35 and 65 (0.52) and the lowest was among women with life expectancy of 65 (0.26). Generally, men showed stronger spatial autocorrelation compared to women. The relationship between mortality indicators such as life expectancies and socio-economic factors like the percentage of males/females having a partner; percentage of males/females with at least higher secondary education; and percentage of unemployed males/females from economically active population aged 35-64 years, was evaluated using multiple regression (OLS). The results were then compared to outputs from geographically weighted regression (GWR). In the Czech Republic, there are two broader territories North-West Bohemia (NWB) and North Moravia (NM), in which excess mortality is well established. Results of the t-test of spatial regression showed that for males aged 30-64 the association between mortality and unemployment (when adjusted for education and partnership) was stronger in NM compared to NWB, while educational level impacted the length of survival more in NWB. Geographic variation and relationships in mortality of the CR MEP will also be tested using the spatial Durbin approach. The calculations were conducted by means of ArcGIS 10.6 and SAS 9.4.

Keywords: Czech Republic, mortality, municipality, socio-economic factors, spatial analysis

Procedia PDF Downloads 88
4 Empirical Study on Causes of Project Delays

Authors: Khan Farhan Rafat, Riaz Ahmed

Abstract:

Renowned offshore organizations are drifting towards collaborative exertion to win and implement international projects for business gains. However, devoid of financial constraints, with the availability of skilled professionals, and despite improved project management practices through state-of-the-art tools and techniques, project delays have become a norm these days. This situation calls for exploring the factor(s) affecting the bonding between project management performance and project success. In the context of the well-known 3M’s of project management (that is, manpower, machinery, and materials), machinery and materials are dependent upon manpower. Because the body of knowledge inveterate on the influence of national culture on men, hence, the realization of the impact on the link between project management performance and project success need to be investigated in detail to arrive at the possible cause(s) of project delays. This research initiative was, therefore, undertaken to fill the research gap. The unit of analysis for the proposed research excretion was the individuals who had worked on skyscraper construction projects. In reverent studies, project management is best described using construction examples. It is due to this reason that the project oriented city of Dubai was chosen to reconnoiter on causes of project delays. A structured questionnaire survey was disseminated online with the courtesy of the Project Management Institute local chapter to carry out the cross-sectional study. The Construction Industry Institute, Austin, of the United States of America along with 23 high-rise builders in Dubai were also contacted by email requesting for their contribution to the study and providing them with the online link to the survey questionnaire. The reliability of the instrument was warranted using Cronbach’s alpha coefficient of 0.70. The appropriateness of sampling adequacy and homogeneity in variance was ensured by keeping Kaiser–Meyer–Olkin (KMO) and Bartlett’s test of sphericity in the range ≥ 0.60 and < 0.05, respectively. Factor analysis was used to verify construct validity. During exploratory factor analysis, all items were loaded using a threshold of 0.4. Four hundred and seventeen respondents, including members from top management, project managers, and project staff, contributed to the study. The link between project management performance and project success was significant at 0.01 level (2-tailed), and 0.05 level (2-tailed) for Pearson’s correlation. Before initiating the moderator analysis test for linearity, multicollinearity, outliers, leverage points and influential cases, test for homoscedasticity and normality were carried out which are prerequisites for conducting moderator review. The moderator analysis, using a macro named PROCESS, was performed to verify the hypothesis that national culture has an influence on the said link. The empirical findings, when compared with Hofstede's results, showed high power distance as the cause of construction project delays in Dubai. The research outcome calls for the project sponsors and top management to reshape their project management strategy and allow for low power distance between management and project personnel for timely completion of projects.

Keywords: causes of construction project delays, construction industry, construction management, power distance

Procedia PDF Downloads 179
3 Early Diagnosis of Myocardial Ischemia Based on Support Vector Machine and Gaussian Mixture Model by Using Features of ECG Recordings

Authors: Merve Begum Terzi, Orhan Arikan, Adnan Abaci, Mustafa Candemir

Abstract:

Acute myocardial infarction is a major cause of death in the world. Therefore, its fast and reliable diagnosis is a major clinical need. ECG is the most important diagnostic methodology which is used to make decisions about the management of the cardiovascular diseases. In patients with acute myocardial ischemia, temporary chest pains together with changes in ST segment and T wave of ECG occur shortly before the start of myocardial infarction. In this study, a technique which detects changes in ST/T sections of ECG is developed for the early diagnosis of acute myocardial ischemia. For this purpose, a database of real ECG recordings that contains a set of records from 75 patients presenting symptoms of chest pain who underwent elective percutaneous coronary intervention (PCI) is constituted. 12-lead ECG’s of the patients were recorded before and during the PCI procedure. Two ECG epochs, which are the pre-inflation ECG which is acquired before any catheter insertion and the occlusion ECG which is acquired during balloon inflation, are analyzed for each patient. By using pre-inflation and occlusion recordings, ECG features that are critical in the detection of acute myocardial ischemia are identified and the most discriminative features for the detection of acute myocardial ischemia are extracted. A classification technique based on support vector machine (SVM) approach operating with linear and radial basis function (RBF) kernels to detect ischemic events by using ST-T derived joint features from non-ischemic and ischemic states of the patients is developed. The dataset is randomly divided into training and testing sets and the training set is used to optimize SVM hyperparameters by using grid-search method and 10fold cross-validation. SVMs are designed specifically for each patient by tuning the kernel parameters in order to obtain the optimal classification performance results. As a result of implementing the developed classification technique to real ECG recordings, it is shown that the proposed technique provides highly reliable detections of the anomalies in ECG signals. Furthermore, to develop a detection technique that can be used in the absence of ECG recording obtained during healthy stage, the detection of acute myocardial ischemia based on ECG recordings of the patients obtained during ischemia is also investigated. For this purpose, a Gaussian mixture model (GMM) is used to represent the joint pdf of the most discriminating ECG features of myocardial ischemia. Then, a Neyman-Pearson type of approach is developed to provide detection of outliers that would correspond to acute myocardial ischemia. Neyman – Pearson decision strategy is used by computing the average log likelihood values of ECG segments and comparing them with a range of different threshold values. For different discrimination threshold values and number of ECG segments, probability of detection and probability of false alarm values are computed, and the corresponding ROC curves are obtained. The results indicate that increasing number of ECG segments provide higher performance for GMM based classification. Moreover, the comparison between the performances of SVM and GMM based classification showed that SVM provides higher classification performance results over ECG recordings of considerable number of patients.

Keywords: ECG classification, Gaussian mixture model, Neyman–Pearson approach, support vector machine

Procedia PDF Downloads 116
2 Using Statistical Significance and Prediction to Test Long/Short Term Public Services and Patients' Cohorts: A Case Study in Scotland

Authors: Raptis Sotirios

Abstract:

Health and social care (HSc) services planning and scheduling are facing unprecedented challenges due to the pandemic pressure and also suffer from unplanned spending that is negatively impacted by the global financial crisis. Data-driven can help to improve policies, plan and design services provision schedules using algorithms assist healthcare managers’ to face unexpected demands using fewer resources. The paper discusses services packing using statistical significance tests and machine learning (ML) to evaluate demands similarity and coupling. This is achieved by predicting the range of the demand (class) using ML methods such as CART, random forests (RF), and logistic regression (LGR). The significance tests Chi-Squared test and Student test are used on data over a 39 years span for which HSc services data exist for services delivered in Scotland. The demands are probabilistically associated through statistical hypotheses that assume that the target service’s demands are statistically dependent on other demands as a NULL hypothesis. This linkage can be confirmed or not by the data. Complementarily, ML methods are used to linearly predict the above target demands from the statistically found associations and extend the linear dependence of the target’s demand to independent demands forming, thus groups of services. Statistical tests confirm ML couplings making the prediction also statistically meaningful and prove that a target service can be matched reliably to other services, and ML shows these indicated relationships can also be linear ones. Zero paddings were used for missing years records and illustrated better such relationships both for limited years and in the entire span offering long term data visualizations while limited years groups explained how well patients numbers can be related in short periods or can change over time as opposed to behaviors across more years. The prediction performance of the associations is measured using Receiver Operating Characteristic(ROC) AUC and ACC metrics as well as the statistical tests, Chi-Squared and Student. Co-plots and comparison tables for RF, CART, and LGR as well as p-values and Information Exchange(IE), are provided showing the specific behavior of the ML and of the statistical tests and the behavior using different learning ratios. The impact of k-NN and cross-correlation and C-Means first groupings is also studied over limited years and the entire span. It was found that CART was generally behind RF and LGR, but in some interesting cases, LGR reached an AUC=0 falling below CART, while the ACC was as high as 0.912, showing that ML methods can be confused padding or by data irregularities or outliers. On average, 3 linear predictors were sufficient, LGR was found competing RF well, and CART followed with the same performance at higher learning ratios. Services were packed only if when significance level(p-value) of their association coefficient was more than 0.05. Social factors relationships were observed between home care services and treatment of old people, birth weights, alcoholism, drug abuse, and emergency admissions. The work found that different HSc services can be well packed as plans of limited years, across various services sectors, learning configurations, as confirmed using statistical hypotheses.

Keywords: class, cohorts, data frames, grouping, prediction, prob-ability, services

Procedia PDF Downloads 198
1 Optical Imaging Based Detection of Solder Paste in Printed Circuit Board Jet-Printing Inspection

Authors: D. Heinemann, S. Schramm, S. Knabner, D. Baumgarten

Abstract:

Purpose: Applying solder paste to printed circuit boards (PCB) with stencils has been the method of choice over the past years. A new method uses a jet printer to deposit tiny droplets of solder paste through an ejector mechanism onto the board. This allows for more flexible PCB layouts with smaller components. Due to the viscosity of the solder paste, air blisters can be trapped in the cartridge. This can lead to missing solder joints or deviations in the applied solder volume. Therefore, a built-in and real-time inspection of the printing process is needed to minimize uncertainties and increase the efficiency of the process by immediate correction. The objective of the current study is the design of an optimal imaging system and the development of an automatic algorithm for the detection of applied solder joints from optical from the captured images. Methods: In a first approach, a camera module connected to a microcomputer and LED strips are employed to capture images of the printed circuit board under four different illuminations (white, red, green and blue). Subsequently, an improved system including a ring light, an objective lens, and a monochromatic camera was set up to acquire higher quality images. The obtained images can be divided into three main components: the PCB itself (i.e., the background), the reflections induced by unsoldered positions or screw holes and the solder joints. Non-uniform illumination is corrected by estimating the background using a morphological opening and subtraction from the input image. Image sharpening is applied in order to prevent error pixels in the subsequent segmentation. The intensity thresholds which divide the main components are obtained from the multimodal histogram using three probability density functions. Determining the intersections delivers proper thresholds for the segmentation. Remaining edge gradients produces small error areas which are removed by another morphological opening. For quantitative analysis of the segmentation results, the dice coefficient is used. Results: The obtained PCB images show a significant gradient in all RGB channels, resulting from ambient light. Using different lightings and color channels 12 images of a single PCB are available. A visual inspection and the investigation of 27 specific points show the best differentiation between those points using a red lighting and a green color channel. Estimating two thresholds from analyzing the multimodal histogram of the corrected images and using them for segmentation precisely extracts the solder joints. The comparison of the results to manually segmented images yield high sensitivity and specificity values. Analyzing the overall result delivers a Dice coefficient of 0.89 which varies for single object segmentations between 0.96 for a good segmented solder joints and 0.25 for single negative outliers. Conclusion: Our results demonstrate that the presented optical imaging system and the developed algorithm can robustly detect solder joints on printed circuit boards. Future work will comprise a modified lighting system which allows for more precise segmentation results using structure analysis.

Keywords: printed circuit board jet-printing, inspection, segmentation, solder paste detection

Procedia PDF Downloads 305