Search results for: R data science
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 26154

22524 Characterization of the Groundwater Aquifers at El Sadat City by Joint Inversion of VES and TEM Data

Authors: Usama Massoud, Abeer A. Kenawy, El-Said A. Ragab, Abbas M. Abbas, Heba M. El-Kosery

Abstract:

Vertical Electrical Sounding (VES) and Transient Electromagnetic (TEM) surveys have been applied to characterize the groundwater aquifers at the El Sadat industrial area. El Sadat city is one of the most important industrial cities in Egypt. It was constructed more than three decades ago, about 80 km northwest of Cairo along the Cairo–Alexandria desert road. Groundwater is the main source of water for domestic, municipal, and industrial activities in this area due to the lack of surface water sources, so it is important to maintain this vital resource in order to sustain the development plans of the city. In this study, VES and TEM data were identically measured at 24 stations along three profiles trending NE–SW, following the elongation of the study area. The measuring points were arranged in a grid-like pattern with both inter-station spacing and line-to-line distance of about 2 km. After the necessary processing steps, the VES and TEM data sets were inverted individually to multi-layer models, followed by a joint inversion of both data sets. The joint inversion succeeded in overcoming the model-equivalence problem encountered in the inversion of each individual data set. The joint models were then used to construct a number of cross sections and contour maps showing the lateral and vertical distribution of the geo-electrical parameters in the subsurface. Interpretation of the results and correlation with the available geological and hydrogeological information revealed two aquifer systems in the area. The shallow Pleistocene aquifer consists of sand and gravel saturated with fresh water and has a large thickness exceeding 200 m. The deep Pliocene aquifer is composed of clay and sand and shows low resistivity values. The water-bearing layer of the Pleistocene aquifer and the upper surface of the Pliocene aquifer are continuous, and no structural features cut this continuity through the investigated area.

Keywords: El Sadat city, joint inversion, VES, TEM

Procedia PDF Downloads 357
22523 Positive Outcomes of Internship for Students Majoring in Mathematics

Authors: Irina Peterburgsky

Abstract:

We have been working on finding internship positions for our math and computer science majors. Among many other positive outcomes of internship for students majoring in mathematics, there are: students see new applications of mathematics to real life and see new scientific problems; they learn new methods, tools, etc. that they have not seen in their classes; they appreciate the power of mathematics that increases their interest in learning mathematics; they make decisions to take more advanced math courses; students understand better what their potentials, strong points, and limitations are; learn what work ethic is; learn how to work as a member of a team at a workplace; understand better how to offer their help and how to ask for help; start building their professional relationship; build self-confidence as young professionals, and what is the most important - they get a better understanding of their goals in their future professional careers.

Keywords: internship, mathematics, positive outcomes for students, workplace

Procedia PDF Downloads 166
22522 Developing and Testing a Questionnaire of Music Memorization and Practice

Authors: Diana Santiago, Tania Lisboa, Sophie Lee, Alexander P. Demos, Monica C. S. Vasconcelos

Abstract:

Memorization has long been recognized as an arduous and anxiety-evoking task for musicians, and yet, it is an essential aspect of performance. Research shows that musicians are often not taught how to memorize. While memorization and practice strategies of professionals have been studied, little research has been done to examine how student musicians learn to practice and memorize music in different cultural settings. We present the process of developing and testing a questionnaire of music memorization and musical practice for student musicians in the UK and Brazil. A survey was developed for a cross-cultural research project aiming at examining how young orchestral musicians (aged 7–18 years) in different learning environments and cultures engage in instrumental practice and memorization. The questionnaire development included members of a UK/US/Brazil research team of music educators and performance science researchers. A pool of items was developed for each aspect of practice and memorization identified, based on literature, personal experiences, and adapted from existing questionnaires. Item development took the varying levels of cognitive and social development of the target populations into consideration. It also considered the diverse target learning environments. Items were initially grouped in accordance with a single underlying construct/behavior. The questionnaire comprised three sections: a demographics section, a section on practice (containing 29 items), and a section on memorization (containing 40 items). Next, the response process was considered and a 5-point Likert scale ranging from ‘always’ to ‘never’ with a verbal label and an image assigned to each response option was selected, following effective questionnaire design for children and youths. Finally, a pilot study was conducted with young orchestral musicians from diverse learning environments in Brazil and the United Kingdom. 
Data collection took place in either one-to-one or group settings to accommodate the participants. Cognitive interviews were used to establish response process validity by confirming the readability and accurate comprehension of the questionnaire items or highlighting the need for item revision. Internal reliability was investigated by measuring the consistency of the item groups using Cronbach’s alpha. The pilot study successfully relied on the questionnaire to generate data about the engagement of young musicians of different levels and instruments, across different learning and cultural environments, in instrumental practice and memorization. Interaction analysis of the cognitive interviews undertaken with these participants, however, showed that certain items, and the response scale, could be interpreted in multiple ways. The questionnaire text was, therefore, revised accordingly. The low Cronbach’s alpha scores of many item groups indicated another issue with the original questionnaire: its low level of internal reliability. Several reasons for this poor reliability can be suggested, including the issues with item interpretation revealed through interaction analysis of the cognitive interviews, the small number of participants (34), and the elusive nature of the construct in question. The revised questionnaire measures 78 specific behaviors or opinions. It can be seen to provide an efficient means of gathering information about the engagement of young musicians in practice and memorization on a large scale.
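
As an illustration of the internal reliability check mentioned in this abstract, the following is a minimal sketch of Cronbach's alpha for one item group, using the usual formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the sample responses are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one item group.

    responses: list of rows, one per participant; each row holds that
    participant's numeric scores on the k items of the group.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two participants")
    # Sample variance of each item (column) across participants.
    item_variances = sum(variance(column) for column in zip(*responses))
    # Sample variance of each participant's total score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variances / total_variance)


# Hypothetical 5-point Likert answers from four participants on three items.
sample = [[5, 4, 5], [3, 3, 2], [4, 4, 4], [1, 2, 1]]
alpha = cronbach_alpha(sample)
```

Values close to 1 indicate that the items of a group vary together; low values, as reported for many item groups in the pilot, indicate weak internal consistency.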

Keywords: cross-cultural, memorization, practice, questionnaire, young musicians

Procedia PDF Downloads 113
22521 Application of Signature Verification Models for Document Recognition

Authors: Boris M. Fedorov, Liudmila P. Goncharenko, Sergey A. Sybachin, Natalia A. Mamedova, Ekaterina V. Makarenkova, Saule Rakhimova

Abstract:

In modern economic conditions, the question of the possibility of correct recognition of a signature on digital documents in order to verify the expression of will or confirm a certain operation is relevant. The additional complexity of processing lies in the dynamic variability of the signature for each individual, as well as in the way information is processed because the signature refers to biometric data. The article discusses the issues of using artificial intelligence models in order to improve the quality of signature confirmation in document recognition. The analysis of several possible options for using the model is carried out. The results of the study are given, in which it is possible to correctly determine the authenticity of the signature on small samples.

Keywords: signature recognition, biometric data, artificial intelligence, neural networks

Procedia PDF Downloads 134
22520 A Case Study: Social Network Analysis of Construction Design Teams

Authors: Elif D. Oguz Erkal, David Krackhardt, Erica Cochran-Hameen

Abstract:

Even though social network analysis (SNA) is an abundantly studied concept for many organizations and industries, a clear SNA approach to the project teams has not yet been adopted by the construction industry. The main challenges for performing SNA in construction and the apparent reason for this gap is the unique and complex structure of each construction project, the comparatively high circulation of project team members/contributing parties and the variety of authentic problems for each project. Additionally, there are stakeholders from a variety of professional backgrounds collaborating in a high-stress environment fueled by time and cost constraints. Within this case study on Project RE, a design & build project performed at the Urban Design Build Studio of Carnegie Mellon University, social network analysis of the project design team will be performed with the main goal of applying social network theory to construction project environments. The research objective is to determine a correlation between the network of how individuals relate to each other on one’s perception of their own professional strengths and weaknesses and the communication patterns within the team and the group dynamics. Data is collected through a survey performed over four rounds conducted monthly, detailed follow-up interviews and constant observations to assess the natural alteration in the network with the effect of time. The data collected is processed by the means of network analytics and in the light of the qualitative data collected with observations and individual interviews. This paper presents the full ethnography of this construction design team of fourteen architecture students based on an elaborate social network data analysis over time. 
This study is expected to be used as an initial step to perform a refined, targeted and large-scale social network data collection in construction projects in order to deduce the impacts of social networks on project performance and suggest better collaboration structures for construction project teams henceforth.

Keywords: construction design teams, construction project management, social network analysis, team collaboration, network analytics

Procedia PDF Downloads 187
22519 Computational Fluid Dynamics Simulations and Analysis of Air Bubble Rising in a Column of Liquid

Authors: Baha-Aldeen S. Algmati, Ahmed R. Ballil

Abstract:

Multiphase flows occur widely in many engineering and industrial processes as well as in the environment we live in. In particular, bubbly flows are considered to be crucial phenomena in fluid flow applications and can be studied and analyzed experimentally, analytically, and computationally. In the present paper, the dynamic motion of an air bubble rising within a column of liquid is numerically simulated using the open-source CFD modeling tool 'OpenFOAM'. An interface tracking numerical algorithm called the MULES algorithm, which is built into OpenFOAM, is chosen to solve an appropriate mathematical model based on the volume of fluid (VOF) numerical method. The bubbles initially have a spherical shape and start from rest in the stagnant column of liquid. The algorithm is initially verified against numerical results and is also validated against available experimental data. The comparison revealed that this algorithm provides results that are in very good agreement with the 2D numerical data of other CFD codes. Also, the bubble shape and terminal velocity obtained from the 3D numerical simulation showed very good qualitative and quantitative agreement with the experimental data. The simulated rising bubbles yield a very small percentage of error in the bubble terminal velocity compared with the experimental data. The obtained results demonstrate the capability of OpenFOAM as a powerful tool to predict the rising characteristics of spherical bubbles in a stagnant column of liquid. This will pave the way for a deeper understanding of the phenomenon of the rise of bubbles in liquids.
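
For orientation, the generic VOF formulation can be written as below. This is the textbook form, not necessarily the exact model solved in the paper; the symbols (volume fraction α of liquid, velocity u, liquid and gas properties with subscripts l and g) are assumptions introduced here, and solver-specific terms are omitted.

```latex
% Transport of the liquid volume fraction \alpha (1 in liquid, 0 in gas)
\frac{\partial \alpha}{\partial t} + \nabla \cdot (\alpha \mathbf{u}) = 0

% Mixture properties used in the single set of momentum equations
\rho = \alpha \rho_l + (1 - \alpha)\rho_g, \qquad
\mu = \alpha \mu_l + (1 - \alpha)\mu_g
```

The interface is located where α lies between 0 and 1, which is why a bounded solution for α matters for tracking the bubble shape.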

Keywords: CFD simulations, multiphase flows, OpenFOAM, rise of bubble, volume of fluid method, VOF

Procedia PDF Downloads 111
22518 Estimating Groundwater Seepage Rates: Case Study at Zegveld, Netherlands

Authors: Wondmyibza Tsegaye Bayou, Johannes C. Nonner, Joost Heijkers

Abstract:

This study aimed to identify and estimate dynamic groundwater seepage rates using four comparative methods: the Darcian approach, the water balance approach, the tracer method, and modeling. The theoretical background to these methods is put together in this study. The methodology was applied to a case study area at Zegveld following the advice of the Water Board Stichtse Rijnlanden. Data were collected from various offices and during a field campaign in the winter of 2008/09. In the complex confining layer of the study area, the phreatic groundwater table is at a shallow depth compared to the piezometric water level. Data were available for the model years 1989 to 2000 and winter 2008/09. The higher groundwater table results in predominantly downward seepage in the study area. Results of the study indicated that net recharge to the groundwater table (precipitation excess) and the ditch system are the principal sources for seepage across the complex confining layer. Especially in the summer season, the contribution from the ditches is significant. Water is supplied from the River Meije through a pumping system to meet the ditches' water demand. The groundwater seepage rate was distributed unevenly throughout the study area at the nature reserve, averaging 0.60 mm/day for the model years 1989 to 2000 and 0.70 mm/day for winter 2008/09. Due to data restrictions, the seepage rates were mainly determined based on the Darcian method. Furthermore, the water balance approach and the tracer method were applied to compute the flow exchange within the ditch system. The site had various validated groundwater levels and vertical flow resistance data sources. The phreatic groundwater level map, compared with TNO-DINO groundwater level data values, overestimated the groundwater level depth by 28 cm. The hydraulic resistance values obtained based on the 3D geological map, compared with the TNO-DINO data, agreed with the model values before calibration. 
On the other hand, the calibrated model significantly underestimated the downward seepage in the area compared with the field-based computations following the Darcian approach.
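
The Darcian approach for vertical seepage across a confining layer can be sketched as the head difference divided by the layer's vertical hydraulic resistance. The snippet below is a minimal illustration under that assumption; the function name and the input values are hypothetical and are not the study's data.

```python
def darcy_seepage_mm_per_day(phreatic_head_m, piezometric_head_m, resistance_days):
    """Vertical seepage rate across a confining layer, in mm/day.

    Positive values mean downward seepage (phreatic head above the
    piezometric head); negative values mean upward seepage.
    resistance_days is the vertical hydraulic resistance c = thickness / K_v.
    """
    if resistance_days <= 0:
        raise ValueError("hydraulic resistance must be positive")
    flux_m_per_day = (phreatic_head_m - piezometric_head_m) / resistance_days
    return flux_m_per_day * 1000.0  # convert m/day to mm/day


# Hypothetical example: heads of -2.0 m and -2.3 m relative to a common datum,
# and a confining-layer resistance of 500 days.
rate = darcy_seepage_mm_per_day(-2.0, -2.3, 500.0)
```

With these illustrative inputs the head difference is 0.3 m, so the computed downward flux is about 0.6 mm/day.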

Keywords: groundwater seepage, phreatic water table, piezometric water level, nature reserve, Zegveld, The Netherlands

Procedia PDF Downloads 70
22517 Incorporating Spatial Transcriptome Data into Ligand-Receptor Analyses to Discover Regional Activation in Cells

Authors: Eric Bang

Abstract:

Interactions between receptors and ligands are crucial for many essential biological processes, including neurotransmission and metabolism. Ligand-receptor analyses that examine cell behavior and interactions often utilize cell type-specific RNA expression from single-cell RNA sequencing (scRNA-seq) data. Using CellPhoneDB, a public repository of ligands, receptors, and ligand-receptor interactions, cell-cell interactions were explored in a specific scRNA-seq dataset from kidney tissue, and the results were portrayed with dot plots and heat maps. Depending on the type of cell, each ligand-receptor pair was aligned with the interacting cell type, and the posterior probabilities of these associations were calculated, with corresponding p-values reflecting average expression values between the triads and their significance. Using single-cell data (sample kidney cell references), genes in the dataset were cross-referenced with those in the existing CellPhoneDB dataset. For example, a gene such as Pleiotrophin (PTN) present in the single-cell data also needed to be present in the CellPhoneDB dataset. Using the single-cell transcriptomics data obtained via Slide-seq and reference data, the CellPhoneDB program defines cell types and plots them in different formats, the two main ones being dot plots and heat maps. The dot plot displays derived measures of the cell-to-cell interaction scores and p-values. In the dot plot, each row shows a ligand-receptor pair, and each column shows the two interacting cell types. CellPhoneDB defines interactions and interaction levels from the gene expression level, and since the p-value is on a -log10 scale, larger dots represent more significant interactions. By performing an interaction analysis, a significant interaction was discovered for myeloid and T-cell ligand-receptor pairs, including those between Secreted Phosphoprotein 1 (SPP1) and Fibronectin 1 (FN1), which is consistent with previous findings. 
It was proposed that an effective protocol would involve a filtration step in which cell types are filtered out depending on which ligand-receptor pair is activated in that part of the tissue, as well as the incorporation of the CellPhoneDB data in a streamlined workflow pipeline. The filtration step would take the form of a Python script that expedites the manual process necessary for dataset filtration. Being in Python allows it to be integrated with the CellPhoneDB dataset for future workflow analysis. The manual process involves filtering cell types based on which ligand-receptor pair is activated in kidney cells. One limitation is that some pairings are activated in multiple cells at a time, so the manual manipulation of the data is reflected prior to analysis. Using the filtration script, accurate sorting is incorporated into the CellPhoneDB database rather than waiting until the output is produced and then subsequently applying spatial data. It was envisioned that this would reveal where in the cell various ligands and receptors are interacting with different cell types, allowing for easier identification of which cells are being impacted and why, for the purpose of disease treatment. The hope is that this new computational method utilizing spatially explicit ligand-receptor association data can be used to uncover previously unknown specific interactions within kidney tissue.
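
The dot-size convention and the idea of a filtration step can be sketched as follows. This is only an illustration: the record layout, the names (`neg_log10_p`, `significant_pairs`, `LIGAND_A|RECEPTOR_B`), the cell-type labels, and the p-values are hypothetical, and none of it reproduces the CellPhoneDB output format or the authors' script.

```python
import math


def neg_log10_p(p_value, floor=1e-4):
    """Map a p-value to the -log10 scale; smaller p gives a larger dot.

    p-values of zero are clipped to `floor` so the result stays finite.
    """
    return -math.log10(max(p_value, floor))


def significant_pairs(records, cell_types, alpha=0.05):
    """Keep records whose two interacting cell types are both of interest
    and whose p-value is below alpha."""
    wanted = set(cell_types)
    return [
        r for r in records
        if r["p"] < alpha and r["cell_a"] in wanted and r["cell_b"] in wanted
    ]


# Hypothetical interaction records (placeholder pair names and values).
records = [
    {"pair": "LIGAND_A|RECEPTOR_B", "cell_a": "myeloid", "cell_b": "T cell", "p": 0.001},
    {"pair": "LIGAND_C|RECEPTOR_D", "cell_a": "myeloid", "cell_b": "T cell", "p": 0.40},
    {"pair": "LIGAND_A|RECEPTOR_B", "cell_a": "podocyte", "cell_b": "T cell", "p": 0.01},
]
kept = significant_pairs(records, ["myeloid", "T cell"])
dot_sizes = [neg_log10_p(r["p"]) for r in kept]
```

Filtering before plotting keeps the dot plot restricted to the cell types present in a given tissue region, which is the role the abstract assigns to the proposed filtration step.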

Keywords: bioinformatics, Ligands, kidney tissue, receptors, spatial transcriptome

Procedia PDF Downloads 128
22516 Geographic Information System (GIS) for Structural Typology of Buildings

Authors: Néstor Iván Rojas, Wilson Medina Sierra

Abstract:

The management of spatial information through a Geographic Information System (GIS) is described for some neighborhoods in the city of Tunja, in relation to the structural typology of the buildings. The use of GIS provides tools that facilitate the capture, processing, analysis and dissemination of cartographic information, as well as the quality evaluation of the classification of buildings. It allows the development of a method that unifies and standardizes information processes. The project aims to generate a geographic database that is useful to the entities responsible for planning, disaster prevention and care for vulnerable populations, and also seeks to be a basis for seismic vulnerability studies that can contribute to a study of urban seismic microzonation. The methodology consists of capturing the plat, including road naming, neighborhoods, blocks and buildings, to which the results of the evaluation of each building were added as attributes: the number of inhabitants and classification, year of construction, the predominant structural systems, the type of mezzanine board and state of favorability, the presence of geo-technical problems, the type of cover, the use of each building, and damage to structural and non-structural elements. These data are tabulated in a spreadsheet that includes the cadastral number, through which they are systematically linked to the respective building, which also has that attribute. A geo-referenced database is obtained, from which graphical outputs are generated, producing thematic maps for each evaluated attribute that clearly show the spatial distribution of the information obtained. Using GIS offers important advantages for spatial information management and facilitates consultation and updating. The usefulness of the project is recognized as a basis for studies on issues of planning and prevention.

Keywords: microzonation, buildings, geo-processing, cadastral number

Procedia PDF Downloads 321
22515 Jan’s Life-History: Changing Faces of Managerial Masculinities and Consequences for Health

Authors: Susanne Gustafsson

Abstract:

Life-history research is an extraordinarily fruitful method to use for social analysis and gendered health analysis in particular. Its potential is illustrated through a case study drawn from a Swedish project. It reveals an old type of masculinity that faces difficulties when carrying out two sets of demands simultaneously, as a worker/manager and as a father/husband. The paper illuminates the historical transformation of masculinity and the consequences of this for health. We draw on the idea of the “changing faces of masculinity” to explore the dynamism and complexity of gendered health. An empirical case is used for its illustrative abilities. Jan, a middle-level manager and father employed in the energy sector in urban Sweden is the subject of this paper. Jan’s story is one of 32 semi-structured interviews included in an extended study focusing on well-being at work. The results reveal a face of masculinity conceived of in middle-level management as tacitly linked to the neoliberal doctrine. Over a couple of decades, the idea of “flexibility” was turned into a valuable characteristic that everyone was supposed to strive for. This resulted in increased workloads. Quite a few employees, and managers, in particular, find themselves working both day and night. This may explain why not having enough time to spend with children and family members is a recurring theme in the data. Can this way of doing be linked to masculinity and health? The first author’s research has revealed that the use of gender in health science is not sufficiently or critically questioned. This lack of critical questioning is a serious problem, especially since ways of doing gender affect health. We suggest that gender reproduction and gender transformation are interconnected, regardless of how they affect health. They are recognized as two sides of the same phenomenon, and minor movements in one direction or the other become crucial for understanding its relation to health. 
More or less, at the same time, as Jan’s masculinity was reproduced in response to workplace practices, Jan’s family position was transformed—not totally but by a degree or two, and these degrees became significant for the family’s health and well-being. By moving back and forth between varied events in Jan’s biographical history and his sociohistorical life span, it becomes possible to show that in a time of gender transformations, power relations can be renegotiated, leading to consequences for health.

Keywords: changing faces of masculinity, gendered health, life-history research method, subverter

Procedia PDF Downloads 100
22514 Improving Cell Type Identification of Single Cell Data by Iterative Graph-Based Noise Filtering

Authors: Annika Stechemesser, Rachel Pounds, Emma Lucas, Chris Dawson, Julia Lipecki, Pavle Vrljicak, Jan Brosens, Sean Kehoe, Jason Yap, Lawrence Young, Sascha Ott

Abstract:

Advances in technology now make it possible to retrieve the genetic information of thousands of single cancerous cells. One of the key challenges in single cell analysis of cancerous tissue is to determine the number of different cell types and their characteristic genes within the sample to better understand the tumors and their reaction to different treatments. For this analysis to be possible, it is crucial to filter out background noise, as it can severely blur the downstream analysis and give misleading results. In-depth analysis of the state-of-the-art filtering methods for single cell data showed that, in some cases, they do not separate noisy and normal cells sufficiently. We introduced an algorithm that filters and clusters single cell data simultaneously without relying on certain genes or thresholds chosen by eye. It detects communities in a Shared Nearest Neighbor similarity network, which captures the similarities and dissimilarities of the cells, by optimizing the modularity, and then identifies and removes vertices with a weak clustering belonging. This strategy is based on the fact that noisy data instances are very likely to be similar to true cell types but do not match any of these well. Once the clustering is complete, we apply a set of evaluation metrics on the cluster level and accept or reject clusters based on the outcome. The performance of our algorithm was tested on three datasets and led to convincing results. We were able to replicate the results on a Peripheral Blood Mononuclear Cells dataset. Furthermore, we applied the algorithm to two samples of ovarian cancer from the same patient before and after chemotherapy. Comparing the standard approach to our algorithm, we found a hidden cell type in the ovarian post-chemotherapy data with interesting marker genes that are potentially relevant for medical research.
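
The two building blocks named in this abstract, a shared-nearest-neighbor (SNN) similarity network and a per-vertex measure of clustering belonging, can be sketched as below. This is an assumption-laden illustration rather than the authors' algorithm: the belonging score used here (the fraction of a vertex's SNN weight that stays inside its own cluster) is one plausible definition, the cluster labels are taken as given instead of being found by modularity optimization, and the points are toy 2D coordinates.

```python
from math import dist


def knn(points, k):
    """For each point, the set of indices of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist(p, points[j]))
        neighbours.append(set(others[:k]))
    return neighbours


def snn_weights(neighbours):
    """SNN edge weight = number of shared nearest neighbours (0 = no edge)."""
    weights = {}
    for i in range(len(neighbours)):
        for j in range(i + 1, len(neighbours)):
            shared = len(neighbours[i] & neighbours[j])
            if shared:
                weights[(i, j)] = shared
    return weights


def belonging(weights, labels):
    """Fraction of each vertex's SNN weight that stays in its own cluster."""
    inside = [0.0] * len(labels)
    total = [0.0] * len(labels)
    for (i, j), w in weights.items():
        for a, b in ((i, j), (j, i)):
            total[a] += w
            if labels[a] == labels[b]:
                inside[a] += w
    return [ins / tot if tot else 0.0 for ins, tot in zip(inside, total)]


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
weights = snn_weights(knn(points, k=2))
scores = belonging(weights, labels=[0, 0, 0, 1, 1, 1])
```

Vertices whose score falls below a chosen cut-off would be candidates for removal as noise; the cut-off itself is a free parameter in this sketch.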

Keywords: cancer research, graph theory, machine learning, single cell analysis

Procedia PDF Downloads 90
22513 Association between Anemia and Maternal Depression during Pregnancy: Systematic Review

Authors: Gebeyaw Molla Wondim, Damen Haile Mariam, Wubegzier Mekonnen, Catherine Arsenault

Abstract:

Introduction: Maternal depression is a common psychological disorder that mostly occurs during pregnancy and after childbirth. It affects approximately one in four women worldwide. There is inconsistent evidence regarding the association between anemia and maternal depression. The objective of this systematic review was to examine the association between anemia and depression during pregnancy. Method: A comprehensive search of articles published before March 8, 2024, was conducted in seven databases: PubMed, Scopus, Web of Science, PsycINFO, CINAHL, Cochrane Library, and Google Scholar. The Boolean operators “AND”, “OR”, and “NOT” were used to connect the MeSH terms and keywords. Rayyan software was used to screen articles for final retrieval, and the PRISMA diagram was used to show the article selection process. Data extraction and risk-of-bias assessment were done by two reviewers independently. The JBI critical appraisal tool was used to assess the methodological quality of the retrieved articles. Heterogeneity was assessed through visual inspection of the extracted results, and narrative analysis was used to synthesize the results. Result: A total of 2,413 articles were obtained from seven electronic databases. Among these articles, a total of 2,398 were removed due to duplication (702 articles), by title and abstract selection criteria (1,678 articles), and by full-text review (18 articles). Finally, in this systematic review, 15 articles with a total of 628,781 pregnant women were included: seven articles were cohort studies, two were case-control, and six studies were cross-sectional. All included studies were published between 2013 and 2022. Studies conducted in the United States, South Korea, Finland, and one in South India found no significant association between anemia and maternal depression during pregnancy. 
On the other hand, studies conducted in Australia, Canada, Finland, Israel, Turkey, Vietnam, Ethiopia, and South India showed a significant association between anemia and depression during pregnancy. Conclusion: The overall finding of the systematic review shows the burden of anemia and antenatal depression is much higher among pregnant women in developing countries. Around three-fourths of the studies show that anemia is positively associated with antenatal depression. Almost all studies conducted in LMICs show anemia positively associated with antenatal depression.
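
The screening counts reported in this abstract can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are those given in the Result section.

```python
# Counts as reported in the abstract.
identified = 2413
excluded = {
    "duplicates": 702,
    "title_and_abstract": 1678,
    "full_text_review": 18,
}
study_designs = {"cohort": 7, "case_control": 2, "cross_sectional": 6}

total_excluded = sum(excluded.values())      # articles removed during screening
included = identified - total_excluded       # articles kept for the review

# The exclusion reasons should add up to the reported 2,398 removals, and the
# study designs should add up to the number of included articles.
counts_consistent = (
    total_excluded == 2398 and included == sum(study_designs.values())
)
```

The three exclusion counts sum to 2,398, leaving 15 included articles, which matches the seven cohort, two case-control, and six cross-sectional studies listed.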

Keywords: pregnant, women, anemia, depression

Procedia PDF Downloads 16
22512 Data Collection Techniques for Robotics to Identify the Facial Expressions of Traumatic Brain Injured Patients

Authors: Chaudhary Muhammad Aqdus Ilyas, Matthias Rehm, Kamal Nasrollahi, Thomas B. Moeslund

Abstract:

This paper presents an investigation of data collection procedures associated with robots placed with traumatic brain injured (TBI) patients for rehabilitation purposes through facial expression and mood analysis. Rehabilitation after TBI is crucial due to the nature of the injury and the variation in recovery time. It is advantageous to analyze these emotional signals in a contactless manner, due to the non-supportive behavior of patients, limited muscle movements and an increase in negative emotional expressions. This work aims at the development of a framework in which robots can recognize the emotions of TBI patients through facial expressions in order to perform rehabilitation tasks through physical, cognitive or interactive activities. The results of these studies show that, with customized data collection strategies, the proposed framework identifies facial and emotional expressions more accurately, which can be utilized to enhance recovery treatment and social interaction in a robotic context.

Keywords: computer vision, convolution neural network- long short term memory network (CNN-LSTM), facial expression and mood recognition, multimodal (RGB-thermal) analysis, rehabilitation, robots, traumatic brain injured patients

Procedia PDF Downloads 137
22511 The Extended Skew Gaussian Process for Regression

Authors: M. T. Alodat

Abstract:

In this paper, we propose a generalization of the Gaussian process regression (GPR) model called the extended skew Gaussian process for regression (ESGPR) model. The ESGPR model works better than the GPR model when the errors are skewed. We derive the predictive distribution for the ESGPR model at a new input. We also apply the ESGPR model to FOREX data and find that it fits the FOREX data better than the GPR model.
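
For reference, the predictive distribution of the standard GPR baseline (Gaussian errors) at a new input is shown below. This is the textbook result, not the ESGPR predictive distribution derived in the paper; the notation (training inputs X, targets y, kernel k, noise variance σ_n², new input x_*) is introduced here as an assumption.

```latex
% Standard GPR with y = f(X) + \varepsilon, \ \varepsilon \sim \mathcal{N}(0, \sigma_n^2 I),
% K = k(X, X), \ \mathbf{k}_* = k(X, x_*)
f_* \mid X, \mathbf{y}, x_* \sim \mathcal{N}\left(\mu_*, \sigma_*^2\right)

\mu_* = \mathbf{k}_*^{\top}\left(K + \sigma_n^2 I\right)^{-1}\mathbf{y}

\sigma_*^2 = k(x_*, x_*) - \mathbf{k}_*^{\top}\left(K + \sigma_n^2 I\right)^{-1}\mathbf{k}_*
```

This predictive distribution is symmetric about its mean, which is the limitation a skewed-error model is meant to address.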

Keywords: extended skew normal distribution, Gaussian process for regression, predictive distribution, ESGPr model

Procedia PDF Downloads 538
22510 Training AI to Be Empathetic and Determining the Psychotype of a Person During a Conversation with a Chatbot

Authors: Aliya Grig, Konstantin Sokolov, Igor Shatalin

Abstract:

The report describes the methodology for collecting data and building an ML model for determining the personality psychotype using profiling and personality traits methods based on several short messages of a user communicating on an arbitrary topic with a chitchat bot. In the course of the experiments, the minimum amount of text needed to confidently determine aspects of personality was identified. The model accuracy is 85%. The users' language of communication is English. The AI is intended for personalized communication with a user based on their mood, personality, and current emotional state. Features investigated during the research: personalized communication; providing empathy; adaptation to a user; predictive analytics. In the report, we describe the processes that capture both structured and unstructured data pertaining to a user in large quantities and diverse forms. This data is then processed through ML tools to construct a knowledge graph and draw inferences regarding users of text messages in a comprehensive manner. Specifically, the system analyzes users' behavioral patterns and predicts future scenarios based on this analysis. As a result of the experiments, we provide a basis for further research on training AI models to be empathetic and on creating personalized communication for a user.

Keywords: AI, empathetic, chatbot, AI models

Procedia PDF Downloads 75
22509 Calpoly Autonomous Transportation Experience: Software for Driverless Vehicle Operating on Campus

Authors: F. Tang, S. Boskovich, A. Raheja, Z. Aliyazicioglu, S. Bhandari, N. Tsuchiya

Abstract:

Calpoly Autonomous Transportation Experience (CATE) is a driverless vehicle that we are developing to provide safe, accessible, and efficient transportation of passengers throughout the Cal Poly Pomona campus for events such as orientation tours. Unlike the other self-driving vehicles that are usually developed to operate with other vehicles and reside only on the road networks, CATE will operate exclusively on walk-paths of the campus (potentially narrow passages) with pedestrians traveling from multiple locations. Safety becomes paramount as CATE operates within the same environment as pedestrians. As driverless vehicles assume greater roles in today’s transportation, this project will contribute to autonomous driving with pedestrian traffic in a highly dynamic environment. The CATE project requires significant interdisciplinary work. Researchers from mechanical engineering, electrical engineering and computer science are working together to attack the problem from different perspectives (hardware, software and system). In this abstract, we describe the software aspects of the project, with a focus on the requirements and the major components. CATE shall provide a GUI interface for the average user to interact with the car and access its available functionalities, such as selecting a destination from any origin on campus. We have developed an interface that provides an aerial view of the campus map, the current car location, routes, and the goal location. Users can interact with CATE through audio or manual inputs. CATE shall plan routes from the origin to the selected destination for the vehicle to travel. We will use an existing aerial map for the campus and convert it to a spatial graph configuration where the vertices represent the landmarks and edges represent paths that the car should follow with some designated behaviors (such as stay on the right side of the lane or follow an edge). 
Graph search algorithms such as A* will be implemented as the default path planning algorithm. D* Lite will be explored to efficiently recompute the path when there are any changes to the map. CATE shall avoid any static obstacles and walking pedestrians within some safe distance. Unlike traveling along traditional roadways, CATE’s route directly coexists with pedestrians. To ensure the safety of the pedestrians, we will use sensor fusion techniques that combine data from both lidar and stereo vision for obstacle avoidance while also allowing CATE to operate along its intended route. We will also build prediction models for pedestrian traffic patterns. CATE shall improve its location and work under a GPS-denied situation. CATE relies on its GPS to give its current location, which has a precision of a few meters. We have implemented an Unscented Kalman Filter (UKF) that allows the fusion of data from multiple sensors (such as GPS, IMU, odometry) in order to increase the confidence of localization. We also noticed that GPS signals can easily get degraded or blocked on campus due to high-rise buildings or trees. UKF can also help here to generate a better state estimate. In summary, CATE will provide on-campus transportation experience that coexists with dynamic pedestrian traffic. In future work, we will extend it to multi-vehicle scenarios.
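The path planning step described in this abstract (landmarks as vertices, walk-paths as weighted edges, A* as the default planner) can be illustrated with a minimal sketch. The landmark names, coordinates, and edge costs below are hypothetical and are not taken from the CATE project; the heuristic is straight-line distance, which is admissible only under the assumption that no edge costs less than the straight-line distance between its endpoints.

```python
import heapq
import math


def a_star(graph, coords, start, goal):
    """Return (cost, path) of a cheapest path, or (inf, []) if unreachable.

    graph:  dict mapping node -> list of (neighbor, edge_cost)
    coords: dict mapping node -> (x, y), used only by the heuristic
    """
    def heuristic(node):
        # Straight-line distance to the goal (assumed admissible).
        return math.dist(coords[node], coords[goal])

    open_heap = [(heuristic(start), 0.0, start)]
    best_cost = {start: 0.0}
    parent = {start: None}
    while open_heap:
        _, cost, node = heapq.heappop(open_heap)
        if cost > best_cost[node]:
            continue  # stale heap entry, a cheaper route was already found
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return cost, path[::-1]
        for neighbor, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < best_cost.get(neighbor, math.inf):
                best_cost[neighbor] = new_cost
                parent[neighbor] = node
                heapq.heappush(
                    open_heap, (new_cost + heuristic(neighbor), new_cost, neighbor)
                )
    return math.inf, []


# Hypothetical campus landmarks and walk-path lengths (arbitrary units).
coords = {"gate": (0, 0), "library": (3, 0), "quad": (3, 4), "hall": (6, 4)}
edges = [("gate", "library", 3.0), ("library", "quad", 4.0),
         ("quad", "hall", 3.0), ("gate", "quad", 5.0), ("library", "hall", 7.0)]
graph = {name: [] for name in coords}
for a, b, w in edges:  # walk-paths are treated as two-way
    graph[a].append((b, w))
    graph[b].append((a, w))

cost, path = a_star(graph, coords, "gate", "hall")
print(cost, path)
```

A replanning algorithm such as D* Lite, which the abstract mentions for map changes, would reuse earlier search results rather than calling a function like this from scratch; that is not shown here.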

Keywords: driverless vehicle, path planning, sensor fusion, state estimate

Procedia PDF Downloads 128
22508 Anemia Among Pregnant Women in Kuwait: Findings from Kuwait Birth Cohort Study

Authors: Majeda Hammoud

Abstract:

Background: Anemia during pregnancy increases the risk of delivery by cesarean section, low birth weight, preterm birth, perinatal mortality, stillbirth, and maternal mortality. In this study, we aimed to assess the prevalence of anemia in pregnant women and its associated factors in the Kuwait birth cohort study. Methods: The Kuwait birth cohort (N=1108) was a prospective cohort study in which pregnant women were recruited in the third trimester. Data were collected through personal interviews with mothers who attended antenatal care visits, including data on socio-economic status and lifestyle factors. Blood samples were taken after recruitment to measure multiple laboratory indicators. Clinical data, including data on comorbidities, were extracted from the medical records by a clinician. Anemia was defined as having hemoglobin (Hb) <110 g/L, with further classification as mild (100-109 g/L), moderate (70-99 g/L), or severe (<70 g/L). Predictors of anemia were classified as underlying or direct factors, and logistic regression was used to investigate their association with anemia. Results: The mean Hb level in the study group was 115.21 g/L (95% CI: 114.56-115.87 g/L), with significant differences between age groups (p=0.034). The prevalence of anemia was 28.16% (95% CI: 25.53-30.91%), with no significant difference by age group (p=0.164). Of all 1108 pregnant women, 8.75% had moderate anemia and 19.40% had mild anemia, but no pregnant women had severe anemia. In multivariable analysis, getting pregnant while using contraception (adjusted odds ratio (AOR) 1.73; 95% CI: 1.01-2.96; p=0.046) and current use of supplements (AOR 0.50; 95% CI: 0.26-0.95; p=0.035) were significantly associated with anemia (underlying factors). From the direct factors group, only iron and ferritin levels were significantly associated with anemia (p<0.001).
Conclusion: Although severe anemia is rare among pregnant women in Kuwait, mild and moderate anemia remain a significant health problem despite free access to antenatal care.
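The hemoglobin cut-offs stated in this abstract can be written as a small classification rule. This is only an illustrative sketch: the function names and the sample values are hypothetical, and treating the bands as contiguous (for example, 99.5 g/L counted as moderate) is an assumption, since the abstract lists integer ranges.

```python
def classify_anemia(hb_g_per_l):
    """Classify anemia severity from hemoglobin in g/L, per the stated cut-offs."""
    if hb_g_per_l < 70:
        return "severe"
    if hb_g_per_l < 100:   # 70-99 g/L
        return "moderate"
    if hb_g_per_l < 110:   # 100-109 g/L
        return "mild"
    return "none"          # Hb >= 110 g/L is not anemic


def prevalence_percent(hb_values):
    """Percentage of measurements classified as any grade of anemia."""
    anemic = sum(1 for hb in hb_values if classify_anemia(hb) != "none")
    return 100.0 * anemic / len(hb_values)


# Hypothetical measurements, not data from the cohort.
sample = [125, 108, 95, 112, 101, 68]
print([classify_anemia(hb) for hb in sample])
print(round(prevalence_percent(sample), 2))
```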

Keywords: anemia, pregnancy, hemoglobin, ferritin

Procedia PDF Downloads 38
22507 Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison

Authors: Xiangtuo Chen, Paul-Henry Cournéde

Abstract:

Crop yield prediction is a paramount issue in agriculture. The main idea of this paper is to find an efficient way to predict the yield of corn based on meteorological records. The prediction models used in this paper can be classified into model-driven approaches and data-driven approaches, according to their modeling methodologies. The model-driven approaches are based on crop mechanistic modeling. They describe crop growth in interaction with the environment as dynamical systems. However, calibrating the dynamical system is difficult, because it turns out to be a multidimensional non-convex optimization problem. An original contribution of this paper is to propose a statistical methodology, Multi-Scenarios Parameters Estimation (MSPE), for the parametrization of potentially complex mechanistic models from a new type of dataset (climatic data, final yield in many situations). It is tested with CORNFLO, a crop model for maize growth. On the other hand, the data-driven approach for yield prediction is free of the complex biophysical process, but it has some strict requirements on the dataset. A second contribution of the paper is the comparison of these model-driven methods with classical data-driven methods. For this purpose, we consider two classes of regression methods: methods derived from linear regression (Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression) and machine learning methods (Random Forest, k-Nearest Neighbor, Artificial Neural Network and SVM regression). The dataset consists of 720 records of corn yield at county scale provided by the United States Department of Agriculture (USDA) and the associated climatic data. A 5-fold cross-validation process and two accuracy metrics, root mean square error of prediction (RMSEP) and mean absolute error of prediction (MAEP), were used to evaluate the crop prediction capacity.
The results show that among the data-driven approaches, Random Forest is the most robust and generally achieves the best prediction error (MAEP 4.27%). It also outperforms our model-driven approach (MAEP 6.11%). However, the method for calibrating the mechanistic model from easily accessible datasets offers several side perspectives. The mechanistic model can potentially help to underline the stresses suffered by the crop or to identify the biological parameters of interest for breeding purposes. For this reason, an interesting perspective is to combine these two types of approaches.
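The evaluation protocol named in this abstract (5-fold cross-validation scored with RMSEP and MAEP) can be sketched as follows. Several details are assumptions rather than the authors' procedure: folds are contiguous and unshuffled, held-out predictions are pooled before scoring, both metrics are returned in the units of the target (the abstract reports MAEP as a percentage, whose exact normalization is not stated), and `fit_mean` is a hypothetical placeholder model, not one of the methods compared in the paper.

```python
import math


def k_fold_indices(n, k=5):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def maep(y_true, y_pred):
    """Mean absolute error of prediction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validate(xs, ys, fit, k=5):
    """Predict every record from a model trained on the other folds, then score."""
    preds = [None] * len(ys)
    for fold in k_fold_indices(len(ys), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        predict = fit(train_x, train_y)
        for i in fold:
            preds[i] = predict(xs[i])
    return rmsep(ys, preds), maep(ys, preds)


def fit_mean(train_x, train_y):
    """Placeholder model: always predicts the training mean."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean


# Hypothetical toy data: one climatic feature and a yield value per record.
xs = [float(i) for i in range(10)]
ys = [8.0, 9.0, 10.0, 11.0, 12.0, 8.5, 9.5, 10.5, 11.5, 12.5]
print(cross_validate(xs, ys, fit_mean, k=5))
```

In practice the records would usually be shuffled before splitting, so that each fold is not dominated by one region or year.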

Keywords: crop yield prediction, crop model, sensitivity analysis, parameter estimation, particle swarm optimization, random forest

Procedia PDF Downloads 220
22506 Academic Goal Setting Practices of University Students in Lagos State, Nigeria: Implications for Counselling

Authors: Asikhia Olubusayo Aduke

Abstract:

Students’ inability to set data-based (specific, measurable, attainable, reliable, and time-bound) personal improvement goals threatens their academic success. Hence, the study aimed to investigate year-one students’ academic goal-setting practices at Lagos State University of Education, Nigeria. Descriptive survey research was used in carrying out this study. The study population consisted of 3,101 year-one students of the University. A sample of five hundred and one (501) participants was selected through a proportional and simple random sampling technique. The Formative Goal Setting Questionnaire (FGSQ) developed by Research Collaboration (2015) was adapted and used as the instrument for the study. Two main research questions were answered, while two null hypotheses were formulated and tested for the study. The study revealed higher data-based goals for all students than personal improvement goals. Nevertheless, data-based and personal improvement goal-setting for female students was higher than for male students. A one-sample test and ANOVA, used to analyse the data for the two hypotheses, also revealed that the mean difference between male and female year-one students’ data-based and personal improvement goal-setting formation was statistically significant (p < 0.05). This means year-one students’ data-based and personal improvement goals showed significant gender differences. Based on the findings of this study, it was recommended, among others, that therapeutic techniques that can help to change students’ faulty thinking and challenge their lack of desire for personal improvement should be sought to treat students who have problems with setting high personal improvement goals. Counsellors also need to advocate continued research into how to increase the goal-setting ability of male students and should focus more on counselling male students’ goal-setting ability.
The main contribution of the study is that higher institutions must prioritize early intervention in first-year students' academic goal setting. Researching gender differences in this practice reveals a crucial insight: male students often lag behind in setting meaningful goals, impacting their motivation and performance. Focusing on this demographic with data-driven personal improvement goals can be transformative. By promoting goal setting that is specific, measurable, and focused on self-growth (rather than competition), male students can unlock their full potential. Researchers and counselors play a vital role in detecting and supporting students with lower goal-setting tendencies. By prioritizing this intervention, we can empower all students to set ambitious, personalized goals that ignite their passion for learning and pave the way for academic success.

Keywords: academic goal setting, counselling, practice, university, year one students

Procedia PDF Downloads 45
22505 Estimating X-Ray Spectra for Digital Mammography by Using the Expectation Maximization Algorithm: A Monte Carlo Simulation Study

Authors: Chieh-Chun Chang, Cheng-Ting Shih, Yan-Lin Liu, Shu-Jun Chang, Jay Wu

Abstract:

With the widespread use of digital mammography (DM), radiation dose evaluation of breasts has become important. X-ray spectra are one of the key factors that influence the absorbed dose of glandular tissue. In this study, we estimated the X-ray spectrum of DM using the expectation maximization (EM) algorithm with transmission measurement data. The interpolating polynomial model proposed by Boone was applied to generate the initial guess of the DM spectrum with the target/filter combination of Mo/Mo and a tube voltage of 26 kVp. The Monte Carlo N-particle code (MCNP5) was used to tally the transmission data through aluminum sheets of 0.2 to 3 mm. The X-ray spectrum was reconstructed by applying the EM algorithm iteratively. The influence of the initial guess on the EM reconstruction was evaluated. The percentage error of the average energy between the reference spectrum used as input for the Monte Carlo simulation and the spectrum estimated by the EM algorithm was -0.14%. The normalized root mean square error (NRMSE) and the normalized root max square error (NRMaSE) between both spectra were 0.6% and 2.3%, respectively. We conclude that the EM algorithm with transmission measurement data is a convenient and useful tool for estimating X-ray spectra for DM in clinical practice.
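As an illustration of how an EM reconstruction from transmission data can work, the sketch below applies the generic multiplicative EM (MLEM) update to a toy problem. This is an assumed textbook form, not the authors' implementation: the three energy bins, the attenuation coefficients in `mu`, the uniform initial guess (the study used Boone's polynomial model instead), and the noise-free synthetic measurements are all hypothetical. Only the 0.2 to 3 mm range of aluminum thickness is taken from the abstract.

```python
import math


def forward(spectrum, thicknesses, mu):
    """Transmission through each thickness: sum_j s_j * exp(-mu_j * x_i)."""
    return [sum(s * math.exp(-m * x) for s, m in zip(spectrum, mu))
            for x in thicknesses]


def em_spectrum(transmission, thicknesses, mu, initial, iterations=500):
    """Estimate spectrum weights from transmission data with the MLEM update."""
    # System matrix: attenuation of energy bin j by sheet thickness i.
    system = [[math.exp(-m * x) for m in mu] for x in thicknesses]
    sensitivity = [sum(row[j] for row in system) for j in range(len(mu))]
    spectrum = list(initial)
    for _ in range(iterations):
        predicted = [sum(a * s for a, s in zip(row, spectrum)) for row in system]
        ratio = [t / p for t, p in zip(transmission, predicted)]
        # Scale each bin by the back-projected measured/predicted ratio.
        spectrum = [
            s * sum(row[j] * r for row, r in zip(system, ratio)) / sensitivity[j]
            for j, s in enumerate(spectrum)
        ]
    return spectrum


# Hypothetical toy problem: 3 energy bins, aluminum thickness in mm.
mu = [2.0, 1.0, 0.5]                     # assumed attenuation per mm, per bin
thicknesses = [0.2, 0.5, 1.0, 2.0, 3.0]
true_spectrum = [0.7, 0.2, 0.1]
measured = forward(true_spectrum, thicknesses, mu)   # noise-free synthetic data
start = [1 / 3, 1 / 3, 1 / 3]
estimate = em_spectrum(measured, thicknesses, mu, start)
print(estimate)
```

The update keeps every bin non-negative as long as the initial guess is positive, which is one reason a sensible initial spectrum matters for this kind of reconstruction.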

Keywords: digital mammography, expectation maximization algorithm, X-Ray spectrum, X-Ray

Procedia PDF Downloads 714
22504 Single Imputation for Audiograms

Authors: Sarah Beaver, Renee Bryce

Abstract:

Audiograms detect hearing impairment, but missing values pose problems. This work explores imputations in an attempt to improve accuracy. This work implements Linear Regression, Lasso, Linear Support Vector Regression, Bayesian Ridge, K Nearest Neighbors (KNN), and Random Forest machine learning techniques to impute audiogram frequencies ranging from 125 Hz to 8000 Hz. The data contains patients who had or were candidates for cochlear implants. Accuracy is compared across two different Nested Cross-Validation k values. Over 4000 audiograms were used from 800 unique patients. Additionally, training on data combines and compares left and right ear audiograms versus single ear side audiograms. The Root Mean Square Error (RMSE) values for the best Random Forest models range from 4.74 to 6.37. The R² values for the best Random Forest models range from .91 to .96. The RMSE values for the best KNN models range from 5.00 to 7.72. The R² values for the best KNN models range from .89 to .95. The best imputation models achieved R² between .89 and .96 and RMSE values less than 8 dB. We also show that classification predictive models performed better with our best imputation models than with constant imputations, by a two percent increase in accuracy.

Keywords: machine learning, audiograms, data imputations, single imputations

Procedia PDF Downloads 68
22503 Role of Imaging in Alzheimer's Disease Trials: Impact on Trial Planning, Patient Recruitment and Retention

Authors: Kohkan Shamsi

Abstract:

Background: MRI and PET are now extensively utilized in Alzheimer's disease (AD) trials for patient eligibility, efficacy assessment, and safety evaluations; however, including imaging in AD trials impacts the site selection process, patient recruitment, and patient retention. Methods: PET/MRI are performed at baseline and at multiple follow-up timepoints. This requires prospective site imaging qualification, evaluation of phantom data, training, and continuous monitoring of machines for acquisition of standardized and consistent data. This also requires prospective patient/caregiver training, as patients must go to multiple facilities for imaging examinations. We will share our experience from one of the largest AD programs. Lessons learned: Many neurological diseases have a similar presentation to AD or could confound the assessment of drug therapy. The inclusion of wrong patients raises ethical and legal issues, and data could be excluded from the analysis. The centralized eligibility evaluation read process will be discussed. Amyloid-related imaging abnormalities (ARIA) were observed in amyloid-β trials. The FDA recommended regular monitoring of ARIA. Our experience in ARIA evaluations in a large phase III study at > 350 sites will be presented. Efficacy evaluation: MRI is utilized to evaluate various volumes of the brain. FDG PET or amyloid PET agents have been used in AD trials. We will share our experience with site and central independent reads. Imaging logistic issues that need to be handled in the planning phase will also be discussed, as they can impact patient compliance, thereby increasing missing data and affecting study results. Conclusion: Imaging must be prospectively planned to include standardizing imaging methodologies, the site selection process, and selecting assessment criteria. Training should be transparently conducted and documented. Prospective patient/caregiver awareness of imaging requirements is essential for patient compliance and reduction in missing imaging data.

Keywords: Alzheimer's disease, ARIA, MRI, PET, patient recruitment, retention

Procedia PDF Downloads 104
22502 The Thoughts and Feelings of 60-72 Month Old Children about School and Teacher

Authors: Ayse Ozturk Samur, Gozde Inal Kiziltepe

Abstract:

No matter what level of education it is, starting school is an exciting process as it includes new experiences. In this process, the child steps into an environment and institution different from the family into which he or she was born and in which he or she feels secure. That new environment is different from home; it is a social environment that has its own rules and involves duties and responsibilities that should be fulfilled, as well as new vital experiences. Children who have a positive attitude towards school and like school are more enthusiastic and eager to participate in classroom activities. Moreover, a close relationship with the teacher enables the child to have positive emotions and ideas about the teacher and school and helps children adapt to school easily. This study aims to identify children’s perceptions of academic competence, attitudes towards school, and ideas about their teachers. In accordance with this aim, a mixed method that includes both qualitative and quantitative data collection methods is used. The study is supported with qualitative data after collecting quantitative data. The study group consists of 250 randomly chosen children who are 60-72 months old and attend a preschool institution in a city center located in the West Anatolian region of Turkey. Quantitative data were collected using the Feelings about School (FAS) scale. The scale consists of 12 items and 4 dimensions: school, teacher, mathematics, and literacy. The reliability and validity study for the scale was conducted by the researchers with 318 children who were 60-72 months old. For content validity, experts’ opinions were sought; for construct validity, confirmatory factor analysis was utilized. Reliability of the scale was examined by calculating the internal consistency coefficient (Cronbach's alpha).
At the end of the analyses, it was found that FAS is a valid and reliable instrument to identify 60-72 month old children’s perceptions of their academic competency, attitudes toward school, and ideas about their teachers. For the qualitative dimension of the study, semi-structured interviews were conducted with 30 children aged 60-72 months. At the end of the study, it was identified that children’s perceptions of their academic competencies and attitudes towards school were at a medium level and their ideas about their teachers were at a high level. Based on the semi-structured interviews conducted with the children, it was identified that they have a positive perception of school and teacher. This means that the quantitative data are supported by the qualitative data.

Keywords: feelings, preschool education, school, teacher, thoughts

Procedia PDF Downloads 211
22501 Determination of Optimum Torque of an Internal Combustion Engine by Exergy Analysis

Authors: Veena Chaudhary, Rakesh P. Gakkhar

Abstract:

In this study, energy and exergy analyses are applied to the experimental data of an internal combustion engine operating on the conventional diesel cycle. The experimental data are collected using an engine unit which enables accurate measurements of fuel flow rate, combustion air flow rate, engine load, engine speed and all relevant temperatures. First and second law efficiencies are calculated for different engine speeds and compared. Results indicate that the first law (energy) efficiency is maximum at 1700 rpm, whereas exergy efficiency is maximum and exergy destruction is minimum at 1900 rpm.

Keywords: diesel engine, exergy destruction, exergy efficiency, second law of thermodynamics

Procedia PDF Downloads 315
22500 Estimation Atmospheric parameters for Weather Study and Forecast over Equatorial Regions Using Ground-Based Global Position System

Authors: Asmamaw Yehun, Tsegaye Kassa, Addisu Hunegnaw, Martin Vermeer

Abstract:

There are various ways to estimate neutral atmospheric parameter values, such as in situ and reanalysis datasets from numerical models. Accurate estimates of the atmospheric parameters are useful for weather forecasting, climate modeling, and monitoring of climate change. Recently, Global Navigation Satellite System (GNSS) measurements have been applied for atmospheric sounding due to their robust data quality and wide horizontal and vertical coverage. The Global Positioning System (GPS) solutions that include tropospheric parameters constitute a reliable set of data to be assimilated into climate models. The objective of this paper is to estimate neutral atmospheric parameters such as Wet Zenith Delay (WZD), Precipitable Water Vapour (PWV), and Total Zenith Delay (TZD) using six selected GPS stations in the equatorial regions, more precisely, the Ethiopian GPS stations, from 2012 to 2015 observational data. Based on historic GPS-derived estimates of PWV, we forecasted the PWV from 2015 to 2030. During data processing and analysis, we applied the GAMIT-GLOBK software packages to estimate the atmospheric parameters. As a result, we found that the annual averaged minimum value of PWV is 9.72 mm for the IISC station and the maximum is 50.37 mm for the BJCO station. The annual averaged minimum value of WZD is 6 cm for the IISC station and the maximum is 31 cm for the BDMT station. In the long series of observations (from 2012 to 2015), we also found a trend and cyclic patterns of WZD, PWV, and TZD for all stations.

Keywords: atmosphere, GNSS, neutral atmosphere, precipitable water vapour

Procedia PDF Downloads 49
22499 Design and Implementation a Platform for Adaptive Online Learning Based on Fuzzy Logic

Authors: Budoor Al Abid

Abstract:

Educational systems are increasingly provided as open online services, providing guidance and support for individual learners. To adapt the learning systems, a proper evaluation must be made. This paper builds the evaluation model Fuzzy C Means Adaptive System (FCMAS) based on data mining techniques to assess the difficulty of the questions. The following steps are implemented. First, a dataset from an online international learning system (slepemapy.cz) is used; the dataset contains over 1,300,000 records with 9 features covering student, question, and answer information with feedback evaluation. Next, a normalization process is applied as a preprocessing step. Then the FCM clustering algorithm is used to adapt the difficulty of the questions. The result is data labeled with three clusters (easy, intermediate, difficult), depending on the higher weight. The FCM algorithm gives a label to all the questions one by one. Then a Random Forest (RF) classifier model is constructed on the clustered dataset, using 70% of the dataset for training and 30% for testing; the model achieves a 99.9% accuracy rate. This approach improves the adaptive e-learning system because it depends on student behavior and gives more accurate results in the evaluation process than an evaluation system that depends on feedback only.
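The clustering step in this abstract (fuzzy c-means memberships, then labeling each question by its highest membership weight) can be sketched in one dimension. This is a minimal illustration of the standard FCM updates, not the FCMAS system: the single feature (a per-question error rate), the sample values, the fuzzifier `m = 2`, and the evenly spaced initial centers are all assumptions, and the sketch expects at least two clusters.

```python
def memberships_for(values, centers, m):
    """Fuzzy membership of each value in each cluster (each row sums to 1)."""
    rows = []
    for v in values:
        dists = [abs(v - c) for c in centers]
        if 0.0 in dists:
            # A value sitting exactly on a center belongs fully to it.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            rows.append([h / sum(hits) for h in hits])
        else:
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            rows.append([w / sum(inv) for w in inv])
    return rows


def fuzzy_c_means(values, n_clusters=3, m=2.0, iterations=100):
    """Return (centers, memberships) for 1-D data using the standard FCM updates."""
    lo, hi = min(values), max(values)
    # Deterministic start: centers spread evenly over the data range.
    centers = [lo + (hi - lo) * i / (n_clusters - 1) for i in range(n_clusters)]
    for _ in range(iterations):
        u = memberships_for(values, centers, m)
        centers = [
            sum(row[i] ** m * v for row, v in zip(u, values))
            / sum(row[i] ** m for row in u)
            for i in range(n_clusters)
        ]
    return centers, memberships_for(values, centers, m)


# Hypothetical normalized error rates for nine questions.
error_rates = [0.05, 0.08, 0.10, 0.45, 0.50, 0.55, 0.90, 0.92, 0.95]
names = ["easy", "intermediate", "difficult"]
centers, memberships = fuzzy_c_means(error_rates)
labels = [names[max(range(3), key=lambda i: row[i])] for row in memberships]
print(labels)
```

The resulting labels could then serve as training targets for a classifier such as the Random Forest mentioned in the abstract; that step is not shown.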

Keywords: machine learning, adaptive, fuzzy logic, data mining

Procedia PDF Downloads 176
22498 The Effects of Native Forests Conservation and Preservation Scenarios on Two Chilean Basins Water Cycle, under Climate Change Conditions

Authors: Hernández Marieta, Aguayo Mauricio, Pedreros María, Llompart Ovidio

Abstract:

The hydrological cycle is influenced by multiple factors, including climate change, land use changes, and anthropogenic activities, all of which threaten water availability and quality worldwide. In recent decades, numerous investigations have used landscape metrics and hydrological modeling to demonstrate the influence of landscape patterns on the natural dynamics of the hydrological cycle components. Many of these investigations have determined the repercussions on the quality and availability of water, sedimentation, and erosion regime, mainly in Asian basins. There is progress in this branch of science, but there are still unanswered questions for our region. This study examines the hydrological response in Chilean basins under various land use change scenarios (LUCC) and the influence of climate change. The components of the water cycle were modeled using TETIS, a physically based, distributed hydrological and hydraulic simulation model oriented to mountain basins. Future climate data were derived from Chilean regional simulations using the WRF-MIROC5 model, forced with the RCP 8.5 scenario, at a 25 km resolution for the periods 2030-2060 and 2061-2091. LUCC scenarios were designed based on nature-based solutions, landscape pattern influences, current national and international water conservation legislation, and extreme scenarios of non-preservation and conservation of native forests. The scenarios that demonstrate greater water availability, even under climate change, are those promoting the restoration of native forests in over 30% of the basins, even alongside agricultural activities. Current legislation promoting the restoration of native forests only in riparian zones (30-60 m or 200 m in steeper areas) will not be resilient enough to address future water shortages. Evapotranspiration, direct runoff, and water availability at basin outlets showed the greatest variations due to LUCC.
The relationship between hydrological modeling and landscape configuration is an effective tool for establishing future territorial planning that prioritizes water resource protection.

Keywords: TETIS, landscape pattern, hydrological process, water availability, Chilean basins

Procedia PDF Downloads 17
22497 Impact of Applying Bag House Filter Technology in Cement Industry on Ambient Air Quality - Case Study: Alexandria Cement Company

Authors: Haggag H. Mohamed, Ghatass F. Zekry, Shalaby A. Elsayed

Abstract:

Most sources of air pollution in Egypt are of anthropogenic origin. Alexandria Governorate is located in the north of Egypt. The main contributing sectors of air pollution in Alexandria are industry, transportation, and area sources due to human activities. Alexandria includes more than 40% of the industrial activities in Egypt. Cement manufacture contributes a significant amount to the particulate pollution load. The surroundings of the Alexandria Portland Cement Company (APCC) were selected as the study area. Continuous monitoring data for Total Suspended Particulate (TSP) from the APCC main kiln stack were collected to assess the dust emission control technology. An Electrostatic Precipitator (ESP) had been fitted on the cement kiln since 2002. The TSP data collected for the first quarter of 2012 were compared with those for the first quarter of 2013, after installation of a new bag house filter. In the present study, based on these monitoring data and meteorological data, a detailed air dispersion modeling investigation was carried out using the Industrial Source Complex Short Term model (ISC3-ST) to determine the impact of applying the new bag house filter control technology on the neighborhood ambient air quality. The model results show a drastic reduction of the ambient TSP hourly average concentration from 44.94 μg/m³ to 5.78 μg/m³, which confirms the large positive impact on ambient air quality of applying bag house filter technology to the APCC cement kiln.

Keywords: air pollution modeling, ambient air quality, baghouse filter, cement industry

Procedia PDF Downloads 255
22496 Comparative Coverage Analysis of Football and Other Sports by the Leading English Newspapers of India during FIFA World Cup 2014

Authors: Rajender Lal, Seema Kaushik

Abstract:

The FIFA World Cup, often simply called the World Cup, is an international association football competition contested by the senior men's national teams of the members of Fédération Internationale de Football Association (FIFA), the sport's global governing body. The championship has been awarded every four years since the inaugural tournament in 1930, except in 1942 and 1946 when it was not held because of the Second World War. Its 20th edition took place in Brazil from 12 June to 13 July 2014 and was won by Germany. The World Cup is the most widely viewed and followed sporting event in the world, exceeding even the Olympic Games; the cumulative audience of all matches of the 2006 FIFA World Cup was estimated to be 26.29 billion with an estimated 715.1 million people watching the final match, a ninth of the entire population of the planet. General-interest newspapers typically publish news articles and feature articles on national and international news as well as local news. The news includes political events and personalities, business and finance, crime, severe weather, and natural disasters; health and medicine, science, and technology; sports; and entertainment, society, food and cooking, clothing and home fashion, and the arts. This raised the question of how much coverage is given to this most widely viewed international event as compared to other sports in India. Hence, the present study was conducted with the aim of examining the comparative coverage of FIFA World Cup 2014 and other sports in the four leading newspapers of India: Hindustan Times, The Hindu, The Times of India, and The Tribune. Specific objectives were to measure the source of news, type of news items, and the placement of news related to the FIFA World Cup and other sports. A representative sample of ten editions of each of the four English dailies was chosen for the purpose of the study.
The analysis was based on the actual scanning of data from the representative sample of the dailies for the period of the competition. It can be concluded from the analysis that this event was given maximum coverage by the Hindustan Times while other sports were equally covered by The Hindu.

Keywords: coverage analysis, FIFA World Cup 2014, Hindustan Times, the Hindu, The Times of India, The Tribune

Procedia PDF Downloads 272
22495 An Alternative Credit Scoring System in China’s Consumer Lending Market: A System Based on Digital Footprint Data

Authors: Minjuan Sun

Abstract:

Ever since the late 1990s, China has experienced explosive growth in consumer lending, especially in short-term consumer loans, among which, the growth rate of non-bank lending has surpassed bank lending due to the development in financial technology. On the other hand, China does not have a universal credit scoring and registration system that can guide lenders during the processes of credit evaluation and risk control, for example, an individual’s bank credit records are not available for online lenders to see and vice versa. Given this context, the purpose of this paper is three-fold. First, we explore if and how alternative digital footprint data can be utilized to assess borrower’s creditworthiness. Then, we perform a comparative analysis of machine learning methods for the canonical problem of credit default prediction. Finally, we analyze, from an institutional point of view, the necessity of establishing a viable and nationally universal credit registration and scoring system utilizing online digital footprints, so that more people in China can have better access to the consumption loan market. Two different types of digital footprint data are utilized to match with bank’s loan default records. Each separately captures distinct dimensions of a person’s characteristics, such as his shopping patterns and certain aspects of his personality or inferred demographics revealed by social media features like profile image and nickname. We find both datasets can generate either acceptable or excellent prediction results, and different types of data tend to complement each other to get better performances. 
Typically, the traditional types of data that banks normally use, such as income, occupation, and credit history, update over longer cycles; hence they cannot reflect more immediate changes, such as changes in financial status caused by a business crisis. Digital footprints, by contrast, can update daily, weekly, or monthly, and are thus capable of providing a more comprehensive profile of the borrower’s credit capabilities and risks. From the empirical and quantitative examination, we believe digital footprints can become an alternative information source for creditworthiness assessment because of their near-universal data coverage and because they can by and large resolve the "thin-file" issue, given that digital footprints come in much larger volume and higher frequency.

Keywords: credit score, digital footprint, Fintech, machine learning

Procedia PDF Downloads 145