26 Time Series Regression with Meta-Clusters

Authors: Monika Chuchro


This paper presents a preliminary attempt to classify time series using meta-clusters in order to improve the quality of regression models. Clustering was used to obtain normally distributed subgroups of time series data from records of inflow into a wastewater treatment plant, which are composed of several groups differing in mean value. Two simple algorithms, K-means and EM, were chosen as clustering methods. The Rand index was used to measure similarity. After simple meta-clustering, a regression model was fitted for each subgroup. The final model was the sum of the subgroup models. The quality of the obtained model was compared with that of a regression model built using the same explanatory variables but with no clustering of the data. Results were compared using the coefficient of determination (R2), the mean absolute percentage error (MAPE) as a measure of prediction accuracy, and a comparison on a line chart. Preliminary results indicate the potential of the presented technique.

Keywords: Data Mining, Data Analysis, Clustering, Predictive Models
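
The abstract above compares clusterings with the Rand index. As a minimal sketch of that measure (not the author's code), the function below counts the fraction of item pairs on which two labelings agree; the label lists are made-up toy values standing in for K-means and EM output.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        # A pair agrees when both clusterings group it together
        # or both clusterings keep it apart.
        agree += same_a == same_b
        total += 1
    return agree / total if total else 1.0


# Hypothetical cluster labels for six observations (illustration only).
kmeans_labels = [0, 0, 1, 1, 2, 2]
em_labels = [1, 1, 0, 0, 0, 2]
print(rand_index(kmeans_labels, em_labels))
```

Because only pair membership is compared, the index does not depend on how each algorithm happens to number its clusters.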

25 Industrial Process Mining Based on Data Pattern Modeling and Nonlinear Analysis

Authors: Hyun-Woo Cho


Unexpected events may have serious impacts on industrial processes. This work uses a data representation technique to model and analyze process data patterns for the purpose of diagnosis. The use of a triangular representation of process data is evaluated using a simulated process. Furthermore, the effects of different pre-treatment techniques based on linear or nonlinear reduced spaces are compared. The fault pattern is extracted in the reduced space, not in the original data space. The results show that the diagnosis method based on the nonlinear technique produced more reliable results and outperformed the linear method.

Keywords: Data Analysis, Process monitoring, pattern modeling, fault, nonlinear techniques

24 High Performance Computing and Big Data Analytics

Authors: Branci Sarra, Branci Saadia


Because of the rapid growth of data, many computer science tools have been developed to process and analyze Big Data. High-performance computing architectures have been designed to meet the processing needs of Big Data (from the standpoints of transaction processing and of strategic and tactical analytics). The purpose of this article is to provide a historical and global perspective on the recent trend in high-performance computing architectures, especially as it relates to analytics and data mining.

Keywords: Data Analysis, High Performance Computing, Big Data, HPC

23 A Qualitative Study Examining the Process of EFL Course Design from the Perspectives of Teachers

Authors: Iman Al Khalidi


Recently, English has become the language of globalization and technology. In turn, this has resulted in a seemingly bewildering array of influences and trends in the domain of TESOL curriculum. In light of these changes, higher education has to provide a new and more powerful kind of education. It should prepare students to be more engaged citizens, more capable of solving complex problems at work, and well prepared to lead meaningful lives. In response, universities, colleges, schools, and departments have to operate in light of the requirements and challenges of the global and technological era. Consequently, they have to focus on adopting a contemporary curriculum that is in line with the pedagogical shift from a teaching-centered approach to a learning-centered approach. Ideally, there has been noticeable emphasis on the crucial importance of developing and professionalizing teachers in order to engage them in the process of curriculum development and action research. This is a qualitative study that aims to understand and explore the process of designing EFL courses by teachers at the tertiary level from the perspectives of the participants in a professional TESOL context, the Department of English at a private college in Oman. It is a case study grounded in the philosophy of the qualitative approach. It employs multiple methods for collecting qualitative data: semi-structured interviews with teachers, focus group discussions with students, and document analysis. The collected data have been analyzed qualitatively by adopting Miles and Huberman's approach, using procedures of data reduction, coding, display, and conclusion drawing and verification.

Keywords: Data Analysis, Case study, Course Design, components of course design

21 Spatial Behavioral Model-Based Dynamic Data-Driven Diagram Information Model

Authors: Chiung-Hui Chen


Diagrams and drawings are important means of communicating and reproducing architectural design. With the development of information and communication technology, professional thinking in architecture and interior design is also changing rapidly. In the design development process, diagrams always play a very important role. Based on diagram theories, this study observes and records the interactions between people and objects, objects and space, and space and time in a modern nuclear family. It constructs a diagram method that systematically and visually describes the space plan of a modern nuclear family, working toward intelligent design, to help designers retrieve information and review past and present event patterns.

Keywords: Data Analysis, information model, digital diagram, context aware

20 Generation of Quasi-Measurement Data for On-Line Process Data Analysis

Authors: Hyun-Woo Cho


To ensure the safety of a manufacturing process, one should quickly identify the assignable cause of a fault on an on-line basis. To this end, many statistical techniques, including linear and nonlinear methods, have been frequently used. However, such methods suffer from a major problem of small sample size, which is mostly attributed to the characteristics of the empirical models used as reference models. This work presents a new method to overcome the insufficiency of measurement data in monitoring and diagnosis tasks. Quasi-measurement data are generated from existing data based on two indices, similarity and importance. The performance of the method is demonstrated using a real data set. The results show that the presented method is able to handle the insufficiency problem successfully. In addition, it is shown to be quite efficient in terms of computational speed and memory usage, and thus on-line implementation of the method is straightforward for monitoring and diagnosis purposes.

Keywords: Data Analysis, Diagnosis, Quality Control, monitoring, process data

19 IT-Aided Business Process Enabling Real-Time Analysis of Candidates for Clinical Trials

Authors: Matthieu-P. Schapranow


Recruitment of participants for clinical trials requires the screening of a large number of potential candidates, i.e. testing them against trial-specific inclusion and exclusion criteria, which is a time-consuming and complex task. Today, a significant amount of time is spent on identifying suitable trial participants, as their selection may affect the overall study results. We introduce a unique patient eligibility metric, which allows systematic ranking and classification of candidates based on trial-specific filter criteria. Our web application enables real-time analysis of patient data and assessment of candidates using freely definable inclusion and exclusion criteria. As a result, the overall time required for identifying eligible candidates is greatly reduced, while our contribution introduces additional degrees of freedom for evaluating the relevance of individual candidates.

Keywords: Data Analysis, clinical trials, Clustering, Screening, in-memory technology, eligibility metric

18 Separating Permanent and Induced Magnetic Signature: A Simple Approach

Authors: O. J. G. Somsen, G. P. M. Wagemakers


Magnetic signature detection provides sensitive detection of metal objects, especially in the natural environment. Our group is developing a tabletop setup for measuring magnetic signatures of various small and model objects. A particular issue is the separation of permanent and induced magnetization. While the latter depends only on the composition and shape of the object, the former also depends on the magnetization history. With common deperming techniques, a significant permanent signature may still remain, which confuses measurements of the induced component. We investigate a basic technique for separating the two. Measurements were done by moving the object along an aluminum rail while the three field components were recorded by a detector attached near the center. This was done first with the rail parallel to the Earth's magnetic field and then with anti-parallel orientation. The reversal changes the sign of the induced but not the permanent magnetization, so that the two can be separated. Our preliminary results on a small iron block show excellent reproducibility. A considerable permanent magnetization was indeed present, resulting in a complex asymmetric signature. After separation, a much more symmetric induced signature was obtained that can be studied in detail and compared with theoretical calculations.

Keywords: Data Analysis, magnetization, magnetic signature, deperming techniques
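
The separation step described above can be sketched as a half-sum and half-difference of the two recorded traces. This is an illustrative sketch, not the authors' processing code: it assumes both runs are sampled at the same rail positions and that, as the abstract states, the reversal flips the sign of the induced contribution only. The function name and the numbers are hypothetical.

```python
def separate_signatures(field_parallel, field_antiparallel):
    """Split two field traces into permanent and induced parts.

    Assumes parallel = permanent + induced and
    antiparallel = permanent - induced at every sample position.
    """
    if len(field_parallel) != len(field_antiparallel):
        raise ValueError("both runs must have the same number of samples")
    pairs = list(zip(field_parallel, field_antiparallel))
    permanent = [(p + a) / 2 for p, a in pairs]  # unchanged by the reversal
    induced = [(p - a) / 2 for p, a in pairs]    # changes sign on reversal
    return permanent, induced


# Synthetic single-component traces (made-up values, arbitrary units).
true_permanent = [1.0, 2.0, -0.5]
true_induced = [0.25, -1.0, 0.5]
run_parallel = [p + i for p, i in zip(true_permanent, true_induced)]
run_antiparallel = [p - i for p, i in zip(true_permanent, true_induced)]

permanent, induced = separate_signatures(run_parallel, run_antiparallel)
print(permanent, induced)
```

The same function would be applied to each of the three field components in turn.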

17 Quantifying Individual Performance of Pakistani Cricket Players

Authors: Kasif Khan, Azlan Allahwala, Moiz Ali, Hasan Lodhi, Umer Amjad


The numbers of runs scored by batsmen and wickets taken by bowlers serve as a natural way of quantifying the performance of a cricketer. Traditionally, batsmen and bowlers are rated on their batting or bowling averages, respectively. However, in a game like cricket, it is not sufficient to evaluate performance on the basis of averages alone. There is also bias in selecting batsmen and bowlers on the basis of their past performance. The objective is to predict the best players and compare their performance on the basis of venue, opponent, weather, and playing position. On the basis of these predictions, analysis, and comparison, the best team is selected for Pakistan's next series. The system will be built to aid analysts in finding the best possible team combination for Pakistan for a particular match by providing them with advisories. This will also help the team management in identifying the ideal batting order and bowling order for each match.

Keywords: Data Analysis, cricket, Pakistan cricket players, quantifying individual performance

16 The Application of Data Mining Technology in Building Energy Consumption Data Analysis

Authors: Liang Zhao, Jili Zhang, Chongquan Zhong


Energy consumption data, in particular those involving public buildings, are affected by many factors: the building structure, climate and environmental parameters, construction, system operating conditions, and user behavior patterns. Traditional methods of data analysis are insufficient. This paper examines data mining technology to determine its application in the analysis of building energy consumption data, including energy consumption prediction, fault diagnosis, and optimal operation. Recent literature is reviewed and summarized, the problems faced by data mining technology in the area of energy consumption data analysis are enumerated, and research points for future studies are given.

Keywords: Data Mining, Data Analysis, Optimization, prediction, building operational performance

15 Exploration of RFID in Healthcare: A Data Mining Approach

Authors: Shilpa Balan


Radio Frequency Identification, popularly known as RFID, is used to automatically identify and track tags attached to items. This study focuses on the application of RFID in healthcare. The adoption of RFID in healthcare is crucial to patient safety and inventory management. Data from RFID tags are used to identify the locations of patients and inventory in real time. Medical errors are thought to be a prominent cause of loss of life and injury. The major advantage of RFID application in the healthcare industry is the reduction of medical errors. The healthcare industry has generated huge amounts of data. By discovering patterns and trends within the data, big data analytics can help improve patient care and lower healthcare costs. The increasing number of research publications leading to innovations in RFID applications shows the importance of this technology. This study explores the current state of research on RFID in healthcare using a text mining approach. No study has yet examined the current state of RFID research in healthcare using a data mining approach. In this study, related articles on RFID were collected from healthcare journals and news articles. The articles collected were from the years 2000 to 2015. Significant keywords on the topic of focus are identified and analyzed using open source data analytics software such as RapidMiner. These analytical tools help extract pertinent information from massive volumes of data. It is seen that the main benefits of adopting RFID technology in healthcare include tracking medicines and equipment, upholding patient safety, and improving security. The real-time tracking features of RFID allow for enhanced supply chain management. By productively using big data, healthcare organizations can gain significant benefits. Big data analytics in healthcare enables improved decisions by extracting insights from large volumes of data.

Keywords: Data Mining, Data Analysis, Healthcare, RFID

14 Statistical Analysis of Interferon-γ for the Effectiveness of an Anti-Tuberculous Treatment

Authors: Shishen Xie, Yingda L. Xie


Tuberculosis (TB) is a potentially serious infectious disease that remains a health concern. The Interferon Gamma Release Assay (IGRA) is a blood test used to find out whether an individual is positive or negative for tuberculosis. This study applies statistical analysis to clinical data on the interferon-gamma levels of seventy-three subjects diagnosed with pulmonary TB who were undergoing anti-tuberculous treatment. Data analysis is performed to determine whether there is a significant decline in interferon-gamma levels for the subjects over a period of six months, and to infer whether the anti-tuberculous treatment is effective.

Keywords: Data Analysis, Statistical Methods, interferon gamma release assay, tuberculosis infection

13 Development of Energy Benchmarks Using Mandatory Energy and Emissions Reporting Data: Ontario Post-Secondary Residences

Authors: C. Xavier Mendieta, J. J McArthur


Governments are playing an increasingly active role in reducing carbon emissions, and a key strategy has been the introduction of mandatory energy disclosure policies. These policies have resulted in a significant amount of publicly available data, providing researchers with a unique opportunity to develop location-specific energy and carbon emission benchmarks from this data set, which can then be used to develop building archetypes and used to inform urban energy models. This study presents the development of such a benchmark using the public reporting data. The data from Ontario’s Ministry of Energy for Post-Secondary Educational Institutions are being used to develop a series of building archetype dynamic building loads and energy benchmarks to fill a gap in the currently available building database. This paper presents the development of a benchmark for college and university residences within ASHRAE climate zone 6 areas in Ontario using the mandatory disclosure energy and greenhouse gas emissions data. The methodology presented includes data cleaning, statistical analysis, and benchmark development, and lessons learned from this investigation are presented and discussed to inform the development of future energy benchmarks from this larger data set. The key findings from this initial benchmarking study are: (1) the importance of careful data screening and outlier identification to develop a valid dataset; (2) the key features used to develop a model of the data are building age, size, and occupancy schedules and these can be used to estimate energy consumption; and (3) policy changes affecting the primary energy generation significantly affected greenhouse gas emissions, and consideration of these factors was critical to evaluate the validity of the reported data.

Keywords: Data Analysis, GHG emissions, building archetypes, energy benchmarks

12 Neural Network Based Control Algorithm for Inhabitable Spaces Applying Emotional Domotics

Authors: Sergio A. Navarro Tuch, Martin Rogelio Bustamante Bello, Leopoldo Julian Lechuga Lopez


In recent years, Mexico’s population has seen a rise in different negative physiological and mental states. Two main consequences of this problem are deficient work performance and high levels of stress, which have an important impact on a person’s physical, mental and emotional health. Several approaches, such as the use of audiovisual stimuli to induce emotions and modify a person’s emotional state, can be applied in an effort to decrease these negative effects. Using different non-invasive physiological sensors such as EEG, luminosity and face recognition, we gather information about the subject’s current emotional state. In a controlled environment, a subject is shown a series of selected images from the International Affective Picture System (IAPS) in order to induce a specific set of emotions and obtain information from the sensors. The raw data obtained are statistically analyzed in order to keep only the specific groups of information that relate to a subject’s emotions and to the current values of the physical variables in the controlled environment, such as luminosity, RGB light color, temperature, oxygen level and noise. Finally, a neural network based control algorithm is given the data obtained in order to provide feedback to the system and automate the modification of the environment variables and the audiovisual content shown, so that these changes can positively alter the subject’s emotional state. During the research, it was found that the light color was directly related to the type of impact generated by the audiovisual content on the subject’s emotional state. Red illumination increased the impact of violent images, and green illumination along with relaxing images decreased the subject’s levels of anxiety. Specific differences between men and women were found as to which type of images generated a greater impact in either gender.
The population sample consisted mainly of college students, whose data analysis showed a decreased sensitivity to violence towards humans. Despite the early stage of the control algorithm, the results obtained from the population sample give us a better insight into the possibilities of emotional domotics and the applications that can be created towards the improvement of performance in people’s lives. The objective of this research is to create a positive impact with the application of technology to everyday activities; nonetheless, an ethical problem arises since this can also be applied to control a person’s emotions and shift their decision making.

Keywords: Data Analysis, Neural Network, performance improvement, emotional domotics

11 Mobile Learning: Toward Better Understanding of Compression Techniques

Authors: Farouk Lawan Gambo


Data compression shrinks files into fewer bits than their original representation. It is especially advantageous on the Internet because the smaller a file is, the faster it can be transferred. However, most of the concepts in data compression are abstract in nature, which makes them difficult for some students (engineers in particular) to digest. To determine the best approach to learning data compression techniques, this paper first studies the learning preferences of engineering students, who tend to have strong active, sensing, visual, and sequential learning preferences. The paper also studies the advantages that mobile learning offers: learning at the point of interest, efficiency, connection, and many more. A survey of a reasonable number of students, selected through random sampling, is carried out to see whether taking learning preferences and the advantages of mobile learning into account gives a promising improvement over the traditional way of learning. Evidence from data analysis using MS Excel, chosen with error-free findings in mind, shows that there is a significant difference in the students after using learning content provided on smartphones. The findings, presented in bar charts and pie charts, indicate that mobile learning is a promising form of learning.

Keywords: Data Analysis, compression techniques, learning content, traditional learning approach

10 Analyzing Keyword Networks for the Identification of Correlated Research Topics

Authors: Thiago M. R. Dias, Gray F. Moita, Patrícia M. Dias


The production and publication of scientific works have increased significantly in recent years, with the Internet being the main means of access to and distribution of these works. In view of this, there is a growing interest in understanding how scientific research has evolved, in order to use this knowledge to encourage research groups to become more productive. The objective of this work is therefore to explore repositories containing data from scientific publications and to characterize the keyword networks of these publications, in order to identify the most relevant keywords and to highlight those that have the greatest impact on the network. To do this, the keywords of each article in the study repository are extracted and used to characterize the network, after which several social network analysis metrics are applied to identify the highlighted keywords.

Keywords: Data Analysis, Bibliometrics, Scientometrics, extraction and data integration
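
As a rough illustration of the pipeline described above, the sketch below builds a keyword co-occurrence network and ranks keywords by weighted degree. The abstract does not name its metrics, so weighted degree is an assumed stand-in for "metrics for social network analysis", and the function names and the three toy keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_keyword_network(articles):
    """Count how often each pair of keywords appears in the same article."""
    edges = Counter()
    for keywords in articles:
        # Normalize case and drop duplicates within one article.
        unique = sorted({k.strip().lower() for k in keywords})
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum of co-occurrence weights attached to each keyword."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Toy input: one keyword list per article (illustration only).
articles = [
    ["Data Analysis", "Clustering", "Predictive Models"],
    ["Data Analysis", "Healthcare", "RFID"],
    ["Data Analysis", "Clustering", "Screening"],
]
edges = build_keyword_network(articles)
degree = weighted_degree(edges)
print(degree.most_common(3))
```

Other centrality measures, such as betweenness, could be computed on the same edge list with a graph library.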

9 Urban Noise and Air Quality: Correlation between Air and Noise Pollution; Sensors, Data Collection, Analysis and Mapping in Urban Planning

Authors: Massimiliano Condotta, Giovanni Borga, Chiara Scanagatta, Paolo Ruggeri


Architects and urban planners, when designing and renewing cities, have to face a complex set of problems, including the issues of noise and air pollution, which are considered hot topics (e.g., the Clean Air Act of London and the Soundscape definition). It is usually taken for granted that these problems go together, because the noise pollution present in cities is often linked to traffic and industries, and these produce air pollutants as well. Traffic congestion can create both noise pollution and air pollution, because NO₂ is mostly created from the oxidation of NO, and both are notoriously produced by high-temperature combustion processes (e.g., car engines or thermal power stations). The same process can be seen in industrial plants. What has to be investigated, and is the topic of this paper, is whether or not there really is a correlation between noise pollution and air pollution (taking NO₂ into account) in urban areas. To evaluate whether there is a correlation, some low-cost methodologies will be used. For noise measurements, the OpeNoise App will be installed on an Android phone. The smartphone will be placed inside a waterproof box so that it can stay outdoors, with an external battery that allows it to collect data continuously. The box will have a small hole for an external microphone, connected to the smartphone, which will be calibrated to collect the most accurate data. For air pollution measurements, the AirMonitor device will be used: an Arduino board to which the sensors and all the other components are plugged. After assembly, the sensors will be coupled (one noise and one air sensor) and placed in different critical locations in the area of Mestre (Venice) to map the existing situation. The sensors will collect data for a fixed period of time covering both weekdays and weekend days, so that it will be possible to see how the situation changes during the week.
The novelty is that the data will be compared to check whether there is a correlation between the two pollutants using graphs that show the percentage of pollution instead of the values obtained with the sensors. To do so, the data will be converted to fit on a scale that goes up to 100% and will be shown through a mapping of the measurements using GIS methods. Another relevant aspect is that this comparison can help in choosing the right mitigation solutions for the area of analysis, because it will make it possible to address both the noise and the air pollution problem with a single intervention. The mitigation solutions must consider not only the health aspect but also how to create a more livable space for citizens. The paper will describe in detail the methodology and the technical solution adopted for the realization of the sensors, the data collection, and the noise and pollution mapping and analysis.

Keywords: Data Analysis, Air quality, Noise Pollution, Particulate Matter, data collection, Noise Mapping, NO2
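
The comparison step described above can be sketched as follows. The abstract does not say how readings are mapped to the 100% scale or which correlation measure is used, so min-max scaling and the Pearson coefficient are assumptions here, and the readings are made-up values rather than measurements from Mestre.

```python
from math import sqrt


def to_percent_scale(values):
    """Map readings onto 0-100 % of their observed range (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [100.0 * (v - lo) / (hi - lo) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


# Hypothetical paired readings from one coupled sensor pair.
noise_db = [55.0, 60.0, 65.0, 70.0]
no2_ugm3 = [20.0, 32.0, 38.0, 50.0]

noise_pct = to_percent_scale(noise_db)
no2_pct = to_percent_scale(no2_ugm3)
r_raw = pearson(noise_db, no2_ugm3)
r_pct = pearson(noise_pct, no2_pct)
print(r_raw, r_pct)
```

Under this scaling the coefficient is the same before and after conversion, so the percentage scale mainly helps when plotting and mapping the two quantities side by side.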

8 Applying Critical Realism to Qualitative Social Work Research: A Critical Realist Approach for Social Work Thematic Analysis Method

Authors: Lynne Soon-Chean Park


Critical Realism (CR) has emerged as an alternative to both the positivist and constructivist perspectives that have long dominated social work research. By unpacking the epistemic weakness of two dogmatic perspectives, CR provides a useful philosophical approach that incorporates the ontological objectivist and subjectivist stance. The CR perspective suggests an alternative approach for social work researchers who have long been looking to engage in the complex interplay between perceived reality at the empirical level and the objective reality that lies behind the empirical event as a causal mechanism. However, despite the usefulness of CR in informing social work research, little practical guidance is available about how CR can inform methodological considerations in social work research studies. This presentation aims to provide a detailed description of CR-informed thematic analysis by drawing examples from a social work doctoral research of Korean migrants’ experiences and understanding of trust associated with their settlement experience in New Zealand. Because of its theoretical flexibility and accessibility as a qualitative analysis method, thematic analysis can be applied as a method that works both to search for the demi-regularities of the collected data and to identify the causal mechanisms that lay behind the empirical data. In so doing, this presentation seeks to provide a concrete and detailed exemplar for social work researchers wishing to employ CR in their qualitative thematic analysis process.

Keywords: Data Analysis, Epistemology, Social Work Research, Research Methodology, Critical Realism, thematic analysis

7 Empirical Orthogonal Functions Analysis of Hydrophysical Characteristics in the Shira Lake in Southern Siberia

Authors: Olga S. Volodko, Lidiya A. Kompaniets, Ludmila V. Gavrilova


The method of empirical orthogonal functions is a method for analyzing data with a complex spatio-temporal structure. It allows us to decompose the data into a finite number of modes determined by empirically finding the eigenfunctions of the data correlation matrix. The modes have different scales and can be associated with various physical processes. The empirical orthogonal function method has been widely used for the analysis of hydrophysical characteristics, for example, the analysis of sea surface temperatures in the Western North Atlantic, ocean surface currents in North Carolina, and the study of tropical wave disturbances. In this study, the method has been applied to the analysis of temperature and velocity measurements in the saline Lake Shira (Southern Siberia, Russia). Shira is a shallow lake with a maximum depth of 25 m. Lake Shira can be considered a closed water body because it has one small river providing inflow but no outflows. The main factor that causes the motion of the fluid is variable wind flow. In summer the lake is strongly stratified by temperature and salinity. Long-term measurements of temperatures and currents were conducted at several points during summer 2014-2015. The temperature was measured with an accuracy of 0.1 °C. The data were analyzed using the real version of the empirical orthogonal function method. The first empirical eigenmode accounts for 70-80 % of the energy and can be interpreted as a temperature distribution with a thermocline. A thermocline is a thermal layer where the temperature decreases rapidly from the mixed upper layer of the lake to much colder deep water. The higher-order modes can be interpreted as oscillations induced by internal waves. The current measurements were recorded using 600 kHz and 1200 kHz Acoustic Doppler Current Profilers. The data were analyzed using the complex version of the empirical orthogonal function method.
The first empirical eigenmode accounts for about 40 % of the energy and corresponds to the Ekman spiral occurring in the case of a stationary homogeneous fluid. Other modes describe the effects associated with the stratification of fluids. The second and subsequent empirical eigenmodes were associated with dynamical modes. These modes were obtained for a simplified model of an inhomogeneous three-level fluid at a water site with a flat bottom.

Keywords: Data Analysis, stratified fluid, thermocline, Ekman spiral, empirical orthogonal functions
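
To make the decomposition described above concrete, here is a minimal sketch of the real version only: it finds the leading mode of a small space-time data set and the share of total variance ("energy") it carries. It is not the authors' code. It uses the covariance matrix of the time anomalies and plain power iteration so that it needs no external library; an eigen-decomposition or SVD routine from a numerical library would be the usual choice. The data are synthetic.

```python
from math import sqrt


def first_eof(data, iterations=200):
    """Return the leading spatial mode and its share of total variance.

    data: list of time samples, each a list of values at fixed points
    (for example temperatures at several depths).
    """
    n_t, n_x = len(data), len(data[0])
    if n_t < 2:
        raise ValueError("need at least two time samples")
    # Remove the time mean at every point to obtain anomalies.
    means = [sum(row[j] for row in data) / n_t for j in range(n_x)]
    anomalies = [[row[j] - means[j] for j in range(n_x)] for row in data]
    # Sample covariance between every pair of points.
    cov = [[sum(a[i] * a[j] for a in anomalies) / (n_t - 1)
            for j in range(n_x)] for i in range(n_x)]
    total = sum(cov[i][i] for i in range(n_x))
    if total == 0:
        raise ValueError("data has no variance")

    def apply(vec):
        return [sum(cov[i][j] * vec[j] for j in range(n_x))
                for i in range(n_x)]

    # Power iteration; assumes the start vector is not orthogonal
    # to the leading eigenvector.
    vec = [float(i + 1) for i in range(n_x)]
    for _ in range(iterations):
        new = apply(vec)
        norm = sqrt(sum(v * v for v in new))
        if norm == 0:
            break
        vec = [v / norm for v in new]
    eigenvalue = sum(v * w for v, w in zip(vec, apply(vec)))
    return vec, eigenvalue / total


# Synthetic data: one fixed spatial pattern with a varying amplitude.
samples = [
    [1.0, 2.0, 2.0],
    [-1.0, -2.0, -2.0],
    [2.0, 4.0, 4.0],
    [-2.0, -4.0, -4.0],
]
mode, share = first_eof(samples)
print(mode, share)
```

The complex version used for current measurements would treat the two horizontal velocity components as a single complex series, which this sketch does not cover.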

6 Fuzzy Set Qualitative Comparative Analysis in Business Models' Study

Authors: K. Debkowska


The aim of this article is to present the possibilities of using Fuzzy Set Qualitative Comparative Analysis (fsQCA) in research on the business models of enterprises. FsQCA is a bridge between quantitative and qualitative research. Its potential can be used in the analysis and evaluation of business models. The article presents the results of a study of enterprises belonging to different sectors: transport and logistics, industry, building construction, and trade. The enterprises were examined taking into account the components of their business models and their financial condition. Business models are complex and heterogeneous in nature. The use of fsQCA made it possible to answer the following question: which components of a business model, and in which configuration, lead to a better financial condition of enterprises? The analysis was performed separately for particular sectors. This made it possible to compare the combinations of business model components that actively influence the financial condition of enterprises in the analyzed sectors. The following components of business models were analyzed for the purposes of the study: Key Partners, Key Activities, Key Resources, Value Proposition, Channels, Cost Structure, Revenue Streams, Customer Segment and Customer Relationships. These components constituted the variables shaping the financial results of enterprises. The results of the study lead us to believe that fsQCA can help in analyzing and evaluating a business model, which is important when making a business decision about the business model in use or about changing it. In addition, results obtained by fsQCA can be applied by all stakeholders connected with the company.

Keywords: Data Analysis, Business Models, fsQCA, components of business models

5 Modeling the Demand for the Healthcare Services Using Data Analysis Techniques

Authors: Elizaveta S. Prokofyeva, Svetlana V. Maltseva, Roman D. Zaitsev


Rapidly evolving modern data analysis technologies in healthcare play a large role in understanding the operation of the system and its characteristics. Nowadays, one of the key tasks in urban healthcare is to optimize the resource allocation. Thus, the application of data analysis in medical institutions to solve optimization problems determines the significance of this study. The purpose of this research was to establish the dependence between the indicators of the effectiveness of the medical institution and its resources. Hospital discharges by diagnosis; hospital days of in-patients and in-patient average length of stay were selected as the performance indicators and the demand of the medical facility. The hospital beds by type of care, medical technology (magnetic resonance tomography, gamma cameras, angiographic complexes and lithotripters) and physicians characterized the resource provision of medical institutions for the developed models. The data source for the research was an open database of the statistical service Eurostat. The choice of the source is due to the fact that the databases contain complete and open information necessary for research tasks in the field of public health. In addition, the statistical database has a user-friendly interface that allows you to quickly build analytical reports. The study provides information on 28 European for the period from 2007 to 2016. For all countries included in the study, with the most accurate and complete data for the period under review, predictive models were developed based on historical panel data. An attempt to improve the quality and the interpretation of the models was made by cluster analysis of the investigated set of countries. The main idea was to assess the similarity of the joint behavior of the variables throughout the time period under consideration to identify groups of similar countries and to construct the separate regression models for them. 
Therefore, the original time series were used as the objects of clustering. The hierarchical agglomerative algorithm k-medoids was used. Sampled objects were used as the centers of the resulting clusters, since determining a centroid when working with time series involves additional difficulties. The number of clusters was chosen using the silhouette coefficient. After the cluster analysis it was possible to significantly improve the predictive power of the models: for example, in one of the clusters the MAPE was only 0.82%, which makes it possible to conclude that this forecast is highly reliable in the short term. The predicted values obtained from the developed models have a relatively low level of error and can be used to make decisions on the resource provision of the hospital by medical personnel. The research shows strong dependencies between the demand for medical services and the modern medical equipment variable, which highlights the importance of the technological component for the successful development of a medical facility. Data analysis currently has huge potential to significantly improve health services. Medical institutions that are the first to introduce these technologies will certainly have a competitive advantage.
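
The clustering step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a plain partitioning k-medoids with alternating updates, a point-wise Euclidean distance between equal-length series, and selection of the number of clusters by the highest mean silhouette coefficient. The function names (`euclidean`, `k_medoids`, `silhouette`, `choose_k`) are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Point-wise Euclidean distance between two equal-length series (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_medoids(dist, k, n_iter=100, seed=0):
    """Alternating k-medoids on a precomputed distance matrix."""
    n = len(dist)
    medoids = random.Random(seed).sample(range(n), k)

    def assign(meds):
        # Each series joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(n_iter):
        labels = assign(medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties
                updated.append(medoids[c])
                continue
            # The new medoid is the member with the smallest total distance to its cluster.
            updated.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(medoids)


def silhouette(dist, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    n = len(labels)
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def choose_k(series, k_values, seed=0):
    """Return (best_k, labels) for the k with the highest mean silhouette."""
    dist = [[euclidean(s, t) for t in series] for s in series]
    best = None
    for k in k_values:
        _, labels = k_medoids(dist, k, seed=seed)
        score = silhouette(dist, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best[1], best[2]
```

Calling `choose_k(series, [2, 3, 4])` on a list of equal-length series returns the selected number of clusters and one label per series. Because medoids are actual series from the sample, no averaging of time series is needed, which matches the motivation given in the abstract.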

Keywords: Data Analysis, Healthcare, medical facilities, demand modeling

4 Empirical Study of Running Correlations in Exam Marks: Same Statistical Pattern as Chance

Authors: Weisi Guo


It is well established that there may be running correlations in sequential exam marks because students sit in the order of course registration patterns. As such, random and non-sequential sampling of exam marks is a standard recommended practice. Here, the paper examines a large body of exam data spanning several years and different modules to see the degree to which this is true. Using the real mark distribution as a generative process, it was found that random simulated data had no more sequential randomness than the real data. That is to say, the running correlations that one often observes are statistically identical to chance. Digging deeper, it was found that some high running correlations involve students who indeed share a common course history and make similar mistakes. However, at the statistical scale of a module question, the combined effect is statistically similar to the random shuffling of papers. As such, there may be no need to take random samples of marks, but it remains good practice to mark papers in a random sequence to reduce repetitive marking bias and errors.
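
The comparison with chance can be sketched as a shuffle test. The abstract does not name the statistic used, so the lag-1 autocorrelation below is an assumption, and random reshuffling of the observed marks stands in for the paper's generative simulation. The names `lag1_autocorrelation` and `shuffle_test` are hypothetical.

```python
import random


def lag1_autocorrelation(marks):
    """Correlation between each mark and the next one in the sequence."""
    n = len(marks)
    mean = sum(marks) / n
    var = sum((m - mean) ** 2 for m in marks)
    if var == 0:
        return 0.0  # constant marks carry no sequential signal
    cov = sum((marks[i] - mean) * (marks[i + 1] - mean) for i in range(n - 1))
    return cov / var


def shuffle_test(marks, n_shuffles=2000, seed=0):
    """Return (observed statistic, one-sided p-value under random ordering)."""
    rng = random.Random(seed)
    observed = lag1_autocorrelation(marks)
    shuffled = list(marks)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if lag1_autocorrelation(shuffled) >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (at_least_as_large + 1) / (n_shuffles + 1)
    return observed, p_value
```

A large p-value for a sequence of marks means its running correlation is no stronger than what random ordering of the same marks would produce, which is the pattern the abstract reports.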

Keywords: Data Analysis, empirical study, exams, marking

3 Five Years Analysis and Mitigation Plans on Adjustment Orders Impacts on Projects in Kuwait's Oil and Gas Sector

Authors: Rawan K. Al-Duaij, Salem A. Al-Salem


Projects, the unique and temporary processes of achieving a set of requirements, have always been challenging: planning the schedule and budget and managing the resources and risks are mostly driven by similar past experience or the technical consultation of experts in the matter. Given the complexity of projects in scope, time, and execution environment, Adjustment Orders are tools to reflect changes to the original project parameters after Contract signature. Adjustment Orders are the official/legal amendments to the terms and conditions of a live Contract. Reasons for issuing Adjustment Orders arise from changes in Contract scope, technical requirements and specifications resulting in scope addition, deletion, or alteration. They can also be a combination of most of these parameters, resulting in an increase or decrease in time and/or cost. Most business leaders (handling projects in the interest of the owner) refrain from using Adjustment Orders considering their main objectives of staying within budget and on schedule. Success in managing the changes results in uninterrupted execution and agreed project costs as well as schedule. Nevertheless, this is not always practically achievable. In this paper, a detailed study utilizing Industrial Engineering & Systems Management tools such as Six Sigma, Data Analysis, and Quality Control was carried out on the organization’s five years of records of issued Adjustment Orders in order to investigate their prevalence and their time and cost impact. The analysis outcome helped to identify and categorize the predominant causes with the highest impacts, which were given the most weight in recommending corrective measures to minimize the impacts of Adjustment Orders. Data analysis demonstrated no specific trend in the AO frequency over the past five years; however, the time impact is greater than the cost impact.
Although Adjustment Orders might never be avoidable, this analysis offers some insight into the procedural gaps and where they most affect the organization. Possible solutions include improving the project handling team’s coordination and communication, utilizing a blanket service contract, and modifying the project gate system procedures to minimize the possibility of similar struggles in the future. Projects in the Oil and Gas sector are always evolving and demand a certain amount of flexibility to sustain the goals of the field. As will be demonstrated, the uncertainty of project parameters, inadequate project definition, operational constraints and stringent procedures are the main factors resulting in the need for Adjustment Orders, and the recommendations accordingly address that challenge.

Keywords: Systems Management, Data Analysis, adjustment orders, oil and gas sector

2 Institutional Effectiveness in Fostering Student Retention and Success in First Year

Authors: Naziema B. Jappie


The objective of this study is to examine the relationship between college readiness characteristics and learning outcome assessment scores. In this regard, it is important to examine the first-year retention and success rate. In order to undertake this study, it will be necessary to look at proficiency levels in general and domain-specific knowledge and skills as reflected in national benchmark test (NBT) scores, in-college interventions and course-taking patterns. Preliminary results based on data from more than 1000 students suggest that there is a positive association between NBT scores and students’ 1st-year college GPA and their retention status. For example, 63% of students with a proficient level of math skills in the NBT had the highest level of GPA at the end of 1st-year of college in comparison to 56% of those who started with a primary or intermediate level, respectively. The retention rates among those with proficiency levels were also higher than those with basic or intermediate levels (98% vs. 93% and 88%, respectively). By the end of 3rd year in college, students with intermediate or proficient entering NBT math skills had dropout rates of 7% and 8%, compared to 14% for those who started at primary level; a greater percentage of proficient students had qualified by the end of 3rd year than of intermediate or basic level students (50% vs. 44% and 27%, respectively). The findings of this study add knowledge to the field in South Africa and are expected to help stakeholders and policymakers better understand college learning and the challenges for students from disadvantaged backgrounds, and to provide empirical evidence in support of related practices and policies.

Keywords: Data Analysis, Policy, Performance, Assessment, Student success, Proficiency

1 Predicting Success and Failure in Drug Development Using Text Analysis

Authors: Zhi Hao Chow, Cian Mulligan, Jack Walsh, Antonio Garzon Vico, Dimitar Krastev


Drug development is resource-intensive, time-consuming, and increasingly expensive with each developmental stage. The success rates of drug development are also relatively low, and the resources committed are wasted with each failed candidate. As such, a reliable method of predicting the success of drug development is in demand. The hypothesis was that some failed drug candidates are pushed through developmental pipelines based on false confidence and may possess common linguistic features identifiable through sentiment analysis. Here, the concept of using text analysis to discover such features in research publications and investor reports as predictors of success was explored. RStudio was used to perform text mining and lexicon-based sentiment analysis to identify affective phrases and determine their frequency in each document, and SPSS was then used to determine the relationship between the defined variables and the accuracy of predicting outcomes. A total of 161 publications were collected and categorised into 4 groups: (i) Cancer treatment, (ii) Neurodegenerative disease treatment, (iii) Vaccines, and (iv) Others (containing all other drugs that do not fit into the 3 categories). Text analysis was then performed on each document using 2 separate datasets (BING and AFINN) in R within the category of drugs to determine the frequency of positive or negative phrases in each document. Relative positivity and negativity values were then calculated by dividing the frequency of phrases by the word count of each document. Regression analysis was then performed with SPSS statistical software on each dataset (values from using the BING or AFINN dataset during text analysis) using a random selection of 61 documents to construct a model. The remaining documents were then used to determine the predictive power of the models. The model constructed from BING predicted the outcome of drug performance in clinical trials with an overall accuracy of 65.3%.
The AFINN model had a lower accuracy than the BING model at predicting outcomes, at 62.5%, and was not effective at predicting the failure of drugs in clinical trials. Overall, the study did not show significant efficacy of the model at predicting outcomes of drugs in development. Many improvements may need to be made in later iterations of the model to sufficiently increase the accuracy.
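
The relative positivity and negativity values described above amount to lexicon hits divided by word count. The study computed them in R with the BING and AFINN lexicons; the Python sketch below only illustrates the arithmetic, and its two tiny word sets are hypothetical placeholders, not entries taken from either lexicon.

```python
import re

# Hypothetical toy lexicons, used here only to make the example self-contained.
POSITIVE = {"promising", "effective", "improved", "safe"}
NEGATIVE = {"failed", "toxic", "adverse", "ineffective"}


def relative_sentiment(text, positive=POSITIVE, negative=NEGATIVE):
    """Return (relative positivity, relative negativity) for one document."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0, 0.0
    # Count lexicon hits, then normalise by document length in words.
    positive_hits = sum(word in positive for word in words)
    negative_hits = sum(word in negative for word in words)
    return positive_hits / len(words), negative_hits / len(words)
```

Normalising by word count makes documents of different lengths comparable, so the two values can serve as predictors in a regression on trial outcome.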

Keywords: Data Analysis, Drug Development, sentiment analysis, text-mining

