Search results for: time series data mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 38060

Search results for: time series data mining

37280 RA-Apriori: An Efficient and Faster MapReduce-Based Algorithm for Frequent Itemset Mining on Apache Flink

Authors: Sanjay Rathee, Arti Kashyap

Abstract:

Extraction of useful information from large datasets is one of the most important research problems. Association rule mining is one of the best methods for this purpose. Finding possible associations between items in large transaction based datasets (finding frequent patterns) is most important part of the association rule mining. There exist many algorithms to find frequent patterns but Apriori algorithm always remains a preferred choice due to its ease of implementation and natural tendency to be parallelized. Many single-machine based Apriori variants exist but massive amount of data available these days is above capacity of a single machine. Therefore, to meet the demands of this ever-growing huge data, there is a need of multiple machines based Apriori algorithm. For these types of distributed applications, MapReduce is a popular fault-tolerant framework. Hadoop is one of the best open-source software frameworks with MapReduce approach for distributed storage and distributed processing of huge datasets using clusters built from commodity hardware. However, heavy disk I/O operation at each iteration of a highly iterative algorithm like Apriori makes Hadoop inefficient. A number of MapReduce-based platforms are being developed for parallel computing in recent years. Among them, two platforms, namely, Spark and Flink have attracted a lot of attention because of their inbuilt support to distributed computations. Earlier we proposed a reduced- Apriori algorithm on Spark platform which outperforms parallel Apriori, one because of use of Spark and secondly because of the improvement we proposed in standard Apriori. Therefore, this work is a natural sequel of our work and targets on implementing, testing and benchmarking Apriori and Reduced-Apriori and our new algorithm ReducedAll-Apriori on Apache Flink and compares it with Spark implementation. Flink, a streaming dataflow engine, overcomes disk I/O bottlenecks in MapReduce, providing an ideal platform for distributed Apriori. Flink's pipelining based structure allows starting a next iteration as soon as partial results of earlier iteration are available. Therefore, there is no need to wait for all reducers result to start a next iteration. We conduct in-depth experiments to gain insight into the effectiveness, efficiency and scalability of the Apriori and RA-Apriori algorithm on Flink.

Keywords: apriori, apache flink, Mapreduce, spark, Hadoop, R-Apriori, frequent itemset mining

Procedia PDF Downloads 273
37279 Strategies to Enhance Compliance of Health and Safety Standards at the Selected Mining Industries in Limpopo Province, South Africa: Occupational Health Nurse’s Perspective

Authors: Livhuwani Muthelo

Abstract:

The health and safety of the miners in the South African mining industry are guided by the regulations and standards which are anticipated to promote a healthy work environment and fatalities. It is of utmost importance for the miners to comply with these regulations/standards to protect themselves from potential occupational health and safety risks, accidents, and fatalities. The purpose of this study was to develop and validate strategies to enhance compliance with the Health and safety standards within the mining industries of Limpopo province in South Africa. A mixed-method exploratory sequential research design was adopted. The population consisted of 5350 miners. Purposive sampling was used to select the participants in the qualitative strand and stratified random sampling in the quantitative strand. Semi-structured interviews were conducted among the occupational health nurse practitioners and the health and safety team. Thematic analysis was used to generate an understanding of the interviews. In the quantitative strand, a survey was conducted using a self-administered questionnaire. Data were analysed using SPSS version 26.0. A descriptive statistical test was used in the analysis of data including frequencies, means, and standard deviation. Cronbach's alpha test was used to measure internal consistency. The integrated results revealed that there are diverse experiences related to health and safety standards compliance among the mineworkers. The main findings were challenges related to leadership compliance and also related to the cost of maintaining safety, Miner's behavior-related challenges; the impact of non-compliance on the overall health of the miners was also described, the conflict between production and safety. Health and safety compliance is not just mere compliance with regulations and standards but a culture that warrants the miners and organization to take responsibility for their behavior and actions towards health and safety. Thus taking responsibility for your well-being and other miners.

Keywords: perceptions, compliance, health and safety, legislation, standards, miners

Procedia PDF Downloads 85
37278 Automatic and High Precise Modeling for System Optimization

Authors: Stephanie Chen, Mitja Echim, Christof Büskens

Abstract:

To describe and propagate the behavior of a system mathematical models are formulated. Parameter identification is used to adapt the coefficients of the underlying laws of science. For complex systems this approach can be incomplete and hence imprecise and moreover too slow to be computed efficiently. Therefore, these models might be not applicable for the numerical optimization of real systems, since these techniques require numerous evaluations of the models. Moreover not all quantities necessary for the identification might be available and hence the system must be adapted manually. Therefore, an approach is described that generates models that overcome the before mentioned limitations by not focusing on physical laws, but on measured (sensor) data of real systems. The approach is more general since it generates models for every system detached from the scientific background. Additionally, this approach can be used in a more general sense, since it is able to automatically identify correlations in the data. The method can be classified as a multivariate data regression analysis. In contrast to many other data regression methods this variant is also able to identify correlations of products of variables and not only of single variables. This enables a far more precise and better representation of causal correlations. The basis and the explanation of this method come from an analytical background: the series expansion. Another advantage of this technique is the possibility of real-time adaptation of the generated models during operation. Herewith system changes due to aging, wear or perturbations from the environment can be taken into account, which is indispensable for realistic scenarios. Since these data driven models can be evaluated very efficiently and with high precision, they can be used in mathematical optimization algorithms that minimize a cost function, e.g. time, energy consumption, operational costs or a mixture of them, subject to additional constraints. The proposed method has successfully been tested in several complex applications and with strong industrial requirements. The generated models were able to simulate the given systems with an error in precision less than one percent. Moreover the automatic identification of the correlations was able to discover so far unknown relationships. To summarize the above mentioned approach is able to efficiently compute high precise and real-time-adaptive data-based models in different fields of industry. Combined with an effective mathematical optimization algorithm like WORHP (We Optimize Really Huge Problems) several complex systems can now be represented by a high precision model to be optimized within the user wishes. The proposed methods will be illustrated with different examples.

Keywords: adaptive modeling, automatic identification of correlations, data based modeling, optimization

Procedia PDF Downloads 384
37277 Mining in Peru and Local Governance: Assessing the Contribution of CRS Projects

Authors: Sandra Carrillo Hoyos

Abstract:

Mining activities in South America have significantly grown during the last decades, given the abundance of natural resources, the implemented governmental policies to incentivize foreign investment as well as the boom in international prices for metals and oil between 2002 and 2008. While this context allowed the region to occupy a leading position between the top producers of minerals around the world, it has also meant an increase in socio-environmental conflicts which have generated costs and negative impacts not only for the companies but especially for the governments and local communities.During the latest decade, the mining sector in Peru has faced with the social resistance of a large number of communities, which began organizing actions against the implementation of high investing projects. The dissatisfaction has derived in the prevalence of socio-environmental conflicts associated with mining activities, some of them never solved into an agreement. In order to prevent those socio-environmental conflicts and obtain the social license from local communities, most of the mining companies have developed diverse initiatives within the framework of policies and practices of corporate social responsibility (CSR). This paper has assessed the mining sector’s contribution toward the local development management along the last decade, as part of CSR strategies as well as the policies promoted by the Peruvian State. This assessment found that, in the beginning, these initiatives have been based on a philanthropic approach and were reacting to pressures from local stakeholders to maintain the consent to operate from the surrounding communities as well as to create, as a result, a harmonious atmosphere for operations. Due to the weak State presence, such practices have increased the expectations of communities related to the participation of mining companies in solving structural development problems, especially those related to primary needs, infrastructure, education, health, among others. In other words, this paper was focused on analyze in what extent these initiatives have promoted local empowerment for development planning and integrated management of natural resources from a territorial approach. From this perspective, the analysis demonstrates that, while the design and planning of social investment initiatives have improved due to the sector´s sustainability approach, many companies have developed actions beyond their competence during this process. In some cases, the referenced actions have generated dependency with communities, even though this relationship has not exempted the companies of conflict situations with unfortunate consequences. Furthermore, the social programs developed have not necessarily generated a significant impact in improving the quality of life of affected populations. In fact, it is possible to identify that those regions with high mining resources and investment are facing with a situation of poverty and high dependency on mining production. In spite of the revenues derived from mining industry, local governments have not been able to translate the royalties into sustainable development opportunities. For this reason, the proposed paper suggests some challenges for the mining sector contribution to local development based on the best practices and lessons learnt from a benchmarking for the leading mining companies.

Keywords: corporate social responsibility, local development, mining, socio-environmental conflict

Procedia PDF Downloads 381
37276 Modification Encryption Time and Permutation in Advanced Encryption Standard Algorithm

Authors: Dalal N. Hammod, Ekhlas K. Gbashi

Abstract:

Today, cryptography is used in many applications to achieve high security in data transmission and in real-time communications. AES has long gained global acceptance and is used for securing sensitive data in various industries but has suffered from slow processing and take a large time to transfer data. This paper suggests a method to enhance Advance Encryption Standard (AES) Algorithm based on time and permutation. The suggested method (MAES) is based on modifying the SubByte and ShiftRrows in the encryption part and modification the InvSubByte and InvShiftRows in the decryption part. After the implementation of the proposal and testing the results, the Modified AES achieved good results in accomplishing the communication with high performance criteria in terms of randomness, encryption time, storage space, and avalanche effects. The proposed method has good randomness to ciphertext because this method passed NIST statistical tests against attacks; also, (MAES) reduced the encryption time by (10 %) than the time of the original AES; therefore, the modified AES is faster than the original AES. Also, the proposed method showed good results in memory utilization where the value is (54.36) for the MAES, but the value for the original AES is (66.23). Also, the avalanche effects used for calculating diffusion property are (52.08%) for the modified AES and (51.82%) percentage for the original AES.

Keywords: modified AES, randomness test, encryption time, avalanche effects

Procedia PDF Downloads 228
37275 Research on Air pollution Spatiotemporal Forecast Model Based on LSTM

Authors: JingWei Yu, Hong Yang Yu

Abstract:

At present, the increasingly serious air pollution in various cities of China has made people pay more attention to the air quality index(hereinafter referred to as AQI) of their living areas. To face this situation, it is of great significance to predict air pollution in heavily polluted areas. In this paper, based on the time series model of LSTM, a spatiotemporal prediction model of PM2.5 concentration in Mianyang, Sichuan Province, is established. The model fully considers the temporal variability and spatial distribution characteristics of PM2.5 concentration. The spatial correlation of air quality at different locations is based on the Air quality status of other nearby monitoring stations, including AQI and meteorological data to predict the air quality of a monitoring station. The experimental results show that the method has good prediction accuracy that the fitting degree with the actual measured data reaches more than 0.7, which can be applied to the modeling and prediction of the spatial and temporal distribution of regional PM2.5 concentration.

Keywords: LSTM, PM2.5, neural networks, spatio-temporal prediction

Procedia PDF Downloads 119
37274 Shark Detection and Classification with Deep Learning

Authors: Jeremy Jenrette, Z. Y. C. Liu, Pranav Chimote, Edward Fox, Trevor Hastie, Francesco Ferretti

Abstract:

Suitable shark conservation depends on well-informed population assessments. Direct methods such as scientific surveys and fisheries monitoring are adequate for defining population statuses, but species-specific indices of abundance and distribution coming from these sources are rare for most shark species. We can rapidly fill these information gaps by boosting media-based remote monitoring efforts with machine learning and automation. We created a database of shark images by sourcing 24,546 images covering 219 species of sharks from the web application spark pulse and the social network Instagram. We used object detection to extract shark features and inflate this database to 53,345 images. We packaged object-detection and image classification models into a Shark Detector bundle. We developed the Shark Detector to recognize and classify sharks from videos and images using transfer learning and convolutional neural networks (CNNs). We applied these models to common data-generation approaches of sharks: boosting training datasets, processing baited remote camera footage and online videos, and data-mining Instagram. We examined the accuracy of each model and tested genus and species prediction correctness as a result of training data quantity. The Shark Detector located sharks in baited remote footage and YouTube videos with an average accuracy of 89\%, and classified located subjects to the species level with 69\% accuracy (n =\ eight species). The Shark Detector sorted heterogeneous datasets of images sourced from Instagram with 91\% accuracy and classified species with 70\% accuracy (n =\ 17 species). Data-mining Instagram can inflate training datasets and increase the Shark Detector’s accuracy as well as facilitate archiving of historical and novel shark observations. Base accuracy of genus prediction was 68\% across 25 genera. The average base accuracy of species prediction within each genus class was 85\%. The Shark Detector can classify 45 species. All data-generation methods were processed without manual interaction. As media-based remote monitoring strives to dominate methods for observing sharks in nature, we developed an open-source Shark Detector to facilitate common identification applications. Prediction accuracy of the software pipeline increases as more images are added to the training dataset. We provide public access to the software on our GitHub page.

Keywords: classification, data mining, Instagram, remote monitoring, sharks

Procedia PDF Downloads 99
37273 Testing the Change in Correlation Structure across Markets: High-Dimensional Data

Authors: Malay Bhattacharyya, Saparya Suresh

Abstract:

The Correlation Structure associated with a portfolio is subjected to vary across time. Studying the structural breaks in the time-dependent Correlation matrix associated with a collection had been a subject of interest for a better understanding of the market movements, portfolio selection, etc. The current paper proposes a methodology for testing the change in the time-dependent correlation structure of a portfolio in the high dimensional data using the techniques of generalized inverse, singular valued decomposition and multivariate distribution theory which has not been addressed so far. The asymptotic properties of the proposed test are derived. Also, the performance and the validity of the method is tested on a real data set. The proposed test performs well for detecting the change in the dependence of global markets in the context of high dimensional data.

Keywords: correlation structure, high dimensional data, multivariate distribution theory, singular valued decomposition

Procedia PDF Downloads 109
37272 Lead and Cadmium Spatial Pattern and Risk Assessment around Coal Mine in Hyrcanian Forest, North Iran

Authors: Mahsa Tavakoli, Seyed Mohammad Hojjati, Yahya Kooch

Abstract:

In this study, the effect of coal mining activities on lead and cadmium concentrations and distribution in soil was investigated in Hyrcanian forest, North Iran. 16 plots (20×20 m2) were established by systematic-randomly (60×60 m2) in an area of 4 ha (200×200 m2-mine entrance placed at center). An area adjacent to the mine was not affected by the mining activity; considered as the controlled area. In order to investigate soil lead and cadmium concentration, one sample was taken from the 0-10 cm in each plot. To study the spatial pattern of soil properties and lead and cadmium concentrations in the mining area, an area of 80×80m2 (the mine as the center) was considered and 80 soil samples were systematic-randomly taken (10 m intervals). Geostatistical analysis was performed via Kriging method and GS+ software (version 5.1). In order to estimate the impact of coal mining activities on soil quality, pollution index was measured. Lead and cadmium concentrations were significantly higher in mine area (Pb: 10.97±0.30, Cd: 184.47±6.26 mg.kg-1) in comparison to control area (Pb: 9.42±0.17, Cd: 131.71±15.77 mg.kg-1). The mean values of the PI index indicate that Pb (1.16) and Cd (1.77) presented slightly polluted. Results of the NIPI index showed that Pb (1.44) and Cd (2.52) presented slight pollution and moderate pollution respectively. Results of variography and kriging method showed that it is possible to prepare interpolation maps of lead and cadmium around the mining areas in Hyrcanian forest. According to results of pollution and risk assessments, forest soil was contaminated by heavy metals (lead and cadmium); therefore, using reclamation and remediation techniques in these areas is necessary.

Keywords: traditional coal mining, heavy metals, pollution indicators, geostatistics, Caspian forest

Procedia PDF Downloads 166
37271 Unsupervised Text Mining Approach to Early Warning System

Authors: Ichihan Tai, Bill Olson, Paul Blessner

Abstract:

Traditional early warning systems that alarm against crisis are generally based on structured or numerical data; therefore, a system that can make predictions based on unstructured textual data, an uncorrelated data source, is a great complement to the traditional early warning systems. The Chicago Board Options Exchange (CBOE) Volatility Index (VIX), commonly referred to as the fear index, measures the cost of insurance against market crash, and spikes in the event of crisis. In this study, news data is consumed for prediction of whether there will be a market-wide crisis by predicting the movement of the fear index, and the historical references to similar events are presented in an unsupervised manner. Topic modeling-based prediction and representation are made based on daily news data between 1990 and 2015 from The Wall Street Journal against VIX index data from CBOE.

Keywords: early warning system, knowledge management, market prediction, topic modeling.

Procedia PDF Downloads 322
37270 Enhancing Predictive Accuracy in Pharmaceutical Sales through an Ensemble Kernel Gaussian Process Regression Approach

Authors: Shahin Mirshekari, Mohammadreza Moradi, Hossein Jafari, Mehdi Jafari, Mohammad Ensaf

Abstract:

This research employs Gaussian Process Regression (GPR) with an ensemble kernel, integrating Exponential Squared, Revised Matern, and Rational Quadratic kernels to analyze pharmaceutical sales data. Bayesian optimization was used to identify optimal kernel weights: 0.76 for Exponential Squared, 0.21 for Revised Matern, and 0.13 for Rational Quadratic. The ensemble kernel demonstrated superior performance in predictive accuracy, achieving an R² score near 1.0, and significantly lower values in MSE, MAE, and RMSE. These findings highlight the efficacy of ensemble kernels in GPR for predictive analytics in complex pharmaceutical sales datasets.

Keywords: Gaussian process regression, ensemble kernels, bayesian optimization, pharmaceutical sales analysis, time series forecasting, data analysis

Procedia PDF Downloads 56
37269 Text Mining Past Medical History in Electrophysiological Studies

Authors: Roni Ramon-Gonen, Amir Dori, Shahar Shelly

Abstract:

Background and objectives: Healthcare professionals produce abundant textual information in their daily clinical practice. The extraction of insights from all the gathered information, mainly unstructured and lacking in normalization, is one of the major challenges in computational medicine. In this respect, text mining assembles different techniques to derive valuable insights from unstructured textual data, so it has led to being especially relevant in Medicine. Neurological patient’s history allows the clinician to define the patient’s symptoms and along with the result of the nerve conduction study (NCS) and electromyography (EMG) test, assists in formulating a differential diagnosis. Past medical history (PMH) helps to direct the latter. In this study, we aimed to identify relevant PMH, understand which PMHs are common among patients in the referral cohort and documented by the medical staff, and examine the differences by sex and age in a large cohort based on textual format notes. Methods: We retrospectively identified all patients with abnormal NCS between May 2016 to February 2022. Age, gender, and all NCS attributes reports were recorded, including the summary text. All patients’ histories were extracted from the text report by a query. Basic text cleansing and data preparation were performed, as well as lemmatization. Very popular words (like ‘left’ and ‘right’) were deleted. Several words were replaced with their abbreviations. A bag of words approach was used to perform the analyses. Different visualizations which are common in text analysis, were created to easily grasp the results. Results: We identified 5282 unique patients. Three thousand and five (57%) patients had documented PMH. Of which 60.4% (n=1817) were males. The total median age was 62 years (range 0.12 – 97.2 years), and the majority of patients (83%) presented after the age of forty years. The top two documented medical histories were diabetes mellitus (DM) and surgery. DM was observed in 16.3% of the patients, and surgery at 15.4%. Other frequent patient histories (among the top 20) were fracture, cancer (ca), motor vehicle accident (MVA), leg, lumbar, discopathy, back and carpal tunnel release (CTR). When separating the data by sex, we can see that DM and MVA are more frequent among males, while cancer and CTR are less frequent. On the other hand, the top medical history in females was surgery and, after that, DM. Other frequent histories among females are breast cancer, fractures, and CTR. In the younger population (ages 18 to 26), the frequent PMH were surgery, fractures, trauma, and MVA. Discussion: By applying text mining approaches to unstructured data, we were able to better understand which medical histories are more relevant in these circumstances and, in addition, gain additional insights regarding sex and age differences. These insights might help to collect epidemiological demographical data as well as raise new hypotheses. One limitation of this work is that each clinician might use different words or abbreviations to describe the same condition, and therefore using a coding system can be beneficial.

Keywords: abnormal studies, healthcare analytics, medical history, nerve conduction studies, text mining, textual analysis

Procedia PDF Downloads 81
37268 Impact of Climate on Sugarcane Yield Over Belagavi District, Karnataka Using Statistical Mode

Authors: Girish Chavadappanavar

Abstract:

The impact of climate on agriculture could result in problems with food security and may threaten the livelihood activities upon which much of the population depends. In the present study, the development of a statistical yield forecast model has been carried out for sugarcane production over Belagavi district, Karnataka using weather variables of crop growing season and past observed yield data for the period of 1971 to 2010. The study shows that this type of statistical yield forecast model could efficiently forecast yield 5 weeks and even 10 weeks in advance of the harvest for sugarcane within an acceptable limit of error. The performance of the model in predicting yields at the district level for sugarcane crops is found quite satisfactory for both validation (2007 and 2008) as well as forecasting (2009 and 2010).In addition to the above study, the climate variability of the area has also been studied, and hence, the data series was tested for Mann Kendall Rank Statistical Test. The maximum and minimum temperatures were found to be significant with opposite trends (decreasing trend in maximum and increasing in minimum temperature), while the other three are found in significant with different trends (rainfall and evening time relative humidity with increasing trend and morning time relative humidity with decreasing trend).

Keywords: climate impact, regression analysis, yield and forecast model, sugar models

Procedia PDF Downloads 55
37267 Robotic Assisted vs Traditional Laparoscopic Partial Nephrectomy Peri-Operative Outcomes: A Comparative Single Surgeon Study

Authors: Gerard Bray, Derek Mao, Arya Bahadori, Sachinka Ranasinghe

Abstract:

The EAU currently recommends partial nephrectomy as the preferred management for localised cT1 renal tumours, irrespective of surgical approach. With the advent of robotic assisted partial nephrectomy, there is growing evidence that warm ischaemia time may be reduced compared to the traditional laparoscopic approach. There is still no clear differences between the two approaches with regards to other peri-operative and oncological outcomes. Current limitations in the field denote the lack of single surgeon series to compare the two approaches as other studies often include multiple operators of different experience levels. To the best of our knowledge, this study is the first single surgeon series comparing peri-operative outcomes of robotic assisted and laparoscopic PN. The current study aims to reduce intra-operator bias while maintaining an adequate sample size to assess the differences in outcomes between the two approaches. We retrospectively compared patient demographics, peri-operative outcomes, and renal function derangements of all partial nephrectomies undertaken by a single surgeon with experience in both laparoscopic and robotic surgery. Warm ischaemia time, length of stay, and acute renal function deterioration were all significantly reduced with robotic partial nephrectomy, compared to laparoscopic nephrectomy. This study highlights the benefits of robotic partial nephrectomy. Further prospective studies with larger sample sizes would be valuable additions to the current literature.

Keywords: partial nephrectomy, robotic assisted partial nephrectomy, warm ischaemia time, peri-operative outcomes

Procedia PDF Downloads 124
37266 Climate Change and Dengue Transmission in Lahore, Pakistan

Authors: Sadia Imran, Zenab Naseem

Abstract:

Dengue fever is one of the most alarming mosquito-borne viral diseases. Dengue virus has been distributed over the years exponentially throughout the world be it tropical or sub-tropical regions of the world, particularly in the last ten years. Changing topography, climate change in terms of erratic seasonal trends, rainfall, untimely monsoon early or late and longer or shorter incidences of either summer or winter. Globalization, frequent travel throughout the world and viral evolution has lead to more severe forms of Dengue. Global incidence of dengue infections per year have ranged between 50 million and 200 million; however, recent estimates using cartographic approaches suggest this number is closer to almost 400 million. In recent years, Pakistan experienced a deadly outbreak of the disease. The reason could be that they have the maximum exposure outdoors. Public organizations have observed that changing climate, especially lower average summer temperature, and increased vegetation have created tropical-like conditions in the city, which are suitable for Dengue virus growth. We will conduct a time-series analysis to study the interrelationship between dengue incidence and diurnal ranges of temperature and humidity in Pakistan, Lahore being the main focus of our study. We have used annual data from 2005 to 2015. We have investigated the relationship between climatic variables and dengue incidence. We used time series analysis to describe temporal trends. The result shows rising trends of Dengue over the past 10 years along with the rise in temperature & rainfall in Lahore. Hence this seconds the popular statement that the world is suffering due to Climate change and Global warming at different levels. Disease outbreak is one of the most alarming indications of mankind heading towards destruction and we need to think of mitigating measures to control epidemic from spreading and enveloping the cities, countries and regions.

Keywords: Dengue, epidemic, globalization, climate change

Procedia PDF Downloads 218
37265 Helping the Development of Public Policies with Knowledge of Criminal Data

Authors: Diego De Castro Rodrigues, Marcelo B. Nery, Sergio Adorno

Abstract:

The project aims to develop a framework for social data analysis, particularly by mobilizing criminal records and applying descriptive computational techniques, such as associative algorithms and extraction of tree decision rules, among others. The methods and instruments discussed in this work will enable the discovery of patterns, providing a guided means to identify similarities between recurring situations in the social sphere using descriptive techniques and data visualization. The study area has been defined as the city of São Paulo, with the structuring of social data as the central idea, with a particular focus on the quality of the information. Given this, a set of tools will be validated, including the use of a database and tools for visualizing the results. Among the main deliverables related to products and the development of articles are the discoveries made during the research phase. The effectiveness and utility of the results will depend on studies involving real data, validated both by domain experts and by identifying and comparing the patterns found in this study with other phenomena described in the literature. The intention is to contribute to evidence-based understanding and decision-making in the social field.

Keywords: social data analysis, criminal records, computational techniques, data mining, big data

Procedia PDF Downloads 65
37264 Hydrological Analysis for Urban Water Management

Authors: Ranjit Kumar Sahu, Ramakar Jha

Abstract:

Urban Water Management is the practice of managing freshwater, waste water, and storm water as components of a basin-wide management plan. It builds on existing water supply and sanitation considerations within an urban settlement by incorporating urban water management within the scope of the entire river basin. The pervasive problems generated by urban development have prompted, in the present work, to study the spatial extent of urbanization in Golden Triangle of Odisha connecting the cities Bhubaneswar (20.2700° N, 85.8400° E), Puri (19.8106° N, 85.8314° E) and Konark (19.9000° N, 86.1200° E)., and patterns of periodic changes in urban development (systematic/random) in order to develop future plans for (i) urbanization promotion areas, and (ii) urbanization control areas. Remote Sensing, using USGS (U.S. Geological Survey) Landsat8 maps, supervised classification of the Urban Sprawl has been done for during 1980 - 2014, specifically after 2000. This Work presents the following: (i) Time series analysis of Hydrological data (ground water and rainfall), (ii) Application of SWMM (Storm Water Management Model) and other soft computing techniques for Urban Water Management, and (iii) Uncertainty analysis of model parameters (Urban Sprawl and correlation analysis). The outcome of the study shows drastic growth results in urbanization and depletion of ground water levels in the area that has been discussed briefly. Other relative outcomes like declining trend of rainfall and rise of sand mining in local vicinity has been also discussed. Research on this kind of work will (i) improve water supply and consumption efficiency (ii) Upgrade drinking water quality and waste water treatment (iii) Increase economic efficiency of services to sustain operations and investments for water, waste water, and storm water management, and (iv) engage communities to reflect their needs and knowledge for water management.

Keywords: Storm Water Management Model (SWMM), uncertainty analysis, urban sprawl, land use change

Procedia PDF Downloads 412
37263 Electrical Decomposition of Time Series of Power Consumption

Authors: Noura Al Akkari, Aurélie Foucquier, Sylvain Lespinats

Abstract:

Load monitoring is a management process for energy consumption towards energy savings and energy efficiency. Non Intrusive Load Monitoring (NILM) is one method of load monitoring used for disaggregation purposes. NILM is a technique for identifying individual appliances based on the analysis of the whole residence data retrieved from the main power meter of the house. Our NILM framework starts with data acquisition, followed by data preprocessing, then event detection, feature extraction, then general appliance modeling and identification at the final stage. The event detection stage is a core component of NILM process since event detection techniques lead to the extraction of appliance features. Appliance features are required for the accurate identification of the household devices. In this research work, we aim at developing a new event detection methodology with accurate load disaggregation to extract appliance features. Time-domain features extracted are used for tuning general appliance models for appliance identification and classification steps. We use unsupervised algorithms such as Dynamic Time Warping (DTW). The proposed method relies on detecting areas of operation of each residential appliance based on the power demand. Then, detecting the time at which each selected appliance changes its states. In order to fit with practical existing smart meters capabilities, we work on low sampling data with a frequency of (1/60) Hz. The data is simulated on Load Profile Generator software (LPG), which was not previously taken into consideration for NILM purposes in the literature. LPG is a numerical software that uses behaviour simulation of people inside the house to generate residential energy consumption data. The proposed event detection method targets low consumption loads that are difficult to detect. Also, it facilitates the extraction of specific features used for general appliance modeling. In addition to this, the identification process includes unsupervised techniques such as DTW. To our best knowledge, there exist few unsupervised techniques employed with low sampling data in comparison to the many supervised techniques used for such cases. We extract a power interval at which falls the operation of the selected appliance along with a time vector for the values delimiting the state transitions of the appliance. After this, appliance signatures are formed from extracted power, geometrical and statistical features. Afterwards, those formed signatures are used to tune general model types for appliances identification using unsupervised algorithms. This method is evaluated using both simulated data on LPG and real-time Reference Energy Disaggregation Dataset (REDD). For that, we compute performance metrics using confusion matrix based metrics, considering accuracy, precision, recall and error-rate. The performance analysis of our methodology is then compared with other detection techniques previously used in the literature review, such as detection techniques based on statistical variations and abrupt changes (Variance Sliding Window and Cumulative Sum).

Keywords: electrical disaggregation, DTW, general appliance modeling, event detection

Procedia PDF Downloads 63
37262 Research of the Three-Dimensional Visualization Geological Modeling of Mine Based on Surpac

Authors: Honggang Qu, Yong Xu, Rongmei Liu, Zhenji Gao, Bin Wang

Abstract:

Today's mining industry is advancing gradually toward digital and visual direction. The three-dimensional visualization geological modeling of mine is the digital characterization of mineral deposits and is one of the key technology of digital mining. Three-dimensional geological modeling is a technology that combines geological spatial information management, geological interpretation, geological spatial analysis and prediction, geostatistical analysis, entity content analysis and graphic visualization in a three-dimensional environment with computer technology and is used in geological analysis. In this paper, the three-dimensional geological modeling of an iron mine through the use of Surpac is constructed, and the weight difference of the estimation methods between the distance power inverse ratio method and ordinary kriging is studied, and the ore body volume and reserves are simulated and calculated by using these two methods. Compared with the actual mine reserves, its result is relatively accurate, so it provides scientific bases for mine resource assessment, reserve calculation, mining design and so on.

Keywords: three-dimensional geological modeling, geological database, geostatistics, block model

Procedia PDF Downloads 63
37261 Fractional Integration in the West African Economic and Monetary Union

Authors: Hector Carcel Luis Alberiko Gil-Alana

Abstract:

This paper examines the time series behavior of three variables (GDP, Price level of Consumption and Population) in the eight countries that belong to the West African Economic and Monetary Union (WAEMU), which are Benin, Burkina Faso, Côte d’Ivoire, Guinea-Bissau, Mali, Niger, Senegal and Togo. The reason for carrying out this study lies in the considerable heterogeneity that can be perceived in the data from these countries. We conduct a long memory and fractional integration modeling framework and we also identify potential breaks in the data. The aim of the study is to perceive up to which degree the eight West African countries that belong to the same monetary union follow the same economic patterns of stability. Testing for mean reversion, we only found strong evidence of it in the case of Senegal for the Price level of Consumption, and in the cases of Benin, Burkina Faso and Senegal for GDP.

Keywords: West Africa, Monetary Union, fractional integration, economic patterns

Procedia PDF Downloads 410
37260 Filtering Intrusion Detection Alarms Using Ant Clustering Approach

Authors: Ghodhbani Salah, Jemili Farah

Abstract:

With the growth of cyber attacks, information safety has become an important issue all over the world. Many firms rely on security technologies such as intrusion detection systems (IDSs) to manage information technology security risks. IDSs are considered to be the last line of defense to secure a network and play a very important role in detecting large number of attacks. However the main problem with today’s most popular commercial IDSs is generating high volume of alerts and huge number of false positives. This drawback has become the main motivation for many research papers in IDS area. Hence, in this paper we present a data mining technique to assist network administrators to analyze and reduce false positive alarms that are produced by an IDS and increase detection accuracy. Our data mining technique is unsupervised clustering method based on hybrid ANT algorithm. This algorithm discovers clusters of intruders’ behavior without prior knowledge of a possible number of classes, then we apply K-means algorithm to improve the convergence of the ANT clustering. Experimental results on real dataset show that our proposed approach is efficient with high detection rate and low false alarm rate.

Keywords: intrusion detection system, alarm filtering, ANT class, ant clustering, intruders’ behaviors, false alarms

Procedia PDF Downloads 390
37259 Application of Remote Sensing Technique on the Monitoring of Mine Eco-Environment

Authors: Haidong Li, Weishou Shen, Guoping Lv, Tao Wang

Abstract:

Aiming to overcome the limitation of the application of traditional remote sensing (RS) technique in the mine eco-environmental monitoring, in this paper, we first classified the eco-environmental damages caused by mining activities and then introduced the principle, classification and characteristics of the Light Detection and Ranging (LiDAR) technique. The potentiality of LiDAR technique in the mine eco-environmental monitoring was analyzed, particularly in extracting vertical structure parameters of vegetation, through comparing the feasibility and applicability of traditional RS method and LiDAR technique in monitoring different types of indicators. The application situation of LiDAR technique in extracting typical mine indicators, such as land destruction in mining areas, damage of ecological integrity and natural soil erosion. The result showed that the LiDAR technique has the ability to monitor most of the mine eco-environmental indicators, and exhibited higher accuracy comparing with traditional RS technique, specifically speaking, the applicability of LiDAR technique on each indicator depends on the accuracy requirement of mine eco-environmental monitoring. In the item of large mine, LiDAR three-dimensional point cloud data not only could be used as the complementary data source of optical RS, Airborne/Satellite LiDAR could also fulfill the demand of extracting vertical structure parameters of vegetation in large areas.

Keywords: LiDAR, mine, ecological damage, monitoring, traditional remote sensing technique

Procedia PDF Downloads 378
37258 Large Time Asymptotic Behavior to Solutions of a Forced Burgers Equation

Authors: Satyanarayana Engu, Ahmed Mohd, V. Murugan

Abstract:

We study the large time asymptotics of solutions to the Cauchy problem for a forced Burgers equation (FBE) with the initial data, which is continuous and summable on R. For which, we first derive explicit solutions of FBE assuming a different class of initial data in terms of Hermite polynomials. Later, by violating this assumption we prove the existence of a solution to the considered Cauchy problem. Finally, we give an asymptotic approximate solution and establish that the error will be of order O(t^(-1/2)) with respect to L^p -norm, where 1≤p≤∞, for large time.

Keywords: Burgers equation, Cole-Hopf transformation, Hermite polynomials, large time asymptotics

Procedia PDF Downloads 319
37257 A Comparative Asessment of Some Algorithms for Modeling and Forecasting Horizontal Displacement of Ialy Dam, Vietnam

Authors: Kien-Trinh Thi Bui, Cuong Manh Nguyen

Abstract:

In order to simulate and reproduce the operational characteristics of a dam visually, it is necessary to capture the displacement at different measurement points and analyze the observed movement data promptly to forecast the dam safety. The accuracy of forecasts is further improved by applying machine learning methods to data analysis progress. In this study, the horizontal displacement monitoring data of the Ialy hydroelectric dam was applied to machine learning algorithms: Gaussian processes, multi-layer perceptron neural networks, and the M5-rules algorithm for modelling and forecasting of horizontal displacement of the Ialy hydropower dam (Vietnam), respectively, for analysing. The database which used in this research was built by collecting time series of data from 2006 to 2021 and divided into two parts: training dataset and validating dataset. The final results show all three algorithms have high performance for both training and model validation, but the MLPs is the best model. The usability of them are further investigated by comparison with a benchmark models created by multi-linear regression. The result show the performance which obtained from all the GP model, the MLPs model and the M5-Rules model are much better, therefore these three models should be used to analyze and predict the horizontal displacement of the dam.

Keywords: Gaussian processes, horizontal displacement, hydropower dam, Ialy dam, M5-Rules, multi-layer perception neural networks

Procedia PDF Downloads 186
37256 Potential Use of Leaching Gravel as a Raw Material in the Preparation of Geo Polymeric Material as an Alternative to Conventional Cement Materials

Authors: Arturo Reyes Roman, Daniza Castillo Godoy, Francisca Balarezo Olivares, Francisco Arriagada Castro, Miguel Maulen Tapia

Abstract:

Mining waste–based geopolymers are a sustainable alternative to conventional cement materials due to their contribution to the valorization of mining wastes as well as to the new construction materials with reduced fingerprints. The objective of this study was to determine the potential of leaching gravel (LG) from hydrometallurgical copper processing to be used as a raw material in the manufacture of geopolymer. NaOH, Na2SiO3 (modulus 1.5), and LG were mixed and then wetted with an appropriate amount of tap water, then stirred until a homogenous paste was obtained. A liquid/solid ratio of 0.3 was used for preparing mixtures. The paste was then cast in cubic moulds of 50 mm for the determination of compressive strengths. The samples were left to dry for 24h at room temperature, then unmoulded before analysis after 28 days of curing time. The compressive test was conducted in a compression machine (15/300 kN). According to the laser diffraction spectroscopy (LDS) analysis, 90% of LG particles were below 500 μm. The X-ray diffraction (XRD) analysis identified crystalline phases of albite (30 %), Quartz (16%), Anorthite (16 %), and Phillipsite (14%). The X-ray fluorescence (XRF) determinations showed mainly 55% of SiO2, 13 % of Al2O3, and 9% of CaO. ICP (OES) concentrations of Fe, Ca, Cu, Al, As, V, Zn, Mo, and Ni were 49.545; 24.735; 6.172; 14.152, 239,5; 129,6; 41,1;15,1, and 13,1 mg kg-1, respectively. The geopolymer samples showed resistance ranging between 2 and 10 MPa. In comparison with the raw material composition, the amorphous percentage of materials in the geopolymer was 35 %, whereas the crystalline percentage of main mineral phases decreased. Further studies are needed to find the optimal combinations of materials to produce a more resistant and environmentally safe geopolymer. Particularly are necessary compressive resistance higher than 15 MPa are necessary to be used as construction unit such as bricks.

Keywords: mining waste, geopolymer, construction material, alkaline activation

Procedia PDF Downloads 86
37255 Changes in Rainfall and Temperature and Its Impact on Crop Production in Moyamba District, Southern Sierra Leone

Authors: Keiwoma Mark Yila, Mathew Lamrana Siaffa Gboku, Mohamed Sahr Lebbie, Lamin Ibrahim Kamara

Abstract:

Rainfall and temperature are the important variables which are often used to trace climate variability and change. A perception study and analysis of climatic data were conducted to assess the changes in rainfall and temperature and their impact on crop production in Moyamba district, Sierra Leone. For the perception study, 400 farmers were randomly selected from farmer-based organizations (FBOs) in 4 chiefdoms, and 30 agricultural extension workers (AWEs) in the Moyamba district were purposely selected as respondents. Descriptive statistics and Kendall’s test of concordance was used to analyze the data collected from the farmers and AEWs. Data for the analysis of variability and trends of rainfall and temperature from 1991 to 2020 were obtained from the Sierra Leone Meteorological Agency and Njala University and grouped into monthly, seasonal and annual time series. Regression analysis was used to determine the statistical values and trend lines for the seasonal and annual time series data. The Mann-Kendall test and Sen’s Slope Estimator were used to analyze the trends' significance and magnitude, respectively. The results of both studies show evidence of climate change in the Moyamba district. A substantial number of farmers and AEWs perceived a decrease in the annual rainfall amount, length of the rainy season, a late start and end of the rainy season, an increase in the temperature during the day and night, and a shortened harmattan period over the last 30 years. Analysis of the meteorological data shows evidence of variability in the seasonal and annual distribution of rainfall and temperature, a decreasing and non-significant trend in the rainy season and annual rainfall, and an increasing and significant trend in seasonal and annual temperature from 1991 to 2020. However, the observed changes in rainfall and temperature by the farmers and AEWs partially agree with the results of the analyzed meteorological data. The majority of the farmers perceived that; adverse weather conditions have negatively affected crop production in the district. Droughts, high temperatures, and irregular rainfall are the three major adverse weather events that farmers perceived to have contributed to a substantial loss in the yields of the major crops cultivated in the district. In response to the negative effects of adverse weather events, a substantial number of farmers take no action due to their lack of knowledge and technical or financial capacity to implement climate-sensitive agricultural (CSA) practices. Even though few farmers are practising some CSA practices in their farms, there is an urgent need to build the capacity of farmers and AEWs to adapt to and mitigate the negative impacts of climate change. The most priority support needed by farmers is the provision of climate-resilient crop varieties, whilst the AEWs need training on CSA practices.

Keywords: climate change, crop productivity, farmer’s perception, rainfall, temperature, Sierra Leone

Procedia PDF Downloads 61
37254 Understanding Regional Circulations That Modulate Heavy Precipitations in the Kulfo Watershed

Authors: Tesfay Mekonnen Weldegerima

Abstract:

Analysis of precipitation time series is a fundamental undertaking in meteorology and hydrology. The extreme precipitation scenario of the Kulfo River watershed is studied using wavelet analysis and atmospheric transport, a lagrangian trajectory model. Daily rainfall data for the 1991-2020 study periods are collected from the office of the Ethiopian Meteorology Institute. Meteorological fields on a three-dimensional grid at 0.5o x 0.5o spatial resolution and daily temporal resolution are also obtained from the Global Data Assimilation System (GDAS). Wavelet analysis of the daily precipitation processed with the lag-1 coefficient reveals some high power recurred once every 38 to 60 days with greater than 95% confidence for red noise. The analysis also identified inter-annual periodicity in the periods 2002 - 2005 and 2017 - 2019. Back trajectory analysis for 3-day periods up to May 19/2011, indicates the Indian Ocean source; trajectories crossed the eastern African escarpment to arrive at the Kulfo watershed. Atmospheric flows associated with the Western Indian monsoon redirected by the low-level Somali winds and Arabian ridge are responsible for the moisture supply. The time-localization of the wavelet power spectrum yields valuable hydrological information, and the back trajectory approaches provide useful characterization of air mass source.

Keywords: extreme precipitation events, power spectrum, back trajectory, kulfo watershed

Procedia PDF Downloads 53
37253 An Analysis on Clustering Based Gene Selection and Classification for Gene Expression Data

Authors: K. Sathishkumar, V. Thiagarasu

Abstract:

Due to recent advances in DNA microarray technology, it is now feasible to obtain gene expression profiles of tissue samples at relatively low costs. Many scientists around the world use the advantage of this gene profiling to characterize complex biological circumstances and diseases. Microarray techniques that are used in genome-wide gene expression and genome mutation analysis help scientists and physicians in understanding of the pathophysiological mechanisms, in diagnoses and prognoses, and choosing treatment plans. DNA microarray technology has now made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. A first step toward addressing this challenge is the use of clustering techniques, which is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. This work presents an analysis of several clustering algorithms proposed to deals with the gene expression data effectively. The existing clustering algorithms like Support Vector Machine (SVM), K-means algorithm and evolutionary algorithm etc. are analyzed thoroughly to identify the advantages and limitations. The performance evaluation of the existing algorithms is carried out to determine the best approach. In order to improve the classification performance of the best approach in terms of Accuracy, Convergence Behavior and processing time, a hybrid clustering based optimization approach has been proposed.

Keywords: microarray technology, gene expression data, clustering, gene Selection

Procedia PDF Downloads 308
37252 Islamic Research Methodology (I-Restmo): Eight Series Research Module with Islamic Value Concept

Authors: Noraizah Abu Bakar, Norhayati Alais, Nurdiana Azizan, Fatimah Alwi, Muhammad Zaky Razaly

Abstract:

This is a concise research module with the Islamic values concept proposed to a group of researches, potential researchers, PhD and Master Scholars to prepare themselves for their studies. The intention of designing this module is to help and guide Malaysian citizens to undergone their postgraduate’s studies. This is aligned with the 10th Malaysian plan- MyBrain 15. MyBrain 15 is a financial aid to Malaysian citizens to pursue PhD and Master programs. The program becomes one of Ministry of Education Strategic Plan to ensure by year 2013, there will be 60,000 PhD scholars in Malaysia. This module is suitable for the social science researchers; however it can be useful tool for science technology researchers such as Engineering and Information Technology disciplines too. The module consists of eight (8) series that provides a proper flow of information in doing research with the Islamic Value Application provided in each of the series. This module is designed to produce future researchers with a comprehensive knowledge of humankind and the hereafter. The uniqueness about this research module is designed based on Islamic values concept. Researchers were able to understand the proper research process and simultaneously be able to open their minds to understand Islam more closely. Application of Islamic values in each series could trigger a broader idea for researchers to examine in greater depth of knowledge related to humanities.

Keywords: Eight Series Research Module, Islamic Values concept, Teaching Methodology, Flow of Information, Epistemology of research

Procedia PDF Downloads 382
37251 PM10 Prediction and Forecasting Using CART: A Case Study for Pleven, Bulgaria

Authors: Snezhana G. Gocheva-Ilieva, Maya P. Stoimenova

Abstract:

Ambient air pollution with fine particulate matter (PM10) is a systematic permanent problem in many countries around the world. The accumulation of a large number of measurements of both the PM10 concentrations and the accompanying atmospheric factors allow for their statistical modeling to detect dependencies and forecast future pollution. This study applies the classification and regression trees (CART) method for building and analyzing PM10 models. In the empirical study, average daily air data for the city of Pleven, Bulgaria for a period of 5 years are used. Predictors in the models are seven meteorological variables, time variables, as well as lagged PM10 variables and some lagged meteorological variables, delayed by 1 or 2 days with respect to the initial time series, respectively. The degree of influence of the predictors in the models is determined. The selected best CART models are used to forecast future PM10 concentrations for two days ahead after the last date in the modeling procedure and show very accurate results.

Keywords: cross-validation, decision tree, lagged variables, short-term forecasting

Procedia PDF Downloads 179