Search results for: text mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2188

Search results for: text mining

1948 Machine Learning Application in Shovel Maintenance

Authors: Amir Taghizadeh Vahed, Adithya Thaduri

Abstract:

Shovels are the main components in the mining transportation system. The productivity of the mines depends on the availability of shovels due to its high capital and operating costs. The unplanned failure/shutdowns of a shovel results in higher repair costs, increase in downtime, as well as increasing indirect cost (i.e. loss of production and company’s reputation). In order to mitigate these failures, predictive maintenance can be useful approach using failure prediction. The modern mining machinery or shovels collect huge datasets automatically; it consists of reliability and maintenance data. However, the gathered datasets are useless until the information and knowledge of data are extracted. Machine learning as well as data mining, which has a major role in recent studies, has been used for the knowledge discovery process. In this study, data mining and machine learning approaches are implemented to detect not only anomalies but also patterns from a dataset and further detection of failures.

Keywords: maintenance, machine learning, shovel, conditional based monitoring

Procedia PDF Downloads 170
1947 Planning Urban Sprawl in Mining Areas in Africa: How to Ensure Coherent Development

Authors: Pascal Rey, Anaïs Weber

Abstract:

Many mining projects are being developed in Africa the last decades. Due to the economic opportunities they offer, these projects result in a massive and rapid influx of migrants to the surrounding area. In areas where central government representation is low and local administration lack financial resources, urban development is often anarchical, beyond all public control. It leads to socio-spatial segregation, insecurity and the risk of social conflicts rising. Aware that their economic development is very correlated with local situation, mining companies get more and more involved in regional planning in setting up tools and Strategic Directions document. One of the commonly used tools in this regard is the “Influx Management Plan”. It consists in looking at the region’s absorption capacities in order to ensure its coherent development and by developing several urban centers than one macrocephalic city. It includes many other measures such as urban governance support, skills transfer, creation of strategic guidelines, financial support (local taxes, mining taxes, development funds etc.) local development projects. Through various examples of mining projects in Guinea, A country that is host to many large mining projects, we will look at the implications of regional and urban planning of which mining companies are key playor as well as public authorities. While their investment capacity offers advantages and accelerates development, their actions raise questions of the unilaterality of interests and local governance. By interfering in public affairs are mining companies not increasing the risk of central and local government shirking their responsibilities in terms of regional development, or even calling their legitimacy into question? Is such public-private collaboration really sustainable for the region as a whole and for all stakeholders?

Keywords: Africa, guinea, mine, urban planning

Procedia PDF Downloads 456
1946 Assessment for the Backfill Using the Run of the Mine Tailings and Portland Cement

Authors: Javad Someehneshin, Weizhou Quan, Abdelsalam Abugharara, Stephen Butt

Abstract:

Narrow vein mining (NVM) is exploiting very thin but valuable ore bodies that are uneconomical to extract by conventional mining methods. NVM applies the technique of Sustainable Mining by Drilling (SMD). The SMD method is used to mine stranded, steeply dipping ore veins, which are too small or isolated to mine economically using conventional methods since the dilution is minimized. This novel mining technique uses drilling rigs to extract the ore through directional drilling surgically. This paper is focusing on utilizing the run of the mine tailings and Portland cement as backfill material to support the hanging wall for providing safe mine operation. Cemented paste backfill (CPB) is designed by mixing waste tailings, water, and cement of the precise percentage for optimal outcomes. It is a non-homogenous material that contains 70-85% solids. Usually, a hydraulic binder is added to the mixture to increase the strength of the CPB. The binder fraction mostly accounts for 2–10% of the total weight. In the mining industry, CPB has been improved and expanded gradually because it provides safety and support for the mines. Furthermore, CPB helps manage the waste tailings in an economical method and plays a significant role in environmental protection.

Keywords: backfilling, cement backfill, tailings, Portland cement

Procedia PDF Downloads 106
1945 Evolutionary Methods in Cryptography

Authors: Wafa Slaibi Alsharafat

Abstract:

Genetic algorithms (GA) are random algorithms as random numbers that are generated during the operation of the algorithm determine what happens. This means that if GA is applied twice to optimize exactly the same problem it might produces two different answers. In this project, we propose an evolutionary algorithm and Genetic Algorithm (GA) to be implemented in symmetric encryption and decryption. Here, user's message and user secret information (key) which represent plain text to be transferred into cipher text.

Keywords: GA, encryption, decryption, crossover

Procedia PDF Downloads 406
1944 Numerical Modeling of Artisanal and Small Scale Mining of Coltan in the African Great Lakes Region

Authors: Sergio Perez Rodriguez

Abstract:

Coltan Artisanal and Small-Scale Mining (ASM) production from Africa's Great Lakes region has previously been addressed at large scales, notably from regional to country levels. The current findings address the unresolved issue of a production model of ASM of coltan ore by an average Democratic Republic of Congo (DRC) mineworker, which can be used as a reference for a similar characterization of the daily labor of counterparts from other countries in the region. To that end, the Fundamental Equation of Mineral Production has been applied, considering a miner's average daily output of coltan, estimated in the base of gross statistical data gathered from reputable sources. Results indicate daily yields of individual miners in the order of 300 g of coltan ore, with hourly peaks of production in the range of 30 to 40 g of the mineral. Yields are expected to be in the order of 5 g or less during the least productive hours. These outputs are expected to be achieved during the halves of the eight to ten hours of daily working sessions that these artisanal laborers can attend during the mining season.

Keywords: coltan, mineral production, production to reserve ratio, artisanal mining, small-scale mining, ASM, human work, Great Lakes region, Democratic Republic of Congo

Procedia PDF Downloads 47
1943 A Conglomerate of Multiple Optical Character Recognition Table Detection and Extraction

Authors: Smita Pallavi, Raj Ratn Pranesh, Sumit Kumar

Abstract:

Information representation as tables is compact and concise method that eases searching, indexing, and storage requirements. Extracting and cloning tables from parsable documents is easier and widely used; however, industry still faces challenges in detecting and extracting tables from OCR (Optical Character Recognition) documents or images. This paper proposes an algorithm that detects and extracts multiple tables from OCR document. The algorithm uses a combination of image processing techniques, text recognition, and procedural coding to identify distinct tables in the same image and map the text to appropriate the corresponding cell in dataframe, which can be stored as comma-separated values, database, excel, and multiple other usable formats.

Keywords: table extraction, optical character recognition, image processing, text extraction, morphological transformation

Procedia PDF Downloads 111
1942 Grammatical and Lexical Cohesion in the Japan’s Prime Minister Shinzo Abe’s Speech Text ‘Nihon wa Modottekimashita’

Authors: Nadya Inda Syartanti

Abstract:

This research aims to identify, classify, and analyze descriptively the aspects of grammatical and lexical cohesion in the speech text of Japan’s Prime Minister Shinzo Abe entitled Nihon wa Modotte kimashita delivered in Washington DC, the United States on February 23, 2013, as a research data source. The method used is qualitative research, which uses descriptions through words that are applied by analyzing aspects of grammatical and lexical cohesion proposed by Halliday and Hasan (1976). The aspects of grammatical cohesion consist of references (personal, demonstrative, interrogative pronouns), substitution, ellipsis, and conjunction. In contrast, lexical cohesion consists of reiteration (repetition, synonym, antonym, hyponym, meronym) and collocation. Data classification is based on the 6 aspects of the cohesion. Through some aspects of cohesion, this research tries to find out the frequency of using grammatical and lexical cohesion in Shinzo Abe's speech text entitled Nihon wa Modotte kimashita. The results of this research are expected to help overcome the difficulty of understanding speech texts in Japanese. Therefore, this research can be a reference for learners, researchers, and anyone who is interested in the field of discourse analysis.

Keywords: cohesion, grammatical cohesion, lexical cohesion, speech text, Shinzo Abe

Procedia PDF Downloads 124
1941 Analysis and Rule Extraction of Coronary Artery Disease Data Using Data Mining

Authors: Rezaei Hachesu Peyman, Oliyaee Azadeh, Salahzadeh Zahra, Alizadeh Somayyeh, Safaei Naser

Abstract:

Coronary Artery Disease (CAD) is one major cause of disability in adults and one main cause of death in developed. In this study, data mining techniques including Decision Trees, Artificial neural networks (ANNs), and Support Vector Machine (SVM) analyze CAD data. Data of 4948 patients who had suffered from heart diseases were included in the analysis. CAD is the target variable, and 24 inputs or predictor variables are used for the classification. The performance of these techniques is compared in terms of sensitivity, specificity, and accuracy. The most significant factor influencing CAD is chest pain. Elderly males (age > 53) have a high probability to be diagnosed with CAD. SVM algorithm is the most useful way for evaluation and prediction of CAD patients as compared to non-CAD ones. Application of data mining techniques in analyzing coronary artery diseases is a good method for investigating the existing relationships between variables.

Keywords: classification, coronary artery disease, data-mining, knowledge discovery, extract

Procedia PDF Downloads 623
1940 Wasting Human and Computer Resources

Authors: Mária Csernoch, Piroska Biró

Abstract:

The legends about “user-friendly” and “easy-to-use” birotical tools (computer-related office tools) have been spreading and misleading end-users. This approach has led us to the extremely high number of incorrect documents, causing serious financial losses in the creating, modifying, and retrieving processes. Our research proved that there are at least two sources of this underachievement: (1) The lack of the definition of the correctly edited, formatted documents. Consequently, end-users do not know whether their methods and results are correct or not. They are not aware of their ignorance. They are so ignorant that their ignorance does not allow them to realize their lack of knowledge. (2) The end-users’ problem-solving methods. We have found that in non-traditional programming environments end-users apply, almost exclusively, surface approach metacognitive methods to carry out their computer related activities, which are proved less effective than deep approach methods. Based on these findings we have developed deep approach methods which are based on and adapted from traditional programming languages. In this study, we focus on the most popular type of birotical documents, the text-based documents. We have provided the definition of the correctly edited text, and based on this definition, adapted the debugging method known in programming. According to the method, before the realization of text editing, a thorough debugging of already existing texts and the categorization of errors are carried out. With this method in advance to real text editing users learn the requirements of text-based documents and also of the correctly formatted text. The method has been proved much more effective than the previously applied surface approach methods. The advantages of the method are that the real text handling requires much less human and computer sources than clicking aimlessly in the GUI (Graphical User Interface), and the data retrieval is much more effective than from error-prone documents.

Keywords: deep approach metacognitive methods, error-prone birotical documents, financial losses, human and computer resources

Procedia PDF Downloads 354
1939 Short Text Classification for Saudi Tweets

Authors: Asma A. Alsufyani, Maram A. Alharthi, Maha J. Althobaiti, Manal S. Alharthi, Huda Rizq

Abstract:

Twitter is one of the most popular microblogging sites that allows users to publish short text messages called 'tweets'. Increasing the number of accounts to follow (followings) increases the number of tweets that will be displayed from different topics in an unclassified manner in the timeline of the user. Therefore, it can be a vital solution for many Twitter users to have their tweets in a timeline classified into general categories to save the user’s time and to provide easy and quick access to tweets based on topics. In this paper, we developed a classifier for timeline tweets trained on a dataset consisting of 3600 tweets in total, which were collected from Saudi Twitter and annotated manually. We experimented with the well-known Bag-of-Words approach to text classification, and we used support vector machines (SVM) in the training process. The trained classifier performed well on a test dataset, with an average F1-measure equal to 92.3%. The classifier has been integrated into an application, which practically proved the classifier’s ability to classify timeline tweets of the user.

Keywords: corpus creation, feature extraction, machine learning, short text classification, social media, support vector machine, Twitter

Procedia PDF Downloads 115
1938 Moral Wrongdoers: Evaluating the Value of Moral Actions Performed by War Criminals

Authors: Jean-Francois Caron

Abstract:

This text explores the value of moral acts performed by war criminals, and the extent to which they should alleviate the punishment these individuals ought to receive for violating the rules of war. Without neglecting the necessity of retribution in war crimes cases, it argues from an ethical perspective that we should not rule out the possibility of considering lesser punishments for war criminals who decide to perform a moral act, as it might produce significant positive moral outcomes. This text also analyzes how such a norm could be justified from a moral perspective.

Keywords: war criminals, pardon, amnesty, retribution

Procedia PDF Downloads 245
1937 A Deep Learning Approach to Subsection Identification in Electronic Health Records

Authors: Nitin Shravan, Sudarsun Santhiappan, B. Sivaselvan

Abstract:

Subsection identification, in the context of Electronic Health Records (EHRs), is identifying the important sections for down-stream tasks like auto-coding. In this work, we classify the text present in EHRs according to their information, using machine learning and deep learning techniques. We initially describe briefly about the problem and formulate it as a text classification problem. Then, we discuss upon the methods from the literature. We try two approaches - traditional feature extraction based machine learning methods and deep learning methods. Through experiments on a private dataset, we establish that the deep learning methods perform better than the feature extraction based Machine Learning Models.

Keywords: deep learning, machine learning, semantic clinical classification, subsection identification, text classification

Procedia PDF Downloads 175
1936 Identification of Text Domains and Register Variation through the Analysis of Lexical Distribution in a Bangla Mass Media Text Corpus

Authors: Mahul Bhattacharyya, Niladri Sekhar Dash

Abstract:

The present research paper is an experimental attempt to investigate the nature of variation in the register in three major text domains, namely, social, cultural, and political texts collected from the corpus of Bangla printed mass media texts. This present study uses a corpus of a moderate amount of Bangla mass media text that contains nearly one million words collected from different media sources like newspapers, magazines, advertisements, periodicals, etc. The analysis of corpus data reveals that each text has certain lexical properties that not only control their identity but also mark their uniqueness across the domains. At first, the subject domains of the texts are classified into two parameters namely, ‘Genre' and 'Text Type'. Next, some empirical investigations are made to understand how the domains vary from each other in terms of lexical properties like both function and content words. Here the method of comparative-cum-contrastive matching of lexical load across domains is invoked through word frequency count to track how domain-specific words and terms may be marked as decisive indicators in the act of specifying the textual contexts and subject domains. The study shows that the common lexical stock that percolates across all text domains are quite dicey in nature as their lexicological identity does not have any bearing in the act of specifying subject domains. Therefore, it becomes necessary for language users to anchor upon certain domain-specific lexical items to recognize a text that belongs to a specific text domain. The eventual findings of this study confirm that texts belonging to different subject domains in Bangla news text corpus clearly differ on the parameters of lexical load, lexical choice, lexical clustering, lexical collocation. In fact, based on these parameters, along with some statistical calculations, it is possible to classify mass media texts into different types to mark their relation with regard to the domains they should actually belong. The advantage of this analysis lies in the proper identification of the linguistic factors which will give language users a better insight into the method they employ in text comprehension, as well as construct a systemic frame for designing text identification strategy for language learners. The availability of huge amount of Bangla media text data is useful for achieving accurate conclusions with a certain amount of reliability and authenticity. This kind of corpus-based analysis is quite relevant for a resource-poor language like Bangla, as no attempt has ever been made to understand how the structure and texture of Bangla mass media texts vary due to certain linguistic and extra-linguistic constraints that are actively operational to specific text domains. Since mass media language is assumed to be the most 'recent representation' of the actual use of the language, this study is expected to show how the Bangla news texts reflect the thoughts of the society and how they leave a strong impact on the thought process of the speech community.

Keywords: Bangla, corpus, discourse, domains, lexical choice, mass media, register, variation

Procedia PDF Downloads 145
1935 Hybrid Energy System for the German Mining Industry: An Optimized Model

Authors: Kateryna Zharan, Jan C. Bongaerts

Abstract:

In recent years, economic attractiveness of renewable energy (RE) for the mining industry, especially for off-grid mines, and a negative environmental impact of fossil energy are stimulating to use RE for mining needs. Being that remote area mines have higher energy expenses than mines connected to a grid, integration of RE may give a mine economic benefits. Regarding the literature review, there is a lack of business models for adopting of RE at mine. The main aim of this paper is to develop an optimized model of RE integration into the German mining industry (GMI). Hereby, the GMI with amount of around 800 mill. t. annually extracted resources is included in the list of the 15 major mining country in the world. Accordingly, the mining potential of Germany is evaluated in this paper as a perspective market for RE implementation. The GMI has been classified in order to find out the location of resources, quantity and types of the mines, amount of extracted resources, and access of the mines to the energy resources. Additionally, weather conditions have been analyzed in order to figure out where wind and solar generation technologies can be integrated into a mine with the highest efficiency. Despite the fact that the electricity demand of the GMI is almost completely covered by a grid connection, the hybrid energy system (HES) based on a mix of RE and fossil energy is developed due to show environmental and economic benefits. The HES for the GMI consolidates a combination of wind turbine, solar PV, battery and diesel generation. The model has been calculated using the HOMER software. Furthermore, the demonstrated HES contains a forecasting model that predicts solar and wind generation in advance. The main result from the HES such as CO2 emission reduction is estimated in order to make the mining processing more environmental friendly.

Keywords: diesel generation, German mining industry, hybrid energy system, hybrid optimization model for electric renewables, optimized model, renewable energy

Procedia PDF Downloads 311
1934 Digitalization in Aggregate Quarries

Authors: José Eugenio Ortiz, Pierre Plaza, Josefa Herrero, Iván Cabria, José Luis Blanco, Javier Gavilanes, José Ignacio Escavy, Ignacio López-Cilla, Virginia Yagüe, César Pérez, Silvia Rodríguez, Jorge Rico, Cecilia Serrano, Jesús Bernat

Abstract:

The development of Artificial Intelligence services in mining processes, specifically in aggregate quarries, is facilitating automation and improving numerous aspects of operations. Ultimately, AI is transforming the mining industry by improving efficiency, safety and sustainability. With the ability to analyze large amounts of data and make autonomous decisions, AI offers great opportunities to optimize mining operations and maximize the economic and social benefits of this vital industry. Within the framework of the European DIGIECOQUARRY project, various services were developed for the identification of material quality, production estimation, detection of anomalies and prediction of consumption and production automatically with good results.

Keywords: aggregates, artificial intelligence, automatization, mining operations

Procedia PDF Downloads 46
1933 Study of the Stability of Underground Mines by Numerical Method: The Mine Chaabet El Hamra, Algeria

Authors: Nakache Radouane, M. Boukelloul, M. Fredj

Abstract:

Method room and pillar sizes are key factors for safe mining and their recovery in open-stop mining. This method is advantageous due to its simplicity and requirement of little information to be used. It is probably the most representative method among the total load approach methods although it also remains a safe design method. Using a finite element software (PLAXIS 3D), analyses were carried out with an elasto-plastic model and comparisons were made with methods based on the total load approach. The results were presented as the optimization for improving the ore recovery rate while maintaining a safe working environment.

Keywords: room and pillar, mining, total load approach, elasto-plastic

Procedia PDF Downloads 295
1932 Evaluation of the Performance of ACTIFLO® Clarifier in the Treatment of Mining Wastewaters: Case Study of Costerfield Mining Operations, Victoria, Australia

Authors: Seyed Mohsen Samaei, Shirley Gato-Trinidad

Abstract:

A pre-treatment stage prior to reverse osmosis (RO) is very important to ensure the long-term performance of the RO membranes in any wastewater treatment using RO. This study aims to evaluate the application of the Actiflo® clarifier as part of a pre-treatment unit in mining operations. It involves performing analytical testing on RO feed water before and after installation of Actiflo® unit. Water samples prior to RO plant stage were obtained on different dates from Costerfield mining operations in Victoria, Australia. Tests were conducted in an independent laboratory to determine the concentration of various compounds in RO feed water before and after installation of Actiflo® unit during the entire evaluated period from December 2015 to June 2018. Water quality analysis shows that the quality of RO feed water has remarkably improved since installation of Actiflo® clarifier. Suspended solids (SS) and turbidity removal efficiencies has been improved by 91 and 85 percent respectively in pre-treatment system since the installation of Actiflo®. The Actiflo® clarifier proved to be a valuable part of pre-treatment system prior to RO. It has the potential to conveniently condition the mining wastewater prior to RO unit, and reduce the risk of RO physical failure and irreversible fouling. Consequently, reliable and durable operation of RO unit with minimum requirement for RO membrane replacement is expected with Actiflo® in use.

Keywords: ACTIFLO ® clarifier, mining wastewater, reverse osmosis, water treatment

Procedia PDF Downloads 166
1931 Determination of the Risks of Heart Attack at the First Stage as Well as Their Control and Resource Planning with the Method of Data Mining

Authors: İbrahi̇m Kara, Seher Arslankaya

Abstract:

Frequently preferred in the field of engineering in particular, data mining has now begun to be used in the field of health as well since the data in the health sector have reached great dimensions. With data mining, it is aimed to reveal models from the great amounts of raw data in agreement with the purpose and to search for the rules and relationships which will enable one to make predictions about the future from the large amount of data set. It helps the decision-maker to find the relationships among the data which form at the stage of decision-making. In this study, it is aimed to determine the risk of heart attack at the first stage, to control it, and to make its resource planning with the method of data mining. Through the early and correct diagnosis of heart attacks, it is aimed to reveal the factors which affect the diseases, to protect health and choose the right treatment methods, to reduce the costs in health expenditures, and to shorten the durations of patients’ stay at hospitals. In this way, the diagnosis and treatment costs of a heart attack will be scrutinized, which will be useful to determine the risk of the disease at the first stage, to control it, and to make its resource planning.

Keywords: data mining, decision support systems, heart attack, health sector

Procedia PDF Downloads 323
1930 Analysing the Perception of Climate Hazards on Biodiversity Conservation in Mining Landscapes within Southwestern Ghana

Authors: Salamatu Shaibu, Jan Hernning Sommer

Abstract:

Integrating biodiversity conservation practices in mining landscapes ensures the continual provision of various ecosystem services to the dependent communities whilst serving as ecological insurance for corporate mining when purchasing reclamation security bonds. Climate hazards such as long dry seasons, erratic rainfall patterns, and extreme weather events contribute to biodiversity loss in addition to the impact due to mining. Both corporate mining and mine-fringe communities perceive the effect of climate on biodiversity from the context of the benefits they accrue, which motivate their conservation practices. In this study, pragmatic approaches including semi-structured interviews, field visual observation, and review were used to collect data on corporate mining employees and households of fringing communities in the southwestern mining hub. The perceived changes in the local climatic conditions and the consequences on environmental management practices that promote biodiversity conservation were examined. Using a thematic content analysis tool, the result shows that best practices such as concurrent land rehabilitation, reclamation ponds, artificial wetlands, land clearance, and topsoil management are directly affected by prolonging long dry seasons and erratic rainfall patterns. Excessive dust and noise generation directly affect both floral and faunal diversity coupled with excessive fire outbreaks in rehabilitated lands and nearby forest reserves. Proposed adaptive measures include engaging national conservation authorities to promote reforestation projects around forest reserves. National government to desist from using permit for mining concessions in forest reserves, engaging local communities through educational campaigns to control forest encroachment and burning, promoting community-based resource management to promote community ownership, and provision of stricter environmental legislation to compel corporate, artisanal, and small scale mining companies to promote biodiversity conservation.

Keywords: biodiversity conservation, climate hazards, corporate mining, mining landscapes

Procedia PDF Downloads 175
1929 A Methodology for Automatic Diversification of Document Categories

Authors: Dasom Kim, Chen Liu, Myungsu Lim, Su-Hyeon Jeon, ByeoungKug Jeon, Kee-Young Kwahk, Namgyu Kim

Abstract:

Recently, numerous documents including unstructured data and text have been created due to the rapid increase in the usage of social media and the Internet. Each document is usually provided with a specific category for the convenience of the users. In the past, the categorization was performed manually. However, in the case of manual categorization, not only can the accuracy of the categorization be not guaranteed but the categorization also requires a large amount of time and huge costs. Many studies have been conducted towards the automatic creation of categories to solve the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to categorizing complex documents with multiple topics because the methods work by assuming that one document can be categorized into one category only. In order to overcome this limitation, some studies have attempted to categorize each document into multiple categories. However, they are also limited in that their learning process involves training using a multi-categorized document set. These methods therefore cannot be applied to multi-categorization of most documents unless multi-categorized training sets are provided. To overcome the limitation of the requirement of a multi-categorized training set by traditional multi-categorization algorithms, we previously proposed a new methodology that can extend a category of a single-categorized document to multiple categorizes by analyzing relationships among categories, topics, and documents. In this paper, we design a survey-based verification scenario for estimating the accuracy of our automatic categorization methodology.

Keywords: big data analysis, document classification, multi-category, text mining, topic analysis

Procedia PDF Downloads 235
1928 Efficient Frequent Itemset Mining Methods over Real-Time Spatial Big Data

Authors: Hamdi Sana, Emna Bouazizi, Sami Faiz

Abstract:

In recent years, there is a huge increase in the use of spatio-temporal applications where data and queries are continuously moving. As a result, the need to process real-time spatio-temporal data seems clear and real-time stream data management becomes a hot topic. Sliding window model and frequent itemset mining over dynamic data are the most important problems in the context of data mining. Thus, sliding window model for frequent itemset mining is a widely used model for data stream mining due to its emphasis on recent data and its bounded memory requirement. These methods use the traditional transaction-based sliding window model where the window size is based on a fixed number of transactions. Actually, this model supposes that all transactions have a constant rate which is not suited for real-time applications. And the use of this model in such applications endangers their performance. Based on these observations, this paper relaxes the notion of window size and proposes the use of a timestamp-based sliding window model. In our proposed frequent itemset mining algorithm, support conditions are used to differentiate frequents and infrequent patterns. Thereafter, a tree is developed to incrementally maintain the essential information. We evaluate our contribution. The preliminary results are quite promising.

Keywords: real-time spatial big data, frequent itemset, transaction-based sliding window model, timestamp-based sliding window model, weighted frequent patterns, tree, stream query

Procedia PDF Downloads 124
1927 The Application of a Hybrid Neural Network for Recognition of a Handwritten Kazakh Text

Authors: Almagul Assainova , Dariya Abykenova, Liudmila Goncharenko, Sergey Sybachin, Saule Rakhimova, Abay Aman

Abstract:

The recognition of a handwritten Kazakh text is a relevant objective today for the digitization of materials. The study presents a model of a hybrid neural network for handwriting recognition, which includes a convolutional neural network and a multi-layer perceptron. Each network includes 1024 input neurons and 42 output neurons. The model is implemented in the program, written in the Python programming language using the EMNIST database, NumPy, Keras, and Tensorflow modules. The neural network training of such specific letters of the Kazakh alphabet as ә, ғ, қ, ң, ө, ұ, ү, h, і was conducted. The neural network model and the program created on its basis can be used in electronic document management systems to digitize the Kazakh text.

Keywords: handwriting recognition system, image recognition, Kazakh font, machine learning, neural networks

Procedia PDF Downloads 219
1926 Summarizing Data Sets for Data Mining by Using Statistical Methods in Coastal Engineering

Authors: Yunus Doğan, Ahmet Durap

Abstract:

Coastal regions are the one of the most commonly used places by the natural balance and the growing population. In coastal engineering, the most valuable data is wave behaviors. The amount of this data becomes very big because of observations that take place for periods of hours, days and months. In this study, some statistical methods such as the wave spectrum analysis methods and the standard statistical methods have been used. The goal of this study is the discovery profiles of the different coast areas by using these statistical methods, and thus, obtaining an instance based data set from the big data to analysis by using data mining algorithms. In the experimental studies, the six sample data sets about the wave behaviors obtained by 20 minutes of observations from Mersin Bay in Turkey and converted to an instance based form, while different clustering techniques in data mining algorithms were used to discover similar coastal places. Moreover, this study discusses that this summarization approach can be used in other branches collecting big data such as medicine.

Keywords: clustering algorithms, coastal engineering, data mining, data summarization, statistical methods

Procedia PDF Downloads 332
1925 Human Immunodeficiency Virus (HIV) Test Predictive Modeling and Identify Determinants of HIV Testing for People with Age above Fourteen Years in Ethiopia Using Data Mining Techniques: EDHS 2011

Authors: S. Abera, T. Gidey, W. Terefe

Abstract:

Introduction: Testing for HIV is the key entry point to HIV prevention, treatment, and care and support services. Hence, predictive data mining techniques can greatly benefit to analyze and discover new patterns from huge datasets like that of EDHS 2011 data. Objectives: The objective of this study is to build a predictive modeling for HIV testing and identify determinants of HIV testing for adults with age above fourteen years using data mining techniques. Methods: Cross-Industry Standard Process for Data Mining (CRISP-DM) was used to predict the model for HIV testing and explore association rules between HIV testing and the selected attributes among adult Ethiopians. Decision tree, Naïve-Bayes, logistic regression and artificial neural networks of data mining techniques were used to build the predictive models. Results: The target dataset contained 30,625 study participants; of which 16, 515 (53.9%) were women. Nearly two-fifth; 17,719 (58%), have never been tested for HIV while the rest 12,906 (42%) had been tested. Ethiopians with higher wealth index, higher educational level, belonging 20 to 29 years old, having no stigmatizing attitude towards HIV positive person, urban residents, having HIV related knowledge, information about family planning on mass media and knowing a place where to get testing for HIV showed an increased patterns with respect to HIV testing. Conclusion and Recommendation: Public health interventions should consider the identified determinants to promote people to get testing for HIV.

Keywords: data mining, HIV, testing, ethiopia

Procedia PDF Downloads 456
1924 The Effects of Three Pre-Reading Activities (Text Summary, Vocabulary Definition, and Pre-Passage Questions) on the Reading Comprehension of Iranian EFL Learners

Authors: Leila Anjomshoa, Firooz Sadighi

Abstract:

This study investigated the effects of three types of pre-reading activities (vocabulary definitions, text summary and pre-passage questions) on EFL learners’ English reading comprehension. On the basis of the results of a placement test administered to two hundred and thirty English students at Kerman Azad University, 200 subjects (one hundred intermediate and one hundred advanced) were selected.Four texts, two of them at intermediate level and two of them at advanced level were chosen. The data gathered was subjected to the statistical procedures of ANOVA. A close examination of the results through Tukey’s HSD showed the fact that the experimental groups performed better than the control group, highlighting the effect of the treatment on them. Also, the experimental group C (text summary), performed remarkably better than the other three groups (both experimental & control). Group B subjects, vocabulary definitions, performed better than groups A and D. The pre-passage questions group’s (D) performance showed higher scores than the control condition.

Keywords: pre-reading activities, text summary, vocabulary definition, and pre-passage questions, reading comprehension

Procedia PDF Downloads 309
1923 Analysis of Users’ Behavior on Book Loan Log Based on Association Rule Mining

Authors: Kanyarat Bussaban, Kunyanuth Kularbphettong

Abstract:

This research aims to create a model for analysis of student behavior using Library resources based on data mining technique in case of Suan Sunandha Rajabhat University. The model was created under association rules, apriori algorithm. The results were found 14 rules and the rules were tested with testing data set and it showed that the ability of classify data was 79.24 percent and the MSE was 22.91. The results showed that the user’s behavior model by using association rule technique can use to manage the library resources.

Keywords: behavior, data mining technique, a priori algorithm, knowledge discovery

Procedia PDF Downloads 371
1922 Entropy in a Field of Emergence in an Aspect of Linguo-Culture

Authors: Nurvadi Albekov

Abstract:

Communicative situation is a basis, which designates potential models of ‘constructed forms’, a motivated basis of a text, for a text can be assumed as a product of the communicative situation. It is within the field of emergence the models of text, that can be potentially prognosticated in a certain communicative situation, are designated. Every text can be assumed as conceptual system structured on the base of certain communicative situation. However in the process of ‘structuring’ of a certain model of ‘conceptual system’ consciousness of a recipient is able act only within the border of the field of emergence for going out of this border indicates misunderstanding of the communicative situation. On the base of communicative situation we can witness the increment of meaning where the synergizing of the informative model of communication, formed by using of the invariant units of a language system, is a result of verbalization of the communicative situation. The potential of the models of a text, prognosticated within the field of emergence, also depends on the communicative situation. The conception ‘the field of emergence’ is interpreted as a unit of the language system, having poly-directed universal structure, implying the presence of the core, the center and the periphery, including different levels of means of a functioning system of language, both in terms of linguistic resources, and in terms of extra linguistic factors interaction of which results increment of a text. The conception ‘field of emergence’ is considered as the most promising in the analysis of texts: oral, written, printed and electronic. As a unit of the language system field of emergence has several properties that predict its use during the study of a text in different levels. This work is an attempt analysis of entropy in a text in the aspect of lingua-cultural code, prognosticated within the model of the field of emergence. The article describes the problem of entropy in the field of emergence, caused by influence of the extra-linguistic factors. The increasing of entropy is caused not only by the fact of intrusion of the language resources but by influence of the alien culture in a whole, and by appearance of non-typical for this very culture symbols in the field of emergence. The borrowing of alien lingua-cultural symbols into the lingua-culture of the author is a reason of increasing the entropy when constructing a text both in meaning and in structuring level. It is nothing but artificial formatting of lexical units that violate stylistic unity of a phrase. It is marked that one of the important characteristics descending the entropy in the field of emergence is a typical similarity of lexical and semantic resources of the different lingua-cultures in aspects of extra linguistic factors.

Keywords: communicative situation, field of emergence, lingua-culture, entropy

Procedia PDF Downloads 327
1921 Understanding the Qualitative Nature of Product Reviews by Integrating Text Processing Algorithm and Usability Feature Extraction

Authors: Cherry Yieng Siang Ling, Joong Hee Lee, Myung Hwan Yun

Abstract:

The quality of a product to be usable has become the basic requirement in consumer’s perspective while failing the requirement ends up the customer from not using the product. Identifying usability issues from analyzing quantitative and qualitative data collected from usability testing and evaluation activities aids in the process of product design, yet the lack of studies and researches regarding analysis methodologies in qualitative text data of usability field inhibits the potential of these data for more useful applications. While the possibility of analyzing qualitative text data found with the rapid development of data analysis studies such as natural language processing field in understanding human language in computer, and machine learning field in providing predictive model and clustering tool. Therefore, this research aims to study the application capability of text processing algorithm in analysis of qualitative text data collected from usability activities. This research utilized datasets collected from LG neckband headset usability experiment in which the datasets consist of headset survey text data, subject’s data and product physical data. In the analysis procedure, which integrated with the text-processing algorithm, the process includes training of comments onto vector space, labeling them with the subject and product physical feature data, and clustering to validate the result of comment vector clustering. The result shows 'volume and music control button' as the usability feature that matches best with the cluster of comment vectors where centroid comments of a cluster emphasized more on button positions, while centroid comments of the other cluster emphasized more on button interface issues. When volume and music control buttons are designed separately, the participant experienced less confusion, and thus, the comments mentioned only about the buttons' positions. While in the situation where the volume and music control buttons are designed as a single button, the participants experienced interface issues regarding the buttons such as operating methods of functions and confusion of functions' buttons. The relevance of the cluster centroid comments with the extracted feature explained the capability of text processing algorithms in analyzing qualitative text data from usability testing and evaluations.

Keywords: usability, qualitative data, text-processing algorithm, natural language processing

Procedia PDF Downloads 244
1920 Analysis Mechanized Boring (TBM) of Tehran Subway Line 7

Authors: Shahin Shabani, Pouya Pourmadadi

Abstract:

Tunnel boring machines (TBMs) have been used for the construction of various tunnels for mining projects for the purpose of access, conveyance of ore and waste, drainage, exploration, water supply and water diversion. Several mining projects have seen the successful and economic beneficial use of TBMs, and there is an increasing awareness of the benefits of TBMs for mining projects. Key technical considerations for the use of TBMs for the construction of tunnels for mining projects include geological issues (rock type, rock alteration, rock strength, rock abrasivity, durability, ground water inflows), depth of cover and the potential for overstressing/rockbursts, site access and terrain, portal locations, TBM constraints, minimum tunnel size, tunnel support requirements, contractor and labor experience, and project schedule demands. This study focuses on tunnelling mining, with the goal to develop methods and tools to be used to gain understanding of these processes, and to analyze metro of Tehran. The Metro Line 7 of Tehran is one of the Longest (26 Km) and deepest (27m) of projects that’s under implementation. Because of major differences like passing under all geotechnical layers of the town and encountering part of it with underground water table and also using mechanized excavation system, is one of special metro projects.

Keywords: TBM, tunnel boring machines economic, metro, line 7

Procedia PDF Downloads 344
1919 A Hybrid Data Mining Algorithm Based System for Intelligent Defence Mission Readiness and Maintenance Scheduling

Authors: Shivam Dwivedi, Sumit Prakash Gupta, Durga Toshniwal

Abstract:

It is a challenging task in today’s date to keep defence forces in the highest state of combat readiness with budgetary constraints. A huge amount of time and money is squandered in the unnecessary and expensive traditional maintenance activities. To overcome this limitation Defence Intelligent Mission Readiness and Maintenance Scheduling System has been proposed, which ameliorates the maintenance system by diagnosing the condition and predicting the maintenance requirements. Based on new data mining algorithms, this system intelligently optimises mission readiness for imminent operations and maintenance scheduling in repair echelons. With modified data mining algorithms such as Weighted Feature Ranking Genetic Algorithm and SVM-Random Forest Linear ensemble, it improves the reliability, availability and safety, alongside reducing maintenance cost and Equipment Out of Action (EOA) time. The results clearly conclude that the introduced algorithms have an edge over the conventional data mining algorithms. The system utilizing the intelligent condition-based maintenance approach improves the operational and maintenance decision strategy of the defence force.

Keywords: condition based maintenance, data mining, defence maintenance, ensemble, genetic algorithms, maintenance scheduling, mission capability

Procedia PDF Downloads 254