Search results for: parallel data mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25699

Search results for: parallel data mining

24889 The Right to Data Portability and Its Influence on the Development of Digital Services

Authors: Roman Bieda

Abstract:

The General Data Protection Regulation (GDPR) will come into force on 25 May 2018 which will create a new legal framework for the protection of personal data in the European Union. Article 20 of GDPR introduces a right to data portability. This right allows for data subjects to receive the personal data which they have provided to a data controller, in a structured, commonly used and machine-readable format, and to transmit this data to another data controller. The right to data portability, by facilitating transferring personal data between IT environments (e.g.: applications), will also facilitate changing the provider of services (e.g. changing a bank or a cloud computing service provider). Therefore, it will contribute to the development of competition and the digital market. The aim of this paper is to discuss the right to data portability and its influence on the development of new digital services.

Keywords: data portability, digital market, GDPR, personal data

Procedia PDF Downloads 456
24888 Developing Sustainable Tourism Practices in Communities Adjacent to Mines: An Exploratory Study in South Africa

Authors: Felicite Ann Fairer-Wessels

Abstract:

There has always been a disparity between mining and tourism mainly due to the socio-economic and environmental impacts of mines on both the adjacent resident communities and the areas taken up by the mining operation. Although heritage mining tourism has been actively and successfully pursued and developed in the UK, largely Wales, and Scandinavian countries, the debate whether active mining and tourism can have a mutually beneficial relationship remains imminent. This pilot study explores the relationship between the ‘to be developed’ future Nokeng Mine and its adjacent community, the rural community of Moloto, will be investigated in terms of whether sustainable tourism and livelihood activities can potentially be developed with the support of the mine. Concepts such as social entrepreneur, corporate social responsibility, sustainable development and triple bottom line are discussed. Within the South African context as a mineral rich developing country, the government has a statutory obligation to empower disenfranchised communities through social and labour plans and policies. All South African mines must preside over a Social and Labour Plan according to the Mineral and Petroleum Resources Development Act, No 28 of 2002. The ‘social’ component refers to the ‘social upliftment’ of communities within or adjacent to any mine; whereas the ‘labour’ component refers to the mine workers sourced from the specific community. A qualitative methodology is followed using the case study as research instrument for the Nokeng Mine and Moloto community with interviews and focus group discussions. The target population comprised of the Moloto Tribal Council members (8 in-depth interviews), the Moloto community members (17: focus groups); and the Nokeng Mine representatives (4 in-depth interviews). In this pilot study two disparate ‘worlds’ are potentially linked: on the one hand, the mine as social entrepreneur that is searching for feasible and sustainable ideas; and on the other hand, the community adjacent to the mine, with potentially sustainable tourism entrepreneurs that can tap into the resources of the mine should their ideas be feasible to build their businesses. Being an exploratory study the findings are limited but indicate that the possible success of tourism and sustainable livelihood activities lies in the fact that both the Mine and Community are keen to work together – the mine in terms of obtaining labour and profit; and the community in terms of improved and sustainable social and economic conditions; with both parties realizing the importance to mitigate negative environmental impacts. In conclusion, a relationship of trust is imperative between a mine and a community before a long term liaison is possible. However whether tourism is a viable solution for the community to engage in is debatable. The community could initially rather pursue the sustainable livelihoods approach and focus on life-supporting activities such as building, gardening, etc. that once established could feed into possible sustainable tourism activities.

Keywords: community development, mining tourism, sustainability, South Africa

Procedia PDF Downloads 282
24887 Experience Modularization for New Value of Evanescent Cultural Communities: Developing Creative Tourism Services in Bangkok

Authors: Wuttigrai Ngamsirijit

Abstract:

Creative tourism is an ongoing development in many countries as an attempt to moving away from serial reproduction of culture and reviving the culture. Despite, in the destinations with diverse and potential cultural resources, creating new tourism services can be vague. This paper presents how tourism experiences are modularized and consolidated in order to form new creative tourism service offerings in evanescent cultural communities of Bangkok, Thailand. The benefits from data mining in accommodating value co-creation are discussed, and implication of experience modularization to national creative tourism policy is addressed.

Keywords: co-creation, creative tourism, new service design, experience modularization

Procedia PDF Downloads 350
24886 Effect of Segregation Pattern of Mn, Si, and C on through Thickness Microstructure and Properties of Hot Rolled Steel

Authors: Waleed M. Al-Othman, Hamid Bayati, Abdullah Al-Shahrani, Haitham Al-Jabr

Abstract:

Pearlite bands commonly form parallel to the surface of the hot rolled steel and have significant influence on the properties of the steel. This study investigated the correlation between segregation pattern of Mn, Si, C and formation of the pearlite bands in hot rolled Gr 60 steel plate. Microstructural study indicated formation of a distinguished thick band at centerline of the plate with number of parallel bands through thickness of the steel plate. The thickness, frequency, and continuity of the bands are reduced from mid-thickness toward external surface of the steel plate. Analysis showed a noticeable increase of C, Si and Mn levels within the bands. Such alloying segregation takes place during metal solidification. EDS analysis verified presence of particles rich in Ti, Nb, Mn, C, N, within the bands. Texture analysis by Electron Backscatter Detector (EBSD) indicated the grains size/misorientation can noticeably change within the bands. Effect of banding on through-thickness properties of the steel was examined by carrying out microhardness, toughness and tensile tests. Results suggest the Mn and C contents are changed in sinusoidal pattern through thickness of the hot rolled plate and pearlite bands are formed at the peaks of this sinusoidal segregation pattern. Changes in grain size/misorientation, formation of highly alloyed particles, and pearlite within these bands, facilitate crack formation along boundaries of these bands.

Keywords: pearlite band, alloying segregation, hot rolling, Ti, Nb, N, C

Procedia PDF Downloads 121
24885 A Framework for Event-Based Monitoring of Business Processes in the Supply Chain Management of Industry 4.0

Authors: Johannes Atug, Andreas Radke, Mitchell Tseng, Gunther Reinhart

Abstract:

In modern supply chains, large numbers of SKU (Stock-Keeping-Unit) need to be timely managed, and any delays in noticing disruptions of items often limit the ability to defer the impact on customer order fulfillment. However, in supply chains of IoT-connected enterprises, the ERP (Enterprise-Resource-Planning), the MES (Manufacturing-Execution-System) and the SCADA (Supervisory-Control-and-Data-Acquisition) systems generate large amounts of data, which generally glean much earlier notice of deviations in the business process steps. That is, analyzing these streams of data with process mining techniques allows the monitoring of the supply chain business processes and thus identification of items that deviate from the standard order fulfillment process. In this paper, a framework to enable event-based SCM (Supply-Chain-Management) processes including an overview of core enabling technologies are presented, which is based on the RAMI (Reference-Architecture-Model for Industrie 4.0) architecture. The application of this framework in the industry is presented, and implications for SCM in industry 4.0 and further research are outlined.

Keywords: cyber-physical production systems, event-based monitoring, supply chain management, RAMI (Reference-Architecture-Model for Industrie 4.0)

Procedia PDF Downloads 215
24884 Assessment of Chromium Concentration and Human Health Risk in the Steelpoort River Sub-Catchment of the Olifants River Basin, South Africa

Authors: Abraham Addo-Bediako

Abstract:

Many freshwater ecosystems are facing immense pressure from anthropogenic activities, such as agricultural, industrial and mining. Trace metal pollution in freshwater ecosystems has become an issue of public health concern due to its toxicity and persistence in the environment. Trace elements pose a serious risk not only to the environment and aquatic biota but also humans. Chromium is one of such trace elements and its pollution in surface waters and groundwaters represents a serious environmental problem. In South Africa, agriculture, mining, industrial and domestic wastes are the main contributors to chromium discharge in rivers. The common forms of chromium are chromium (III) and chromium (VI). The latter is the most toxic because it can cause damage to human health. The aim of the study was to assess the contamination of chromium in the water and sediments of two rivers in the Steelpoort River sub-catchment of the Olifants River Basin, South Africa and human health risk. The concentration of Cr was analyzed using inductively coupled plasma–optical emission spectrometry (ICP-OES). The concentration of the metal was found to exceed the threshold limit, mainly in areas of high human activities. The hazard quotient through ingestion exposure did not exceed the threshold limit of 1 for adults and children and cancer risk for adults and children computed did not exceed the threshold limit of 10-4. Thus, there is no potential health risk from chromium through ingestion of drinking water for now. However, with increasing human activities, especially mining, the concentration could increase and become harmful to humans who depend on rivers for drinking water. It is recommended that proper management strategies should be taken to minimize the impact of chromium on the rivers and water from the rivers should properly be treated before domestic use.

Keywords: land use, health risk, metal pollution, water quality

Procedia PDF Downloads 65
24883 Three-Stage Mining Metals Supply Chain Coordination and Product Quality Improvement with Revenue Sharing Contract

Authors: Hamed Homaei, Iraj Mahdavi, Ali Tajdin

Abstract:

One of the main concerns of miners is to increase the quality level of their products because the mining metals price depends on their quality level; however, increasing the quality level of these products has different costs at different levels of the supply chain. These costs usually increase after extractor level. This paper studies the coordination issue of a decentralized three-level supply chain with one supplier (extractor), one mineral processor and one manufacturer in which the increasing product quality level cost at the processor level is higher than the supplier and at the level of the manufacturer is more than the processor. We identify the optimal product quality level for each supply chain member by designing a revenue sharing contract. Finally, numerical examples show that the designed contract not only increases the final product quality level but also provides a win-win condition for all supply chain members and increases the whole supply chain profit.

Keywords: three-stage supply chain, product quality improvement, channel coordination, revenue sharing

Procedia PDF Downloads 170
24882 Automatic Lexicon Generation for Domain Specific Dataset for Mining Public Opinion on China Pakistan Economic Corridor

Authors: Tayyaba Azim, Bibi Amina

Abstract:

The increase in the popularity of opinion mining with the rapid growth in the availability of social networks has attracted a lot of opportunities for research in the various domains of Sentiment Analysis and Natural Language Processing (NLP) using Artificial Intelligence approaches. The latest trend allows the public to actively use the internet for analyzing an individual’s opinion and explore the effectiveness of published facts. The main theme of this research is to account the public opinion on the most crucial and extensively discussed development projects, China Pakistan Economic Corridor (CPEC), considered as a game changer due to its promise of bringing economic prosperity to the region. So far, to the best of our knowledge, the theme of CPEC has not been analyzed for sentiment determination through the ML approach. This research aims to demonstrate the use of ML approaches to spontaneously analyze the public sentiment on Twitter tweets particularly about CPEC. Support Vector Machine SVM is used for classification task classifying tweets into positive, negative and neutral classes. Word2vec and TF-IDF features are used with the SVM model, a comparison of the trained model on manually labelled tweets and automatically generated lexicon is performed. The contributions of this work are: Development of a sentiment analysis system for public tweets on CPEC subject, construction of an automatic generation of the lexicon of public tweets on CPEC, different themes are identified among tweets and sentiments are assigned to each theme. It is worth noting that the applications of web mining that empower e-democracy by improving political transparency and public participation in decision making via social media have not been explored and practised in Pakistan region on CPEC yet.

Keywords: machine learning, natural language processing, sentiment analysis, support vector machine, Word2vec

Procedia PDF Downloads 133
24881 Patterns in Fish Diversity and Abundance of an Abandoned Gold Mine Reservoirs

Authors: O. E. Obayemi, M. A. Ayoade, O. O. Komolafe

Abstract:

Fish survey was carried out for an annual cycle covering both rainy and dry seasons using cast nets, gill nets and traps at two different reservoirs. The objective was to examined the fish assemblages of the reservoirs and provide more additional information on the reservoir. The fish species in the reservoirs comprised of twelve species of six families. The results of the study also showed that five species of fish were caught in reservoir five while ten fish species were captured in reservoir six. Species such as Malapterurus electricus, Ctenopoma kingsleyae, Mormyrus rume, Parachanna obscura, Sarotherodon galilaeus, Tilapia mariae, C. guntheri, Clarias macromystax, Coptodon zilii and Clarias gariepinus were caught during the sampling period. There was a significant difference (p=0.014, t = 1.711) in the abundance of fish species in the two reservoirs. Seasonally, reservoirs five (p=0.221, t = 1.859) and six (p=0.453, t = 1.734) showed there was no significant difference in their fish populations. Also, despite being impacted with gold mining the diversity indices were high when compared to less disturbed waterbodies. The study concluded that the environments recorded low abundant fish species which suggests the influence of mining on the abundance and diversity of fish species.

Keywords: Igun, fish, Shannon-Wiener Index, Simpson index, Pielou index

Procedia PDF Downloads 77
24880 The Operating Behaviour of Unbalanced Unpaced Merging Assembly Lines

Authors: S. Shaaban, T. McNamara, S. Hudson

Abstract:

This paper reports on the performance of deliberately unbalanced, reliable, non-automated and assembly lines that merge, whose workstations differ in terms of their mean operation times. Simulations are carried out on 5- and 8-station lines with 1, 2 and 4 buffer capacity units, % degrees of line imbalance of 2, 5 and 12, and 24 different patterns of means imbalance. Data on two performance measures, namely throughput and average buffer level were gathered, statistically analysed and compared to a merging balanced line counterpart. It was found that the best configurations are a balanced line arrangement and a monotone decreasing order for each of the parallel merging lines, with the first generally resulting in a lower throughput and the second leading to a lower average buffer level than those of a balanced line.

Keywords: average buffer level, merging lines, simulation, throughput, unbalanced

Procedia PDF Downloads 306
24879 The Structure and Function Investigation and Analysis of the Automatic Spin Regulator (ASR) in the Powertrain System of Construction and Mining Machines with the Focus on Dump Trucks

Authors: Amir Mirzaei

Abstract:

The powertrain system is one of the most basic and essential components in a machine. The occurrence of motion is practically impossible without the presence of this system. When power is generated by the engine, it is transmitted by the powertrain system to the wheels, which are the last parts of the system. Powertrain system has different components according to the type of use and design. When the force generated by the engine reaches to the wheels, the amount of frictional force between the tire and the ground determines the amount of traction and non-slip or the amount of slip. At various levels, such as icy, muddy, and snow-covered ground, the amount of friction coefficient between the tire and the ground decreases dramatically and considerably, which in turn increases the amount of force loss and the vehicle traction decreases drastically. This condition is caused by the phenomenon of slipping, which, in addition to the waste of energy produced, causes the premature wear of driving tires. It also causes the temperature of the transmission oil to rise too much, as a result, causes a reduction in the quality and become dirty to oil and also reduces the useful life of the clutches disk and plates inside the transmission. this issue is much more important in road construction and mining machinery than passenger vehicles and is always one of the most important and significant issues in the design discussion, in order to overcome. One of these methods is the automatic spin regulator system which is abbreviated as ASR. The importance of this method and its structure and function have solved one of the biggest challenges of the powertrain system in the field of construction and mining machinery. That this research is examined.

Keywords: automatic spin regulator, ASR, methods of reducing slipping, methods of preventing the reduction of the useful life of clutches disk and plate, methods of preventing the premature dirtiness of transmission oil, method of preventing the reduction of the useful life of tires

Procedia PDF Downloads 65
24878 Geochemical Baseline and Origin of Trace Elements in Soils and Sediments around Selibe-Phikwe Cu-Ni Mining Town, Botswana

Authors: Fiona S. Motswaiso, Kengo Nakamura, Takeshi Komai

Abstract:

Heavy metals may occur naturally in rocks and soils, but elevated quantities of them are being gradually released into the environment by anthropogenic activities such as mining. In order to address issues of heavy metal water and soil pollution, a distinction needs to be made between natural and anthropogenic anomalies. The current study aims at characterizing the spatial distribution of trace elements and evaluate site-specific geochemical background concentrations of trace elements in the mine soils examined, and also to discriminate between lithogenic and anthropogenic sources of enrichment around a copper-nickel mining town in Selibe-Phikwe, Botswana. A total of 20 Soil samples, 11 river sediment, and 9 river water samples were collected from an area of 625m² within the precincts of the mine and the smelter. The concentrations of metals (Cu, Ni, Pb, Zn, Cr, Ni, Mn, As, Pb, and Co) were determined by using an ICP-MS after digestion with aqua regia. Major elements were also determined using ED-XRF. Water pH and EC were measured on site and recorded while soil pH and EC were also determined in the laboratory after performing water elution tests. The highest Cu and Ni concentrations in soil are 593mg/kg and 453mg/kg respectively, which is 3 times higher than the crustal composition values and 2 times higher than the South African minimum allowable levels of heavy metals in soils. The level of copper contamination was higher than that of nickel and other contaminants. Water pH levels ranged from basic (9) to very acidic (3) in areas closer to the mine/smelter. There is high variation in heavy metal concentration, eg. Cu suggesting that some sites depict regional natural background concentrations while other depict anthropogenic sources.

Keywords: contamination, geochemical baseline, heavy metals, soils

Procedia PDF Downloads 138
24877 Relative Entropy Used to Determine the Divergence of Cells in Single Cell RNA Sequence Data Analysis

Authors: An Chengrui, Yin Zi, Wu Bingbing, Ma Yuanzhu, Jin Kaixiu, Chen Xiao, Ouyang Hongwei

Abstract:

Single cell RNA sequence (scRNA-seq) is one of the effective tools to study transcriptomics of biological processes. Recently, similarity measurement of cells is Euclidian distance or its derivatives. However, the process of scRNA-seq is a multi-variate Bernoulli event model, thus we hypothesize that it would be more efficient when the divergence between cells is valued with relative entropy than Euclidian distance. In this study, we compared the performances of Euclidian distance, Spearman correlation distance and Relative Entropy using scRNA-seq data of the early, medial and late stage of limb development generated in our lab. Relative Entropy is better than other methods according to cluster potential test. Furthermore, we developed KL-SNE, an algorithm modifying t-SNE whose definition of divergence between cells Euclidian distance to Kullback–Leibler divergence. Results showed that KL-SNE was more effective to dissect cell heterogeneity than t-SNE, indicating the better performance of relative entropy than Euclidian distance. Specifically, the chondrocyte expressing Comp was clustered together with KL-SNE but not with t-SNE. Surprisingly, cells in early stage were surrounded by cells in medial stage in the processing of KL-SNE while medial cells neighbored to late stage with the process of t-SNE. This results parallel to Heatmap which showed cells in medial stage were more heterogenic than cells in other stages. In addition, we also found that results of KL-SNE tend to follow Gaussian distribution compared with those of the t-SNE, which could also be verified with the analysis of scRNA-seq data from another study on human embryo development. Therefore, it is also an effective way to convert non-Gaussian distribution to Gaussian distribution and facilitate the subsequent statistic possesses. Thus, relative entropy is potentially a better way to determine the divergence of cells in scRNA-seq data analysis.

Keywords: Single cell RNA sequence, Similarity measurement, Relative Entropy, KL-SNE, t-SNE

Procedia PDF Downloads 327
24876 Risk Assessment of Trace Metals in the Soil Surface of an Abandoned Mine, El-Abed Northwestern Algeria

Authors: Farida Mellah, Abdelhak Boutaleb, Bachir Henni, Dalila Berdous, Abdelhamid Mellah

Abstract:

Context/Purpose: One of the largest mining operations for lead and zinc deposits in northwestern Algeria in more than thirty years, El Abed is now the abandoned mine that has been inactive since 2004, leaving large amounts of accumulated mining waste under the influence of Wind, erosion, rain, and near agricultural lands. Materials & Methods: This study aims to verify the concentrations and sources of heavy metals for surface samples containing randomly taken soil. Chemical analyses were performed using iCAP 7000 Series ICP-optical emission spectrometer, using a set of environmental quality indicators by calculating the enrichment factor using iron and aluminum references, geographic accumulation index and geographic information system (GIS). On the basis of the spatial distribution. Results: The results indicated that the average metal concentration was: (As = 30,82),(Pb = 1219,27), (Zn = 2855,94), (Cu = 5,3), mg/Kg,based on these results, all metals except Cu passed by GBV in the Earth's crust. Environmental quality indicators were calculated based on the concentrations of trace metals such as lead, arsenic, zinc, copper, iron and aluminum. Interpretation: This study investigated the concentrations and sources of trace metals, and by using quality indicators and statistical methods, lead, zinc, and arsenic were determined from human sources, while copper was a natural source. And based on the spatial analysis on the basis of GIS, many hot spots were identified in the El-Abed region. Conclusion: These results could help in the development of future treatment strategies aimed primarily at eliminating materials from mining waste.

Keywords: soil contamination, trace metals, geochemical indices, El Abed mine, Algeria

Procedia PDF Downloads 55
24875 Sentiment Analysis of Ensemble-Based Classifiers for E-Mail Data

Authors: Muthukumarasamy Govindarajan

Abstract:

Detection of unwanted, unsolicited mails called spam from email is an interesting area of research. It is necessary to evaluate the performance of any new spam classifier using standard data sets. Recently, ensemble-based classifiers have gained popularity in this domain. In this research work, an efficient email filtering approach based on ensemble methods is addressed for developing an accurate and sensitive spam classifier. The proposed approach employs Naive Bayes (NB), Support Vector Machine (SVM) and Genetic Algorithm (GA) as base classifiers along with different ensemble methods. The experimental results show that the ensemble classifier was performing with accuracy greater than individual classifiers, and also hybrid model results are found to be better than the combined models for the e-mail dataset. The proposed ensemble-based classifiers turn out to be good in terms of classification accuracy, which is considered to be an important criterion for building a robust spam classifier.

Keywords: accuracy, arcing, bagging, genetic algorithm, Naive Bayes, sentiment mining, support vector machine

Procedia PDF Downloads 123
24874 Quantum Statistical Machine Learning and Quantum Time Series

Authors: Omar Alzeley, Sergey Utev

Abstract:

Minimizing a constrained multivariate function is the fundamental of Machine learning, and these algorithms are at the core of data mining and data visualization techniques. The decision function that maps input points to output points is based on the result of optimization. This optimization is the central of learning theory. One approach to complex systems where the dynamics of the system is inferred by a statistical analysis of the fluctuations in time of some associated observable is time series analysis. The purpose of this paper is a mathematical transition from the autoregressive model of classical time series to the matrix formalization of quantum theory. Firstly, we have proposed a quantum time series model (QTS). Although Hamiltonian technique becomes an established tool to detect a deterministic chaos, other approaches emerge. The quantum probabilistic technique is used to motivate the construction of our QTS model. The QTS model resembles the quantum dynamic model which was applied to financial data. Secondly, various statistical methods, including machine learning algorithms such as the Kalman filter algorithm, are applied to estimate and analyses the unknown parameters of the model. Finally, simulation techniques such as Markov chain Monte Carlo have been used to support our investigations. The proposed model has been examined by using real and simulated data. We establish the relation between quantum statistical machine and quantum time series via random matrix theory. It is interesting to note that the primary focus of the application of QTS in the field of quantum chaos was to find a model that explain chaotic behaviour. Maybe this model will reveal another insight into quantum chaos.

Keywords: machine learning, simulation techniques, quantum probability, tensor product, time series

Procedia PDF Downloads 451
24873 Visual Template Detection and Compositional Automatic Regular Expression Generation for Business Invoice Extraction

Authors: Anthony Proschka, Deepak Mishra, Merlyn Ramanan, Zurab Baratashvili

Abstract:

Small and medium-sized businesses receive over 160 billion invoices every year. Since these documents exhibit many subtle differences in layout and text, extracting structured fields such as sender name, amount, and VAT rate from them automatically is an open research question. In this paper, existing work in template-based document extraction is extended, and a system is devised that is able to reliably extract all required fields for up to 70% of all documents in the data set, more than any other previously reported method. The approaches are described for 1) detecting through visual features which template a given document belongs to, 2) automatically generating extraction rules for a given new template by composing regular expressions from multiple components, and 3) computing confidence scores that indicate the accuracy of the automatic extractions. The system can generate templates with as little as one training sample and only requires the ground truth field values instead of detailed annotations such as bounding boxes that are hard to obtain. The system is deployed and used inside a commercial accounting software.

Keywords: data mining, information retrieval, business, feature extraction, layout, business data processing, document handling, end-user trained information extraction, document archiving, scanned business documents, automated document processing, F1-measure, commercial accounting software

Procedia PDF Downloads 110
24872 Neural Machine Translation for Low-Resource African Languages: Benchmarking State-of-the-Art Transformer for Wolof

Authors: Cheikh Bamba Dione, Alla Lo, Elhadji Mamadou Nguer, Siley O. Ba

Abstract:

In this paper, we propose two neural machine translation (NMT) systems (French-to-Wolof and Wolof-to-French) based on sequence-to-sequence with attention and transformer architectures. We trained our models on a parallel French-Wolof corpus of about 83k sentence pairs. Because of the low-resource setting, we experimented with advanced methods for handling data sparsity, including subword segmentation, back translation, and the copied corpus method. We evaluate the models using the BLEU score and find that transformer outperforms the classic seq2seq model in all settings, in addition to being less sensitive to noise. In general, the best scores are achieved when training the models on word-level-based units. For subword-level models, using back translation proves to be slightly beneficial in low-resource (WO) to high-resource (FR) language translation for the transformer (but not for the seq2seq) models. A slight improvement can also be observed when injecting copied monolingual text in the target language. Moreover, combining the copied method data with back translation leads to a substantial improvement of the translation quality.

Keywords: backtranslation, low-resource language, neural machine translation, sequence-to-sequence, transformer, Wolof

Procedia PDF Downloads 130
24871 From Text to Data: Sentiment Analysis of Presidential Election Political Forums

Authors: Sergio V Davalos, Alison L. Watkins

Abstract:

User generated content (UGC) such as website post has data associated with it: time of the post, gender, location, type of device, and number of words. The text entered in user generated content (UGC) can provide a valuable dimension for analysis. In this research, each user post is treated as a collection of terms (words). In addition to the number of words per post, the frequency of each term is determined by post and by the sum of occurrences in all posts. This research focuses on one specific aspect of UGC: sentiment. Sentiment analysis (SA) was applied to the content (user posts) of two sets of political forums related to the US presidential elections for 2012 and 2016. Sentiment analysis results in deriving data from the text. This enables the subsequent application of data analytic methods. The SASA (SAIL/SAI Sentiment Analyzer) model was used for sentiment analysis. The application of SASA resulted with a sentiment score for each post. Based on the sentiment scores for the posts there are significant differences between the content and sentiment of the two sets for the 2012 and 2016 presidential election forums. In the 2012 forums, 38% of the forums started with positive sentiment and 16% with negative sentiment. In the 2016 forums, 29% started with positive sentiment and 15% with negative sentiment. There also were changes in sentiment over time. For both elections as the election got closer, the cumulative sentiment score became negative. The candidate who won each election was in the more posts than the losing candidates. In the case of Trump, there were more negative posts than Clinton’s highest number of posts which were positive. KNIME topic modeling was used to derive topics from the posts. There were also changes in topics and keyword emphasis over time. Initially, the political parties were the most referenced and as the election got closer the emphasis changed to the candidates. The performance of the SASA method proved to predict sentiment better than four other methods in Sentibench. The research resulted in deriving sentiment data from text. In combination with other data, the sentiment data provided insight and discovery about user sentiment in the US presidential elections for 2012 and 2016.

Keywords: sentiment analysis, text mining, user generated content, US presidential elections

Procedia PDF Downloads 172
24870 Internet of Things, Edge and Cloud Computing in Rock Mechanical Investigation for Underground Surveys

Authors: Esmael Makarian, Ayub Elyasi, Fatemeh Saberi, Olusegun Stanley Tomomewo

Abstract:

Rock mechanical investigation is one of the most crucial activities in underground operations, especially in surveys related to hydrocarbon exploration and production, geothermal reservoirs, energy storage, mining, and geotechnics. There is a wide range of traditional methods for driving, collecting, and analyzing rock mechanics data. However, these approaches may not be suitable or work perfectly in some situations, such as fractured zones. Cutting-edge technologies have been provided to solve and optimize the mentioned issues. Internet of Things (IoT), Edge, and Cloud Computing technologies (ECt & CCt, respectively) are among the most widely used and new artificial intelligence methods employed for geomechanical studies. IoT devices act as sensors and cameras for real-time monitoring and mechanical-geological data collection of rocks, such as temperature, movement, pressure, or stress levels. Structural integrity, especially for cap rocks within hydrocarbon systems, and rock mass behavior assessment, to further activities such as enhanced oil recovery (EOR) and underground gas storage (UGS), or to improve safety risk management (SRM) and potential hazards identification (P.H.I), are other benefits from IoT technologies. EC techniques can process, aggregate, and analyze data immediately collected by IoT on a real-time scale, providing detailed insights into the behavior of rocks in various situations (e.g., stress, temperature, and pressure), establishing patterns quickly, and detecting trends. Therefore, this state-of-the-art and useful technology can adopt autonomous systems in rock mechanical surveys, such as drilling and production (in hydrocarbon wells) or excavation (in mining and geotechnics industries). Besides, ECt allows all rock-related operations to be controlled remotely and enables operators to apply changes or make adjustments. It must be mentioned that this feature is very important in environmental goals. More often than not, rock mechanical studies consist of different data, such as laboratory tests, field operations, and indirect information like seismic or well-logging data. CCt provides a useful platform for storing and managing a great deal of volume and different information, which can be very useful in fractured zones. Additionally, CCt supplies powerful tools for predicting, modeling, and simulating rock mechanical information, especially in fractured zones within vast areas. Also, it is a suitable source for sharing extensive information on rock mechanics, such as the direction and size of fractures in a large oil field or mine. The comprehensive review findings demonstrate that digital transformation through integrated IoT, Edge, and Cloud solutions is revolutionizing traditional rock mechanical investigation. These advanced technologies have empowered real-time monitoring, predictive analysis, and data-driven decision-making, culminating in noteworthy enhancements in safety, efficiency, and sustainability. Therefore, by employing IoT, CCt, and ECt, underground operations have experienced a significant boost, allowing for timely and informed actions using real-time data insights. The successful implementation of IoT, CCt, and ECt has led to optimized and safer operations, optimized processes, and environmentally conscious approaches in underground geological endeavors.

Keywords: rock mechanical studies, internet of things, edge computing, cloud computing, underground surveys, geological operations

Procedia PDF Downloads 40
24869 How to Use Big Data in Logistics Issues

Authors: Mehmet Akif Aslan, Mehmet Simsek, Eyup Sensoy

Abstract:

Big Data stands for today’s cutting-edge technology. As the technology becomes widespread, so does Data. Utilizing massive data sets enable companies to get competitive advantages over their adversaries. Out of many area of Big Data usage, logistics has significance role in both commercial sector and military. This paper lays out what big data is and how it is used in both military and commercial logistics.

Keywords: big data, logistics, operational efficiency, risk management

Procedia PDF Downloads 626
24868 E4D-MP: Time-Lapse Multiphysics Simulation and Joint Inversion Toolset for Large-Scale Subsurface Imaging

Authors: Zhuanfang Fred Zhang, Tim C. Johnson, Yilin Fang, Chris E. Strickland

Abstract:

A variety of geophysical techniques are available to image the opaque subsurface with little or no contact with the soil. It is common to conduct time-lapse surveys of different types for a given site for improved results of subsurface imaging. Regardless of the chosen survey methods, it is often a challenge to process the massive amount of survey data. The currently available software applications are generally based on the one-dimensional assumption for a desktop personal computer. Hence, they are usually incapable of imaging the three-dimensional (3D) processes/variables in the subsurface of reasonable spatial scales; the maximum amount of data that can be inverted simultaneously is often very small due to the capability limitation of personal computers. Presently, high-performance or integrating software that enables real-time integration of multi-process geophysical methods is needed. E4D-MP enables the integration and inversion of time-lapsed large-scale data surveys from geophysical methods. Using the supercomputing capability and parallel computation algorithm, E4D-MP is capable of processing data across vast spatiotemporal scales and in near real time. The main code and the modules of E4D-MP for inverting individual or combined data sets of time-lapse 3D electrical resistivity, spectral induced polarization, and gravity surveys have been developed and demonstrated for sub-surface imaging. E4D-MP provides capability of imaging the processes (e.g., liquid or gas flow, solute transport, cavity development) and subsurface properties (e.g., rock/soil density, conductivity) critical for successful control of environmental engineering related efforts such as environmental remediation, carbon sequestration, geothermal exploration, and mine land reclamation, among others.

Keywords: gravity survey, high-performance computing, sub-surface monitoring, electrical resistivity tomography

Procedia PDF Downloads 139
24867 Evaluation of the CRISP-DM Business Understanding Step: An Approach for Assessing the Predictive Power of Regression versus Classification for the Quality Prediction of Hydraulic Test Results

Authors: Christian Neunzig, Simon Fahle, Jürgen Schulz, Matthias Möller, Bernd Kuhlenkötter

Abstract:

Digitalisation in production technology is a driver for the application of machine learning methods. Through the application of predictive quality, the great potential for saving necessary quality control can be exploited through the data-based prediction of product quality and states. However, the serial use of machine learning applications is often prevented by various problems. Fluctuations occur in real production data sets, which are reflected in trends and systematic shifts over time. To counteract these problems, data preprocessing includes rule-based data cleaning, the application of dimensionality reduction techniques, and the identification of comparable data subsets to extract stable features. Successful process control of the target variables aims to centre the measured values around a mean and minimise variance. Competitive leaders claim to have mastered their processes. As a result, much of the real data has a relatively low variance. For the training of prediction models, the highest possible generalisability is required, which is at least made more difficult by this data availability. The implementation of a machine learning application can be interpreted as a production process. The CRoss Industry Standard Process for Data Mining (CRISP-DM) is a process model with six phases that describes the life cycle of data science. As in any process, the costs to eliminate errors increase significantly with each advancing process phase. For the quality prediction of hydraulic test steps of directional control valves, the question arises in the initial phase whether a regression or a classification is more suitable. In the context of this work, the initial phase of the CRISP-DM, the business understanding, is critically compared for the use case at Bosch Rexroth with regard to regression and classification. The use of cross-process production data along the value chain of hydraulic valves is a promising approach to predict the quality characteristics of workpieces. Suitable methods for leakage volume flow regression and classification for inspection decision are applied. Impressively, classification is clearly superior to regression and achieves promising accuracies.

Keywords: classification, CRISP-DM, machine learning, predictive quality, regression

Procedia PDF Downloads 131
24866 Document-level Sentiment Analysis: An Exploratory Case Study of Low-resource Language Urdu

Authors: Ammarah Irum, Muhammad Ali Tahir

Abstract:

Document-level sentiment analysis in Urdu is a challenging Natural Language Processing (NLP) task due to the difficulty of working with lengthy texts in a language with constrained resources. Deep learning models, which are complex neural network architectures, are well-suited to text-based applications in addition to data formats like audio, image, and video. To investigate the potential of deep learning for Urdu sentiment analysis, we implemented five different deep learning models, including Bidirectional Long Short Term Memory (BiLSTM), Convolutional Neural Network (CNN), Convolutional Neural Network with Bidirectional Long Short Term Memory (CNN-BiLSTM), and Bidirectional Encoder Representation from Transformer (BERT). In this study, we developed a hybrid deep learning model called BiLSTM-Single Layer Multi Filter Convolutional Neural Network (BiLSTM-SLMFCNN) by fusing BiLSTM and CNN architecture. The proposed and baseline techniques are applied on Urdu Customer Support data set and IMDB Urdu movie review data set by using pre-trained Urdu word embedding that are suitable for sentiment analysis at the document level. Results of these techniques are evaluated and our proposed model outperforms all other deep learning techniques for Urdu sentiment analysis. BiLSTM-SLMFCNN outperformed the baseline deep learning models and achieved 83%, 79%, 83% and 94% accuracy on small, medium and large sized IMDB Urdu movie review data set and Urdu Customer Support data set respectively.

Keywords: urdu sentiment analysis, deep learning, natural language processing, opinion mining, low-resource language

Procedia PDF Downloads 52
24865 Efficiency and Scale Elasticity in Network Data Envelopment Analysis: An Application to International Tourist Hotels in Taiwan

Authors: Li-Hsueh Chen

Abstract:

Efficient operation is more and more important for managers of hotels. Unlike the manufacturing industry, hotels cannot store their products. In addition, many hotels provide room service, and food and beverage service simultaneously. When efficiencies of hotels are evaluated, the internal structure should be considered. Hence, based on the operational characteristics of hotels, this study proposes a DEA model to simultaneously assess the efficiencies among the room production division, food and beverage production division, room service division and food and beverage service division. However, not only the enhancement of efficiency but also the adjustment of scale can improve the performance. In terms of the adjustment of scale, scale elasticity or returns to scale can help to managers to make decisions concerning expansion or contraction. In order to construct a reasonable approach to measure the efficiencies and scale elasticities of hotels, this study builds an alternative variable-returns-to-scale-based two-stage network DEA model with the combination of parallel and series structures to explore the scale elasticities of the whole system, room production division, food and beverage production division, room service division and food and beverage service division based on the data of international tourist hotel industry in Taiwan. The results may provide valuable information on operational performance and scale for managers and decision makers.

Keywords: efficiency, scale elasticity, network data envelopment analysis, international tourist hotel

Procedia PDF Downloads 213
24864 The Changes in Motivations and the Use of Translation Strategies in Crowdsourced Translation: A Case Study on Global Voices’ Chinese Translation Project

Authors: Ya-Mei Chen

Abstract:

Online crowdsourced translation, an innovative translation practice brought by Web 2.0 technologies and the democratization of information, has become increasingly popular in the Internet era. Carried out by grass-root internet users, crowdsourced translation contains fundamentally different features from its off-line traditional counterpart, such as voluntary participation and parallel collaboration. To better understand such a participatory and collaborative nature, this paper will use the online Chinese translation project of Global Voices as a case study to investigate the following issues: (1) the changes in volunteer translators’ and reviewers’ motivations for participation, (2) translators’ and reviewers’ use of translation strategies and (3) the correlations of translators’ and reviewers’ motivations and strategies with the organizational mission, the translation style guide, the translator-reviewer interaction, the mediation of the translation platform and various types of capital within the translation field. With an aim to systematically explore the above three issues, this paper will collect both quantitative and qualitative data and then draw upon Engestrom’s activity theory and Bourdieu’s field theory as a theoretical framework to analyze the data in question. An online anonymous questionnaire will be conducted to obtain the quantitative data. The questionnaire will contain questions related to volunteer translators’ and reviewers’ backgrounds, participation motivations, translation strategies and mutual relations as well as the operation of the translation platform. Concerning the qualitative data, they will come from (1) a comparative study between some English news texts published on Global Voices and their Chinese translations, (2) an analysis of the online discussion forum associated with Global Voices’ Chinese translation project and (3) the information about the project’s translation mission and guidelines. It is hoped that this research, through a detailed sociological analysis of a cause-driven crowdsourced translation project, can enable translation researchers and practitioners to adequately meet the translation challenges appearing in the digital age.

Keywords: crowdsourced translation, global voices, motivation, translation strategies

Procedia PDF Downloads 358
24863 The Concentration of Natural Alpha Emitters Radionuclides in Fish and Their Contribution to the Internal Dose

Authors: Wagner Pereira, Alphonse Kelecom

Abstract:

Mining can impact the environment, and the major impact of some mining activities is the radiological impact. In human populations, such impact is well studied and regulated. For biota, this assessment always had as focus the protection of human food chain. The protection of biota itself is a new approach, still developing. In order to contribute to this new approach, fish collecting was carried out in areas of naturally occurring radioactive materials (NORM), where a uranium mine is in decommissioning phase. The activity concentrations were analyzed, in Bq/kg wet weight, for Uranium (Unat), Th-232 and Ra-226 in the lambari fish Astyanax bimaculatus L. (omnivorous fish) and in the traíra fish Hoplias malabaricus Bloch, 1794 (carnivorous fish). Seven composite samples (that is: a sufficient number of individuals to reach at least 2 kg of fresh weight) were collected every six months between 2013 and 2015. The mean activity concentrations (AC) for uranium ranged from 1.12 (lambari) to 0.60 (lungfish). For Th, variations ranged from 0.30 to 0.05 (lambari and traíra, respectively). Finally, the Ra-226 means ranged between 0.08 and 0.03. No temporal trends of accumulation could be identified. Systematically, the AC values of radionuclides were higher in omnivorous fish when compared to the carnivore ones.

Keywords: biota dose, NORM, fish, environmental protection

Procedia PDF Downloads 236
24862 Biosorption of Gold from Chloride Media in a Simultaneous Adsorption-Reduction Process

Authors: Shafiq Alam, Yen Ning Lee

Abstract:

Conventional hydrometallurgical processing of metals involves the use of large quantities of toxic chemicals. Realizing a need to develop sustainable technologies, extensive research studies are being carried out to recover and recycle base, precious and rare earth metals from their pregnant leach solutions (PLS) using green chemicals/biomaterials prepared from biomass wastes derived from agriculture, marine and forest resources. Our innovative research showed that bio-adsorbents prepared from such biomass wastes can effectively adsorb precious metals, especially gold after conversion of their functional groups in a very simple process. The highly effective ‘Adsorption-coupled-Reduction’ phenomenon witnessed appears promising for the potential use of this gold biosorption process in the mining industry. Proper management and effective use of biomass wastes as value added green chemicals will not only reduce the volume of wastes being generated every day in our society, but will also have a high-end value to the mining and mineral processing industries as those biomaterials would be cheap, but very selective for gold recovery/recycling from low grade ore, leach residue or e-wastes.

Keywords: biosorption, hydrometallurgy, gold, adsorption, reduction, biomass, sustainability

Procedia PDF Downloads 362
24861 Copyright Clearance for Artificial Intelligence Training Data: Challenges and Solutions

Authors: Erva Akin

Abstract:

– The use of copyrighted material for machine learning purposes is a challenging issue in the field of artificial intelligence (AI). While machine learning algorithms require large amounts of data to train and improve their accuracy and creativity, the use of copyrighted material without permission from the authors may infringe on their intellectual property rights. In order to overcome copyright legal hurdle against the data sharing, access and re-use of data, the use of copyrighted material for machine learning purposes may be considered permissible under certain circumstances. For example, if the copyright holder has given permission to use the data through a licensing agreement, then the use for machine learning purposes may be lawful. It is also argued that copying for non-expressive purposes that do not involve conveying expressive elements to the public, such as automated data extraction, should not be seen as infringing. The focus of such ‘copy-reliant technologies’ is on understanding language rules, styles, and syntax and no creative ideas are being used. However, the non-expressive use defense is within the framework of the fair use doctrine, which allows the use of copyrighted material for research or educational purposes. The questions arise because the fair use doctrine is not available in EU law, instead, the InfoSoc Directive provides for a rigid system of exclusive rights with a list of exceptions and limitations. One could only argue that non-expressive uses of copyrighted material for machine learning purposes do not constitute a ‘reproduction’ in the first place. Nevertheless, the use of machine learning with copyrighted material is difficult because EU copyright law applies to the mere use of the works. Two solutions can be proposed to address the problem of copyright clearance for AI training data. The first is to introduce a broad exception for text and data mining, either mandatorily or for commercial and scientific purposes, or to permit the reproduction of works for non-expressive purposes. The second is that copyright laws should permit the reproduction of works for non-expressive purposes, which opens the door to discussions regarding the transposition of the fair use principle from the US into EU law. Both solutions aim to provide more space for AI developers to operate and encourage greater freedom, which could lead to more rapid innovation in the field. The Data Governance Act presents a significant opportunity to advance these debates. Finally, issues concerning the balance of general public interests and legitimate private interests in machine learning training data must be addressed. In my opinion, it is crucial that robot-creation output should fall into the public domain. Machines depend on human creativity, innovation, and expression. To encourage technological advancement and innovation, freedom of expression and business operation must be prioritised.

Keywords: artificial intelligence, copyright, data governance, machine learning

Procedia PDF Downloads 67
24860 Weighted-Distance Sliding Windows and Cooccurrence Graphs for Supporting Entity-Relationship Discovery in Unstructured Text

Authors: Paolo Fantozzi, Luigi Laura, Umberto Nanni

Abstract:

The problem of Entity relation discovery in structured data, a well covered topic in literature, consists in searching within unstructured sources (typically, text) in order to find connections among entities. These can be a whole dictionary, or a specific collection of named items. In many cases machine learning and/or text mining techniques are used for this goal. These approaches might be unfeasible in computationally challenging problems, such as processing massive data streams. A faster approach consists in collecting the cooccurrences of any two words (entities) in order to create a graph of relations - a cooccurrence graph. Indeed each cooccurrence highlights some grade of semantic correlation between the words because it is more common to have related words close each other than having them in the opposite sides of the text. Some authors have used sliding windows for such problem: they count all the occurrences within a sliding windows running over the whole text. In this paper we generalise such technique, coming up to a Weighted-Distance Sliding Window, where each occurrence of two named items within the window is accounted with a weight depending on the distance between items: a closer distance implies a stronger evidence of a relationship. We develop an experiment in order to support this intuition, by applying this technique to a data set consisting in the text of the Bible, split into verses.

Keywords: cooccurrence graph, entity relation graph, unstructured text, weighted distance

Procedia PDF Downloads 132