Search results for: geological geophysical geochemical and minerogenic data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25716

25176 Shale Gas Accumulation of Over-Mature Cambrian Niutitang Formation Shale in Structure-Complicated Area, Southeastern Margin of Upper Yangtze, China

Authors: Chao Yang, Jinchuan Zhang, Yongqiang Xiong

Abstract:

The Lower Cambrian Niutitang Formation shale (NFS), deposited in a marine deep-shelf environment in the Southeast Upper Yangtze (SUY), possesses an excellent source-rock basis for shale gas generation. However, it is over-mature and has undergone strong tectonic deformation, which leaves its gas-bearing potential highly uncertain. With emphasis on shale gas enrichment in the NFS, analyses were made based on the regional gas-bearing differences obtained from field gas-desorption testing of 18 geological survey wells across the study area. Results show that the NFS has a low gas content of 0.2-2.5 m³/t and that gas content is higher in the eastern region of the SUY than in the western region. The methane fraction shows a similar regional differentiation: less than 10 vol.% in the western region and generally more than 70 vol.% in the eastern region. From the geological analysis, the following conclusions are drawn. Depositional environment determines the gas-enriching zones. In the western region, the Dengying Formation, which underlies the NFS in unconformable contact, is mainly plateau facies dolomite with caves and therefore has poor gas-sealing ability, whereas the Laobao Formation underlying the NFS in the eastern region is a set of siliceous rocks of shelf-slope facies, which can effectively prevent shale gas from escaping from the NFS. Tectonic conditions control the gas-enriching bands in the SUY, which is located in the fold zones formed by the thrusting of the Southern China plate toward the Sichuan Basin. Compared with the western region, located in the trough-like folds, the eastern region in the fold-thrust belts was uplifted early and deformed weakly, resulting in a relatively lower maturity and relatively slight tectonic deformation of the NFS. Faults determine whether shale gas can accumulate on a large scale. Four deep, large normal faults in the study area cut through the Niutitang Formation to the Sinian strata, directly causing a large loss of natural gas in the adjacent areas. Among the secondary faults developed within the shale formation, reverse faults generally have a positive influence on shale gas accumulation, while normal faults have the opposite influence. Overall, the shale gas enrichment targets of the NFS are areas with a certain thickness of siliceous rocks at the base of the Niutitang Formation, near the margin of the paleouplift, and with few developed faults. These findings provide direction for shale gas exploration in South China and a reference for areas with similar geological conditions elsewhere in the world.

Keywords: over-mature marine shale, shale gas accumulation, structure-complicated area, Southeast Upper Yangtze

Procedia PDF Downloads 153
25175 Geological and Geotechnical Approach for Stabilization of Cut-Slopes in Power House Area of Luhri HEP Stage-I (210 MW), India

Authors: S. P. Bansal, Mukesh Kumar Sharma, Ankit Prabhakar

Abstract:

Luhri Hydroelectric Project Stage-I (210 MW) is a run-of-the-river development with a dam-toe surface powerhouse (122 m long, 50.50 m wide, and 65.50 m high) on the right bank of the river Satluj in Himachal Pradesh, India. The project is located in the seismically active inner Lesser Himalaya, between the Dhauladhar Range in the south and the Higher Himalaya in the north. At the project location, the river is confined within narrow V-shaped valleys with little or no flat area close to the river bed. To accommodate the surface powerhouse, cut slopes nearly 120 m high are proposed behind it, from the powerhouse foundation level of 795 m up to ± 915 m. The stability of these 120 m high cut slopes is a prime concern because of the risk involved. The slopes behind the powerhouse will be excavated mainly in augen gneiss, which is fresh to weathered and biotite-rich in places. The foliation joints are favorable, dipping into the hill. Two steeper, valley-dipping joint sets will be encountered on the slopes and can cause instability during excavation. Geological exploration plays a vital role in the design and optimization of cut slopes. SWEDGE software has been used to analyze the geometry and stability of surface wedges in the cut slopes. The slopes behind the powerhouse have been analyzed in three zones, with breaks provided in the continuity of the cut slopes, which should give substantial relief for slope stabilization measures. Pseudo-static analysis has been carried out for the stabilization of wedges. The results indicate that many large wedges form with a factor of safety of less than 1. The stability measures (support system, bench width, slopes) have been planned so that no wedge failure occurs in the future.

Keywords: cut slopes, geotechnical investigations, Himalayan geology, surface powerhouse, wedge failure

Procedia PDF Downloads 119
25174 Identifying Karst Pattern to Prevent Bell Spring from Being Submerged in Daryan Dam Reservoir

Authors: H. Shafaattalab Dehghani, H. R. Zarei

Abstract:

The large karstic Bell Spring, with a discharge ranging between 250 and 5300 L/s, is one of the most important springs of Kermanshah Province. It supplies drinking water to Nodsheh City and its surrounding villages. The spring is located in the reservoir of Daryan Dam, and after impounding its mouth would be submerged under a water column about 110 m high. This paper gives an account of the karstification pattern around the spring, with the intention of preventing Bell Spring from being submerged in the Daryan Dam reservoir. The studies comprise engineering geology and hydrogeology investigations, including geophysical studies, drilling, excavation of an exploratory gallery and shaft, and diving. The results show that Bell is a single-conduit siphon spring, 4 m in diameter and 85 m in height, with 32 m of the conduit located below the spring outlet. To preserve the spring, it was decided to plug the outlet and convey the water to higher elevations under the natural pressure of the aquifer. After plugging, water was successfully conveyed to an elevation of 837 m above sea level (about 120 m from the outlet) under the natural pressure of the aquifer. This confirms the accuracy of the studies and the proper recognition of the karstification pattern of Bell Spring. This is a unique experience in karst problems in Iran.

Keywords: bell spring, Karst, Daryan Dam, submerged

Procedia PDF Downloads 277
25173 3D Modeling for Frequency and Time-Domain Airborne EM Systems with Topography

Authors: C. Yin, B. Zhang, Y. Liu, J. Cai

Abstract:

Airborne EM (AEM) is an effective geophysical exploration tool, especially suitable for rugged mountainous areas. In these areas, topography has serious effects on AEM system responses. However, few studies of the topographic effect on airborne EM systems have been reported so far. In this paper, an edge-based unstructured finite-element (FE) method is developed for 3D topographic modeling of both frequency- and time-domain airborne EM systems. Starting from the frequency-domain Maxwell equations, a vector Helmholtz equation is derived to obtain a stable and accurate solution. Because the AEM transmitter and receiver are both located in the air, the scattered-field method is used in our modeling. The Galerkin method is applied to discretize the Helmholtz equation into the final FE equations, which are solved to obtain the frequency-domain AEM responses. To accelerate the calculation, the response of the source in free space is used as the primary field, and the PARDISO direct solver is used to handle multiple transmitting sources. After the frequency-domain AEM responses are calculated, a Hankel transform is applied to obtain the time-domain AEM responses. To check the accuracy of the algorithm and to analyze the characteristics of the topographic effect on airborne EM systems, both the frequency- and time-domain AEM responses are simulated for three model groups: 1) a flat half-space model that has a semi-analytical solution for the EM response; 2) a valley or hill earth model; 3) a valley or hill earth model with an anomalous body embedded. Numerical experiments show that AEM responses change sharply close to the node points of the topography. Special attention needs to be paid to topographic effects when interpreting AEM survey data over rugged topographic areas. In addition, the profile of the AEM responses mirrors the topographic earth surface. Whereas the topographic effect mainly occurs at the high-frequency end and in early time channels, the EM responses of underground conductors mainly occur at low frequencies and in later time channels. For a signal in the same time channel, the dB/dt field reflects changes in conductivity better than the B-field. This research will help airborne EM in identifying and correcting topographic effects.

Keywords: 3D, Airborne EM, forward modeling, topographic effect

Procedia PDF Downloads 322
25172 Influence of Strike-Slip Faulting in the Tectonic Evolution of North-Eastern Tunisia

Authors: Aymen Arfaoui, Abdelkader Soumaya, Ali Kadri, Noureddine Ben Ayed

Abstract:

The major contractional events, characterized by strike-slip faulting, folding, and thrusting, occurred in the Eocene, Late Miocene, and Quaternary along the NE Tunisian domain between Bou Kornine-Ressas-Msella and the Cap Bon Peninsula. During the Plio-Quaternary, the Grombalia and Mornag grabens show maximum collapse parallel to the NNW-SSE SHmax direction and developed as third-order extensional regions within a regional compressional regime. Using available tectonic and geophysical data supplemented by new fault-kinematic observations, we show that Cenozoic deformation is dominated by the reactivation of first-order N-S faults; this sinistral wrench system is responsible for the formation of strike-slip duplexes, thrusts, folds, and grabens. Based on our new structural interpretation, the major faults of the N-S Axis, Bou Kornine-Ressas-Messella (MRB), and Hammamet-Korbous (HK) form an N-S first-order restraining stepover within a left-lateral strike-slip duplex. The N-S master MRB fault is dominated by contractional imbricate fans, while the parallel HK fault is characterized by trailing extensional imbricate fans. The Eocene and Miocene compression phases in the study area caused sinistral strike-slip reactivation of pre-existing N-S faults, reverse reactivation of NE-SW trending faults, and normal-oblique reactivation of NW-SE faults, creating a NE-SW to N-S trending system of east-verging folds and overlaps. Seismic tomography images reveal that the subvertical lithospheric tear, or STEP (Slab Transfer Edge Propagator) fault, evidenced below this region played a key role in the development of the MRB and HK relay zone. The presence of extensive syntectonic Pliocene sequences above this crustal-scale fault may be the result of recent vertical lithospheric motion of the STEP fault due to the rollback and eastward lateral migration of the Calabrian slab.

Keywords: Tunisia, strike-slip fault, contractional duplex, tectonic stress, restraining stepover, STEP fault

Procedia PDF Downloads 136
25171 Influence of Water Reservoir Parameters on the Climate and Coastal Areas

Authors: Lia Matchavariani

Abstract:

Water reservoir construction on rivers flowing into the sea complicates coastal protection: the seashore starts to degrade, causing coastal erosion and disasters against the backdrop of current climate change. A water reservoir affects the climate and coastal areas through its contact surface with the atmosphere and through the area irrigated with its water or humidified by infiltrated water. The Black Sea coastline is characterized by the highest ecological vulnerability. The type and intensity of a reservoir's impact are determined by its morphometry, type of regulation, level regime, and the geomorphological and geological characteristics of the adjoining area. Studies have shown that the impact of a water reservoir on the climate and its comfort parameters is positive if the reservoir is located in a zone of insufficient humidity and, conversely, negative if it is located in a zone of abundant humidity. Many natural and anthropogenic factors determine the peculiarities of a reservoir's impact on the climate. This impact can be assessed most accurately by the so-called “long series” method, which operates on meteorological elements (temperature, wind, precipitation, etc.) using long series built from stationary observation data. Such a time series consists of two periods of statistically sufficient duration: the first covers the observations made before the formation of the water reservoir, and the second covers the observations made during its operation. If no such data are available, or the series is statistically short, the “analog” method is used. The analog water reservoir is selected based on the similarity of environmental conditions: it must be located within the zone of the designed water reservoir, under similar environmental conditions, and a sufficient number of observations must have been made in its coastal zone.

Keywords: coast-constituent sediment, eustasy, meteorological parameters, seashore degradation, water reservoirs impact

Procedia PDF Downloads 49
25170 Data Mining in Medicine Domain Using Decision Trees and Vector Support Machine

Authors: Djamila Benhaddouche, Abdelkader Benyettou

Abstract:

In this paper, we use data mining to extract biomedical knowledge. In general, complex biomedical data collected in population studies are treated by statistical methods; although these methods are robust, they are not sufficient in themselves to harness the potential wealth of the data. For this reason, we apply two learning algorithms in a second step: decision trees and the support vector machine (SVM). These supervised classification methods are used to diagnose thyroid disease. In this context, we propose to promote the study and use of symbolic data mining techniques.

Keywords: biomedical data, learning, classifier, algorithms decision tree, knowledge extraction

Procedia PDF Downloads 564
25169 Analysis of Different Classification Techniques Using WEKA for Diabetic Disease

Authors: Usama Ahmed

Abstract:

Data mining is the process of analyzing data to extract useful, predictive information. It is a field of research that addresses many types of problems. In data mining, classification is an important technique for categorizing different kinds of data. Diabetes is one of the most common diseases. This paper applies different classification techniques to a diabetes dataset using the Waikato Environment for Knowledge Analysis (WEKA) and determines which algorithm is most suitable. On this diabetes data, the best classification algorithm is Naïve Bayes, with an accuracy of 76.31% and a model build time of 0.06 seconds.
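The Naïve Bayes classification step mentioned in this abstract can be illustrated with a minimal Gaussian Naïve Bayes sketch in plain Python. This is not WEKA's implementation and does not use the paper's dataset: the two features (glucose, BMI) and all values below are hypothetical, chosen only to show the mechanics.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a class prior and per-feature (mean, variance) for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # small constant avoids division by zero for constant features
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / len(X), stats)
    return model

def predict(model, row):
    """Return the class with the highest log-posterior for one sample."""
    def log_posterior(label):
        prior, stats = model[label]
        total = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)

# Hypothetical toy data: [glucose, BMI]; label 1 = diabetic, 0 = non-diabetic
X = [[85, 22], [90, 24], [95, 23], [150, 33], [160, 35], [170, 31]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
predictions = [predict(model, row) for row in X]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
```

Accuracy here is simply the fraction of correctly classified samples; a figure such as the reported 76.31% would normally come from a held-out or cross-validated evaluation rather than from the training data as in this toy case.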

Keywords: data mining, classification, diabetes, WEKA

Procedia PDF Downloads 152
25168 Comprehensive Study of Data Science

Authors: Asifa Amara, Prachi Singh, Kanishka, Debargho Pathak, Akshat Kumar, Jayakumar Eravelly

Abstract:

Today's generation depends heavily on technology that uses data as its fuel. The present study covers innovations and developments in data science and gives an idea of how to use available data efficiently. It will help readers understand the core concepts of data science. The concept of artificial intelligence was introduced by Alan Turing; its main principle was to create an artificial system that can run independently of human-given programs and function by analyzing data to understand the requirements of its users. Data science comprises business understanding, data analysis, ethical concerns, programming languages, various fields and sources of data, skills, etc. The usage of data science has evolved over the years. In this review article, we cover one part of data science, namely machine learning. Machine learning uses data science for its work. Machines learn through experience, which helps them perform tasks more efficiently. This article includes a comparative illustration of human and machine understanding, along with advantages, applications, and real-time examples of machine learning. Data science is an important game changer in human life. Since its advent, we have seen its benefits: it leads to a better understanding of people and caters to individual needs. It has improved business strategies, the services businesses provide, forecasting, the ability to attain sustainable development, etc. This study also aims at a better understanding of data science, which will help us create a better world.

Keywords: data science, machine learning, data analytics, artificial intelligence

Procedia PDF Downloads 88
25167 Physio-Thermal and Geochemical Behavior and Alteration of the Au Pathfinder Gangue Hydrothermal Quartz at the Kubi Gold Ore Deposits

Authors: Gabriel K. Nzulu, Lina Rostorm, Hans Högberg, Jun Liu, Per Eklund, Lars Hultman, Martin Magnuson

Abstract:

Altered and gangue quartz in hydrothermal veins from the Kubi Gold deposit in Dunkwa on Offin, in the central region of Ghana, is investigated for possible Au-associated pathfinder minerals and to improve understanding of mineral hosting and alteration processes in quartz. X-ray diffraction, air annealing in a furnace, differential scanning calorimetry, energy-dispersive X-ray spectroscopy, and transmission electron microscopy were applied to different quartz types outcropping from surface and bed rocks at the Kubi Gold Mining site to reveal the material properties at different temperatures. From the diffraction results of the fresh and annealed quartz samples, we find that the samples contain the pathfinder and impurity minerals FeS₂, biotite, TiO₂, and magnetite. Under oxidation at temperatures between 574 and 1400 °C, the samples showed hematite alteration and a transformation from α-quartz to β-quartz and further to cristobalite, as observed in the calorimetry scans of hydrothermally exposed materials. Energy-dispersive spectroscopy revealed the elements Fe, S, Mg, K, Al, Ti, Na, Si, O, and Ca in the samples, which are attributed to the impurity-phase minerals observed in the diffraction results. The findings also suggest that, during the hydrothermal flow regime, impurity minerals and metals can be trapped by voids and faults. Under favorable temperature conditions, the trapped minerals can be altered and change color at different depositional stages through oxidation and reduction processes, leading to hematite alteration, which is a useful pathfinder in mineral exploration.

Keywords: quartz, hydrothermal, minerals, hematite, x-ray diffraction, crystal-structure, defects

Procedia PDF Downloads 101
25166 Application of Artificial Neural Network Technique for Diagnosing Asthma

Authors: Azadeh Bashiri

Abstract:

Introduction: Lack of proper diagnosis and inadequate treatment of asthma lead to physical and financial complications. This study aimed to use data mining techniques to create an intelligent neural network system for the diagnosis of asthma. Methods: The study population comprised patients who had visited one of the lung clinics in Tehran. Data were analyzed using the SPSS statistical tool, and Pearson's chi-square coefficient was the basis for ranking the data. The neural network was trained using the back-propagation learning technique. Results: Based on the SPSS analysis, 13 effective factors were selected. The data were combined in various ways across different runs, so different models were built for training and testing the networks; in all modes, the network correctly predicted 100% of cases. Conclusion: Applying data mining methods before designing the system structure, with the aim of reducing the data dimension and selecting the optimal data, leads to a more accurate system. Given the nature of medical data, data mining approaches are therefore necessary.
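The chi-square-based ranking of candidate factors described in this abstract can be sketched as follows. The sketch computes the Pearson chi-square statistic for a contingency table of one categorical factor against the diagnosis and ranks the factors by that statistic. The factor names and counts are invented for illustration and are not data from the study, which performed this step in SPSS.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows = factor present / absent, columns = asthma / no asthma
tables = {
    "wheezing": [[40, 10], [10, 40]],
    "family_history": [[30, 20], [20, 30]],
    "eye_colour": [[25, 25], [25, 25]],
}

# Rank factors from most to least associated with the diagnosis
ranking = sorted(tables, key=lambda name: chi_square(tables[name]), reverse=True)
```

A larger statistic indicates a stronger association with the diagnosis; in practice a significance threshold or a fixed number of top-ranked factors would decide which inputs are passed to the neural network.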

Keywords: asthma, data mining, Artificial Neural Network, intelligent system

Procedia PDF Downloads 279
25165 Interpreting Privacy Harms from a Non-Economic Perspective

Authors: Christopher Muhawe, Masooda Bashir

Abstract:

With the growth of Internet Communication Technology (ICT), the virtual world has become the new normal. At the same time, both private and public entities are collecting unprecedented amounts of data. Unfortunately, this increase in data collection has gone hand in hand with an increase in data misuse and data breaches. Regrettably, the majority of data breach and data misuse claims have been unsuccessful in United States courts for failure to prove direct injury to physical or economic interests. The requirement to express data privacy harms from an economic or physical stance ignores the fact that not all data harms are physical or economic in nature. The challenge is compounded by the fact that data breach harms and risks do not attach immediately. This research will use a descriptive and normative approach to show that not all data harms can be expressed in economic or physical terms. Expressing privacy harms purely from an economic or physical perspective ignores the fact that data insecurity may result in harms that run counter to the functions of privacy in our lives: the promotion of liberty, selfhood, and autonomy, the promotion of human social relations, and the furtherance of a free society. No economic value can be placed on these functions of privacy. The proposed approach addresses data harms from a psychological and social perspective.

Keywords: data breach and misuse, economic harms, privacy harms, psychological harms

Procedia PDF Downloads 201
25164 Machine Learning Analysis of Student Success in Introductory Calculus Based Physics I Course

Authors: Chandra Prayaga, Aaron Wade, Lakshmi Prayaga, Gopi Shankar Mallu

Abstract:

This paper presents the use of machine learning algorithms to predict the success of students in an introductory physics course. Data having 140 rows pertaining to the performance of two batches of students was used. The lack of sufficient data to train robust machine learning models was compensated for by generating synthetic data similar to the real data. CTGAN and CTGAN with Gaussian Copula (Gaussian) were used to generate synthetic data, with the real data as input. To check the similarity between the real data and each synthetic dataset, pair plots were made. The synthetic data was used to train machine learning models using the PyCaret package. For the CTGAN data, the Ada Boost Classifier (ADA) was found to be the ML model with the best fit, whereas the CTGAN with Gaussian Copula yielded Logistic Regression (LR) as the best model. Both models were then tested for accuracy with the real data. ROC-AUC analysis was performed for all the ten classes of the target variable (Grades A, A-, B+, B, B-, C+, C, C-, D, F). The ADA model with CTGAN data showed a mean AUC score of 0.4377, but the LR model with the Gaussian data showed a mean AUC score of 0.6149. ROC-AUC plots were obtained for each Grade value separately. The LR model with Gaussian data showed consistently better AUC scores compared to the ADA model with CTGAN data, except in two cases of the Grade value, C- and A-.
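The per-grade ROC-AUC analysis described in this abstract can be sketched with a one-vs-rest computation in plain Python. Each class is scored against all others, and the per-class AUCs are then averaged. The abstract does not state how the mean was formed, so the unweighted (macro) average used here is an assumption, and the labels and probabilities are hypothetical.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        return None  # AUC is undefined when a class has no positives or no negatives
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def mean_ovr_auc(y_true, proba, classes):
    """One-vs-rest AUC per class plus their unweighted mean."""
    aucs = {}
    for k, cls in enumerate(classes):
        labels = [1 if y == cls else 0 for y in y_true]
        scores = [row[k] for row in proba]
        auc = binary_auc(labels, scores)
        if auc is not None:
            aucs[cls] = auc
    return aucs, sum(aucs.values()) / len(aucs)

# Hypothetical example with three grades instead of the paper's ten
classes = ["A", "B", "C"]
y_true = ["A", "B", "C", "A"]
proba = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4], [0.4, 0.5, 0.1]]
aucs, mean_auc = mean_ovr_auc(y_true, proba, classes)
```

An AUC of 0.5 corresponds to random ranking, which gives context for the reported means of 0.4377 and 0.6149.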

Keywords: machine learning, student success, physics course, grades, synthetic data, CTGAN, gaussian copula CTGAN

Procedia PDF Downloads 46
25163 Data Access, AI Intensity, and Scale Advantages

Authors: Chuping Lo

Abstract:

This paper presents a simple model demonstrating that, ceteris paribus, countries with lower barriers to accessing global data tend to earn higher incomes than other countries. Large countries, which inherently have greater data resources, therefore tend to have higher incomes than smaller countries, and to maintain this advantage they may be more hesitant than smaller countries to liberalize cross-border data flows. Furthermore, countries with higher artificial intelligence (AI) intensity in production technologies tend to benefit more from economies of scale in data aggregation, leading to higher income and more trade, as they are better able to utilize global data.

Keywords: digital intensity, digital divide, international trade, scale of economics

Procedia PDF Downloads 71
25162 Secured Transmission and Reserving Space in Images Before Encryption to Embed Data

Authors: G. R. Navaneesh, E. Nagarajan, C. H. Rajam Raju

Abstract:

Nowadays, multimedia data are used to store secure information. Previous methods allocate space in the image for data embedding after encryption. In this paper, we propose a novel method that reserves space in the image, surrounded by a boundary, before encryption using a traditional RDH (reversible data hiding) algorithm, which makes it easy for the data hider to reversibly embed data in the encrypted images. The proposed method can achieve real-time performance; that is, data extraction and image recovery are free of any error. A secure transmission process is also discussed, which improves efficiency tenfold compared to the other processes discussed.

Keywords: secure communication, reserving room before encryption, least significant bits, image encryption, reversible data hiding

Procedia PDF Downloads 415
25161 Identity Verification Using k-NN Classifiers and Autistic Genetic Data

Authors: Fuad M. Alkoot

Abstract:

DNA data have been used in forensics for decades. However, current research looks at using DNA as a biometric identity verification modality, with the goal of improving the speed of identification. We use gene data that was initially collected for autism detection to determine whether, and how accurately, this data can be used for identification applications. Our main goal is to find out whether our data preprocessing technique yields data useful as a biometric identification tool. We experiment with the nearest neighbor classifier to identify subjects. Results show that the optimal classification rate is achieved when the test set is corrupted by normally distributed noise with zero mean and a standard deviation of 1. The classification rate remains close to optimal at higher noise standard deviations, up to 3. This shows that the data can be used for identity verification with high accuracy using a simple classifier such as the k-nearest neighbor (k-NN).
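The noise experiment described in this abstract can be illustrated with a small k-NN sketch. It assumes, hypothetically, that each test sample is an enrolled subject's feature vector corrupted by zero-mean Gaussian noise; the abstract does not spell out how the test set was built, and the vectors below are invented rather than real genetic data.

```python
import math
import random

def knn_predict(train, query, k=1):
    """train: list of (vector, subject_id); return the majority id among the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

def noisy_accuracy(train, sigma, k=1, seed=0):
    """Identify noisy copies of each enrolled vector and return the fraction correct."""
    rng = random.Random(seed)
    correct = 0
    for vector, label in train:
        query = [x + rng.gauss(0.0, sigma) for x in vector]
        correct += knn_predict(train, query, k) == label
    return correct / len(train)

# Hypothetical enrolled subjects with well-separated feature vectors
train = [
    ([0.0, 0.0, 0.0], "subject_1"),
    ([100.0, 0.0, 0.0], "subject_2"),
    ([0.0, 100.0, 0.0], "subject_3"),
    ([0.0, 0.0, 100.0], "subject_4"),
]
rates = {sigma: noisy_accuracy(train, sigma) for sigma in (0.0, 1.0, 3.0)}
```

Sweeping `sigma` in this way gives the kind of classification-rate-versus-noise curve the abstract refers to; how quickly the rate degrades depends on how far apart the subjects' vectors are relative to the noise level.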

Keywords: biometrics, genetic data, identity verification, k nearest neighbor

Procedia PDF Downloads 260
25160 Different Data-Driven Bivariate Statistical Approaches to Landslide Susceptibility Mapping (Uzundere, Erzurum, Turkey)

Authors: Azimollah Aleshzadeh, Enver Vural Yavuz

Abstract:

The main goal of this study is to produce landslide susceptibility maps using different data-driven bivariate statistical approaches, namely the entropy weight method (EWM), evidence belief function (EBF), and information content model (ICM), in Uzundere county, Erzurum province, in north-eastern Turkey. Past landslide occurrences were identified and mapped from the interpretation of high-resolution satellite images and earlier reports, as well as from field surveys. In total, 42 landslide incidence polygons were mapped using ArcGIS 10.4.1 software and randomly split into a construction dataset (70%, 30 landslide incidences) for building the EWM, EBF, and ICM models and a verification dataset (the remaining 30%, 12 landslide incidences). Twelve layers of landslide-predisposing parameters were prepared: total surface radiation, maximum relief, soil groups, standard curvature, distance to stream/river sites, distance to the road network, surface roughness, land use pattern, engineering geological rock group, topographical elevation, slope orientation, and terrain slope gradient. The relationships between the landslide-predisposing parameters and the landslide inventory map were determined using the three statistical models (EWM, EBF, and ICM). The model results were validated with the landslide incidences that were not used during model construction. In addition, receiver operating characteristic curves were applied, and the area under the curve (AUC) was determined for the different susceptibility maps using the success (construction data) and prediction (verification data) rate curves. The results revealed that the AUC values for the success rates are 0.7055, 0.7221, and 0.7368, while those for the prediction rates are 0.6811, 0.6997, and 0.7105 for the EWM, EBF, and ICM models, respectively. The landslide susceptibility maps were then classified into five susceptibility classes: very low, low, moderate, high, and very high. Additionally, the proportion of construction and verification landslide incidences falling in the high and very high susceptibility classes of each map was determined. The results showed that the EWM, EBF, and ICM models produced satisfactory accuracy. The resulting landslide susceptibility maps may be useful for future natural hazard mitigation studies and for environmental protection planning.
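Two steps from this abstract lend themselves to a short sketch: the random 70/30 split of the 42 mapped landslides and a bivariate weight for each class of a predisposing parameter. The abstract gives no formulas, so the information value used here, ln((N_i/N)/(S_i/S)) with N_i landslides and S_i area in class i, is one common formulation assumed for illustration. The slope classes, areas, and counts are hypothetical.

```python
import math
import random

def split_inventory(ids, train_fraction=0.7, seed=42):
    """Randomly split landslide ids into construction and verification sets."""
    shuffled = ids[:]
    random.Random(seed).shuffle(shuffled)
    # rounding up reproduces the 30/12 split reported for 42 landslides
    cut = math.ceil(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def information_values(class_area, class_landslides):
    """Information value per class: ln of landslide share over area share."""
    total_area = sum(class_area.values())
    total_slides = sum(class_landslides.values())
    values = {}
    for cls, area in class_area.items():
        slides = class_landslides.get(cls, 0)
        if slides == 0:
            values[cls] = None  # undefined (log of zero); treat by convention
        else:
            values[cls] = math.log((slides / total_slides) / (area / total_area))
    return values

construction, verification = split_inventory(list(range(42)))

# Hypothetical slope-gradient classes: area in km^2 and construction landslides per class
area = {"0-15 deg": 500.0, "15-30 deg": 300.0, ">30 deg": 200.0}
slides = {"0-15 deg": 3, "15-30 deg": 9, ">30 deg": 18}
values = information_values(area, slides)
```

Positive values mark classes where landslides are over-represented relative to their area. Summing the values of all parameter layers cell by cell would give a susceptibility index that can then be binned into the five classes mentioned in the abstract.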

Keywords: entropy weight method, evidence belief function, information content model, landslide susceptibility mapping

Procedia PDF Downloads 135
25159 A Review on Intelligent Systems for Geoscience

Authors: R Palson Kennedy, P. Kiran Sai

Abstract:

This article introduces machine learning (ML) researchers to the hurdles that geoscience problems present, as well as to the opportunities for improvement in both ML and the geosciences. To meet that need, it presents a review from the data life cycle perspective. Numerous facets of the geosciences present unique difficulties for the study of intelligent systems. Geoscience data is notoriously difficult to analyze, since it is frequently unpredictable, intermittent, sparse, multi-resolution, and multi-scale. The first half addresses the essential concepts and theoretical underpinnings of data science, while the second half presents key themes and shared experiences from current publications focused on each stage of the data life cycle. Finally, themes such as open science, smart data, and team science are considered.

Keywords: Data science, intelligent system, machine learning, big data, data life cycle, recent development, geo science

Procedia PDF Downloads 140
25158 Porphyry Cu-Mo-(Au) Mineralization at Paraga Area, Nakhchivan District, Azerbaijan: Evidence from Mineral Paragenesis, Hydrothermal Alteration and Geochemical Studies

Authors: M. Kumral, A. Abdelnasser, M. Budakoglu, M. Karaman, D. K. Yildirim, Z. Doner, A. Bostanci

Abstract:

The Paraga area is located in the extreme eastern part of the Nakhchivan district, at the boundary with Armenia. The study area lies in the Ordubad region, 9 km from Paraga village, at 2300-2800 m above sea level, within a region of low-grade metamorphosed porphyritic volcanic and plutonic rocks. Detailed field studies revealed that the area is composed mainly of metagabbro-diorite intrusive rocks of porphyritic character, emplaced into meta-andesitic rocks. This complex was later intruded by unmapped olivine gabbroic rocks. The Cu-Mo-(Au) mineralization at the Paraga deposit is vein-type mineralization essentially related to a quartz-vein stockwork that cuts the dioritic rocks and is concentrated in the eastern and northeastern parts of the area, with vein directions of N80W, N25W, N70E, and N45E. The mineralization is also associated with two shear zones trending N75W and N15E. The host porphyritic rocks were affected by intense sulfidation, carbonatization, sericitization, and silicification, with pervasive hematitic alteration accompanying mineralized quartz veins and quartz-carbonate veins. Sulfide minerals (chalcopyrite, pyrite, arsenopyrite, and sphalerite, as well as molybdenite) occur either inside these mineralized quartz veins or disseminated in the highly altered rocks, and also at the contacts between the altered host rock and the veins. Gold is found as inclusions disseminated in arsenopyrite and pyrite, as well as in their cracks.

Keywords: porphyry Cu-Mo-(Au), Paraga area, Nakhchivan, Azerbaijan, paragenesis, hydrothermal alteration

Procedia PDF Downloads 411
25157 Data Quality as a Pillar of Data-Driven Organizations: Exploring the Benefits of Data Mesh

Authors: Marc Bachelet, Abhijit Kumar Chatterjee, José Manuel Avila

Abstract:

Data quality is a key component of any data-driven organization. Without data quality, organizations cannot effectively make data-driven decisions, which often leads to poor business performance. Therefore, it is important for an organization to ensure that the data it uses is of high quality. This is where the concept of data mesh comes in. Data mesh is a decentralized organizational and architectural approach to data management that can help organizations improve the quality of data. The concept of data mesh was first introduced in 2020. Its purpose is to decentralize data ownership, making it easier for domain experts to manage the data. This can help organizations improve data quality by reducing the reliance on centralized data teams and allowing domain experts to take charge of their data. This paper discusses how a set of elements, including data mesh, can serve as tools for increasing data quality. One of the key benefits of data mesh is improved metadata management. In a traditional data architecture, metadata management is typically centralized, which can lead to data silos and poor data quality. With data mesh, metadata is managed in a decentralized manner, ensuring accurate and up-to-date metadata, thereby improving data quality. Another benefit of data mesh is the clarification of roles and responsibilities. In a traditional data architecture, data teams are responsible for managing all aspects of data, which can lead to confusion and ambiguity in responsibilities. With data mesh, domain experts are responsible for managing their own data, which can help provide clarity in roles and responsibilities and improve data quality. Additionally, data mesh can also contribute to a new form of organization that is more agile and adaptable. 
By decentralizing data ownership, organizations can respond more quickly to changes in their business environment, which in turn can help improve overall performance by giving better insight into the business through better reports and visualization tools. Monitoring and analytics are also important aspects of data quality. With data mesh, monitoring and analytics are decentralized, allowing domain experts to monitor and analyze their own data. This helps in identifying and addressing data quality problems quickly, leading to improved data quality. Data culture is another major aspect of data quality. With data mesh, domain experts are encouraged to take ownership of their data, which can help create a data-driven culture within the organization. This can lead to improved data quality and better business outcomes. Finally, the paper explores the contribution of AI in the coming years. AI can help enhance data quality by automating many data-related tasks, such as data cleaning and data validation. By integrating AI into data mesh, organizations can further enhance the quality of their data. The concepts mentioned above are illustrated by experience feedback from AEKIDEN, an international data-driven consultancy that has successfully implemented a data mesh approach. By sharing its experience, AEKIDEN can help other organizations understand the benefits and challenges of implementing data mesh and improving data quality.

Keywords: data culture, data-driven organization, data mesh, data quality for business success

Procedia PDF Downloads 140
25156 Architectures and Implementations of Data Spaces: A Comparative Study of Gaia-X and Eclipse Data Space Components Frameworks

Authors: Ryan Kelvin Ford

Abstract:

Sharing data in a secure, trusted, and standardized environment offers significant potential benefits for individuals and organizations. Technical trust and standards help each participant use a data space securely to share and access data. Sharing data in a safe environment helps create new business opportunities. Data sovereignty, interoperability, and trust are considered key factors for evaluating data spaces. Businesses and policymakers can promote a fair data economy by integrating data spaces into organizations. A collaborative environment is needed to facilitate data sharing among organizations, and this need is met by implementing different data space architectures such as Eclipse Data Space Components (EDC), International Data Space Association (IDSA), Gaia-X, and Gaia-X Federation Services (GXFS). The last 15 years of application were reviewed and compared based on the architectures and implementations of different data spaces, covering topics such as IDSA, EDC, Gaia-X and GXFS, the EDC framework, data connectors, data space architecture, characteristics of data space connectors, federated data space initiatives, data space overviews, the Eclipse data space connector, designing data spaces, building data spaces based on a technical overview, and the European future digital ecosystem based on the Gaia-X vision and the strategy of the Gaia-X architecture. Empirical research based on an organized view was conducted. The discussion elaborates on a systematic review of the impact of data space technology from various perspectives. The systematic review uses multiple databases, namely IEEE Xplore, Taylor & Francis, Science Direct, and Google Scholar, to find publications on the impact of data spaces from January 2019 to December 2024. The search yielded 150 articles for comparative review, of which 20 were related to IDSA, Gaia-X, and EDC architecture and implementation.

Keywords: IDSA, Gaia-X, Gaia-X architecture, EDC, EDC architecture, GXFS architecture, IDSA, data space connector

Procedia PDF Downloads 9
25155 Big Data Analysis with RHadoop

Authors: Ji Eun Shin, Byung Ho Jung, Dong Hoon Lim

Abstract:

It is almost impossible to store or analyze exponentially growing big data with traditional technologies. Hadoop is a new technology that makes this possible. The R programming language is by far the most popular statistical tool for big data analysis based on distributed processing with Hadoop technology. With RHadoop, which integrates the R and Hadoop environments, we implemented parallel multiple regression analysis on actual data sets of different sizes. Experimental results showed that our RHadoop system became much faster as the number of data nodes increased. We also compared the performance of our RHadoop system with the lm function and the biglm package used with bigmemory. The results showed that RHadoop was faster than the other packages owing to parallel processing, with the number of map tasks increasing as the size of the data increases.
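The abstract does not describe how the regression is parallelized. One common way to express multiple regression in map and reduce steps is to let each chunk of rows contribute partial sums of X^T X and X^T y, add them up, and solve the normal equations once. The sketch below illustrates that idea in plain Python rather than R or Hadoop; the function names (`map_chunk`, `combine`, `solve`, `fit_chunks`) and the toy data are hypothetical and are not taken from the paper.

```python
from functools import reduce


def map_chunk(rows):
    """Map step: partial X^T X and X^T y for one chunk of (features, target) rows."""
    p = len(rows[0][0]) + 1  # one extra column for the intercept
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for features, target in rows:
        x = [1.0] + list(features)
        for i in range(p):
            xty[i] += x[i] * target
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def combine(a, b):
    """Reduce step: add two partial results element-wise."""
    xtx = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xty = [u + v for u, v in zip(a[1], b[1])]
    return xtx, xty


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    coef = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (aug[i][n] - s) / aug[i][i]
    return coef


def fit_chunks(chunks):
    """Run the map step on every chunk, reduce the partial sums, then solve once."""
    xtx, xty = reduce(combine, map(map_chunk, chunks))
    return solve(xtx, xty)


# Toy data generated from y = 1 + 2*x1 + 3*x2, split into three uneven chunks.
data = [((x1, x2), 1 + 2 * x1 + 3 * x2) for x1 in range(4) for x2 in range(4)]
chunks = [data[:5], data[5:11], data[11:]]
coefficients = fit_chunks(chunks)  # intercept first, then one slope per feature
```

Because the partial sums are small (p by p) regardless of chunk size, only the map step touches the raw rows, which is why adding workers can shorten the run time in this formulation.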

Keywords: big data, Hadoop, parallel regression analysis, R, RHadoop

Procedia PDF Downloads 439
25154 A Mutually Exclusive Task Generation Method Based on Data Augmentation

Authors: Haojie Wang, Xun Li, Rui Yin

Abstract:

To address memorization overfitting in the model-agnostic meta-learning (MAML) algorithm, a method of generating mutually exclusive tasks based on data augmentation is proposed. This method generates a mutex task by mapping one feature of the data to multiple labels, so that the generated mutex task is inconsistent with the data distribution in the initial dataset. Because generating mutex tasks for all data would produce a large amount of invalid data and, in the worst case, lead to exponential growth in computation, this paper also proposes a key data extraction method that extracts only part of the data to generate the mutex task. The experiments show that generating mutually exclusive tasks can effectively mitigate memorization overfitting in the MAML algorithm.
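The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of assigning several labels to the same input across tasks. It assumes integer class labels and uses a cyclic label shift. The random subsampling merely stands in for the paper's key data extraction step, which is not specified. The function name `make_mutex_tasks` and the example sentences are hypothetical.

```python
import random


def make_mutex_tasks(examples, num_classes, sample_size=None, seed=0):
    """Build tasks in which each input receives a different label than in the source task.

    examples:    list of (input, label) pairs with labels in range(num_classes)
    sample_size: if given, only this many examples are used (a placeholder for
                 key data extraction; the paper's actual criterion is not known)
    """
    rng = random.Random(seed)
    if sample_size is not None and sample_size < len(examples):
        examples = rng.sample(examples, sample_size)
    tasks = []
    # Each shift yields one task; no shift maps a label back to itself.
    for shift in range(1, num_classes):
        tasks.append([(x, (y + shift) % num_classes) for x, y in examples])
    return tasks


base = [("good film", 1), ("dull plot", 0), ("fine cast", 2)]
tasks = make_mutex_tasks(base, num_classes=3)
small_tasks = make_mutex_tasks(base, num_classes=3, sample_size=2)
```

With the shift scheme, a model cannot fit all tasks by memorizing a single input-to-label mapping, and limiting `sample_size` bounds how many augmented examples are produced.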

Keywords: data augmentation, mutex task generation, meta-learning, text classification.

Procedia PDF Downloads 99
25153 Efficient Positioning of Data Aggregation Point for Wireless Sensor Network

Authors: Sifat Rahman Ahona, Rifat Tasnim, Naima Hassan

Abstract:

Data aggregation is a helpful technique for reducing the data communication overhead in wireless sensor networks. One of the important tasks in data aggregation is positioning the aggregator points. Much work has been done on data aggregation, but efficient positioning of the aggregator points has received little attention. In this paper, the authors focus on the positioning, or placement, of the aggregation points in a wireless sensor network. The authors propose an algorithm to select the aggregator positions for a scenario where aggregator nodes are more powerful than sensor nodes.

Keywords: aggregation point, data communication, data aggregation, wireless sensor network

Procedia PDF Downloads 165
25152 Spatial Econometric Approaches for Count Data: An Overview and New Directions

Authors: Paula Simões, Isabel Natário

Abstract:

This paper reviews a number of theoretical aspects of implementing an explicit spatial perspective in econometrics for modelling non-continuous data in general, and count data in particular. It provides an overview of the spatial econometric approaches available for modelling data collected with reference to location in space, from the classical spatial econometric approaches to recent developments in spatial econometrics for modelling count data in a Bayesian hierarchical setting. Considerable attention is paid to the inferential framework necessary for structurally consistent spatial econometric count models incorporating spatial lag autocorrelation, to the corresponding estimation and testing procedures under different assumptions, and to the constraints and implications embedded in the various specifications in the literature. This review combines insights from the classical spatial econometrics literature with those from hierarchical modeling and analysis of spatial data, in order to look for possible new directions in the processing of count data in a spatial hierarchical Bayesian econometric context.

Keywords: spatial data analysis, spatial econometrics, Bayesian hierarchical models, count data

Procedia PDF Downloads 598
25151 Increasing the Speed of the Apriori Algorithm by Dimension Reduction

Authors: A. Abyar, R. Khavarzadeh

Abstract:

The most basic and important decision-making tool for industrial and service managers is understanding the market and customer behavior. In this regard, the Apriori algorithm, one of the well-known machine learning methods, is used to identify customer preferences. On the other hand, with the increasing diversity of goods and services and the speed at which customer behavior changes, we are faced with big data. Also, due to the large number of competitors and changing customer behavior, there is an urgent need for continuous analysis of this big data. However, the speed of the Apriori algorithm decreases with increasing data volume. In this paper, the big data PCA method is used to reduce the dimension of the data in order to increase the speed of the Apriori algorithm. Then, in the simulation section, the results are examined by generating data with different volumes and different diversity. The results show that when using this method, the speed of the Apriori algorithm increases significantly.

Keywords: association rules, Apriori algorithm, big data, big data PCA, market basket analysis

Procedia PDF Downloads 13
25150 A NoSQL Based Approach for Real-Time Managing of Robotics's Data

Authors: Gueidi Afef, Gharsellaoui Hamza, Ben Ahmed Samir

Abstract:

This paper deals with the continual growth of data, in response to which new data management solutions have emerged: NoSQL databases. They have spread across several areas, such as personalization, profile management, real-time big data, content management, catalogs, customer views, mobile applications, the internet of things, digital communication and fraud detection. Nowadays, the use of these database management systems is increasing. These systems store data very well, and with the trend toward big data, storage poses a new challenge that demands new structures and methods for managing enterprise data. New intelligent machines in the e-learning sector thrive on more data, so smart machines can learn more and faster. Robotics is the use case on which we focus our tests. Implementing NoSQL for robotics wrangles all the data that robots acquire into a usable form, because conventional robotics data management faces severe limits in managing and finding the exact information in real time. Our proposed approach is demonstrated by experimental studies and a running example used as a use case.

Keywords: NoSQL databases, database management systems, robotics, big data

Procedia PDF Downloads 359
25149 Fuzzy Optimization Multi-Objective Clustering Ensemble Model for Multi-Source Data Analysis

Authors: C. B. Le, V. N. Pham

Abstract:

In modern data analysis, multi-source data appears more and more often in real applications. Multi-source data clustering has emerged as an important issue in the data mining and machine learning community. Different data sources provide information about different data. Therefore, multi-source data linking is essential to improve clustering performance. In practice, however, multi-source data is often heterogeneous, uncertain, and large, which is considered a major challenge in multi-source data analysis. An ensemble is a versatile machine learning model in which learning techniques can work in parallel on big data. Clustering ensembles have been shown to outperform any standard clustering algorithm in terms of accuracy and robustness. However, most traditional clustering ensemble approaches are based on a single-objective function and single-source data. This paper proposes a new clustering ensemble method for multi-source data analysis, the fuzzy optimized multi-objective clustering ensemble method (FOMOCE). Firstly, a clustering ensemble mathematical model based on the structure of the multi-objective clustering function, multi-source data, and dark knowledge is introduced. Then, rules for extracting dark knowledge from the input data, clustering algorithms, and base clusterings are designed and applied. Finally, a clustering ensemble algorithm is proposed for multi-source data analysis. The experiments were performed on the standard sample data set. The experimental results demonstrate the superior performance of the FOMOCE method compared to existing clustering ensemble methods and multi-source clustering methods.

Keywords: clustering ensemble, multi-source, multi-objective, fuzzy clustering

Procedia PDF Downloads 196
25148 Modeling Activity Pattern Using XGBoost for Mining Smart Card Data

Authors: Eui-Jin Kim, Hasik Lee, Su-Jin Park, Dong-Kyu Kim

Abstract:

Smart-card data are expected to provide information on activity patterns as an alternative to conventional person trip surveys. The focus of this study is to propose a method that trains on person trip survey data to supplement smart-card data, which do not contain the purpose of each trip. To train on the survey data, we selected only features that are available from smart-card data, such as spatiotemporal information on the trip, together with geographic information system (GIS) data near the stations. XGBoost, a state-of-the-art tree-based ensemble classifier, was used to train on data from multiple sources. This classifier uses a more regularized model formalization to control over-fitting and shows very fast execution with good performance. The validation results showed that the proposed method efficiently estimated the trip purpose. GIS data of the station and duration of stay at the destination were significant features in modeling trip purpose.

Keywords: activity pattern, data fusion, smart-card, XGBoost

Procedia PDF Downloads 253
25147 A Mutually Exclusive Task Generation Method Based on Data Augmentation

Authors: Haojie Wang, Xun Li, Rui Yin

Abstract:

To address memorization overfitting in the model-agnostic meta-learning (MAML) algorithm, a method of generating mutually exclusive tasks based on data augmentation is proposed. This method generates a mutex task by mapping one feature of the data to multiple labels, so that the generated mutex task is inconsistent with the data distribution in the initial dataset. Because generating mutex tasks for all data would produce a large amount of invalid data and, in the worst case, lead to exponential growth in computation, this paper also proposes a key data extraction method that extracts only part of the data to generate the mutex task. The experiments show that generating mutually exclusive tasks can effectively mitigate memorization overfitting in the MAML algorithm.

Keywords: mutex task generation, data augmentation, meta-learning, text classification.

Procedia PDF Downloads 149