2 A Near-Optimal Domain Independent Approach for Detecting Approximate Duplicates

Authors: Abdelaziz Fellah, Allaoua Maamir


We propose a domain-independent merging-cluster filter approach, complemented with a set of algorithms, for identifying approximate duplicate entities efficiently and accurately within a single data source and across multiple data sources. The near-optimal merging-cluster filter (MCF) approach is based on the well-tuned Monge-Elkan algorithm and is extended with an affine variant of the Smith-Waterman similarity measure. We then present constant, variable, and function threshold algorithms that work conceptually in a divide-merge filtering fashion to detect near duplicates as hierarchical clusters along with their corresponding representatives. The algorithms take a recursive refinement approach, filtering, merging, and updating cluster representatives to detect approximate duplicates at each level of the cluster tree. Experiments show that the MCF approach detects approximate duplicates with high effectiveness and accuracy, outperforming the seminal Monge-Elkan algorithm on several real-world benchmarks and generated datasets.

Keywords: data mining, data cleaning, approximate duplicates, near-duplicates detection, data mining applications and discovery
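
As a rough illustration of the constant-threshold idea described in the abstract above, the following minimal sketch groups records around cluster representatives using a Monge-Elkan style token similarity. It is not the authors' MCF algorithm: the inner measure is `difflib.SequenceMatcher` (a stand-in assumed here in place of the affine Smith-Waterman variant), and the function names, the 0.8 threshold, the single-pass clustering, and the "longest member" representative rule are all hypothetical choices made for the example.

```python
from difflib import SequenceMatcher


def token_sim(a: str, b: str) -> float:
    # Stand-in inner measure (assumption): difflib ratio,
    # not the affine-gap Smith-Waterman variant named in the abstract.
    return SequenceMatcher(None, a, b).ratio()


def monge_elkan(x: str, y: str) -> float:
    # Average, over the tokens of x, of the best match among the tokens of y.
    xs, ys = x.lower().split(), y.lower().split()
    if not xs or not ys:
        return 0.0
    return sum(max(token_sim(a, b) for b in ys) for a in xs) / len(xs)


def symmetric_sim(x: str, y: str) -> float:
    # Monge-Elkan is asymmetric; take the stricter of the two directions.
    return min(monge_elkan(x, y), monge_elkan(y, x))


def cluster(records, threshold=0.8):
    # Single pass with a constant threshold (hypothetical simplification):
    # a record joins the first cluster whose representative is similar enough,
    # otherwise it starts a new cluster.
    clusters = []
    for rec in records:
        for c in clusters:
            if symmetric_sim(rec, c["rep"]) >= threshold:
                c["members"].append(rec)
                # Update the representative: here, simply the longest member.
                c["rep"] = max(c["members"], key=len)
                break
        else:
            clusters.append({"rep": rec, "members": [rec]})
    return clusters


records = ["John Smith", "Jon Smith", "Mary Jones", "john smith"]
clusters = cluster(records)
for c in clusters:
    print(c["rep"], "<-", c["members"])
```

Comparing each record only against cluster representatives, rather than against every other record, is what keeps the number of similarity computations low; a hierarchical version would repeat the same filter-merge-update step on the representatives themselves.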

1 Study of Hydrocarbons Metering Issues in Algerian Fields under the New Law Context

Authors: A. Hadjadj, S. Maamir


Since the advent of Law 86/14 concerning the exploitation of the national territory by foreign companies in partnership with the Algerian oil and gas company, the problem of hydrocarbon metering in production sharing has come to the fore. More generally, sound management of hydrocarbon metering can provide data on the production wells, the field, and the reservoir for medium- and long-term planning, particularly in the context of field management and development. In this work, we are interested in transactional metering, which is a very delicate and crucial issue in the current context of the new hydrocarbon law, characterized by an assets system between the various activities of Sonatrach and its foreign partners. After a state of the art on hydrocarbon metering devices in Algeria and elsewhere, we set out the advantages and disadvantages of each system, and then describe the problem in an attempt to reach an optimal solution.

Keywords: transactional metering, flowmeter orifice, heat flow, Sonatrach
