PEL: Position-Enhanced Length Filter for Set Similarity Joins

Mann, Willi; Augsten, Nikolaus

Title	PEL: Position-Enhanced Length Filter for Set Similarity Joins
Authors	Mann, Willi ; Augsten, Nikolaus
Contained in	Grundlagen von Datenbanken 2014 / Klan, Friederike; Specht, Günther; Gamper, Hans, Bozen, 2014, (2014), pp. 89-94
Published	2014
Material	Online resource
Language	English
Series	Proceedings of the 26th GI-Workshop Grundlagen von Datenbanken ; 1314
Document type	Article in a collected volume
URN	urn:nbn:at:at-ubs:3-14251

Access restriction
The document is freely available

Links

Record	Universitätsbibliothek Salzburg

Files
PEL: Position-Enhanced Length Filter for Set Similarity Joins [PDF, 0.32 MB]

Classification

Universität Salzburg → Departments until 2021 → Faculty of Natural Sciences → Department of Computer Sciences
Basic classification → Computer science → Databases

Abstract

Set similarity joins compute all pairs of similar sets from two collections of sets. Set similarity joins are typically implemented in a filter-verify framework: a filter generates candidate pairs, possibly including false positives, which must be verified to produce the final join result. Good filters produce few false positives and spend little time on hopeless candidates. The best-known algorithms generate candidates using the so-called prefix filter in conjunction with length- and position-based filters.

In this paper we show that the potential of length and position has only partially been leveraged. We propose a new filter, the position-enhanced length filter, which exploits the matching position to incrementally tighten the length filter; our filter identifies hopeless candidates and avoids processing them. The filter is very efficient, requires no change in the data structures of most prefix filter algorithms, and is particularly effective for foreign joins, i.e., joins between two different collections of sets.
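
The idea of tightening a length bound with the matching position can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes Jaccard similarity (the abstract does not name a similarity function), replaces the prefix-filter index with a plain nested loop, and derives the position-dependent bound directly from the Jaccard definition. All function names are hypothetical.

```python
from fractions import Fraction
from math import ceil, floor


def jaccard(r, s):
    """Exact Jaccard similarity of two token collections."""
    r, s = set(r), set(s)
    return Fraction(len(r & s), len(r | s))


def length_bounds(size_r, t):
    """Plain length filter: sizes of s that can still reach Jaccard >= t."""
    return ceil(t * size_r), floor(size_r / t)


def position_enhanced_upper(size_r, pos, t):
    """Upper size bound for s when the first shared token is at 0-based
    index pos of the sorted r. No token before pos is shared, so the
    overlap is at most size_r - pos; Jaccard >= t then requires
    |s| <= (size_r - (1 + t) * pos) / t."""
    return floor((size_r - (1 + t) * pos) / t)


def similarity_join(R, S, t):
    """Filter-verify join (assumed nested-loop sketch, no prefix index)."""
    result = []
    for i, r in enumerate(R):
        r_sorted = sorted(set(r))
        lo, hi = length_bounds(len(r_sorted), t)
        for j, s in enumerate(S):
            s_set = set(s)
            if not lo <= len(s_set) <= hi:
                continue  # pruned by the plain length filter
            pos = next((p for p, tok in enumerate(r_sorted) if tok in s_set), None)
            if pos is None:
                continue  # no shared token at all
            if len(s_set) > position_enhanced_upper(len(r_sorted), pos, t):
                continue  # pruned by the position-tightened bound
            if jaccard(r_sorted, s_set) >= t:  # verification step
                result.append((i, j))
    return result
```

Using `Fraction` for the threshold (for example `Fraction(4, 5)`) keeps the bounds exact. In this sketch the upper bound equals the plain length filter at position 0 and shrinks as the first match moves to later positions.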

Note
The original publication is available at: G. Specht, H. Gamper, F. Klan (eds.): Proceedings of the 26th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken), 21.10.2014 - 24.10.2014, Bozen, Italy, published at http://ceur-ws.org/Vol-1313/

Statistics
The PDF document has been downloaded 9 times.