Source:
SPIRE: String Processing and Information Retrieval, Springer, Volume 4209, Glasgow, Scotland, p.329--336 (2006)
URL:
http://luispedro.org/files/dot-link.pdf
Keywords:
String indexing;
approximate matching
Abstract:
The problem we address is text indexing for approximate matching. Given a text $T$ which undergoes some preprocessing to generate an index, we can later query this index to identify the places where a string occurs up to a certain number of errors $k$ (edition distance). The indexing structure occupies space $O(n \log^k n)$ in the average case, independent of alphabet size. This structure can be used to report the existence of a match with k errors in $O(3^k m^{k+1})$ and to report the occurrences in $O(3^k m^{k+1} + ed)$ time, where $m$ is the length of the pattern and ed and the number of matching edit scripts. The construction of the structure has time bound by $O(kNS)$, where $N$ is the number of nodes in the index and $S$ the alphabet size.