Substring index

Data structure

In computer science, a substring index is a data structure which gives substring search in a text or text collection in sublinear time. If you have a document $S$ of length $n$ , or a set of documents $D=\{S^{1},S^{2},\dots ,S^{d}\}$ of total length $n$ , you can locate all occurrences of a pattern $P$ in $o(n)$ time. (See Big O notation.)

The phrase full-text index is also often used for an index of all substrings of a text. But this is ambiguous, as it is also used for regular word indexes such as inverted files and document retrieval. See full text search.

Substring indexes include:

Suffix tree
Suffix array
N-gram index, an inverted file for all N-grams of the text
Compressed suffix array^[1]
FM-index
LZ-index

References

^ R. Grossi and J. S. Vitter, Compressed Suffix Arrays and Suffix Trees, with Applications to Text Indexing and String Matching, SIAM Journal on Computing, 35(2), 2005, 378–407.