Commit e7f398e

MysjkintstreamDOTh
authored and committed
Implemented KMP in C++ (#332)
* Folder structure and Readme file. * Readme documentation. * Pseudocode for kmp in readme. * KMP algorithm implementation and readme update. * Readme pesudocode reference.
1 parent 7ffd9e3 commit e7f398e

2 files changed

+128
-0
lines changed

@@ -0,0 +1,62 @@
#include <iostream>
#include <string>
#include <vector>

using namespace std;

// Computes the prefix-suffix array: prefixArr[i] is the length of the longest
// proper prefix of pattern that is also a suffix of pattern[0 ... i].
vector<int> ComputePrefixFunction(const string& pattern){
    int m = pattern.length();
    vector<int> prefixArr(m, 0);
    int k = 0;

    for(int i=1; i<m; ++i){
        // On a mismatch, fall back to the next shorter prefix-suffix.
        while(k > 0 && pattern[k] != pattern[i]){
            k = prefixArr[k-1];
        }
        if(pattern[k] == pattern[i]){
            k++;
        }
        prefixArr[i] = k;
    }

    return prefixArr;
}

// Returns a vector of indices where there is a pattern match.
vector<int> KMP(const string& text, const string& pattern){
    int n = text.length();
    int m = pattern.length();
    vector<int> results;
    if(m == 0){
        return results; // An empty pattern is treated as having no matches.
    }
    vector<int> prefixArr = ComputePrefixFunction(pattern);
    int q = 0; // Number of pattern characters matched so far.

    for (int i = 0; i < n; ++i){
        while(q > 0 && pattern[q] != text[i]){
            q = prefixArr[q-1];
        }
        if(pattern[q] == text[i]){
            q++;
        }
        if(q == m){
            results.push_back(i-m+1); // Start index of the match ending at i.
            q = prefixArr[q-1];
        }
    }

    return results;
}

int main(){
    string example1 = "bacbababaabacabababaabaca";
    string pattern1 = "abaabaca";

    vector<int> result = KMP(example1, pattern1);
    vector<int>::iterator itt;
    cout << "Matches at the following indices..." << endl;
    for(itt = result.begin(); itt != result.end(); ++itt){
        cout << *itt << endl;
    }
    return 0;
}
@@ -0,0 +1,66 @@
# Knuth-Morris-Pratt (KMP) Algorithm
KMP is a linear-time string matching algorithm. The problem is to find
all occurrences of a pattern string as a substring of a text.
The naive approach to string matching
loops over every index of the text string and reports the indices
where the pattern p matches the substring starting at that index,

s.t. pattern[0 ... m - 1] = text[idx ... idx + m - 1], where idx is some offset.

The worst case of this approach is O(m*(n-m+1)), where m is |p| and n is |text|.
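
As a point of comparison, the naive approach described above could be sketched as follows. `NaiveSearch` is a hypothetical helper that is not part of this commit, and treating an empty pattern as "no matches" is an assumption made here for simplicity.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of the commit): tries every offset idx and
// compares pattern[0 ... m - 1] against text[idx ... idx + m - 1].
std::vector<int> NaiveSearch(const std::string& text, const std::string& pattern) {
    std::vector<int> results;
    const int n = static_cast<int>(text.length());
    const int m = static_cast<int>(pattern.length());
    if (m == 0 || m > n) {
        return results;  // Assumption: an empty pattern yields no matches.
    }
    for (int idx = 0; idx + m <= n; ++idx) {
        int j = 0;
        // Inner loop: up to m comparisons per offset.
        while (j < m && text[idx + j] == pattern[j]) {
            ++j;
        }
        if (j == m) {
            results.push_back(idx);
        }
    }
    return results;
}
```

The outer loop runs n - m + 1 times and the inner loop up to m times, which is where the O(m*(n-m+1)) bound comes from.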

The main drawback of the naive approach is that it handles overlaps
poorly. The inner loop may go deep into the pattern while checking
whether the substring and the pattern match, and when it hits a mismatch
it starts over from the next offset, thereby redoing some of its
comparisons.
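
To make the repeated work concrete, the sketch below counts character comparisons for the naive approach. `CountNaiveComparisons` is a hypothetical helper introduced only for illustration. For a text of ten `a` characters and the pattern `aaab`, every one of the 7 offsets matches three `a` characters before failing on `b`, giving 4 * 7 = 28 comparisons.

```cpp
#include <string>

// Hypothetical helper: counts the character comparisons performed by the
// naive approach (one count per text[idx + j] vs. pattern[j] test).
long long CountNaiveComparisons(const std::string& text, const std::string& pattern) {
    const int n = static_cast<int>(text.length());
    const int m = static_cast<int>(pattern.length());
    long long comparisons = 0;
    for (int idx = 0; idx + m <= n; ++idx) {
        for (int j = 0; j < m; ++j) {
            ++comparisons;
            if (text[idx + j] != pattern[j]) {
                break;  // Mismatch: restart from the next offset.
            }
        }
    }
    return comparisons;
}
```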

The KMP algorithm solves this problem by preprocessing the pattern,
thereby reaching linear-time performance. The preprocessing simply
creates an array of values calculated by a prefix function; this information describes how the pattern matches against shifts of itself.
We use this array to avoid the worst case of the naive approach by reusing the results of previously performed comparisons.

The prefix-function(i) is the length of the longest proper prefix of p that is also a suffix of p[0 ... i]. The idea behind finding these substrings
of the pattern that are both prefixes and suffixes is that they determine which index in the pattern we should continue from after a mismatch. The text index never moves backwards, so we
avoid restarting at the start index of the pattern and only one index further in the text each time we hit a
character mismatch.
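
A minimal sketch of the prefix function with 0-based indexing is shown below. `PrefixFunctionSketch` is a hypothetical name chosen for this example. Because arrays start at 0, the fallback after a mismatch reads `arr[k - 1]`, the value stored for the prefix of length `k`.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: arr[i] is the length of the longest proper prefix of
// pattern that is also a suffix of pattern[0 ... i].
std::vector<int> PrefixFunctionSketch(const std::string& pattern) {
    const int m = static_cast<int>(pattern.length());
    std::vector<int> arr(m, 0);
    int k = 0;  // Length of the currently matched prefix.
    for (int i = 1; i < m; ++i) {
        // Fall back to shorter prefix-suffixes until one can be extended.
        while (k > 0 && pattern[k] != pattern[i]) {
            k = arr[k - 1];
        }
        if (pattern[k] == pattern[i]) {
            ++k;
        }
        arr[i] = k;
    }
    return arr;
}
```

For the pattern `abaabaca` this yields `0 0 1 1 2 3 0 1`. For example, the value 3 at index 5 says that `aba` is both a prefix of the pattern and a suffix of `abaaba`.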

KMP runs in O(n + m). Note that KMP pays off most when the text contains many partial matches of the pattern, since it is only in such
situations that the prefix-suffix array saves comparisons. However, the worst-case linear running time is guaranteed, meaning
the KMP algorithm is useful in the general case as well.
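
The linear bound on the scanning phase can be checked empirically with a sketch like the one below. Both helper names are hypothetical, and only comparisons between text and pattern characters are counted (the preprocessing is excluded). Each text character causes one comparison plus one per fallback, and the number of fallbacks can never exceed the number of earlier advances, so the count stays at or below 2n.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: prefix array with 0-based indexing.
std::vector<int> BuildPrefixArray(const std::string& pattern) {
    const int m = static_cast<int>(pattern.length());
    std::vector<int> arr(m, 0);
    int k = 0;
    for (int i = 1; i < m; ++i) {
        while (k > 0 && pattern[k] != pattern[i]) k = arr[k - 1];
        if (pattern[k] == pattern[i]) ++k;
        arr[i] = k;
    }
    return arr;
}

// Hypothetical helper: counts text-vs-pattern comparisons made while
// scanning the text with KMP.
long long CountKmpComparisons(const std::string& text, const std::string& pattern) {
    const int n = static_cast<int>(text.length());
    const int m = static_cast<int>(pattern.length());
    if (m == 0) return 0;  // Assumption: nothing to search for.
    const std::vector<int> arr = BuildPrefixArray(pattern);
    long long comparisons = 0;
    int q = 0;  // Number of pattern characters matched so far.
    for (int i = 0; i < n; ++i) {
        while (true) {
            ++comparisons;
            if (pattern[q] == text[i]) { ++q; break; }
            if (q == 0) break;   // No shorter prefix left to try.
            q = arr[q - 1];      // Fall back without moving i.
        }
        if (q == m) q = arr[q - 1];  // Full match: keep scanning for overlaps.
    }
    return comparisons;
}
```

For a text of ten `a` characters and the pattern `aaab`, this counts 17 comparisons, within the 2n = 20 bound, whereas m*(n-m+1) for the naive approach gives 28.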

## Pseudocode
Where t is the text string and p is the pattern. Indices are 0-based.

    KMP-Matcher(t, p):
        n = len(t)
        m = len(p)
        prefix-arr = Compute-Prefix-Function(p)
        q = 0
        for i = 0 to n - 1:
            while q > 0 and p[q] != t[i]:
                q = prefix-arr[q - 1]
            if p[q] == t[i]:
                q++
            if q == m:
                pattern occurs at index i - m + 1
                q = prefix-arr[q - 1]

    Compute-Prefix-Function(p):
        m = len(p)
        new arr[0 ... m-1]
        arr[0] = 0
        k = 0
        for i = 1 to m - 1:
            while k > 0 and p[k] != p[i]:
                k = arr[k - 1]
            if p[k] == p[i]:
                k++
            arr[i] = k

        return arr

Adapted from CLRS [p. 1006]; note that CLRS uses 1-based indexing.
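
One way to gain confidence in a 0-based translation of the pseudocode is to cross-check it against `std::string::find`. The sketch below does that. `KmpSearch` and `FindAllWithStd` are hypothetical names used only here, and an empty pattern is assumed to produce no matches.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of KMP-Matcher with 0-based indexing.
std::vector<int> KmpSearch(const std::string& text, const std::string& pattern) {
    std::vector<int> results;
    const int n = static_cast<int>(text.length());
    const int m = static_cast<int>(pattern.length());
    if (m == 0) return results;  // Assumption: empty pattern, no matches.

    // Compute-Prefix-Function.
    std::vector<int> arr(m, 0);
    for (int i = 1, k = 0; i < m; ++i) {
        while (k > 0 && pattern[k] != pattern[i]) k = arr[k - 1];
        if (pattern[k] == pattern[i]) ++k;
        arr[i] = k;
    }

    // Scan the text; q is the number of pattern characters matched so far.
    for (int i = 0, q = 0; i < n; ++i) {
        while (q > 0 && pattern[q] != text[i]) q = arr[q - 1];
        if (pattern[q] == text[i]) ++q;
        if (q == m) {
            results.push_back(i - m + 1);  // Start index of the match.
            q = arr[q - 1];
        }
    }
    return results;
}

// Reference implementation: repeated std::string::find, advancing by one
// position so that overlapping matches are also reported.
std::vector<int> FindAllWithStd(const std::string& text, const std::string& pattern) {
    std::vector<int> results;
    if (pattern.empty()) return results;
    for (std::size_t pos = text.find(pattern); pos != std::string::npos;
         pos = text.find(pattern, pos + 1)) {
        results.push_back(static_cast<int>(pos));
    }
    return results;
}
```

On the example from the implementation file, text `bacbababaabacabababaabaca` and pattern `abaabaca`, both functions report matches at indices 6 and 17.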
