-
-
Notifications
You must be signed in to change notification settings - Fork 19
Strings
It is a very common problem to search for the presence of a substring inside of a string. Let's say that:
- the string is called
str
(with lengthn
) - the substring is
substr
(with lengthm
).
The brute force approach to string searches is done by simply sliding the pattern along the
string until you find a match. Every time you slide down to the next character in str
, this
algorithm doesn't really remember what it already knows about the string (since it iterates thru
the entire length of the string at every iteration). It essentially forgets about what it knows
about the string (already) for every n-1
attempts that it makes to match the substring.
/**
* O(m * n), where m = str.size, and n = substr.size
*
* This is an inefficient brute force algorithm which has quadratic complexity.
*/
fun substring(str: CharArray, substr: CharArray): Any {
// substr can't be longer than str
if (substr.size > str.size) return false
// Iterate str using cursor1 and for each index look ahead
// to see if matches exist for substr
var occurrences = 0
for (cursor1 in 0 until str.size) {
var matchCount = 0
for (cursor2 in 0 until substr.size) {
val index = cursor1 + cursor2
// If index exceeds the size of str that means substr wasn't found
if (index > str.size - 1) break
// If there's a match at index between the str and substr
// then remember it
if (str[index] == substr[cursor2]) matchCount++
}
// Found a match
if (matchCount == substr.size) occurrences++
}
return object {
val numberOfMatches = occurrences
val matchFound = occurrences > 0
override fun toString(): String = StringBuilder().apply {
append("{match found = $matchFound")
append(", # matches = $numberOfMatches}")
}.toString().brightBlue()
}
}
Please note that if you want to turn a Kotlin String
into a CharArray
you can use
something like "Hello world".toCharArray()
. Here's more info about this on
stackoverflow.
By using a state machine that is built from the substring pattern we can come up with a much better algorithm to match these patterns inside our string. The idea here is not to forget what we have already seen about the string as we iterate over each character in it.
This is a streaming algorithm where we pass one character at a time (as we iterate thru the entire string) to a state maachine which matches the pattern in the substring. For every iteration:
- Each character is compared with the character at a cursor (which represents the state) in the substring.
- If there's a match, this cursor is incremented, and at the next iteration, the next character in the pattern will be matched, and so on.
- When there's a mismatch the cursor is reset to 0.
- And we know a match has been found when the cursor equals the length of the substring.
This approach is based on the idea of Deterministic Finite Automaton.
/**
* O(m + n), where m = str.size, and n = substr.size
*
* This function uses a deterministic finite automation (DFA) method
* which entails the use of a state machine to keep track of progress
* in a game.
*/
fun substring_optimized(str: CharArray, substr: CharArray): Any {
class StateMachine(val pattern: CharArray) {
var cursor = 0
fun add(character: Char) {
if (pattern[cursor] == character) cursor++
else cursor = 0
}
fun isMatch() = cursor == pattern.size
fun reset() {
cursor = 0
}
}
val stateMachine = StateMachine(substr)
var numberOfOccurrences = 0
for (cursor in 0 until str.size) {
stateMachine.add(str[cursor])
if (stateMachine.isMatch()) {
stateMachine.reset()
numberOfOccurrences++
}
}
return object {
val occurrences = numberOfOccurrences
val matchFound = numberOfOccurrences > 0
override fun toString(): String = StringBuilder().apply {
append("{occurrences = $occurrences")
append(", matchFound = $matchFound}")
}.toString().brightBlue()
}
}