# DSA: Strings Pattern Matching

##### What is Pattern Matching?

Pattern matching is a fundamental problem in computer science that is employed in a wide range of applications. Pattern matching in the context of strings refers to detecting instances of a given pattern (substring) inside a bigger string.

##### Algorithm for Pattern Matching:

There are several popular algorithms and techniques for string pattern matching. Here are some commonly used ones:

##### Naive Pattern Matching:

The Naive Pattern Matching method is the most basic and easiest technique. It compares all characters in the main string to the pattern. It has a time complexity of O(N*M), where N is the text length and M is the pattern length.

``````class NaivePatternMatching {
static void search(char[] pattern, char[] text) {
int m = pattern.length;
int n = text.length;

for (int i = 0; i <= n - m; i++) {
int j;

for (j = 0; j < m; j++) {
if (text[i + j] != pattern[j])
break;
}

if (j == m)
System.out.println("Pattern found at index " + i);
}
}

public static void main(String[] args) {
char[] text = "AABAACAADAABAAABAA".toCharArray();
char[] pattern = "AABA".toCharArray();

search(pattern, text);
}
}
``````

Explanation:

1. The `search` method takes two parameters: the `pattern` (the pattern to search for) and the `text` (the text in which to search for the pattern).
2. It initializes variables `m` and `n` with the lengths of the `pattern` and `text`, respectively.
3. The outer `for` loop iterates from 0 to `n - m` (inclusive). This loop controls the starting position of the pattern in the text.
4. Inside the outer loop, there is an inner `for` loop that iterates over the characters of the pattern. It compares each character of the pattern with the corresponding character in the text starting from the current position.
5. If a mismatch is found (i.e., the characters at the current positions don’t match), the inner loop breaks, and the search continues with the next starting position.
6. If the inner loop completes without a break, it means that the entire pattern has been matched at the current starting position. In this case, it prints the message “Pattern found at index ” followed by the current starting index.
7. The `main` method is where the pattern search is invoked. It initializes the `text` and `pattern` arrays with the provided strings, and then calls the `search` method, passing the `pattern` and `text` arrays as arguments.
8. When you run the program, it searches for the pattern “AABA” within the text “AABAACAADAABAAABAA”. The output shows the indices where the pattern is found: 0, 9, and 13.
##### Knuth-Morris-Pratt (KMP) Algorithm:

The KMP method enhances pattern-matching performance by utilizing a preprocessed table known as the Longest Proper Prefix, which is also a Suffix (LPS) table. It prevents redundant character comparisons based on the LPS table information. The KMP algorithm has a time complexity of O(N+M).

``````class KMPAlgorithm {
static void KMPAlgorithm(String pattern, String text) {
int m = pattern.length();
int n = text.length();
int[] lps = computeLPSArray(pattern, m);
int i = 0; // index for text[]
int j = 0; // index for pattern[]

while (i < n) {
if (pattern.charAt(j) == text.charAt(i)) {
i++;
j++;
}

if (j == m) {
System.out.println("Pattern found at index " + (i - j));
j = lps[j - 1];
} else if (i < n && pattern.charAt(j) != text.charAt(i)) {
if (j != 0) {
j = lps[j - 1];
} else {
i++;
}
}
}
}

static int[] computeLPSArray(String pattern, int m) {
int[] lps = new int[m];
int len = 0;
lps[0] = 0;
int i = 1;

while (i < m) {
if (pattern.charAt(i) == pattern.charAt(len)) {
len++;
lps[i] = len;
i++;
} else {
if (len != 0) {
len = lps[len - 1];
} else {
lps[i] = 0;
i++;
}
}
}
return lps;
}

public static void main(String[] args) {
String text = "AABAACAADAABAAABAA";
String pattern = "AABA";

KMPAlgorithm(pattern, text);
}
}
``````

Explanation:

1. The `KMPAlgorithm` function takes the pattern and text strings as input.
2. It initializes variables, including the length of the pattern (`m`), length of the text (`n`), and the Longest Proper Prefix which is also a Suffix array (`lps`), which is computed using the `computeLPSArray` function.
3. The algorithm iterates through the text while comparing characters of the pattern with the corresponding characters in the text.
4. If a match is found, it increments both the pattern index (`j`) and text index (`i`).
5. If the entire pattern is matched (`j == m`), it prints the index at which the pattern is found and updates the pattern index using the `lps` array.
6. If there is a mismatch and `j` is not at the beginning of the pattern, it updates the pattern index using the `lps` array.
7. If there is a mismatch and `j` is at the beginning of the pattern, it increments the text index.
8. The `computeLPSArray` function computes the Longest Proper Prefix which is also a Suffix array for the given pattern using the two-pointer technique.
9. The `main` function demonstrates an example usage of the algorithm by calling `KMPAlgorithm` with a sample text and pattern.
##### Rabin-Karp Algorithm:

The Rabin-Karp algorithm searches for patterns in text using hashing. It compares the hash value of the pattern to the hash values of the text’s substrings. It has an average time complexity of O(N+M), where N is the text length and M is the pattern length.

``````class RabinKarpAlgorithm {
static final int d = 256; // Number of characters in the input alphabet

static void search(String pattern, String text, int q) {
int m = pattern.length();
int n = text.length();
int p = 0; // Hash value for pattern
int t = 0; // Hash value for text
int h = 1;

// Calculate the hash value of the pattern and the first window of text
for (int i = 0; i < m - 1; i++)
h = (h * d) % q;

for (int i = 0; i < m; i++) {
p = (d * p + pattern.charAt(i)) % q;
t = (d * t + text.charAt(i)) % q;
}

// Slide the pattern over the text one by one
for (int i = 0; i <= n - m; i++) {
// If the hash values of the current window in text and pattern match, check for exact match
if (p == t) {
boolean found = true;
for (int j = 0; j < m; j++) {
if (text.charAt(i + j) != pattern.charAt(j)) {
found = false;
break;
}
}
if (found)
System.out.println("Pattern found at index " + i);
}

// Calculate the hash value for the next window of text
if (i < n - m) {
t = (d * (t - text.charAt(i) * h) + text.charAt(i + m)) %q;
if (t < 0)
t = t + q;
}
}
}

public static void main(String[] args) {
String text = "AABAACAADAABAAABAA";
String pattern = "AABA";
int q = 101; // A prime number

search(pattern, text, q);
}
}
``````

Explanation:

1. The `search` function takes the pattern, text, and a prime number `q` as input.
2. It initializes variables, including the lengths of the pattern (`m`) and text (`n`), hash values for pattern (`p`) and text (`t`), and a value `h` for the highest significant digit.
3. It calculates the hash value of the pattern and the first window of the text.
4. It iterates through the text, comparing the hash values of the current window in the text and pattern.
5. If the hash values match, it performs an additional check for an exact match by comparing characters.
6. If a match is found, it prints the index at which the pattern is found.
7. It calculates the hash value for the next window of the text using a rolling hash technique.
8. The `main` function demonstrates an example usage of the algorithm by calling `search` with a sample text, pattern, and a prime number.
##### Finite Automaton Algorithm:

The Finite Automaton Algorithm generates a finite automaton based on the pattern and utilizes it to do pattern matching. It prepares the pattern by generating a state transition table. The time complexity of the algorithm is O(N+M), where N is the length of the text and M is the length of the pattern.

``````import java.util.Arrays;

class FiniteAutomatonAlgorithm {
static final int NO_OF_CHARS = 256;

static int[][] computeTransitionFunction(String pattern, int m) {
int[][] TF = new int[m + 1][NO_OF_CHARS];
int lps = 0;

// Fill the entries in TF[][] based on pattern[]
for (int state = 0; state <= m; state++) {
for (int x = 0; x < NO_OF_CHARS; x++)
TF[state][x] = getNextState(pattern, m, state, x);
}

return TF;
}

static int getNextState(String pattern, int m, int state, int x) {
// If the character at the current state matches the input character, move to the next state
if (state < m && x == pattern.charAt(state))
return state + 1;

int nextState, i;

// Find the longest prefix suffix
for (nextState = state; nextState > 0; nextState--) {
if (pattern.charAt(nextState - 1) == x) {
for (i = 0; i < nextState - 1; i++) {
if (pattern.charAt(i) != pattern.charAt(state - nextState + 1 + i))
break;
}
if (i == nextState - 1)
return nextState;
}
}

return 0;
}

static void search(String pattern, String text) {
int m = pattern.length();
int n = text.length();

int[][] TF = computeTransitionFunction(pattern, m);

// Process text over the finite automaton to find occurrences of the pattern
int state = 0;
for (int i = 0; i < n; i++) {
state = TF[state][text.charAt(i)];
if (state == m)
System.out.println("Pattern found at index " + (i - m + 1));
}
}

public static void main(String[] args) {
String text = "AABAACAADAABAAABAA";
String pattern = "AABA";

search(pattern, text);
}
}
``````

Explanation:

1. The `computeTransitionFunction` function calculates the transition function (TF) for the given pattern.
2. It initializes a 2D array `TF` with dimensions `(m + 1) x NO_OF_CHARS`, where `m` is the length of the pattern and `NO_OF_CHARS` is the number of characters in the input alphabet (e.g., ASCII).
3. It fills the entries in `TF` by iterating through each state and character, using the `getNextState` function to determine the next state based on the current state and input character.
4. The `getNextState` function calculates the next state based on the current state and the input character.
5. It checks if the character at the current state matches the input character. If so, it moves to the next state.
6. If there is no match, it finds the longest prefix suffix and updates the next state accordingly.
7. The `search` function performs the string search using the finite automaton.
8. It initializes the state as 0 and processes the text character by character.
9. It updates the state using the transition function `TF[state][text.charAt(i)]`.
10. If the state becomes equal to the length of the pattern, a match is found, and it prints the index where the pattern occurs.
11. The `main` function demonstrates an example usage of the algorithm by calling `search` with a sample text and pattern.

The time and space complexity of the pattern-matching algorithms discussed:

where,

• `n` refers to the length of the text.
• `m` refers to the length of the pattern.
• `NO_OF_CHARS` refers to the number of characters in the input alphabet.

Note: also read about DSA: Binary Search

If you like my post please follow me to read my latest post on programming and technology.

https://www.instagram.com/coderz.py/