What is Pattern Matching?
Pattern matching is a fundamental problem in computer science that is employed in a wide range of applications. Pattern matching in the context of strings refers to detecting instances of a given pattern (substring) inside a bigger string.
Algorithm for Pattern Matching:
There are several popular algorithms and techniques for string pattern matching. Here are some commonly used ones:
Naive Pattern Matching:
The Naive Pattern Matching method is the most basic and easiest technique. It compares all characters in the main string to the pattern. It has a time complexity of O(N*M), where N is the text length and M is the pattern length.
class NaivePatternMatching {
static void search(char[] pattern, char[] text) {
int m = pattern.length;
int n = text.length;
for (int i = 0; i <= n - m; i++) {
int j;
for (j = 0; j < m; j++) {
if (text[i + j] != pattern[j])
break;
}
if (j == m)
System.out.println("Pattern found at index " + i);
}
}
public static void main(String[] args) {
char[] text = "AABAACAADAABAAABAA".toCharArray();
char[] pattern = "AABA".toCharArray();
search(pattern, text);
}
}
Explanation:
- The
search
method takes two parameters: thepattern
(the pattern to search for) and thetext
(the text in which to search for the pattern). - It initializes variables
m
andn
with the lengths of thepattern
andtext
, respectively. - The outer
for
loop iterates from 0 ton - m
(inclusive). This loop controls the starting position of the pattern in the text. - Inside the outer loop, there is an inner
for
loop that iterates over the characters of the pattern. It compares each character of the pattern with the corresponding character in the text starting from the current position. - If a mismatch is found (i.e., the characters at the current positions don’t match), the inner loop breaks, and the search continues with the next starting position.
- If the inner loop completes without a break, it means that the entire pattern has been matched at the current starting position. In this case, it prints the message “Pattern found at index ” followed by the current starting index.
- The
main
method is where the pattern search is invoked. It initializes thetext
andpattern
arrays with the provided strings, and then calls thesearch
method, passing thepattern
andtext
arrays as arguments. - When you run the program, it searches for the pattern “AABA” within the text “AABAACAADAABAAABAA”. The output shows the indices where the pattern is found: 0, 9, and 13.
Knuth-Morris-Pratt (KMP) Algorithm:
The KMP method enhances pattern-matching performance by utilizing a preprocessed table known as the Longest Proper Prefix, which is also a Suffix (LPS) table. It prevents redundant character comparisons based on the LPS table information. The KMP algorithm has a time complexity of O(N+M).
class KMPAlgorithm {
static void KMPAlgorithm(String pattern, String text) {
int m = pattern.length();
int n = text.length();
int[] lps = computeLPSArray(pattern, m);
int i = 0; // index for text[]
int j = 0; // index for pattern[]
while (i < n) {
if (pattern.charAt(j) == text.charAt(i)) {
i++;
j++;
}
if (j == m) {
System.out.println("Pattern found at index " + (i - j));
j = lps[j - 1];
} else if (i < n && pattern.charAt(j) != text.charAt(i)) {
if (j != 0) {
j = lps[j - 1];
} else {
i++;
}
}
}
}
static int[] computeLPSArray(String pattern, int m) {
int[] lps = new int[m];
int len = 0;
lps[0] = 0;
int i = 1;
while (i < m) {
if (pattern.charAt(i) == pattern.charAt(len)) {
len++;
lps[i] = len;
i++;
} else {
if (len != 0) {
len = lps[len - 1];
} else {
lps[i] = 0;
i++;
}
}
}
return lps;
}
public static void main(String[] args) {
String text = "AABAACAADAABAAABAA";
String pattern = "AABA";
KMPAlgorithm(pattern, text);
}
}
Explanation:
- The
KMPAlgorithm
function takes the pattern and text strings as input. - It initializes variables, including the length of the pattern (
m
), length of the text (n
), and the Longest Proper Prefix which is also a Suffix array (lps
), which is computed using thecomputeLPSArray
function. - The algorithm iterates through the text while comparing characters of the pattern with the corresponding characters in the text.
- If a match is found, it increments both the pattern index (
j
) and text index (i
). - If the entire pattern is matched (
j == m
), it prints the index at which the pattern is found and updates the pattern index using thelps
array. - If there is a mismatch and
j
is not at the beginning of the pattern, it updates the pattern index using thelps
array. - If there is a mismatch and
j
is at the beginning of the pattern, it increments the text index. - The
computeLPSArray
function computes the Longest Proper Prefix which is also a Suffix array for the given pattern using the two-pointer technique. - The
main
function demonstrates an example usage of the algorithm by callingKMPAlgorithm
with a sample text and pattern.
Rabin-Karp Algorithm:
The Rabin-Karp algorithm searches for patterns in text using hashing. It compares the hash value of the pattern to the hash values of the text’s substrings. It has an average time complexity of O(N+M), where N is the text length and M is the pattern length.
class RabinKarpAlgorithm {
static final int d = 256; // Number of characters in the input alphabet
static void search(String pattern, String text, int q) {
int m = pattern.length();
int n = text.length();
int p = 0; // Hash value for pattern
int t = 0; // Hash value for text
int h = 1;
// Calculate the hash value of the pattern and the first window of text
for (int i = 0; i < m - 1; i++)
h = (h * d) % q;
for (int i = 0; i < m; i++) {
p = (d * p + pattern.charAt(i)) % q;
t = (d * t + text.charAt(i)) % q;
}
// Slide the pattern over the text one by one
for (int i = 0; i <= n - m; i++) {
// If the hash values of the current window in text and pattern match, check for exact match
if (p == t) {
boolean found = true;
for (int j = 0; j < m; j++) {
if (text.charAt(i + j) != pattern.charAt(j)) {
found = false;
break;
}
}
if (found)
System.out.println("Pattern found at index " + i);
}
// Calculate the hash value for the next window of text
if (i < n - m) {
t = (d * (t - text.charAt(i) * h) + text.charAt(i + m)) %q;
if (t < 0)
t = t + q;
}
}
}
public static void main(String[] args) {
String text = "AABAACAADAABAAABAA";
String pattern = "AABA";
int q = 101; // A prime number
search(pattern, text, q);
}
}
Explanation:
- The
search
function takes the pattern, text, and a prime numberq
as input. - It initializes variables, including the lengths of the pattern (
m
) and text (n
), hash values for pattern (p
) and text (t
), and a valueh
for the highest significant digit. - It calculates the hash value of the pattern and the first window of the text.
- It iterates through the text, comparing the hash values of the current window in the text and pattern.
- If the hash values match, it performs an additional check for an exact match by comparing characters.
- If a match is found, it prints the index at which the pattern is found.
- It calculates the hash value for the next window of the text using a rolling hash technique.
- The
main
function demonstrates an example usage of the algorithm by callingsearch
with a sample text, pattern, and a prime number.
Finite Automaton Algorithm:
The Finite Automaton Algorithm generates a finite automaton based on the pattern and utilizes it to do pattern matching. It prepares the pattern by generating a state transition table. The time complexity of the algorithm is O(N+M), where N is the length of the text and M is the length of the pattern.
import java.util.Arrays;
class FiniteAutomatonAlgorithm {
static final int NO_OF_CHARS = 256;
static int[][] computeTransitionFunction(String pattern, int m) {
int[][] TF = new int[m + 1][NO_OF_CHARS];
int lps = 0;
// Fill the entries in TF[][] based on pattern[]
for (int state = 0; state <= m; state++) {
for (int x = 0; x < NO_OF_CHARS; x++)
TF[state][x] = getNextState(pattern, m, state, x);
}
return TF;
}
static int getNextState(String pattern, int m, int state, int x) {
// If the character at the current state matches the input character, move to the next state
if (state < m && x == pattern.charAt(state))
return state + 1;
int nextState, i;
// Find the longest prefix suffix
for (nextState = state; nextState > 0; nextState--) {
if (pattern.charAt(nextState - 1) == x) {
for (i = 0; i < nextState - 1; i++) {
if (pattern.charAt(i) != pattern.charAt(state - nextState + 1 + i))
break;
}
if (i == nextState - 1)
return nextState;
}
}
return 0;
}
static void search(String pattern, String text) {
int m = pattern.length();
int n = text.length();
int[][] TF = computeTransitionFunction(pattern, m);
// Process text over the finite automaton to find occurrences of the pattern
int state = 0;
for (int i = 0; i < n; i++) {
state = TF[state][text.charAt(i)];
if (state == m)
System.out.println("Pattern found at index " + (i - m + 1));
}
}
public static void main(String[] args) {
String text = "AABAACAADAABAAABAA";
String pattern = "AABA";
search(pattern, text);
}
}
Explanation:
- The
computeTransitionFunction
function calculates the transition function (TF) for the given pattern. - It initializes a 2D array
TF
with dimensions(m + 1) x NO_OF_CHARS
, wherem
is the length of the pattern andNO_OF_CHARS
is the number of characters in the input alphabet (e.g., ASCII). - It fills the entries in
TF
by iterating through each state and character, using thegetNextState
function to determine the next state based on the current state and input character. - The
getNextState
function calculates the next state based on the current state and the input character. - It checks if the character at the current state matches the input character. If so, it moves to the next state.
- If there is no match, it finds the longest prefix suffix and updates the next state accordingly.
- The
search
function performs the string search using the finite automaton. - It initializes the state as 0 and processes the text character by character.
- It updates the state using the transition function
TF[state][text.charAt(i)]
. - If the state becomes equal to the length of the pattern, a match is found, and it prints the index where the pattern occurs.
- The
main
function demonstrates an example usage of the algorithm by callingsearch
with a sample text and pattern.
The time and space complexity of the pattern-matching algorithms discussed:
Algorithm | Time Complexity | Space Complexity |
---|---|---|
Naive Pattern Searching | O((n – m + 1) * m) | O(1) |
KMP Algorithm | O(n + m) | O(m) |
Rabin-Karp Algorithm | O((n – m + 1) * m) | O(1) |
Finite Automaton Algorithm | O(n) | O(m * NO_OF_CHARS) |
where,
n
refers to the length of the text.m
refers to the length of the pattern.NO_OF_CHARS
refers to the number of characters in the input alphabet.
Note: also read about DSA: Binary Search
Follow Me
If you like my post please follow me to read my latest post on programming and technology.
https://www.instagram.com/coderz.py/
https://www.facebook.com/coderz.py
Staying up to the mark is what defines me. Hi all! I’m Rabecca Fatima a keen learner, great enthusiast, ready to take new challenges as stepping stones towards flying colors.
Leave a Comment