Pattern matching is a fundamental problem in computer science that is employed in a wide range of applications. Pattern matching in the context of strings refers to detecting instances of a given pattern (substring) inside a bigger string.
There are several popular algorithms and techniques for string pattern matching. Here are some commonly used ones:
The Naive Pattern Matching method is the most basic and easiest technique. It compares all characters in the main string to the pattern. It has a time complexity of O(N*M), where N is the text length and M is the pattern length.
class NaivePatternMatching {
static void search(char[] pattern, char[] text) {
int m = pattern.length;
int n = text.length;
for (int i = 0; i <= n - m; i++) {
int j;
for (j = 0; j < m; j++) {
if (text[i + j] != pattern[j])
break;
}
if (j == m)
System.out.println("Pattern found at index " + i);
}
}
public static void main(String[] args) {
char[] text = "AABAACAADAABAAABAA".toCharArray();
char[] pattern = "AABA".toCharArray();
search(pattern, text);
}
}
Explanation:
search
method takes two parameters: the pattern
(the pattern to search for) and the text
(the text in which to search for the pattern).m
and n
with the lengths of the pattern
and text
, respectively.for
loop iterates from 0 to n - m
(inclusive). This loop controls the starting position of the pattern in the text.for
loop that iterates over the characters of the pattern. It compares each character of the pattern with the corresponding character in the text starting from the current position.main
method is where the pattern search is invoked. It initializes the text
and pattern
arrays with the provided strings, and then calls the search
method, passing the pattern
and text
arrays as arguments.The KMP method enhances pattern-matching performance by utilizing a preprocessed table known as the Longest Proper Prefix, which is also a Suffix (LPS) table. It prevents redundant character comparisons based on the LPS table information. The KMP algorithm has a time complexity of O(N+M).
class KMPAlgorithm {
static void KMPAlgorithm(String pattern, String text) {
int m = pattern.length();
int n = text.length();
int[] lps = computeLPSArray(pattern, m);
int i = 0; // index for text[]
int j = 0; // index for pattern[]
while (i < n) {
if (pattern.charAt(j) == text.charAt(i)) {
i++;
j++;
}
if (j == m) {
System.out.println("Pattern found at index " + (i - j));
j = lps[j - 1];
} else if (i < n && pattern.charAt(j) != text.charAt(i)) {
if (j != 0) {
j = lps[j - 1];
} else {
i++;
}
}
}
}
static int[] computeLPSArray(String pattern, int m) {
int[] lps = new int[m];
int len = 0;
lps[0] = 0;
int i = 1;
while (i < m) {
if (pattern.charAt(i) == pattern.charAt(len)) {
len++;
lps[i] = len;
i++;
} else {
if (len != 0) {
len = lps[len - 1];
} else {
lps[i] = 0;
i++;
}
}
}
return lps;
}
public static void main(String[] args) {
String text = "AABAACAADAABAAABAA";
String pattern = "AABA";
KMPAlgorithm(pattern, text);
}
}
Explanation:
KMPAlgorithm
function takes the pattern and text strings as input.m
), length of the text (n
), and the Longest Proper Prefix which is also a Suffix array (lps
), which is computed using the computeLPSArray
function.j
) and text index (i
).j == m
), it prints the index at which the pattern is found and updates the pattern index using the lps
array.j
is not at the beginning of the pattern, it updates the pattern index using the lps
array.j
is at the beginning of the pattern, it increments the text index.computeLPSArray
function computes the Longest Proper Prefix which is also a Suffix array for the given pattern using the two-pointer technique.main
function demonstrates an example usage of the algorithm by calling KMPAlgorithm
with a sample text and pattern.The Rabin-Karp algorithm searches for patterns in text using hashing. It compares the hash value of the pattern to the hash values of the text’s substrings. It has an average time complexity of O(N+M), where N is the text length and M is the pattern length.
class RabinKarpAlgorithm {
static final int d = 256; // Number of characters in the input alphabet
static void search(String pattern, String text, int q) {
int m = pattern.length();
int n = text.length();
int p = 0; // Hash value for pattern
int t = 0; // Hash value for text
int h = 1;
// Calculate the hash value of the pattern and the first window of text
for (int i = 0; i < m - 1; i++)
h = (h * d) % q;
for (int i = 0; i < m; i++) {
p = (d * p + pattern.charAt(i)) % q;
t = (d * t + text.charAt(i)) % q;
}
// Slide the pattern over the text one by one
for (int i = 0; i <= n - m; i++) {
// If the hash values of the current window in text and pattern match, check for exact match
if (p == t) {
boolean found = true;
for (int j = 0; j < m; j++) {
if (text.charAt(i + j) != pattern.charAt(j)) {
found = false;
break;
}
}
if (found)
System.out.println("Pattern found at index " + i);
}
// Calculate the hash value for the next window of text
if (i < n - m) {
t = (d * (t - text.charAt(i) * h) + text.charAt(i + m)) %q;
if (t < 0)
t = t + q;
}
}
}
public static void main(String[] args) {
String text = "AABAACAADAABAAABAA";
String pattern = "AABA";
int q = 101; // A prime number
search(pattern, text, q);
}
}
Explanation:
search
function takes the pattern, text, and a prime number q
as input.m
) and text (n
), hash values for pattern (p
) and text (t
), and a value h
for the highest significant digit.main
function demonstrates an example usage of the algorithm by calling search
with a sample text, pattern, and a prime number.The Finite Automaton Algorithm generates a finite automaton based on the pattern and utilizes it to do pattern matching. It prepares the pattern by generating a state transition table. The time complexity of the algorithm is O(N+M), where N is the length of the text and M is the length of the pattern.
import java.util.Arrays;
class FiniteAutomatonAlgorithm {
static final int NO_OF_CHARS = 256;
static int[][] computeTransitionFunction(String pattern, int m) {
int[][] TF = new int[m + 1][NO_OF_CHARS];
int lps = 0;
// Fill the entries in TF[][] based on pattern[]
for (int state = 0; state <= m; state++) {
for (int x = 0; x < NO_OF_CHARS; x++)
TF[state][x] = getNextState(pattern, m, state, x);
}
return TF;
}
static int getNextState(String pattern, int m, int state, int x) {
// If the character at the current state matches the input character, move to the next state
if (state < m && x == pattern.charAt(state))
return state + 1;
int nextState, i;
// Find the longest prefix suffix
for (nextState = state; nextState > 0; nextState--) {
if (pattern.charAt(nextState - 1) == x) {
for (i = 0; i < nextState - 1; i++) {
if (pattern.charAt(i) != pattern.charAt(state - nextState + 1 + i))
break;
}
if (i == nextState - 1)
return nextState;
}
}
return 0;
}
static void search(String pattern, String text) {
int m = pattern.length();
int n = text.length();
int[][] TF = computeTransitionFunction(pattern, m);
// Process text over the finite automaton to find occurrences of the pattern
int state = 0;
for (int i = 0; i < n; i++) {
state = TF[state][text.charAt(i)];
if (state == m)
System.out.println("Pattern found at index " + (i - m + 1));
}
}
public static void main(String[] args) {
String text = "AABAACAADAABAAABAA";
String pattern = "AABA";
search(pattern, text);
}
}
Explanation:
computeTransitionFunction
function calculates the transition function (TF) for the given pattern.TF
with dimensions (m + 1) x NO_OF_CHARS
, where m
is the length of the pattern and NO_OF_CHARS
is the number of characters in the input alphabet (e.g., ASCII).TF
by iterating through each state and character, using the getNextState
function to determine the next state based on the current state and input character.getNextState
function calculates the next state based on the current state and the input character.search
function performs the string search using the finite automaton.TF[state][text.charAt(i)]
.main
function demonstrates an example usage of the algorithm by calling search
with a sample text and pattern.The time and space complexity of the pattern-matching algorithms discussed:
Algorithm | Time Complexity | Space Complexity |
---|---|---|
Naive Pattern Searching | O((n – m + 1) * m) | O(1) |
KMP Algorithm | O(n + m) | O(m) |
Rabin-Karp Algorithm | O((n – m + 1) * m) | O(1) |
Finite Automaton Algorithm | O(n) | O(m * NO_OF_CHARS) |
where,
n
refers to the length of the text.m
refers to the length of the pattern.NO_OF_CHARS
refers to the number of characters in the input alphabet.Note: also read about DSA: Binary Search
If you like my post please follow me to read my latest post on programming and technology.
https://www.instagram.com/coderz.py/
https://www.facebook.com/coderz.py
Staying up to the mark is what defines me. Hi all! I’m Rabecca Fatima a keen learner, great enthusiast, ready to take new challenges as stepping stones towards flying colors.
Problem Statement: Given n pairs of parentheses, write a function to generate all combinations of well-formed parentheses. Example…
Given an integer A. Compute and return the square root of A. If A is…
Given a zero-based permutation nums (0-indexed), build an array ans of the same length where…
A heap is a specialized tree-based data structure that satisfies the heap property. It is…
What is the Lowest Common Ancestor? In a tree, the lowest common ancestor (LCA) of…