Categories: Data Structure

Problems based on Pattern Matching

In the previous post, we discussed various pattern-matching algorithms, let us now see a few problems based on it.

Given two strings, one is a text string and the other is a pattern string. The task is to print the indexes of all the occurrences of pattern string in the text string. For printing, Starting Index of a string should be taken as 1.(using Rabin-Karp)

Example:

Input:
S = "batmanandrobinarebat", pat = "bat"
Output: 1 18
Explanation: The string "bat" occurs twice
in S, one starts are index 1 and the other
at index 18. 

Expected Time Complexity: O(|S|*|pat|).
Expected Auxiliary Space: O(1).
import java.util.*;

class Solution {
    public List<Integer> search(String S, String pat) {
        List<Integer> result = new ArrayList<>();
        int n = S.length();
        int m = pat.length();
        int patHash = hash(pat);
        int textHash = hash(S.substring(0, m));

        for (int i = 0; i <= n - m; i++) {
            if (patHash == textHash && checkEqual(S, i, i + m - 1, pat)) {
                result.add(i + 1); // Convert 0-based index to 1-based index
            }
            if (i < n - m) {
                textHash = recalculateHash(S, i, i + m, textHash, m);
            }
        }

        return result;
    }

    private int hash(String str) {
        int hashValue = 0;
        int p = 31; // A prime number
        int m = (int) 1e9 + 9; // A large prime number
        int pow = 1;

        for (char ch : str.toCharArray()) {
            hashValue = (hashValue + (ch - 'a' + 1) * pow) % m;
            pow = (pow * p) % m;
        }

        return hashValue;
    }

    private boolean checkEqual(String str1, int start1, int end1, String str2) {
        if (end1 - start1 + 1 != str2.length()) {
            return false;
        }
        int i = start1;
        for (char ch : str2.toCharArray()) {
            if (str1.charAt(i) != ch) {
                return false;
            }
            i++;
        }
        return true;
    }

    private int recalculateHash(String str, int oldIndex, int newIndex, int oldHash, int patternLen) {
        int newHash = oldHash - (str.charAt(oldIndex) - 'a' + 1);
        newHash = newHash / 31;
        newHash = newHash + (str.charAt(newIndex) - 'a' + 1) * (int) Math.pow(31, patternLen - 1);
        return newHash;
    }
}

The given code implements the Rabin-Karp algorithm to find all occurrences of a pattern string pat in a text string S. The search method takes the text string S and the pattern string pat as input and returns a list of start indices (1-based) of substring pat in the string S. The time complexity of the Rabin-Karp algorithm is O(|S| * |pat|), and the auxiliary space complexity is O(1).

Given two strings, one is a text string, txt and other is a pattern string, pat. The task is to print the indexes of all the occurrences of pattern string in the text string. For printing, Starting Index of a string should be taken as 1.(using KPM)


Example:

Input:
txt = "batmanandrobinarebat", pat = "bat"
Output: 1 18
Explanation: The string "bat" occurs twice
in txt, one starts are index 1 and the other
at index 18. 

Expected Time Complexity: O(|txt|).
Expected Auxiliary Space: O(|txt|).
import java.util.*;

class Solution {
    public List<Integer> search(String txt, String pat) {
        List<Integer> occurrences = new ArrayList<>();
        
        int[] lps = computeLPSArray(pat);
        int n = txt.length();
        int m = pat.length();
        
        int i = 0;  // Index for txt
        int j = 0;  // Index for pat
        
        while (i < n) {
            if (txt.charAt(i) == pat.charAt(j)) {
                i++;
                j++;
                
                if (j == m) {
                    occurrences.add(i - j + 1);
                    j = lps[j - 1];
                }
            } else {
                if (j != 0) {
                    j = lps[j - 1];
                } else {
                    i++;
                }
            }
        }
        
        return occurrences;
    }
    
    private int[] computeLPSArray(String pat) {
        int m = pat.length();
        int[] lps = new int[m];
        int len = 0;  // Length of the previous longest prefix suffix
        
        lps[0] = 0;
        int i = 1;
        
        while (i < m) {
            if (pat.charAt(i) == pat.charAt(len)) {
                len++;
                lps[i] = len;
                i++;
            } else {
                if (len != 0) {
                    len = lps[len - 1];
                } else {
                    lps[i] = 0;
                    i++;
                }
            }
        }
        
        return lps;
    }
}

The code uses the Knuth-Morris-Pratt (KMP) algorithm to find all occurrences of a pattern string (pat) in a text string (txt). The search function iterates through the text string and compares characters with the pattern. It uses the Longest Prefix Suffix (LPS) array to optimize the matching process.

The computeLPSArray function calculates the LPS array for the pattern string. It iterates through the pattern and updates the LPS values based on the longest common prefix and suffix.

Overall, the time complexity is O(|txt|), and the space complexity is O(|txt|) for storing the occurrences list.

Given a sentence that consists of some words separated by a single space, and a searchWord, check if searchWord is a prefix of any word in sentence.

Return the index of the word in sentence (1-indexed) where searchWord is a prefix of this word. If searchWord is a prefix of more than one word, return the index of the first word (minimum index). If there is no such word return -1.

A prefix of a string s is any leading contiguous substring of s.

Example:

Input: sentence = "i love eating burger", searchWord = "burg"
Output: 4
Explanation: "burg" is prefix of "burger" which is the 4th word in the sentence.
class Solution {
    public int isPrefixOfWord(String sentence, String searchWord) {
        String[] words = sentence.split(" ");
        
        for (int i = 0; i < words.length; i++) {
            if (words[i].startsWith(searchWord)) {
                return i + 1;
            }
        }
        
        return -1;
    }
}

Explanation: The isPrefixOfWord method takes a sentence and a searchWord as input and returns the index (1-indexed) of the word in the sentence where the searchWord is a prefix of that word. If there is no such word, it returns -1.

  1. Split the sentence into an array of words using the space as the delimiter. This can be done using the split method.
  2. Iterate over each word in the array.
  3. Check if the current word starts with the searchWord using the startsWith method.
  4. If the condition is true, return the index of the word plus 1, as it is 1-indexed.
  5. If no word is found with the given prefix, return -1.

The time complexity of this solution is O(n), where n is the length of the sentence, as we iterate over the words once.

Note: also read about DSA: Strings Pattern Matching

Follow Me

If you like my post please follow me to read my latest post on programming and technology.

https://www.instagram.com/coderz.py/

https://www.facebook.com/coderz.py

Recent Posts

Generate Parenthesis | Intuition + Code | Recursion Tree | Backtracking | Java

Problem Statement: Given n pairs of parentheses, write a function to generate all combinations of well-formed parentheses. Example…

3 months ago

Square Root of Integer

Given an integer A. Compute and return the square root of A. If A is…

1 year ago

Build Array From Permutation

Given a zero-based permutation nums (0-indexed), build an array ans of the same length where…

1 year ago

DSA: Heap

A heap is a specialized tree-based data structure that satisfies the heap property. It is…

2 years ago

DSA: Trie

What is a Trie in DSA? A trie, often known as a prefix tree, is…

2 years ago

Trees: Lowest Common Ancestor

What is the Lowest Common Ancestor? In a tree, the lowest common ancestor (LCA) of…

2 years ago