coderz.py

Keep Coding Keep Cheering!

Select a Random Element from a Stream

DSA Sheet

Problem Statement (Asked By Facebook)

You are given a stream of elements that is too large to fit into memory. Write an algorithm to select a random element from the stream with equal probability.

Problem link: N/A

Disclaimer: Try solving the problem on your own first! Use this solution only as a reference to learn and improve.


Explanation:

A naive solution to pick a random element from a stream involves storing all elements in a list, determining the size of the list, and then selecting a random index. However, this approach requires O(N) space, where N is the size of the stream, which is inefficient for very large or infinite streams.

Instead, we can use reservoir sampling, which allows us to select a random element from a stream with O(1) space complexity. The key idea is to maintain a single random element and update it dynamically as we process the stream:

  1. Base Case:
    • For the first element in the stream (i=0), select it as the random element.
  2. Updating the Random Element:
    • For each subsequent element eie_iei​ at index iii, replace the current random element with ei​​ with probability 1/(i+1).
  3. Proof of Uniformity:
    • Any element ek​ from the stream has an equal probability of 1/N of being selected after processing N elements. This is achieved by combining its initial selection and the probability of not being replaced.
import java.util.Random;

public class ReservoirSampling {

    public static <T> T pickRandomElement(Iterable<T> stream) {
        Random random = new Random();
        T randomElement = null;
        int index = 0;

        for (T element : stream) {
            // Replace the current random element with the current element with probability 1 / (index + 1)
            if (random.nextInt(index + 1) == 0) {
                randomElement = element;
            }
            index++;
        }

        return randomElement;
    }

    public static void main(String[] args) {
        // Example usage
        Iterable<Integer> stream = java.util.List.of(10, 20, 30, 40, 50);
        int trials = 100000;
        int[] counts = new int[5]; // To track frequency of selection for each element

        for (int i = 0; i < trials; i++) {
            int picked = pickRandomElement(stream);
            counts[stream.iterator().next() - 10]++; // Map element to index
        }

        // Print frequencies
        for (int i = 0; i < counts.length; i++) {
            System.out.printf("Element %d picked %d times\n", 10 + i * 10, counts[i]);
        }
    }
}

Explanation of Code:

  1. Initialization:
    • A Random instance is used to generate random numbers.
    • randomElement stores the currently selected random element.
    • index keeps track of the current position in the stream.
  2. Iterating Through the Stream:
    • For each element, replace randomElement with the current element with probability 1/(index+1).
    • This ensures each element has an equal probability of being selected after processing all elements.
  3. Output:
    • After processing the stream, randomElement contains the randomly selected element.

Complexity:

  1. Time Complexity:
    • O(N), where N is the number of elements in the stream.
  2. Space Complexity:
    • O(1), as we store only the current random element and a few auxiliary variables.

Key Features of Reservoir Sampling:

  1. Constant Space:
    • Only a single variable is maintained for the random element, making it highly space-efficient.
  2. Uniform Distribution:
    • Ensures that every element in the stream has an equal chance of being selected, regardless of the stream’s size.

This algorithm is ideal for selecting a random element from very large or infinite streams.


Did this solution help you understand the concept better? Let me know your thoughts in the comments below! Don’t forget to follow us on Instagram @coderz.py for more DSA tips and solutions to ace your coding interviews.

Leave a Comment

Advertisement