Class PubSubSplitPlanner

java.lang.Object
com.linkedin.venice.vpj.pubsub.input.PubSubSplitPlanner

public class PubSubSplitPlanner extends Object
Utility class for planning PubSubPartitionSplits for both MapReduce and Spark ingestion jobs.

This class encapsulates the common logic for:

  • Building a TopicManager from job configuration.
  • Reading split parameters (split type, max records per split, max splits per partition, time window, etc.).
  • Determining the number of partitions for a topic and generating splits using the appropriate PubSubTopicPartitionSplitStrategy.

Split planning is optimized in two ways:

  1. Batch position fetching: Start and end positions for all N partitions of the topic are fetched in two bulk calls instead of 2N individual calls, so the number of PubSub queries no longer grows with the partition count.
  2. Parallel split computation: After batch-fetching positions, per-partition split computation (which is pure CPU work) is parallelized across available processors.
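
The two optimizations above can be illustrated with a minimal sketch. It is not this class's implementation: PositionSource, Split, plan and splitPartition are hypothetical names introduced for illustration, positions are modeled as plain long offsets, and a fixed max-records-per-split rule stands in for the real PubSubTopicPartitionSplitStrategy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SplitPlanningSketch {
  /** Hypothetical bulk lookup: each method stands for ONE query covering all partitions. */
  interface PositionSource {
    Map<Integer, Long> fetchStartPositions(int partitionCount);
    Map<Integer, Long> fetchEndPositions(int partitionCount);
  }

  /** Hypothetical split: the half-open offset range [start, end) of one partition. */
  static final class Split {
    final int partition;
    final long start;
    final long end;

    Split(int partition, long start, long end) {
      this.partition = partition;
      this.start = start;
      this.end = end;
    }
  }

  static List<Split> plan(PositionSource source, int partitionCount, long maxRecordsPerSplit) {
    if (maxRecordsPerSplit <= 0) {
      throw new IllegalArgumentException("maxRecordsPerSplit must be positive");
    }
    // Step 1: two bulk calls in total, regardless of the partition count.
    Map<Integer, Long> starts = source.fetchStartPositions(partitionCount);
    Map<Integer, Long> ends = source.fetchEndPositions(partitionCount);

    // Step 2: per-partition splitting needs no further I/O, so it can run in parallel.
    // The collected list keeps partition order because the stream is ordered.
    // Assumes both maps contain an entry for every partition index.
    return IntStream.range(0, partitionCount)
        .parallel()
        .mapToObj(p -> splitPartition(p, starts.get(p), ends.get(p), maxRecordsPerSplit))
        .flatMap(List::stream)
        .collect(Collectors.toList());
  }

  /** Cuts [start, end) into consecutive ranges of at most maxRecords offsets each. */
  static List<Split> splitPartition(int partition, long start, long end, long maxRecords) {
    List<Split> splits = new ArrayList<>();
    for (long s = start; s < end; s += maxRecords) {
      splits.add(new Split(partition, s, Math.min(s + maxRecords, end)));
    }
    return splits;
  }

  public static void main(String[] args) {
    // In-memory stand-in for a real PubSub client.
    Map<Integer, Long> starts = new HashMap<>();
    Map<Integer, Long> ends = new HashMap<>();
    starts.put(0, 0L);
    ends.put(0, 250L);
    starts.put(1, 10L);
    ends.put(1, 10L); // empty partition, yields no splits
    PositionSource source = new PositionSource() {
      public Map<Integer, Long> fetchStartPositions(int n) { return starts; }
      public Map<Integer, Long> fetchEndPositions(int n) { return ends; }
    };
    for (Split s : plan(source, 2, 100)) {
      System.out.println("partition " + s.partition + ": [" + s.start + ", " + s.end + ")");
    }
  }
}
```

Fetching every position before any splitting starts is what makes the parallel step safe in this sketch: the worker threads only read the two maps and never issue PubSub queries themselves.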