Class PubSubSplitPlanner
java.lang.Object
com.linkedin.venice.vpj.pubsub.input.PubSubSplitPlanner
Utility class for planning PubSubPartitionSplits for both MapReduce and Spark ingestion jobs.
This class encapsulates the common logic for:
- Building a TopicManager from job configuration.
- Reading split parameters (split type, max records per split, max splits per partition, time window, etc.).
- Determining the number of partitions for a topic and generating splits using the appropriate PubSubTopicPartitionSplitStrategy.
Split planning is optimized in two ways:
- Batch position fetching: Start and end positions for all partitions are fetched in two bulk calls instead of 2N individual calls (two per partition for a topic with N partitions), which sharply reduces the number of pubsub queries.
- Parallel split computation: After batch-fetching positions, per-partition split computation (which is pure CPU work) is parallelized across available processors.
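The two optimizations above can be illustrated with a minimal, self-contained sketch. It does not use the Venice API: `SplitPlanSketch`, `PositionSource`, `Split`, and the fixed-record-count splitting rule are all hypothetical stand-ins, and positions are modeled as plain `long` offsets. The sketch only shows the shape of the approach, namely two bulk lookups followed by parallel, CPU-only per-partition work.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class SplitPlanSketch {
  /** Hypothetical split covering offsets [start, end) of one partition. */
  static final class Split {
    final int partition;
    final long start;
    final long end;

    Split(int partition, long start, long end) {
      this.partition = partition;
      this.start = start;
      this.end = end;
    }
  }

  /** Hypothetical stand-in for a client that supports bulk position lookups. */
  interface PositionSource {
    Map<Integer, Long> startPositions(List<Integer> partitions);

    Map<Integer, Long> endPositions(List<Integer> partitions);
  }

  static List<Split> plan(PositionSource source, int partitionCount, long maxRecordsPerSplit) {
    if (maxRecordsPerSplit <= 0) {
      throw new IllegalArgumentException("maxRecordsPerSplit must be positive");
    }
    List<Integer> partitions =
        IntStream.range(0, partitionCount).boxed().collect(Collectors.toList());

    // Batch position fetching: two bulk calls in total, not two calls per partition.
    Map<Integer, Long> starts = source.startPositions(partitions);
    Map<Integer, Long> ends = source.endPositions(partitions);

    // Parallel split computation: the maps are only read from here on, so the
    // per-partition work can safely run on multiple threads. Collecting an
    // ordered parallel stream keeps the splits in partition order.
    return partitions.parallelStream()
        .flatMap(p -> splitPartition(p, starts.get(p), ends.get(p), maxRecordsPerSplit).stream())
        .collect(Collectors.toList());
  }

  /** Assumed strategy: cut [start, end) into ranges of at most maxRecords offsets. */
  static List<Split> splitPartition(int partition, long start, long end, long maxRecords) {
    List<Split> splits = new ArrayList<>();
    for (long from = start; from < end; from += maxRecords) {
      splits.add(new Split(partition, from, Math.min(from + maxRecords, end)));
    }
    return splits;
  }

  /** In-memory source for demonstration: every partition spans [0, recordsPerPartition). */
  static PositionSource fixedSource(long recordsPerPartition) {
    return new PositionSource() {
      @Override
      public Map<Integer, Long> startPositions(List<Integer> partitions) {
        Map<Integer, Long> positions = new HashMap<>();
        partitions.forEach(p -> positions.put(p, 0L));
        return positions;
      }

      @Override
      public Map<Integer, Long> endPositions(List<Integer> partitions) {
        Map<Integer, Long> positions = new HashMap<>();
        partitions.forEach(p -> positions.put(p, recordsPerPartition));
        return positions;
      }
    };
  }

  public static void main(String[] args) {
    List<Split> splits = plan(fixedSource(25), 4, 10);
    System.out.println("planned splits: " + splits.size());
  }
}
```

In this sketch the number of calls to the position source stays at two regardless of the partition count, while the per-partition loop scales with the available processors through the common fork-join pool. A real strategy would also need to honor the other split parameters listed above, such as the maximum number of splits per partition.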
Constructor Summary
Constructors
Method Summary
Constructor Details
PubSubSplitPlanner
public PubSubSplitPlanner()
Method Details
plan