Class VeniceRawPubsubInputPartitionReader

java.lang.Object
com.linkedin.venice.spark.input.pubsub.raw.VeniceRawPubsubInputPartitionReader
All Implemented Interfaces:
Closeable, AutoCloseable, org.apache.spark.sql.connector.read.PartitionReader<org.apache.spark.sql.catalyst.InternalRow>

public class VeniceRawPubsubInputPartitionReader extends Object implements org.apache.spark.sql.connector.read.PartitionReader<org.apache.spark.sql.catalyst.InternalRow>
A Spark SQL data source partition reader implementation for Venice PubSub messages.

This reader consumes messages from a specific partition of a PubSub topic between specified start and end offsets, converting them into Spark's InternalRow format. The reader provides functionality for:

  • Reading messages from a specific topic partition
  • Filtering control messages when configured
  • Tracking consumption progress

This class is part of the Venice Spark connector, which enables ETL and KIF (Kafka Input Format) functionality.
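
The sketch below illustrates the general next()/get()/close() contract that a Spark PartitionReader follows, which is how a caller would be expected to drain this reader. It is not Venice code: SimplePartitionReader is a stand-in that mirrors the shape of Spark's interface so the example runs without Spark on the classpath, and ListBackedReader and ReaderLoopSketch are hypothetical names introduced only for illustration.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class ReaderLoopSketch {
  // Drains a reader: advance with next(), fetch with get(), always close().
  static <T> List<T> drain(SimplePartitionReader<T> reader) {
    List<T> rows = new ArrayList<>();
    try (SimplePartitionReader<T> r = reader) {
      while (r.next()) {
        rows.add(r.get());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return rows;
  }

  public static void main(String[] args) {
    System.out.println(drain(new ListBackedReader(List.of("a", "b", "c"))));
  }
}

// Stand-in with the same method shape as Spark's PartitionReader<T> (assumption: simplified).
interface SimplePartitionReader<T> extends Closeable {
  boolean next() throws IOException;

  T get();
}

// Hypothetical in-memory reader used only to make the sketch runnable.
class ListBackedReader implements SimplePartitionReader<String> {
  private final Iterator<String> source;
  private String current;
  boolean closed = false;

  ListBackedReader(List<String> rows) {
    this.source = rows.iterator();
  }

  @Override
  public boolean next() {
    if (!source.hasNext()) {
      return false;
    }
    current = source.next();
    return true;
  }

  @Override
  public String get() {
    return current;
  }

  @Override
  public void close() {
    closed = true;
  }
}
```

In a real Spark job the engine drives this loop itself; calling the reader directly like this is mainly useful in unit tests.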

  • Constructor Details

    • VeniceRawPubsubInputPartitionReader

      public VeniceRawPubsubInputPartitionReader(VeniceBasicPubsubInputPartition inputPartition, PubSubConsumerAdapter consumer, PubSubTopicPartition topicPartition)
    • VeniceRawPubsubInputPartitionReader

      public VeniceRawPubsubInputPartitionReader(VeniceBasicPubsubInputPartition inputPartition, PubSubConsumerAdapter consumer, PubSubTopicPartition topicPartition, long pollTimeoutMs, int pollRetryTimes, long emptyPollSleepTimeMs, PubSubMessageConverter pubSubMessageConverter)
      Constructor for testing with custom timeout and retry values.
      Parameters:
      inputPartition - The input partition to read from
      consumer - The PubSub consumer adapter
      topicPartition - The topic partition
      pollTimeoutMs - The timeout in milliseconds for each poll operation
      pollRetryTimes - The number of retry attempts when polling returns empty results
      emptyPollSleepTimeMs - The sleep time in milliseconds between retries when polling returns empty results
      pubSubMessageConverter - The converter to use for converting PubSub messages to Spark rows
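
To make the interaction of pollTimeoutMs, pollRetryTimes, and emptyPollSleepTimeMs concrete, here is a minimal sketch of a poll-with-retry loop. It is an assumption about how such parameters are typically combined, not the reader's actual implementation: the poller function stands in for the consumer's poll call, PollRetrySketch and pollWithRetry are hypothetical names, and the sketch assumes one initial poll followed by up to pollRetryTimes retries.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.function.LongFunction;

class PollRetrySketch {
  // Polls until a non-empty batch arrives or the retry budget is spent.
  // Assumption: one initial poll plus up to pollRetryTimes retries.
  static <M> List<M> pollWithRetry(
      LongFunction<List<M>> poller,
      long pollTimeoutMs,
      int pollRetryTimes,
      long emptyPollSleepTimeMs) {
    for (int attempt = 0; attempt <= pollRetryTimes; attempt++) {
      List<M> batch = poller.apply(pollTimeoutMs);
      if (!batch.isEmpty()) {
        return batch;
      }
      if (attempt < pollRetryTimes) {
        try {
          // Back off briefly before the next poll after an empty result.
          Thread.sleep(emptyPollSleepTimeMs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          break;
        }
      }
    }
    return Collections.emptyList();
  }

  public static void main(String[] args) {
    // Simulated consumer: two empty polls, then a batch of two messages.
    Queue<List<String>> responses = new ArrayDeque<>();
    responses.add(List.of());
    responses.add(List.of());
    responses.add(List.of("m1", "m2"));
    List<String> batch =
        pollWithRetry(t -> responses.isEmpty() ? List.<String>of() : responses.poll(), 100L, 3, 5L);
    System.out.println(batch);
  }
}
```

Small values for the timeout and sleep parameters keep tests fast, which is presumably why this constructor exposes them.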
  • Method Details

    • getProgressPercent

      public float getProgressPercent()
    • get

      public org.apache.spark.sql.catalyst.InternalRow get()
      Returns the current row, assuming it holds meaningful data, and counts the number of invocations.
      Specified by:
      get in interface org.apache.spark.sql.connector.read.PartitionReader<org.apache.spark.sql.catalyst.InternalRow>
      Returns:
      The current row as an InternalRow.
    • next

      public boolean next()
      Specified by:
      next in interface org.apache.spark.sql.connector.read.PartitionReader<org.apache.spark.sql.catalyst.InternalRow>
    • close

      public void close()
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
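
The documentation for getProgressPercent() above gives no formula. As a rough sketch of how progress between a start and an end offset could be reported, the helper below computes the consumed fraction of the range as a percentage. The formula, the clamping behavior, and the names ProgressSketch and progressPercent are assumptions for illustration; the actual method may compute its value differently.

```java
class ProgressSketch {
  // Percentage of the [startOffset, endOffset] range consumed so far.
  // Assumption: an empty or inverted range is reported as fully consumed.
  static float progressPercent(long startOffset, long endOffset, long currentOffset) {
    long total = endOffset - startOffset;
    if (total <= 0) {
      return 100.0f;
    }
    // Clamp so positions outside the range map to 0% or 100%.
    long done = Math.max(0L, Math.min(currentOffset - startOffset, total));
    return (float) (100.0 * done / total);
  }

  public static void main(String[] args) {
    System.out.println(progressPercent(0L, 200L, 50L)); // prints 25.0
  }
}
```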