Class VeniceRawPubsubInputPartitionReader
java.lang.Object
com.linkedin.venice.spark.input.pubsub.raw.VeniceRawPubsubInputPartitionReader
- All Implemented Interfaces:
Closeable
,AutoCloseable
,org.apache.spark.sql.connector.read.PartitionReader<org.apache.spark.sql.catalyst.InternalRow>
public class VeniceRawPubsubInputPartitionReader
extends Object
implements org.apache.spark.sql.connector.read.PartitionReader<org.apache.spark.sql.catalyst.InternalRow>
A Spark SQL data source partition reader implementation for Venice PubSub messages.
This reader consumes messages from a specific partition of a PubSub topic between
specified start and end offsets, converting them into Spark's InternalRow
format.
The reader provides functionality for:
- Reading messages from a specific topic partition
- Filtering control messages when configured
- Tracking consumption progress
This class is part of the Venice Spark connector enabling ETL and KIF functionality.
-
Constructor Summary
ConstructorsConstructorDescriptionVeniceRawPubsubInputPartitionReader
(VeniceBasicPubsubInputPartition inputPartition, PubSubConsumerAdapter consumer, PubSubTopicPartition topicPartition) VeniceRawPubsubInputPartitionReader
(VeniceBasicPubsubInputPartition inputPartition, PubSubConsumerAdapter consumer, PubSubTopicPartition topicPartition, long pollTimeoutMs, int pollRetryTimes, long emptyPollSleepTimeMs, PubSubMessageConverter pubSubMessageConverter) Constructor for testing with custom timeout and retry values. -
Method Summary
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.sql.connector.read.PartitionReader
currentMetricsValues
-
Constructor Details
-
VeniceRawPubsubInputPartitionReader
public VeniceRawPubsubInputPartitionReader(VeniceBasicPubsubInputPartition inputPartition, PubSubConsumerAdapter consumer, PubSubTopicPartition topicPartition) -
VeniceRawPubsubInputPartitionReader
public VeniceRawPubsubInputPartitionReader(VeniceBasicPubsubInputPartition inputPartition, PubSubConsumerAdapter consumer, PubSubTopicPartition topicPartition, long pollTimeoutMs, int pollRetryTimes, long emptyPollSleepTimeMs, PubSubMessageConverter pubSubMessageConverter) Constructor for testing with custom timeout and retry values.- Parameters:
inputPartition
- The input partition to read fromconsumer
- The PubSub consumer adaptertopicPartition
- The topic partitionpollTimeoutMs
- The timeout in milliseconds for each poll operationpollRetryTimes
- The number of retry attempts when polling returns empty resultsemptyPollSleepTimeMs
- The sleep time in milliseconds between retries when polling returns empty resultspubSubMessageConverter
- The converter to use for converting PubSub messages to Spark rows
-
-
Method Details
-
getProgressPercent
public float getProgressPercent() -
get
public org.apache.spark.sql.catalyst.InternalRow get()Assuming that Current row has meaningful data, this method will return that and counts the number of invocations.- Specified by:
get
in interfaceorg.apache.spark.sql.connector.read.PartitionReader<org.apache.spark.sql.catalyst.InternalRow>
- Returns:
- The current row as an
InternalRow
.
-
next
public boolean next()- Specified by:
next
in interfaceorg.apache.spark.sql.connector.read.PartitionReader<org.apache.spark.sql.catalyst.InternalRow>
-
close
public void close()- Specified by:
close
in interfaceAutoCloseable
- Specified by:
close
in interfaceCloseable
-