Class VenicePubSubMessageToRow

java.lang.Object
com.linkedin.venice.spark.input.pubsub.VenicePubSubMessageToRow
All Implemented Interfaces:
PubSubMessageConverter

public class VenicePubSubMessageToRow extends Object implements PubSubMessageConverter
Converts a PubSub message to a Spark InternalRow, preserving the schema, replication metadata, and other necessary fields.
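
A minimal usage sketch follows, assuming a PubSubMessage has already been obtained (for example, from a consumer poll). The import paths for the Venice message types and the sample region/partition values are assumptions for illustration only; the convert and convertPubSubMessageToRow signatures follow the method details below.

  import org.apache.spark.sql.catalyst.InternalRow;

  // Package paths for the Venice types are assumed, not confirmed by this page.
  import com.linkedin.venice.kafka.protocol.KafkaMessageEnvelope;
  import com.linkedin.venice.message.KafkaKey;
  import com.linkedin.venice.pubsub.api.PubSubMessage;
  import com.linkedin.venice.pubsub.api.PubSubPosition;
  import com.linkedin.venice.spark.input.pubsub.VenicePubSubMessageToRow;

  public final class ConverterUsageExample {
    static InternalRow toRow(PubSubMessage<KafkaKey, KafkaMessageEnvelope, PubSubPosition> message) {
      // Instance usage via the PubSubMessageConverter interface.
      VenicePubSubMessageToRow converter = new VenicePubSubMessageToRow();
      InternalRow row = converter.convert(message, "dc-0", 7); // "dc-0" and 7 are example values

      // Equivalent static call, retained for backward compatibility.
      InternalRow sameRow =
          VenicePubSubMessageToRow.convertPubSubMessageToRow(message, "dc-0", 7);

      return row;
    }
  }
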
  • Constructor Details

    • VenicePubSubMessageToRow

      public VenicePubSubMessageToRow()
  • Method Details

    • convertPubSubMessageToRow

      public static org.apache.spark.sql.catalyst.InternalRow convertPubSubMessageToRow(@NotNull PubSubMessage<KafkaKey,KafkaMessageEnvelope,PubSubPosition> pubSubMessage, String region, int partitionNumber)
      Static method retained for backward compatibility.
    • convert

      public org.apache.spark.sql.catalyst.InternalRow convert(@NotNull PubSubMessage<KafkaKey,KafkaMessageEnvelope,PubSubPosition> pubSubMessage, String region, int partitionNumber)
      Converts a PubSub message to a Spark InternalRow.
      Specified by:
      convert in interface PubSubMessageConverter
      Parameters:
      pubSubMessage - The PubSub message to process. Contains key, value, and metadata.
      region - The region identifier to include in the row.
      partitionNumber - The partition number to include in the row.
      Returns:
      An InternalRow containing the processed message data. The row includes the following fields:
        1. Region (String)
        2. Partition number (int)
        3. Message type (int)
        4. Offset (long)
        5. Schema ID (int)
        6. Key bytes (byte[])
        7. Value bytes (byte[])
        8. Replication metadata payload bytes (byte[])
        9. Replication metadata version ID (int)
      See SparkConstants.RAW_PUBSUB_INPUT_TABLE_SCHEMA for the schema definition.
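
      The sketch below illustrates how the documented field order maps onto 0-based InternalRow ordinals, using the standard Spark accessors getUTF8String, getInt, getLong, and getBinary. It is an illustration only: the Venice import paths and the sample region/partition arguments are assumptions, and obtaining the PubSubMessage is left to the caller.

      import org.apache.spark.sql.catalyst.InternalRow;

      // Package paths for the Venice types are assumed, not confirmed by this page.
      import com.linkedin.venice.kafka.protocol.KafkaMessageEnvelope;
      import com.linkedin.venice.message.KafkaKey;
      import com.linkedin.venice.pubsub.api.PubSubMessage;
      import com.linkedin.venice.pubsub.api.PubSubPosition;
      import com.linkedin.venice.spark.input.pubsub.VenicePubSubMessageToRow;

      public final class RowFieldExample {
        // Unpacks the converted row using 0-based ordinals that follow the
        // documented field order (region, partition, message type, offset, ...).
        static void printRowFields(PubSubMessage<KafkaKey, KafkaMessageEnvelope, PubSubPosition> message) {
          InternalRow row = new VenicePubSubMessageToRow().convert(message, "dc-0", 7); // example region/partition

          String region = row.getUTF8String(0).toString();  // region (String)
          int partition = row.getInt(1);                    // partition number
          int messageType = row.getInt(2);                  // message type
          long offset = row.getLong(3);                     // offset
          int schemaId = row.getInt(4);                     // schema ID
          byte[] keyBytes = row.getBinary(5);               // key bytes
          byte[] valueBytes = row.getBinary(6);             // value bytes
          byte[] rmdPayload = row.getBinary(7);             // replication metadata payload bytes
          int rmdVersionId = row.getInt(8);                 // replication metadata version ID

          System.out.printf(
              "%s p%d type=%d offset=%d schema=%d key=%dB value=%dB rmd=%dB rmdVersion=%d%n",
              region, partition, messageType, offset, schemaId,
              keyBytes.length, valueBytes.length, rmdPayload.length, rmdVersionId);
        }
      }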