Class StoreValueSchemasCacheService

  • All Implemented Interfaces:
    ReadOnlySchemaRepository, VeniceResource, java.io.Closeable, java.lang.AutoCloseable

    public class StoreValueSchemasCacheService
    extends AbstractVeniceService
    implements ReadOnlySchemaRepository
    This class provides fast lookup of value schemas (by id) and of the latest value schema, at the cost of an acceptable refresh delay. It was introduced because using HelixReadOnlySchemaRepository directly in the read compute path caused two problems:

    1. When a ZK disconnect/reconnect happens, HelixReadOnlySchemaRepository refreshes its local cache. HelixReadOnlySchemaRepository.refresh() holds a coarse-grained write lock, so every value schema/latest value schema lookup made by read compute requests is blocked while the refresh runs, which increases the GC count in read compute. The increase is significant (more than doubled in a test cluster) and has caused much higher CPU usage and latency.
    2. HelixReadOnlySchemaRepository does not always return the same object for the same schema, because refresh() always re-creates the Schema objects. This makes the de-serializer lookup in SerializerDeserializerFactory inefficient: the factory compares schema objects to find the corresponding serializer/de-serializer (for read compute, the de-serializer is the concern), and when the objects are not identical it falls back to Schema.hashCode() and Schema.equals(Object). In Avro 1.7 and above, Schema.hashCode() is optimized to be calculated only once if the schema is read-only, but the cost of Schema.equals(Object) cannot be avoided.

    This class works as follows:

    1. It maintains a mapping from each store to its value schemas and its latest value schema.
    2. It reuses the same Schema object for the same schema id within a store, since value schemas are immutable.
    3. It runs a refresh thread that updates the local cache periodically.

    Schema lookups should therefore never need to block on the underlying HelixReadOnlySchemaRepository. In practice, it takes a fairly long time to register a new value schema or latest value schema and start using it in production, so the periodic refresh is sufficient to discover new value schemas and latest value schemas. Because the refresh is asynchronous, a change can take up to 1 minute to become visible, which is acceptable under the same assumption.

    So far, this class only supports lookup of value schemas by id and lookup of the latest value schema.
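    The three mechanisms described for this class (a per-store map, reuse of one Schema object per schema id, and a periodic refresh thread) can be illustrated with the minimal Java sketch below. It is not Venice's implementation: `ValueSchema` is a hypothetical stand-in for an Avro `Schema`, `SchemaSource` is a hypothetical interface standing in for the underlying HelixReadOnlySchemaRepository (which may hand back new objects on every call), and `ValueSchemaCache`, `StoreSchemas`, `addStore` and `refresh` are illustrative names only. The key step is `putIfAbsent`: once an object is cached for an id, later refreshes never replace it, so callers always see the identical object and an identity-based lookup (`==`) can succeed before any `equals` comparison is needed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for an immutable Avro value schema: (id, schema text). */
final class ValueSchema {
  final int id;
  final String text;
  ValueSchema(int id, String text) { this.id = id; this.text = text; }
}

/** Hypothetical upstream repository; like the Helix-backed one, it may return new objects on every call. */
interface SchemaSource {
  Map<Integer, ValueSchema> valueSchemas(String store);
  int latestValueSchemaId(String store);
}

/** Per-store snapshot: schema id -> canonical schema object, plus the latest value schema. */
final class StoreSchemas {
  final Map<Integer, ValueSchema> byId = new ConcurrentHashMap<>();
  volatile ValueSchema latest;
}

/** Illustrative cache: lookups only read local maps, a background thread refreshes them. */
final class ValueSchemaCache implements AutoCloseable {
  private final SchemaSource source;
  private final Map<String, StoreSchemas> stores = new ConcurrentHashMap<>();
  private final ScheduledExecutorService refresher =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "value-schema-cache-refresh");
        t.setDaemon(true);
        return t;
      });

  ValueSchemaCache(SchemaSource source, long refreshIntervalSeconds) {
    this.source = source;
    refresher.scheduleWithFixedDelay(
        this::refreshAll, refreshIntervalSeconds, refreshIntervalSeconds, TimeUnit.SECONDS);
  }

  /** Never touches the upstream repository, so it cannot block on the upstream write lock. */
  ValueSchema getValueSchema(String store, int id) {
    StoreSchemas s = stores.get(store);
    return s == null ? null : s.byId.get(id);
  }

  ValueSchema getLatestValueSchema(String store) {
    StoreSchemas s = stores.get(store);
    return s == null ? null : s.latest;
  }

  /** Hypothetical helper: starts tracking a store and loads its schemas once. */
  void addStore(String store) {
    stores.computeIfAbsent(store, k -> new StoreSchemas());
    refresh(store);
  }

  void refreshAll() {
    for (String store : stores.keySet()) {
      try {
        refresh(store);
      } catch (RuntimeException e) {
        // Keep serving the previous snapshot; an uncaught exception would cancel the schedule.
      }
    }
  }

  void refresh(String store) {
    StoreSchemas s = stores.get(store);
    if (s == null) {
      return;
    }
    for (Map.Entry<Integer, ValueSchema> e : source.valueSchemas(store).entrySet()) {
      // Value schemas are immutable, so the first object cached for an id is kept for good.
      s.byId.putIfAbsent(e.getKey(), e.getValue());
    }
    // Point "latest" at the canonical cached object, not at the freshly fetched copy.
    ValueSchema latest = s.byId.get(source.latestValueSchemaId(store));
    if (latest != null) {
      s.latest = latest;
    }
  }

  @Override
  public void close() {
    refresher.shutdownNow();
  }
}
```

    With a refresh interval of 60 seconds (for example `new ValueSchemaCache(source, 60)`), a newly registered schema becomes visible within roughly one refresh period, matching the "at most 1 min" delay described above. Catching exceptions in `refreshAll` matters because `ScheduledExecutorService.scheduleWithFixedDelay` suppresses all subsequent runs once a run throws.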