Proxy

Data Fabric provides a cache-enabled REST proxy to various external services. The list of currently proxied services is documented on the root API landing.

Purpose

The proxy serves a few primary purposes for Data Fabric clients, including:

  • Provides a central location for all data outreach, which simplifies network egress controls and management.

  • Caches response data to reduce latency and to keep working in denied, disrupted, intermittent, and limited (DDIL) environments.

  • Is instrumented for observability, so all data request activity can be monitored across all data consumer applications (clients of the proxy).

Enablement

Each proxied service must first be [enabled from the Catalog]({{< ref "/ui/catalog#enabling-a-data-source" >}}). During enablement, the Data Fabric user (or system account) provides the credentials necessary to authenticate with the target service.

The credentials are then stored securely in a Kubernetes Secret, and used by Data Fabric to invoke the target service on behalf of the user (or system account).

If a client attempts to invoke a proxied service without first enabling it, Data Fabric responds with a 407 Proxy Authentication Required status code and a message indicating that the service has not been enabled for that client.
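
A client may want to tell the "service not enabled" case apart from other failures. The following is a minimal Python sketch using only the standard library. The names `call_proxy` and `ServiceNotEnabledError`, and the injectable `opener` parameter, are illustrative assumptions and not part of Data Fabric.

```python
import urllib.error
import urllib.request
from typing import Callable


class ServiceNotEnabledError(Exception):
    """Hypothetical error: the proxied service is not enabled for this client."""


def call_proxy(url: str, opener: Callable = urllib.request.urlopen) -> bytes:
    """Fetch `url` through the proxy and return the raw response body."""
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # The proxy answers 407 when the service was never enabled in the Catalog.
        if err.code == 407:
            raise ServiceNotEnabledError(
                f"Service behind {url} is not enabled; enable it from the Catalog first."
            ) from err
        raise  # Any other HTTP error is left for the caller to handle.
```

Passing the opener as a parameter keeps the function easy to exercise without network access.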

Caching

One benefit of going through the Data Fabric proxy is its built-in cache. For any given request, the response from the upstream service is cached locally. On the next request for the same data, the proxy returns the previously cached response. Once the response expires from the cache, the next request fetches from the upstream service again and caches the new response.
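
The flow described above is a read-through cache with expiry. The following Python sketch models it in memory to make the sequence concrete. It is not the proxy's implementation: the class name `ReadThroughCache`, the use of the request path as the cache key, and the fixed TTL are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class ReadThroughCache:
    """Illustrative in-memory model of the proxy's cache flow."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps a cache key to (expiry time, cached response body).
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, key: str, fetch_upstream: Callable[[str], bytes]) -> Tuple[bytes, bool]:
        """Return (response, served_from_cache) for the given key."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            # Unexpired entry: answer from the cache without contacting upstream.
            return entry[1], True
        # Missing or expired entry: fetch from upstream and cache the new response.
        response = fetch_upstream(key)
        self._entries[key] = (now + self._ttl, response)
        return response, False
```

With a 60-second TTL, a second `get` for the same key within that window returns the cached body and `True`. A call after expiry invokes `fetch_upstream` again and replaces the entry.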

Cached Responses

If a response came from the cache, the proxy will add a Cache-Ttl header to the response. The value will be the time left until the response expires from the cache.

Cache-Ttl: 55m47s

The absence of Cache-Ttl in the response indicates the response was fetched from the upstream service.
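
A client can use this header to tell cached responses from fresh ones. The following Python sketch assumes the header value always uses hour, minute, and second units, as in the `55m47s` example. Other units are rejected, because the full value format is not specified here. The function names `parse_cache_ttl` and `remaining_ttl` are hypothetical.

```python
import re
from datetime import timedelta
from typing import Mapping, Optional

# Assumption: values combine h/m/s parts, as in the "55m47s" example.
_TTL_PART = re.compile(r"(\d+(?:\.\d+)?)(h|m|s)")
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_cache_ttl(value: str) -> timedelta:
    """Convert a Cache-Ttl value such as "55m47s" into a timedelta."""
    value = value.strip()
    parts = _TTL_PART.findall(value)
    # Reject empty values and values with characters the pattern did not consume.
    if not parts or "".join(number + unit for number, unit in parts) != value:
        raise ValueError(f"Unrecognized Cache-Ttl value: {value!r}")
    seconds = sum(float(number) * _UNIT_SECONDS[unit] for number, unit in parts)
    return timedelta(seconds=seconds)


def remaining_ttl(headers: Mapping[str, str]) -> Optional[timedelta]:
    """Return the remaining TTL for a cached response, or None if the
    response has no Cache-Ttl header (it was fetched from upstream)."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "cache-ttl":
            return parse_cache_ttl(value)
    return None
```

Returning `None` for a missing header mirrors the rule above: no Cache-Ttl means the response came from the upstream service.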

Controlling the Cache Behavior

There may be scenarios where you want to invalidate or skip over the cache for a particular request (testing a new upstream service, forcing a cache update, etc.).

To do this, the Data Fabric proxy supports an optional Cache-Mode request header, which accepts one of the following values:

  • Default - Default cache behavior (same as omitting the Cache-Mode header altogether).

  • None - Ignore the cache completely (strictly pass-through to the upstream service). Useful for testing if the upstream is currently available without impacting the cache.

  • Only - Only return the cached response. If no cached response exists, the proxy returns a 404 Not Found status code. Useful for avoiding any external outreach attempts.

  • Invalidate - Ignore any existing cached response and fetch from upstream service. Update cache with new response. Useful for purging the cache of old data.
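
The following Python sketch shows one way a client might attach the Cache-Mode header using the standard library. The helper name `build_proxy_request` and the example URL are hypothetical. Only the header name and its four values come from the list above.

```python
import urllib.request
from typing import Optional

# The four values documented for the Cache-Mode request header.
CACHE_MODES = ("Default", "None", "Only", "Invalidate")


def build_proxy_request(url: str, cache_mode: Optional[str] = None) -> urllib.request.Request:
    """Build a request for the proxy, optionally setting Cache-Mode."""
    request = urllib.request.Request(url)
    if cache_mode is not None:
        if cache_mode not in CACHE_MODES:
            raise ValueError(f"Cache-Mode must be one of {CACHE_MODES}, got {cache_mode!r}")
        # urllib stores the name as "Cache-mode"; HTTP header names are
        # case-insensitive, so this is equivalent on the wire.
        request.add_header("Cache-Mode", cache_mode)
    return request


# Hypothetical proxy URL, used only for illustration.
request = build_proxy_request(
    "https://data-fabric.example.com/proxy/example-service/items", "Only"
)
```

The request can then be sent with `urllib.request.urlopen(request)`. With `Only`, a 404 response means that no cached response exists for that request.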