Lakehouse

Data Fabric provides a set of APIs, described in the [Open API Spec]({{<ref "/api" >}}), that expose Lakehouse data to users.

                             .-- Data Fabric -------------------------------------.
                            |                                                      |
                            |                                                      |
  .--------.                |    .---------------.    .--------.                   |
  |        +------------------> / Lakehouse API /--> / Trino  /--.    .--------.   |
  |        |                |  '---------------'    '--------'   |   |          |  |
  +--------+                |                                    |   |'--------'|  |
 /// ____ \\\               |                                    '-->| Lakehouse|  |
'------------'              |                                        |          |  |
    Client                  |                                         '--------'   |
                            |                                                      |
                             '----------------------------------------------------'

Purpose

The Lakehouse APIs let users interact with the Data Fabric Lakehouse.

The Lakehouse follows a medallion architecture, and its layers are exposed through separate schemas.

The Data Fabric uses [Trino](https://trino.io/docs/current/overview.html) as a distributed query engine with a direct connection to the Lakehouse. Trino is useful in this architecture for several reasons:

1. Trino has out-of-the-box support for interacting with the Lakehouse.
2. Trino allows us to implement [Open Policy Agent](https://www.openpolicyagent.org/docs/latest/) for tighter controls on data access.
3. [Trino Gateway](https://github.com/trinodb/trino-gateway) provides a load balancer, proxy server, and configurable routing gateway for multiple Trino clusters.
4. Trino supports large workloads and queries by streaming results.

Endpoints

| Operation | Endpoint | Description |
| --------- | -------- | ----------- |
| GET | api/v1/lakehouse/schemas | Exposes the available schemas. |
| GET | api/v1/lakehouse/schemas/{schema} | Exposes all of the tables that belong to the given {schema}. |
| GET | api/v1/schemas/{schema}/{table} | Exposes all of the column data belonging to the {schema}/{table}. |
| POST | api/v1/schemas/{schema} | Allows the user to submit custom queries to the Lakehouse. |
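
As a rough illustration of how the four endpoints fit together, the sketch below builds their URLs with the Python standard library. The paths are copied from the table as written; the base URL `https://data-fabric.example.com/` and the schema and table names (`gold`, `orders`) are hypothetical placeholders, and the helper names are invented for this example. It only constructs URLs and does not send any requests.

```python
from urllib.parse import quote, urljoin

# Hypothetical base URL (assumption); replace with your Data Fabric host.
BASE_URL = "https://data-fabric.example.com/"


def _segment(value):
    """Percent-encode a single path segment, including any slashes."""
    return quote(value, safe="")


def schemas_url(base=BASE_URL):
    """GET: list the available schemas."""
    return urljoin(base, "api/v1/lakehouse/schemas")


def tables_url(schema, base=BASE_URL):
    """GET: list the tables that belong to a schema."""
    return urljoin(base, f"api/v1/lakehouse/schemas/{_segment(schema)}")


def columns_url(schema, table, base=BASE_URL):
    """GET: column data for a schema/table pair."""
    return urljoin(base, f"api/v1/schemas/{_segment(schema)}/{_segment(table)}")


def query_url(schema, base=BASE_URL):
    """POST: submit a custom query against a schema."""
    return urljoin(base, f"api/v1/schemas/{_segment(schema)}")


if __name__ == "__main__":
    # "gold" and "orders" are made-up names used only for illustration.
    print(schemas_url())
    print(tables_url("gold"))
    print(columns_url("gold", "orders"))
    print(query_url("gold"))
```

The request body expected by the POST endpoint is not described here, so the sketch stops at the URL; consult the [Open API Spec]({{<ref "/api" >}}) for the exact payload.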

Enablement

To use the Data Fabric Lakehouse APIs, a user must have an account with MCS Keycloak or Data Fabric’s Keycloak. The [Open API Spec]({{<ref "/api" >}}) expects a username and password or a JSON Web Token (JWT) to authenticate with the Lakehouse APIs. Upon onboarding, the user must belong to a group that has an associated Open Policy Agent (OPA) policy for data access. If you require access, need to be added to a group, or need a group's policy updated, reach out to our help desk.
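
The sketch below shows one way a client might attach either credential type to a request. It assumes the API accepts standard HTTP `Authorization` headers (`Basic` for username and password, `Bearer` for a JWT); this page does not confirm that, so check the Open API Spec for the actual mechanism. The URL, credentials, and helper names are hypothetical, and the code only builds the request object without sending it.

```python
import base64
import urllib.request


def basic_headers(username, password):
    """Build an HTTP Basic Authorization header (assumed scheme)."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def bearer_headers(jwt):
    """Build a Bearer Authorization header carrying a JWT (assumed scheme)."""
    return {"Authorization": f"Bearer {jwt}"}


def build_request(url, headers, method="GET"):
    """Create a request object; nothing is sent until urlopen() is called."""
    return urllib.request.Request(url, headers=headers, method=method)


if __name__ == "__main__":
    # Hypothetical host and token, for illustration only.
    url = "https://data-fabric.example.com/api/v1/lakehouse/schemas"
    request = build_request(url, bearer_headers("<your-jwt>"))
    print(request.get_method(), request.full_url)
```

Whichever credential is used, the data returned is still limited by the OPA policy attached to the user's group.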