Data Querying and Interaction

This test demonstrates the following:

  1. "As a user (developer), I can use Lakehouse APIs or Jupyter notebooks to write Python code to interact with the data in Delta Lake"

    • This test focuses on the Lakehouse API functionality

    • Other ways to test data interaction include using Superset to query the df-delta database, or using JupyterHub notebooks such as the Trino example notebook.

Steps to Validate Lakehouse Querying

  • Browse to the SDL instance URL and log in with the appropriate credentials

  • On the Catalog page, select the target datasource (this procedure uses Palantir as the example), then click Enable

  • To view the data files in the Lakehouse:

    • Browse to https://minio.{{ SDL_URL }}

    • If prompted, select Login with SSO

    • In the User > Object Browser frame, confirm that a datasource bucket named lakehouse-{{ datasource }} exists (in this case, lakehouse-palantir)

    • Open the bucket and browse its directories to verify that _delta_log and/or Parquet data files exist
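If you have S3 access keys for MinIO, the bucket check above can also be scripted. The sketch below is a minimal example, assuming the MinIO Python SDK (the third-party `minio` package) and key-based access instead of the SSO console login; the endpoint host and all helper names are hypothetical, not part of SDL. It counts `_delta_log` entries and Parquet data files in the bucket listing, mirroring the pass condition of this check.

```python
# Sketch only. Assumptions: you hold S3-style access keys for MinIO (the console
# login above uses SSO), and the MinIO S3 API is reachable at the host you pass in.
# Helper names are hypothetical.


def summarize_delta_objects(object_names):
    """Count Delta transaction-log entries and Parquet data files in a list of object keys."""
    log_entries = 0
    parquet_files = 0
    for name in object_names:
        parent_dirs = name.split("/")[:-1]
        if "_delta_log" in parent_dirs:
            log_entries += 1  # e.g. <table>/_delta_log/00000000000000000000.json
        elif name.endswith(".parquet"):
            parquet_files += 1  # data file outside the transaction log
    return {"delta_log": log_entries, "parquet": parquet_files}


def bucket_has_delta_data(object_names):
    """Pass condition from the procedure: _delta_log and/or Parquet data exists."""
    summary = summarize_delta_objects(object_names)
    return summary["delta_log"] > 0 or summary["parquet"] > 0


def list_bucket(endpoint, access_key, secret_key, bucket):
    """Return every object key in the bucket via the MinIO Python SDK (pip install minio)."""
    # Imported here so the pure helpers above run without the extra package.
    from minio import Minio

    client = Minio(endpoint, access_key=access_key, secret_key=secret_key, secure=True)
    return [obj.object_name for obj in client.list_objects(bucket, recursive=True)]
```

For example, `bucket_has_delta_data(list_bucket("minio.example.com", access_key, secret_key, "lakehouse-palantir"))` returns True when the bucket holds Delta log entries or Parquet files. Here `minio.example.com` is a placeholder for the S3 API host of your deployment, which may differ from the console URL.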

If the correct data exists in the datasource bucket, then this test has succeeded.

Steps to Validate Interacting with Lakehouse Data

  • Browse to the SDL instance URL and log in with the appropriate credentials

  • In the left navigation bar, navigate to APIs > Swagger

  • Click on /api/v1/lakehouse

  • Click Authorize and follow the instructions to authorize API calls

  • Expand GET /schemas, click Try it out, and then click Execute

  • Note the response body and copy one of the schema names from the JSON array

  • Expand GET /schemas/{schema} and click Try it out

  • Enter the schema name copied in the previous step into the schema field, and click Execute

  • Note the response body and copy one of the table names from the JSON array

  • Expand POST /schemas/{schema} and click Try it out

  • Enter the schema name copied earlier into the schema field

  • In the Request body field, replace "string" with a SQL query that uses the table name from the previous step (e.g., select * from {{ table_name }} limit 5), and click Execute

  • Check the response: the status code should be 200, and the response body should include rows from the table.
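The Swagger walkthrough above can also be scripted so the check is repeatable. The sketch below is a minimal standard-library Python version. It assumes several details this procedure does not confirm: the base URL (`https://sdl.example.com/api/v1/lakehouse` is a placeholder for your SDL host), bearer-token authentication equivalent to the Swagger Authorize step, JSON arrays of plain strings from the two GET calls, and a JSON-string request body for the POST. Verify these against the Swagger page; all helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumptions (not confirmed by SDL docs): API base path, bearer-token auth,
# GET responses are JSON arrays of strings, POST body is the SQL as a JSON string.
BASE_URL = "https://sdl.example.com/api/v1/lakehouse"  # hypothetical host


def build_request(path, token, sql=None, base_url=BASE_URL):
    """Build a GET request, or a POST request carrying a SQL query when `sql` is given."""
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
    data = None
    if sql is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(sql).encode("utf-8")  # the query encoded as a JSON string
    method = "POST" if sql is not None else "GET"
    return urllib.request.Request(base_url + path, data=data, headers=headers, method=method)


def schema_path(schema):
    """Path for /schemas/{schema}, URL-encoding the schema name."""
    return "/schemas/" + urllib.parse.quote(schema, safe="")


def send(request):
    """Send the request and decode the JSON body; urlopen raises HTTPError on non-2xx."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_validation(token, limit=5):
    """Mirror the Swagger steps: list schemas, list tables, then query one table."""
    schemas = send(build_request("/schemas", token))
    schema = schemas[0]  # first schema returned
    tables = send(build_request(schema_path(schema), token))
    table = tables[0]  # first table in that schema
    query = f"select * from {table} limit {limit}"
    return send(build_request(schema_path(schema), token, sql=query))
```

For example, `print(run_validation(token))` with a token from the Authorize flow prints the query result. Because `urlopen` raises on non-2xx responses, reaching the print means the call succeeded; you still need to confirm that the body contains table rows.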

If the correct data exists in the final response body, then this test has succeeded.