Data Querying and Interaction
The significance of this test is to demonstrate the following:
- "As a user (developer), I can use Lakehouse APIs or Jupyter notebooks to write Python code to interact with the data in the delta lake" (see the notebook sketch below)
- This test focuses on the Lakehouse API functionality.
- Other ways to test interacting with data would be to use Superset to interact with the df-delta database, or to use Jupyterhub notebooks such as the Trino example notebook.
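For context, the sketch below shows the kind of Python a developer might run from a Jupyterhub notebook to query the delta lake via the Trino Python client. The hostname, credentials, schema, table, and the catalog name (assumed here to correspond to the df-delta database mentioned above) are placeholders, not part of the validated procedure:

    # Sketch: query the delta lake from a notebook via the Trino Python client.
    from trino.dbapi import connect
    from trino.auth import BasicAuthentication

    conn = connect(
        host="trino.{{ SDL_URL }}",        # assumed Trino endpoint for the SDL instance
        port=443,
        user="{{ username }}",
        http_scheme="https",
        auth=BasicAuthentication("{{ username }}", "{{ password }}"),
        catalog="df-delta",                # assumed to match the df-delta database above
        schema="{{ schema }}",
    )
    cur = conn.cursor()
    cur.execute("select * from {{ table_name }} limit 5")
    for row in cur.fetchall():
        print(row)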
Steps to Validate Lakehouse Querying
- Browse to the SDL instance URL and log in with appropriate credentials
- On the Catalog page, select the target datasource (example datasource for this procedure: Palantir), then click Enable
- To view data files via Lakehouse:
  - Browse to https://minio.{{ SDL_URL }}
  - If prompted, select Login with SSO
  - In the User > Object Browser frame, there should be a datasource bucket named lakehouse-{{ datasource }} - in this case lakehouse-palantir
  - Open the data bucket and explore the directories to verify that delta_log and/or parquet data exists (a scripted version of this check is sketched below)
If the correct data exists in the datasource bucket, then this test has succeeded.
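Optionally, the bucket check can be scripted. The sketch below uses the MinIO Python client against the example lakehouse-palantir bucket; the access-key credentials (rather than SSO login) and the _delta_log / .parquet naming are assumptions to adapt to your environment:

    # Sketch: confirm the datasource bucket contains delta log and/or parquet objects.
    from minio import Minio

    client = Minio(
        "minio.{{ SDL_URL }}",             # MinIO endpoint from the steps above
        access_key="{{ access_key }}",     # assumed access-key credentials (not SSO)
        secret_key="{{ secret_key }}",
        secure=True,
    )
    names = [obj.object_name for obj in client.list_objects("lakehouse-palantir", recursive=True)]
    print("delta log present:", any("_delta_log" in n for n in names))
    print("parquet present:", any(n.endswith(".parquet") for n in names))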
Steps to Validate Interacting with Data via Lakehouse
- Browse to the SDL instance URL and log in with appropriate credentials
- In the left navigation bar, navigate to APIs > Swagger
- Click on /api/v1/lakehouse
- Click Authorize and follow the instructions to authorize the API call
- Expand GET /schemas, click Try it out, then click Execute
- Note the response body, and copy one of the schemas in the JSON array
- Expand GET /schemas/{schema} and click Try it out
- Input the schema name copied from the previous step into the schema property field of this API call, and click Execute
- Note the response body, and copy one of the tables in the JSON array
- Expand POST /schemas/{schema} and click Try it out
- Input the schema name copied from the previous steps into the schema property field of this API call
- In the Request body field, input a SQL query in place of "string", using the table name from the previous step, e.g. select * from {{ table_name }} limit 5, and click Execute
- Note the response: it should return a 200 code, and the response body should include data from the table (an equivalent scripted sequence is sketched below).
If the correct data exists in the final response body, then this test has succeeded.
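For reference, the same three Swagger calls can be scripted with the requests library. The bearer token, the base URL, and the request-body shape for the POST (assumed here to be a raw JSON string, matching the "string" placeholder in Swagger) are assumptions to adapt to your environment:

    # Sketch: the GET /schemas, GET /schemas/{schema}, and POST /schemas/{schema} calls from Python.
    import requests

    BASE = "https://{{ SDL_URL }}/api/v1/lakehouse"   # assumed base URL for the Lakehouse API
    HEADERS = {"Authorization": "Bearer {{ token }}"} # token obtained via the Authorize step

    schemas = requests.get(f"{BASE}/schemas", headers=HEADERS).json()
    schema = schemas[0]                               # pick one schema from the JSON array

    tables = requests.get(f"{BASE}/schemas/{schema}", headers=HEADERS).json()
    table = tables[0]                                 # pick one table from the JSON array

    # Assumed request-body shape: a raw JSON string, matching the "string" placeholder in Swagger
    resp = requests.post(
        f"{BASE}/schemas/{schema}",
        headers=HEADERS,
        json=f"select * from {table} limit 5",
    )
    print(resp.status_code)   # expect 200
    print(resp.json())        # should contain rows from the table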