Trino
Data Fabric leverages Trino as a distributed query engine to access data within our Delta Lake, Postgres, Kafka, and Hive.
Trino is required to be deployed with HTTPS enabled as well as JWT as the authentication type. --- # Connecting to Trino via API 1. From the Data Fabric UI, navigate to the Data Fabric OpenAPI spec and select the Lakehouse link. 2. Once on the Swagger page, click on Authorize 3. You have an option of providing a username and password or JWT for authentication. Authentication will be handled by Keycloak and authorization will be handled by OPA policies. 4. Execute any of the APIs listed available for Lakehouse. --- # Connecting to Trino via CLI. 1. Requirements to connect via CLI is that you should be an admin and should have updated the /etc/hosts for trino.localhost or any DNS used other than localhost to correctly map the subdomain to the correct local or VM IP address. 2. SSH onto Trino Coordinator. 3. Example connecting from CLI
trino --server trino.localhost:443 --user=admin --insecure --access-token=<token> --catalog tpch
-
CLI connection template
trino --server trino.<DNS>:443 --user=<username> --insecure --access-token=<token> --catalog <catalog name>
Connecting to Trino via JDBC
Example using Python sqlalchemy. User will need to update DNS and catalog. 1. Pre-requirements: user must have JWT and OPA policy access. 2. Launch python 3 in a terminal. 3. Execute the following. 4. Connection example
import sqlalchemy as db
from sqlalchemy.sql.expression import select, text
from trino.auth import JWTAuthentication
engine = db.create_engine('trino://admin@trino.localhost:443/tpch',connect_args={
"auth": JWTAuthentication("<token>"),
"http_scheme":"https",
"verify": False})
connection = engine.connect()
rows = connection.execute(text('select * from sf1.region')).fetchall()
print(rows)
-
Connection template
import sqlalchemy as db
from sqlalchemy.sql.expression import select, text
from trino.auth import JWTAuthentication
engine = db.create_engine('trino://<user>@trino.<domain>:443/<catalog name>',connect_args={
"auth": JWTAuthentication("<token>"),
"http_scheme":"https",
"verify": False})
connection = engine.connect()
rows = connection.execute(text('<query>')).fetchall()
print(rows)