Pinot End to End Test Steps:
Setup (Pre-demo)
- Update the df-raft-trino-coordinator configMap to enable group-based access control:
apiVersion: v1
data:
  access-control.properties: |
    fine-grained-access-control-enabled=true
    access-control.name=OPA
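A minimal sketch of applying the change, assuming the configMap lives in the data-fabric namespace and the coordinator deployment is also named df-raft-trino-coordinator (both assumptions; adjust to your environment):
# Add/overwrite the two access-control properties in the configMap (namespace is an assumption).
kubectl -n data-fabric patch configmap df-raft-trino-coordinator --type merge \
  -p '{"data":{"access-control.properties":"fine-grained-access-control-enabled=true\naccess-control.name=OPA\n"}}'
# Restart the coordinator so it reloads the properties file (deployment name is an assumption).
kubectl -n data-fabric rollout restart deployment/df-raft-trino-coordinator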
Demo (Part 1) – Viewing Pinot Data with Proper Access Control
- Create an enablement for mock UDL.
- Enable the track data set once it is available.
- Next, configure a local terminal to connect to Data Fabric:
# Set environment variables for password and client secret
DF_ADMIN_PASSWORD="admin-user-pw"
DF_BACKEND_CLIENT_SECRET=$(kubectl get secret -n data-fabric keycloak-realm-init \
-o jsonpath='{.data.keycloakAdminClientSecret}' | base64 -d )
BASE_URL="localhost"
# Get access token from Keycloak and save to variable
AUTH_TOKEN=$(curl --request POST \
--url http://$BASE_URL/auth/realms/data-fabric/protocol/openid-connect/token \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data username=admin \
--data password=$DF_ADMIN_PASSWORD \
--data client_secret=$DF_BACKEND_CLIENT_SECRET \
--data client_id=df-backend \
--data scope=openid \
--data grant_type=password | jq -r '.access_token')
echo "Access token: $AUTH_TOKEN"
- Fetch the information necessary to create a Pinot table. It’s best to get the Kafka topic value from the webpage:
# The name of the Pinot schema, NOT the schema in the catalog!
PINOT_SCHEMA=udl
# The name of the Pinot table
NEW_TABLE_NAME=udltrack
RETENTION_DAYS=7
# This command will only fetch the correct kafka topic if there is only a single mock UDL enabled (and nothing else). It's best to copy this value from the frontend manually.
KAFKA_TOPIC=$(kubectl get datasets -A -o jsonpath='{.items[0].metadata.labels.datafabric\.goraft\.tech/dataset}')
# Get kafka password from secret
KAFKA_PASSWORD=$(kubectl -n data-fabric get secrets/df-kafka-user-internal --template={{.data.password}} | base64 -d)
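If more than one data set is enabled, the jsonpath above will not necessarily pick the right one. A sketch for listing every dataset resource with its topic label so the value can be chosen by hand (assumes the same datafabric.goraft.tech/dataset label used above):
# List all dataset resources with their topic label, then set KAFKA_TOPIC manually.
kubectl get datasets -A \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,TOPIC:.metadata.labels.datafabric\.goraft\.tech/dataset'
# KAFKA_TOPIC=<value copied from the list above or from the frontend>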
- POST the schema to SDL. This schema should match the shape of the data the Pinot table is supposed to fetch from Kafka.
# Create Pinot schema
curl -X POST http://$BASE_URL/api/internal/pinot/schemas \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -d '{
    "schemaName": "udl",
    "dimensionFieldSpecs": [
      {"name": "origin", "dataType": "STRING"},
      {"name": "dataMode", "dataType": "STRING"},
      {"name": "asset", "dataType": "INT"},
      {"name": "objType", "dataType": "STRING"},
      {"name": "id", "dataType": "STRING"},
      {"name": "createdBy", "dataType": "STRING"},
      {"name": "classificationMarking", "dataType": "STRING"},
      {"name": "objIdent", "dataType": "STRING"},
      {"name": "origNetwork", "dataType": "STRING"},
      {"name": "trkId", "dataType": "STRING"},
      {"name": "__security__", "dataType": "JSON"}
    ],
    "metricFieldSpecs": [
      {"name": "alt", "dataType": "DOUBLE"},
      {"name": "lon", "dataType": "DOUBLE"},
      {"name": "lat", "dataType": "DOUBLE"}
    ],
    "dateTimeFieldSpecs": [
      {
        "name": "time",
        "dataType": "LONG",
        "notNull": false,
        "format": "1:MILLISECONDS:EPOCH",
        "granularity": "1:MILLISECONDS"
      }
    ]
  }'
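To confirm the schema was registered, it can be read back from the Pinot controller REST API. The service name pinot-controller and the data-fabric namespace below are assumptions; adjust them to your deployment:
# Optional: read the schema back from the Pinot controller (service/namespace names are assumptions).
kubectl -n data-fabric port-forward svc/pinot-controller 9000:9000 &
sleep 2
curl -s http://localhost:9000/schemas/$PINOT_SCHEMA | jq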
- POST the table configuration to SDL. The schema referenced should be the one saved in the prior step.
curl -X POST http://$BASE_URL/api/internal/pinot/tables \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -d "{
    \"tableName\": \"$NEW_TABLE_NAME\",
    \"tableType\": \"REALTIME\",
    \"segmentsConfig\": {
      \"timeColumnName\": \"time\",
      \"timeType\": \"MILLISECONDS\",
      \"retentionTimeUnit\": \"DAYS\",
      \"retentionTimeValue\": $RETENTION_DAYS,
      \"replication\": 1,
      \"replicasPerPartition\": 1,
      \"schemaName\": \"$PINOT_SCHEMA\"
    },
    \"tableIndexConfig\": {
      \"loadMode\": \"MMAP\",
      \"streamConfigs\": {
        \"streamType\": \"kafka\",
        \"stream.kafka.consumer.type\": \"lowLevel\",
        \"stream.kafka.topic.name\": \"$KAFKA_TOPIC\",
        \"stream.kafka.decoder.class.name\": \"org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder\",
        \"stream.kafka.consumer.factory.class.name\": \"org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory\",
        \"stream.kafka.broker.list\": \"df-kafka-bootstrap:9092\",
        \"key.serializer\": \"org.apache.kafka.common.serialization.StringDeserializer\",
        \"value.serializer\": \"org.apache.kafka.common.serialization.StringDeserializer\",
        \"stream.kafka.consumer.prop.auto.offset.reset\": \"smallest\",
        \"security.protocol\": \"SASL_PLAINTEXT\",
        \"sasl.jaas.config\": \"org.apache.kafka.common.security.scram.ScramLoginModule required username=\\\"internalkafkauser\\\" password=\\\"$KAFKA_PASSWORD\\\";\",
        \"sasl.mechanism\": \"SCRAM-SHA-512\",
        \"stream.kafka.decoder.prop.format\": \"JSON\",
        \"stream.kafka.decoder.prop.schema.registry.rest.url\": \"http://df-schema-registry:8081\"
      }
    },
    \"ingestionConfig\": {
      \"transformConfigs\": [
        {
          \"columnName\": \"time\",
          \"transformFunction\": \"now()\"
        }
      ]
    },
    \"tenants\": {
      \"broker\": \"DefaultTenant\",
      \"server\": \"DefaultTenant\"
    },
    \"metadata\": {
      \"customConfigs\": {}
    }
  }"
- With df-backend versions older than 1.15.165, you need to update the catalog data set storages manually. This will trigger the authorization service to create permissions for the data set in Pinot based on the data set enablement’s groups:
# Get information about enabled data sets.
RESPONSE=$(curl "http://localhost/api/v2/catalog/datasources/all/enablements/all/datasets" \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer $AUTH_TOKEN")
# Parse the id of the storage for the enablement. NOTE: This assumes there's ONE enablement only!
STORAGE_ID=$(echo $RESPONSE | jq -r '.[] | select(.name == "track") | .storage[] | select(.name == "default") | .id')
echo $STORAGE_ID
# Parse the "self" link that makes up /datasources/{datasource-id}/enablements/{enablement-id}/datasets/{dataset-id}
PREFIX_URL=$(echo $RESPONSE | jq -r '.[0].links[] | select(.rel == "self") | .href')
echo $PREFIX_URL
STORAGE_URL="${PREFIX_URL}/storages/${STORAGE_ID}"
echo $STORAGE_URL
curl -vvv "http://localhost/api/v2/catalog/$STORAGE_URL" -H 'Content-Type: application/json' -H "Authorization: Bearer $AUTH_TOKEN" | jq
- Sign into Superset, go to the SQL Lab, and try to view data from the Pinot catalog. There should be a default schema and a udltrack table available. Query the data and confirm the results are visible to the user.
- To test the group-based access control, sign out of Data Fabric and sign back in as a user that is NOT in any of the groups the enablement was created for.
- Repeat step 8 (the Superset SQL Lab step). This time, no schema or table suggestions should show up in the dropdown.
- Try to run the query SELECT * FROM default.udltrack and confirm that a Trino error is returned to the user.
Demo (Part 2) – Creating a Dashboard with Pinot Data
- Switch back to the Admin user (or whatever user can view the enabled data set).
- Create a data set from a SQL query:
  - Run the SELECT * FROM default.udltrack LIMIT 100 query.
  - On the “Save” button, hit the dropdown, and save the results as a data set named “test.”
  - Hit “Save and explore” to get redirected to the chart editing page.
- Press “View all charts.”
- On the left side of the screen, click “Map.”
- Select the deck.gl Scatterplot and press the “Select” button.
- Update the “Longitude & Latitude” fields to point to lon and lat, respectively.
- On the “Map” dropdown, play around with the https://map.localhost/geoserver/wms URL until you get the nasa:blue_marble WMS Layer value.
- Increase the point size from 1000 to 7000.
- Change the point color to bright red.
- Click “Update Chart” and wait for the map to render.
- Click “Save” on the top-right to save the chart to an existing dashboard, or to save it to a new dashboard.