Pinot End to End Test Steps:
Setup (Pre-demo)
- Update the df-raft-trino-coordinator configMap to enable group-based access control:
apiVersion: v1
data:
  access-control.properties: |
    fine-grained-access-control-enabled=true
    access-control.name=OPA
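A minimal sketch of applying the change, assuming the configMap lives in the data-fabric namespace and the coordinator deployment is also named df-raft-trino-coordinator (both assumptions; adjust to your environment):
# Add/overwrite the two access-control properties in the configMap (namespace is an assumption).
kubectl -n data-fabric patch configmap df-raft-trino-coordinator --type merge \
  -p '{"data":{"access-control.properties":"fine-grained-access-control-enabled=true\naccess-control.name=OPA\n"}}'
# Restart the coordinator so it reloads the properties file (deployment name is an assumption).
kubectl -n data-fabric rollout restart deployment/df-raft-trino-coordinator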
Demo (Part 1) – Viewing Pinot Data with Proper Access Control
- Create an enablement for mock UDL.
- Enable the track data set once it is available.
- Next, configure a local terminal to connect to Data Fabric:
# Set environment variables for password and client secret
DF_ADMIN_PASSWORD="admin-user-pw"
DF_BACKEND_CLIENT_SECRET=$(kubectl get secret -n data-fabric keycloak-realm-init \
-o jsonpath='{.data.keycloakAdminClientSecret}' | base64 -d )
BASE_URL="localhost"
# Get access token from Keycloak and save to variable
AUTH_TOKEN=$(curl --request POST \
--url http://$BASE_URL/auth/realms/data-fabric/protocol/openid-connect/token \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data username=admin \
--data password=$DF_ADMIN_PASSWORD \
--data client_secret=$DF_BACKEND_CLIENT_SECRET \
--data client_id=df-backend \
--data scope=openid \
--data grant_type=password | jq -r '.access_token')
echo "Access token: $AUTH_TOKEN"
- Fetch the information necessary to create a Pinot table. It’s best to get the Kafka topic value from the webpage:
# The name of the Pinot schema, NOT the schema in the catalog!
PINOT_SCHEMA=udl
# The name of the Pinot table
NEW_TABLE_NAME=udltrack
RETENTION_DAYS=7
# This command will only fetch the correct kafka topic if there is only a single mock UDL enabled (and nothing else). It's best to copy this value from the frontend manually.
KAFKA_TOPIC=$(kubectl get datasets -A -o jsonpath='{.items[0].metadata.labels.datafabric\.goraft\.tech/dataset}')
# Get kafka password from secret
KAFKA_PASSWORD=$(kubectl -n data-fabric get secrets/df-kafka-user-internal --template={{.data.password}} | base64 -d)
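If more than one data set is enabled, the jsonpath above will not necessarily pick the right one. A sketch for listing every dataset resource with its topic label so the value can be chosen by hand (assumes the same datafabric.goraft.tech/dataset label used above):
# List all dataset resources with their topic label, then set KAFKA_TOPIC manually.
kubectl get datasets -A \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,TOPIC:.metadata.labels.datafabric\.goraft\.tech/dataset'
# KAFKA_TOPIC=<value copied from the list above or from the frontend>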
- POST the schema to SDL. This schema should match the shape of the data the Pinot table is supposed to fetch from Kafka.
# Create Pinot schema
curl -X POST http://$BASE_URL/api/internal/pinot/schemas \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -d '{
    "schemaName": "udl",
    "dimensionFieldSpecs": [
      {"name": "origin", "dataType": "STRING"},
      {"name": "dataMode", "dataType": "STRING"},
      {"name": "asset", "dataType": "INT"},
      {"name": "objType", "dataType": "STRING"},
      {"name": "id", "dataType": "STRING"},
      {"name": "createdBy", "dataType": "STRING"},
      {"name": "classificationMarking", "dataType": "STRING"},
      {"name": "objIdent", "dataType": "STRING"},
      {"name": "origNetwork", "dataType": "STRING"},
      {"name": "trkId", "dataType": "STRING"},
      {"name": "__security__", "dataType": "JSON"}
    ],
    "metricFieldSpecs": [
      {"name": "alt", "dataType": "DOUBLE"},
      {"name": "lon", "dataType": "DOUBLE"},
      {"name": "lat", "dataType": "DOUBLE"}
    ],
    "dateTimeFieldSpecs": [
      {
        "name": "time",
        "dataType": "LONG",
        "notNull": false,
        "format": "1:MILLISECONDS:EPOCH",
        "granularity": "1:MILLISECONDS"
      }
    ]
  }'
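To confirm the schema was registered, it can be read back from the Pinot controller REST API. The service name pinot-controller and the data-fabric namespace below are assumptions; adjust them to your deployment:
# Optional: read the schema back from the Pinot controller (service/namespace names are assumptions).
kubectl -n data-fabric port-forward svc/pinot-controller 9000:9000 &
sleep 2
curl -s http://localhost:9000/schemas/$PINOT_SCHEMA | jq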
- POST the table configuration to SDL. The schema referenced should be the one saved in the prior step.
curl -X POST http://$BASE_URL/api/internal/pinot/tables \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -d "{
    \"tableName\": \"$NEW_TABLE_NAME\",
    \"tableType\": \"REALTIME\",
    \"segmentsConfig\": {
      \"timeColumnName\": \"time\",
      \"timeType\": \"MILLISECONDS\",
      \"retentionTimeUnit\": \"DAYS\",
      \"retentionTimeValue\": $RETENTION_DAYS,
      \"replication\": 1,
      \"replicasPerPartition\": 1,
      \"schemaName\": \"$PINOT_SCHEMA\"
    },
    \"tableIndexConfig\": {
      \"loadMode\": \"MMAP\",
      \"streamConfigs\": {
        \"streamType\": \"kafka\",
        \"stream.kafka.consumer.type\": \"lowLevel\",
        \"stream.kafka.topic.name\": \"$KAFKA_TOPIC\",
        \"stream.kafka.decoder.class.name\": \"org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder\",
        \"stream.kafka.consumer.factory.class.name\": \"org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory\",
        \"stream.kafka.broker.list\": \"df-kafka-bootstrap:9092\",
        \"key.serializer\": \"org.apache.kafka.common.serialization.StringDeserializer\",
        \"value.serializer\": \"org.apache.kafka.common.serialization.StringDeserializer\",
        \"stream.kafka.consumer.prop.auto.offset.reset\": \"smallest\",
        \"security.protocol\": \"SASL_PLAINTEXT\",
        \"sasl.jaas.config\": \"org.apache.kafka.common.security.scram.ScramLoginModule required username=\\\"internalkafkauser\\\" password=\\\"$KAFKA_PASSWORD\\\";\",
        \"sasl.mechanism\": \"SCRAM-SHA-512\",
        \"stream.kafka.decoder.prop.format\": \"JSON\",
        \"stream.kafka.decoder.prop.schema.registry.rest.url\": \"http://df-schema-registry:8081\"
      }
    },
    \"ingestionConfig\": {
      \"transformConfigs\": [
        {
          \"columnName\": \"time\",
          \"transformFunction\": \"now()\"
        }
      ]
    },
    \"tenants\": {
      \"broker\": \"DefaultTenant\",
      \"server\": \"DefaultTenant\"
    },
    \"metadata\": {
      \"customConfigs\": {}
    }
  }"
- With df-backend versions older than 1.15.165, you need to update the catalog data set storages manually. This will trigger the authorization service to create permissions for the data set in Pinot based on the data set enablement’s groups:
# Get information about enabled data sets.
RESPONSE=$(curl "http://localhost/api/v2/catalog/datasources/all/enablements/all/datasets" \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer $AUTH_TOKEN")
# Parse the id of the storage for the enablement. NOTE: This assumes there's ONE enablement only!
STORAGE_ID=$(echo $RESPONSE | jq -r '.[] | select(.name == "track") | .storage[] | select(.name == "default") | .id')
echo $STORAGE_ID
# Parse the "self" link that makes up /datasources/{datasource-id}/enablements/{enablement-id}/datasets/{dataset-id}
PREFIX_URL=$(echo $RESPONSE | jq -r '.[0].links[] | select(.rel == "self") | .href')
echo $PREFIX_URL
STORAGE_URL="${PREFIX_URL}/storages/${STORAGE_ID}"
echo $STORAGE_URL
curl -vvv "http://localhost/api/v2/catalog/$STORAGE_URL" -H 'Content-Type: application/json' -H "Authorization: Bearer $AUTH_TOKEN" | jq
- Sign into Superset, go to the SQL Lab, and try to view data from the Pinot catalog. There should be a default schema and a udltrack table available. Query the data and confirm the results are visible to the user.
- To test the group-based access control, sign out of Data Fabric and sign back in as a user that is NOT in any of the groups the enablement was created for.
- Repeat step 8 (the Superset SQL Lab step). This time, no schema or table suggestions should show up in the dropdown.
- Try to run the query SELECT * FROM default.udltrack and confirm that a Trino error is returned to the user.
Demo (Part 2) – Creating a Dashboard with Pinot Data
- Switch back to the Admin user (or whatever user can view the enabled data set).
- Create a data set from a SQL query:
  - Run the SELECT * FROM default.udltrack LIMIT 100 query.
  - On the “Save” button, hit the dropdown, and save the results as a data set named “test.”
  - Hit “Save and explore” to get redirected to the chart editing page.
- Press “View all charts.”
- On the left side of the screen, click “Map.”
- Select the deck.gl Scatterplot and press the “Select” button.
- Update the “Longitude & Latitude” fields to point to lon and lat, respectively.
- On the “Map” dropdown, play around with the https://map.localhost/geoserver/wms URL until you get the nasa:blue_marble WMS Layer value.
- Increase the point size from 1000 to 7000.
- Change the point color to bright red.
- Click “Update Chart” and wait for the map to render.
- Click “Save” on the top-right to save the chart to an existing dashboard, or to save it to a new dashboard.