Performance Efficiency — Scoring Methodology¶

This pillar assesses how well your workspace leverages serverless compute, parallel processing, native Spark operations, Photon, and hardware-appropriate configurations to achieve efficient workload performance.

How scores are calculated¶

The WAF Reload job runs the control scoring query below against Databricks system tables and stores results in waf_cache.waf_controls_p. Each control receives a percentage score (0–100). If the score meets or exceeds the threshold, the control is marked Met.

Control Scoring Query¶

The following SQL computes scores for all Performance Efficiency controls. Run by the WAF Reload job and stored in waf_cache.waf_controls_p.

PE-02-05 and PE-02-07

This is the current (fixed) waf_controls_p query. It adds PE-02-05 (Python UDFs check) and updates PE-02-07 to measure cluster policy attachment for interactive clusters — replacing the earlier serverless-only proxy. The old query (waf_controls_p_old) is excluded.

View SQL

WITH serverless_usage AS (

  SELECT 

    COUNT(*) as total_compute,

    SUM(CASE WHEN usage_type LIKE '%SERVERLESS%' OR sku_name LIKE '%SERVERLESS%' THEN 1 ELSE 0 END) as serverless_count

  FROM system.billing.usage

  WHERE usage_date >= current_date() - INTERVAL 30 DAYS

    AND usage_type LIKE '%COMPUTE%'

),

photon_usage AS (

  SELECT 

    COUNT(*) as total_compute,

    SUM(CASE WHEN product_features.is_photon = true THEN 1 ELSE 0 END) as photon_compute

  FROM system.billing.usage

  WHERE usage_date >= current_date() - INTERVAL 30 DAYS

    AND usage_type LIKE '%COMPUTE%'

    AND billing_origin_product IN ('JOBS', 'INTERACTIVE', 'PIPELINES', 'ALL_PURPOSE')

),

cluster_workers AS (

  SELECT 

    COUNT(*) as total_clusters,

    SUM(CASE WHEN worker_count > 1 THEN 1 ELSE 0 END) as clusters_multi_worker,

    SUM(CASE WHEN worker_count > 3 THEN 1 ELSE 0 END) as clusters_large

  FROM (

    SELECT worker_count, delete_time,

           ROW_NUMBER() OVER (PARTITION BY cluster_id ORDER BY change_time DESC) AS rn

    FROM system.compute.clusters

    WHERE change_time >= current_date() - INTERVAL 30 DAYS

  ) WHERE rn = 1 AND delete_time IS NULL

),

python_udfs AS (

  SELECT COUNT(*) as python_udf_count

  FROM system.information_schema.routines

  WHERE external_language = 'Python'

),

cluster_policies AS (

  SELECT
    COUNT(*) as total_clusters,
    SUM(CASE WHEN policy_id IS NOT NULL THEN 1 ELSE 0 END) as policy_clusters

  FROM (

    SELECT policy_id,

           ROW_NUMBER() OVER (PARTITION BY account_id, workspace_id, cluster_id ORDER BY change_time DESC) AS rn

    FROM system.compute.clusters

    WHERE change_time >= current_date() - INTERVAL 30 DAYS

      AND cluster_source IN ('API', 'UI')

  ) WHERE rn = 1

),

waf_status AS (

  SELECT
    waf_id,
    principle,
    best_practice,
    CASE 
    WHEN waf_id = 'PE-01-01' THEN (
      SELECT CASE WHEN total_compute > 0 THEN (serverless_count * 100.0 / total_compute) ELSE 0 END FROM serverless_usage
    )
    WHEN waf_id = 'PE-01-02' THEN (
      CASE WHEN EXISTS (SELECT 1 FROM system.billing.usage WHERE sku_name LIKE '%SERVERLESS_REAL_TIME_INFERENCE%' LIMIT 1) THEN 100 ELSE 0 END
    )
    WHEN waf_id = 'PE-02-02' THEN (
      SELECT CASE WHEN total_clusters > 0 THEN (clusters_multi_worker * 100.0 / total_clusters) ELSE 0 END FROM cluster_workers
    )
    WHEN waf_id = 'PE-02-04' THEN (
      SELECT CASE WHEN total_clusters > 0 THEN (clusters_large * 100.0 / total_clusters) ELSE 0 END FROM cluster_workers
    )
    WHEN waf_id = 'PE-02-05' THEN (
      SELECT CASE WHEN python_udf_count = 0 THEN 100 ELSE 0 END FROM python_udfs
    )
    WHEN waf_id = 'PE-02-06' THEN (
      SELECT CASE WHEN total_compute > 0 THEN (photon_compute * 100.0 / total_compute) ELSE 0 END FROM photon_usage
    )
    WHEN waf_id = 'PE-02-07' THEN (
      SELECT CASE WHEN total_clusters > 0 THEN (policy_clusters * 100.0 / total_clusters) ELSE 0 END FROM cluster_policies
    )
    ELSE 0
    END AS current_percentage,

    CASE 
    WHEN waf_id = 'PE-01-01' AND (
      SELECT CASE WHEN total_compute > 0 THEN (serverless_count * 100.0 / total_compute) ELSE 0 END FROM serverless_usage
    ) >= 50 THEN 'Yes'
    WHEN waf_id = 'PE-01-02' AND EXISTS (
      SELECT 1 FROM system.billing.usage WHERE sku_name LIKE '%SERVERLESS_REAL_TIME_INFERENCE%' LIMIT 1
    ) THEN 'Yes'
    WHEN waf_id = 'PE-02-02' AND (
      SELECT CASE WHEN total_clusters > 0 THEN (clusters_multi_worker * 100.0 / total_clusters) ELSE 0 END FROM cluster_workers
    ) >= 80 THEN 'Yes'
    WHEN waf_id = 'PE-02-04' AND (
      SELECT CASE WHEN total_clusters > 0 THEN (clusters_large * 100.0 / total_clusters) ELSE 0 END FROM cluster_workers
    ) >= 50 THEN 'Yes'
    WHEN waf_id = 'PE-02-05' AND (
      SELECT python_udf_count FROM python_udfs
    ) = 0 THEN 'Yes'
    WHEN waf_id = 'PE-02-06' AND (
      SELECT CASE WHEN total_compute > 0 THEN (photon_compute * 100.0 / total_compute) ELSE 0 END FROM photon_usage
    ) >= 80 THEN 'Yes'
    WHEN waf_id = 'PE-02-07' AND (
      SELECT CASE WHEN total_clusters > 0 THEN (policy_clusters * 100.0 / total_clusters) ELSE 0 END FROM cluster_policies
    ) >= 50 THEN 'Yes'
    ELSE 'No'
    END AS implemented

  FROM (
    SELECT * FROM VALUES
    ('PE-01-01', 'Utilize serverless capabilities', 'Use serverless architecture'),
    ('PE-01-02', 'Utilize serverless capabilities', 'Use an enterprise grade model serving service'),
    ('PE-02-02', 'Design workloads for performance', 'Use parallel computation where it is beneficial'),
    ('PE-02-04', 'Design workloads for performance', 'Prefer larger clusters'),
    ('PE-02-05', 'Design workloads for performance', 'Use native Spark operations'),
    ('PE-02-06', 'Design workloads for performance', 'Use native platform engines'),
    ('PE-02-07', 'Design workloads for performance', 'Understand your hardware and workload type')
    AS waf(waf_id, principle, best_practice)
  )

)

SELECT
  waf_id,
  principle,
  best_practice,
  ROUND(current_percentage, 1) as score_percentage,
  CASE 
  WHEN waf_id = 'PE-01-01' THEN 50
  WHEN waf_id = 'PE-01-02' THEN 100
  WHEN waf_id = 'PE-02-02' THEN 80
  WHEN waf_id = 'PE-02-04' THEN 50
  WHEN waf_id = 'PE-02-05' THEN 100
  WHEN waf_id = 'PE-02-06' THEN 80
  WHEN waf_id = 'PE-02-07' THEN 50
  END as threshold_percentage,
  CASE 
  WHEN implemented = 'Yes' THEN 'Met'
  ELSE 'Not Met'
  END as threshold_met,
  implemented

FROM waf_status

ORDER BY principle, waf_id;

Controls¶

PE-01-01 — Use serverless architecture¶

Field	Value
Principle	Utilize serverless capabilities
Threshold	50%
waf_cache table	`waf_controls_p`

What it measures

% of compute usage running on serverless SKUs (from system.billing.usage where sku_name or usage_type contains SERVERLESS).

Recommendation if Not Met

Drive more workloads onto serverless compute for faster startup and better efficiency.

Serverless SQL warehouses: https://docs.databricks.com/aws/en/admin/sql/serverless
Serverless pipelines: https://docs.databricks.com/aws/en/ldp/serverless
Start with interactive BI + scheduled queries; then expand to eligible ETL.

Detail Query

View SQL

WITH CTE AS (
  SELECT
    COUNT(*) AS runs,
    SUM(usage_quantity) AS dbu_usage,
    CASE WHEN billing_origin_product = 'ALL_PURPOSE' THEN 'INTERACTIVE'
         ELSE billing_origin_product END AS billing_origin_product,
    CASE WHEN sku_name LIKE '%SERVERLESS%' OR product_features.is_serverless = true
         THEN true ELSE false END AS is_serverless
  FROM system.billing.usage 
  WHERE usage_date BETWEEN current_date() - :rollback_days AND current_date()
    AND usage_unit = 'DBU'
    AND billing_origin_product IN ('JOBS','MODEL_SERVING','LAKEFLOW_CONNECT','SQL','INTERACTIVE','DLT','ALL_PURPOSE')
  GROUP BY 2, 4
)
SELECT 
    billing_origin_product,
    SUM(runs) AS runs_total,
    SUM(dbu_usage) AS dbu_usage_total,
    SUM(CASE WHEN is_serverless = true THEN dbu_usage ELSE 0 END) AS sum_serverless_dbu,
    SUM(CASE WHEN is_serverless = true THEN dbu_usage ELSE 0 END)/SUM(dbu_usage) AS pct_serverless_dbu
FROM CTE
GROUP BY billing_origin_product 
ORDER BY runs_total DESC;

PE-01-02 — Use an enterprise grade model serving service¶

Field	Value
Principle	Utilize serverless capabilities
Threshold	100%
waf_cache table	`waf_controls_p`

What it measures

Enterprise-grade real-time inference usage present (checks sku_name contains SERVERLESS_REAL_TIME_INFERENCE). Score is 100 if any such usage exists, 0 otherwise.

Recommendation if Not Met

Adopt Mosaic AI Model Serving (including serverless real-time inference SKUs) for enterprise inference.

Model Serving overview: https://docs.databricks.com/aws/en/machine-learning/model-serving/
Use route-optimized endpoints and autoscaling to meet latency/SLA targets.
Monitor serving costs via system tables: https://docs.databricks.com/aws/en/admin/system-tables/model-serving-cost

Detail Query

View SQL

select usage_metadata.endpoint_name endpoint_name,
  billing_origin_product,
  sum(usage_quantity) usage_dbus
from system.billing.usage
WHERE sku_name LIKE '%SERVERLESS_REAL_TIME_INFERENCE%'
  AND usage_date >= current_timestamp() - interval 7 days
  AND usage_metadata.endpoint_name is not null
group by endpoint_name, billing_origin_product
order by usage_dbus desc

PE-02-02 — Use parallel computation where it is beneficial¶

Field	Value
Principle	Design workloads for performance
Threshold	80%
waf_cache table	`waf_controls_p`

What it measures

% of clusters with more than 1 worker (parallel compute), from system.compute.clusters.

Recommendation if Not Met

Refactor workloads to exploit parallelism: partitioning, parallel reads/writes, and adequate worker counts.

Use compute autoscaling where appropriate: https://docs.databricks.com/aws/en/compute/configure
Avoid single-node clusters for distributed ETL unless it's intentionally small.
Benchmark and right-size using job metrics and query history.

Detail Query

View SQL

-- waf_PE-02-*_cluster_metrics
WITH usage AS (    
  SELECT usage_metadata.cluster_id AS cluster_id, account_id, workspace_id,
    count(*) as runs, SUM(usage_quantity) AS dbu_usage
  FROM system.billing.usage 
  WHERE usage_date BETWEEN current_date()-30 AND current_date()
    AND usage_metadata.cluster_id IS NOT NULL
  GROUP BY account_id, workspace_id, usage_metadata.cluster_id
),
compute_met AS (
  SELECT * FROM (
    select row_number() over(partition by account_id, workspace_id, cluster_id order by change_time desc) AS rn,
      account_id, workspace_id, c.cluster_id, c.cluster_name, c.worker_node_type,
      worker_count, max_autoscale_workers, min_autoscale_workers
    from system.compute.clusters c
  )
  WHERE rn = 1
)
SELECT *, row_number() over(order by dbu_usage desc) AS rank
FROM (
  SELECT SUM(u.dbu_usage) AS dbu_usage, SUM(u.runs) AS runs, c.cluster_id, c.cluster_name, c.worker_node_type,
    CASE WHEN ifnull(worker_count,ifnull(max_autoscale_workers,0)) > 1 THEN 'Multi-Node' ELSE 'Single-Node' END AS is_multi_worker,
    ifnull(worker_count,ifnull(max_autoscale_workers,0)) AS max_worker_count,
    CASE WHEN ifnull(c.min_autoscale_workers,0) = ifnull(c.max_autoscale_workers,0) THEN 0 ELSE 1 END AS is_autoscaling
  FROM usage u
  INNER JOIN compute_met c ON u.cluster_id = c.cluster_id AND u.account_id = c.account_id AND u.workspace_id = c.workspace_id
  GROUP BY c.cluster_id, c.cluster_name, c.worker_node_type,
    CASE WHEN ifnull(worker_count,ifnull(max_autoscale_workers,0)) > 1 THEN 'Multi-Node' ELSE 'Single-Node' END,
    ifnull(worker_count,ifnull(max_autoscale_workers,0)),
    CASE WHEN ifnull(c.min_autoscale_workers,0) = ifnull(c.max_autoscale_workers,0) THEN 0 ELSE 1 END
)

PE-02-04 — Prefer larger clusters¶

Field	Value
Principle	Design workloads for performance
Threshold	50%
waf_cache table	`waf_controls_p`

What it measures

% of clusters considered 'large' by worker count thresholds in query (worker_count > 3).

Recommendation if Not Met

Right-size compute for heavy workloads (scale up/out) and avoid chronic under-provisioning.

Compute configuration (workers, autoscaling): https://docs.databricks.com/aws/en/compute/configure
Use workload-specific policy families (ETL-large, BI-large) and enforce minimums.
Track saturation symptoms (spill, long GC, queueing) and iterate.

PE-02-05 — Use native Spark operations¶

Field	Value
Principle	Design workloads for performance
Threshold	100%
waf_cache table	`waf_controls_p`

What it measures

Score = 100 if no Python UDFs found in system.information_schema.routines; 0 if any Python UDFs exist.

Why Python UDFs hurt performance

Python UDFs introduce serialization overhead between the JVM and Python runtime, and prevent the Catalyst optimizer from pushing predicates through UDF boundaries. Native Spark SQL functions and pandas UDFs (vectorized) avoid these costs.

Recommendation if Not Met

Migrate Python UDFs to native Spark SQL functions or Spark DataFrame operations.

Use Spark SQL built-in functions wherever possible: https://spark.apache.org/docs/latest/api/sql/
Where UDFs are necessary, prefer pandas UDFs (vectorized) over row-based Python UDFs.
Review existing Python UDFs with: SELECT routine_catalog, routine_schema, routine_name FROM system.information_schema.routines WHERE external_language = 'Python'

Detail Query

View SQL

-- waf_PE-02-05_python_udfs
SELECT COUNT(*) as count, routine_catalog as catalog_name,
  routine_schema AS schema_name,
  concat(routine_catalog,'.',routine_schema) AS full_schema
FROM system.information_schema.routines
WHERE external_language = 'Python' 
GROUP BY routine_catalog, routine_schema 
ORDER BY count DESC;

PE-02-06 — Use native platform engines¶

Field	Value
Principle	Design workloads for performance
Threshold	80%
waf_cache table	`waf_controls_p`

What it measures

% of compute usage with Photon enabled (product_features.is_photon = true or sku_name LIKE '%PHOTON%').

Recommendation if Not Met

Enable Photon broadly for eligible workloads to improve performance efficiency.

Photon overview and enablement: https://docs.databricks.com/aws/en/compute/photon
Validate with representative ETL/SQL jobs; monitor runtime/DBU changes.

Detail Query

View SQL

-- waf_PE-02-06_photon_workloads
WITH CTE AS (
  SELECT
    COUNT(*) AS runs,
    SUM(usage_quantity) AS dbu_usage,
    billing_origin_product,
    CASE WHEN sku_name LIKE '%PHOTON%' OR product_features.is_photon = true
         THEN true ELSE false END AS is_photon
  FROM system.billing.usage 
  WHERE usage_date BETWEEN current_date()-30 AND current_date()
    AND billing_origin_product IN ('JOBS','LAKEFLOW_CONNECT','VECTOR_SEARCH','DATABASE','DLT','ALL_PURPOSE','ONLINE_TABLES','INTERACTIVE')
    AND usage_unit = 'DBU'
  GROUP BY billing_origin_product, 4
)
SELECT 
    billing_origin_product,
    SUM(runs) AS runs_total,
    SUM(dbu_usage) AS dbu_usage_total,
    SUM(CASE WHEN is_photon = true THEN dbu_usage ELSE 0 END) AS sum_photon_dbu,
    SUM(CASE WHEN is_photon = true THEN dbu_usage ELSE 0 END)/SUM(dbu_usage) AS pct_photon_dbu
FROM CTE
GROUP BY billing_origin_product 
ORDER BY runs_total DESC;

PE-02-07 — Understand your hardware and workload type¶

Field	Value
Principle	Design workloads for performance
Threshold	50%
waf_cache table	`waf_controls_p`

What it measures

% of active interactive clusters (cluster_source IN ('API', 'UI')) that have a cluster policy attached (policy_id IS NOT NULL).

PE-02-07 fix

An earlier version of this control used serverless adoption as a proxy for hardware awareness. The current query directly measures policy attachment on interactive clusters, which more accurately reflects whether teams are being guided to appropriate hardware families.

Recommendation if Not Met

Attach cluster policies to interactive clusters to enforce hardware selection and workload guidelines.

Cluster policy overview: https://docs.databricks.com/aws/en/compute/clusters-manage#cluster-policy
Create policies per workload type (ETL, BI, ML) with appropriate instance families and sizes.
Policies help ensure appropriate hardware is selected for each workload type.