Column Masking Strategies
with ABAC

Master six column masking techniques using Databricks ABAC โ€” from simple PII redaction to VARIANT and STRUCT masking, SHA2 pseudonymization, and production-grade PCI-DSS and HIPAA PHI masking patterns. All policies are tag-driven and schema-scoped.

How Column Masking Works

A column mask policy attaches a mask function to columns matching a tag condition. When a non-exempt user queries the table, the function is called with the column value and its return replaces the real value โ€” transparently, at query time.

๐Ÿšจ

Critical: Return Type Must Match Column Type

The mask function must return the same data type as the column. A mask on a DATE column must return DATE. A mask on a DOUBLE column must return DOUBLE. Returning STRING from a DATE column causes a cast error at query time.

โš ๏ธ

EXCEPT only works in CREATE POLICY

ALTER TABLE ... SET MASK function EXCEPT group is not valid syntax. Use CREATE COLUMN MASK POLICY ... EXCEPT group for admin bypass.

Prerequisites

Governed Tags

  • abac_tut_pii (ssn, email, name, phone, address)
  • abac_tut_sensitivity (internal, confidential)
  • pci_data (credit_card, ssn, income)
  • phi_level (name, dob, id, contact, diagnosis)

Account Groups

  • abac_tut_admins
  • abac_tut_hr_team
  • abac_tut_finance_team

6-Step Core Tutorial

1

Basic String Redaction

The simplest mask โ€” replace any PII-tagged column value with a fixed string. The policy uses has_tag() to catch all PII columns in one shot.

SQL
CREATE OR REPLACE FUNCTION {catalog}.{schema}.redact(val STRING)
RETURNS STRING
RETURN '***REDACTED***'

CREATE OR REPLACE POLICY redact_pii
ON SCHEMA {catalog}.{schema}
COLUMN MASK {catalog}.{schema}.redact
TO `account users`
EXCEPT abac_tut_admins
FOR TABLES
MATCH COLUMNS has_tag('abac_tut_pii') AS m
ON COLUMN m

Admin sees

alice@example.com
123-45-6789

Non-admin sees

***REDACTED***
***REDACTED***
2

Partial Reveal โ€” SSN Last 4

Show only the last 4 digits of SSN. Useful when users need to verify identity without exposing the full value. Drop redact_pii first to avoid conflict.

SQL
CREATE OR REPLACE FUNCTION {catalog}.{schema}.mask_ssn(val STRING)
RETURNS STRING
RETURN CONCAT('***-**-', RIGHT(val, 4))

Before

123-45-6789

After

***-**-6789
3

SHA2 Pseudonymization

SHA2 hashing produces a consistent masked value per input โ€” the same email always hashes to the same token. This preserves referential integrity for analytics while preventing reverse-lookup of real values.

SQL
CREATE OR REPLACE FUNCTION {catalog}.{schema}.pseudonymize(val STRING)
RETURNS STRING
RETURN SHA2(val, 256)

Before

alice@example.com

After

a948904f2f0f479b...
4

VARIANT Column Masking

VARIANT columns can't be matched by tag in ALTER TABLE SET MASK. The solution: tag the column, then use a schema-level policy with has_tag_value().

SQL
-- 1. Tag the column
ALTER TABLE {catalog}.{schema}.employees
  ALTER COLUMN metadata SET TAGS ('abac_tut_sensitivity' = 'internal')

-- 2. Simple mask function โ€” returns STRING VARIANT for any input type
CREATE OR REPLACE FUNCTION {catalog}.{schema}.flexible_mask(val VARIANT)
RETURNS VARIANT
RETURN parse_json('"***"')

-- 3. Schema policy using the tag
CREATE OR REPLACE POLICY flexible_mask_metadata
ON SCHEMA {catalog}.{schema}
COLUMN MASK {catalog}.{schema}.flexible_mask
TO `account users`
EXCEPT abac_tut_admins
FOR TABLES
MATCH COLUMNS has_tag_value('abac_tut_sensitivity', 'internal') AS m
ON COLUMN m
5

STRUCT Column Masking

STRUCT columns can be masked at the field level while preserving the structure. Use ALTER TABLE ... SET MASK directly โ€” no tag needed for a single known column.

SQL
CREATE OR REPLACE FUNCTION {catalog}.{schema}.mask_contact_info(
  val STRUCT<primary_email:STRING, secondary_email:STRING, phone:STRING>
)
RETURNS STRUCT<primary_email:STRING, secondary_email:STRING, phone:STRING>
RETURN named_struct(
  'primary_email',   '***REDACTED***',
  'secondary_email',  '***REDACTED***',
  'phone',            CONCAT('***-', RIGHT(val.phone, 4))
)

ALTER TABLE {catalog}.{schema}.employees
  ALTER COLUMN contact_info SET MASK {catalog}.{schema}.mask_contact_info
6

Combining Multiple Policies

Different mask strategies for different PII types โ€” all active simultaneously. Each policy targets a specific tag value, so they apply to different columns with no conflict.

SQL
-- SSN: last-4 reveal  (abac_tut_pii = 'ssn')
-- Email: SHA2 hash    (abac_tut_pii = 'email')
-- Name/phone/address: full redact (OR condition)
CREATE OR REPLACE POLICY redact_other_pii
ON SCHEMA {catalog}.{schema}
COLUMN MASK {catalog}.{schema}.redact
TO `account users`
EXCEPT abac_tut_admins
FOR TABLES
MATCH COLUMNS (has_tag_value('abac_tut_pii', 'name')
  OR has_tag_value('abac_tut_pii', 'phone')
  OR has_tag_value('abac_tut_pii', 'address')) AS m
ON COLUMN m

๐Ÿ’ณ

Financial Services โ€” PCI-DSS

Payment Card Industry Data Security Standard compliant column masking

The PCI-DSS section creates a credit_accounts table with sensitive financial columns tagged using the pci_data governed tag.

Credit card tokenization

SQL
CREATE OR REPLACE FUNCTION {catalog}.{schema}.mask_credit_card(val STRING)
RETURNS STRING
RETURN CONCAT('****-****-****-', RIGHT(val, 4))

Expected output (non-admin user)

SELECT account_id, customer_name, credit_card_number, ssn, income FROM credit_accounts
+----------+----------------+----------------------+-----------+--------+
|account_id|customer_name |credit_card_number |ssn |income |
+----------+----------------+----------------------+-----------+--------+
|1001 |***REDACTED*** |****-****-****-9010 |XXX-XX-6789|0.0 |
|1002 |***REDACTED*** |****-****-****-4532 |XXX-XX-4321|0.0 |
+----------+----------------+----------------------+-----------+--------+

๐Ÿฅ

Healthcare โ€” HIPAA PHI

45 CFR ยง 164.514(b) Safe Harbor de-identification for patient records

HIPAA's Safe Harbor method requires removing 18 categories of PHI. The patient_records table is tagged with phi_level values, and four policies apply different masking strategies per PHI category.

ColumnTagMask AppliedExample Output
patient_namephi_level = nameFirst initial + ****A****
dobphi_level = dobSentinel date1900-01-01
ssnphi_level = idSHA2 hash prefixID-01a54629efb9...
insurance_idphi_level = idSHA2 hash prefixID-4ac41bb0e86d...
addressphi_level = contactFull redaction***PHI-REDACTED***
diagnosisphi_level = diagnosisFull redaction***PHI-REDACTED***

Expected results by role

SELECT patient_id, patient_name, dob, ssn, diagnosis FROM patient_records
+----------+--------------+----------+------------------+------------------+
|patient_id|patient_name |dob |ssn |diagnosis |
+----------+--------------+----------+------------------+------------------+
-- abac_tut_admins sees:
|1001 |Alice Brown |1985-03-12|123-45-6789 |Type 2 Diabetes |
-- Non-admin sees:
|1001 |A**** |1900-01-01|ID-01a54629efb95 |***PHI-REDACTED***|
+----------+--------------+----------+------------------+------------------+

โœ…

Key Learnings

  • Mask function return type must match the column type โ€” returning STRING from a DATE or DOUBLE column throws a cast error.
  • EXCEPT only works in CREATE POLICY โ€” not in ALTER TABLE ... SET MASK.
  • MATCH COLUMNS requires has_tag() or has_tag_value() โ€” column name comparison (name = 'col') is not allowed.
  • Drop previous step's policies before the next step to avoid multiple-masks-on-same-column errors.
  • For VARIANT columns: tag the column, use a schema-level policy with has_tag_value().
  • For STRUCT columns: use ALTER TABLE ... SET MASK directly (note: no admin EXCEPT support here).