Column Masking Strategies
with ABAC
Master six column masking techniques using Databricks ABAC โ from simple PII redaction to VARIANT and STRUCT masking, SHA2 pseudonymization, and production-grade PCI-DSS and HIPAA PHI masking patterns. All policies are tag-driven and schema-scoped.
How Column Masking Works
A column mask policy attaches a mask function to columns matching a tag condition. When a non-exempt user queries the table, the function is called with the column value and its return replaces the real value โ transparently, at query time.
Critical: Return Type Must Match Column Type
The mask function must return the same data type as the column.
A mask on a DATE column must return DATE.
A mask on a DOUBLE column must return DOUBLE.
Returning STRING from a DATE column causes a cast error at query time.
EXCEPT only works in CREATE POLICY
ALTER TABLE ... SET MASK function EXCEPT group is not valid syntax.
Use CREATE COLUMN MASK POLICY ... EXCEPT group for admin bypass.
Prerequisites
Governed Tags
- abac_tut_pii (ssn, email, name, phone, address)
- abac_tut_sensitivity (internal, confidential)
- pci_data (credit_card, ssn, income)
- phi_level (name, dob, id, contact, diagnosis)
Account Groups
- abac_tut_admins
- abac_tut_hr_team
- abac_tut_finance_team
6-Step Core Tutorial
Basic String Redaction
The simplest mask โ replace any PII-tagged column value with a fixed string.
The policy uses has_tag() to catch all PII columns in one shot.
CREATE OR REPLACE FUNCTION {catalog}.{schema}.redact(val STRING) RETURNS STRING RETURN '***REDACTED***' CREATE OR REPLACE POLICY redact_pii ON SCHEMA {catalog}.{schema} COLUMN MASK {catalog}.{schema}.redact TO `account users` EXCEPT abac_tut_admins FOR TABLES MATCH COLUMNS has_tag('abac_tut_pii') AS m ON COLUMN m
Admin sees
Non-admin sees
Partial Reveal โ SSN Last 4
Show only the last 4 digits of SSN. Useful when users need to verify identity
without exposing the full value. Drop redact_pii first to avoid conflict.
CREATE OR REPLACE FUNCTION {catalog}.{schema}.mask_ssn(val STRING) RETURNS STRING RETURN CONCAT('***-**-', RIGHT(val, 4))
Before
After
SHA2 Pseudonymization
SHA2 hashing produces a consistent masked value per input โ the same email always hashes to the same token. This preserves referential integrity for analytics while preventing reverse-lookup of real values.
CREATE OR REPLACE FUNCTION {catalog}.{schema}.pseudonymize(val STRING) RETURNS STRING RETURN SHA2(val, 256)
Before
After
VARIANT Column Masking
VARIANT columns can't be matched by tag in ALTER TABLE SET MASK.
The solution: tag the column, then use a schema-level policy with has_tag_value().
-- 1. Tag the column ALTER TABLE {catalog}.{schema}.employees ALTER COLUMN metadata SET TAGS ('abac_tut_sensitivity' = 'internal') -- 2. Simple mask function โ returns STRING VARIANT for any input type CREATE OR REPLACE FUNCTION {catalog}.{schema}.flexible_mask(val VARIANT) RETURNS VARIANT RETURN parse_json('"***"') -- 3. Schema policy using the tag CREATE OR REPLACE POLICY flexible_mask_metadata ON SCHEMA {catalog}.{schema} COLUMN MASK {catalog}.{schema}.flexible_mask TO `account users` EXCEPT abac_tut_admins FOR TABLES MATCH COLUMNS has_tag_value('abac_tut_sensitivity', 'internal') AS m ON COLUMN m
STRUCT Column Masking
STRUCT columns can be masked at the field level while preserving the structure.
Use ALTER TABLE ... SET MASK directly โ no tag needed for a single known column.
CREATE OR REPLACE FUNCTION {catalog}.{schema}.mask_contact_info( val STRUCT<primary_email:STRING, secondary_email:STRING, phone:STRING> ) RETURNS STRUCT<primary_email:STRING, secondary_email:STRING, phone:STRING> RETURN named_struct( 'primary_email', '***REDACTED***', 'secondary_email', '***REDACTED***', 'phone', CONCAT('***-', RIGHT(val.phone, 4)) ) ALTER TABLE {catalog}.{schema}.employees ALTER COLUMN contact_info SET MASK {catalog}.{schema}.mask_contact_info
Combining Multiple Policies
Different mask strategies for different PII types โ all active simultaneously. Each policy targets a specific tag value, so they apply to different columns with no conflict.
-- SSN: last-4 reveal (abac_tut_pii = 'ssn') -- Email: SHA2 hash (abac_tut_pii = 'email') -- Name/phone/address: full redact (OR condition) CREATE OR REPLACE POLICY redact_other_pii ON SCHEMA {catalog}.{schema} COLUMN MASK {catalog}.{schema}.redact TO `account users` EXCEPT abac_tut_admins FOR TABLES MATCH COLUMNS (has_tag_value('abac_tut_pii', 'name') OR has_tag_value('abac_tut_pii', 'phone') OR has_tag_value('abac_tut_pii', 'address')) AS m ON COLUMN m
Financial Services โ PCI-DSS
Payment Card Industry Data Security Standard compliant column masking
The PCI-DSS section creates a credit_accounts table with sensitive financial columns
tagged using the pci_data governed tag.
Credit card tokenization
CREATE OR REPLACE FUNCTION {catalog}.{schema}.mask_credit_card(val STRING) RETURNS STRING RETURN CONCAT('****-****-****-', RIGHT(val, 4))
Expected output (non-admin user)
Healthcare โ HIPAA PHI
45 CFR ยง 164.514(b) Safe Harbor de-identification for patient records
HIPAA's Safe Harbor method requires removing 18 categories of PHI.
The patient_records table is tagged with phi_level values, and four policies
apply different masking strategies per PHI category.
| Column | Tag | Mask Applied | Example Output |
|---|---|---|---|
patient_name | phi_level = name | First initial + **** | A**** |
dob | phi_level = dob | Sentinel date | 1900-01-01 |
ssn | phi_level = id | SHA2 hash prefix | ID-01a54629efb9... |
insurance_id | phi_level = id | SHA2 hash prefix | ID-4ac41bb0e86d... |
address | phi_level = contact | Full redaction | ***PHI-REDACTED*** |
diagnosis | phi_level = diagnosis | Full redaction | ***PHI-REDACTED*** |
Expected results by role
Key Learnings
- Mask function return type must match the column type โ returning
STRINGfrom aDATEorDOUBLEcolumn throws a cast error. EXCEPTonly works inCREATE POLICYโ not inALTER TABLE ... SET MASK.MATCH COLUMNSrequireshas_tag()orhas_tag_value()โ column name comparison (name = 'col') is not allowed.- Drop previous step's policies before the next step to avoid multiple-masks-on-same-column errors.
- For VARIANT columns: tag the column, use a schema-level policy with
has_tag_value(). - For STRUCT columns: use
ALTER TABLE ... SET MASKdirectly (note: no adminEXCEPTsupport here).