Unified Catalog - Bulk Term Import Guide¶
Last Updated: October 10, 2025
Status: ✅ Fully Tested and Validated
Overview¶
This guide covers bulk import functionality for Microsoft Purview Unified Catalog terms. The CLI supports importing multiple terms from CSV or JSON files.
Key Features¶
- ✅ CSV Import: Import terms from comma-separated files
- ✅ JSON Import: Import terms from structured JSON files
- ✅ Dry-Run Mode: Validate data before importing
- ✅ Error Handling: Detailed error messages and validation
- ✅ Progress Tracking: Real-time progress with Rich console output
- ✅ Domain Override: Specify domain ID via CLI flag (JSON only)
Important Notes¶
⚠️ No Native Bulk Endpoint: Microsoft Purview Unified Catalog API does not provide a native /terms/bulk endpoint. This implementation uses sequential POST requests (same approach as glossary term import).
Commands¶
CSV Import¶
pvw uc term import-csv --csv-file <PATH> --domain-id <DOMAIN_ID> [--dry-run]
Parameters:
- --csv-file (required): Path to CSV file
- --domain-id (required): Target domain ID for all terms
- --dry-run (optional): Validate data without creating terms
JSON Import¶
pvw uc term import-json --json-file <PATH> [--domain-id <DOMAIN_ID>] [--dry-run]
Parameters:
- --json-file (required): Path to JSON file
- --domain-id (optional): Override domain_id from JSON file
- --dry-run (optional): Validate data without creating terms
CSV File Format¶
Structure¶
name,description,status,acronym,owner_id,resource_name,resource_url
Customer Acquisition Cost,The cost associated with acquiring a new customer,Draft,CAC,0360aff3-add5-4b7c-b172-52add69b0199,Marketing Metrics Guide,https://docs.example.com/marketing/cac
Monthly Recurring Revenue,Predictable revenue generated each month,Draft,MRR,0360aff3-add5-4b7c-b172-52add69b0199,Finance Dashboard,https://finance.example.com/mrr
Required Fields¶
- name: Term name (string, max 200 characters)
- description: Term description (string)
- status: Either "Draft" or "Approved"
Optional Fields¶
- acronym: Single acronym for the term
- owner_id: Entra ID Object ID (GUID format, e.g.,
0360aff3-add5-4b7c-b172-52add69b0199) - resource_name: Name of related resource
- resource_url: URL of related resource
Sample File¶
See: samples/csv/uc_terms_bulk_example.csv (8 sample terms)
JSON File Format¶
Structure¶
{
"terms": [
{
"name": "Data Lake",
"description": "Centralized repository storing structured and unstructured data at scale",
"domain_id": "59ae27b5-40bc-4c90-abfe-fe1a0638fe3a",
"status": "Draft",
"acronyms": ["DL"],
"owner_ids": ["0360aff3-add5-4b7c-b172-52add69b0199"],
"resources": [
{
"name": "Data Lake Architecture Guide",
"url": "https://architecture.example.com/data-lake"
}
]
},
{
"name": "Data Warehouse",
"description": "Central repository of integrated data from disparate sources",
"domain_id": "59ae27b5-40bc-4c90-abfe-fe1a0638fe3a",
"status": "Draft",
"acronyms": ["DW", "DWH"],
"owner_ids": ["0360aff3-add5-4b7c-b172-52add69b0199"],
"resources": [
{
"name": "Data Warehouse Documentation",
"url": "https://docs.example.com/dwh"
}
]
}
]
}
Required Fields per Term¶
- name: Term name
- description: Term description
- domain_id: Target domain ID (can be overridden with
--domain-idflag)
Optional Fields per Term¶
- status: "Draft" or "Approved" (defaults to "Draft")
- acronyms: Array of acronym strings
- owner_ids: Array of Entra ID Object IDs (GUIDs)
- resources: Array of objects with
nameandurlproperties
Sample Files¶
samples/json/term/uc_terms_bulk_example.json(8 sample terms)samples/json/term/uc_terms_sample.json(8 sample terms - alternate)
Usage Examples¶
Example 1: CSV Import with Dry-Run¶
Validate data before importing:
pvw uc term import-csv \
--csv-file "samples/csv/uc_terms_bulk_example.csv" \
--domain-id "59ae27b5-40bc-4c90-abfe-fe1a0638fe3a" \
--dry-run
Output:
🔍 DRY RUN MODE - No terms will be created
Preview of terms to be imported:
┌────────────────────────────┬──────────────────────────────────────┬────────┬─────────┐
│ Name │ Description │ Status │ Acronym │
├────────────────────────────┼──────────────────────────────────────┼────────┼─────────┤
│ Customer Acquisition Cost │ The cost associated with acquiring… │ Draft │ CAC │
│ Monthly Recurring Revenue │ Predictable revenue generated each… │ Draft │ MRR │
│ ... │ ... │ ... │ ... │
└────────────────────────────┴──────────────────────────────────────┴────────┴─────────┘
Example 2: CSV Import (Actual)¶
Import all terms:
pvw uc term import-csv \
--csv-file "samples/csv/uc_terms_bulk_example.csv" \
--domain-id "59ae27b5-40bc-4c90-abfe-fe1a0638fe3a"
Output:
Found 8 term(s) in CSV file
✅ Created: Customer Acquisition Cost (ID: f85d19ec-8c3c-4c35-a731-d997d0b929cd)
✅ Created: Monthly Recurring Revenue (ID: dc296259-249f-4e1f-ae70-7939241d7b56)
✅ Created: Net Promoter Score (ID: 1fb141c5-28e6-487e-b32d-2fe354a511da)
✅ Created: Key Performance Indicator (ID: 8c1bbd7e-cd09-4369-b667-2d5df94d896a)
✅ Created: Return on Investment (ID: 23845a0e-0cc8-4295-95e0-f21879a73c1e)
✅ Created: Service Level Agreement (ID: 08728338-6489-4f0b-9ea7-fed789053fd1)
✅ Created: Customer Success Score (ID: a667443c-8c03-4b1d-8a24-b9b193a41ff5)
✅ Created: Total Contract Value (ID: 465feb86-9ad0-41c7-afc9-985e6b4ee313)
📊 Import Summary:
Total terms: 8
✅ Successfully created: 8
❌ Failed: 0
Example 3: JSON Import with Dry-Run¶
pvw uc term import-json \
--json-file "samples/json/term/uc_terms_bulk_example.json" \
--dry-run
Example 4: JSON Import (Actual)¶
pvw uc term import-json \
--json-file "samples/json/term/uc_terms_bulk_example.json"
Example 5: JSON Import with Domain Override¶
Override domain_id from JSON file:
pvw uc term import-json \
--json-file "samples/json/term/uc_terms_bulk_example.json" \
--domain-id "59ae27b5-40bc-4c90-abfe-fe1a0638fe3a"
Critical Requirements¶
1. Owner IDs Must Be GUIDs ⚠️¶
❌ INCORRECT (Will Fail):
owner_id
marketing@company.com
✅ CORRECT:
owner_id
0360aff3-add5-4b7c-b172-52add69b0199
Error Message if Using Email:
HTTP 400: Error converting value "marketing@company.com" to type 'System.Guid'
How to Get Object IDs:
- Azure Portal:
- Navigate to: Azure Active Directory → Users
-
Select user → Copy "Object ID"
-
Azure CLI:
az ad user show --id user@domain.com --query id -o tsv -
PowerShell:
(Get-AzADUser -UserPrincipalName user@domain.com).Id
For complete details, see: OWNER_ID_FIX.md
2. Domain Publishing Status ⚠️¶
Terms cannot be created with "Published" status in an unpublished domain.
❌ WILL FAIL if domain is unpublished:
name,description,status
My Term,Description,Published
Error Message:
HTTP 400: DataCatalogInvalidEntity - Entity cannot be published in an unpublished domain.
✅ SOLUTION:
Use status: "Draft" for unpublished domains:
name,description,status
My Term,Description,Draft
You can update terms to "Approved" status later using:
pvw uc term update --term-id <ID> --status "Approved"
Troubleshooting¶
Issue: "Entity cannot be published in an unpublished domain"¶
Cause: Trying to create terms with status: "Published" in an unpublished domain.
Solution:
- Use status: "Draft" in your CSV/JSON files
- OR publish your domain first using: pvw uc domain publish --domain-id <ID>
Issue: "Error converting value to type 'System.Guid'"¶
Cause: Using email addresses instead of Entra ID Object IDs (GUIDs) for owner_id.
Solution: 1. Get the user's Object ID from Azure Portal, Azure CLI, or PowerShell 2. Replace email addresses with GUIDs in your CSV/JSON files 3. See "Critical Requirements" section above for details
Issue: "No ID returned" in Response¶
Cause: API returned an error instead of a term ID.
Solution: 1. Check the error message in the response 2. Common causes: - Invalid domain_id - Invalid owner_id format (must be GUID) - Domain publishing status mismatch - Missing required fields
Issue: CSV File Encoding Errors¶
Cause: CSV file has special characters or incorrect encoding.
Solution: - Save CSV file with UTF-8 encoding - Avoid special characters in term names - Quote fields containing commas
Issue: JSON Parsing Errors¶
Cause: Invalid JSON syntax.
Solution:
- Validate JSON file using: python -m json.tool your-file.json
- Check for:
- Missing commas between array items
- Trailing commas (not allowed in strict JSON)
- Unquoted strings
- Missing braces/brackets
Testing Status¶
✅ Testing Complete - All Tests Passed
Test Date: October 10, 2025
Test Results: 8/8 terms imported successfully (100% success rate)
✅ Created: Customer Acquisition Cost (ID: f85d19ec-8c3c-4c35-a731-d997d0b929cd)
✅ Created: Monthly Recurring Revenue (ID: dc296259-249f-4e1f-ae70-7939241d7b56)
✅ Created: Net Promoter Score (ID: 1fb141c5-28e6-487e-b32d-2fe354a511da)
✅ Created: Key Performance Indicator (ID: 8c1bbd7e-cd09-4369-b667-2d5df94d896a)
✅ Created: Return on Investment (ID: 23845a0e-0cc8-4295-95e0-f21879a73c1e)
✅ Created: Service Level Agreement (ID: 08728338-6489-4f0b-9ea7-fed789053fd1)
✅ Created: Customer Success Score (ID: a667443c-8c03-4b1d-8a24-b9b193a41ff5)
✅ Created: Total Contract Value (ID: 465feb86-9ad0-41c7-afc9-985e6b4ee313)
Best Practices¶
- Always Use Dry-Run First
- Validate data before importing
- Catch errors early
-
Preview what will be created
-
Start with Draft Status
- Create terms as "Draft" first
- Review and update to "Approved" later
-
Avoid domain publishing issues
-
Use Sample Files as Templates
- Modify existing samples rather than creating from scratch
- Samples have correct structure and formatting
-
Reduce errors and troubleshooting
-
Validate Owner IDs
- Always use GUIDs, never email addresses
- Test with one term first if unsure
-
Owner assignment is optional - can be added later
-
Check Domain Status
- Verify domain exists:
pvw uc domain read --domain-id <ID> - Check domain status before importing "Published" terms
-
Use
pvw uc domain listto see all domains -
Handle Large Imports
- For 100+ terms, consider splitting into smaller batches
- Use progress tracking to monitor import
-
Sequential POST means ~1-2 seconds per term
-
Verify After Import
- List terms to confirm:
pvw uc term list --domain-id <ID> - Check term details:
pvw uc term read --term-id <ID> - Update any errors individually
API Reference¶
Endpoint Used¶
POST https://{purview-account}.purview.azure.com/datagovernance/catalog/terms
Request Payload Structure¶
{
"name": "string",
"description": "string",
"domain": {
"id": "string (GUID)"
},
"status": "Draft | Approved",
"acronyms": ["string"],
"contacts": {
"owner": [
{
"id": "string (GUID)"
}
]
},
"resources": [
{
"name": "string",
"url": "string"
}
]
}
Response Structure (Success)¶
{
"id": "string (GUID)",
"name": "string",
"description": "string",
"domain": {
"id": "string (GUID)"
},
"status": "Draft | Approved",
...
}
Response Structure (Error)¶
{
"error": {
"code": "string",
"message": "string",
"details": null,
"target": null
}
}
Related Documentation¶
- Unified Catalog Overview:
docs/commands/unified-catalog.md - Term Management:
docs/commands/unified-catalog/term.md - Domain Management:
docs/commands/unified-catalog/domain.md - Owner ID Fix Guide:
OWNER_ID_FIX.md - Implementation Summary:
BULK_IMPORT_COMPLETE.md
Jupyter Notebook Examples¶
See: samples/notebooks (plus)/unified_catalog_terms_examples.ipynb
Examples Included: - Example 10: Generate CSV file with Python - Example 11: CSV import with dry-run - Example 12: CSV import (actual) - Example 13: Generate JSON file with Python - Example 14: JSON import with dry-run - Example 15: JSON import (actual) - Example 16: Verify imported terms
Support¶
For questions or issues:
1. Check this documentation first
2. Review sample files in samples/csv/ and samples/json/term/
3. Try notebook examples in samples/notebooks (plus)/
4. Check OWNER_ID_FIX.md for GUID-related issues
Last Updated: October 10, 2025
Status: ✅ Fully Tested and Production Ready