Skip to content

Unified Catalog - Bulk Term Import Guide

Last Updated: October 10, 2025
Status: ✅ Fully Tested and Validated


Overview

This guide covers bulk import functionality for Microsoft Purview Unified Catalog terms. The CLI supports importing multiple terms from CSV or JSON files.

Key Features

  • CSV Import: Import terms from comma-separated files
  • JSON Import: Import terms from structured JSON files
  • Dry-Run Mode: Validate data before importing
  • Error Handling: Detailed error messages and validation
  • Progress Tracking: Real-time progress with Rich console output
  • Domain Override: Specify domain ID via CLI flag (JSON only)

Important Notes

⚠️ No Native Bulk Endpoint: Microsoft Purview Unified Catalog API does not provide a native /terms/bulk endpoint. This implementation uses sequential POST requests (same approach as glossary term import).


Commands

CSV Import

pvw uc term import-csv --csv-file <PATH> --domain-id <DOMAIN_ID> [--dry-run]

Parameters: - --csv-file (required): Path to CSV file - --domain-id (required): Target domain ID for all terms - --dry-run (optional): Validate data without creating terms

JSON Import

pvw uc term import-json --json-file <PATH> [--domain-id <DOMAIN_ID>] [--dry-run]

Parameters: - --json-file (required): Path to JSON file - --domain-id (optional): Override domain_id from JSON file - --dry-run (optional): Validate data without creating terms


CSV File Format

Structure

name,description,status,acronym,owner_id,resource_name,resource_url
Customer Acquisition Cost,The cost associated with acquiring a new customer,Draft,CAC,0360aff3-add5-4b7c-b172-52add69b0199,Marketing Metrics Guide,https://docs.example.com/marketing/cac
Monthly Recurring Revenue,Predictable revenue generated each month,Draft,MRR,0360aff3-add5-4b7c-b172-52add69b0199,Finance Dashboard,https://finance.example.com/mrr

Required Fields

  • name: Term name (string, max 200 characters)
  • description: Term description (string)
  • status: Either "Draft" or "Approved"

Optional Fields

  • acronym: Single acronym for the term
  • owner_id: Entra ID Object ID (GUID format, e.g., 0360aff3-add5-4b7c-b172-52add69b0199)
  • resource_name: Name of related resource
  • resource_url: URL of related resource

Sample File

See: samples/csv/uc_terms_bulk_example.csv (8 sample terms)


JSON File Format

Structure

{
  "terms": [
    {
      "name": "Data Lake",
      "description": "Centralized repository storing structured and unstructured data at scale",
      "domain_id": "59ae27b5-40bc-4c90-abfe-fe1a0638fe3a",
      "status": "Draft",
      "acronyms": ["DL"],
      "owner_ids": ["0360aff3-add5-4b7c-b172-52add69b0199"],
      "resources": [
        {
          "name": "Data Lake Architecture Guide",
          "url": "https://architecture.example.com/data-lake"
        }
      ]
    },
    {
      "name": "Data Warehouse",
      "description": "Central repository of integrated data from disparate sources",
      "domain_id": "59ae27b5-40bc-4c90-abfe-fe1a0638fe3a",
      "status": "Draft",
      "acronyms": ["DW", "DWH"],
      "owner_ids": ["0360aff3-add5-4b7c-b172-52add69b0199"],
      "resources": [
        {
          "name": "Data Warehouse Documentation",
          "url": "https://docs.example.com/dwh"
        }
      ]
    }
  ]
}

Required Fields per Term

  • name: Term name
  • description: Term description
  • domain_id: Target domain ID (can be overridden with --domain-id flag)

Optional Fields per Term

  • status: "Draft" or "Approved" (defaults to "Draft")
  • acronyms: Array of acronym strings
  • owner_ids: Array of Entra ID Object IDs (GUIDs)
  • resources: Array of objects with name and url properties

Sample Files

  • samples/json/term/uc_terms_bulk_example.json (8 sample terms)
  • samples/json/term/uc_terms_sample.json (8 sample terms - alternate)

Usage Examples

Example 1: CSV Import with Dry-Run

Validate data before importing:

pvw uc term import-csv \
  --csv-file "samples/csv/uc_terms_bulk_example.csv" \
  --domain-id "59ae27b5-40bc-4c90-abfe-fe1a0638fe3a" \
  --dry-run

Output:

🔍 DRY RUN MODE - No terms will be created

Preview of terms to be imported:
┌────────────────────────────┬──────────────────────────────────────┬────────┬─────────┐
│ Name                       │ Description                          │ Status │ Acronym │
├────────────────────────────┼──────────────────────────────────────┼────────┼─────────┤
│ Customer Acquisition Cost  │ The cost associated with acquiring…  │ Draft  │ CAC     │
│ Monthly Recurring Revenue  │ Predictable revenue generated each…  │ Draft  │ MRR     │
│ ...                        │ ...                                  │ ...    │ ...     │
└────────────────────────────┴──────────────────────────────────────┴────────┴─────────┘

Example 2: CSV Import (Actual)

Import all terms:

pvw uc term import-csv \
  --csv-file "samples/csv/uc_terms_bulk_example.csv" \
  --domain-id "59ae27b5-40bc-4c90-abfe-fe1a0638fe3a"

Output:

Found 8 term(s) in CSV file
✅ Created: Customer Acquisition Cost (ID: f85d19ec-8c3c-4c35-a731-d997d0b929cd)
✅ Created: Monthly Recurring Revenue (ID: dc296259-249f-4e1f-ae70-7939241d7b56)
✅ Created: Net Promoter Score (ID: 1fb141c5-28e6-487e-b32d-2fe354a511da)
✅ Created: Key Performance Indicator (ID: 8c1bbd7e-cd09-4369-b667-2d5df94d896a)
✅ Created: Return on Investment (ID: 23845a0e-0cc8-4295-95e0-f21879a73c1e)
✅ Created: Service Level Agreement (ID: 08728338-6489-4f0b-9ea7-fed789053fd1)
✅ Created: Customer Success Score (ID: a667443c-8c03-4b1d-8a24-b9b193a41ff5)
✅ Created: Total Contract Value (ID: 465feb86-9ad0-41c7-afc9-985e6b4ee313)

📊 Import Summary:
  Total terms: 8
  ✅ Successfully created: 8
  ❌ Failed: 0

Example 3: JSON Import with Dry-Run

pvw uc term import-json \
  --json-file "samples/json/term/uc_terms_bulk_example.json" \
  --dry-run

Example 4: JSON Import (Actual)

pvw uc term import-json \
  --json-file "samples/json/term/uc_terms_bulk_example.json"

Example 5: JSON Import with Domain Override

Override domain_id from JSON file:

pvw uc term import-json \
  --json-file "samples/json/term/uc_terms_bulk_example.json" \
  --domain-id "59ae27b5-40bc-4c90-abfe-fe1a0638fe3a"

Critical Requirements

1. Owner IDs Must Be GUIDs ⚠️

❌ INCORRECT (Will Fail):

owner_id
marketing@company.com

✅ CORRECT:

owner_id
0360aff3-add5-4b7c-b172-52add69b0199

Error Message if Using Email:

HTTP 400: Error converting value "marketing@company.com" to type 'System.Guid'

How to Get Object IDs:

  1. Azure Portal:
  2. Navigate to: Azure Active Directory → Users
  3. Select user → Copy "Object ID"

  4. Azure CLI:

    az ad user show --id user@domain.com --query id -o tsv
    

  5. PowerShell:

    (Get-AzADUser -UserPrincipalName user@domain.com).Id
    

For complete details, see: OWNER_ID_FIX.md

2. Domain Publishing Status ⚠️

Terms cannot be created with "Published" status in an unpublished domain.

❌ WILL FAIL if domain is unpublished:

name,description,status
My Term,Description,Published

Error Message:

HTTP 400: DataCatalogInvalidEntity - Entity cannot be published in an unpublished domain.

✅ SOLUTION: Use status: "Draft" for unpublished domains:

name,description,status
My Term,Description,Draft

You can update terms to "Approved" status later using:

pvw uc term update --term-id <ID> --status "Approved"


Troubleshooting

Issue: "Entity cannot be published in an unpublished domain"

Cause: Trying to create terms with status: "Published" in an unpublished domain.

Solution: - Use status: "Draft" in your CSV/JSON files - OR publish your domain first using: pvw uc domain publish --domain-id <ID>


Issue: "Error converting value to type 'System.Guid'"

Cause: Using email addresses instead of Entra ID Object IDs (GUIDs) for owner_id.

Solution: 1. Get the user's Object ID from Azure Portal, Azure CLI, or PowerShell 2. Replace email addresses with GUIDs in your CSV/JSON files 3. See "Critical Requirements" section above for details


Issue: "No ID returned" in Response

Cause: API returned an error instead of a term ID.

Solution: 1. Check the error message in the response 2. Common causes: - Invalid domain_id - Invalid owner_id format (must be GUID) - Domain publishing status mismatch - Missing required fields


Issue: CSV File Encoding Errors

Cause: CSV file has special characters or incorrect encoding.

Solution: - Save CSV file with UTF-8 encoding - Avoid special characters in term names - Quote fields containing commas


Issue: JSON Parsing Errors

Cause: Invalid JSON syntax.

Solution: - Validate JSON file using: python -m json.tool your-file.json - Check for: - Missing commas between array items - Trailing commas (not allowed in strict JSON) - Unquoted strings - Missing braces/brackets


Testing Status

Testing Complete - All Tests Passed

Test Date: October 10, 2025
Test Results: 8/8 terms imported successfully (100% success rate)

✅ Created: Customer Acquisition Cost (ID: f85d19ec-8c3c-4c35-a731-d997d0b929cd)
✅ Created: Monthly Recurring Revenue (ID: dc296259-249f-4e1f-ae70-7939241d7b56)
✅ Created: Net Promoter Score (ID: 1fb141c5-28e6-487e-b32d-2fe354a511da)
✅ Created: Key Performance Indicator (ID: 8c1bbd7e-cd09-4369-b667-2d5df94d896a)
✅ Created: Return on Investment (ID: 23845a0e-0cc8-4295-95e0-f21879a73c1e)
✅ Created: Service Level Agreement (ID: 08728338-6489-4f0b-9ea7-fed789053fd1)
✅ Created: Customer Success Score (ID: a667443c-8c03-4b1d-8a24-b9b193a41ff5)
✅ Created: Total Contract Value (ID: 465feb86-9ad0-41c7-afc9-985e6b4ee313)

Best Practices

  1. Always Use Dry-Run First
  2. Validate data before importing
  3. Catch errors early
  4. Preview what will be created

  5. Start with Draft Status

  6. Create terms as "Draft" first
  7. Review and update to "Approved" later
  8. Avoid domain publishing issues

  9. Use Sample Files as Templates

  10. Modify existing samples rather than creating from scratch
  11. Samples have correct structure and formatting
  12. Reduce errors and troubleshooting

  13. Validate Owner IDs

  14. Always use GUIDs, never email addresses
  15. Test with one term first if unsure
  16. Owner assignment is optional - can be added later

  17. Check Domain Status

  18. Verify domain exists: pvw uc domain read --domain-id <ID>
  19. Check domain status before importing "Published" terms
  20. Use pvw uc domain list to see all domains

  21. Handle Large Imports

  22. For 100+ terms, consider splitting into smaller batches
  23. Use progress tracking to monitor import
  24. Sequential POST means ~1-2 seconds per term

  25. Verify After Import

  26. List terms to confirm: pvw uc term list --domain-id <ID>
  27. Check term details: pvw uc term read --term-id <ID>
  28. Update any errors individually

API Reference

Endpoint Used

POST https://{purview-account}.purview.azure.com/datagovernance/catalog/terms

Request Payload Structure

{
  "name": "string",
  "description": "string",
  "domain": {
    "id": "string (GUID)"
  },
  "status": "Draft | Approved",
  "acronyms": ["string"],
  "contacts": {
    "owner": [
      {
        "id": "string (GUID)"
      }
    ]
  },
  "resources": [
    {
      "name": "string",
      "url": "string"
    }
  ]
}

Response Structure (Success)

{
  "id": "string (GUID)",
  "name": "string",
  "description": "string",
  "domain": {
    "id": "string (GUID)"
  },
  "status": "Draft | Approved",
  ...
}

Response Structure (Error)

{
  "error": {
    "code": "string",
    "message": "string",
    "details": null,
    "target": null
  }
}

  • Unified Catalog Overview: docs/commands/unified-catalog.md
  • Term Management: docs/commands/unified-catalog/term.md
  • Domain Management: docs/commands/unified-catalog/domain.md
  • Owner ID Fix Guide: OWNER_ID_FIX.md
  • Implementation Summary: BULK_IMPORT_COMPLETE.md

Jupyter Notebook Examples

See: samples/notebooks (plus)/unified_catalog_terms_examples.ipynb

Examples Included: - Example 10: Generate CSV file with Python - Example 11: CSV import with dry-run - Example 12: CSV import (actual) - Example 13: Generate JSON file with Python - Example 14: JSON import with dry-run - Example 15: JSON import (actual) - Example 16: Verify imported terms


Support

For questions or issues: 1. Check this documentation first 2. Review sample files in samples/csv/ and samples/json/term/ 3. Try notebook examples in samples/notebooks (plus)/ 4. Check OWNER_ID_FIX.md for GUID-related issues


Last Updated: October 10, 2025
Status: ✅ Fully Tested and Production Ready