
Programmatically Setting Databricks Secrets in CI/CD

March 22, 2025

Databricks Secrets are an awesome, secure way to access secrets like database passwords in your Databricks workflows, but setting them manually, especially when you have multiple environments, is messy and error-prone.

Two truths I’ve picked up from my time as a software engineer are:

  1. Source code is king. Anything you have to do manually via a user interface or manually running commands could come back to bite you later, because it makes your process non-reproducible.
  2. Secrets managers are queen, for reasons similar to why source code is king. Storing your project’s secrets, like API keys or passwords, in one helps you manage access so only the right people and applications can read them. It also makes updates easier, because you only have to change a secret in one place.

As noted above, Databricks Secrets is a great feature, but you should probably not use it as the secrets manager for your project. True secrets managers are paid products with great integrations and features for managing your secrets. Databricks Secrets lacks a UI, lives only within Databricks, and is best used for passing secrets into a notebook.

With that preamble out of the way, let’s get into how to leverage source code and a secrets manager to pass secrets into Databricks.

Passing Secrets from GitHub Environments to Databricks Secrets

Below is the GitHub Actions workflow I’m using to set secrets in Databricks. I’m using the secrets section of GitHub Environments to store secrets for my client’s project, but this could easily be extended to a third-party secrets manager like HashiCorp Vault.

on:
  workflow_call:
    inputs:
      environment:
        description: The GitHub environment
        required: true
        type: string

jobs:
  set-dbx-secrets:
    name: "Set Databricks Secrets"
    runs-on: ubuntu-latest
    environment: ${{ inputs.environment }}

    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main
      - name: Configure Dev Databricks CLI
        if: ${{ inputs.environment == 'dev' }}
        run: |
          ./scripts/configure_dbx.sh \
            user1 ${{ secrets.DBX_USER1_PAT }} \
            user2 ${{ secrets.DBX_USER2_PAT }}

          echo "DATABRICKS_CONFIG_PROFILE=${{ github.actor }}" >> $GITHUB_ENV
      - name: Configure Stage or Production Databricks CLI
        if: ${{ inputs.environment != 'dev' }}
        run: |
          echo "DATABRICKS_HOST=${{ vars.DBX_HOST }}" >> $GITHUB_ENV
          echo "DATABRICKS_CLIENT_ID=${{ vars.DBX_SP_ID }}" >> $GITHUB_ENV
          echo "DATABRICKS_CLIENT_SECRET=${{ secrets.DBX_SP_SECRET }}" >> $GITHUB_ENV
      - name: Set Variables
        run: |
          SCOPE=${{ inputs.environment }}-db
          if ! databricks secrets -t ${{ inputs.environment }} list-scopes -o json | jq -e --arg scope "$SCOPE" '[.[] | select(.name == $scope)] | length > 0' > /dev/null; then
            databricks secrets -t ${{ inputs.environment }} create-scope $SCOPE
          fi

          databricks secrets -t ${{ inputs.environment }} put-secret $SCOPE host --string-value "${{ vars.DBX_DB_HOST }}"
          databricks secrets -t ${{ inputs.environment }} put-secret $SCOPE database --string-value "${{ vars.DBX_DB_DATABASE }}"
          databricks secrets -t ${{ inputs.environment }} put-secret $SCOPE username --string-value "${{ vars.DBX_DB_USERNAME }}"
          databricks secrets -t ${{ inputs.environment }} put-secret $SCOPE password --string-value "${{ secrets.DBX_DB_PASSWORD }}"

Let’s break it down.

Workflow Trigger

This workflow is designed to be reused by other workflows via workflow_call. It accepts an environment input, which lets you specify which GitHub environment (like dev, stage, or prod) to use, so the same workflow logic serves every deployment environment.

on:
  workflow_call:
    inputs:
      environment:
        description: The GitHub environment
        required: true
        type: string
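For context, a caller workflow might invoke it like this (the job name and workflow file path below are hypothetical; `secrets: inherit` forwards the caller's secrets to the reusable workflow):

```yaml
name: Deploy

on:
  push:
    branches: [main]

jobs:
  # Job name and workflow path are hypothetical; adjust to your repo layout
  set-dbx-secrets:
    uses: ./.github/workflows/set-dbx-secrets.yml
    with:
      environment: dev
    secrets: inherit
```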

Job Definition: set-dbx-secrets

The job is called set-dbx-secrets, and it runs on an Ubuntu runner, as declared by runs-on: ubuntu-latest. It uses the environment input passed in earlier to determine which set of secrets to retrieve.

jobs:
  set-dbx-secrets:
    name: "Set Databricks Secrets"
    runs-on: ubuntu-latest
    environment: ${{ inputs.environment }}

Step 1: Checkout the Code

This step uses GitHub’s actions/checkout action to check out the repository’s code so you can run scripts or access any files in the repo.

steps:
  - uses: actions/checkout@v4

Step 2: Set Up the Databricks CLI

These next three steps install the Databricks CLI and configure it to authenticate with the correct Databricks workspace and/or user. Specifically, I wanted the CLI to authenticate as the user developing the feature when the workflow runs in the dev environment. Alternatively, if we’re deploying to the stage or prod environment, we only want to authenticate as a service principal. For more information about how I set that up, see my post on deploying Databricks with GitHub Actions.

- uses: databricks/setup-cli@main
- name: Configure Dev Databricks CLI
  if: ${{ inputs.environment == 'dev' }}
  run: |
    ./scripts/configure_dbx.sh \
      user1 ${{ secrets.DBX_USER1_PAT }} \
      user2 ${{ secrets.DBX_USER2_PAT }}

    echo "DATABRICKS_CONFIG_PROFILE=${{ github.actor }}" >> $GITHUB_ENV
- name: Configure Stage or Production Databricks CLI
  if: ${{ inputs.environment != 'dev' }}
  run: |
    echo "DATABRICKS_HOST=${{ vars.DBX_HOST }}" >> $GITHUB_ENV
    echo "DATABRICKS_CLIENT_ID=${{ vars.DBX_SP_ID }}" >> $GITHUB_ENV
    echo "DATABRICKS_CLIENT_SECRET=${{ secrets.DBX_SP_SECRET }}" >> $GITHUB_ENV

Dev Environment Configuration

For the dev environment, the workflow runs a custom shell script, configure_dbx.sh, to create the .databrickscfg config file used for authentication with personal access tokens (PATs). The tokens are securely retrieved from GitHub Secrets. I’ll include that bash script below:

#!/bin/bash
set -eo pipefail # note: no -x, since tracing would echo the PAT tokens into CI logs

echo "Configuring Databricks CLI authentication for dev"

if (( $# % 2 != 0 )); then
  echo "Error: Arguments must be in pairs of <username> <PAT-token>" >&2
  exit 1
fi

declare DATABRICKS_URL="https://dev_workspace123.cloud.databricks.com/"
CONFIG_FILE=~/.databrickscfg

echo "Populating [$CONFIG_FILE]"
> "$CONFIG_FILE" # Clear existing file
chmod 600 "$CONFIG_FILE" # Restrict permissions, since this file holds access tokens

while (( $# )); do
  DBX_PROFILE="$1"
  DBX_ACCESS_TOKEN="$2"
  shift 2  # Move to the next pair

  echo "Adding profile: $DBX_PROFILE"
  echo "[$DBX_PROFILE]" >> "$CONFIG_FILE"
  echo "host = $DATABRICKS_URL" >> "$CONFIG_FILE"
  echo "token = $DBX_ACCESS_TOKEN" >> "$CONFIG_FILE"
  echo "" >> "$CONFIG_FILE"
done
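For example, calling the script as `./scripts/configure_dbx.sh user1 <pat1> user2 <pat2>` produces a ~/.databrickscfg along these lines (token values redacted):

```ini
[user1]
host = https://dev_workspace123.cloud.databricks.com/
token = <user1-pat>

[user2]
host = https://dev_workspace123.cloud.databricks.com/
token = <user2-pat>
```

The profile names line up with GitHub usernames, which is why the dev step can point DATABRICKS_CONFIG_PROFILE at github.actor.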

Stage/Production Configuration

For non-dev environments, authentication is done using a service principal. The workflow sets Databricks-specific environment variables (like DATABRICKS_HOST, DATABRICKS_CLIENT_ID, and DATABRICKS_CLIENT_SECRET), with values pulled from GitHub Secrets and Variables.

- name: Configure Stage or Production Databricks CLI
  if: ${{ inputs.environment != 'dev' }}
  run: |
    echo "DATABRICKS_HOST=${{ vars.DBX_HOST }}" >> $GITHUB_ENV
    echo "DATABRICKS_CLIENT_ID=${{ vars.DBX_SP_ID }}" >> $GITHUB_ENV
    echo "DATABRICKS_CLIENT_SECRET=${{ secrets.DBX_SP_SECRET }}" >> $GITHUB_ENV

Step 3: Set Secrets in Databricks

This step automates the process of creating a Databricks secrets scope (if it doesn’t already exist) and then sets several secrets within that scope.

1. Create the Secrets Scope:

It uses the Databricks CLI to check if a secrets scope named <environment>-db already exists. If not, it creates the scope.

SCOPE=${{ inputs.environment }}-db
if ! databricks secrets -t ${{ inputs.environment }} list-scopes -o json | jq -e --arg scope "$SCOPE" '[.[] | select(.name == $scope)] | length > 0' > /dev/null; then
    databricks secrets -t ${{ inputs.environment }} create-scope $SCOPE
fi
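The jq filter is the densest part of this step, so here is the same check run against a mocked list-scopes payload (the JSON below is a hypothetical sample, not real CLI output):

```shell
# Mocked output of `databricks secrets list-scopes -o json` (hypothetical scopes)
scopes='[{"name":"dev-db"},{"name":"prod-db"}]'

# `jq -e` sets its exit code from the last output: 0 for true, 1 for false.
# That lets the filter double as the shell condition in the workflow above.
echo "$scopes" | jq -e --arg scope "dev-db" \
  '[.[] | select(.name == $scope)] | length > 0'
# prints: true (exit code 0); an absent scope prints false and exits 1
```

Negating the result with ! means create-scope only runs when the scope is missing, which keeps the step idempotent across reruns.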

2. Set Secrets in the Scope:

Once the scope exists, it stores several secrets, including database credentials (like host, username, and password), using values retrieved from GitHub Secrets and Variables.

databricks secrets -t ${{ inputs.environment }} put-secret $SCOPE host --string-value "${{ vars.DBX_DB_HOST }}"
databricks secrets -t ${{ inputs.environment }} put-secret $SCOPE database --string-value "${{ vars.DBX_DB_DATABASE }}"
databricks secrets -t ${{ inputs.environment }} put-secret $SCOPE username --string-value "${{ vars.DBX_DB_USERNAME }}"
databricks secrets -t ${{ inputs.environment }} put-secret $SCOPE password --string-value "${{ secrets.DBX_DB_PASSWORD }}"

Why This Matters

Automating the setup of Databricks secrets with GitHub Actions ensures that:

  • Your Workflow is Reproducible: No more manual steps! Everything is defined in code, making it easier to track changes, collaborate with teammates, and debug issues.
  • You Leverage a Proper Secrets Manager: By using GitHub Secrets (or extending this to a dedicated secrets manager), you avoid overloading Databricks Secrets and benefit from better access control, auditability, and centralized secrets management.
  • You Get Environment-Specific Flexibility: The same workflow runs in different environments (like dev, stage, and prod) without duplicating code, making it scalable and maintainable.

With this approach, you get the best of both worlds: Databricks Secrets for securely passing secrets into notebooks and a proper secrets manager for centralized management and automation.