Overview

The 4MINDS platform integrates with Databricks, allowing you to securely connect your Databricks workspace, browse your Unity Catalog, import tables and volume files into 4MINDS datasets, and register Databricks-hosted models as chat models inside 4MINDS. You can also submit Spark and GPU jobs, track ML experiments with MLflow, and share data across organizations via Delta Sharing.

Diagrams

Databricks Integration Architecture
Databricks Data Flow Diagrams

Getting Started

Prerequisites

  • A Databricks workspace with Unity Catalog enabled
  • A SQL warehouse (classic or serverless) that you can query
  • One of the following authentication credentials:
    • An OAuth application configured in your Databricks workspace (recommended), or
    • A Personal Access Token (PAT) from Databricks, or
    • A Service Principal (Client ID, Client Secret, Account ID) for automated workloads
  • Appropriate Unity Catalog permissions for the catalogs, schemas, tables, and volumes you want to access

Connecting Your Databricks Workspace

The Databricks integration supports three authentication methods: OAuth U2M (recommended), Personal Access Token, and OAuth M2M / Service Principal.

Option A: OAuth U2M (Recommended)
  1. Open Integrations from the main navigation bar in 4MINDS.
  2. Find the Databricks integration and click Connect.
  3. Select the OAuth tab.
  4. Enter your Workspace URL (e.g. https://adb-xxx.azuredatabricks.net) and SQL Warehouse ID.
  5. Click Connect with Databricks. A popup window will open.
  6. Log in to your Databricks account and authorize 4MINDS when prompted.
  7. The popup will close automatically once authorization is complete.
Note: OAuth uses OAuth 2.0 with PKCE (Proof Key for Code Exchange). Your Databricks password is never stored by 4MINDS. Access tokens are encrypted and refreshed automatically.
Admin setup: Before users can use OAuth, an admin must configure the Databricks OAuth application (Client ID + Client Secret) once in 4MINDS. See Admin Setup for OAuth below.

Option B: Personal Access Token (PAT)

  1. Open Integrations from the main navigation bar in 4MINDS.
  2. Find the Databricks integration and click Connect.
  3. Select the Personal Access Token tab.
  4. Enter your Workspace URL, Personal Access Token, and SQL Warehouse ID.
    • To generate a PAT: log in to Databricks, click your user icon > User Settings > Developer > Access tokens > Generate new token.
  5. Click Test Connection to verify your credentials.
  6. Click Save Credentials to complete the setup.
Note: Personal Access Tokens are long-lived and do not auto-refresh. If a token is revoked in Databricks, you will need to reconnect with a new token.

Option C: Service Principal (OAuth M2M)

For automated pipelines or shared service accounts without a human user:
  1. Open Integrations from the main navigation bar in 4MINDS.
  2. Find the Databricks integration and click Connect.
  3. Select the Service Principal tab.
  4. Enter your Workspace URL.
  5. The connection uses the service principal credentials configured by your admin. See Admin Setup for Service Principal below.

Disconnecting

To disconnect your Databricks workspace, open Integrations from the main navigation bar, find Databricks, and click Disconnect. This removes your stored credentials and revokes active tokens.

Admin Setup for OAuth

Organization admins configure a Databricks OAuth application once per organization:
  1. In Databricks, create an OAuth app and note the Client ID and Client Secret.
  2. In 4MINDS, open Integrations > Databricks > Admin Settings.
  3. Enter the Client ID and Client Secret, then save.
  4. Users in the organization can now connect via the OAuth tab.
The Client Secret is AES-encrypted at rest. Admins can update the Client ID at any time without re-entering the secret; the existing secret is preserved unless a new one is provided.

Admin Setup for Service Principal

  1. In Databricks, create a service principal and generate OAuth credentials (Client ID + Client Secret). Note your Account ID from the Databricks account console.
  2. In 4MINDS, open Integrations > Databricks > Admin Settings.
  3. Enter the service principal’s Client ID, Client Secret, and Account ID, then save.
  4. Users can now connect via the Service Principal tab.
OAuth U2M and M2M credentials can coexist — users choose which flow to use when connecting.

Authentication Methods

Method 1: OAuth U2M (User-to-Machine)

This is the primary and recommended authentication method. It uses the OAuth 2.0 Authorization Code grant with PKCE (Proof Key for Code Exchange), providing the strongest security model for interactive users.

User connection flow:
  1. The user clicks “Connect with Databricks” and enters their workspace URL and SQL warehouse ID
  2. A browser popup opens to the Databricks authorization page
  3. The user logs in with their Databricks credentials and grants consent
  4. Databricks redirects back to 4MINDS with an authorization code
  5. The 4MINDS backend exchanges the code for access and refresh tokens, validating the PKCE code verifier to prevent interception
  6. The user’s identity (email, name) is retrieved from the Databricks OIDC userinfo endpoint
  7. The connection is established — the user sees their email and connection status in the 4MINDS UI
Scopes requested: all-apis, offline_access

PKCE details: The platform generates a cryptographically random 32-byte code verifier, computes a SHA-256 code challenge, and sends the challenge with the authorization request. The verifier is submitted during the token exchange, ensuring that even if the authorization code is intercepted, it cannot be used without the original verifier.

Token lifecycle:
  • Access tokens are refreshed automatically when they are within 5 minutes of expiration. Users never need to re-authenticate unless the refresh token is revoked.
  • If Databricks returns a new refresh token during renewal, the updated token is stored immediately (refresh token rotation support).
  • Tokens are stored server-side only — they are never sent to or exposed in the browser.
  • A CSRF state token is validated on every OAuth callback to prevent cross-site request forgery.
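The PKCE construction described above follows the standard RFC 7636 recipe and can be sketched in a few lines of Python. This is an illustrative sketch, not 4MINDS source code; the function name is hypothetical.

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Generate a PKCE code verifier and its S256 code challenge (RFC 7636)."""
    # 32 random bytes, base64url-encoded without padding -> 43-char verifier
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    # challenge = BASE64URL(SHA-256(verifier)), also without padding
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge

verifier, challenge = make_pkce_pair()
```

The challenge travels with the authorization request (code_challenge_method=S256); the verifier is revealed only during the token exchange, which is what makes an intercepted authorization code useless on its own.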

Method 2: Personal Access Token (PAT)

For users or environments where OAuth is not configured, the integration supports connecting with a Databricks Personal Access Token. This is the simplest method and is useful for quick setup, testing, or workspaces that haven’t configured an OAuth application.

How it works:
  • The PAT is sent as a Bearer token in the Authorization header on all Databricks API requests
  • PATs do not expire automatically but can be revoked by the user in their Databricks workspace settings
  • No automatic token refresh is needed since PATs are long-lived
  • The token is encrypted at rest and passed to the backend Databricks client for each API call
Trade-offs vs OAuth:
  • Simpler to set up (no admin OAuth app configuration needed)
  • Less secure (static token vs short-lived rotating tokens)
  • No user identity verification (the platform trusts whoever provides the token)
  • No refresh mechanism (if the PAT is revoked, the user must manually reconnect)
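To make the request shape concrete: a PAT simply rides in the Authorization header of each REST call. A minimal sketch that builds (but does not send) such a request; the workspace URL and token shown are placeholders:

```python
from urllib.request import Request

def pat_request(workspace_url: str, api_path: str, token: str) -> Request:
    """Build a Databricks REST request authenticated with a Personal Access Token."""
    return Request(
        workspace_url.rstrip("/") + api_path,
        headers={
            "Authorization": f"Bearer {token}",  # the PAT is sent as a Bearer token
            "User-Agent": "4MINDSPlatform",      # sent on every Databricks call
        },
    )

req = pat_request("https://adb-xxx.azuredatabricks.net", "/api/2.0/sql/warehouses", "dapi-placeholder")
```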

Method 3: OAuth M2M (Machine-to-Machine) — Service Principal

For automated workflows and service accounts, the integration supports the OAuth 2.0 Client Credentials grant using a Databricks service principal. This is designed for scenarios where no interactive user is present.

How it works:
  1. The user selects “Connect with Service Principal” and enters their workspace URL
  2. The backend retrieves the M2M credentials from the admin configuration
  3. A token is requested directly from the Databricks accounts-level OIDC endpoint (https://accounts.cloud.databricks.com/oidc/accounts/{account_id}/v1/token) using the client_credentials grant
  4. The access token is stored and the connection is established
Scope requested: sql

Token lifecycle:
  • M2M tokens are not refreshed — when one expires, a new token is obtained using the same client credentials
  • No refresh token is issued (standard behavior for client credentials grants)
  • The connection shows as “Service Principal” in the UI rather than a user email
When to use M2M:
  • Scheduled or automated data pipelines that run without user interaction
  • Shared service accounts where individual user OAuth is not practical
  • Environments with service principal-based access controls in Unity Catalog
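The token call in step 3 is a standard client-credentials POST against the accounts-level OIDC endpoint. A hedged sketch that builds (not sends) the request; the account and client IDs are placeholders:

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

def m2m_token_request(account_id: str, client_id: str, client_secret: str) -> Request:
    """Build the client_credentials token request for a Databricks service principal."""
    url = f"https://accounts.cloud.databricks.com/oidc/accounts/{account_id}/v1/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "scope": "sql",  # the scope the integration requests for M2M
    }).encode()
    # Client ID/secret are sent as HTTP Basic credentials
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return Request(url, data=body, method="POST", headers={
        "Authorization": f"Basic {basic}",
        "Content-Type": "application/x-www-form-urlencoded",
    })

req = m2m_token_request("1234-abcd", "my-sp-client-id", "sp-secret")
```

No refresh token comes back from this grant; when the access token expires, the same call is simply made again.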

Authentication Summary

| | OAuth U2M | Personal Access Token | OAuth M2M |
|---|---|---|---|
| Security | Highest (PKCE, short-lived tokens, auto-refresh) | Moderate (static long-lived token) | High (OAuth, but shared identity) |
| User interaction | One-time popup authorization | Enter token manually | None (admin configures credentials) |
| Token refresh | Automatic | Not applicable | New token on expiry |
| Identity | User email from Databricks | None (anonymous) | Service principal |
| Admin setup | Configure OAuth app (Client ID/Secret) | None | Configure service principal (Client ID/Secret/Account ID) |
| Best for | Interactive users, production environments | Quick setup, testing, development | Automated pipelines, service accounts |

Multi-Tenant Credential Management

Each organization in 4MINDS manages their own Databricks OAuth credentials independently:
  • OAuth U2M: Admins configure their workspace’s OAuth Client ID and Client Secret through the 4MINDS UI. Secrets are encrypted at rest with AES. Admins can update credentials without disconnecting existing users.
  • OAuth M2M: Admins configure their service principal’s Client ID, Client Secret, and Account ID separately. These can coexist alongside U2M credentials — an organization can have both configured simultaneously.
  • Environment variable fallback: For simpler or single-tenant deployments, OAuth credentials can also be set via environment variables, which serve as a fallback when no per-organization configuration exists.

Unity Catalog Integration

Data Discovery

The integration provides full hierarchical browsing of Unity Catalog, matching the structure users see in Databricks:
Workspace
  └── Catalogs
       └── Schemas
            ├── Tables (MANAGED, EXTERNAL, VIEW)
            └── Volumes
                 └── Files & Folders
Users navigate this tree in the 4MINDS UI with breadcrumb navigation. At each level, metadata is displayed including owner, description, creation date, and data source format.

What We Access

| Object | Description |
|---|---|
| Catalogs | All accessible catalogs in the workspace |
| Schemas | Schemas within a selected catalog |
| Tables | Tables within a schema, including type (managed, external, view, Iceberg) and column metadata (names, types, comments) |
| Volumes | Unity Catalog volumes and their recursive folder/file contents |
| Table data | Preview and export via SQL queries executed through a SQL warehouse |

SQL Warehouse Handling

The integration detects whether a customer’s SQL warehouse is serverless or classic, and adapts accordingly:
  • Stopped warehouses are automatically started before queries
  • Serverless warehouses get shorter polling intervals (they start faster)
  • Warehouse type and serverless status are surfaced in the UI so users know what they’re running on

Importing Data

Creating a Dataset with Databricks Data

  1. Create a new dataset (or edit an existing one).
  2. Select Databricks as a data source.
  3. The platform checks your Databricks connection. If not connected, you will be prompted to connect first.
  4. Browse your Unity Catalog — select a catalog, then a schema, then pick tables or volume files.
  5. Preview the selection before committing to a full import.
  6. Click Add to stage the selected tables/files for import.
  7. Complete the dataset creation to trigger the import.

How Table Imports Work

Table imports use the Databricks SQL Statement Execution API. The platform runs a SELECT query through your configured SQL warehouse, exports results as JSON or CSV (selectable by the user), uploads the results to Azure Blob Storage, and processes them through the 4MINDS ETL pipeline.
  • You can limit rows, filter columns, and preview the data before importing.
  • Stopped warehouses are auto-started before the query runs.

How Volume File Imports Work

Volume files are downloaded directly from Unity Catalog volumes via the Databricks Files API. Files are transferred to Azure Blob Storage and processed by the same 4MINDS ETL pipeline used for all dataset sources. Supported file types include JSON, CSV, Parquet, XLSX, and text/document formats.
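The Files API addresses volume files by their full /Volumes/<catalog>/<schema>/<volume>/<path> path under /api/2.0/fs/files. A sketch of the URL construction (the names below are placeholders):

```python
from urllib.parse import quote

def volume_file_url(workspace_url: str, catalog: str, schema: str,
                    volume: str, file_path: str) -> str:
    """Build the Files API download URL for a Unity Catalog volume file."""
    path = f"/Volumes/{catalog}/{schema}/{volume}/{file_path.lstrip('/')}"
    # quote() keeps "/" intact but escapes spaces and other special characters
    return workspace_url.rstrip("/") + "/api/2.0/fs/files" + quote(path)

url = volume_file_url("https://adb-xxx.azuredatabricks.net",
                      "main", "sales", "raw", "2024/orders.parquet")
```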

Combining with Other Sources

Databricks data can be combined with files from other sources in the same dataset. For example, you can import a Unity Catalog table alongside Google Drive documents, Gong transcripts, or uploaded files.

Dataset Sync

Overview

Beyond one-time imports, 4MINDS supports automatic, continuous synchronization with Databricks Unity Catalog volumes. When you enable dataset sync on an imported dataset, 4MINDS monitors the source volume for new and modified files and automatically pulls them in — keeping your dataset up to date without manual re-imports. This works similarly to rsync: the platform maintains a manifest of every file it has already synced (tracking file path, size, and modification timestamp), and on each sync cycle only downloads files that are new or have changed.

How to Set Up Sync

  1. Import files from a Databricks Unity Catalog volume into a 4MINDS dataset (using the standard import flow above).
  2. On the dataset, toggle Dataset Sync on.
  3. Select a sync frequency (see table below).
  4. From that point on, 4MINDS automatically checks the source volume at the configured interval and pulls in new or modified files.

Sync Frequencies

| Frequency | Interval | Best For |
|---|---|---|
| Every minute | 1 minute | Real-time dashboards, rapidly changing data (Enterprise tier) |
| Hourly | 1 hour | Frequently updated data pipelines (Teams & Enterprise) |
| Daily | 24 hours | Standard business reporting (all paid tiers) |
| Weekly | 7 days | Slowly changing reference data |
| Monthly | 30 days | Compliance snapshots, archival data |

How It Works Internally

Event-driven scheduler: The sync system uses an intelligent, event-driven scheduler rather than polling. It calculates the exact time each sync is due based on the configured frequency and last sync timestamp, then sleeps until the earliest next sync. For example, a daily sync that last ran at 8:00 AM will sleep exactly 24 hours rather than polling every 30 minutes. When a user creates or modifies a sync configuration, the scheduler wakes up immediately to accommodate the change.

Change detection: On each sync cycle, the platform fetches the full file listing from the configured Databricks volume path (recursively including all subfolders). It compares each file against its internal manifest using the file’s unique path, size, and modification timestamp. Files are classified as:
  • New — File path not in the manifest (never seen before)
  • Modified — File path exists in the manifest but size or modification time has changed
  • Unchanged — File matches the manifest exactly (skipped)
Only new and modified files are downloaded, which minimizes bandwidth and processing time.

File processing pipeline: Changed files are downloaded from the Databricks volume via the Files API, uploaded to Azure Blob Storage (scoped to the user’s dataset), and processed through the 4MINDS ETL pipeline. The manifest is updated with the new file metadata after successful processing. Sync statistics (total files synced, total size, last sync status) are tracked and visible to the user.

Concurrent processing: Multiple dataset syncs can run in parallel. The scheduler processes up to 5 sync configurations concurrently in each batch, with each sync getting its own isolated database session to prevent conflicts.
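The change-detection rules reduce to a small pure function. A sketch, using path → (size, mtime) as the manifest shape; the real manifest schema is internal to 4MINDS, so this shape is an assumption:

```python
def classify_files(listing: dict[str, tuple[int, float]],
                   manifest: dict[str, tuple[int, float]]) -> dict[str, list[str]]:
    """Classify volume files as new / modified / unchanged against the sync manifest."""
    result: dict[str, list[str]] = {"new": [], "modified": [], "unchanged": []}
    for path, (size, mtime) in listing.items():
        if path not in manifest:
            result["new"].append(path)        # never seen before
        elif manifest[path] != (size, mtime):
            result["modified"].append(path)   # size or modification time changed
        else:
            result["unchanged"].append(path)  # matches the manifest, skipped
    return result

changes = classify_files(
    {"a.csv": (10, 1.0), "b.csv": (20, 2.0), "c.csv": (5, 3.0)},
    {"a.csv": (10, 1.0), "b.csv": (19, 2.0)},
)
# changes == {"new": ["c.csv"], "modified": ["b.csv"], "unchanged": ["a.csv"]}
```

Only the "new" and "modified" buckets are downloaded; after successful processing the manifest entries are overwritten with the fresh size and timestamp.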

Authentication for Automated Sync

Since dataset sync runs in the background without user interaction, the platform handles authentication automatically:
  1. M2M (Service Principal) — preferred for sync: If the organization has configured M2M OAuth credentials, the sync system uses the service principal to obtain a fresh access token for each sync cycle. This is the ideal approach for automated workloads because it requires no user interaction and the token is always fresh.
  2. U2M (User OAuth) — fallback: If M2M is not configured, the sync system uses the user’s existing OAuth connection. It automatically refreshes the access token using the stored refresh token when needed, and persists the updated tokens back to the database so subsequent syncs continue to work.
  3. Personal Access Token — simplest: If the user connected with a PAT, the sync system uses it directly. Since PATs are long-lived, no refresh is needed unless the user revokes the token.

Dormant Mode

When no sync configurations exist across the entire platform, the scheduler enters dormant mode — checking only once every 5 minutes for newly created configurations. This ensures zero overhead when the feature is not in use. As soon as a sync configuration is created, the scheduler exits dormant mode and resumes event-driven scheduling.
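The scheduling behavior in this section, including dormant mode, can be sketched as a next-wakeup computation: each configuration is due at last sync plus its interval, and the scheduler sleeps until the earliest due time (falling back to the 5-minute dormant check when no configurations exist). The constants and names here are illustrative:

```python
FREQUENCY_SECONDS = {
    "minute": 60, "hourly": 3600, "daily": 86400,
    "weekly": 7 * 86400, "monthly": 30 * 86400,
}
DORMANT_CHECK = 300  # dormant mode: check every 5 minutes for new configs

def seconds_until_next_sync(configs: list[tuple[str, float]], now: float) -> float:
    """How long the scheduler should sleep.

    configs: (frequency, last_sync_timestamp) pairs; now: current epoch seconds.
    """
    if not configs:
        return DORMANT_CHECK  # nothing to schedule -> dormant mode
    due_times = [last + FREQUENCY_SECONDS[freq] for freq, last in configs]
    return max(0.0, min(due_times) - now)  # 0 means at least one sync is overdue

# A daily sync that last ran one hour ago: sleep the remaining 23 hours.
sleep_for = seconds_until_next_sync([("daily", 1_000_000.0)], now=1_000_000.0 + 3600)
# sleep_for == 82800.0
```

Waking the scheduler when a configuration changes is then just recomputing this value and interrupting the current sleep.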

Model Serving Endpoints

Overview

4MINDS integrates with Mosaic AI Model Serving so that models hosted in your Databricks workspace can be used directly inside 4MINDS as chat models. You can discover all serving endpoints you have access to, review their readiness and classification, and register any endpoint as an external model — making it available in the 4MINDS model picker alongside first-party providers.

Discovering Endpoints

  1. In the 4MINDS model picker, open the Databricks Models section.
  2. 4MINDS calls GET /databricks/serving-endpoints and lists every serving endpoint your connected identity can see.
  3. Each endpoint shows its name, state (READY / NOT_READY), classification, and relevant metadata.
For each endpoint, the integration returns:
  • Name, state, creator, and creation timestamp
  • Served entities — the raw list of entities backing the endpoint
  • Task — e.g. llm/v1/chat, llm/v1/completions
  • Endpoint type — e.g. STANDARD, FOUNDATION_MODEL_API
  • Entity type — FOUNDATION_MODEL, PT_FOUNDATION_MODEL, UC_MODEL, or EXTERNAL_MODEL
  • External model provider/name — e.g. openai / gpt-4o, anthropic / claude-3-opus, custom
  • model_type — a derived classification 4MINDS applies to each endpoint (see below)

Model Type Classification

4MINDS classifies every discovered endpoint into one of the following categories so the UI can present them cleanly:
| model_type | Description |
|---|---|
| DATABRICKS_FM_PPT | Databricks-hosted foundation model, pay-per-token (name begins with databricks-, entity is FOUNDATION_MODEL) |
| DATABRICKS_FM_PT | Provisioned-throughput foundation model (PT_FOUNDATION_MODEL) |
| DATABRICKS_FM_UC_SYSTEM_AI | UC model under system.ai.* |
| DATABRICKS_FM_UC_AGENTS | UC model with an llm/v1/* task — treated as an agentic/chat model |
| DATABRICKS_CLASSIC_ML | UC model with no task — classic ML model |
| FM_EXTERNAL_MODEL | External provider proxied through Databricks (OpenAI, Anthropic, etc.) |
| FM_EXTERNAL_MODEL_CUSTOM | External model with a custom provider |
| AGENT_BRICKS_KA | Agent Bricks — Knowledge Assistant |
| AGENT_BRICKS_MAS | Agent Bricks — Multi-Agent Supervisor |
| AGENT_BRICKS_KIE | Agent Bricks — Knowledge / Information Extraction |
| AGENT_BRICKS_MS | Agent Bricks — Model Specialization |
Agent Bricks endpoints are detected via tile_endpoint_metadata.problem_type.

Registering an Endpoint as a 4MINDS Model

  1. In the Databricks Models section of the model picker, select an endpoint and click Register.
  2. Enter a display name and optional description.
  3. Configure optional model metadata: max_tokens (default 4096), supports_streaming (default true), context_window, parameters (e.g. "8B", "70B"), inference_speed, and a linked dataset_id.
  4. Save. The endpoint now appears in your model picker alongside first-party providers.
Under the hood, this calls POST /databricks/register-model and persists the endpoint as an external model in 4MINDS.
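As an illustration of the metadata that gets persisted, a sketch of assembling the registration body. The /databricks/register-model endpoint is 4MINDS-internal and its exact schema is not public, so the field names here are assumptions; only the defaults (max_tokens 4096, streaming on) come from the steps above:

```python
def build_register_payload(endpoint_name: str, display_name: str, **options) -> dict:
    """Assemble a hypothetical body for POST /databricks/register-model."""
    payload = {
        "endpoint_name": endpoint_name,  # assumed field name, not confirmed
        "display_name": display_name,
        "max_tokens": options.get("max_tokens", 4096),           # documented default
        "supports_streaming": options.get("supports_streaming", True),  # documented default
    }
    # Optional metadata is included only when provided
    for key in ("description", "context_window", "parameters",
                "inference_speed", "dataset_id"):
        if key in options:
            payload[key] = options[key]
    return payload

payload = build_register_payload("databricks-llama-70b", "Llama 70B", parameters="70B")
```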

Chatting with a Registered Endpoint

Once registered, select the model in any 4MINDS chat. Requests are proxied through the same Databricks connection that discovered it, using your stored OAuth/PAT credentials. Responses stream back to the UI just like first-party models.

Serverless GPU Compute

What We Support

The integration supports submitting deep learning and ML workloads to Databricks Serverless GPU Compute through the Jobs API. Users can run GPU-accelerated Python scripts for model training, fine-tuning, inference, and other custom AI workloads.

Supported Accelerators

| GPU | Best For | Multi-GPU | Multi-Node |
|---|---|---|---|
| NVIDIA A10 | Fine-tuning smaller models, classic ML, computer vision, inference | Yes | Yes |
| NVIDIA H100 | LLM fine-tuning, large-scale model training, distributed deep learning | Up to 8 per node | No (single node) |
A10 is the default GPU when none is specified. The num_gpus parameter (per node) is validated at submission — H100 jobs are capped at 8 GPUs because H100 is single-node.

How It Works

When a GPU job is submitted through 4MINDS:
  1. The platform automatically configures the job for serverless GPU compute (GPU jobs always run serverless)
  2. Required dependencies (serverless_gpu, torch) are auto-injected into the job environment
  3. The GPU environment version is selected (separate from the CPU serverless environment)
  4. The job is submitted via the Databricks Jobs API with the specified GPU type and count
  5. The user’s Python script uses the serverless_gpu library’s @distributed decorator to leverage GPU resources

Managed Environments

Two base environments are available for GPU workloads:
  • Default — Minimal environment with stable client APIs. Best for users who want full control over their dependencies.
  • AI — Pre-installed with PyTorch, Transformers, Ray, XGBoost, and other popular ML libraries. Best for getting started quickly with training workloads.

Limitations

  • H100 accelerators are single-node only (up to 8 GPUs in one node)
  • Only Python workloads are supported
  • Additional Databricks-side limits (maximum workload runtime, Private Link support, regional availability) apply as documented by Databricks

Additional Capabilities

Delta Sharing

The integration supports Databricks Delta Sharing for secure cross-organization data access:
  • Create and manage Delta Shares
  • Add and remove tables from shares
  • Create sharing recipients (token-based or Databricks-to-Databricks)
  • Manage share permissions (grant/revoke SELECT access)

MLflow Integration

Users can create MLflow experiments and log metrics from within 4MINDS:
  • Create experiments with custom artifact storage locations and tags
  • Log metrics to MLflow runs with step and timestamp tracking

Spark Job Submission

Beyond GPU workloads, the integration supports submitting general Spark jobs using:
  • Classic compute — With user-defined cluster configuration
  • Serverless CPU compute — Using Spark Connect APIs with pip-based dependency management

User-Agent Telemetry

All HTTP requests from 4MINDS to Databricks APIs include the following User-Agent header:
User-Agent: 4MINDSPlatform
This header is sent consistently on every request to Databricks, across all API surfaces:
  • OAuth operations — Token exchange, token refresh, user info retrieval, M2M token acquisition
  • Unity Catalog — Catalog, schema, table, and volume listing; table metadata retrieval
  • SQL execution — Statement execution via SQL warehouses
  • Warehouse management — Warehouse info queries and auto-start operations
  • Jobs — Spark and GPU job submission
  • MLflow — Experiment creation and metrics logging
  • Delta Sharing — Share and recipient management
  • Model Serving — Serving endpoint discovery and invocation
  • File operations — Volume file listing and downloads
The header is set centrally in the Databricks API client, so any new API calls added in the future will automatically include it. There are no code paths that make Databricks API calls without the User-Agent header.

Troubleshooting

| Issue | Solution |
|---|---|
| "No Databricks connection found" | Open Integrations from the main nav and connect your Databricks workspace using OAuth, PAT, or Service Principal. |
| "Invalid workspace URL" | Verify your workspace URL is the full HTTPS URL (e.g. https://adb-xxx.azuredatabricks.net) and does not include trailing paths. |
| "Warehouse not found" or "Warehouse ID invalid" | Copy the warehouse ID from Databricks SQL > Warehouses. The ID is a short alphanumeric string, not the warehouse name. |
| OAuth popup blocked | Enable popups for the 4MINDS site in your browser settings, then try again. |
| OAuth error: "redirect_uri mismatch" | Your admin needs to register the 4MINDS callback URL in the Databricks OAuth app configuration. |
| 401 Unauthorized on API calls | Your credentials may have expired or been revoked. For OAuth, try refreshing the page (the token auto-refreshes). For PAT, verify the token is still active in Databricks > User Settings > Access tokens. If issues persist, disconnect and reconnect. |
| 403 Forbidden on a catalog/schema/table | Unity Catalog permissions. The integration respects Databricks access controls: you can only see objects your identity has USE CATALOG / USE SCHEMA / SELECT permissions for. Contact your Databricks admin. |
| Warehouse takes a long time to start | Classic warehouses take 2–5 minutes to start from a stopped state; serverless warehouses start in under a minute. The platform polls automatically; larger warehouses may need longer. |
| Table import fails with "query timeout" | The SQL warehouse may be under heavy load, or the table is very large. Try filtering rows or selecting specific columns, or use a larger warehouse. |
| Volume file import missing some files | Verify you have READ FILES permission on the volume. Files you cannot see will be silently skipped. |
| Sync is not pulling new files | Check the sync status on the dataset. Verify the source volume still exists and your credentials are still valid. Sync uses your OAuth/PAT/M2M credentials; if they are revoked, sync will fail. |
| Service Principal connection fails | Verify the admin has configured the Client ID, Client Secret, and Account ID correctly. The Account ID is required for the accounts-level token endpoint. |
| Model serving endpoint shows NOT_READY | The endpoint is still deploying or has encountered an error in Databricks. Check the endpoint status directly in Databricks > Serving. |
| GPU job submission fails with "GPU type invalid" | Only A10 and H100 are supported. H100 jobs are capped at 8 GPUs per node and are single-node only. |

Security & Privacy

Authentication

  • OAuth 2.0 with PKCE — SHA-256 code challenge prevents authorization code interception
  • CSRF protection — Random state tokens validated on every OAuth callback
  • Encrypted secrets — All client secrets, PATs, and tokens stored with AES encryption at rest
  • Minimal scopes — Only the scopes needed for the integration are requested (all-apis, offline_access for U2M; sql for M2M)
  • Automatic token refresh — Users stay authenticated without manual intervention; refresh token rotation is supported
  • Server-side token storage — OAuth tokens are never sent to or stored in the browser

Data Access

  • Respects Unity Catalog permissions — Users can only access data they have permissions for in Databricks. The integration does not elevate or bypass any Databricks access controls.
  • No persistent data caching — Table data is queried fresh on each request; no local copies are retained in 4MINDS outside of the imported dataset
  • Scoped cloud storage — Imported data is uploaded to Azure Blob Storage with per-user/per-dataset scoped access
  • Credential isolation — Each user’s connection credentials are stored independently; no shared tokens across users
  • Per-organization OAuth apps — Each organization configures its own Databricks OAuth application; credentials are never shared across organizations

FAQ

Q: What Databricks data can I access?
A: Anything your identity has Unity Catalog permissions for — catalogs, schemas, tables (managed, external, views, Iceberg), and volumes. Table data is queried through your SQL warehouse; volume files are downloaded via the Files API.

Q: Do I need Unity Catalog to use this integration?
A: Yes. The integration is built around Unity Catalog for data discovery and access control.

Q: Can I import data from an existing dataset?
A: Yes. You can add Databricks tables or volume files to both new and existing datasets, and combine them with data from other sources.

Q: How often does dataset sync run?
A: You choose the frequency when setting up sync: every minute, hourly, daily, weekly, or monthly. Frequencies below daily require a paid tier.

Q: What happens to my data if I disconnect my Databricks workspace?
A: Previously imported data remains in your datasets. Automatic syncing will stop, and you will not be able to import new data from Databricks until you reconnect.

Q: Can I choose which tables or files to sync?
A: Yes. Sync is configured per-dataset and operates on the volume path you originally imported from. You can enable/disable sync and change the frequency at any time.

Q: Are OAuth tokens exposed to my frontend?
A: No. OAuth tokens are stored server-side only. The browser only ever sees the connection status.

Q: Can I connect multiple Databricks workspaces?
A: Each user account in 4MINDS supports one Databricks connection at a time. To switch workspaces, disconnect the current connection and reconnect with different credentials.

Q: Does 4MINDS respect my Databricks permissions?
A: Yes. Every API call is made with your OAuth/PAT/service principal credentials. You can only see and query data that Databricks itself allows you to access.

Q: Can I use my Databricks-hosted models in 4MINDS chats?
A: Yes. Discover your workspace’s Mosaic AI Model Serving endpoints in the model picker, register any endpoint as a 4MINDS model, and it becomes available alongside first-party providers. See Model Serving Endpoints.

Q: Does 4MINDS support Agent Bricks?
A: Yes. Agent Bricks endpoints (Knowledge Assistant, Multi-Agent Supervisor, Knowledge/Information Extraction, Model Specialization) are detected and classified automatically. They can be registered and used as chat models.

Q: What GPU types are supported for Spark jobs?
A: NVIDIA A10 (multi-GPU, multi-node) and NVIDIA H100 (up to 8 GPUs, single-node). A10 is the default if none is specified.

Q: Is there a User-Agent identifying 4MINDS on all requests?
A: Yes. Every request to Databricks includes User-Agent: 4MINDSPlatform. See User-Agent Telemetry.