
Detailed comparison with microsoft/dbt-fabricspark

This report provides a detailed technical comparison between the FabricSpark adapter in this package and Microsoft's dedicated dbt-fabricspark repository. Both target the same compute engine -- Microsoft Fabric Lakehouse with Spark SQL via Livy sessions -- but take fundamentally different architectural approaches.

| | dbt-fabric-samdebruyn | microsoft/dbt-fabricspark |
| --- | --- | --- |
| PyPI package | dbt-fabric-samdebruyn[spark] | dbt-fabricspark |

Architecture

This is the most significant difference and influences nearly every other comparison point.

This adapter: multiple inheritance from dbt-spark

The FabricSpark adapter in this package uses multiple inheritance: FabricSparkAdapter(BaseFabricAdapter, SparkAdapter). It inherits from dbt-spark's SparkAdapter and from a shared BaseFabricAdapter that the T-SQL adapter also uses.

  • Plugin registration declares dependencies=["spark"], so dbt-spark's macros are available at runtime.
  • Adapter code is thin because it delegates heavily to dbt-spark and the shared base.
  • Macros are primarily overrides of dbt-spark macros for Fabric-specific behavior.
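The inheritance shape described above can be sketched with stand-in classes. Only the class names and the order of the bases come from the text; the method bodies and the `workspace_label` helper are hypothetical placeholders, not the real dbt-spark or package code.

```python
# Stand-in classes: the real SparkAdapter lives in dbt-spark and the real
# BaseFabricAdapter in this package. The bodies below are illustrative only.
class SQLAdapter:
    def quote(self, identifier: str) -> str:
        return f'"{identifier}"'


class SparkAdapter(SQLAdapter):
    def quote(self, identifier: str) -> str:
        # Spark SQL quotes identifiers with backticks.
        return f"`{identifier}`"


class BaseFabricAdapter:
    def workspace_label(self) -> str:
        # Hypothetical shared helper that a T-SQL adapter could reuse too.
        return "fabric-workspace"


class FabricSparkAdapter(BaseFabricAdapter, SparkAdapter):
    """Thin adapter: behavior is resolved from both bases via the MRO."""


adapter = FabricSparkAdapter()
mro_names = [cls.__name__ for cls in FabricSparkAdapter.__mro__]
```

In this sketch the method resolution order places the shared Fabric base before the Spark base, so Fabric-specific overrides would win while everything else falls through to the Spark implementation.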

Upstream: standalone SQLAdapter

The upstream adapter is fully standalone: FabricSparkAdapter(SQLAdapter), with no dbt-spark dependency.

  • Plugin registration has no dependencies -- all Spark SQL behavior is self-contained.
  • Adapter code is significantly larger because it reimplements everything dbt-spark would provide.
  • Macros include utility functions normally inherited from dbt-spark.

| Aspect | dbt-fabric-samdebruyn | microsoft/dbt-fabricspark |
| --- | --- | --- |
| Code reuse | High (inherits dbt-spark + shared base) | None (self-contained) |
| Maintenance burden | Lower per-adapter, coupled to dbt-spark | Higher, no external coupling |
| dbt-spark compatibility | Automatic (inherits macros/behaviors) | Manual (must reimplement) |

Features

Materializations

| Materialization | dbt-fabric-samdebruyn | microsoft/dbt-fabricspark |
| --- | --- | --- |
| Table | ✅ | ✅ |
| View | ✅ | ✅ |
| Incremental | append, merge, insert_overwrite, microbatch | append, merge, insert_overwrite, microbatch |
| Snapshot | ✅ | ✅ |
| Ephemeral | ✅ | ✅ |
| Materialized View / Lake View | ✅ (standard dbt MV pattern) | ✅ (Fabric-specific MLV with REST API refresh) |
| Clone | ✅ | ✅ |
| Seed | ✅ | ✅ |

Notable differences:

  • Materialized Lake View: The upstream uses the Fabric REST API for on-demand and scheduled refresh. This adapter uses a standard CREATE OR REPLACE statement and makes no REST API calls.

Authentication methods

| Method | dbt-fabric-samdebruyn | microsoft/dbt-fabricspark |
| --- | --- | --- |
| Azure CLI | ✅ | ✅ |
| Service Principal | ✅ | ✅ |
| Token Credential | ✅ | ✅ |
| Workload Identity | ✅ (federated OIDC) | ❌ |
| Static Access Token | ✅ | ✅ |
| Fabric Notebook | ✅ | ✅ |

Livy session management

| Feature | dbt-fabric-samdebruyn | microsoft/dbt-fabricspark |
| --- | --- | --- |
| High-concurrency Livy | ✅ | ✅ |
| Session reuse | Deterministic session tag (HC) | Via session_id_file + reuse_session flag (singleton) / deterministic session tag (HC) |
| HC session cleanup | Connection manager close() path | atexit handler (fragile; see Code quality) |
| Polling interval | Fixed 3 seconds | Adaptive (configurable) |
| Session idle timeout | 15 min default | 30 min default, configurable |
| Local Livy mode | ❌ | ✅ (livy_mode: local) |
| Statement timeout | 24 hours | 12 hours (configurable) |
| Thread-safe token refresh | ❌ | ✅ (_token_lock) |
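As a rough illustration of the polling-interval row, the sketch below contrasts a fixed 3-second delay with one possible adaptive schedule. The exponential backoff with a cap is an assumption: the comparison only states that the upstream's interval is adaptive and configurable, not which algorithm it uses. The function names and default values are hypothetical.

```python
from itertools import islice
from typing import Iterator


def fixed_delays(seconds: float = 3.0) -> Iterator[float]:
    """Yield the same delay forever (the fixed 3-second strategy)."""
    while True:
        yield seconds


def adaptive_delays(initial: float = 0.5, factor: float = 2.0, cap: float = 8.0) -> Iterator[float]:
    """Yield growing delays up to a cap (hypothetical backoff schedule)."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, cap)


first_fixed = list(islice(fixed_delays(), 4))
first_adaptive = list(islice(adaptive_delays(), 6))
```

A schedule of this kind checks short statements quickly and backs off for long-running ones, whereas a fixed interval adds up to one full interval of latency to every statement.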

Unique to this adapter

| Feature | Description |
| --- | --- |
| Purview integration | Sync dbt metadata to Microsoft Purview |
| Python model support | Submit Python models to Livy |
| Workload identity auth | Federated OIDC for CI/CD |
| Shared T-SQL + Spark | One package, two adapters |
| Capability declarations | SchemaMetadataByRelations, TableLastModifiedMetadata |
| PEP 249 cursor | Proper type conversion for all Spark SQL types |

Unique to upstream

| Feature | Description |
| --- | --- |
| MLV REST API | On-demand refresh, scheduled refresh via Fabric API |
| OneLake shortcuts | ShortcutClient for shortcut CRUD |
| Local Livy mode | Connect to local Livy for development |
| Credential validation | UUID format, HTTPS domain whitelist |

Lakehouse schema support

| Aspect | dbt-fabric-samdebruyn | microsoft/dbt-fabricspark |
| --- | --- | --- |
| Schema detection | Via dbt-spark | Auto-detected via API, process-level cache |
| Schema-enabled naming | Always 3-part | Dynamic: 3-part or 2-part based on detection |
| Non-schema mode | Not explicitly handled | Full support with identifier prefixing |

Test suite

| Aspect | dbt-fabric-samdebruyn | microsoft/dbt-fabricspark |
| --- | --- | --- |
| Testing approach | Integration tests against real Fabric | Unit tests (mock) + functional tests (real infra) |
| dbt-tests-adapter coverage | Broad (standard adapter base classes) | Narrower (custom test suite) |
| Community package tests | ✅ | ❌ |

dbt Core compatibility

For supported dbt-core and Python versions, see the compatibility page.


dbt best practices

| Practice | dbt-fabric-samdebruyn | microsoft/dbt-fabricspark |
| --- | --- | --- |
| Inherits official base | ✅ (SparkAdapter + BaseFabricAdapter) | Partially (SQLAdapter only) |
| Capability declarations | ✅ | ❌ |
| @available methods | ✅ (inherited) | ✅ (MLV, schema detection) |
| Plugin dependencies | dependencies=["spark"] | None |
| Dispatch fallback | dbt-spark macros available | Must reimplement everything |

Maturity

| | dbt-fabric-samdebruyn | microsoft/dbt-fabricspark |
| --- | --- | --- |
| Python | 3.11-3.13 | 3.10-3.13 |
| Documentation | Dedicated docs site | README + CONTRIBUTING.md |
| Code style | ruff, PEP 604 | ruff, older typing style |
| License | MIT | MIT |

Code quality

A detailed review of the upstream's Python source code reveals several significant issues that affect reliability and maintainability.

Global mutable state

The upstream stores critical runtime state in module-level and class-level global variables — authentication tokens, Livy session handles, connection managers, and relation configuration are all shared across threads via globals or ClassVar attributes. This leads to data races in multi-threaded dbt runs (e.g., reading a token after releasing its lock, mutating session state outside locks). This adapter uses instance-based encapsulation with no module-level mutable state.
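A minimal sketch of why class-level state is shared by every user of a class while instance state is not. The class and attribute names are hypothetical and are not taken from either code base.

```python
from typing import ClassVar


class SharedStateManager:
    # Class-level attribute: one dict shared by every instance and thread.
    tokens: ClassVar[dict[str, str]] = {}

    def set_token(self, value: str) -> None:
        self.tokens["access"] = value


class InstanceStateManager:
    def __init__(self) -> None:
        # Instance attribute: each manager owns its own dict.
        self.tokens: dict[str, str] = {}

    def set_token(self, value: str) -> None:
        self.tokens["access"] = value


a, b = SharedStateManager(), SharedStateManager()
a.set_token("token-from-a")
leaked = b.tokens.get("access")  # b observes a's write

c, d = InstanceStateManager(), InstanceStateManager()
c.set_token("token-from-c")
isolated = d.tokens.get("access")  # d is unaffected
```

With shared state, every read and write needs to happen under the same lock to stay consistent across threads; with per-instance state, ownership is explicit and most of that coordination disappears.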

atexit handler for session cleanup

The upstream registers atexit handlers at module import time (in both singleton_livy.py and concurrent_livy.py) to delete Livy sessions and HC sessions on process exit. This is fragile for several reasons: atexit handlers run in reverse registration order, which here depends on import order; logging and network resources may already be torn down by the time they run; they do not run at all if the process is killed by an unhandled signal or exits via os._exit(); and merely importing the module registers the handler even if no session was created. The HC implementation adds a second atexit handler with a global _active_sessions set, compounding the global mutable state problem.

This adapter manages session lifecycle through dbt's normal connection manager close() path.
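The difference from an import-time atexit hook can be sketched as follows. FakeLivyClient and SessionManager are hypothetical stand-ins, not the adapter's real classes; the point is only that cleanup happens on a deterministic close() call and is a no-op when no session was ever opened.

```python
class FakeLivyClient:
    """Hypothetical in-memory stand-in for a Livy HTTP client."""

    def __init__(self) -> None:
        self.active: set[int] = set()
        self._next_id = 0

    def create_session(self) -> int:
        self._next_id += 1
        self.active.add(self._next_id)
        return self._next_id

    def delete_session(self, session_id: int) -> None:
        self.active.discard(session_id)


class SessionManager:
    def __init__(self, client: FakeLivyClient) -> None:
        self._client = client
        self._session_id: int | None = None

    def open(self) -> int:
        # Create the session lazily, on first use.
        if self._session_id is None:
            self._session_id = self._client.create_session()
        return self._session_id

    def close(self) -> None:
        # Idempotent: does nothing when no session was ever created.
        if self._session_id is not None:
            self._client.delete_session(self._session_id)
            self._session_id = None


client = FakeLivyClient()
manager = SessionManager(client)
try:
    manager.open()
    opened = set(client.active)
finally:
    manager.close()  # runs while logging and networking are still available
manager.close()  # a second call is harmless
```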

Exception swallowing

Both LivySession.__exit__ and LivyCursor.__exit__ return True (livysession.py lines 489-495, 855-859), which suppresses all exceptions — including database errors, timeouts, and KeyboardInterrupt — inside any with block using these objects.
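The effect of the return value of `__exit__` can be shown with two small context managers. The class names are hypothetical; only the `return True` behavior mirrors what is described above.

```python
class SwallowingCursor:
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        return True  # a truthy return suppresses whatever was raised


class PropagatingCursor:
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        return False  # clean up here, then let the exception propagate


def run(cursor_cls) -> str:
    try:
        with cursor_cls():
            raise TimeoutError("statement timed out")
    except TimeoutError:
        return "error surfaced"
    return "error silently lost"


swallowed = run(SwallowingCursor)
propagated = run(PropagatingCursor)
```

With the swallowing variant, the caller continues as if the statement had succeeded, which is why returning a falsy value (or None) from `__exit__` is the usual choice for resource wrappers.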

Regex bug in SQL sanitization

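_getLivySQL() passes re.DOTALL positionally, where it is interpreted as the count argument, instead of as flags=re.DOTALL. Because re.DOTALL has the integer value 16, this silently limits comment-stripping to 16 replacements and never enables the flag, so . in the pattern still does not match newlines.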
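This class of bug is easy to reproduce with `re.sub`. The comment-stripping pattern below is a hypothetical stand-in; the upstream's actual regex may differ.

```python
import re
import warnings

# Hypothetical block-comment pattern, used here for illustration only.
BLOCK_COMMENT = r"/\*.*?\*/"

sql = "/* first line\nsecond line */ select 1"

with warnings.catch_warnings():
    # Newer Python versions warn about passing count/flags positionally.
    warnings.simplefilter("ignore", DeprecationWarning)
    # Bug: the fourth positional argument of re.sub is count, so this
    # call means count=16 with no flags set.
    buggy = re.sub(BLOCK_COMMENT, "", sql, re.DOTALL)

# Fix: pass the flag by keyword so "." also matches newlines.
fixed = re.sub(BLOCK_COMMENT, "", sql, flags=re.DOTALL)
```

In the buggy call the multi-line comment is left in place, because without the flag the pattern cannot match across the newline.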

Dead code and copy-paste artifacts

  • Thrift exception handling (connections.py lines 97-113): References thrift_resp.status.errorMessage, a pattern from Apache Thrift used by dbt-spark. The upstream connects through Livy over HTTP, not Thrift, so this code path is dead.
  • AWS logging (connections.py lines 39-46): Sets botocore and boto3 (AWS libraries) to DEBUG level at import time. These are leftovers from a Spark/Databricks ancestor.
  • Hardcoded 2028 timestamp (livysession.py lines 194-198): The int_tests auth path creates a token with expires_on = 1845972874 (a date in 2028), bypassing all token refresh logic.
  • Duplicated functions: _parse_retry_after is copied identically in both livysession.py and mlv_api.py, using the deprecated datetime.utcnow().
  • Dead parameter: get_headers() has a tokenPrint parameter that logs the full bearer token when True, but is never called with True.

Inconsistent style

The upstream mixes camelCase (tokenPrint, accessToken, _submitLivyCode, _getLivySQL) with snake_case throughout. Legacy typing aliases (Dict, List, Optional, Union) are used even though the minimum supported Python version is 3.10, where built-in generics (PEP 585) and X | Y unions (PEP 604) are available.


Summary

This adapter takes a code-reuse approach (thin adapter on dbt-spark), while the upstream takes a self-contained approach (everything reimplemented). This adapter's approach results in significantly less code with proper instance-based lifecycle management and no global mutable state.

The upstream has more Fabric-specific features (MLV REST API refresh, OneLake shortcuts, local Livy mode), while this adapter offers broader dbt ecosystem integration (dbt-spark inheritance, Purview, capability declarations, shared T-SQL + Spark in one package) and significantly higher code quality.