Career Roadmap

Azure Data Engineer: Zero to Hero

This roadmap reflects the current Azure data engineering certification landscape. DP-203 (Azure Data Engineer Associate) retired March 31, 2025 and is no longer available. DP-700 (Microsoft Fabric Data Engineer Associate) is the current credential, built around the Microsoft Fabric unified analytics platform. DP-700 is fundamentally different from DP-203 — it tests Fabric, OneLake, Lakehouse architecture, KQL, and Real-Time Intelligence, not individual Azure services like Synapse, Data Factory, and Azure Data Lake Storage in isolation. The three domains are equally weighted at approximately 30-35% each. Updated for the April 20, 2026 skills outline revision.

10 steps | 2 certifications | ~5-7 months | 01-Jun-2026 | 23 views

Step 0 - Data engineering and programming foundations

Build the technical foundation that every Microsoft Fabric data engineering concept depends on. DP-700 is rated among the more demanding associate-level Microsoft exams — strong foundations accelerate every subsequent step.

3-4 weeks

Python and PySpark basics — writing ETL logic, data transformations, working with DataFrames, reading from and writing to storage
SQL fluency — SELECT, JOIN, GROUP BY, window functions, CTEs, MERGE statements, understanding query plans
KQL (Kusto Query Language) introduction — basic KQL syntax is essential for Fabric Real-Time Intelligence, which is new content with no DP-203 equivalent
Data formats — Parquet, Delta Lake, CSV, JSON, Avro — what each is optimized for, why Delta dominates Fabric workloads
Data engineering concepts — batch versus streaming, ELT versus ETL, medallion architecture (Bronze/Silver/Gold), idempotent pipeline design
Distributed computing basics — what Spark is, partitioning for parallelism, why columnar storage improves analytical query performance
Git fundamentals — version-controlled data engineering projects, branching, pull requests, CI/CD concepts for data pipelines
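
The SQL fluency item above (CTEs plus window functions) can be practiced without a Fabric capacity. The sketch below is a minimal illustration, assuming Python's built-in `sqlite3` module as a stand-in for T-SQL or Spark SQL (it needs SQLite 3.25 or later for window functions). The `orders_raw` table and its rows are hypothetical sample data. It shows a common deduplication pattern: keep only the most recent row per key.

```python
import sqlite3

# Hypothetical sample data: two versions of order 1, one version of order 2.
rows = [
    (1, "pending", "2026-01-01"),
    (1, "shipped", "2026-01-03"),
    (2, "pending", "2026-01-02"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders_raw (order_id INTEGER, status TEXT, updated_at TEXT)"
)
conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", rows)

# The CTE ranks each order's rows by recency; the outer query keeps rank 1.
query = """
WITH ranked AS (
    SELECT order_id, status, updated_at,
           ROW_NUMBER() OVER (
               PARTITION BY order_id ORDER BY updated_at DESC
           ) AS rn
    FROM orders_raw
)
SELECT order_id, status FROM ranked WHERE rn = 1 ORDER BY order_id
"""
latest = conn.execute(query).fetchall()
conn.close()
print(latest)
```

The same `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)` shape carries over to Spark SQL and T-SQL, which is why it is worth knowing before touching Fabric.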

💡 KQL is genuinely new territory for engineers coming from SQL-only backgrounds. Microsoft Fabric's Real-Time Intelligence workload uses KQL as its primary query language. This is not optional for DP-700 — the exam tests KQL queries at a recognition and application level.

💡 Delta Lake format is central to Microsoft Fabric. OneLake stores data in Delta Parquet format by default. Understanding what Delta adds over plain Parquet (ACID transactions, time travel, schema evolution, optimized reads) is foundational knowledge for the entire roadmap.

💡 PySpark in Fabric Notebooks is the primary transformation engine for Lakehouse workloads. Candidates who only know SQL will find Notebook-based transformation scenarios harder than they should be.

Step 1 - Data fundamentals and Azure basics (DP-900 and AZ-900)

Build familiarity with core data concepts and the Azure platform before diving into Microsoft Fabric specifics. DP-900 is the natural starting point for candidates newer to data engineering.

2-3 weeks

Core data concepts — structured versus semi-structured versus unstructured data, OLTP versus OLAP, relational versus non-relational
Batch versus streaming data processing — when each is appropriate, latency trade-offs
Data roles — data engineer versus data analyst versus data scientist and how Fabric serves each role differently
Microsoft Fabric overview — the unified analytics platform, workloads (Data Engineering, Data Factory, Data Science, Data Warehouse, Real-Time Intelligence, Power BI)
Azure basics — resource groups, subscriptions, the relationship between Azure and Microsoft Fabric
Microsoft Fabric capacity — F-SKUs, trial capacity, capacity management basics

Certifications

Microsoft Azure Data Fundamentals (DP-900)

💡 DP-900 (Microsoft Azure Data Fundamentals) is optional but recommended for candidates newer to data concepts. 40-60 questions, 45 minutes, 700/1000 passing score, no prerequisites required.

💡 Microsoft Fabric is not just an Azure service — it is a SaaS platform built on Azure with its own capacity model, OneLake storage, and workspace architecture. Candidates who only understand individual Azure services (Synapse, ADF, ADLS) without understanding how Fabric unifies them will find DP-700 confusing.

💡 Use ExamOS quizzes to confirm data fundamentals and Fabric platform understanding before beginning DP-700 domain-specific preparation.

Step 2 - Microsoft Fabric architecture and platform foundations

Build deep understanding of the Microsoft Fabric platform architecture before studying individual workloads. DP-700 assumes you understand how Fabric is organized before asking you to engineer within it.

2-3 weeks

OneLake — the unified logical data lake, one copy of data for all Fabric workloads, shortcuts to external data sources (ADLS Gen2, S3, Google Cloud Storage)
Fabric workspaces — workspace creation, capacity assignment, workspace roles (Admin, Member, Contributor, Viewer)
Fabric items — Lakehouses, Warehouses, Notebooks, Data Pipelines, Dataflows Gen2, Eventstreams, KQL Databases
Fabric capacity and SKUs — what capacity governs (compute, storage, concurrency), CU consumption, burst capacity
Fabric licensing — per-user versus capacity licensing, free trial capacity for lab work
Git integration for Fabric — connecting workspace to Azure DevOps or GitHub, branching strategy for workspace items, committing and syncing changes
Deployment pipelines — promoting content through development, test, and production stages, deployment rules for environment-specific configurations
Fabric security model — workspace roles, item permissions, row-level security, column-level security

Certifications

Microsoft Certified Fabric Data Engineer Associate (DP-700)

💡 OneLake is the most important Fabric architectural concept to internalize. Every Fabric workload reads from and writes to OneLake automatically. Data stored in a Lakehouse table is accessible from a Warehouse, a Notebook, or Power BI without copying it — they all point to the same Delta Parquet files in OneLake.

💡 Shortcuts are specifically tested in DP-700. A shortcut allows Fabric items to reference data stored outside OneLake (in ADLS Gen2, S3, or another Fabric workspace) without copying the data. Know when shortcuts are appropriate versus when data should be ingested into OneLake directly.

💡 Git integration for workspaces is heavily tested in Domain 1. Know how to configure a workspace to sync with an Azure DevOps or GitHub repository, what items are Git-enabled, and how to manage conflicts between workspace state and repository state.

💡 Deployment pipelines are tested in scenarios about promoting data solutions through environments. Know how deployment rules override item properties per stage and how pipeline permissions are configured.
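
To make the idea of deployment rules concrete, here is a conceptual model only. Fabric configures deployment rules through the deployment pipeline interface, not through code like this. The function `apply_deployment_rules`, the item properties, and the connection names are all hypothetical. The sketch shows the one behavior worth remembering: the item is promoted unchanged except for the properties a stage's rule overrides.

```python
from typing import Dict


def apply_deployment_rules(
    item: Dict[str, str], rules: Dict[str, Dict[str, str]], stage: str
) -> Dict[str, str]:
    """Return a copy of the item with the target stage's overrides applied."""
    deployed = dict(item)                  # never mutate the source stage's item
    deployed.update(rules.get(stage, {}))  # stage without rules: no overrides
    return deployed


# Hypothetical item as it exists in the development stage.
dev_item = {
    "name": "sales_pipeline",
    "source_connection": "dev-sql",
    "lakehouse": "lh_dev",
}

# Hypothetical per-stage rules: only environment-specific properties change.
rules = {
    "test": {"source_connection": "test-sql", "lakehouse": "lh_test"},
    "production": {"source_connection": "prod-sql", "lakehouse": "lh_prod"},
}

prod_item = apply_deployment_rules(dev_item, rules, "production")
print(prod_item)
```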

Step 3 - DP-700 Domain 2 — Fabric Lakehouse and data ingestion (30-35%)

Build and populate Microsoft Fabric Lakehouses using batch and streaming ingestion patterns. Ingest and transform data is one of the three equally weighted domains and the one with the most hands-on technical depth.

4-5 weeks

Fabric Lakehouse architecture — Files section versus Tables section, Delta Lake format for managed tables, unmanaged files
Medallion architecture implementation — Bronze (raw ingestion), Silver (cleansed and validated), Gold (business-ready) layers in Lakehouse
Data Factory pipelines in Fabric — pipeline activities (Copy Data, Notebook, Stored Procedure, Web, Script), connections (Fabric's replacement for the linked services and datasets used in Azure Data Factory)
Copy Data activity — supported sources and destinations, schema mapping, column mapping, incremental copy patterns
Dataflows Gen2 — Power Query-based transformations, connecting to data sources, loading to Lakehouse tables or Warehouse, refresh scheduling
Fabric Notebooks for batch processing — PySpark transformations, reading from and writing to Lakehouse tables, Delta operations (merge, upsert, delete)
Delta Lake operations — MERGE INTO for upserts, time travel with VERSION AS OF and TIMESTAMP AS OF, VACUUM for removing unreferenced data files, OPTIMIZE for file compaction
Streaming ingestion — Fabric Eventstream for real-time data ingestion from Event Hubs, Kafka, and custom sources
Data validation — schema enforcement in Delta tables, constraint checking in Notebooks, data quality rules in Dataflows

Certifications

Microsoft Certified Fabric Data Engineer Associate (DP-700)

💡 Medallion architecture is central to DP-700. The exam presents data engineering scenarios and asks candidates to design or identify the correct medallion layer for described data characteristics. Bronze is raw, unmodified source data. Silver applies cleansing, validation, and standardization. Gold is aggregated and business-optimized for reporting.
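
The three layers can be sketched in plain Python to show what each one is responsible for. This is an illustration under assumptions, not Fabric code. In a Lakehouse each layer would be a Delta table and the functions would be PySpark transformations. The sample records and the function names `to_silver` and `to_gold` are hypothetical.

```python
from collections import defaultdict

# Bronze: raw records exactly as received (hypothetical sample data).
bronze = [
    {"order_id": "1", "region": " west ", "amount": "100.50"},
    {"order_id": "2", "region": "East", "amount": "not-a-number"},
    {"order_id": "3", "region": "west", "amount": "49.50"},
    {"order_id": "1", "region": " west ", "amount": "100.50"},  # duplicate
]


def to_silver(records):
    """Cleanse, validate, standardize, and deduplicate bronze records."""
    seen, silver = set(), []
    for rec in records:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # reject rows that fail validation
        order_id = int(rec["order_id"])
        if order_id in seen:
            continue  # drop duplicates of an already accepted order
        seen.add(order_id)
        silver.append({
            "order_id": order_id,
            "region": rec["region"].strip().title(),
            "amount": amount,
        })
    return silver


def to_gold(records):
    """Aggregate silver records into a business-ready total per region."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["region"]] += rec["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(silver)
print(gold)
```

Note that `bronze` is never modified. Keeping the raw layer intact is what lets Silver and Gold be rebuilt after a logic change.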

💡 Delta Lake MERGE INTO is the primary pattern for implementing incremental loads — inserting new records and updating existing ones in a single operation. Know the WHEN MATCHED, WHEN NOT MATCHED, and WHEN NOT MATCHED BY SOURCE clauses and when each is used.
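
The three clauses are easier to remember once their effect on rows is seen side by side. The sketch below models MERGE semantics on Python dictionaries keyed by the join key. It is not the Delta Lake API, and the `merge` function, its `delete_unmatched` flag, and the sample rows are hypothetical. Comments mark which clause each branch corresponds to.

```python
def merge(target, source, delete_unmatched=False):
    """Model MERGE on dicts keyed by the join key; returns a new target."""
    result = {}
    for key, row in target.items():
        if key in source:
            result[key] = source[key]  # WHEN MATCHED THEN UPDATE
        elif not delete_unmatched:
            result[key] = row          # no clause fires, so the row is kept
        # otherwise: WHEN NOT MATCHED BY SOURCE THEN DELETE
    for key, row in source.items():
        if key not in target:
            result[key] = row          # WHEN NOT MATCHED THEN INSERT
    return result


target = {
    1: {"name": "Ana", "city": "Oslo"},
    2: {"name": "Ben", "city": "Rome"},
}
source = {
    2: {"name": "Ben", "city": "Paris"},
    3: {"name": "Cy", "city": "Lima"},
}

upserted = merge(target, source)                      # incremental load
synced = merge(target, source, delete_unmatched=True)  # full synchronization
print(sorted(upserted), sorted(synced))
```

A plain upsert keeps row 1 because nothing in the source mentions it. Adding the delete branch turns the merge into a full synchronization, which is the usual reason to reach for WHEN NOT MATCHED BY SOURCE.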

💡 Dataflows Gen2 uses Power Query under the hood. The exam tests when Dataflows Gen2 is appropriate (non-technical users, Power Query familiarity, moderate data volumes) versus when Notebooks with PySpark are more appropriate (complex transformations, large data volumes, custom logic).

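💡 Delta time travel is tested in data recovery and audit scenarios. Know that `VERSION AS OF` and `TIMESTAMP AS OF` allow querying historical table states, and that VACUUM deletes data files the current table version no longer references once they are older than the retention threshold (7 days by default). After VACUUM runs, time travel to versions that depended on the deleted files no longer works.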
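
The interaction between time travel and VACUUM can be modeled in a few lines. This is a toy model, not Delta Lake itself. It assumes VACUUM deletes data files that the current table version no longer references and that have been unreferenced for longer than the retention window. The class `ToyDeltaTable`, the day-number timestamps, and the file names are hypothetical.

```python
class ToyDeltaTable:
    """Toy model: each commit records which data files make up that version."""

    def __init__(self):
        self.versions = []    # versions[i] = frozenset of file names
        self.files = {}       # file name -> rows stored in that file
        self.removed_on = {}  # file name -> day it left the current version

    def commit(self, day, add=None, remove=()):
        current = set(self.versions[-1]) if self.versions else set()
        for name in remove:
            current.discard(name)
            self.removed_on[name] = day
        for name, rows in (add or {}).items():
            self.files[name] = rows
            current.add(name)
        self.versions.append(frozenset(current))

    def read(self, version=None):
        """Read the latest version, or an older one (like VERSION AS OF)."""
        names = self.versions[-1 if version is None else version]
        missing = [n for n in names if n not in self.files]
        if missing:
            raise FileNotFoundError(f"vacuumed files: {sorted(missing)}")
        return sorted(row for n in names for row in self.files[n])

    def vacuum(self, today, retention_days=7):
        """Delete unreferenced data files older than the retention window."""
        for name, day in list(self.removed_on.items()):
            if name in self.files and today - day > retention_days:
                del self.files[name]


table = ToyDeltaTable()
table.commit(day=0, add={"f1": ["a", "b"]})                 # version 0
table.commit(day=1, add={"f2": ["a", "c"]}, remove=["f1"])  # version 1
v0_before = table.read(version=0)  # time travel works before VACUUM
table.vacuum(today=10)             # f1 has been unreferenced for 9 days
latest = table.read()              # the current version is unaffected
```

After the `vacuum` call, `table.read(version=0)` raises `FileNotFoundError` in this model, which mirrors the trade-off in recovery scenarios: a shorter retention window saves storage but shortens how far back time travel can reach.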

💡 Use ExamOS for data ingestion scenario practice that tests choosing between pipeline activities, Dataflows Gen2, and Notebooks for described data characteristics and transformation requirements.

Step 4 - DP-700 Domain 2 — Fabric Warehouse and SQL analytics (30-35%)

Build and query Fabric Warehouses using T-SQL for structured analytical workloads where a relational model is more appropriate than Lakehouse.

3-4 weeks

Fabric Warehouse architecture — Synapse-based compute, columnar storage in OneLake, T-SQL endpoint
Fabric Warehouse versus Lakehouse — when to use each, what determines the choice in DP-700 scenarios
T-SQL in Fabric Warehouse — DDL (CREATE TABLE, CREATE VIEW), DML (INSERT, UPDATE, DELETE, MERGE), stored procedures
Loading data into Warehouse — COPY INTO from external files, pipeline integration, cross-warehouse queries
Warehouse table design — distribution considerations, when clustered columnstore indexes help, partitioning
SQL analytics endpoint — read-only SQL access to Lakehouse Delta tables, creating views and stored procedures against Lakehouse data
Cross-workspace queries — querying tables in multiple Lakehouses and Warehouses from a single SQL endpoint
Lakehouse versus Warehouse for Power BI — connecting Power BI to both, Direct Lake mode for Lakehouse

Certifications

Microsoft Certified Fabric Data Engineer Associate (DP-700)

💡 Fabric Warehouse versus Fabric Lakehouse is the design decision that appears most frequently across DP-700 scenarios. Warehouse is appropriate when T-SQL is the primary development language, when strict schema enforcement is required, and when the team has a relational database background. Lakehouse is appropriate when Spark/Python is the primary tool, when schema-on-read flexibility is needed, and when the data is primarily semi-structured or unstructured.

💡 The SQL analytics endpoint is specifically tested. It provides read-only T-SQL access to Lakehouse Delta tables without moving data. A data analyst who only knows SQL can query Lakehouse data through the SQL analytics endpoint without needing Spark access.

💡 Direct Lake mode for Power BI is a key differentiator from traditional Import and DirectQuery modes. Direct Lake reads Delta Parquet files from OneLake directly, providing import-speed performance without the data duplication of Import mode.

💡 Use ExamOS for Warehouse scenario practice that tests schema design decisions and choosing between Lakehouse and Warehouse for described analytical workloads.

Step 5 - DP-700 Domain 2 — Real-Time Intelligence and KQL (30-35%)

Build real-time data engineering solutions using Fabric Eventstreams, KQL Databases, and the Real-Time Intelligence workload. This is entirely new content with no DP-203 equivalent.

3-4 weeks

Fabric Eventstream — creating and configuring eventstreams, source types (Azure Event Hubs, Azure IoT Hub, Kafka, Custom App), routing to destinations
Eventstream destinations — KQL Database (in an Eventhouse), Lakehouse, Activator, Derived Stream, Custom Endpoint
KQL Database — the managed database for real-time analytics, data ingestion methods (one-click, Eventstream, pipeline)
KQL (Kusto Query Language) — query syntax, summarize for aggregations, extend for computed columns, where and project for filtering and selecting, join types, render for visualization
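KQL time-series operations — bin() for time bucketing, make-series for trend analysis, series_decompose() for splitting a series into seasonal, trend, and residual components, series_decompose_anomalies() for anomaly detection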
Real-Time hub — discovering and connecting to real-time data sources across the organization
Fabric Activator — setting alerts and actions based on real-time data conditions, when to use Activator versus Logic Apps

Certifications

Microsoft Certified Fabric Data Engineer Associate (DP-700)

💡 KQL is non-negotiable for DP-700. Microsoft has confirmed that Real-Time Intelligence is tested across the exam domains, not confined to a single topic. Candidates who skip KQL preparation find significant gaps on exam day.

💡 The fundamental KQL query structure appears in scenario questions - `TableName | where Column == "value" | summarize count() by OtherColumn | order by count_ desc`. Know this pattern and be able to interpret queries that use it.
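
For readers who think in Python rather than pipes, each KQL operator in that pattern maps to one step over a list of rows. The sketch below is only an analogy, using hypothetical sample rows and column names (`Level`, `Service`). It is not how a KQL Database executes queries.

```python
from collections import Counter

# Hypothetical rows standing in for a KQL table.
events = [
    {"Level": "Error", "Service": "api"},
    {"Level": "Error", "Service": "api"},
    {"Level": "Info", "Service": "api"},
    {"Level": "Error", "Service": "web"},
]

# | where Level == "Error"
filtered = [e for e in events if e["Level"] == "Error"]

# | summarize count() by Service
counts = Counter(e["Service"] for e in filtered)

# | order by count_ desc
result = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
print(result)
```

Reading a KQL query left to right as "filter, then aggregate, then sort" is usually enough to answer interpretation questions.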

💡 Eventstream routing scenarios test which destination is appropriate for described requirements. Data landing in Lakehouse for batch analytics, KQL Database for real-time queries, and Activator for event-triggered actions are the primary routing patterns.

💡 The `bin()` function for time bucketing is the KQL function most consistently tested in time-series scenarios. `summarize count() by bin(Timestamp, 1h)` groups events into one-hour buckets for trend analysis.
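
What `bin()` does to a timestamp can be reproduced in Python: round the value down to the start of its bucket, then count per bucket. The helper `bin_timestamp` and the sample timestamps are hypothetical, and aligning buckets to the Unix epoch is an assumption of this sketch. For one-hour buckets it gives the same hour boundaries.

```python
from collections import Counter
from datetime import datetime, timedelta


def bin_timestamp(ts: datetime, size: timedelta) -> datetime:
    """Round ts down to the start of its bucket, like KQL bin(ts, size)."""
    epoch = datetime(1970, 1, 1)
    return ts - ((ts - epoch) % size)


# Hypothetical event timestamps.
events = [
    datetime(2026, 1, 1, 9, 5),
    datetime(2026, 1, 1, 9, 55),
    datetime(2026, 1, 1, 10, 10),
]

# Equivalent in spirit to: summarize count() by bin(Timestamp, 1h)
per_hour = Counter(bin_timestamp(ts, timedelta(hours=1)) for ts in events)
for bucket, count in sorted(per_hour.items()):
    print(bucket, count)
```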

💡 Use ExamOS for Real-Time Intelligence scenario practice that tests KQL query interpretation and Eventstream destination selection.

Step 6 - DP-700 Domain 1 — Implement and manage an analytics solution (30-35%)

Configure and manage Microsoft Fabric workspaces, implement security and governance controls, manage the analytics solution lifecycle, and prepare data for consumption. Domain 1 is the operational and governance domain.

3-4 weeks

Workspace configuration — capacity assignment, workspace settings, OneLake storage configuration, Git repository connection
Workspace lifecycle management — development, test, and production stages, deployment pipelines, promotion workflows
Fabric security architecture — workspace roles, item-level permissions, row-level security (RLS) in Lakehouses and Warehouses, column-level security, object-level security
Microsoft Purview integration — scanning Fabric items for sensitive data discovery, classifying Lakehouse tables, data lineage tracking across Fabric workloads
Microsoft Purview Data Map — cataloging Fabric data assets, sensitivity labels applied to Lakehouse tables and Warehouse schemas
OneLake data access control — managed versus unmanaged access, shortcut permissions
Fabric capacity management — monitoring CU consumption, smoothing, throttling behavior, pausing capacity
Monitoring Fabric workloads — Monitoring Hub for pipeline runs, Notebook executions, and data loading activities

Certifications

Microsoft Certified Fabric Data Engineer Associate (DP-700)

💡 Microsoft Purview integration with Fabric is heavily tested and is entirely new content with no DP-203 equivalent. Know how Purview scans Fabric Lakehouses and Warehouses to discover and classify sensitive data, how sensitivity labels propagate to downstream items, and how data lineage is tracked across pipeline runs.

💡 Row-level security (RLS) in Fabric is implemented differently in Lakehouses versus Warehouses. In Warehouses, RLS uses T-SQL predicates on security policy objects (similar to Azure SQL). In Lakehouses, RLS is applied through the SQL analytics endpoint. Know the implementation approach for both.
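
Conceptually, a row-level security predicate is a filter that the engine applies to every query on behalf of the caller. The sketch below models that idea in Python. It is not T-SQL and not a Fabric API. The function `rows_visible_to`, the user names, and the region mapping are hypothetical.

```python
def rows_visible_to(user, rows, region_by_user):
    """Return only the rows whose region matches the user's assigned region."""
    allowed = region_by_user.get(user)  # None for users with no assignment
    return [row for row in rows if row["region"] == allowed]


# Hypothetical sales rows and user-to-region assignments.
sales = [
    {"order_id": 1, "region": "West"},
    {"order_id": 2, "region": "East"},
    {"order_id": 3, "region": "West"},
]
region_by_user = {"ana@contoso.com": "West", "ben@contoso.com": "East"}

ana_rows = rows_visible_to("ana@contoso.com", sales, region_by_user)
unknown_rows = rows_visible_to("guest@contoso.com", sales, region_by_user)
print(ana_rows, unknown_rows)
```

The point to carry into exam scenarios is that the user writes an ordinary query and the filter is enforced by the engine. A user with no matching assignment sees no rows rather than an error.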

💡 Deployment pipeline scenarios test the ability to promote workspace content through stages. Know what deployment rules do (override item properties per stage, such as connection strings or data source references), how to configure them, and when automatic versus manual deployment is appropriate.

💡 Fabric Monitoring Hub is the primary operational monitoring surface for DP-700. Know what information it provides for pipeline runs (status, duration, data read/written, activity details) and how to navigate to specific failed activity details.

💡 Use ExamOS for workspace management and security scenario practice that tests permission design decisions and Purview governance configuration.

Step 7 - DP-700 Domain 3 — Monitor and optimize an analytics solution (30-35%)

Monitor running data solutions, troubleshoot failures, optimize performance, and manage costs. The third domain at 30-35% covers everything that happens after the solution is built.

2-3 weeks

Fabric Monitoring Hub — monitoring pipeline runs, Notebook executions, Warehouse queries, KQL ingestion
Performance optimization for Lakehouses — Delta file compaction (OPTIMIZE command), Z-ordering for multi-column filtering, partitioning Lakehouse tables
Spark performance optimization — cluster configuration in Fabric, executor sizing, caching DataFrames, broadcast joins for small tables
Warehouse performance — statistics management (automatic versus manual), query store for identifying slow queries, result set caching
KQL Database performance — ingestion time, query performance analysis, materialized views for pre-aggregation
Cost optimization — monitoring CU consumption per workspace and per item, identifying expensive Notebook jobs, right-sizing Spark configurations
Troubleshooting pipeline failures — identifying failed activities in Monitoring Hub, reading activity error messages, configuring retry policies
Troubleshooting Notebook failures — reading Spark error messages, identifying data skew, handling schema evolution errors
Delta Lake maintenance — OPTIMIZE for file compaction, VACUUM for removing unreferenced data files, identifying when maintenance is needed
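
The retry policy mentioned in the troubleshooting items above comes down to two settings on a pipeline activity: how many times to retry and how long to wait between attempts. The sketch below models that behavior in Python. It is an illustration only, with a hypothetical `run_with_retry` helper and a hypothetical flaky activity. In Fabric these values are set on the activity, not written as code.

```python
import time


def run_with_retry(activity, retries=3, interval_seconds=0.0, sleep=time.sleep):
    """Run activity; on failure, retry up to `retries` more times."""
    attempt = 0
    while True:
        try:
            return activity()
        except Exception:
            if attempt >= retries:
                raise  # retries exhausted: surface the last error
            attempt += 1
            sleep(interval_seconds)  # fixed wait between attempts


# Hypothetical activity that fails twice with a transient error, then succeeds.
calls = {"n": 0}


def flaky_copy():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient source timeout")
    return "succeeded"


status = run_with_retry(flaky_copy, retries=3)
print(status, calls["n"])
```

Retries help with transient faults such as timeouts. They do not help with deterministic failures such as a schema mismatch, which fail the same way on every attempt and need the error message read instead.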

Certifications

Microsoft Certified Fabric Data Engineer Associate (DP-700)

💡 Delta OPTIMIZE command is specifically tested in performance scenarios. As a Lakehouse grows with many small Delta files from frequent writes, query performance degrades. OPTIMIZE compacts small files into larger ones and optionally applies Z-ordering for multi-dimensional filtering performance.
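
The effect of compaction on file counts can be shown with simple arithmetic. The sketch below is a toy model that greedily packs file sizes into larger outputs. The 128 MB target and the 8 MB input files are arbitrary numbers chosen for illustration, not Fabric or Delta defaults, and the `compact` function is hypothetical.

```python
def compact(file_sizes_mb, target_mb=128):
    """Greedily pack small files into output files of up to target_mb."""
    outputs, current = [], 0
    for size in file_sizes_mb:
        if current and current + size > target_mb:
            outputs.append(current)  # close the current output file
            current = 0
        current += size
    if current:
        outputs.append(current)
    return outputs


small_files = [8] * 40  # forty 8 MB files left behind by frequent small writes
compacted = compact(small_files)
print(len(small_files), "files ->", len(compacted), "files:", compacted)
```

The total data volume is unchanged. What improves is the number of files a query has to open, which is why the symptom to look for in scenarios is "many small files", not "large table".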

💡 Z-ordering in Delta is tested in specific filter optimization scenarios. If queries frequently filter on both `Region` and `ProductCategory`, `OPTIMIZE table ZORDER BY (Region, ProductCategory)` collocates data with the same filter values in the same files. This reduces the data scanned per query.
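
Why co-locating data reduces scanning comes down to per-file min/max statistics: a file can be skipped when the filter value falls outside its range. The sketch below demonstrates that skipping logic on hypothetical data. As a simplification it clusters rows with a plain sort on (region, category) rather than a real Z-order curve, so it illustrates the file-skipping effect, not the Z-order algorithm itself.

```python
import itertools

regions = ["East", "North", "South", "West"]
categories = ["Bikes", "Parts", "Tools", "Wear"]

# 16 combinations x 4 rows each = 64 rows, deliberately interleaved.
rows = [
    (r, c)
    for _ in range(4)
    for r, c in itertools.product(regions, categories)
]


def write_files(rows, rows_per_file=8):
    """Split rows into files and record per-column min/max statistics."""
    files = []
    for i in range(0, len(rows), rows_per_file):
        chunk = rows[i:i + rows_per_file]
        files.append({
            "region": (min(r for r, _ in chunk), max(r for r, _ in chunk)),
            "category": (min(c for _, c in chunk), max(c for _, c in chunk)),
        })
    return files


def files_scanned(files, region, category):
    """Count files whose min/max ranges could contain the filter values."""
    return sum(
        f["region"][0] <= region <= f["region"][1]
        and f["category"][0] <= category <= f["category"][1]
        for f in files
    )


unclustered = files_scanned(write_files(rows), "South", "Tools")
clustered = files_scanned(write_files(sorted(rows)), "South", "Tools")
print("files scanned without clustering:", unclustered)
print("files scanned with clustering:", clustered)
```

Both layouts hold the same 64 rows in 8 files. Only the arrangement differs, and with it the number of files the filter cannot rule out.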

💡 CU consumption monitoring appears in cost optimization scenarios. Fabric bills for the provisioned capacity, and every workload draws capacity units (CUs) from that shared pool. Identifying which workloads consume the most CUs and optimizing them (reducing Spark concurrency, scheduling non-urgent jobs during off-peak hours) are operational skills the exam tests.

💡 Use ExamOS for monitoring and optimization scenario practice that presents symptoms of pipeline failures or performance degradation and asks for the most appropriate diagnostic or remediation action.

Step 8 - Exam readiness and follow-on paths

Consolidate DP-700 preparation through integrated scenario practice, case study technique, and targeted gap closure before booking the exam.

2-3 weeks

Case study question technique — DP-700 includes case studies with multiple related questions drawing on an extended organizational scenario
Domain-weighted gap analysis — equal domain weights mean no domain can be neglected
End-to-end architecture scenarios — designing a complete Fabric solution from ingestion through transformation to consumption
Exam format familiarization — interactive components, performance-based questions alongside multiple choice
Follow-on credential paths — DP-600 (Fabric Analytics Engineer), DP-100 (Azure Data Scientist), AZ-104 (Azure Administrator) for infrastructure depth

Certifications

Microsoft Certified Fabric Data Engineer Associate (DP-700)

💡 Microsoft Learn documentation is available during the DP-700 exam via split screen. Use it for verification on genuine uncertainty, not as a substitute for preparation. Time pressure (100 minutes for up to 60 questions) makes extensive searching impractical.

💡 Consistent performance above 80% on Legend mode across five or more consecutive ExamOS sessions is the clearest DP-700 readiness signal. All three domains carry equal weight — stable scores across all three matter more than strength in one.

💡 DP-600 (Fabric Analytics Engineer) is the natural follow-on for data engineers who also work with semantic models and Power BI. AZ-104 is worth pursuing for data engineers who want infrastructure depth alongside their Fabric credentials.

Final step - Certification, validation, and the platform shift

The most important message for anyone following an Azure data engineering path in 2026: DP-203 does not exist anymore. It retired March 31, 2025. Any study material, practice exam, or roadmap that references DP-203 as a current or schedulable exam is out of date. DP-700 is a fundamentally different exam on a fundamentally different platform. If you have existing Azure data engineering experience (Synapse, ADF, ADLS), much of that knowledge transfers — but OneLake, Lakehouse architecture, KQL, Eventstreams, and deployment pipelines are genuinely new and require dedicated study. Before booking DP-700, ensure hands-on experience in Microsoft Fabric across all three exam domains — Lakehouse pipelines, Warehouse T-SQL, and Real-Time Intelligence with KQL. Use ExamOS practice to measure readiness objectively before you book.

Certifications

Microsoft Azure Data Fundamentals (DP-900)

Microsoft Certified Fabric Data Engineer Associate (DP-700)

Realistic timeline

2 hours per day: approximately 5-7 months for the complete path
3-4 hours per day: approximately 3.5-5 months
Candidates with strong Azure Synapse or ADF experience: approximately 8-12 weeks — focus time on OneLake architecture, KQL, and Real-Time Intelligence which have no DP-203 equivalent
Candidates new to Microsoft data engineering: allow the full 5-7 month timeline
All three DP-700 domains carry equal weight — preparation time should be distributed roughly evenly across Steps 3-7
Hands-on Fabric experience is essential — sign up for the free 60-day Microsoft Fabric trial and build a complete Lakehouse pipeline, a Warehouse, and a KQL Database before your exam
KQL is the most underestimated preparation topic — invest specific dedicated time on Kusto Query Language even if SQL is strong
Consistency across daily sessions produces better DP-700 outcomes than occasional marathon sessions
