Career Roadmap
Azure Data Engineer: Zero to Hero
This roadmap reflects the current Azure data engineering certification landscape. DP-203 (Azure Data Engineer Associate) retired March 31, 2025 and is no longer available. DP-700 (Microsoft Fabric Data Engineer Associate) is the current credential, built around the Microsoft Fabric unified analytics platform. DP-700 is fundamentally different from DP-203 — it tests Fabric, OneLake, Lakehouse architecture, KQL, and Real-Time Intelligence, not individual Azure services like Synapse, Data Factory, and Azure Data Lake Storage in isolation. The three domains are equally weighted at approximately 30-35% each. Updated for the April 20, 2026 skills outline revision.
Step 0 - Data engineering and programming foundations
Build the technical foundation that every Microsoft Fabric data engineering concept depends on. DP-700 is rated among the more demanding associate-level Microsoft exams — strong foundations accelerate every subsequent step.
3-4 weeks
- Python and PySpark basics — writing ETL logic, data transformations, working with DataFrames, reading from and writing to storage
- SQL fluency — SELECT, JOIN, GROUP BY, window functions, CTEs, MERGE statements, understanding query plans
- KQL (Kusto Query Language) introduction — basic KQL syntax is essential for Fabric Real-Time Intelligence, which is new content with no DP-203 equivalent
- Data formats — Parquet, Delta Lake, CSV, JSON, Avro — what each is optimized for, why Delta dominates Fabric workloads
- Data engineering concepts — batch versus streaming, ELT versus ETL, medallion architecture (Bronze/Silver/Gold), idempotent pipeline design
- Distributed computing basics — what Spark is, partitioning for parallelism, why columnar storage improves analytical query performance
- Git fundamentals — version-controlled data engineering projects, branching, pull requests, CI/CD concepts for data pipelines
💡 KQL is genuinely new territory for engineers coming from SQL-only backgrounds. Microsoft Fabric's Real-Time Intelligence workload uses KQL as its primary query language. This is not optional for DP-700 — the exam tests KQL queries at a recognition and application level.
💡 Delta Lake format is central to Microsoft Fabric. OneLake stores data in Delta Parquet format by default. Understanding what Delta adds over plain Parquet (ACID transactions, time travel, schema evolution, optimized reads) is foundational knowledge for the entire roadmap.
💡 PySpark in Fabric Notebooks is the primary transformation engine for Lakehouse workloads. Candidates who only know SQL will find Notebook-based transformation scenarios harder than they should be.
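The idempotent pipeline design listed above can be illustrated without Spark. The following is a minimal pure-Python sketch, assuming a hypothetical `id` key column and an in-memory dictionary standing in for the target table; it is not Fabric or PySpark code. The point is that a load keyed on a business identifier gives the same result whether a batch runs once or is retried.

```python
from typing import Dict, List


def load_batch(target: Dict[int, dict], batch: List[dict]) -> Dict[int, dict]:
    """Upsert a batch into the target, keyed on the hypothetical 'id' column."""
    result = dict(target)
    for row in batch:
        # Overwrite by key instead of appending, so a retry cannot duplicate rows.
        result[row["id"]] = row
    return result


batch = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}]
once = load_batch({}, batch)
twice = load_batch(once, batch)  # simulate a retry of the same batch

print(once == twice)
```

An append-only load would leave four rows after the retry; the keyed upsert leaves two.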
Step 1 - Data fundamentals and Azure basics (DP-900 and AZ-900)
Build familiarity with core data concepts and the Azure platform before diving into Microsoft Fabric specifics. DP-900 is the natural starting point for candidates newer to data engineering.
2-3 weeks
- Core data concepts — structured versus semi-structured versus unstructured data, OLTP versus OLAP, relational versus non-relational
- Batch versus streaming data processing — when each is appropriate, latency trade-offs
- Data roles — data engineer versus data analyst versus data scientist and how Fabric serves each role differently
- Microsoft Fabric overview — the unified analytics platform, workloads (Data Engineering, Data Factory, Data Science, Data Warehouse, Real-Time Intelligence, Power BI)
- Azure basics — resource groups, subscriptions, the relationship between Azure and Microsoft Fabric
- Microsoft Fabric capacity — F-SKUs, trial capacity, capacity management basics
Certifications
💡 DP-900 (Microsoft Azure Data Fundamentals) is optional but recommended for candidates newer to data concepts. 40-60 questions, 45 minutes, 700/1000 passing score, no prerequisites required.
💡 Microsoft Fabric is not just an Azure service — it is a SaaS platform built on Azure with its own capacity model, OneLake storage, and workspace architecture. Candidates who only understand individual Azure services (Synapse, ADF, ADLS) without understanding how Fabric unifies them will find DP-700 confusing.
💡 Use ExamOS quizzes to confirm data fundamentals and Fabric platform understanding before beginning DP-700 domain-specific preparation.
Step 2 - Microsoft Fabric architecture and platform foundations
Build deep understanding of the Microsoft Fabric platform architecture before studying individual workloads. DP-700 assumes you understand how Fabric is organized before asking you to engineer within it.
2-3 weeks
- OneLake — the unified logical data lake, one copy of data for all Fabric workloads, shortcuts to external data sources (ADLS Gen2, S3, Google Cloud Storage)
- Fabric workspaces — workspace creation, capacity assignment, workspace roles (Admin, Member, Contributor, Viewer)
- Fabric items — Lakehouses, Warehouses, Notebooks, Data Pipelines, Dataflows Gen2, Eventstreams, KQL Databases
- Fabric capacity and SKUs — what capacity governs (compute, storage, concurrency), CU consumption, burst capacity
- Fabric licensing — per-user versus capacity licensing, free trial capacity for lab work
- Git integration for Fabric — connecting workspace to Azure DevOps or GitHub, branching strategy for workspace items, committing and syncing changes
- Deployment pipelines — promoting content through development, test, and production stages, deployment rules for environment-specific configurations
- Fabric security model — workspace roles, item permissions, row-level security, column-level security
💡 OneLake is the most important Fabric architectural concept to internalize. Every Fabric workload reads from and writes to OneLake automatically. Data stored in a Lakehouse table is accessible from a Warehouse, a Notebook, or Power BI without copying it — they all point to the same Delta Parquet files in OneLake.
💡 Shortcuts are specifically tested in DP-700. A shortcut allows Fabric items to reference data stored outside OneLake (in ADLS Gen2, S3, or another Fabric workspace) without copying the data. Know when shortcuts are appropriate versus when data should be ingested into OneLake directly.
💡 Git integration for workspaces is heavily tested in Domain 1. Know how to configure a workspace to sync with an Azure DevOps or GitHub repository, what items are Git-enabled, and how to manage conflicts between workspace state and repository state.
💡 Deployment pipelines are tested in scenarios about promoting data solutions through environments. Know how deployment rules override item properties per stage and how pipeline permissions are configured.
Step 3 - DP-700 Domain 2 — Fabric Lakehouse and data ingestion (30-35%)
Build and populate Microsoft Fabric Lakehouses using batch and streaming ingestion patterns. Ingest and transform data carries the same 30-35% weight as the other two domains and has the most hands-on technical depth.
4-5 weeks
- Fabric Lakehouse architecture — Files section versus Tables section, Delta Lake format for managed tables, unmanaged files
- Medallion architecture implementation — Bronze (raw ingestion), Silver (cleansed and validated), Gold (business-ready) layers in Lakehouse
- Data Factory pipelines in Fabric — pipeline activities (Copy Data, Notebook, Stored Procedure, Web, Script) and connections, which take the place of the linked services and datasets used in Azure Data Factory
- Copy Data activity — supported sources and destinations, schema mapping, column mapping, incremental copy patterns
- Dataflows Gen2 — Power Query-based transformations, connecting to data sources, loading to Lakehouse tables or Warehouse, refresh scheduling
- Fabric Notebooks for batch processing — PySpark transformations, reading from and writing to Lakehouse tables, Delta operations (merge, upsert, delete)
- Delta Lake operations — MERGE INTO for upserts, time travel with VERSION AS OF and TIMESTAMP AS OF, VACUUM for removing unreferenced data files, OPTIMIZE for file compaction
- Streaming ingestion — Fabric Eventstream for real-time data ingestion from Event Hubs, Kafka, and custom sources
- Data validation — schema enforcement in Delta tables, constraint checking in Notebooks, data quality rules in Dataflows
💡 Medallion architecture is central to DP-700. The exam presents data engineering scenarios and asks candidates to design or identify the correct medallion layer for described data characteristics. Bronze is raw, unmodified source data. Silver applies cleansing, validation, and standardization. Gold is aggregated and business-optimized for reporting.
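The three layers can be sketched in plain Python. This is an illustrative model only, assuming hypothetical sample records and column names (`order_id`, `region`, `amount`); in Fabric the same steps would typically run as PySpark or Dataflow transformations writing to Lakehouse tables.

```python
from collections import defaultdict

# Bronze: raw, unmodified source records (hypothetical sample data).
bronze = [
    {"order_id": "1", "region": " west ", "amount": "100.50"},
    {"order_id": "2", "region": "East", "amount": "not-a-number"},
    {"order_id": "3", "region": "WEST", "amount": "20"},
]


def to_silver(rows):
    """Cleanse and standardize; rows that fail validation are dropped."""
    silver = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # invalid amount: excluded from Silver
        silver.append({
            "order_id": int(row["order_id"]),
            "region": row["region"].strip().lower(),
            "amount": amount,
        })
    return silver


def to_gold(rows):
    """Aggregate to a business-ready summary: total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'west': 120.5}
```

Bronze is never modified, so a corrected Silver rule can be re-run against the original records.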
💡 Delta Lake MERGE INTO is the primary pattern for implementing incremental loads — inserting new records and updating existing ones in a single operation. Know the WHEN MATCHED, WHEN NOT MATCHED, and WHEN NOT MATCHED BY SOURCE clauses and when each is used.
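To make the three clauses concrete, here is a small pure-Python model of merge semantics. It assumes tables are dictionaries keyed on a hypothetical `id`, and the `delete_unmatched` flag is an invented stand-in for including a WHEN NOT MATCHED BY SOURCE ... DELETE clause. It is a sketch of the behavior, not Delta Lake code.

```python
def merge(target, source, delete_unmatched=False):
    """Model MERGE on a shared key.

    - key in both:        WHEN MATCHED THEN UPDATE
    - key only in source: WHEN NOT MATCHED THEN INSERT
    - key only in target: WHEN NOT MATCHED BY SOURCE (delete only if requested)
    """
    merged = {}
    for key, row in target.items():
        if key in source:
            merged[key] = source[key]  # WHEN MATCHED THEN UPDATE
        elif not delete_unmatched:
            merged[key] = row  # no BY SOURCE clause: target row is kept
        # otherwise: WHEN NOT MATCHED BY SOURCE THEN DELETE
    for key, row in source.items():
        if key not in target:
            merged[key] = row  # WHEN NOT MATCHED THEN INSERT
    return merged


target = {1: "alice-v1", 2: "bob-v1"}
source = {2: "bob-v2", 3: "carol-v1"}

incremental = merge(target, source)
full_sync = merge(target, source, delete_unmatched=True)
print(incremental)  # {1: 'alice-v1', 2: 'bob-v2', 3: 'carol-v1'}
print(full_sync)    # {2: 'bob-v2', 3: 'carol-v1'}
```

An incremental load usually needs only the first two clauses. The third matters when the source is a full snapshot and rows missing from it should disappear from the target.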
💡 Dataflows Gen2 uses Power Query under the hood. The exam tests when Dataflows Gen2 is appropriate (non-technical users, Power Query familiarity, moderate data volumes) versus when Notebooks with PySpark are more appropriate (complex transformations, large data volumes, custom logic).
💡 Delta time travel is tested in data recovery and audit scenarios. Know that `VERSION AS OF` and `TIMESTAMP AS OF` allow querying historical table states, and that VACUUM deletes data files the table no longer references once they are older than the retention threshold (7 days by default), which also limits how far back time travel can reach.
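The interaction between time travel and VACUUM is easier to see in a toy model. The sketch below assumes a simplified table made of a version log plus named data files, with file age measured in whole days from when the file was written. The class and method names are hypothetical, and real Delta Lake tracks this through its transaction log rather than Python objects.

```python
class VersionedTable:
    """Toy Delta-style table: a log of versions, each listing its data files."""

    def __init__(self):
        self.log = []    # log[v] = set of file names that make up version v
        self.files = {}  # file name -> (rows, day the file was written)

    def commit(self, name, rows, day, replaces=()):
        """Write a new file and record a version that drops `replaces`."""
        self.files[name] = (rows, day)
        current = self.log[-1] if self.log else set()
        self.log.append((current - set(replaces)) | {name})

    def read(self, version=None):
        """Read the latest version, or an older one (like VERSION AS OF)."""
        names = self.log[-1 if version is None else version]
        return sorted(row for n in names for row in self.files[n][0])

    def vacuum(self, today, retain_days=7):
        """Delete files the latest version no longer references
        and that are older than the retention window."""
        live = self.log[-1]
        for name in list(self.files):
            if name not in live and today - self.files[name][1] > retain_days:
                del self.files[name]


table = VersionedTable()
table.commit("f0", ["a", "b"], day=0)                   # version 0
table.commit("f1", ["a", "c"], day=1, replaces=["f0"])  # version 1 rewrites f0

v0_before = table.read(version=0)  # old version still readable
table.vacuum(today=10)             # f0 is unreferenced and 10 days old

try:
    table.read(version=0)
    v0_after = "readable"
except KeyError:
    v0_after = "files removed"

print(v0_before, table.read(), v0_after)
```

After the vacuum the current version reads normally, but version 0 can no longer be reconstructed because its data file is gone.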
💡 Use ExamOS for data ingestion scenario practice that tests choosing between pipeline activities, Dataflows Gen2, and Notebooks for described data characteristics and transformation requirements.
Step 4 - DP-700 Domain 2 — Fabric Warehouse and SQL analytics (30-35%)
Build and query Fabric Warehouses using T-SQL for structured analytical workloads where a relational model is more appropriate than a Lakehouse.
3-4 weeks
- Fabric Warehouse architecture — Synapse-based compute, columnar storage in OneLake, T-SQL endpoint
- Fabric Warehouse versus Lakehouse — when to use each, what determines the choice in DP-700 scenarios
- T-SQL in Fabric Warehouse — DDL (CREATE TABLE, CREATE VIEW), DML (INSERT, UPDATE, DELETE, MERGE), stored procedures
- Loading data into Warehouse — COPY INTO from external files, pipeline integration, cross-warehouse queries
- Warehouse table design — distribution considerations, when clustered columnstore indexes help, partitioning
- SQL analytics endpoint — read-only SQL access to Lakehouse Delta tables, creating views and stored procedures against Lakehouse data
- Cross-workspace queries — querying tables in multiple Lakehouses and Warehouses from a single SQL endpoint
- Lakehouse versus Warehouse for Power BI — connecting Power BI to both, Direct Lake mode for Lakehouse
💡 Fabric Warehouse versus Fabric Lakehouse is the design decision that appears most frequently across DP-700 scenarios. Warehouse is appropriate when T-SQL is the primary development language, when strict schema enforcement is required, and when the team has a relational database background. Lakehouse is appropriate when Spark/Python is the primary tool, when schema-on-read flexibility is needed, and when the data is primarily semi-structured or unstructured.
💡 The SQL analytics endpoint is specifically tested. It provides read-only T-SQL access to Lakehouse Delta tables without moving data. A data analyst who only knows SQL can query Lakehouse data through the SQL analytics endpoint without needing Spark access.
💡 Direct Lake mode for Power BI is a key differentiator from traditional Import and DirectQuery modes. Direct Lake reads Delta Parquet files from OneLake directly, providing performance close to Import mode without duplicating the data into the semantic model.
💡 Use ExamOS for Warehouse scenario practice that tests schema design decisions and choosing between Lakehouse and Warehouse for described analytical workloads.
Step 5 - DP-700 Domain 2 — Real-Time Intelligence and KQL (30-35%)
Build real-time data engineering solutions using Fabric Eventstreams, KQL Databases, and the Real-Time Intelligence workload. This is entirely new content with no DP-203 equivalent.
3-4 weeks
- Fabric Eventstream — creating and configuring eventstreams, source types (Azure Event Hubs, Azure IoT Hub, Kafka, Custom App), routing to destinations
- Eventstream destinations — KQL Database, Lakehouse, Warehouse, Activator, Derived Stream
- KQL Database — the managed database for real-time analytics, data ingestion methods (one-click, Eventstream, pipeline)
- KQL (Kusto Query Language) — query syntax, summarize for aggregations, extend for computed columns, where and project for filtering and selecting, join types, render for visualization
- KQL time-series operations — bin() for time bucketing, make-series for trend analysis, series_decompose() for separating trend and seasonality, series_decompose_anomalies() for anomaly detection
- Real-Time hub — discovering and connecting to real-time data sources across the organization
- Fabric Activator — setting alerts and actions based on real-time data conditions, when to use Activator versus Logic Apps
💡 KQL is non-negotiable for DP-700. Microsoft has confirmed that Real-Time Intelligence is tested across the exam domains, not confined to a single topic. Candidates who skip KQL preparation find significant gaps on exam day.
💡 The fundamental KQL query structure appears in scenario questions: `TableName | where Column == "value" | summarize count() by OtherColumn | order by count_ desc`. Know this pattern and be able to interpret queries that use it.
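For readers more comfortable with Python than KQL, each pipe stage in that pattern maps to a familiar step. The following sketch assumes a small hypothetical list of records in place of `TableName`. It only mirrors what the query computes and is not how a KQL Database executes it.

```python
from collections import Counter

# Hypothetical rows standing in for TableName.
events = [
    {"Column": "value", "OtherColumn": "A"},
    {"Column": "value", "OtherColumn": "B"},
    {"Column": "other", "OtherColumn": "A"},
    {"Column": "value", "OtherColumn": "A"},
]

# | where Column == "value"
filtered = [e for e in events if e["Column"] == "value"]

# | summarize count() by OtherColumn
counts = Counter(e["OtherColumn"] for e in filtered)

# | order by count_ desc
result = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

print(result)  # [('A', 2), ('B', 1)]
```

Reading a KQL query left to right, one pipe at a time, is usually enough to answer interpretation questions.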
💡 Eventstream routing scenarios test which destination is appropriate for described requirements. Data landing in Lakehouse for batch analytics, KQL Database for real-time queries, and Activator for event-triggered actions are the primary routing patterns.
💡 The `bin()` function for time bucketing is the KQL function most consistently tested in time-series scenarios. `summarize count() by bin(Timestamp, 1h)` groups events into one-hour buckets for trend analysis.
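What `bin()` does to a timestamp can be reproduced in a few lines. This is a rough Python equivalent, assuming buckets aligned to the Unix epoch (an assumption that makes no difference for whole-hour bins) and a hypothetical helper name `bin_timestamp`.

```python
from collections import Counter
from datetime import datetime, timedelta


def bin_timestamp(ts: datetime, size: timedelta) -> datetime:
    """Round ts down to the start of its bucket, in the spirit of KQL bin()."""
    epoch = datetime(1970, 1, 1)
    whole_buckets = (ts - epoch) // size  # timedelta // timedelta gives an int
    return epoch + whole_buckets * size


timestamps = [
    datetime(2025, 1, 1, 9, 5),
    datetime(2025, 1, 1, 9, 40),
    datetime(2025, 1, 1, 10, 15),
]

# Comparable to: summarize count() by bin(Timestamp, 1h)
hourly = Counter(bin_timestamp(t, timedelta(hours=1)) for t in timestamps)

for bucket, count in sorted(hourly.items()):
    print(bucket, count)
```

The two events at 09:05 and 09:40 land in the 09:00 bucket, and the 10:15 event lands in the 10:00 bucket.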
💡 Use ExamOS for Real-Time Intelligence scenario practice that tests KQL query interpretation and Eventstream destination selection.
Step 6 - DP-700 Domain 1 — Implement and manage an analytics solution (30-35%)
Configure and manage Microsoft Fabric workspaces, implement security and governance controls, manage the analytics solution lifecycle, and prepare data for consumption. Domain 1 is the operational and governance domain.
3-4 weeks
- Workspace configuration — capacity assignment, workspace settings, OneLake storage configuration, Git repository connection
- Workspace lifecycle management — development, test, and production stages, deployment pipelines, promotion workflows
- Fabric security architecture — workspace roles, item-level permissions, row-level security (RLS) in Lakehouses and Warehouses, column-level security, object-level security
- Microsoft Purview integration — scanning Fabric items for sensitive data discovery, classifying Lakehouse tables, data lineage tracking across Fabric workloads
- Microsoft Purview Data Map — cataloging Fabric data assets, sensitivity labels applied to Lakehouse tables and Warehouse schemas
- OneLake data access control — managed versus unmanaged access, shortcut permissions
- Fabric capacity management — monitoring CU consumption, smoothing, throttling behavior, pausing capacity
- Monitoring Fabric workloads — Monitoring Hub for pipeline runs, Notebook executions, and data loading activities
💡 Microsoft Purview integration with Fabric is heavily tested and is entirely new content with no DP-203 equivalent. Know how Purview scans Fabric Lakehouses and Warehouses to discover and classify sensitive data, how sensitivity labels propagate to downstream items, and how data lineage is tracked across pipeline runs.
💡 Row-level security (RLS) in Fabric is implemented differently in Lakehouses versus Warehouses. In Warehouses, RLS uses T-SQL predicates on security policy objects (similar to Azure SQL). In Lakehouses, RLS is applied through the SQL analytics endpoint. Know the implementation approach for both.
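Conceptually, a row-level security predicate is a function evaluated for every row on behalf of the querying user. The sketch below models that idea in Python, assuming a hypothetical `orders` table, a `sales_rep` ownership column, and an invented `manager` account that sees everything. In a Warehouse the equivalent logic would live in a T-SQL predicate function bound to the table by a security policy.

```python
orders = [
    {"order_id": 1, "sales_rep": "ana", "amount": 100},
    {"order_id": 2, "sales_rep": "ben", "amount": 250},
    {"order_id": 3, "sales_rep": "ana", "amount": 75},
]


def filter_predicate(row, user):
    """Visible when the user owns the row or is the hypothetical manager account."""
    return user == "manager" or row["sales_rep"] == user


def query_orders(user):
    """Every query passes through the predicate, so filtered rows are never returned."""
    return [row for row in orders if filter_predicate(row, user)]


print(len(query_orders("ana")))      # 2
print(len(query_orders("manager")))  # 3
print(query_orders("eve"))           # []
```

The caller writes the same query regardless of identity; the filtering happens underneath it.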
💡 Deployment pipeline scenarios test the ability to promote workspace content through stages. Know what deployment rules do (override item properties per stage, such as connection strings or data source references), how to configure them, and when automatic versus manual deployment is appropriate.
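The override behavior of deployment rules can be pictured as a per-stage dictionary merge. This is a conceptual sketch only: the item properties, stage names, and rule structure are hypothetical, and Fabric configures rules through the deployment pipeline settings rather than through code like this.

```python
# Item as authored in the development workspace (hypothetical properties).
base_item = {
    "name": "SalesPipeline",
    "source_connection": "dev-sql",
    "lakehouse": "lh_dev",
}

# Hypothetical per-stage rules: only the listed properties are overridden.
deployment_rules = {
    "test": {"source_connection": "test-sql", "lakehouse": "lh_test"},
    "production": {"source_connection": "prod-sql", "lakehouse": "lh_prod"},
}


def deploy(item, stage):
    """Return the item as it would look in `stage` after that stage's rules apply."""
    return {**item, **deployment_rules.get(stage, {})}


prod_item = deploy(base_item, "production")
print(prod_item["source_connection"])  # prod-sql
print(prod_item["name"])               # SalesPipeline
```

Properties without a rule are carried over unchanged, which is why the item definition itself can stay identical across stages.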
💡 Fabric Monitoring Hub is the primary operational monitoring surface for DP-700. Know what information it provides for pipeline runs (status, duration, data read/written, activity details) and how to navigate to specific failed activity details.
💡 Use ExamOS for workspace management and security scenario practice that tests permission design decisions and Purview governance configuration.
Step 7 - DP-700 Domain 3 — Monitor and optimize an analytics solution (30-35%)
Monitor running data solutions, troubleshoot failures, optimize performance, and manage costs. The third domain at 30-35% covers everything that happens after the solution is built.
2-3 weeks
- Fabric Monitoring Hub — monitoring pipeline runs, Notebook executions, Warehouse queries, KQL ingestion
- Performance optimization for Lakehouses — Delta file compaction (OPTIMIZE command), Z-ordering for multi-column filtering, partitioning Lakehouse tables
- Spark performance optimization — cluster configuration in Fabric, executor sizing, caching DataFrames, broadcast joins for small tables
- Warehouse performance — statistics management (automatic versus manual), query store for identifying slow queries, result set caching
- KQL Database performance — ingestion time, query performance analysis, materialized views for pre-aggregation
- Cost optimization — monitoring CU consumption per workspace and per item, identifying expensive Notebook jobs, right-sizing Spark configurations
- Troubleshooting pipeline failures — identifying failed activities in Monitoring Hub, reading activity error messages, configuring retry policies
- Troubleshooting Notebook failures — reading Spark error messages, identifying data skew, handling schema evolution errors
- Delta Lake maintenance — OPTIMIZE for file compaction, VACUUM for removing unreferenced data files, identifying when maintenance is needed
💡 Delta OPTIMIZE command is specifically tested in performance scenarios. As a Lakehouse grows with many small Delta files from frequent writes, query performance degrades. OPTIMIZE compacts small files into larger ones and optionally applies Z-ordering for multi-dimensional filtering performance.
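The effect of compaction on file counts can be approximated with a simple greedy packing routine. The sketch assumes hypothetical file sizes in MB and an illustrative 128 MB target. The real OPTIMIZE command chooses its own target size and packing strategy, so treat this only as a picture of the idea.

```python
def compact(file_sizes_mb, target_mb=128):
    """Greedily pack small files into output files of at most target_mb each."""
    compacted, current = [], 0
    for size in sorted(file_sizes_mb):
        if current and current + size > target_mb:
            compacted.append(current)  # close the current output file
            current = 0
        current += size
    if current:
        compacted.append(current)
    return compacted


small_files = [4, 8, 16, 2, 64, 32, 8, 100]
after = compact(small_files)

print(len(small_files), "files ->", len(after), "files")  # 8 files -> 3 files
print(after)                                              # [70, 64, 100]
```

The total data volume is unchanged, but a query now opens three files instead of eight.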
💡 Z-ordering in Delta is tested in specific filter optimization scenarios. If queries frequently filter on both `Region` and `ProductCategory`, `OPTIMIZE table ZORDER BY (Region, ProductCategory)` collocates data with the same filter values in the same files. This reduces the data scanned per query.
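The reason co-location reduces scanning can be shown with a small simulation. The sketch below assumes regions and product categories encoded as integers 0 to 7, four files of 16 rows each, and file skipping based on per-file min/max values. The bit-interleaving function is a standard Morton (Z-order) encoding, used here to illustrate the idea rather than to reproduce Delta's implementation.

```python
import random


def interleave_bits(x: int, y: int, bits: int = 3) -> int:
    """Morton (Z-order) value: interleave the low `bits` bits of x and y."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i + 1)
        z |= ((y >> i) & 1) << (2 * i)
    return z


def files_scanned(rows, region, category, rows_per_file=16):
    """Count files whose min/max ranges could contain the requested values."""
    scanned = 0
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        regions = [r for r, _ in chunk]
        categories = [c for _, c in chunk]
        if (min(regions) <= region <= max(regions)
                and min(categories) <= category <= max(categories)):
            scanned += 1
    return scanned


# 8 regions x 8 product categories as (region, category) pairs (hypothetical data).
rows = [(r, c) for r in range(8) for c in range(8)]

arrival_order = rows[:]
random.Random(0).shuffle(arrival_order)  # rows written in no useful order
z_ordered = sorted(rows, key=lambda rc: interleave_bits(*rc))

unordered_scan = files_scanned(arrival_order, region=3, category=5)
z_scan = files_scanned(z_ordered, region=3, category=5)
print("unordered:", unordered_scan, "z-ordered:", z_scan)
```

With Z-ordered data each file covers a compact block of region and category values, so the filter on region 3 and category 5 touches a single file. Shuffled files tend to span the full range of both columns and cannot be skipped.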
💡 CU consumption monitoring appears in cost optimization scenarios. Fabric charges based on capacity unit consumption. Identifying which workloads consume the most CUs and optimizing them (reducing Spark concurrency, scheduling non-urgent jobs during off-peak hours) are operational skills the exam tests.
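Identifying the heaviest consumers is essentially a group-and-rank exercise. The sketch below assumes hypothetical usage records of the form (workspace, item, CU seconds) with made-up names and numbers; in practice the figures would come from Fabric's capacity monitoring tools, not from a hand-built list.

```python
from collections import defaultdict

# Hypothetical usage records: (workspace, item, capacity-unit seconds).
usage = [
    ("Sales", "nb_daily_transform", 5400),
    ("Sales", "pl_ingest_orders", 900),
    ("Finance", "nb_month_end", 7200),
    ("Sales", "nb_daily_transform", 4600),
]


def top_consumers(records, limit=2):
    """Total CU seconds per (workspace, item), highest first."""
    totals = defaultdict(int)
    for workspace, item, cu_seconds in records:
        totals[(workspace, item)] += cu_seconds
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:limit]


ranked = top_consumers(usage)
for (workspace, item), cu_seconds in ranked:
    print(workspace, item, cu_seconds)
```

Ranking by item rather than by workspace points directly at the job worth optimizing or rescheduling.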
💡 Use ExamOS for monitoring and optimization scenario practice that presents symptoms of pipeline failures or performance degradation and asks for the most appropriate diagnostic or remediation action.
Step 8 - Exam readiness and follow-on paths
Consolidate DP-700 preparation through integrated scenario practice, case study technique, and targeted gap closure before booking the exam.
2-3 weeks
- Case study question technique — DP-700 includes case studies with multiple related questions drawing on an extended organizational scenario
- Domain-weighted gap analysis — equal domain weights mean no domain can be neglected
- End-to-end architecture scenarios — designing a complete Fabric solution from ingestion through transformation to consumption
- Exam format familiarization — interactive components, performance-based questions alongside multiple choice
- Follow-on credential paths — DP-600 (Fabric Analytics Engineer), DP-100 (Azure Data Scientist), AZ-104 (Azure Administrator) for infrastructure depth
💡 Microsoft Learn documentation is available during the DP-700 exam via split screen. Use it for verification on genuine uncertainty, not as a substitute for preparation. Time pressure (100 minutes for up to 60 questions) makes extensive searching impractical.
💡 Consistent performance above 80% on Legend mode across five or more consecutive ExamOS sessions is the clearest DP-700 readiness signal. All three domains carry equal weight — stable scores across all three matter more than strength in one.
💡 DP-600 (Fabric Analytics Engineer) is the natural follow-on for data engineers who also work with semantic models and Power BI. AZ-104 is worth pursuing for data engineers who want infrastructure depth alongside their Fabric credentials.
Final step - Certification, validation, and the platform shift
The most important message for anyone following an Azure data engineering path in 2026: DP-203 does not exist anymore. It retired March 31, 2025. Any study material, practice exam, or roadmap that references DP-203 as a current or schedulable exam is out of date. DP-700 is a fundamentally different exam on a fundamentally different platform. If you have existing Azure data engineering experience (Synapse, ADF, ADLS), much of that knowledge transfers — but OneLake, Lakehouse architecture, KQL, Eventstreams, and deployment pipelines are genuinely new and require dedicated study. Before booking DP-700, ensure hands-on experience in Microsoft Fabric across all three exam domains — Lakehouse pipelines, Warehouse T-SQL, and Real-Time Intelligence with KQL. Use ExamOS practice to measure readiness objectively before you book.
Realistic timeline
- 2 hours per day: approximately 5-7 months for the complete path
- 3-4 hours per day: approximately 3.5-5 months
- Candidates with strong Azure Synapse or ADF experience: approximately 8-12 weeks, with time focused on OneLake architecture, KQL, and Real-Time Intelligence, which have no DP-203 equivalent
- Candidates new to Microsoft data engineering: allow the full 5-7 month timeline
- All three DP-700 domains carry equal weight — preparation time should be distributed roughly evenly across Steps 3-7
- Hands-on Fabric experience is essential — sign up for the free 60-day Microsoft Fabric trial and build a complete Lakehouse pipeline, a Warehouse, and a KQL Database before your exam
- KQL is the most underestimated preparation topic — invest specific dedicated time on Kusto Query Language even if SQL is strong
- Consistency across daily sessions produces better DP-700 outcomes than occasional marathon sessions