Ingest data by using a data pipeline, dataflow, or notebook
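A minimal notebook-based ingestion sketch, assuming a hypothetical landing folder and table name; a data pipeline Copy activity or a Dataflow Gen2 would achieve the same result without code.

```python
# Minimal ingestion sketch for a Fabric notebook (PySpark).
# The source path and table name below are placeholders, not real endpoints.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read raw CSV files landed in the lakehouse Files area (hypothetical path).
raw_df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("Files/landing/sales/*.csv")
)

# Persist to a managed Delta table in the lakehouse Tables area.
raw_df.write.mode("append").format("delta").saveAsTable("bronze_sales")
```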
Create and manage shortcuts
Implement file partitioning for analytics workloads in a lakehouse
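A sketch of partitioned writes with PySpark, assuming illustrative table and column names; partitioning on a low-cardinality column keeps file counts manageable and enables partition pruning.

```python
# Write a Delta table partitioned by a low-cardinality, date-derived column.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

sales = spark.table("bronze_sales").withColumn("order_year", F.year("order_date"))

(
    sales.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_year")          # folder-per-year layout on OneLake
    .saveAsTable("silver_sales_partitioned")
)
```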
Create views, functions, and stored procedures
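A sketch of the view case only, using Spark SQL from a notebook over an assumed table; T-SQL functions and stored procedures are created against the warehouse or SQL analytics endpoint instead and are not shown here.

```python
# Create a Spark SQL view over a lakehouse table from a notebook.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    CREATE OR REPLACE VIEW vw_sales_by_year AS
    SELECT order_year, SUM(amount) AS total_amount
    FROM silver_sales_partitioned
    GROUP BY order_year
""")
```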
Enrich data by adding new columns or tables
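A sketch of column-level enrichment with PySpark; the derived columns and source table are assumptions for illustration.

```python
# Enrich a table with derived columns.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

sales = spark.table("silver_sales_partitioned")

enriched = (
    sales
    .withColumn("net_amount", F.col("amount") - F.col("discount"))
    .withColumn("is_high_value", F.col("amount") > 1000)
    .withColumn("load_ts", F.current_timestamp())
)

enriched.write.mode("overwrite").format("delta").saveAsTable("silver_sales_enriched")
```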
Choose an appropriate method for copying data from a Fabric data source to a lakehouse or warehouse
Copy data by using a data pipeline, dataflow, or notebook
Add stored procedures, notebooks, and dataflows to a data pipeline
Schedule data pipelines, dataflows, and notebooks
Implement a data cleansing process
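A sketch of a simple cleansing pass, assuming hypothetical column names: trim and standardize strings and null out implausible values.

```python
# Basic cleansing: trim strings, standardize casing, coerce bad values to NULL.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.table("bronze_customers")

cleansed = (
    df
    .withColumn("email", F.lower(F.trim("email")))
    .withColumn("country", F.upper(F.trim("country")))
    .withColumn("age", F.when(F.col("age").between(0, 120), F.col("age")))  # else NULL
)

cleansed.write.mode("overwrite").format("delta").saveAsTable("silver_customers")
```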
Implement a star schema for a lakehouse or warehouse, including Type 1 and Type 2 slowly changing dimensions
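A hedged sketch of Type 2 handling with the Delta Lake merge API; the dimension, staging table, and tracking columns (`is_current`, `valid_from`, `valid_to`) are assumptions, and a Type 1 dimension would simply overwrite attributes in a single merge.

```python
# Type 2 SCD sketch: expire changed rows, then append new versions.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

updates = spark.table("staging_customers")          # latest source snapshot
dim = DeltaTable.forName(spark, "dim_customer")     # existing dimension

# Step 1: expire current rows whose tracked attributes changed.
(
    dim.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id AND t.is_current = true")
    .whenMatchedUpdate(
        condition="t.city <> s.city",
        set={"is_current": "false", "valid_to": "current_timestamp()"},
    )
    .execute()
)

# Step 2: insert new versions for changed keys and rows for brand-new keys.
current_keys = spark.table("dim_customer").where("is_current = true").select("customer_id")
new_versions = (
    updates.join(current_keys, "customer_id", "left_anti")   # keys with no current row
    .withColumn("is_current", F.lit(True))
    .withColumn("valid_from", F.current_timestamp())
    .withColumn("valid_to", F.lit(None).cast("timestamp"))
)
new_versions.write.mode("append").format("delta").saveAsTable("dim_customer")
```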
Implement bridge tables for a lakehouse or a warehouse
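A sketch of building a bridge table for a many-to-many relationship, assuming an order table with a delimited list of promotion codes; the names are illustrative.

```python
# Build a bridge table by exploding a multi-valued column into key pairs.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

orders = spark.table("silver_sales_enriched")

bridge_order_promotion = (
    orders
    .select("order_id", F.explode(F.split("promo_codes", ",")).alias("promo_code"))
    .withColumn("promo_code", F.trim("promo_code"))
    .dropDuplicates(["order_id", "promo_code"])
)

bridge_order_promotion.write.mode("overwrite").format("delta").saveAsTable("bridge_order_promotion")
```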
Denormalize data
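A sketch of denormalization by joining dimension attributes into a fact table to produce one wide table; the table names are placeholders.

```python
# Denormalize: fold dimension attributes into the fact grain.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

fact = spark.table("fact_sales")
dim_product = spark.table("dim_product").select("product_key", "product_name", "category")
dim_date = spark.table("dim_date").select("date_key", "calendar_year", "month_name")

wide = (
    fact
    .join(dim_product, "product_key", "left")
    .join(dim_date, "date_key", "left")
)

wide.write.mode("overwrite").format("delta").saveAsTable("sales_denormalized")
```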
Aggregate or de-aggregate data
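A sketch of both directions with assumed column names: aggregate to a summary grain with `groupBy`, and de-aggregate an array column back to detail rows with `explode`.

```python
# Aggregate and de-aggregate examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

sales = spark.table("silver_sales_enriched")

# Aggregate: one row per customer per year.
summary = (
    sales.groupBy("customer_id", "order_year")
    .agg(F.sum("net_amount").alias("total_net"), F.count("*").alias("order_count"))
)

# De-aggregate: expand an array of line items into one row per item.
line_items = sales.select("order_id", F.explode("items").alias("item"))
```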
Merge or join data
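A sketch distinguishing the two operations, with placeholder table names: merging compatible extracts with `unionByName`, then joining the result to a reference table on a shared key.

```python
# Merge (union) two channel extracts, then join to a reference table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

online = spark.table("bronze_sales_online")
instore = spark.table("bronze_sales_instore")

# Merge rows from both channels; allowMissingColumns fills gaps with nulls.
all_sales = online.unionByName(instore, allowMissingColumns=True)

# Join to a reference table on the shared business key.
customers = spark.table("silver_customers")
all_sales_with_customer = all_sales.join(customers, "customer_id", "inner")
```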
Identify and resolve duplicate data, missing data, or null values
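A sketch that first surfaces duplicates and null counts, then resolves them; the key and column names are assumptions.

```python
# Identify and resolve duplicates and nulls.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.table("silver_customers")

# Identify: duplicate business keys and null counts per column.
dupes = df.groupBy("customer_id").count().where("count > 1")
null_counts = df.select([F.count(F.when(F.col(c).isNull(), c)).alias(c) for c in df.columns])

# Resolve: keep one row per key, fill defaultable nulls, drop rows missing required fields.
resolved = (
    df.dropDuplicates(["customer_id"])
    .fillna({"country": "UNKNOWN"})
    .dropna(subset=["email"])
)
```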
Convert data types by using SQL or PySpark
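A PySpark casting sketch with illustrative columns; the same conversions can be written in SQL with `CAST` (or `TRY_CAST` to tolerate bad values).

```python
# Cast columns to the intended types.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.table("bronze_sales")

typed = (
    df
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
    .withColumn("quantity", F.col("quantity").cast("int"))
)
```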
Filter data
[Spark DataFrame Where Filter | Multiple Conditions](https://sparkbyexamples.com/spark/spark-dataframe-where-filter/)
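A short filtering sketch along the lines of the linked article, using assumed column names: column expressions with `where` and an equivalent SQL-style predicate string with `filter`.

```python
# Filter rows with where/filter, including multiple conditions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

sales = spark.table("silver_sales_enriched")

recent_high_value = sales.where(
    (F.col("order_year") >= 2024) & (F.col("net_amount") > 500)
)

# Equivalent SQL-style predicate string.
uk_orders = sales.filter("country = 'UK' AND is_high_value = true")
```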
Identify and resolve data loading performance bottlenecks in dataflows, notebooks, and SQL queries
Implement performance improvements in dataflows, notebooks, and SQL queries
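A sketch covering both of the preceding items on the notebook side, with placeholder table names: inspect the physical plan to spot expensive shuffles or scans, then broadcast the small side of a join and cache a reused DataFrame.

```python
# Identify bottlenecks, then apply common Spark tuning steps.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

fact = spark.table("fact_sales")
dim_product = spark.table("dim_product")

# Identify: review the physical plan (look for large shuffles and full scans).
fact.join(dim_product, "product_key").explain(mode="formatted")

# Improve: broadcast the small side to avoid a shuffle join, and cache reuse.
joined = fact.join(F.broadcast(dim_product), "product_key")
joined.cache()
joined.count()   # materialize the cache before downstream reuse
```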
Identify and resolve issues with Delta table file sizes
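A sketch for diagnosing and fixing small-file problems on a Delta table; the table name is a placeholder and the VACUUM retention shown is the default 7 days (168 hours).

```python
# Inspect file counts, compact small files, and clean up unreferenced files.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Identify: DESCRIBE DETAIL reports numFiles and sizeInBytes for the table.
spark.sql("DESCRIBE DETAIL silver_sales_partitioned").select("numFiles", "sizeInBytes").show()

# Resolve: compact small files with OPTIMIZE, then vacuum old files.
DeltaTable.forName(spark, "silver_sales_partitioned").optimize().executeCompaction()
spark.sql("VACUUM silver_sales_partitioned RETAIN 168 HOURS")
```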