A Software-defined Asset can represent a collection of partitions that can be tracked and materialized independently. In many ways, each partition functions like its own mini-asset, but they all share a common materialization function and dependencies. Typically, each partition will correspond to a separate file, or a slice of a table in a database.
Consider an online store's order data. In a database, the order data might be stored as a single orders table, which contains multiple days' worth of orders. However, if the data were ingested into Amazon Web Services (AWS) S3 as parquet files, you could create a new parquet file per day or partition.
Using partitions provides the following benefits:
- Cost efficiency: Run only the data that’s needed and gain granular control over slices. For example, storing recent orders in hot storage and older orders in cheaper, cold storage.
- Speed up compute: Divide large datasets into smaller, more manageable parts to speed up queries.
- Scalability: As data grows, distribute it across multiple servers or storage systems or run multiple partitions at a time in parallel.
- Concurrent processing: Boost computational speed with parallel processing, significantly reducing the time and cost of data processing tasks.
- Speed up debugging: Test on an individual partition before trying to run larger ranges of data.
Partitions are supported for both Software-defined Assets and ops, but how each concept is used is unique. Refer to the following documentation for more info:
With partitions, you can:
- View runs by partition in the  Dagster UI
- Define a  schedule that fills in a partition each time it runs. For example, a job might run each day and process the data that arrived during the previous day.
- Launch  backfills, which are sets of runs that each process a different partition. For example, after making a code change, you might want to run your job on all partitions instead of just one of them.