EDW Concepts

MPP (massively parallel processing) Database

https://www.sisense.com/glossary/mpp-database/

Table Distribution Options

Reference

Hash Distribution

Hash Function and Properties

A hash function is any function that can be used to map data of arbitrary size to fixed-size values. In other words, it converts an input value into another, compressed numerical value.
The input to the hash function can be of arbitrary length, but the output is always of fixed length.
Properties
1. Efficiently computable.
2. Uniformly distributes the keys (each table position is equally likely for each key).
https://www.tutorialspoint.com/cryptography/cryptography_hash_functions.htm
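The two properties above can be illustrated with a minimal sketch of hash distribution in Python. It assumes 60 distributions, which is the number Azure Synapse dedicated SQL pools use, and it uses SHA-256 from `hashlib` purely as a stand-in. Synapse's own internal hash function is different. `distribution_for` is a hypothetical helper name.

```python
import hashlib

NUM_DISTRIBUTIONS = 60  # assumption: Synapse dedicated SQL pools split data into 60 distributions


def distribution_for(key: str, num_distributions: int = NUM_DISTRIBUTIONS) -> int:
    """Hypothetical helper: map a key of any length to a distribution id in [0, num_distributions)."""
    # SHA-256 is a stand-in hash; it always yields a 32-byte digest (fixed-size output).
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the digest as an integer and reduce it to a distribution id.
    return int.from_bytes(digest, "big") % num_distributions


if __name__ == "__main__":
    # Deterministic: the same key always lands in the same distribution.
    assert distribution_for("customer-42") == distribution_for("customer-42")
    # Fixed-size output: a 1-byte input and a 10,000-byte input give digests of equal length.
    assert len(hashlib.sha256(b"a").digest()) == len(hashlib.sha256(b"a" * 10_000).digest())
    # Roughly uniform: 60,000 distinct keys should spread close to 1,000 rows per distribution.
    buckets = [0] * NUM_DISTRIBUTIONS
    for i in range(60_000):
        buckets[distribution_for(f"customer-{i}")] += 1
    print("smallest:", min(buckets), "largest:", max(buckets))
```

Printing the smallest and largest bucket sizes is a quick way to see the uniformity property: with a good hash on a column with many distinct values, no distribution ends up far from the average.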

When to use a Hash Function for Distribution

Round Robin Distribution

Replicated

Data Movement

ShuffleMoveOperation: Redistributes data from one distributed table to another distributed table, changing the distribution column.
PartitionMoveOperation: Moves data from the distributions to the Control Node, usually for aggregations.
BroadcastMoveOperation: When a distributed table needs to become replicated for join compatibility
TrimMoveOperation: When a replicated table needs to become distributed
MoveOperation: Moves data from the Control Node back to the Compute Nodes, resulting in a replicated table for further processing.
RoundRobinMoveOperation: Redistributes data to Round Robin Table.
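A ShuffleMoveOperation can be pictured as re-hashing every row on the new distribution column. The sketch below is a simplified in-memory model, not how the Data Movement Service is implemented: `hash_distribute` and `shuffle_move` are hypothetical helpers, SHA-256 stands in for Synapse's internal hash function, and 60 distributions are assumed.

```python
import hashlib
from collections import defaultdict

NUM_DISTRIBUTIONS = 60  # assumption: Synapse dedicated SQL pools use 60 distributions


def distribution_for(value, num_distributions=NUM_DISTRIBUTIONS):
    # Stand-in hash; not Synapse's real internal hash function.
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_distributions


def hash_distribute(rows, column):
    """Hypothetical helper: place each row in the distribution chosen by hashing `column`."""
    dists = defaultdict(list)
    for row in rows:
        dists[distribution_for(row[column])].append(row)
    return dists


def shuffle_move(dists, new_column):
    """Hypothetical model of a ShuffleMoveOperation: gather every row and re-hash it on a new column."""
    all_rows = (row for rows in dists.values() for row in rows)
    return hash_distribute(all_rows, new_column)


if __name__ == "__main__":
    orders = [{"order_id": i, "customer_id": i % 500} for i in range(5_000)]
    by_order = hash_distribute(orders, "order_id")       # table distributed on order_id
    by_customer = shuffle_move(by_order, "customer_id")  # redistributed on customer_id
    # After the shuffle, every row sits in the distribution its customer_id hashes to,
    # so a join on customer_id with a table hash-distributed on customer_id can run locally.
    for dist_id, rows in by_customer.items():
        assert all(distribution_for(r["customer_id"]) == dist_id for r in rows)
    print(sum(len(rows) for rows in by_customer.values()), "rows shuffled")
```

No rows are added or lost by the shuffle; only their placement changes. This is why queries that join two tables on their common distribution column avoid this data movement entirely.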

Statistics

Reference Links

Optimization with Index

https://docs.microsoft.com/en-us/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described?toc=%2Fazure%2Fsynapse-analytics%2Fsql-data-warehouse%2Ftoc.json&bc=%2Fazure%2Fsynapse-analytics%2Fsql-data-warehouse%2Fbreadcrumb%2Ftoc.json&view=sql-server-ver16&preserve-view=true&viewFallbackFrom=azure-sqldw-latest

Reference Links

https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-overview?view=sql-server-ver16

Clustered Columnstore Index

Instead of storing entire rows in a page, a columnstore index stores the values of one column from many rows together. Because values from the same column tend to be similar, they compress very well. This architectural difference gives the columnstore index a high compression ratio, reduces its storage footprint, and greatly improves read performance for queries that scan only some of a table's columns.
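The column-oriented layout, and why it compresses well, can be sketched in Python. Run-length encoding is used here only as a simple illustration. Actual columnstore indexes organize data into rowgroups and column segments and apply their own encodings. `to_columns` and `run_length_encode` are hypothetical helpers.

```python
from itertools import groupby


def to_columns(rows):
    """Hypothetical helper: pivot row-oriented records into one list per column.

    Assumes `rows` is non-empty and every row has the same keys.
    """
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def run_length_encode(values):
    """Illustrative encoding: collapse runs of repeated values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


if __name__ == "__main__":
    rows = [
        {"region": "EU" if i < 600 else "US", "status": "shipped", "amount": i}
        for i in range(1_000)
    ]
    columns = to_columns(rows)
    for name, values in columns.items():
        encoded = run_length_encode(values)
        print(f"{name}: {len(values)} values -> {len(encoded)} runs")
```

The low-cardinality `region` and `status` columns collapse to one or two runs each, while the unique `amount` column does not shrink at all under this scheme. Storing each column separately is what makes such per-column encodings possible.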

Relevant Links:

Heap

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-index

Clustered

Secondary


Last updated 2 years ago
