EDW Concepts
MPP (massively parallel processing) Database
Table Distribution Options
Reference
Hash Distribution
Hash Function and Properties
A hash function is any function that can be used to map data of arbitrary size to fixed-size values.
Put differently, it is a mathematical function that converts an input value into another, compressed numerical value.
The input to the hash function can be of arbitrary length, but the output is always of fixed length.
Properties
Efficiently computable.
Should uniformly distribute the keys (each table position is equally likely for each key).
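The properties above can be illustrated with a minimal Python sketch. It is not how any particular MPP engine hashes rows: the choice of SHA-256, the name distribution_for, and the count of 8 distributions are all assumptions made for the example. It shows a fixed-size output for inputs of any length, and a roughly even spread of keys across distributions.

```python
import hashlib
from collections import Counter

NUM_DISTRIBUTIONS = 8  # hypothetical count, chosen only for this example


def distribution_for(key, num_distributions=NUM_DISTRIBUTIONS):
    """Map a key of arbitrary size to a distribution id in a fixed range."""
    # SHA-256 always yields a 32-byte digest, whatever the input length.
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and fold it into the range.
    return int.from_bytes(digest[:8], "big") % num_distributions


# The same key always lands on the same distribution (deterministic).
same = distribution_for("customer-42") == distribution_for("customer-42")

# Count how many of 10,000 distinct keys land on each distribution.
counts = Counter(distribution_for(k) for k in range(10_000))
for dist_id in sorted(counts):
    print(dist_id, counts[dist_id])
```

With a well-behaved hash, each count should come out close to 10,000 / 8. A column with few distinct values or one dominant value would skew these counts regardless of the hash function used.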
When to use Hash Function for Distribution
Round Robin Distribution
Replicated
Data Movement
ShuffleMoveOperation: Redistributes data from one distributed table to another distributed table, changing the distribution column.
PartitionMoveOperation: Moves data from the distributions to the Control Node. Usually used for aggregations.
BroadcastMoveOperation: Used when a distributed table needs to become replicated for join compatibility.
TrimMoveOperation: Used when a replicated table needs to become distributed.
MoveOperation: Data moved from the Control Node back to the Compute Nodes, resulting in a replicated table for further processing.
RoundRobinMoveOperation: Redistributes data to a round-robin table.
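As a rough conceptual model of two of the operations listed above, the sketch below re-hashes rows on a new distribution column (a shuffle) and copies all rows to every distribution (a broadcast). It only simulates the idea in memory. The helper names bucket, distribute, shuffle_move and broadcast_move, the use of CRC32, and the 4 distributions are assumptions for illustration, not the engine's actual implementation.

```python
import zlib

NUM_DISTRIBUTIONS = 4  # hypothetical count, chosen only for this example


def bucket(value, n=NUM_DISTRIBUTIONS):
    """Deterministically map a column value to a distribution id."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(str(value).encode("utf-8")) % n


def distribute(rows, column, n=NUM_DISTRIBUTIONS):
    """Place each row on the distribution chosen by hashing `column`."""
    distributions = [[] for _ in range(n)]
    for row in rows:
        distributions[bucket(row[column], n)].append(row)
    return distributions


def shuffle_move(distributions, new_column):
    """Re-hash every row on `new_column`; also count rows that change place."""
    n = len(distributions)
    result = [[] for _ in range(n)]
    moved = 0
    for source_id, rows in enumerate(distributions):
        for row in rows:
            target_id = bucket(row[new_column], n)
            result[target_id].append(row)
            if target_id != source_id:
                moved += 1
    return result, moved


def broadcast_move(distributions):
    """Give every distribution a full copy of all rows (replicated layout)."""
    all_rows = [row for rows in distributions for row in rows]
    return [list(all_rows) for _ in distributions]


orders = [{"order_id": i, "customer_id": i % 7} for i in range(20)]
by_order = distribute(orders, "order_id")
by_customer, moved = shuffle_move(by_order, "customer_id")
replicated = broadcast_move(by_order)
print("rows moved by shuffle:", moved)
```

After the shuffle, all rows that share a customer_id sit on the same distribution, which is what makes a join or aggregation on that column possible without further movement. The moved counter gives a feel for why such operations are worth avoiding through a good choice of distribution column.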
Statistics
Reference Links
Optimization with Index
Reference Links
Clustered Columnstore Index
Instead of storing one or more entire rows in a page, a columnstore index stores the values of one column from many rows in that page. This difference in architecture gives the columnstore index a very high level of compression, which reduces the storage footprint and can greatly improve read performance.
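The difference between the two layouts can be sketched in Python. The sample rows, the helper names to_columns and run_length_encode, and the use of run-length encoding are assumptions for illustration only. The point is that values from a single column tend to repeat, so they compress well, and that a query touching one column needs to read only that column.

```python
from itertools import groupby

# Row-oriented layout: each record keeps all of its columns together.
rows = [
    {"region": "EU", "status": "shipped", "qty": 1},
    {"region": "EU", "status": "shipped", "qty": 2},
    {"region": "EU", "status": "pending", "qty": 3},
    {"region": "US", "status": "pending", "qty": 4},
    {"region": "US", "status": "shipped", "qty": 5},
    {"region": "US", "status": "shipped", "qty": 6},
]


def to_columns(rows):
    """Pivot row-oriented records into one list of values per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# Six region values collapse into two (value, count) pairs.
encoded_region = run_length_encode(columns["region"])

# An aggregate over qty reads only the qty column, not whole rows.
total_qty = sum(columns["qty"])

print(encoded_region)
print(total_qty)
```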
Relevant Links:
Heap
Clustered
Secondary