EDW Concepts

MPP (massively parallel processing) Database

Table Distribution Options

Hash Distribution

Hash Function and Properties
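The key property of hash distribution can be sketched in Python. This is a minimal illustration under stated assumptions, not the engine's implementation. It assumes 60 distributions (the count Azure Synapse dedicated SQL pools use) and substitutes SHA-256 for the database's internal hash function. `distribution_for` and `hash_distribute` are hypothetical helpers. The property it shows is determinism: equal values always land in the same distribution, so two tables hashed on the same join column keep matching rows together.

```python
import hashlib

# Assumption: 60 distributions, as in Azure Synapse dedicated SQL pools.
NUM_DISTRIBUTIONS = 60


def distribution_for(value, num_distributions=NUM_DISTRIBUTIONS):
    """Map a distribution-column value to a distribution index.

    SHA-256 is a stand-in for the engine's internal hash function. Any
    deterministic hash shows the same property: equal values always map
    to the same distribution.
    """
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_distributions


def hash_distribute(rows, column, num_distributions=NUM_DISTRIBUTIONS):
    """Place each row (a dict) in the distribution chosen by hashing one column."""
    distributions = [[] for _ in range(num_distributions)]
    for row in rows:
        distributions[distribution_for(row[column], num_distributions)].append(row)
    return distributions


# Illustrative data: every order for customer 1 lands in the same distribution.
orders = [
    {"customer_id": 1, "amount": 10},
    {"customer_id": 2, "amount": 5},
    {"customer_id": 1, "amount": 7},
]
placed = hash_distribute(orders, "customer_id")
home = distribution_for(1)
print([row["amount"] for row in placed[home] if row["customer_id"] == 1])  # [10, 7]
```

Python's built-in `hash()` is avoided on purpose: string hashing is randomized per process, so it would not give stable placements across runs.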

When to Use Hash Distribution

Round Robin Distribution
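Round-robin placement ignores column values entirely. The sketch below uses a strict rotation for clarity. That rotation is an assumption: the engine only guarantees an even spread, and its actual assignment of rows to distributions may be random. The 60-distribution count is likewise assumed, and `round_robin_distribute` is a hypothetical helper.

```python
from itertools import cycle

# Assumption: 60 distributions, as in Azure Synapse dedicated SQL pools.
NUM_DISTRIBUTIONS = 60


def round_robin_distribute(rows, num_distributions=NUM_DISTRIBUTIONS):
    """Deal rows out to distributions in strict rotation, ignoring their values."""
    distributions = [[] for _ in range(num_distributions)]
    targets = cycle(range(num_distributions))
    for row in rows:
        distributions[next(targets)].append(row)
    return distributions


placed = round_robin_distribute(list(range(125)))
sizes = [len(d) for d in placed]
print(min(sizes), max(sizes))  # 2 3
```

Because placement does not depend on any column, loads are fast and rows spread evenly. However, rows with the same join key can end up on different distributions, so joins on a round-robin table usually require data movement.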

Replicated Tables
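A replicated table keeps a full copy on every Compute Node, so a join against it can run locally on each node without moving rows. The sketch models nodes as Python lists. `replicate` and `local_join` are hypothetical helpers, and the node count and column names are illustrative only.

```python
import copy


def replicate(rows, num_compute_nodes):
    """Give every Compute Node its own full copy of the table."""
    return [copy.deepcopy(rows) for _ in range(num_compute_nodes)]


def local_join(fact_slice, dim_copy, key):
    """Inner-join one node's fact rows with that node's full dimension copy."""
    lookup = {dim_row[key]: dim_row for dim_row in dim_copy}
    return [
        {**fact_row, **lookup[fact_row[key]]}
        for fact_row in fact_slice
        if fact_row[key] in lookup
    ]


# Illustrative data: a small dimension replicated to two hypothetical nodes.
products = [{"product_id": 1, "name": "pen"}, {"product_id": 2, "name": "ink"}]
node_copies = replicate(products, num_compute_nodes=2)

# Each node joins its own slice of a distributed fact table locally.
sales_on_node_1 = [{"product_id": 2, "qty": 4}]
print(local_join(sales_on_node_1, node_copies[1], "product_id"))
# [{'product_id': 2, 'qty': 4, 'name': 'ink'}]
```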

Data Movement

  • ShuffleMoveOperation: Redistributes data from one distributed table to another distributed table, changing the distribution column.

  • PartitionMoveOperation: Moves data from the distributions to the Control Node, usually for aggregations.

  • BroadcastMoveOperation: Copies a distributed table to every Compute Node, turning it into a replicated table, usually to make a join compatible.

  • TrimMoveOperation: Converts a replicated table into a distributed table.

  • MoveOperation: Moves data from the Control Node back to the Compute Nodes, resulting in a replicated table for further processing.

  • RoundRobinMoveOperation: Redistributes data into a round-robin table.
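Three of these operations can be sketched on distributions modeled as lists of row dictionaries. The helper names and the SHA-256 stand-in hash are hypothetical. They illustrate what each move does to row placement, not how the engine implements it.

```python
import hashlib


def distribution_for(value, num_distributions):
    """Stand-in deterministic hash (SHA-256); not the engine's real function."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_distributions


def shuffle_move(distributions, new_column):
    """ShuffleMoveOperation sketch: re-hash every row on a different column."""
    result = [[] for _ in distributions]
    for dist in distributions:
        for row in dist:
            result[distribution_for(row[new_column], len(distributions))].append(row)
    return result


def partition_move(distributions):
    """PartitionMoveOperation sketch: gather every distribution's rows in one
    place (the Control Node)."""
    return [row for dist in distributions for row in dist]


def broadcast_move(distributions, num_compute_nodes):
    """BroadcastMoveOperation sketch: every Compute Node ends up with every row,
    so the result behaves like a replicated table."""
    all_rows = [row for dist in distributions for row in dist]
    return [list(all_rows) for _ in range(num_compute_nodes)]


# Illustrative data: three distributions holding one row each.
dists = [
    [{"order_id": 1, "region": "east"}],
    [{"order_id": 2, "region": "west"}],
    [{"order_id": 3, "region": "east"}],
]
by_region = shuffle_move(dists, "region")
east = by_region[distribution_for("east", 3)]
print(sorted(r["order_id"] for r in east if r["region"] == "east"))  # [1, 3]
```

After the shuffle, both "east" rows share a distribution, which is what makes a subsequent join or aggregation on `region` local.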

Statistics

Optimization with Indexes

Clustered Columnstore Index

Instead of storing entire rows in a page, a columnstore index stores the values of one column from many rows together in that page. Because values within a single column are usually similar or repeated, this layout compresses far better than row storage. That reduces the storage footprint and can greatly improve read performance, especially for queries that scan many rows but only a few columns.
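The compression benefit can be illustrated with a toy example: pivot rows into per-column lists, then run-length encode a column. This is a simplified sketch. Real columnstore indexes combine several encodings (such as dictionary encoding) over large row groups, and `to_columns` and `run_length_encode` are hypothetical helpers.

```python
from itertools import groupby


def to_columns(rows, column_names):
    """Pivot row tuples into one list per column (a column-oriented layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


rows = [(1, "WA"), (2, "WA"), (3, "WA"), (4, "OR"), (5, "OR")]
columns = to_columns(rows, ["id", "state"])

print(len(run_length_encode(rows)))         # 5
print(run_length_encode(columns["state"]))  # [('WA', 3), ('OR', 2)]
```

In the row layout every tuple differs from its neighbor because `id` changes, so nothing collapses. In the column layout the `state` values sit next to each other, and five values shrink to two pairs.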

Heap

Clustered Index

Secondary (Nonclustered) Index
