SQL - A Complete Guide
Basic SQL Concepts
Basics
Data types
Operators
Functions
Clauses
Create, Insert, Update and Deletion of tables
sort data
filter data
Different types of joins:
inner join
left join
right join
full join
Subqueries
Views
Stored procedures
Triggers
Groupby and Partitionby
Basic SQL - Interview Questions
Advance SQL Concepts for Data Engineer
Data Modelling
Understanding how to design and create efficient database schemas and Design relationships between tables.
Some common techniques used in data modeling
Normalize tables:
It involves breaking down a large table into smaller tables that have a one-to-one relationship with the original table.
Data is distributed across multiple tables to avoid redundancy (same data available in multiple rows again and again), make it more manageable, and improve the data consistency (when data is repeated multiple times, it is prone to erros. Normalisation help in keeping the data consistent).
Create indexes
Dimensional Modeling:
Dimensional modeling is a technique used to organize data into a structure that is optimized for reporting and analysis. It involves creating fact tables and dim tables
Dimension tables:
Dimension tables contain descriptive data about the business entities that are being analyzed.
For example, in a retail data warehouse, dimension tables might contain data about customers, products, stores, and time.
Dimension tables typically have a primary key column that uniquely identifies each row.
Fact tables:
Fact tables contain the quantitative data, or facts, that are being analyzed.
Fact tables typically contain data such as sales amounts, quantities sold, or inventory levels.
Fact tables also include foreign keys that reference the corresponding dimension tables, which allows analysts to slice and dice the data by various dimensions.
In general, fact tables have many-to-one relationship with the dim tables
Data Warehousing:
This is the dump yard of the ETL pipeline. Data from multiple sources is transformed and loaded to a single source called data warehouse. Data warehouses in general have big data
Three parts:
Conceptual Data Modeling:
This stage is focused on identifying and understanding the high-level business requirements of the data.
It involves creating a conceptual model that represents the key entities, attributes, and relationships of the data.
The resulting model is often presented in a graphical format called an entity-relationship diagram (ERD).
Logical Data Modeling:
In this stage, the conceptual model is refined into a more detailed model that defines the data elements, their relationships, and the business rules that govern their behavior.
This model is often expressed in a normalized format, which involves decomposing data into smaller, more manageable parts to minimize redundancy and improve data consistency.
Physical Data Modeling:
This stage involves defining the physical structure of the database, including the tables, columns, and relationships.
Physical modeling is concerned with the implementation details of the data model, such as data types, constraints, and indexes.
Schemas in Data modeling
Star Schema
Snowflake Schema
Query Optimization
Optimizing queries is essential for efficient data processing. You should understand how to optimize queries by using indexes, reducing the number of joins, and using query execution plans.
Query Execution Plan
What is it?
It tells, what is happening inside the query servers
how the execution of queries is happening
how many tables are hit and at what indexes
Where to find it?
SQL Server Management Studio
Sys.dm_exec_query_plan
Trace Events (very costly, be cautious while using it)
There are estimated plan (proposed by the engine) and actual plan (used by the engine)
Playing with execution plans
Lookups
Missing Index
Bad Statistics
Recompiles
Ad Hoc SQL
Parameter Sniffing
Cursors
Multi-Statement Table Values UDF
Query Hints
Sargeable
Data Integration
Data integration is the process of combining data from multiple sources into a single dataset. You should understand how to use SQL to join tables, perform data transformations, and filter data.
Performance Tuning: Performance tuning involves optimizing database performance by configuring server settings, caching, and query execution plans. You should understand how to use tools like SQL Profiler and Resource Monitor to identify and resolve performance issues.
Stored Procedures and Functions: Stored procedures and functions are precompiled SQL statements that can be reused across multiple applications. You should understand how to create and use stored procedures and functions to improve application performance and simplify data processing.
Transactions: Transactions are a key concept for ensuring data consistency and integrity. You should understand how to use SQL transactions to group multiple statements into a single unit of work that can be committed or rolled back as a single entity.
Data Security: Data security is a critical aspect of data engineering. You should understand how to use SQL to set up user accounts, define user roles, and grant or revoke access to data.
Questions to solve
We are given a subscription table which consists of subscription start and end date for each user. We need to write a query that returns true/false for each user based on the overlapping of dates with other users. For instance, If user1's subscription period overlaps with any other user the query must return True for user1
References
Ad
Last updated
Was this helpful?