SQL - A Complete Guide

Basic SQL Concepts

Basics

  • Data types

  • Operators

  • Functions

  • Clauses

  • Creating, inserting into, updating, and deleting tables

  • Sorting data

  • Filtering data

  • Different types of joins:

    • Inner join

    • Left join

    • Right join

    • Full join

  • Subqueries

  • Views

  • Stored procedures

  • Triggers

  • GROUP BY and PARTITION BY (see the sketch below)
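
A minimal sketch, assuming a hypothetical sales table, contrasting GROUP BY with the OVER (PARTITION BY ...) window form: GROUP BY collapses rows into one per group, while PARTITION BY keeps every row and attaches the group aggregate to it.

  -- GROUP BY: one output row per region
  SELECT region, SUM(amount) AS region_total
  FROM sales
  GROUP BY region;

  -- PARTITION BY: every row kept, with the region total repeated alongside
  SELECT order_id, region, amount,
         SUM(amount) OVER (PARTITION BY region) AS region_total
  FROM sales;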

Basic SQL - Interview Questions

Advanced SQL Concepts for Data Engineers

Data Modeling

  • Understanding how to design efficient database schemas and the relationships between tables.

Some common techniques used in data modeling

  • Normalize tables:

    • Normalization breaks a large table down into smaller related tables linked by keys (see the sketch after this list).

    • Data is distributed across multiple tables to avoid redundancy (the same data repeated across many rows), make it more manageable, and improve consistency (repeated data is prone to errors; normalization helps keep the data consistent).

  • Create indexes

  • Dimensional Modeling:

    • Dimensional modeling is a technique used to organize data into a structure that is optimized for reporting and analysis. It involves creating fact tables and dimension tables.

    • Dimension tables:

      • Dimension tables contain descriptive data about the business entities that are being analyzed.

      • For example, in a retail data warehouse, dimension tables might contain data about customers, products, stores, and time.

      • Dimension tables typically have a primary key column that uniquely identifies each row.

    • Fact tables:

      • Fact tables contain the quantitative data, or facts, that are being analyzed.

      • Fact tables typically contain data such as sales amounts, quantities sold, or inventory levels.

      • Fact tables also include foreign keys that reference the corresponding dimension tables, which allows analysts to slice and dice the data by various dimensions.

    • In general, fact tables have a many-to-one relationship with the dimension tables: many fact rows reference one dimension row (see the sketch after this list)

  • Data Warehousing:

    • A data warehouse is the destination of the ETL pipeline: data from multiple sources is transformed and loaded into a single central repository. Data warehouses typically hold very large volumes of data.
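
A minimal sketch of the ideas above, assuming a hypothetical retail schema: descriptive attributes are normalized out into dimension tables, the fact table holds the measures plus foreign keys, and an index supports a common filter.

  -- Dimension tables: descriptive attributes, one row per business entity
  CREATE TABLE dim_customer (
      customer_id INT PRIMARY KEY,
      customer_name VARCHAR(100),
      city VARCHAR(50)
  );

  CREATE TABLE dim_product (
      product_id INT PRIMARY KEY,
      product_name VARCHAR(100),
      category VARCHAR(50)
  );

  -- Fact table: quantitative measures plus foreign keys to the dimensions
  CREATE TABLE fact_sales (
      sale_id INT PRIMARY KEY,
      customer_id INT REFERENCES dim_customer (customer_id),
      product_id INT REFERENCES dim_product (product_id),
      sale_date DATE,
      quantity INT,
      amount DECIMAL(10, 2)
  );

  -- Index a frequently filtered column to speed up lookups
  CREATE INDEX ix_fact_sales_date ON fact_sales (sale_date);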

Data modeling typically has three stages:

  1. Conceptual Data Modeling:

    • This stage is focused on identifying and understanding the high-level business requirements of the data.

    • It involves creating a conceptual model that represents the key entities, attributes, and relationships of the data.

    • The resulting model is often presented in a graphical format called an entity-relationship diagram (ERD).

  2. Logical Data Modeling:

    • In this stage, the conceptual model is refined into a more detailed model that defines the data elements, their relationships, and the business rules that govern their behavior.

    • This model is often expressed in a normalized format, which involves decomposing data into smaller, more manageable parts to minimize redundancy and improve data consistency.

  3. Physical Data Modeling:

    • This stage involves defining the physical structure of the database, including the tables, columns, and relationships.

    • Physical modeling is concerned with the implementation details of the data model, such as data types, constraints, and indexes (see the sketch below).
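
A minimal sketch of physical-level detail, assuming hypothetical departments and employees tables: concrete data types, NOT NULL and CHECK constraints, and a supporting index are all decisions made at this stage.

  CREATE TABLE departments (
      department_id INT PRIMARY KEY,
      department_name VARCHAR(100) NOT NULL
  );

  CREATE TABLE employees (
      employee_id INT PRIMARY KEY,                 -- concrete data type choices
      full_name VARCHAR(100) NOT NULL,             -- constraint: value required
      salary DECIMAL(10, 2) CHECK (salary >= 0),   -- constraint: business rule
      department_id INT REFERENCES departments (department_id)
  );

  -- Indexes are also a physical-level decision
  CREATE INDEX ix_employees_department ON employees (department_id);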

Schemas in Data Modeling

  • Star Schema: a central fact table joins directly to denormalized dimension tables

  • Snowflake Schema: dimension tables are further normalized into sub-dimension tables

Query Optimization

Optimizing queries is essential for efficient data processing. You should understand how to optimize queries by using indexes, reducing the number of joins, and reading query execution plans (see the sketch below).
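
A minimal sketch, assuming a hypothetical orders table: an index on the filtered column, plus a rewrite of a non-sargable predicate (a function wrapped around the column, which blocks index use) into a sargable range condition.

  CREATE INDEX ix_orders_order_date ON orders (order_date);

  -- Non-sargable: the function on the column prevents an index seek
  SELECT order_id, amount
  FROM orders
  WHERE YEAR(order_date) = 2023;

  -- Sargable rewrite: a plain range condition can use the index
  SELECT order_id, amount
  FROM orders
  WHERE order_date >= '2023-01-01'
    AND order_date <  '2024-01-01';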

Query Execution Plan

  • What is it?

    • It shows what is happening inside the query engine

    • how queries are actually executed

    • which tables are accessed and which indexes are used

  • Where to find it?

    • SQL Server Management Studio

    • sys.dm_exec_query_plan (a dynamic management function)

    • Trace events (expensive to capture; use with caution)

  • There are estimated plans (proposed by the engine before execution) and actual plans (recorded during execution); see the sketch after this list

  • Common problems to look for when reading execution plans:

    • Lookups

    • Missing Index

    • Bad Statistics

    • Recompiles

    • Ad Hoc SQL

    • Parameter Sniffing

    • Cursors

    • Multi-statement table-valued UDFs

    • Query Hints

    • Sargability (whether a predicate can use an index)
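
A minimal sketch of retrieving plans in SQL Server: SHOWPLAN returns the estimated plan without running the query, while sys.dm_exec_query_plan exposes plans held in the cache. The orders table is a hypothetical example.

  -- Estimated plan: the query is compiled but not executed
  SET SHOWPLAN_XML ON;
  GO
  SELECT order_id, amount FROM orders WHERE customer_id = 42;
  GO
  SET SHOWPLAN_XML OFF;
  GO

  -- Cached plans: join the plan-cache DMV to the plan function
  SELECT cp.usecounts, qp.query_plan
  FROM sys.dm_exec_cached_plans AS cp
  CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp;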

Data Integration

Data integration is the process of combining data from multiple sources into a single dataset. You should understand how to use SQL to join tables, perform data transformations, and filter data, as in the sketch below.
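
A minimal sketch, assuming hypothetical crm_customers and web_orders tables from two source systems: a join combines them, expressions transform the columns, and a WHERE clause filters the result.

  SELECT c.customer_id,
         UPPER(c.customer_name) AS customer_name,   -- transformation
         CAST(o.order_ts AS DATE) AS order_date,    -- transformation
         o.amount
  FROM crm_customers AS c
  INNER JOIN web_orders AS o                        -- integration via join
      ON o.customer_id = c.customer_id
  WHERE o.amount > 100;                             -- filtering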

  1. Performance Tuning: Performance tuning involves optimizing database performance by configuring server settings, caching, and query execution plans. You should understand how to use tools like SQL Profiler and Resource Monitor to identify and resolve performance issues.

  2. Stored Procedures and Functions: Stored procedures and functions are precompiled SQL statements that can be reused across multiple applications. You should understand how to create and use stored procedures and functions to improve application performance and simplify data processing.

  3. Transactions: Transactions are a key concept for ensuring data consistency and integrity. You should understand how to use SQL transactions to group multiple statements into a single unit of work that can be committed or rolled back as a single entity (see the sketch after this list).

  4. Data Security: Data security is a critical aspect of data engineering. You should understand how to use SQL to set up user accounts, define user roles, and grant or revoke access to data.
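
A minimal sketch tying several of the points above together, assuming a hypothetical accounts table: a stored procedure wraps two updates in a transaction so they commit or roll back as one unit, and a GRANT restricts users to the procedure rather than the table (T-SQL syntax; app_role is a hypothetical role name).

  CREATE PROCEDURE transfer_funds
      @from_account INT,
      @to_account INT,
      @amount DECIMAL(10, 2)
  AS
  BEGIN
      BEGIN TRANSACTION;
      BEGIN TRY
          UPDATE accounts SET balance = balance - @amount
          WHERE account_id = @from_account;

          UPDATE accounts SET balance = balance + @amount
          WHERE account_id = @to_account;

          COMMIT TRANSACTION;   -- both updates succeed together
      END TRY
      BEGIN CATCH
          ROLLBACK TRANSACTION; -- or neither is applied
          THROW;
      END CATCH
  END;

  -- Security: expose the procedure, not the underlying table
  GRANT EXECUTE ON transfer_funds TO app_role;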

Questions to solve

  • We are given a subscription table that contains the subscription start and end date for each user. Write a query that returns true/false for each user based on whether their dates overlap with any other user's. For instance, if user1's subscription period overlaps with any other user's, the query must return true for user1 (one possible approach is sketched below).
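
One possible approach, assuming columns user_id, start_date, and end_date: two intervals overlap when each one starts on or before the day the other ends, so a self-join against every other user finds any overlap.

  SELECT s1.user_id,
         CASE WHEN COUNT(s2.user_id) > 0 THEN 'true' ELSE 'false' END AS overlaps
  FROM subscriptions AS s1
  LEFT JOIN subscriptions AS s2
      ON s2.user_id <> s1.user_id          -- compare against other users only
     AND s1.start_date <= s2.end_date      -- s1 starts before s2 ends
     AND s2.start_date <= s1.end_date      -- s2 starts before s1 ends
  GROUP BY s1.user_id;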
