July | 2024 | Smoak Signals | Data Analytics Blog

I have always recommended that if you develop SQL based analytics, you should have some understanding of indexes when it comes to speeding up queries.

Sometimes we do not have access to create indexes on the physical tables (especially in a production environment), but approaching your DBA with some basic knowledge of indexing will put you ahead of the game.

if your queries are running longer than they need to be, it’s time to speed them up with some strategic indexing. Very briefly, here’s how you can fix your slow running query issues via indexing your tables.

Step 1: Implement a Covering Index Strategy

A covering index includes all the columns referenced in your SELECT or WHERE clauses. This means the database engine can fulfill your query using just the index, without having to access the base table data. This act can drastically reduce query execution time and improve your data visualization processes if needed. The following is an example of TSQL code for MS SQL Server demonstrating a covering index using the INCLUDE keyword:

— Creates a nonclustered index on the Person.Address table with four included (non-key) columns.
— index key column is PostalCode and the non-key columns are
— AddressLine1, AddressLine2, City, and StateProvinceID.

CREATE NONCLUSTERED INDEX IX_Address_PostalCode
ON Person.Address (PostalCode)
INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID);

Step 2: Utilize Clustered Indexes

Clustered indexes should typically be applied to your primary key column. A clustered index sorts and stores the data rows in the table based upon the key values. This organization is particularly efficient for queries that retrieve a range of values.

Step 3: Apply Non-Clustered Indexes

Non-clustered indexes are typically applied to foreign keys. These indexes create a separate structure that points back to the base data rows in the table. They are ideal for speeding up queries that involve joins, which are common in business intelligence operations.

Find the Sweet Spot

With indexing, balance is key. Under-indexing can cause your queries to run longer, while over-indexing can negatively impact data insertion and updates. Striking the right balance ensures your queries remain efficient without compromising the performance of other operations.

Indexing Rule #4080: If it’s linked by it, filtered by it, or sorted by it….index by it.

(I saw that rule on a reddit forum once, and thought it made perfect sense. The 4080 reference is for my ATCQ fans.)

If you want more detail on indexing, you can read this blog post here. I wrote it a few years ago, but it is old gold, still valuable!

I appreciate everyone who has supported this blog and my YouTube channel via merch. Please check out the logo shop here.

Stay in contact with me through my various social media presences.

Join my YouTube channel to get access to perks:
https://www.youtube.com/channel/UCL2ls5uXExB4p6_aZF2rUyg/join

Over the years I have conducted a number of interviews with candidates who claimed to know SQL, but could not clearly verbalize their understanding of some basic concepts. I want you to understand these following concepts at a minimum, so your next SQL interview isn’t cut short.

First, distinguish between a left outer join and an inner join. A left outer join retrieves all records from the left table and returns only the matched records from the right table. On the other hand, an inner join retrieves only the matched records from both tables.

Next, familiarize yourself with the WHERE clause versus the HAVING clause. Use the WHERE clause to filter records before any groupings are made, while the HAVING clause filters records after grouping. This understanding will enable you to filter your data correctly.

Also, understand the difference between the UNION and UNION ALL statements. UNION removes duplicate records, whereas UNION ALL includes all records, duplicates and all.

Be ready to discuss Common Table Expressions (CTEs) versus temporary tables for ease of query writing and to forgo complicated sub-queries. CTEs are best used for readability, while temp tables are physically stored in memory (allowing for the use of indexes if necessary to optimize performance).

Understanding clustered indexes versus non-clustered indexes is vital for query performance. Clustered indexes sort and store data rows in the table based upon key values, while non-clustered indexes create a separate structure with efficient pointers back to the base table.

Lastly, ensure your queries are sargable (Search ARGument ABLE). Sargable queries use indexes efficiently, improving performance, unlike non-sargable queries, which slow down data retrieval.

I appreciate everyone who has supported this blog and my YouTube channel via merch. Please check out the logo shop here.

Stay in contact with me through my various social media presences.

Join my YouTube channel to get access to perks:
https://www.youtube.com/channel/UCL2ls5uXExB4p6_aZF2rUyg/join

Smoak Signals | Data Analytics Blog

Sharing Knowledge: Data, Visualization & Business

Month / July 2024

Increase SQL Query Performance with Indexes

Know These SQL Concepts for your Data Interview