Database

B.I. Basics: Create an SSIS Data Profiling Task In SQL Server

Data profiling is necessary when trying to gain an understanding of a given data set, and a data profiling assessment should be completed before any reporting or application development work begins. My video demonstrates how to create a basic SSIS Data Profiling Task using SQL Server Data Tools.

According to the DAMA Guide to the Data Management Body of Knowledge:

“Before making any improvements to data, one must be able to distinguish between good and bad data…. A data analyst may not necessarily be able to pinpoint all instances of flawed data. However, the ability to document situations where data values look like they do not belong provides a means to communicate these instances with subject matter experts, whose business knowledge can confirm the existences of data problems.”

Here is additional information direct from Bill Gates’s former startup outfit regarding the types of data profiling tasks available in SSIS: https://msdn.microsoft.com/en-us/library/bb895263.aspx

If you’re interested in Business Intelligence & Tableau, please subscribe and check out my videos either here on this site or on my YouTube channel.

The Need For Speed: Improve SQL Query Performance with Indexing

This article is also published on LinkedIn.

How many times have you executed a SQL query against a million-plus row table and then engaged in a protracted waiting game for your results? Unfortunately, a poor table indexing strategy can counteract the gains of the best hardware and server architectures. The positive impact that strategically applied indexes have on query performance should not be ignored just because one isn’t wearing a DBA hat. “You can obtain the greatest improvement in database application performance by looking first at the area of data access, including logical/physical database design, query design, and index design” (Fritchey, 2014). The basics of index design should not be treated as an esoteric art best left to DBAs.

Make use of the Covering Index

Regularly used, resource-intensive queries should be supported by “covering indexes”. The aim of a covering index is to “cover” the query by including all of the columns referenced in its SELECT list and WHERE clause. Babbar, Bjeletich, Mackman, Meier and Vasireddy (2004) state, “The index ‘covers’ the query, and can completely service the query without going to the base data. This is in effect a materialized view of the query. The covering index performs well because the data is in one place and in the required order.” The benefit of a properly constructed covering index is clear: the RDBMS can find all the columns it needs in the index itself, without referring back to the base table, which drastically improves performance. Kriegel (2011) asserts, “Not all indices are created equal — If the column for which you’ve created an index is not part of your search criteria, the index will be useless at best and detrimental at worst.”
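As an illustration, here is a minimal T-SQL sketch, assuming a hypothetical dbo.Orders table and a report query that filters on CustomerID and returns only OrderDate and TotalDue:

    -- Hypothetical report query:
    --   SELECT OrderDate, TotalDue FROM dbo.Orders WHERE CustomerID = 42;

    -- A covering index: CustomerID supports the WHERE clause, while the INCLUDE
    -- columns let the engine satisfy the SELECT list without touching the base table.
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Covering
        ON dbo.Orders (CustomerID)
        INCLUDE (OrderDate, TotalDue);

The INCLUDE clause keeps the non-key columns out of the index key itself, which keeps the index narrow while still covering the query.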

Apply a Clustered Index

More often than not, a table should have a clustered index so that the query optimizer can avoid expensive table scans. Only one clustered index can exist per table, and it is usually best placed on the PRIMARY KEY column. Since the primary key is the unique identifier for a row, query writers will naturally use it in their search predicates, which aids record search performance.

“When no clustered index is present to establish a storage order for the data, the storage engine will simply read through the entire table to find what it needs. A table without a clustered index is called a heap table. A heap is just an unordered stack of data with a row identifier as a pointer to the storage location. This data is not ordered or searchable except by walking through the data, row by row, in a process called a scan” (Fritchey, 2014).

However, the caveat to applying a clustered index to a transactional table is that the rows must be maintained in key order, so every INSERT, and every UPDATE to the key, can add substantial overhead to those processes. Dimensional or static tables that are accessed mostly for join purposes are ideal candidates for this indexing strategy.
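For example, a minimal sketch using the same hypothetical dbo.Orders table, with the clustered index created through the primary key definition:

    -- Declaring the PRIMARY KEY as CLUSTERED stores the rows in OrderID order,
    -- so the table is no longer an unordered heap.
    -- (CLUSTERED is the default for a primary key in SQL Server; it is spelled
    -- out here for clarity.)
    CREATE TABLE dbo.Orders
    (
        OrderID    INT           NOT NULL,
        CustomerID INT           NOT NULL,
        OrderDate  DATE          NOT NULL,
        TotalDue   DECIMAL(10,2) NOT NULL,
        CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID)
    );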

Apply a Non-Clustered Index

Another consideration in regard to SQL performance tuning is to apply non-clustered indexes on foreign keys within frequently accessed tables. Babbar et al. (2004) advise, “Be sure to create an index on any foreign key. Because foreign keys are used in joins, foreign keys almost always benefit from having an index.”
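Continuing with the hypothetical schema above, a sketch of a foreign key index on an order-lines table (the table and column names are illustrative only):

    -- dbo.OrderLines.OrderID references dbo.Orders and appears in most joins,
    -- so the foreign key gets its own non-clustered index; SQL Server does not
    -- create one automatically when the FOREIGN KEY constraint is defined.
    CREATE NONCLUSTERED INDEX IX_OrderLines_OrderID
        ON dbo.OrderLines (OrderID);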

Indexing is an Art not a Science

Always remember that indexing is considered an art, not a science. Diverse real-world scenarios call for different indexing strategies, and in some instances a table may not need an index at all. If a table is small (only a few data pages), a full table scan will be more efficient than reading an index and then accessing the base table to locate the rest of the row data.

Conclusion

One of the biggest detriments to SQL query performance is an ill-considered indexing strategy. On the one hand, under-indexing can cause queries to run longer than necessary because of costly table scans against unordered heaps. On the other hand, this must be balanced against the tendency to over-index, which degrades insert and update performance.

When possible, SQL practitioners and DBAs should collaborate to understand query performance as a whole, especially in a production environment. DBAs left to their own devices may create indexes without any knowledge of the queries that will use them, an uncoordinated approach that can render the indexes inefficient on arrival. Conversely, it is equally important that SQL practitioners have a basic understanding of indexing. Placing “SELECT *” in every query negates the effectiveness of covering indexes and adds processing overhead compared to listing only the subset of fields actually needed.
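To make the point concrete, a hedged example against the hypothetical dbo.Orders table from the indexing sketches above:

    -- Listing only the needed columns lets a covering index on (CustomerID)
    -- INCLUDE (OrderDate, TotalDue) satisfy the query entirely;
    -- SELECT * would force a lookup of every column in each qualifying row.
    SELECT OrderDate, TotalDue
    FROM dbo.Orders
    WHERE CustomerID = 42;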

Even if you do not have administrative access to the tables referenced in your queries, approaching your DBA with a basic understanding of indexing strategies will lead to a more effective conversation.

References

Babbar, A., Bjeletich, S., Mackman, A., Meier, J., & Vasireddy, S. (2004, May). Improving .NET Application Performance and Scalability. Retrieved from https://msdn.microsoft.com/en-us/library/ff647793.aspx

Fritchey, G. (2014). SQL Server Query Performance Tuning (4th ed.).

Kriegel, A. (2011). Discovering SQL: A Hands-On Guide for Beginners.

SQL: Think in Sets not Rows

This article is also posted on LinkedIn.

Structured Query Language, better known as SQL, is regarded as the working language of relational database management systems (RDBMS). As was the case with the relational model and the concepts of normalization, the language developed as a result of IBM research in the 1970s.

“Left to their own devices, the early RDBMSs (sic) implemented a number of languages, including SEQUEL, developed by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s while working at IBM; and QUEL, the original language of Ingres. Eventually these efforts converged into a workable SQL, the Structured Query Language” (Kriegel, 2011).

For information professionals and database practitioners, SQL is regarded as a foundational skill that enables raw data to be manipulated within an RDBMS. “This is a declarative type of language. It instructs the database about what you want to do, and leaves details of implementation (how to do it) to the RDBMS itself” (Kriegel, 2011).

Before the advent of commercially accessible databases, data was typically stored in proprietary file formats. Each vendor implemented its own access mechanisms, which could not easily be configured or customized for access by other applications. As databases began to adopt the relational model, the arrival and eventual standardization of SQL by ANSI (American National Standards Institute) and ISO (International Organization for Standardization) helped foster consistency of access, manipulation and retrieval across many products.

Think in Sets not Rows!

SQL provides users the ability to query and manipulate data within the RDBMS without having to rely solely on a graphical user interface. The many SQL dialects (e.g. T-SQL, PL/SQL, DB2 SQL) offer powerful extensions that go above and beyond the ISO and ANSI standards. However, SQL practitioners must first and foremost remember that SQL is a SET-BASED construct. The most efficient SQL code treats table data as a whole and refrains from manipulating individual rows one at a time unless absolutely necessary.

“Thinking in sets, or more precisely, in relational terms, is probably the most important best practice when writing T-SQL code. Many people start coding in T-SQL after having some background in procedural programming. Often, at least at the early stages of coding in this new environment, you don’t really think in relational terms, but rather in procedural terms. That’s because it’s easier to think of a new language as an extension to what you already know as opposed to thinking of it as a different thing, which requires adopting the correct mindset” (Ben-Gan, 2012).

Working with a relational language based upon the relational data model demands a set-based mindset. Iterative, cursor-based processing should be used sparingly, if at all.

“By preferring a cursor-based (row-at-a-time) result set—or as Jeff Moden has so aptly termed it, Row By Agonizing Row (RBAR; pronounced ‘ree-bar’)—instead of a regular set-based SQL query, you add a large amount of overhead to SQL Server” (Fritchey, 2014).
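The contrast is easiest to see side by side. The following is a minimal T-SQL sketch, assuming a hypothetical dbo.Orders table and an equally hypothetical requirement to discount older orders by 10 percent:

    -- Row By Agonizing Row: a cursor updates one order at a time.
    DECLARE @OrderID INT;
    DECLARE order_cursor CURSOR FOR
        SELECT OrderID FROM dbo.Orders WHERE OrderDate < '20140101';
    OPEN order_cursor;
    FETCH NEXT FROM order_cursor INTO @OrderID;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        UPDATE dbo.Orders SET TotalDue = TotalDue * 0.90 WHERE OrderID = @OrderID;
        FETCH NEXT FROM order_cursor INTO @OrderID;
    END;
    CLOSE order_cursor;
    DEALLOCATE order_cursor;

    -- Set-based: one statement handles the same rows as a single operation.
    UPDATE dbo.Orders
    SET TotalDue = TotalDue * 0.90
    WHERE OrderDate < '20140101';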

If all set-based options have been exhausted and a row-by-row cursor must be employed, then make sure to use a relatively efficient cursor type. In a SQL Server environment, the fast forward-only cursor type provides some performance advantages over other cursor types. Fast forward-only cursors are read-only and move only forward through the data set (i.e. they do not support scrolling backward). Furthermore, according to Microsoft TechNet (2015), fast forward-only cursors automatically close when they reach the end of the data; the application driver does not have to send a close request to the server, which saves a round trip across the network.
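For instance, the cursor in the previous sketch could be declared with the FAST_FORWARD option; the declaration below is illustrative only:

    -- FAST_FORWARD makes the cursor forward-only and read-only,
    -- the least expensive choice when a cursor truly cannot be avoided.
    DECLARE order_cursor CURSOR FAST_FORWARD FOR
        SELECT OrderID, TotalDue
        FROM dbo.Orders
        WHERE OrderDate < '20140101';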

References:

Ben-Gan, I. (2012, April). T-SQL Foundations: Thinking in Sets. Why this line of thought is important when addressing querying tasks. Retrieved from http://sqlmag.com/t-sql/t-sql-foundations-thinking-sets

Fritchey, G. (2014). SQL Server Query Performance Tuning (4th ed.).

Kriegel, A. (2011). Discovering SQL: A Hands-On Guide for Beginners.

Microsoft TechNet. Fast Forward-Only Cursors (ODBC). Retrieved April 23, 2015, from https://technet.microsoft.com/en-us/library/aa177106(v=sql.80).aspx

Normalization: A Database Best Practice

The practice of normalization is widely regarded as the standard methodology for logically organizing data to reduce anomalies in database management systems. Normalization involves deconstructing information into various sub-parts that are linked together in a logical way. Malaika and Nicola (2011) state, “…data normalization represents business records in computers by deconstructing the record into many parts, sometimes hundreds of parts, and reconstructing them again as necessary. Artificial keys and associated indexes are required to link the parts of a single record together.” Although there are successively more stringent forms of normalization, best practice involves decomposing information into third normal form (3NF). The higher normal forms protect against anomalies that most practitioners will rarely encounter.

Background

The normalization methodology was the brainchild of mathematician and IBM researcher Dr. Edgar Frank Codd, who developed the technique while working at IBM’s San Jose Research Laboratory in 1970 (IBM Archives, 2003). Early databases employed either inflexible hierarchical designs or collections of pointers to data on magnetic tapes. “While such databases could be efficient in handling the specific data and queries they were designed for, they were absolutely inflexible. New types of queries required complex reprogramming, and adding new types of data forced a total redesign of the database itself” (IBM Archives, 2003). In addition, disk space in the early days of computing was limited and highly expensive. Dr. Codd’s seminal paper “A Relational Model of Data for Large Shared Data Banks” proposed a flexible structure of rows and columns that would help reduce the amount of disk space necessary to store information. Furthermore, this revolutionary new methodology significantly reduced data anomalies. These benefits are achieved by ensuring that each piece of data is stored on disk exactly once.

Normal Forms (1NF to 3NF): 

Normalization is widely regarded as the best practice when developing a coherent, flexible database structure. Adams & Beckett (1997) state that designing a normalized database structure should be the first step taken when building a database that is meant to last. There are seven normal forms, and each higher form includes the requirements of the forms below it: a table in 2nd normal form (2NF) is also in 1st normal form (1NF), but satisfies additional conditions. Normalization best practice holds that a database in 3rd normal form (3NF) will suffice for the widest range of solutions; Adams & Beckett (1997) call 3NF “adequate for most practical needs.” When Dr. Codd initially proposed the concept of normalization, 3NF was the highest form introduced (Oppel, 2011).

A database table has achieved 1NF if it does not contain any repeating groups and its attributes cannot be decomposed into smaller portions (atomicity). Most importantly, all of the data must relate to a primary key that uniquely identifies a respective row. “When you have more than one field storing the same kind of information in a single table, you have a repeating group” (Adams & Beckett, 1997). A higher level of normalization is often needed for tables in 1NF, which are frequently subject to “data duplication, update performance degradation, and update integrity problems” (Teorey, Lightstone, Nadeau & Jagadish, 2011).
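A minimal sketch of a repeating-group fix, using a hypothetical customer phone list:

    -- Violates 1NF: Phone1, Phone2 and Phone3 store the same kind of fact.
    --   dbo.Customer (CustomerID, CustomerName, Phone1, Phone2, Phone3)

    -- 1NF: the repeating group moves to its own table, one phone number per row.
    CREATE TABLE dbo.CustomerPhone
    (
        CustomerID  INT         NOT NULL,
        PhoneNumber VARCHAR(20) NOT NULL,
        CONSTRAINT PK_CustomerPhone PRIMARY KEY (CustomerID, PhoneNumber)
    );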

A database table has achieved 2NF if it meets the conditions of 1NF and all of the non-key fields depend on ALL of the key fields (Stephens, 2009). It is important to note that a 1NF table whose primary key consists of a single column is automatically in 2NF. In essence, 2NF helps data modelers determine whether two tables have been combined into one.
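For example, a sketch of a 2NF violation and its fix, with hypothetical order-line and product tables:

    -- Violates 2NF: ProductName depends only on ProductID, one part of the
    -- composite key (OrderID, ProductID).
    --   dbo.OrderLine (OrderID, ProductID, ProductName, Quantity)

    -- 2NF: ProductName moves to a table keyed on ProductID alone.
    CREATE TABLE dbo.Product
    (
        ProductID   INT         NOT NULL PRIMARY KEY,
        ProductName VARCHAR(50) NOT NULL
    );

    CREATE TABLE dbo.OrderLine
    (
        OrderID   INT NOT NULL,
        ProductID INT NOT NULL,
        Quantity  INT NOT NULL,
        CONSTRAINT PK_OrderLine PRIMARY KEY (OrderID, ProductID)
    );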

A database table has achieved 3NF if it meets the conditions of 2NF and contains no transitive dependencies. “A transitive dependency is when one non‐key field’s value depends on another non‐key field’s value” (Stephens, 2009). If any field in the table depends on another non-key field, the dependent field should be moved into another table.

If, for example, non-key field B is functionally dependent on non-key field A (A → B), then move fields A and B to a new table, with field A designated as that table’s key and retained in the original table to provide the linkage.
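As a sketch, suppose a hypothetical employee table carries both DepartmentID and DepartmentName:

    -- Violates 3NF: DepartmentName depends on DepartmentID, a non-key field (A -> B).
    --   dbo.Employee (EmployeeID, EmployeeName, DepartmentID, DepartmentName)

    -- 3NF: the dependent field moves to a new table keyed on DepartmentID,
    -- which stays in Employee as the link back to the original row.
    CREATE TABLE dbo.Department
    (
        DepartmentID   INT         NOT NULL PRIMARY KEY,
        DepartmentName VARCHAR(50) NOT NULL
    );

    CREATE TABLE dbo.Employee
    (
        EmployeeID   INT         NOT NULL PRIMARY KEY,
        EmployeeName VARCHAR(50) NOT NULL,
        DepartmentID INT         NOT NULL
            REFERENCES dbo.Department (DepartmentID)
    );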

In short, 2NF and 3NF help define the relationship between key and non-key attributes. Kent (1983) states, “Under second and third normal forms, a non-key field must provide a fact about the key, just the whole key, and nothing but the key.” A variant of this definition is typically supplemented with the remark “so help me Codd”.

Benefits:

Adams & Beckett (1997) assert that the normalization method provides benefits such as database efficiency and flexibility, the avoidance of redundant fields, and an easier-to-maintain database structure. Hoberman (2009) adds that normalization gives a modeler a stronger understanding of the business: the normalization process forces many questions to be asked about the data elements so that those elements can be assigned to entities correctly. Hoberman also agrees that data quality improves as redundancy is reduced.

Assessment

Although normalization is considered best practice, many sources advocate that normalization to 3NF is sufficient for the majority of data modeling solutions. Third normal form is deemed sufficient because the anomalies covered by higher forms occur far less frequently. Adams & Beckett (1997) describe 3NF as “adequate for most practical needs,” and Stephens (2009) affirms that many designers view 3NF as the form that combines adequate protection from recurrent data anomalies with relative modeling ease. Levels of normalization beyond 3NF can yield data models that are over-engineered, overly complicated and hard to maintain, and their performance can degrade to a level worse than that of less normalized designs. Hoberman (2009) asserts that, “Even though there are higher levels of normalization than 3NF, many interpret the term ‘normalized’ to mean 3NF.”

There are examples in the data modeling literature where strict adherence to normalization is not advised. Fotache (2006) posited that normalization began as a highly rigorous, theoretical methodology of little practical use to real-world development. Fotache provides the example of an attribute named ADDRESS, which 1NF atomicity would require to be decomposed into its component parts; the ADDRESS data could instead be stored in a single field (violating 1NF) if it is needed only for better person identification or mailing purposes. Teorey, Lightstone, Nadeau, & Jagadish (2011) advise that denormalization should be considered when performance is at stake; it trades increased update cost for lower read cost, depending upon the level of data redundancy introduced. Date (1990) likewise downplays strict adherence to normalization and sets a minimum requirement of 1NF: “Normalization theory is a useful aid in the process, but it is not a panacea; anyone designing a database is certainly advised to be familiar with the basic techniques of normalization…but we do not mean to suggest that the design should necessarily be based on normalization principles alone” (Date, 1990).

Conclusion

Normalization is the best practice when designing a flexible and efficient database structure. The first three normal forms can be remembered by recalling a simple mnemonic. All attributes should depend upon a key (1NF), the whole key (2NF) and nothing but the key (3NF).

The advantages of normalization are many. Normalization ensures that modelers have a strong understanding of the business, it greatly reduces data redundancies and it improves data quality. When there is less data to store on disk, updating and inserting becomes a faster process. In addition, insert, delete and update anomalies disappear when adhering to normalization techniques. “The mantra of the skilled database designer is, for each attribute, capture it once, store it once, and use that one copy everywhere” (Stephens 2009).

It is important to remember that normalization to 3NF is sufficient for the majority of data modeling solutions. Higher levels of normalization can overcomplicate a database design and have the potential to provide worse performance.

In conclusion, begin the database design process by using normalization techniques. For implementation purposes, normalize data to 3NF compliance and then consider whether data retrieval performance necessitates denormalizing to a lower form, accepting the trade-off of increased update cost for lower read cost that denormalization brings.

Glossary:

Delete Anomaly: “A delete anomaly is a situation where a deletion of data about one particular entity causes unintended loss of data that characterizes another entity.” (Stephens, 2009)

Denormalization: Denormalization involves reversing the process of normalization to gain faster read performance.

Insert Anomaly: “An insert anomaly is a situation where you cannot insert a new tuple into a relation because of an artificial dependency on another relation.” (Stephens, 2009)

Normalization: “Normalization is the process of organizing data in a database. This includes creating tables and establishing relationships between those tables according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency.” (Microsoft Knowledge Base)

Primary Key: “Even though an entity may contain more than one candidate key, we can only select one candidate key to be the primary key for an entity. A primary key is a candidate key that has been chosen to be the unique identifier for an entity.” (Hoberman, 2009)

Update Anomaly: “An update anomaly is a situation where an update of a single data value requires multiple tuples (rows) of data to be updated.” (Stephens, 2009)

Bibliography

Adams, D., & Beckett, D. (1997). Normalization Is a Nice Theory. Foresight Technology Inc. Retrieved from http://www.4dcompanion.com/downloads/papers/normalization.pdf

Fotache, M. (2006, May 1). Why Normalization Failed to Become the Ultimate Guide for Database Designers? Available at SSRN: http://ssrn.com/abstract=905060 or http://dx.doi.org/10.2139/ssrn.905060

Hoberman, S. (2009). Data Modeling Made Simple: A Practical Guide for Business and IT Professionals (2nd ed.). [Books24x7 version] Available from http://common.books24x7.com.libezproxy2.syr.edu/toc.aspx?bookid=34408.

IBM Archives (2003): Edgar F. Codd. Retrieved from http://www-03.ibm.com/ibm/history/exhibits/builders/builders_codd.html

Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory. Communications of the ACM 26(2). Retrieved from http://www.bkent.net/Doc/simple5.htm

Malaika, S., & Nicola, M. (2011, December 15). Data normalization reconsidered, Part 1: The history of business records. Retrieved from http://www.ibm.com/developerworks/data/library/techarticle/dm-1112normalization/

Microsoft Knowledge Base. Article ID: 283878. Description of the database normalization basics. Retrieved from http://support.microsoft.com/kb/283878

Oppel, A. (2011). Databases Demystified (2nd ed.). [Books24x7 version] Available from http://common.books24x7.com.libezproxy2.syr.edu/toc.aspx?bookid=72521.

Stephens, R. (2009). Beginning Database Design Solutions. [Books24x7 version] Available from http://common.books24x7.com.libezproxy2.syr.edu/toc.aspx?bookid=29584.

Teorey, T., Lightstone, S., Nadeau, T., & Jagadish, H. V. (2011). Database Modeling and Design: Logical Design (5th ed.). [Books24x7 version] Available from http://common.books24x7.com.libezproxy2.syr.edu/toc.aspx?bookid=41847.
