Learn GROUP BY vs PARTITION BY in SQL

TLDR

If you are using SQL, you will eventually come across the GROUP BY and PARTITION BY clauses. While the GROUP BY clause is fairly standard in SQL, many people do not understand when to use the PARTITION BY clause. This easy-to-understand video uses some NBA season data to make the point very clear! I will show you the best use case for applying PARTITION BY.

Bonus content begins at the 10:03 mark, where I demonstrate a use case for the fundamentals I teach you earlier in the video.

The reviews are in, make sure to watch the whole video!

GROUP BY EXAMPLE

To begin, I demonstrate how to use GROUP BY in SQL Server Express to understand basic NBA team statistics based strictly upon the conference. I have to tell the database what to GROUP BY to generate all of the aggregate statistics. I select the conference and want to generate the aggregate sum of points, average points, and max points. I also order the results by the sum of points in a descending fashion.

When I run the query, the results show how the data points have been grouped by conference. The results show the sum of points by the two values in the conference field (Western and Eastern), the average points, and the max points. I can also see that there are no individual rows in this result, which is expected when using the GROUP BY clause with aggregate functions like MIN, MAX, SUM, and COUNT.

If I want to further break down the results by division, I need to also select the division field and add it to the GROUP BY clause. This further slices the data points into the specific conferences and divisions that the teams play in.
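As a rough illustration of the two GROUP BY queries described above, here is a minimal sketch that runs them from Python against an in-memory SQLite database instead of SQL Server Express. The table name `team_stats`, its column names, and the four sample rows are assumptions made for this example; they are not the data used in the video.

```python
import sqlite3

# Hypothetical sample rows (team, conference, division, pts); not the video's data.
ROWS = [
    ("Team A", "Eastern", "Atlantic", 9000),
    ("Team B", "Eastern", "Central", 9400),
    ("Team C", "Western", "Pacific", 9200),
    ("Team D", "Western", "Pacific", 9600),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE team_stats (team TEXT, conference TEXT, division TEXT, pts INTEGER)"
)
conn.executemany("INSERT INTO team_stats VALUES (?, ?, ?, ?)", ROWS)

# One output row per conference; the individual team rows are collapsed.
by_conference = conn.execute(
    """
    SELECT conference, SUM(pts) AS sum_pts, AVG(pts) AS avg_pts, MAX(pts) AS max_pts
    FROM team_stats
    GROUP BY conference
    ORDER BY sum_pts DESC
    """
).fetchall()

# Adding division to both SELECT and GROUP BY slices the groups further.
by_division = conn.execute(
    """
    SELECT conference, division, SUM(pts) AS sum_pts
    FROM team_stats
    GROUP BY conference, division
    ORDER BY sum_pts DESC
    """
).fetchall()

for row in by_conference:
    print(row)
```

With this sample data the first query returns two rows (one per conference) and the second returns three (one per conference and division pair), even though the table holds four teams.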

PARTITION BY

Now, let’s talk about the OVER clause and its PARTITION BY sub-clause. In this next example, I select the conference and bring in the points, which is our measure. Just like with GROUP BY, I sum the points, but I add “OVER” and the sub-clause “PARTITION BY.” This is where the magic happens because I tell SQL what data point to partition by (i.e., conference) to show the total sum of points by conference.

When I run the query, the results show the sum of points by conference, and I can order the results by individual team points in descending order. The results show a breakdown of the sum of points by conference, but there are still individual rows in the results.

Here is a continuation of the same results for the Western conference teams:

This is an important distinction!! By using OVER and PARTITION BY, I can have data at the most granular level (unaggregated points at an individual team level, i.e., the PTS field) combined with data at a higher level of aggregation (points summed, averaged and the maximum points scored at an overall conference level).

Because these statistics are computed at the conference level, the values in the last three statistics columns repeat on every row for that conference (i.e., 135611 for SUM_PTS, 9040 for AVG_PTS and 9470 for MAX_PTS in the Eastern Conference). Similar data is returned for the Western Conference.

I’m essentially allowed to have my data cake and eat it too with this best of both worlds approach!
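To make the "individual rows plus conference totals" idea concrete, here is a minimal sketch of an OVER (PARTITION BY ...) query, again run from Python against in-memory SQLite as a stand-in for SQL Server. It assumes SQLite 3.25 or later (the first version with window functions), and the `team_stats` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE team_stats (team TEXT, conference TEXT, pts INTEGER)")
# Hypothetical sample rows; not the video's data.
conn.executemany(
    "INSERT INTO team_stats VALUES (?, ?, ?)",
    [
        ("Team A", "Eastern", 9000),
        ("Team B", "Eastern", 9400),
        ("Team C", "Western", 9200),
        ("Team D", "Western", 9600),
    ],
)

# Every team row is kept; the window aggregates repeat within each conference.
rows = conn.execute(
    """
    SELECT team, conference, pts,
           SUM(pts) OVER (PARTITION BY conference) AS sum_pts,
           AVG(pts) OVER (PARTITION BY conference) AS avg_pts,
           MAX(pts) OVER (PARTITION BY conference) AS max_pts
    FROM team_stats
    ORDER BY conference, pts DESC
    """
).fetchall()

for row in rows:
    print(row)
```

All four team rows come back, and the three aggregate columns carry the same values for every team in the same conference.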

IN SUMMARY

The GROUP BY clause is used to group rows that have the same values in a specific column or set of columns. When used with aggregate functions such as SUM, AVG, MAX, MIN, COUNT, etc., the GROUP BY clause allows us to calculate summary statistics for each group. The result will yield one row for each group. Typically, a GROUP BY clause will reduce the number of rows returned by your SQL.

On the other hand, the PARTITION BY sub-clause (used inside the OVER clause of a window function) divides the data into partitions or subsets based on a specific column or set of columns (like conference in our case). Unlike GROUP BY, PARTITION BY does not reduce the number of rows returned in the result set. Instead, each row gets a new column that shows the result of the aggregate function (e.g., SUM, AVG, MAX, etc.) for that row's partition.

LET PAT BEV COOK

So remember, when it comes to GROUP BY and PARTITION BY in SQL, just like how the Minnesota Timberwolves balanced an effective array of shots to lead the league in total points scored, understanding the nuances of when to use each clause can make all the difference in winning that crucial play-in game against your data! Yes, you have to watch the video to understand this reference.

Also, I’m not mad at Pat Bev for going over the top, as this win was against a former employer who recently traded him. Success is always the best revenge!!

Happy querying!!

I appreciate everyone who has supported this blog and my YouTube channel via merch. Please check out the logo shop here.

Stay in contact with me through my various social media presences.

Thank you!!

Anthony B Smoak

Use Excel to Connect to SQL Server Data

TLDR

Connect directly to SQL Server data from within Excel. Also learn how to add and modify SQL statements from within Excel and pass them to SQL Server for data retrieval. If you need to quickly refresh data from SQL Server without hassle, then you need to watch this video!!

Intro

As a data professional, I am always looking for ways to optimize my workflow and increase efficiency. One of the techniques that I have found particularly helpful is making a direct connection between Excel and SQL Server. In this video, I will show you how to set up this direct connection and explain the benefits of using it.

What is a Direct Connection Between SQL and Excel?

First of all, let me explain what I mean by a direct connection between these two tools. Typically, when working with data in Excel, you would export the data from SQL Server into a .csv file and then import that file into Excel. This process can be time-consuming and cumbersome, especially if you are working with large datasets. With a direct connection, you can access the data in SQL Server directly from Excel, without the need for any intermediate steps.

What Do I Demonstrate in the Video?

To demonstrate this in the video, I walk you through an example. I am using the Wide World Importers DW sample database from Microsoft, which you can easily import into SQL Server. Within this database, I am looking at the fact.order table, which has over 230,000 rows and many columns. In Excel, I start with a blank sheet and navigate to the Data ribbon. From there, I select “From Database” and then “From SQL Server Database.”

This prompts me to enter a server name and a database name. If you have access to SQL Server, you can find the server name by connecting to the database engine. Otherwise, you may need to reach out to your database administrator to obtain this information (always stay on your DBA’s good side, if you know what’s good for you). Once you enter the server and database names, you can hit “OK” and Excel will work its magic to establish a connection.

At this point, you can preview the data from the table that you want to import. Excel will give you the option to transform the data if necessary, but we’ll just hit “Load.” Excel will then create a connection and query the SQL Server database and load the data directly into Excel. This means that you can always access the most up-to-date version of the data, without having to worry about exporting and importing files!!

Advantages of Connecting SQL and Excel

Now, let’s say that you need to update the data in Excel at a later time. Perhaps you have some ad hoc processes that reference this data and you need to ensure that you always have the latest version. With a direct connection, this is easy to do. You can simply go to the Data ribbon and select “Refresh.” Excel will connect to SQL Server and update the data in your Excel sheet with the latest data from the database.

This is incredibly powerful because it means that you can share your Excel sheet with others without worrying about whether they have the latest version of the data. As long as they have access to the SQL Server database, they will always see the most up-to-date version of the data when they open the Excel sheet.

One thing to keep in mind is that this type of direct connection is best suited for ad hoc purposes!! In other words, you should not use this to create production worksheets that will be used by others. This is because the direct connection is dependent on having access to the SQL Server database. If that access is lost, the Excel sheet will no longer be able to connect to the database and the data can no longer be refreshed. Therefore, it is best to use this type of connection for temporary analysis and reporting purposes.

Once you have successfully set up the direct connection between Excel and SQL Server, you can easily refresh the data whenever you need it. To refresh the data, all you have to do is go to the Data tab, and click on the “Refresh All” button. This will refresh all the data connections in your workbook, including the connection to SQL Server.

Powerful Excel Functionality (PivotTables and Pivot Charts)

I don’t reference this in the video, but you can also use Excel’s PivotTables and PivotCharts to analyze and visualize the data. PivotTables allow you to group and summarize data in many different ways, while PivotCharts provide a visual representation of the data that is easy to understand. It may be easier for you to manipulate this data in Excel and extract additional insights than in SQL Server.

Financial analysts, in particular, should avoid taking manual inputs from any and everywhere (especially ungoverned data sources) and using this type of refresh for production purposes. As a recovering financial analyst I know your management hates automation and loves when you cut and paste random information from Bob in division finance. I also know they want to see you work 12 hours a day because the CFO needs that monthly IT spend variance to budget!! Please do yourself a favor and meditate hard for serenity every Sunday night.

Conclusion

In conclusion, using a direct connection between Excel and SQL Server can greatly improve your workflow when working with large datasets. By leveraging the full power of SQL Server’s querying capabilities and Excel’s Pivot tools, you can create powerful ad hoc reports and analyze data in ways that would be difficult or impossible with other tools. Consider setting up a direct connection to SQL Server to streamline your ad hoc workflow and improve your productivity.

All views and opinions are solely my own and do not necessarily reflect those of my employer

Thank you!!

Anthony B Smoak

P.S. I respect your hustle Financial Analysts. I ask that you respect those who want to make your life easier with report automation!!

Build Dynamic SQL with SQL Server and Excel

In this video you will learn how to use the SQL CASE statement to add “filter flexibility” to your front-end Excel worksheet, thus taking your SQL + Excel skills to the next level. I’ll reveal the UPDATED code I used to build a dynamic SQL statement using SQL Server and Excel.

By using Excel as a tool to pass cell values to SQL Server queries, you’ll be able to generate dynamic SQL statements on the fly, saving time and reducing the risk of errors in your code. Building upon the previous video “Call a SQL Server Stored Procedure using Excel Parameters”, let’s enhance your SQL skills and streamline your workflow.

Here is a screenshot of the front-end Excel worksheet we set up in previous videos. This Excel sheet will execute a stored procedure call with parameters supplied from cells on the sheet.

Below is the stored procedure I use to enhance the code from previous videos. I set up a static SQL string that will serve as the base of the SQL statement. I then use the CASE statement to evaluate the cell values incoming from the Excel worksheet (with some slight manipulation for empty and default date values incoming from Excel).

Depending upon those values, the filter clause is dynamically built and appended to the base of the SQL string, which is then executed with the sp_executesql command. When values are passed to it as parameters rather than concatenated into the string, this command helps protect your code from a SQL injection attack.
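The stored procedure itself is covered in the video. As a loose analogue of the same idea (a static base string, filters appended only when a value is supplied, and values bound as parameters), here is a minimal Python sketch against in-memory SQLite. It swaps T-SQL CASE and sp_executesql for Python `if` checks and `?` placeholders, and the `orders` table, its columns, and the `build_query` helper are all hypothetical names chosen for this example.

```python
import sqlite3

# Static base of the statement; "WHERE 1 = 1" lets every filter start with AND.
BASE_SQL = "SELECT order_id, region, order_date FROM orders WHERE 1 = 1"

def build_query(region, start_date):
    """Append a filter only for the inputs that were actually supplied."""
    sql, params = BASE_SQL, []
    if region:  # an empty worksheet cell is assumed to arrive as "" or None
        sql += " AND region = ?"
        params.append(region)
    if start_date:
        sql += " AND order_date >= ?"
        params.append(start_date)
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "West", "2020-01-15"), (2, "East", "2020-02-01"), (3, "West", "2019-12-31")],
)

# Both filters supplied: the statement grows by two parameterized conditions.
sql, params = build_query("West", "2020-01-01")
filtered = conn.execute(sql, params).fetchall()

# No filters supplied: only the static base runs.
sql_all, params_all = build_query("", None)
unfiltered = conn.execute(sql_all, params_all).fetchall()
```

Only the shape of the statement changes with the inputs; the values themselves never become part of the SQL text.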

If you need a breakdown of the code and the worksheet functionality, make sure to watch the video below.

I always have fun creating this type of content and sharing with you, my YouTube channel followers.

Thank you!!

Anthony B Smoak

Remove the Default Highlighting Effect in Tableau

Have you ever wanted to disable the default Tableau highlighting effect when you select a mark on your chart and then remove the filter? Even when the filter is removed via the “Remove All Filters” process, it can be confusing for the user experience when all values remain “greyed out”, tricking the user into thinking that their filter is still applied. This video will help you remedy this issue and improve your dashboard user experience.

Fortunately, there is a solution to this problem that is simple and easy to implement. In this video I will show you how to use a simple calculated field and highlight action to remedy the issue. This should be default behavior in Tableau (help us out here, Tableau!).

The solution involves creating a boolean calculated field and setting it to TRUE. Then, place this calculated field on the Detail of the chart that has a filter applied. Next, add a highlight action to the same chart that you want to remove the “greyed out” effect for. In the “Add Highlight Action” pop-up box, the Source Sheet and the Target Sheet should be the same and the Selected Fields option should have the boolean calculated field checked.

By following these steps, you will be able to remove the greyed out effect on your Tableau chart when the “Remove All Filters” process is applied.

This not only improves the appearance of your dashboard but also makes it easier to understand the data.

Don’t let the greyed out effect on your Tableau charts hold you back any longer. Watch the video and follow the steps outlined in this blog post to improve the appearance and functionality of your Tableau dashboards.

You can also follow my dapper data adventures on Instagram.

Thank you!!

Anthony B Smoak

Passing Parameter Values from Excel to SQL Server

If you’re working with data in Excel and need to connect to a SQL Server database, there are a couple of ways to pass parameter values from Excel to SQL Server. In my first video, “Passing Parameter Values from Excel to SQL Server,” I show you how to connect to SQL Server and read values from a cell and pass those values to a native SQL query without using parameters.

Advantage: The first approach has the advantage of being quick and easy to implement. This is because it does not require any additional setup or configuration, such as creating stored procedures in SQL Server. Instead, the values are passed directly to the query, which can be executed immediately.

However, this approach can also be prone to SQL injection attacks, where a hacker inputs malicious SQL code into an input field in order to gain unauthorized access to the database.

Verdict: Speed over Security. Good for Ad-Hoc personal use.
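To show why pasting a cell value straight into query text is risky, here is a minimal sketch using Python and in-memory SQLite rather than Excel and SQL Server. The `customers` table and the hostile cell value are made up for this illustration; the second query shows the parameterized alternative for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ann", "Atlanta"), ("Bob", "Boston")],
)

# Hostile text typed into the input cell instead of a city name.
cell_value = "x' OR '1'='1"

# Unsafe: the cell value becomes part of the SQL text and rewrites the filter.
unsafe_sql = "SELECT name FROM customers WHERE city = '" + cell_value + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the value is bound as a parameter and treated purely as data.
safe_rows = conn.execute(
    "SELECT name FROM customers WHERE city = ?", (cell_value,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The concatenated query returns every customer because the injected OR condition is always true, while the parameterized query returns nothing because no city literally has that name.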

In my second video, “Call a SQL Server Stored Procedure using Excel Parameters,” I demonstrate how to connect to SQL Server from Excel and pass cell values from Excel to SQL Server using a stored procedure. This approach is more secure because the values are passed as parameters to a stored procedure rather than written directly into the query text. It simply requires the setup of a stored procedure in SQL Server, and I show you two ways to accomplish this feat.

Advantage: Stored procedures provide an added layer of security because they can be set to execute with specific permissions, and can be audited for changes and usage. This makes it harder for an attacker to gain unauthorized access to the database or to execute malicious SQL commands.

However, one small potential disadvantage of this approach is that the stored procedures will need to be updated and managed separately from the Excel file.

If you’re new to working with SQL Server and Excel, I recommend watching both videos. The first video will give you a good overview of the basics, while the second video will show you a more secure way to pass parameter values.

Thank you!!

Anthony B Smoak

How to Swap Sheets in Tableau

Learn how to perform a useful Tableau hack that allows you to display multiple sheets in one container on your Tableau dashboard. In this video I use my personal training dashboard to show you step by step how this trick is performed. This tip is a must know for the intermediate to advanced dashboard builder as it will help you save space on your dashboard.

Learn more about the dashboard used in this video: https://youtu.be/MFluvSKJXnI

Interact with the Dashboard here:

Watching the video will make the concept clearer but I will provide an overview in this post.

Step 1: I create a Parameter named “Select a Chart”. You can see that I have chosen a list of allowable values and I place into the list the names of charts that I want to swap.

Step 2: I create a calculated field named “_Selected Chart”. It only holds the value of the parameter I created in Step 1.

Step 3: (Use screenshot below)

1. Place the “_Selected Chart” calculated field on the filter shelf of a chart that you wish to show and hide.

2. Edit the “_Selected Chart” filter and select the “Custom value list” option.

3. Type in the respective name of the chart that corresponds to the value that you entered in the parameter’s allowable values list in Step 1. Hit the plus button to the far right to add the value. Additionally, add the value of “All” to the Custom value list in the same manner.

IMPORTANT: the value that you enter into your chart must EXACTLY match the value that you placed on the parameter allowable values list.

Repeat this process for every chart that you wish to show and hide, making sure to type in the exact same chart name that you entered in the parameter allowable values list in Step 1.

Step 4:

Now it’s time to place all of your charts into the same object (i.e., horizontal or vertical container) on your dashboard. Make sure to show the parameter named “Select a Chart” on the dashboard so you have a combo box with the names of your charts inside that you can select.
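The show and hide behavior from the steps above can be sketched outside Tableau. The Python below is only an analogy, not Tableau code: it assumes a sheet has data to show only when the current parameter value appears in that sheet's custom value list, and the chart names are hypothetical.

```python
# Each sheet's filter keeps its own chart name plus "All" (Step 3).
# Chart names here are hypothetical examples.
custom_value_lists = {
    "Weekly Mileage": {"Weekly Mileage", "All"},
    "Heart Rate": {"Heart Rate", "All"},
}

def visible_sheets(selected_chart):
    """Return the sheets whose filter accepts the current parameter value."""
    return [
        sheet
        for sheet, allowed in custom_value_lists.items()
        if selected_chart in allowed
    ]

print(visible_sheets("Heart Rate"))
print(visible_sheets("All"))
```

This also shows why the names must match exactly: a parameter value of "heart rate" in lowercase would match no list and every sheet would come back empty.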

Make sure to watch the video for exact details!!

Please like and subscribe on the Anthony B. Smoak YouTube channel.

Thank you!!

Anthony B Smoak

How to Become a Data Analyst

I’ve been working with data for some 20 plus years as of the writing of this post. In the video below I captured my thoughts on the required hard and soft skills it takes to succeed as a data analyst. If you are looking to start your career in data as someone who has not yet graduated or as someone with tangential work experience, then this video will serve you well.

Do You Need a Computer Science Degree to be a Data Analyst?

This question is frequently asked by people such as yourself looking to make a move into data. The answer is no. You do not need a computer science degree to have a very successful data career. In the video I give my thoughts on computer science, but the reality is that although it may be helpful from a “getting a first job” perspective, it is not a requirement to succeed. Although I have an undergraduate computer science degree from Clark Atlanta University (shout-out to HBCU alums), some of the brightest minds I’ve worked with in the data space do not have a computer science degree. Bottom line: a formal computer science degree certainly helps but it is by no means necessary. All you need is the willingness to learn the tools and the perseverance to get your first data opportunity.

Hard Skills Required (View Video)

I’ll give you a hint: data visualization skills are a must and Tableau is the tool of choice for me.

Soft Skills Required

I’ll keep it short here and simply state that you should always look for ways to differentiate yourself and not just be seen as an interchangeable commodity worker. To paraphrase famed Harvard professor Michael Porter, a differentiation strategy advocates that a business must offer products or services that are valuable and unique to buyers above and beyond a low price. In this metaphor, think of yourself as a business and you bring multiple skill sets to your employer (other than being a single focus technical employee who can be easily outsourced for a lower price).

To be a differentiator, do not think of yourself as just being a tool-specific analyst. Learn how to take requirements, communicate well, and develop exceptional writing skills for business emails and documentation. Finally, learn how to present your analyses to people several pay grades above yourself when required. You want differentiation to be your competitive advantage. You do not want “low cost” to be your advantage for obvious reasons (if you’re like me, you want to be paid fairly for the value you provide).

Future Career Paths

In our jobs we desire mastery, autonomy and purpose. After a certain point in your career you may want to take a leap from the descriptive analytics path and move towards a predictive analytics path. Descriptive analytics (think data analyst or traditional business intelligence reporting analyst) deals with what has happened in the past, while predictive analytics focuses on what will most likely happen in the future. In order to level up in predictive analytics, you will need Python, statistics, probability, and/or machine learning skills.

If you want to make the leap from data into management, you can consider obtaining an MBA or a master’s degree in Management Information Systems. I happen to have an MBA from the Georgia Institute of Technology and a master’s degree in Information Management from Syracuse. This may seem like a bit of overkill but I work in consulting where credentials are overly appreciated by clients (and I am a lifelong learner).

Interact with my Tableau resume here.

Conclusion

A career in data can be fun (in the early learning phases) and lucrative (mid to late career). In my case it has been a fulfilling career ever since I started work as a data analyst at General Motors many years ago. I turned myself from a commodity to a differentiator by not only learning the basics but also adding business understanding and a willingness to share what I know on this blog and my YouTube channel. I know that you can do the same. If you put in the time to learn along with the perseverance to land that first data role, you won’t need much luck at all to accomplish your goals.

Looking to land that first role or trying to move ahead in your current role? Then check out this post for the Keys for a Successful Career as a Data Analyst.

-Anthony Smoak

Thank you!!

Build Better Sparklines in Tableau

So you want to add some spice to your bland looking Sparklines in Tableau? You have come to the right place (start by watching the video above). Let’s talk about how a Sparkline is defined per Wikipedia:

“A sparkline is a very small line chart, typically drawn without axes or coordinates. It presents the general shape of the variation (typically over time) in some measurement, such as temperature or stock market price, in a simple and highly condensed way. Sparklines are small enough to be embedded in text, or several sparklines may be grouped together as elements of a small multiple. Whereas the typical chart is designed to show as much data as possible, and is set off from the flow of text, sparklines are intended to be succinct, memorable, and located where they are discussed.”

Here are a few examples of Tableau specific sparklines in action (with latest complete month bubble indicators and reference lines): Notice how I do not include any data axes, but you can clearly recognize the data trends in the visuals.

Here is an example of how I used the sparklines demonstrated in the video to build out a classic yet refined looking Tableau dashboard.

Interact with and download this workbook here.

For reference purposes I am going to list three formulas used in the completion of the sparklines. You’ll have to watch the video to learn how to put them together.

In this exercise I am using the standard Tableau Superstore data set, which you can find with a Google search if you are using Tableau Public.

Calculated Fields

Calculated Field #1 (Name: SPRK_CircleMonths)

This calculated field puts a circle on the penultimate month data points. Penultimate is just a fancy SAT word way of saying “next to last”. When the month of the data point on the line chart (Order Date) equals the next to last order date month in the dataset, then return the Order Date.

// IF THE MONTH NUMBER OF THE DATE ON THE LINE CHART EQUALS THE MONTH NUMBER OF THE MAXIMUM DATE MINUS 1 MONTH
// THEN RETURN THE DATE (DATEPART IGNORES THE YEAR, SO THE SAME MONTH IN EVERY YEAR MATCHES)
IF DATEPART('month',[Order Date]) = DATEPART('month',DATEADD('month',-1,{MAX([Order Date])}))
THEN [Order Date] END

Calculated Field #2 (Name: SPRK_CircleMonths)

This logic will be applied to the circles generated by the previous calculation SPRK_CircleMonths. Only the next to last month will meet the TRUE condition (which will be colored as red).

// IS THE MONTH OF THE CHART DATE EQUAL TO THE MOST RECENT DATE MONTH MINUS 1 MONTH
// DATETRUNC KEEPS THE YEAR, SO E.G., NOV 2018 = NOV 2020 RESOLVES TO FALSE; ONLY THE NEXT TO LAST MONTH IS TRUE
DATETRUNC('month',[Order Date]) = DATEADD('month',-1,DATETRUNC('month',{MAX([Order Date])}))

Calculated Field #3 (Name: SPRK_RefLine Profit)

This logic will return the profit associated with the next to last month in the dataset to display on the reference line.

// RETURNS A VALUE USED FOR THE REFERENCE LINE
// IF THE MONTH OF THE DATE = THE MONTH OF THE MAXIMUM DATE MINUS 1 MONTH (I.E., THE LATEST COMPLETE MONTH)
IF DATETRUNC('month',[Order Date])
= DATEADD('month',-1,DATETRUNC('month',{MAX([Order Date])}))
THEN [Profit] END
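The difference between the DATEPART comparison in the first calculation and the DATETRUNC comparison in the other two is easy to miss, so here is a rough Python analogue of both. This is not Tableau code: the helper names and the latest order date of December 30, 2020 are assumptions made only for this sketch.

```python
from datetime import date

def month_minus_one(d):
    """First day of the month before d's month."""
    if d.month > 1:
        return date(d.year, d.month - 1, 1)
    return date(d.year - 1, 12, 1)

def datepart_match(order_date, max_date):
    # Analogue of the DATEPART comparison: month number only, year ignored.
    return order_date.month == month_minus_one(max_date).month

def datetrunc_match(order_date, max_date):
    # Analogue of the DATETRUNC comparison: year and month must both match.
    return order_date.replace(day=1) == month_minus_one(max_date)

max_date = date(2020, 12, 30)  # hypothetical latest order date

# A November in an earlier year passes the month-number test only.
print(datepart_match(date(2018, 11, 5), max_date))
print(datetrunc_match(date(2018, 11, 5), max_date))
# The actual next to last month passes both tests.
print(datetrunc_match(date(2020, 11, 20), max_date))
```

Under these assumptions the month-number test matches every November, while the truncated-date test matches only November 2020, which is why only one circle meets the TRUE condition.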

When you put all the functions together in a manner according to the video, you end up with a more refined sparkline in my opinion. Big shoutout to the Data Duo for the inspiration on the dashboard I created and this technique. If you haven’t checked out any of their work make sure to do so.

Thank you!!

Anthony B Smoak

Top 10 Functions in Tableau You Need to Know

Welcome Tableau learner!

Normally my videos are geared towards the intermediate level user in Tableau but in this post I want to share a video I recorded that tackles basic functions in Tableau that you should know. You will NOT be effective analyzing data in Tableau if you do not have a basic understanding of these functions.

I love to use data from basketball-reference.com in my videos. Specifically you can grab the player statistics I am using in the video here.

This video is so good, it received a mention in the Monthly Tableau roundup. See for yourself!

Also, here is a link to all of the Tableau functions from the knowledge base.

Please make sure to share this link with a new Tableau user in your circle and let me know what you think of the videos in the YouTube comment section.

Thank you!!

Anthony B Smoak