August 4, 2025August 4, 2025 by anthonysmoak

How AI Redrafted the 2018 NBA Draft: Findings and Insights

I had been toying around with writing SQL code to redraft one of the most talent rich drafts in NBA history (my apologies to the 1984 and 2003 NBA drafts) and sharing the results on my YouTube channel. Fortunately, I had a serendipitous coding session with my good buddy Terence down in Austin who introduced me to a cutting-edge AI tool (i.e., Windsurf AI). This initial session fundamentally changed how I think about analytics and AI-assisted coding.

If you’re an NBA, data and SQL enthusiast, then the 2018 NBA draft represents a fascinating case study for retrospective analysis.

We’re talking about a class that produced MVP-caliber talents like Luka Dončić (originally 3rd), Trae Young (5th), and Shai Gilgeous-Alexander (11th). Not to mention Ironman style players like Mikal Bridges (who has never missed an NBA game in his career thus far) and the unexpected emergence of undrafted gems like Duncan Robinson.

Seven years of career data later, we have enough statistical evidence to ask the crucial question: If NBA front offices knew then what we know now, how would this draft unfold?

What I loved about having Windsurf AI as my co-pilot (no relation to Microsoft) is that it handled the heavy lifting while I focused on the strategic analysis. The term “vibe-coding” has entered the lexicon recently, and it is used as a means to let AI instantly translate one’s analytical vision into executable T-SQL code.

When using any generative AI tool, the key is feeding detailed, comprehensive prompts to the models (the more context you give these models, the better your output becomes).

The components of my prompt included the following (you can watch the full video if you want to hear me recite the entire prompt).

My comprehensive prompt included:

-- 1. Role definition (NBA GM + Statistical Expert + T-SQL Specialist)
-- 2. Clear objectives (redraft based on career performance)
-- 3. Data specifications (column names, data types, constraints)
-- 4. Business rules (handle undrafted players as position 0)
-- 5. Output requirements (normalized scores, impact analysis)

I settled on a redraft approach based upon a Career Impact Scoring methodology:

Impact Score Formula:

Points: 1.0x multiplier
Rebounds: 1.2x (slightly more valuable)
Assists: 1.5x (playmaking premium)
Steals/Blocks: 3.0x (defensive impact recognition)
Field Goal %: 100x multiplier
Free Throw %: 50x multiplier
Three-Point %: 100x multiplier
Games Played Bonus: Durability factor
Turnover Penalty: Subtracted from total score (my apologies Trae Young)

Re-Draft Results:

Without spoiling the complete breakdown, I’ll share just enough to demonstrate the methodology’s power:

The most stunning result? Jalen Brunson emerged as the 5th overall pick in our AI-driven redraft, a player who originally went 33rd overall to Dallas.

If Brunson jumped from 33rd to 5th, imagine what happened to the other lottery picks. Some players who went in the top 10 dropped dramatically, while others who were considered “reaches” at the time proved to be incredible value picks.

Player Categorizations:

Although the AI initially suggested several player categories based upon their redrafted positions, I prompted the model to incorporate the specific logic and code I desired into the overall script (because I know how to write SQL).

Category Breakdowns:

Massive Undrafted Steals: Players who weren’t drafted but earned first-round redraft positions
Undrafted Steals: Players who weren’t drafted but earned a first or second-round redraft position
Huge Steals: Late picks who jumped into a top 10 pick
Steals: Later picks who jumped into the lottery
Busts: Top 5 picks who fell to at least the 21^st spot
Major Busts: Top 10 picks who fell out of the first round entirely
Expected Value: Players who performed roughly as anticipate

The Broader Implications for Data Professionals

For me, Windsurf served as a force multiplier and not a replacement for my SQL knowledge. It readily served as an amplifier of skills that I already possess. I still had to understand basic database concepts and a little query optimization in order to craft effective prompts and validate the model’s outputs. The difference between mediocre and exceptional AI-generated code lies in prompt quality, so put in the upfront time and effort to craft an initial prompt.

The Future of Analytics: Collaboration, Not Competition

In my opinion, this project undertaking represented something larger than a fun NBA analysis, it’s a glimpse into the future of data analysis work. We’re entering an era where analysts spend less time wrestling with syntax and more time on strategic thinking, hypothesis formation, domain knowledge, storytelling and insight communication.

We’re in this sweet spot right now where these types of agentic AI tools are assisting data analyses as opposed to fully completing data analyses. The key (right now, before the singularity) is embracing this transition rather than resisting it so you can remain ahead on the knowledge curve. Start experimenting with AI-assisted coding tools now, while the learning curve is manageable and the competitive advantage is still emerging.

Your Next Steps:

If you’re intrigued by the possibilities of AI-enhanced analytics, I encourage you to:

Download Windsurf AI and experiment with your own datasets
Practice prompt engineering with progressively complex analytical requests
Combine your domain expertise with AI capabilities for unique insights
Share your experiments to bolster your reputation and build your data portfolio

The full video walkthrough demonstrates every step of this process, from initial prompt crafting through final GitHub deployment.

In the full video you can discover which players claimed the top 4 spots in our redraft.

As always, get out there and do some great things with your data!

Join this channel to get access to perks: ► https://www.youtube.com/channel/UCL2ls5uXExB4p6_aZF2rUyg/join

Get the code and the data file used in the video here: ►
https://github.com/ASmoak/NBA_2018_ReDraft

April 19, 2025April 20, 2025 by anthonysmoak

Tableau Dashboard Layouts: Master the Art of Containers

The Wait is Over. Watch this Tableau Dashboard Container Layout Video!

This video was a long time coming. If you’ve ever dealt with layout containers in Tableau (which are about as intuitive as a Thomas Pynchon novel) you know they can leave you scratching your head if you’re new to the tool. And yes, I do own a copy of Gravity’s Rainbow.

Luckily for you I was up I was up at 2:30 a.m. on a Sunday recording the second part of my popular Tableau dashboard tutorial, because that’s how passionate (and stubborn) I am about providing you educational videos of value. Quality over Quantity!

Watch the preview video above, you’ll observe that I’m teaching you a step-by-step breakdown of how I laid out the visual components for an advanced Tableau dashboard using layout containers. This is part two of the series, and while you can absolutely follow along here, the full experience (with all the dashboard elements, charts, and finishing touches) is available to members of my YouTube channel.

I demonstrate a floating container technique that I learned from Curtis Harris’s excellent video on the subject (look up “Things I Know About Tableau Layout Containers” on YouTube).

Before I forget, I want to give a shout out to Dmitry Shirikov, whose original design I reverse engineered and rebuilt (with his permission), and Murilo Cremon for original overall inspiration.

Let’s get into what I actually did.

What You’ll Miss If You Don’t Watch the Full Video

There’s much that you won’t fully grasp until you see it in action in the video. Things like:

How I embedded a YouTube link in an image element
The way I sequence container stacking to prevent the annoying TILE factor
How to work around Tableau’s quirks when resizing and aligning objects
What I do when Tableau auto-adds objects I don’t want
And most importantly, how all the complex pieces come together to form a professionally polished dashboard

All of this is explained in detail in the full tutorial video, available to my channel members.

Your Next Steps

If you’ve been struggling with dashboard layouts, or you’ve watched my first Tableau dashboard build and you’ve been waiting forever for part 2, here’s what I recommend:

Rewatch Part 1 of this tutorial series (it’s free on my channel).
Follow-along and implement the container techniques I’ve outlined here.
Consider becoming a member to access the full Part 2 tutorial which clocks in at 45 minutes!
Download and admire the actual Dashboard from Tableau Public.

Join the Channel, Build Better Dashboards

Becoming a member supports the channel and helps me keep making high-quality Tableau content. Tutorials like this take hours of research, recording, editing, and production. I don’t mind putting in the work because I know how much value it provides (thank you for all your wonderful comments, I read them all) but your support goes a long way.

Thanks for rocking with me all these years; now get out there and do some great things with your data!

-Anthony Smoak

I appreciate everyone who has supported this blog and my YouTube channel.

Stay in contact with me through my various social media presences.

Click here to Join my YouTube channel to get access to value added bonus videos and behind the scenes posts.

December 9, 2024April 19, 2025 by anthonysmoak

Developing Your Personal Brand with LinkedIn

Career Advice, Video
Career, Career Advice, job-search, social-media, Video
1 Comment

So I recently had the opportunity to speak about building your online presence during a webinar. I firmly believe that an online presence can open up doors for you, especially if you are looking to break into the data world. I consider myself lucky in that I started on my data journey many years ago before it was cool or impossibly difficult to get a starting opportunity.

If I was starting from scratch today, I would definitely use LinkedIN to raise my visibility and showcase my strengths to potential employers.

Step 1: You Need a LinkedIn Profile: Full Stop

At its core, LinkedIN is the digital equivalent of a modern résumé, but with a far broader reach. If you don’t have a profile, you’re invisible in the professional world. An optimized LinkedIN profile is a must, especially in a competitive field like data analytics. When recruiters or potential employers search for candidates with your skill set, your profile will be a key factor in whether or not they reach out to you.

Actionable Tip: Make sure your profile is complete, with a compelling headline, clear summary, and a detailed work history that includes relevant keywords (think “SQL queries,” “data modeling,” “performance tuning”) to increase discoverability. Don’t underestimate the power of a well-written “About” section; this is your opportunity to tell your professional story in a way that resonates with both humans and algorithms.

For example, if you look at my LinkedIn profile, you’ll see that my about section is full of verbiage describing my skills, education, online presence and current workplace.

Real-World Insight: I had a recruiter contact me for a previous job opportunity simply because my profile contained the right keywords and was well-organized. I can’t stress this point enough, the right visibility can lead to unexpected opportunities.

Step 2: Don’t Leave Your Profile Alone: Engage with Content

Once you’ve set up your profile, you need to stay active. It’s tempting to think that just having a polished profile is enough, but to really stand out, you must engage. Start by sharing informative articles, insights, and news relevant to your industry. Sharing curated content is valuable, but creating your own posts and/or commentary will increase your visibility.

Actionable Tip: Instead of just hitting “reshare” on a post you found useful, add your thoughts. Comment on what you found interesting about the article and how it connects to your work. Reshares without lead in commentary don’t get as much traction on LinkedIN.

Relatable Anecdote: The idea of putting yourself out there can feel daunting; especially if you’re more introverted. But think of it as sharing your knowledge and expertise with others, in a format that is helpful and informative. It doesn’t need to be an elaborate blog post; even a quick tip or a link with a few insightful lines can go a long way.

Step 3: Create Your Own Content: Yes, You Can Do It

I believe this is where you can begin to truly differentiate yourself. If you follow my LinkedIN profile, you’ll see multiple videos of me sharing knowledge on different data analytics tools. I guarantee that you will likely have knowledge that others in your field can benefit from. Remember that your experiences and expertise are unique and valuable.

We seem to take for granted that our personal knowledge is obvious or widely known by others, but believe me it is not. Even if someone is sharing about a similar topic, they can’t share information with your unique point of view!

Actionable Tip: Start small. Post a brief tip, or share an interesting challenge you faced during a recent project and how you solved it. This doesn’t have to be a huge production; just a couple of lines can spark engagement and show that you’re actively contributing to the field.

Overcoming the Introvert’s Challenge

For those of you who consider yourselves introverts, don’t worry, creating content doesn’t require you to be an extroverted social butterfly. In fact, many professionals, myself included, are introverted (yes it’s true). The key is to focus on sharing knowledge, not on putting on a performance. A great number of my YouTube videos don’t even show my face. I just have good screen capture software and a quality XLR microphone.

The Bottom Line: Don’t just have a LinkedIN profile; use it as a tool for professional growth. Start by optimizing your profile, sharing relevant content, and eventually creating your own posts to showcase your expertise.

Remember that visibility brings opportunity. Start small, stay consistent, and watch as your professional presence grows, benefiting both you and your employer in the process.

Until next time.

I appreciate everyone who has supported this blog and my YouTube channel.

Stay in contact with me through my various social media presences.

Join my YouTube channel to support the channel:
https://www.youtube.com/channel/UCL2ls5uXExB4p6_aZF2rUyg/join

December 9, 2024 by anthonysmoak

Boost SQL Performance: 4 Essential Optimization Tips

Performance matters!

I decided to break down some quick tips to help you write faster, more efficient SQL queries.

1. Limit Your Use of SELECT *

While SELECT * seems like a quick and easy solution, it’s a performance killer, especially when you’re dealing with large tables. By requesting all columns, the database has to load unnecessary data, which can slow down execution, especially as your dataset grows.

Tip: Be intentional about the columns you select. Only retrieve the data you need. This not only improves performance but also minimizes the load on your system.

2. Avoid DISTINCT and UNION When Unnecessary

Checking for uniqueness often comes at a cost. Both DISTINCT and UNION use sorting algorithms to remove duplicates, which is inherently time-consuming. If you don’t explicitly need unique records, avoid them.

Instead of UNION, use UNION ALL. While UNION will filter out duplicates, UNION ALL keeps all your records. It can perform your combining of records faster since it bypasses the sorting step.

3. Use Sargable Operators in the WHERE Clause

A sargable query (short for “search argument able”) is one that allows the SQL optimizer to take advantage of indexes, leading to faster data retrieval. When you use operators that are sargable (e.g., =, >, <=, BETWEEN), the query can utilize an index to find data quickly instead of performing a full table scan, which can be slow and resource-intensive.

For example, a query like WHERE customer_id = 12345 is sargable, whereas WHERE YEAR(order_date) = 2023 is not, because the database can’t use an index efficiently for that condition.

4. Use Indexes Appropriately

Indexes are extremely useful when it comes to speeding up query execution. Without them, the database is forced to scan the entire table to retrieve the data you need.

However, it’s essential to use indexes judiciously. Too many indexes can slow down insert and update operations.

Real-World Example: Sales Data Query Optimization

Imagine you’re a data analyst working with sales data, trying to identify the top-selling products for the last quarter. A poorly optimized query might look like this:

SELECT * FROM sales_data WHERE YEAR(sale_date) = 2023;

This query retrieves all columns (SELECT *), performs a non-sargable operation on sale_date, and might even require a DISTINCT if there are duplicates. All this can be painfully slow.

Instead, a more optimized query would look like this:

SELECT product_id, SUM(sales_amount)
FROM sales_data
WHERE sale_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY product_id
ORDER BY SUM(sales_amount) DESC;

In this version:

We select only the necessary columns (product_id, sales_amount).
We use a sargable BETWEEN operator on sale_date.
We aggregate data and order it efficiently to find the top products.

The Takeaway: Stay Sharp, Stay Efficient

Writing efficient SQL queries isn’t just about speed; it’s about making your data retrieval processes more scalable and reliable.

Be intentional about your query design. Use SELECT * sparingly, avoid unnecessary operations like DISTINCT and UNION, and always keep indexing and sargable operations in mind

I appreciate everyone who has supported this blog and my YouTube channel via merch. Please check out the logo shop here.

Stay in contact with me through my various social media presences.

Join my YouTube channel to get access to perks:
https://www.youtube.com/channel/UCL2ls5uXExB4p6_aZF2rUyg/join

July 23, 2024December 12, 2024 by anthonysmoak

Increase SQL Query Performance with Indexes

I have always recommended that if you develop SQL based analytics, you should have some understanding of indexes when it comes to speeding up queries.

Sometimes we do not have access to create indexes on the physical tables (especially in a production environment), but approaching your DBA with some basic knowledge of indexing will put you ahead of the game.

if your queries are running longer than they need to be, it’s time to speed them up with some strategic indexing. Very briefly, here’s how you can fix your slow running query issues via indexing your tables.

Step 1: Implement a Covering Index Strategy

A covering index includes all the columns referenced in your SELECT or WHERE clauses. This means the database engine can fulfill your query using just the index, without having to access the base table data. This act can drastically reduce query execution time and improve your data visualization processes if needed. The following is an example of TSQL code for MS SQL Server demonstrating a covering index using the INCLUDE keyword:

— Creates a nonclustered index on the Person.Address table with four included (non-key) columns.
— index key column is PostalCode and the non-key columns are
— AddressLine1, AddressLine2, City, and StateProvinceID.

CREATE NONCLUSTERED INDEX IX_Address_PostalCode
ON Person.Address (PostalCode)
INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID);

Step 2: Utilize Clustered Indexes

Clustered indexes should typically be applied to your primary key column. A clustered index sorts and stores the data rows in the table based upon the key values. This organization is particularly efficient for queries that retrieve a range of values.

Step 3: Apply Non-Clustered Indexes

Non-clustered indexes are typically applied to foreign keys. These indexes create a separate structure that points back to the base data rows in the table. They are ideal for speeding up queries that involve joins, which are common in business intelligence operations.

Find the Sweet Spot

With indexing, balance is key. Under-indexing can cause your queries to run longer, while over-indexing can negatively impact data insertion and updates. Striking the right balance ensures your queries remain efficient without compromising the performance of other operations.

Indexing Rule #4080: If it’s linked by it, filtered by it, or sorted by it….index by it.

(I saw that rule on a reddit forum once, and thought it made perfect sense. The 4080 reference is for my ATCQ fans.)

If you want more detail on indexing, you can read this blog post here. I wrote it a few years ago, but it is old gold, still valuable!

I appreciate everyone who has supported this blog and my YouTube channel via merch. Please check out the logo shop here.

Stay in contact with me through my various social media presences.

Join my YouTube channel to get access to perks:
https://www.youtube.com/channel/UCL2ls5uXExB4p6_aZF2rUyg/join

July 15, 2024December 1, 2024 by anthonysmoak

Know These SQL Concepts for your Data Interview

Over the years I have conducted a number of interviews with candidates who claimed to know SQL, but could not clearly verbalize their understanding of some basic concepts. I want you to understand these following concepts at a minimum, so your next SQL interview isn’t cut short.

First, distinguish between a left outer join and an inner join. A left outer join retrieves all records from the left table and returns only the matched records from the right table. On the other hand, an inner join retrieves only the matched records from both tables.

Next, familiarize yourself with the WHERE clause versus the HAVING clause. Use the WHERE clause to filter records before any groupings are made, while the HAVING clause filters records after grouping. This understanding will enable you to filter your data correctly.

Also, understand the difference between the UNION and UNION ALL statements. UNION removes duplicate records, whereas UNION ALL includes all records, duplicates and all.

Be ready to discuss Common Table Expressions (CTEs) versus temporary tables for ease of query writing and to forgo complicated sub-queries. CTEs are best used for readability, while temp tables are physically stored in memory (allowing for the use of indexes if necessary to optimize performance).

Understanding clustered indexes versus non-clustered indexes is vital for query performance. Clustered indexes sort and store data rows in the table based upon key values, while non-clustered indexes create a separate structure with efficient pointers back to the base table.

Lastly, ensure your queries are sargable (Search ARGument ABLE). Sargable queries use indexes efficiently, improving performance, unlike non-sargable queries, which slow down data retrieval.

I appreciate everyone who has supported this blog and my YouTube channel via merch. Please check out the logo shop here.

Stay in contact with me through my various social media presences.

Join my YouTube channel to get access to perks:
https://www.youtube.com/channel/UCL2ls5uXExB4p6_aZF2rUyg/join

June 9, 2024April 19, 2025 by anthonysmoak

Create a Stunning Advanced Dashboard in Tableau

Since Tableau Public was recently upgraded to enable local saving (It’s about time), I decided to look and see what new dashboards were out there that I could learn from, and subsequently teach others to rebuild. My mantra is that you learn the most, by teaching others.

I did find the perfect dashboard built by Dmitry Shirikov on his Tableau Public page! There are also design elements from a dashboard created by Murilo Cremon so I tip my cap to him as well.

I reached out to Dmitry via Linkedin and asked for his permission to use this dashboard as a teaching tool for video lessons and he graciously gave me permission. This further demonstrates the collaborative spirit that thrives on Tableau Public. I offer a big thank you to Dmitry!

What I like about his dashboard is the use of the standard Tableau standard superstore dataset combined with interactive/dynamic year over years metrics and thoughtful design.

It contains deceptively simple KPIs, intermediate level metric swapping and a button selector process that relies on parameter actions. Parameter actions update the value of a parameter based on a user interacting with the marks in the view (in this case, the Sales, Profit or Orders dimensions).

I spent a great deal of after-work and weekend time researching how to put this dashboard back together and rebuilt it piece by piece so I could understand what was done.

It also took some time to record and edit this video as it is coming in at over an hour in length! I can’t believe I put this much time and effort into a free product!

My hope is that by dissecting the dashboard and teaching you how to build the chart elements, you can gain some valuable dashboard building insights to add to your repertoire.

If you think a part 2 video regarding the actual dashboard layout will be valuable, please leave a comment on my YouTube channel!

Your Next Steps

Here’s what I recommend:

Download and admire the actual Dashboard from Tableau Public.
Watch the Part 2 preview of this tutorial series to implement the container techniques I’ve outlined.
Consider becoming a member to access the full Part 2 tutorial which clocks in at 45 minutes!

You can also watch a preview of Part 1 here:

I appreciate everyone who has supported this blog and my YouTube channel via merch. Please check out the logo shop here.

Stay in contact with me through my various social media presences.

Join my YouTube channel to get access to perks:
https://www.youtube.com/channel/UCL2ls5uXExB4p6_aZF2rUyg/join

Thank you!!

Anthony B Smoak

February 4, 2024March 9, 2024 by anthonysmoak

How to Make a Histogram in Excel

Let’s learn how to build a histogram in Excel with some interesting NBA data. Personally, I use the histogram in my data analyses to help me understand how my data is distributed and to identify any outliers or extreme values that warrant further investigation. In this blog post, I’ll use Kobe Bryant’s playoff career scoring game log as our data source to create a histogram in Excel. Sound fun? Of course, it does. Let’s go!

Inserting a Histogram Chart in Excel

I advocate that you watch the video above for more detail, but you’ll find this blog post equally informative. To create a histogram in Excel, we’ll need a column of numerical data to analyze. In this case, I have Kobe Bryant’s playoff career scoring game log, which shows how many points he scored in each of his playoff games during his career. I referenced this data from basketballreference.com.

The first step is to highlight the data column and press [Ctrl + Shift + Down] to select the entire range. Then, go to the Insert tab and click on the Histogram icons as shown below. This will insert a histogram chart based on your data.

Formatting the Histogram Chart in Excel

The default histogram chart may not look very attractive or informative. Thus, we will use my personal favorite technique to kick start the formatting (the “easy” way) which is the selection of “Layout 2” as a chart style.

Now this option may be easy (if you’re using Excel 2016 or greater) but you must know which additional options to change in order to make the histogram look more presentable.

The Layout 2 style does an excellent job of removing the horizontal grid lines and vertical axis (so we can keep the Edward Tufte style “chart junk” to a minimum). It also adds data labels to our bar charts as well for interpretation clarity.

Formatting the Histogram Chart in Excel

We can further improve our histogram’s appearance by applying some formatting options. For example, we can:

Delete the grid lines and the vertical axis that we don’t need (already performed by Layout 2)
Increase the font size of the data labels
Change the fill color of the bars
Add a chart title

To access the formatting options, right-click on any element of the chart and select “Format”. Alternatively, we can hit [Ctrl + 1] to open the Format pane.

Adjusting the Bin Size and the Overflow/Underflow Options in Excel

One of the most important aspects of any histogram is the bin size, which determines how the data is grouped into intervals. The bin size affects the shape and the interpretation of our histogram. We can adjust the bin size by selecting the horizontal axis and changing the “Bin width” option in the Format pane.

The default bin size for this data set is 5.7, which means that the data is grouped into intervals of 5.7 points. For example, the first bin includes the values from 0 to 5.7, the second bin includes the values from 5.7 to 11.4, and so on.

However, this bin size may not be very intuitive or meaningful. A better option for our histogram is to use a bin size of 5, which means that the data is grouped into intervals of 5 points. For example, the first bin includes the values from 0 to 5, the second bin includes the values from 6 to 10, and so on. This makes the histogram easier to read and understand.

Another option that we can adjust in the histogram chart is the overflow and the underflow bins. These are special bins that capture the values that are above or below a certain threshold.

For example, we may want to create an overflow bin that includes all the games where Kobe scored more than 40 points, and an underflow bin that includes all the games where he scored less than 10 points. To do this, we can select the horizontal axis and change the Overflow bin and the Underflow bin options in the Format pane.

After applying the overflow and the underflow options, the histogram chart looks like this:

Histogram Axis Notation

You’ll notice the histogram’s horizontal axis includes both brackets “[” and parentheses “(“. I will quote the Microsoft blog to explain this notation.

“In our design, we follow best practices for labeling the Histogram axis and adopt notation that is commonly used in math and statistics. For example, a parenthesis, ‘(‘ or ‘)’, connotes the value is excluded whereas a bracket, ‘[‘ or ‘]’, means the value is included. “

In our histogram for example, the notation (20, 25] indicates that the respective bar includes any value greater than 20 but less than or equal to 25.

Interpreting the Histogram Chart and Finding Outliers in Excel

Our new histogram isn’t just pretty, it’s equally informative allowing us to answer questions that we couldn’t easily determine from a wall of numbers in spreadsheet form. The histogram easily helps us understand Kobe’s playoff scoring distribution. We also gain an understanding of his outlier games. For example, we can now:

..locate the mode, which is the most frequent value or interval. In this case, the mode is the interval from 20 to 25, which means that Kobe Bryant scored between 20 and 25 points in most of his playoff games.
..find the range, which is the difference between the maximum and the minimum values. In this case, the range is 50, which means that Kobe Bryant’s playoff scoring varied from 0 to 50 points.
..find the skewness, which is the asymmetry of the distribution. In this case, the distribution is right-skewed, which means that the longer tail of values is on the right side of the distribution. This indicates that Kobe Bryant’s playoff scoring was more concentrated in the lower values. This makes perfect sense as it is much harder to score more points as opposed to lesser points.
..find the outliers, which are the values that are far away from the rest of the data. In this case, the outliers are the point score values that are greater than 40.

I hope you enjoyed this blog post and learned how to create a histogram in Excel to analyze data distribution and outliers!!!

Let me leave you with this highlight package of Kobe’s best dunks:

I appreciate everyone who has supported this blog and my YouTube channel via merch. Please check out the logo shop here.

If you want to learn all the latest tips and tricks in core data analysis tools, stay in contact with me through my various social media presences.

And don’t forget to subscribe to my YouTube channel for more data analyst tips and tricks.

Thank you!!

Anthony B Smoak

Credit Where Credit is Due: Kobe wallpaper created by James Chen <– (great job James!!)

December 3, 2023 by anthonysmoak

SQL Window Functions – Lead and Lag

The LEAD and LAG window functions in SQL offer the simple yet powerful ability to analyze data beyond the current row. They allow you to access values from rows before and after the current one, without using complex self-joins or subqueries. You can use these two functions to compare values across rows, calculate differences, and perform other operations that require looking ahead or behind in your dataset.

Back when I used Microsoft Access as my main data tool and needed to access information from a previous row to do calculations, I would use VBA coding to implement solutions. At the time, it did not occur to me that I could achieve similar functionality using SQL Server’s LEAD and LAG window functions.

In this highly informative video (if I do say so myself, I’m biased but I’m correct), I show you a use case for the LAG function to calculate the average time for an invoice to move through approver workflow steps. At a high level, this entails subtracting the approval date from a prior row from the current row’s approval date.

LEAD & LAG Window Function Syntax

The LAG window function allows you to access data from a prior row to “look back” dynamically as you query data. This aids calculating deltas, running totals, and more. The syntax of LAG function is as follows:

LAG(column_name, offset, default) OVER (PARTITION BY column_name ORDER BY column_name)

The LEAD window function allows you to access data from an upcoming row to “look ahead” dynamically as you query data. The syntax of LEAD function is as follows:

LAG(column_name, offset, default) OVER (PARTITION BY column_name ORDER BY column_name)

The PARTITION BY clause is key; it that ensures we only look back (or ahead) within a given value in the column_name (for example as shown in the video, an invoice number) otherwise row transitions would provide invalid lookups.

Why Use Lead and Lag functions in SQL?

Using LEAD and LAG functions in SQL can help you analyze your data faster and easier, because you can:

Perform calculations and comparisons across rows without using self-joins or subqueries, which can be complex and slow.
Avoid exporting your data to Excel or other tools, which can be time-consuming and error-prone.
Simplify your code and improve its readability and maintainability.

As I mentioned earlier, if I had known about these two functions years ago when I was trying to access prior values via VBA and loops, I could have saved myself a lot of time, increased my efficiency and made my approach much easier to read in the resulting documentation.

Conclusion

I believe that learning from examples is critical for mastering both logical concepts and SQL syntax. By watching the linked video, you will gain the knowledge to tackle similar problems in your own data analysis scenarios.

Additionally, LEAD and LAG provide simpler, set-based solutions for cross-row data comparisons that would otherwise require procedural coding.

LEAD and LAG functions allow you to merge the past and the future!

I appreciate everyone who has supported this blog and my YouTube channel via merch. Please check out the logo shop here.

If you want to learn all the latest tips and tricks in core data analysis tools, stay in contact with me through my various social media presences.

Thank you!!

Anthony B Smoak

November 27, 2023July 22, 2024 by anthonysmoak

The Power BI Decomposition Tree Guide for Data Analysis

I was recently teaching someone how to use the decomposition tree in Power BI and it clicked that this topic would make for a great video lesson. What I love about the decomposition tree is that it enables data analysts to conduct root cause analyses, identify patterns and discover insights that are not readily apparent. For example, if we want to understand the contributing factors to our small business profits based upon the data at hand, this visual fits the need to a tee.

Specifically, the decomposition tree lets you visualize data across multiple dimensions and enables drilling down into your dimensions in any order. As a bonus, it’s also an artificial intelligence (i.e., A.I.) visualization, so you can ask it to find the next dimension to drill down into based on certain criteria.

The picture below illustrates how our Smoaking Coffee Company Profits can be subdivided by dimensions across the top of the visual. Massachusetts apparently likes their coffee (Smoaking Coffee is hypothetically better than Dunkin’).

Absolute vs Relative AI Splits

Another benefit of the decomposition tree is the ability to choose between two types of AI splits: absolute and relative. AI splits are the automatic breakdowns that Power BI suggests based on your data.

Absolute AI splits show you the highest or lowest contributors to the measure you are analyzing. In my example, if we are looking at profit by market size, the absolute AI split for high value will show us the market size that has the highest profit, in this case the Central region with $59,337 in profit.

Relative AI splits shows us the most interesting or unusual contributors to the measure we are analyzing. For example, if we switch the Analysis type to Relative from Absolute and perform the same analysis, the relative AI split will show us the product category that has the highest profit compared to its expected value based on the other categories. In this instance it is the Colombian coffee with $44,131 in profit. This number is lower than the absolute value of $59,337, but relative to it’s other product peers, it stands out.

You can switch between absolute and relative AI splits by going to the format visual pane and selecting the analysis option. You can also choose whether you want to see the high or low values by clicking on the arrow next to the plus sign on the actual decomposition tree values.

Drill Through to Details

The decomposition tree is a great way to get a high-level overview of your data, but sometimes you may want to see the details behind the numbers. For example, if you are looking at profit by region, you may want to see the individual transactions that make up the profit for a specific region.

Power BI allows you to drill through to another page that shows the details of your data. To do this, you need to have a detail page that has the same measure as the one you are using in the decomposition tree. For example, if you are using profit as your measure, you need to have a detail page that has profit as well.

To drill through, you need to right-click on a node in the decomposition tree and select drill through to your detail page.

This will take you to the detail page and apply the filters based on the path you followed in the decomposition tree. For example, if you drilled through to the product value of Colombian, you will see the details of the profit transactions for products noted as Colombian.

This is a very useful feature that allows you to see the underlying data behind the summary. You can also use the back button to go back to the decomposition tree and explore other paths.

Use Bookmarks to Save and Share Your Analysis

Another cool feature of the decomposition tree is that it fully supports bookmarks. Bookmarks are a way to save and share your analysis with others. You can use bookmarks to capture the state of your report, including the filters, slicers, and visuals.

To create a bookmark, you need to go to the view tab and select bookmarks pane. Then, you need to click on the add button to create a new bookmark. You can give it a name and a description to make it easy to identify.

You can also link your bookmarks to buttons or images on your report. This way, you can create interactive scenarios that allow you to switch between different views of your data. For example, you can create a button that shows you the decomposition tree for the lowest state profit value and associated region.

To link a bookmark to a button or an image, you need to select the button or the image and go to the action option in the format shape pane. Then, you need to turn on the action and select bookmark as the type. You can then choose the bookmark that you want to link to the button or the image.

Using bookmarks, you can create dynamic and engaging reports that showcase your analysis and tell a story with your data.

Some Final Tips

Before I conclude, I want to share some final tips on how to use the decomposition tree in Power BI.

You can rename your dimensions in the decomposition tree by selecting them in the Visualizations pane “Explain By” area. Simply right click on a value and select “Rename for this visual”. This can help you to customize the labels and make them more meaningful.
You can lock the values in the decomposition tree by selecting the area to the left of the dimension name at the top of the decomposition tree visual (select the light bulb if the dimension was placed in the visual by AI). This will prevent the users from changing the nodes or the AI splits. This can be useful if you want to fix the analysis and avoid confusion.
The maximum number of levels and data points that can be displayed in the decomposition tree are 50 levels and 5000 data points. I hope you never get to that point, however if you’re at that point, just start over; your visual is way too cluttered.

Conclusion

Watch the video to understand how you can use the decomposition tree in Power BI to analyze your data and conduct root cause analyses.

Yes I did get carried away with AI in the thumbnail picture for this video. I was always a fan of the John Stewart version of Green Lantern so I had to play around with AI to get a close approximation of me as a Lantern. May the decomposition tree work for you in brightest day and darkest night!!

I appreciate everyone who has supported this blog and my YouTube channel via merch. Please check out the logo shop here.

If you want to learn all the latest tips and tricks in core data analysis tools, stay in contact with me through my various social media presences.

And don’t forget to subscribe to my YouTube channel for more data analyst tips and tricks.

Join my YouTube channel to get access to perks:
https://www.youtube.com/channel/UCL2ls5uXExB4p6_aZF2rUyg/join

Anthony B Smoak

Smoak Signals | Data Analytics Blog

Sharing Knowledge: Data, Visualization & Business

Tag / Video

How AI Redrafted the 2018 NBA Draft: Findings and Insights

Tableau Dashboard Layouts: Master the Art of Containers

Boost SQL Performance: 4 Essential Optimization Tips

Increase SQL Query Performance with Indexes

Know These SQL Concepts for your Data Interview

Create a Stunning Advanced Dashboard in Tableau

How to Make a Histogram in Excel

SQL Window Functions – Lead and Lag

The Power BI Decomposition Tree Guide for Data Analysis

Absolute vs Relative AI Splits

Drill Through to Details

Use Bookmarks to Save and Share Your Analysis