How AI Redrafted the 2018 NBA Draft: Findings and Insights

I had been toying around with writing SQL code to redraft one of the most talent-rich drafts in NBA history (my apologies to the 1984 and 2003 NBA drafts) and sharing the results on my YouTube channel. Fortunately, I had a serendipitous coding session with my good buddy Terence down in Austin, who introduced me to a cutting-edge AI tool (i.e., Windsurf AI). That initial session fundamentally changed how I think about analytics and AI-assisted coding.

If you’re an NBA, data and SQL enthusiast, then the 2018 NBA draft represents a fascinating case study for retrospective analysis.

We’re talking about a class that produced MVP-caliber talents like Luka Dončić (originally 3rd), Trae Young (5th), and Shai Gilgeous-Alexander (11th). Not to mention iron-man-style players like Mikal Bridges (who has never missed an NBA game in his career thus far) and the unexpected emergence of undrafted gems like Duncan Robinson.

Seven years of career data later, we have enough statistical evidence to ask the crucial question: If NBA front offices knew then what we know now, how would this draft unfold?

What I loved about having Windsurf AI as my co-pilot (no relation to Microsoft) is that it handled the heavy lifting while I focused on the strategic analysis. The term “vibe-coding” has entered the lexicon recently; it describes letting AI instantly translate your analytical vision into executable code, in my case T-SQL.

When using any generative AI tool, the key is feeding detailed, comprehensive prompts to the models (the more context you give these models, the better your output becomes).

My comprehensive prompt included the following components (you can watch the full video if you want to hear me recite the entire prompt):

-- 1. Role definition (NBA GM + Statistical Expert + T-SQL Specialist)
-- 2. Clear objectives (redraft based on career performance)
-- 3. Data specifications (column names, data types, constraints)
-- 4. Business rules (handle undrafted players as position 0)
-- 5. Output requirements (normalized scores, impact analysis)

I settled on a redraft approach based upon a Career Impact Scoring methodology (a rough T-SQL sketch of the scoring follows the formula below):

Impact Score Formula:

  • Points: 1.0x multiplier
  • Rebounds: 1.2x (slightly more valuable)
  • Assists: 1.5x (playmaking premium)
  • Steals/Blocks: 3.0x (defensive impact recognition)
  • Field Goal %: 100x multiplier
  • Free Throw %: 50x multiplier
  • Three-Point %: 100x multiplier
  • Games Played Bonus: Durability factor
  • Turnover Penalty: Subtracted from total score (my apologies, Trae Young)
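For readers who want to see how this might translate into code, here is a minimal T-SQL sketch of the scoring expression. The table and column names (dbo.player_career_stats, total_points, fg_pct, and so on) and the durability and turnover weights are illustrative assumptions, not the exact schema or weights from the video.

-- Hedged sketch of the Career Impact Score; table, columns, and the last two weights are assumed.
SELECT
    player_name,
    (total_points * 1.0)
      + (total_rebounds * 1.2)
      + (total_assists * 1.5)
      + ((total_steals + total_blocks) * 3.0)
      + (fg_pct * 100)
      + (ft_pct * 50)
      + (fg3_pct * 100)
      + (games_played * 0.5)      -- durability bonus (weight assumed for illustration)
      - (total_turnovers * 1.0)   -- turnover penalty (weight assumed for illustration)
      AS impact_score
FROM dbo.player_career_stats
ORDER BY impact_score DESC;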

Re-Draft Results:

Without spoiling the complete breakdown, I’ll share just enough to demonstrate the methodology’s power:

The most stunning result? Jalen Brunson emerged as the 5th overall pick in our AI-driven redraft, a player who originally went 33rd overall to Dallas.

If Brunson jumped from 33rd to 5th, imagine what happened to the other lottery picks. Some players who went in the top 10 dropped dramatically, while others who were considered “reaches” at the time proved to be incredible value picks.

Player Categorizations:

Although the AI initially suggested several player categories based upon their redrafted positions, I prompted the model to incorporate the specific logic and code I desired into the overall script (because I know how to write SQL). A rough sketch of that categorization logic appears after the list below.

Category Breakdowns:

  • Massive Undrafted Steals: Players who weren’t drafted but earned first-round redraft positions
  • Undrafted Steals: Players who weren’t drafted but earned a first or second-round redraft position
  • Huge Steals: Late picks who jumped into the top 10
  • Steals: Later picks who jumped into the lottery
  • Busts: Top 5 picks who fell to at least the 21st spot
  • Major Busts: Top 10 picks who fell out of the first round entirely
  • Expected Value: Players who performed roughly as anticipated
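As a rough illustration of how such categories can be coded with a CASE expression, here is a minimal sketch. The column names (original_pick, redraft_pick) and the exact pick thresholds are assumptions for illustration, not the precise logic from the final script.

-- Hedged sketch of the category logic; an original_pick of 0 represents an undrafted player.
-- The first matching branch wins, so the more extreme categories are checked first.
SELECT
    player_name,
    CASE
        WHEN original_pick = 0  AND redraft_pick <= 30 THEN 'Massive Undrafted Steal'
        WHEN original_pick = 0  AND redraft_pick <= 60 THEN 'Undrafted Steal'
        WHEN original_pick > 20 AND redraft_pick <= 10 THEN 'Huge Steal'
        WHEN original_pick > 14 AND redraft_pick <= 14 THEN 'Steal'
        WHEN original_pick <= 10 AND redraft_pick > 30 THEN 'Major Bust'
        WHEN original_pick <= 5  AND redraft_pick >= 21 THEN 'Bust'
        ELSE 'Expected Value'
    END AS draft_category
FROM dbo.redraft_results;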

The Broader Implications for Data Professionals

For me, Windsurf served as a force multiplier, not a replacement for my SQL knowledge; it amplified skills I already possess. I still had to understand basic database concepts and a bit of query optimization in order to craft effective prompts and validate the model’s outputs. The difference between mediocre and exceptional AI-generated code lies in prompt quality, so put in the upfront time and effort to craft a solid initial prompt.

The Future of Analytics: Collaboration, Not Competition

In my opinion, this project represented something larger than a fun NBA analysis; it’s a glimpse into the future of data analysis work. We’re entering an era where analysts spend less time wrestling with syntax and more time on strategic thinking, hypothesis formation, domain knowledge, storytelling, and insight communication.

We’re in this sweet spot right now where these types of agentic AI tools are assisting data analyses as opposed to fully completing data analyses. The key (right now, before the singularity) is embracing this transition rather than resisting it so you can remain ahead on the knowledge curve. Start experimenting with AI-assisted coding tools now, while the learning curve is manageable and the competitive advantage is still emerging.

Your Next Steps:

If you’re intrigued by the possibilities of AI-enhanced analytics, I encourage you to:

  1. Download Windsurf AI and experiment with your own datasets
  2. Practice prompt engineering with progressively complex analytical requests
  3. Combine your domain expertise with AI capabilities for unique insights
  4. Share your experiments to bolster your reputation and build your data portfolio

The full video walkthrough demonstrates every step of this process, from initial prompt crafting through final GitHub deployment.

In the full video you can discover which players claimed the top 4 spots in our redraft.

As always, get out there and do some great things with your data!

Join this channel to get access to perks: ► https://www.youtube.com/channel/UCL2ls5uXExB4p6_aZF2rUyg/join

Get the code and the data file used in the video here: ►
https://github.com/ASmoak/NBA_2018_ReDraft

Boost SQL Performance: 4 Essential Optimization Tips

Performance matters!

I decided to break down some quick tips to help you write faster, more efficient SQL queries.

1. Limit Your Use of SELECT *

While SELECT * seems like a quick and easy solution, it’s a performance killer, especially when you’re dealing with large tables. By requesting all columns, the database has to load unnecessary data, which can slow down execution, especially as your dataset grows.

Tip: Be intentional about the columns you select. Only retrieve the data you need. This not only improves performance but also minimizes the load on your system.

2. Avoid DISTINCT and UNION When Unnecessary

Checking for uniqueness often comes at a cost. Both DISTINCT and UNION use sorting algorithms to remove duplicates, which is inherently time-consuming. If you don’t explicitly need unique records, avoid them.

If duplicates are not a concern, use UNION ALL instead of UNION. While UNION filters out duplicates, UNION ALL keeps every record and combines the result sets faster because it bypasses the sort and de-duplication step.
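For example, if you are simply stacking two result sets and duplicates are either impossible or acceptable (the table names below are assumptions for illustration):

-- UNION would sort and de-duplicate the combined rows; UNION ALL just appends them.
SELECT order_id, order_date FROM sales_2022
UNION ALL
SELECT order_id, order_date FROM sales_2023;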

3. Use Sargable Operators in the WHERE Clause

A sargable query (short for “search argument able”) is one that allows the SQL optimizer to take advantage of indexes, leading to faster data retrieval. When you use operators that are sargable (e.g., =, >, <=, BETWEEN), the query can utilize an index to find data quickly instead of performing a full table scan, which can be slow and resource-intensive.

For example, a query like WHERE customer_id = 12345 is sargable, whereas WHERE YEAR(order_date) = 2023 is not, because the database can’t use an index efficiently for that condition.
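A common fix is to rewrite the function-wrapped predicate as a date range on the bare column, which allows an index seek (table and column names assumed for illustration):

-- Non-sargable: wrapping the column in YEAR() prevents an index seek.
SELECT order_id FROM dbo.orders WHERE YEAR(order_date) = 2023;

-- Sargable rewrite: a range predicate on the bare column can use an index on order_date.
SELECT order_id
FROM dbo.orders
WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01';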

4. Use Indexes Appropriately

Indexes are extremely useful when it comes to speeding up query execution. Without them, the database is forced to scan the entire table to retrieve the data you need.

However, it’s essential to use indexes judiciously. Too many indexes can slow down insert and update operations.

Real-World Example: Sales Data Query Optimization

Imagine you’re a data analyst working with sales data, trying to identify the top-selling products for the past year. A poorly optimized query might look like this:

SELECT * FROM sales_data WHERE YEAR(sale_date) = 2023;

This query retrieves all columns (SELECT *), performs a non-sargable operation on sale_date, and might even require a DISTINCT if there are duplicates. All this can be painfully slow.

Instead, a more optimized query would look like this:

SELECT product_id, SUM(sales_amount)
FROM sales_data
WHERE sale_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY product_id
ORDER BY SUM(sales_amount) DESC;

In this version:

  • We select only the necessary columns (product_id, sales_amount).
  • We use a sargable BETWEEN operator on sale_date.
  • We aggregate data and order it efficiently to find the top products.

The Takeaway: Stay Sharp, Stay Efficient

Writing efficient SQL queries isn’t just about speed; it’s about making your data retrieval processes more scalable and reliable.

Be intentional about your query design. Use SELECT * sparingly, avoid unnecessary operations like DISTINCT and UNION, and always keep indexing and sargable predicates in mind.

I appreciate everyone who has supported this blog and my YouTube channel via merch. Please check out the logo shop here.

Stay in contact with me through my various social media presences.

Join my YouTube channel to get access to perks:
https://www.youtube.com/channel/UCL2ls5uXExB4p6_aZF2rUyg/join

Hot Take: SQL is Difficult

Here is my hot take: it can be very difficult to write SQL.

Conventional data punditry suggests that SQL isn’t difficult to learn. Of course it’s easy to write SQL that returns data; however, it can be very difficult to write SQL that returns the CORRECT data. 

Writing effective SQL requires more than the ability to write basic syntax. You need a sufficient understanding of the data itself, among other factors. And believe me, this is not always an easy endeavor for a single person! 

I’ve been placed into situations where I did not have any understanding of the relationships between tables, or possess sufficient metadata/documentation to determine which columns or fields were relevant to the specific question I was tasked to answer. There were no data subject matter experts on hand to collaborate with because they left the company a few months prior.

I’ve literally been handed a SQL Server backup file containing hundreds of tables with no accompanying entity relationship diagram (ERD) or documentation, yet I was expected to single-handedly replicate an outcome under a tight deadline because I’m the “data guy”.  

I was eventually able to do so, but not without relying on significant experience, lots of coffee and a little bit of luck.

Obviously, SQL practitioners on technical teams have to know more than the business does with respect to the technical aspects of a data request.

But they also need to have the relevant business domain knowledge and/or the ability to translate and document the business logic from a domain expert and turn it into accurate working code. 

Where Are the Requirements? 

Additionally, over the years I’ve seen the practice of providing well-thought-out, formalized data requirements devolve into data request “vibes”.

By data request “vibes” I mean submitting requests for data without providing formalized upfront documentation. These are vague, ambiguous requests handed to the data professional without specific criteria for evaluating success. The data professional has to figure out what is needed and how best to accomplish it, on top of coding the final solution.

In my opinion, communication and documentation skills are crucial. They enable the divination of accurate requirements from stakeholders. This ability is the “killer app” that separates commodity SQL coders from next level SQL coders. Commodity “order-taker” data resources are often outsourced. 

Captain Save a Project 

In addition, most organizations struggle with data quality, due to poor data governance. If the SQL analytics that you generate do not “feel” correct, guess who’s on the hook to figure out why?  

You’re not only the SQL subject matter expert and the business requirements person, you’re also the data detective because a VIP wants a revised report and an explanation by the end of the day. However, you only signed up to be a consumer and messenger of this data source that you do not own. 

Some SQL looks easy on the surface. However, don’t underestimate the effort it takes to understand table structures, data values, table relationships and data granularity. Additionally, domain knowledge is equally essential to write effective SQL.

Also be prepared to undertake requirements documentation, data quality sleuthing, coding, and of course, navigating the pressure of being the “reports person” who should know all things about the data.  

Do You Have These Skills to Complete a SQL Analysis?

In a way, SQL is like chess: it is easy to learn the basic moves, but very difficult to master well enough to play a worthwhile game. To really excel at SQL within an organization, you have to face challenging situations repeatedly. You must also manage a multitude of additional factors beyond pure technical syntax.

You can learn “SELECT * FROM TABLE_X” fairly quickly, but do you have the following complementary skills to deliver a SQL analysis? 

  • Sufficient domain knowledge 
  • Deciphering jargon and documenting requirements from stakeholders outside your area of expertise 
  • Understanding tables, columns and data values 
  • Thinking in sets, not row by agonizing row (i.e., RBAR) 
  • An understanding of data relationships 
  • Indexing and SQL performance tuning (are your queries too slow?) 
  • Ability to work around potential data inconsistencies (e.g., the numbers from these different sources don’t match) 
  • Savviness to navigate organizational data silos (e.g., groups that hoard their data and expertise for political reasons) 
  • Managing up the chain with respect to expectations and tight timelines (we have a life outside of work) 
  • Ability to simplify complex data findings 

And last but not least, actual SQL coding proficiency. 

These are the valuable skills that develop after years of work experience.  

Conclusion 

Yes, SQL can be easy to learn in a pressure-free vacuum with perfect data that’s easy to comprehend.

Just remember that at some point in your data career you will be called upon to be “Captain Save a Project” and the basic SQL you learned in a vacuum will not be enough to get the job done. You’ll also need to call upon the skills I listed above.  

In practice, generating effective SQL requires proper context, technical and communication skills, organizational savvy, and teamwork. The end results are not always as easy as your friendly SQL data pro makes it look. 

Keep your queries sharp, and your data clean.  

Until next time. 

Anthony Smoak 

I appreciate everyone who has supported this blog and my YouTube channel.

Stay in contact with me through my various social media presences.

Join my YouTube channel to support the channel:
https://www.youtube.com/channel/UCL2ls5uXExB4p6_aZF2rUyg/join

Photo by Kaboompics.com: https://www.pexels.com/photo/a-a-disappointed-man-looking-at-a-paper-holding-his-head-7877111/

Increase SQL Query Performance with Indexes

I have always recommended that if you develop SQL based analytics, you should have some understanding of indexes when it comes to speeding up queries.

Sometimes we do not have access to create indexes on the physical tables (especially in a production environment), but approaching your DBA with some basic knowledge of indexing will put you ahead of the game.

If your queries are running longer than they need to, it’s time to speed them up with some strategic indexing. Very briefly, here’s how you can fix your slow-running query issues by indexing your tables.

Step 1: Implement a Covering Index Strategy 

A covering index includes all the columns referenced in your SELECT or WHERE clauses. This means the database engine can fulfill your query using just the index, without having to access the base table data, which can drastically reduce query execution time (and speed up any downstream data visualization processes). The following is an example of T-SQL code for Microsoft SQL Server demonstrating a covering index using the INCLUDE keyword:


-- Creates a nonclustered index on the Person.Address table with four included (non-key) columns.
-- The index key column is PostalCode and the non-key columns are
-- AddressLine1, AddressLine2, City, and StateProvinceID.

CREATE NONCLUSTERED INDEX IX_Address_PostalCode
ON Person.Address (PostalCode)
INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID);

Step 2: Utilize Clustered Indexes 

Clustered indexes should typically be applied to your primary key column. A clustered index sorts and stores the data rows in the table based upon the key values. This organization is particularly efficient for queries that retrieve a range of values.
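As a minimal sketch (the table and column names are assumed for illustration):

-- The clustered index physically orders the table's rows by order_id.
-- (In practice, a PRIMARY KEY constraint often creates this clustered index for you.)
CREATE CLUSTERED INDEX CIX_Orders_OrderID
ON dbo.orders (order_id);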

Step 3: Apply Non-Clustered Indexes 

Non-clustered indexes are typically applied to foreign keys. These indexes create a separate structure that points back to the base data rows in the table. They are ideal for speeding up queries that involve joins, which are common in business intelligence operations. 
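For example, here is a minimal sketch of indexing an assumed foreign key column to support joins:

-- Nonclustered index on the assumed customer_id foreign key to speed up joins to the customer table.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.orders (customer_id);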

Find the Sweet Spot 

With indexing, balance is key. Under-indexing can cause your queries to run longer, while over-indexing can negatively impact data insertion and updates. Striking the right balance ensures your queries remain efficient without compromising the performance of other operations. 

Indexing Rule #4080: If it’s linked by it, filtered by it, or sorted by it….index by it.

(I saw that rule on a reddit forum once, and thought it made perfect sense. The 4080 reference is for my ATCQ fans.)

If you want more detail on indexing, you can read this blog post here. I wrote it a few years ago, but it is old gold, still valuable!

I appreciate everyone who has supported this blog and my YouTube channel via merch. Please check out the logo shop here.

Stay in contact with me through my various social media presences.

Join my YouTube channel to get access to perks:
https://www.youtube.com/channel/UCL2ls5uXExB4p6_aZF2rUyg/join

Know These SQL Concepts for your Data Interview

Over the years I have conducted a number of interviews with candidates who claimed to know SQL, but could not clearly verbalize their understanding of some basic concepts. I want you to understand these following concepts at a minimum, so your next SQL interview isn’t cut short.

First, distinguish between a left outer join and an inner join. A left outer join retrieves all records from the left table and returns only the matched records from the right table. On the other hand, an inner join retrieves only the matched records from both tables.
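A quick sketch of the difference, using assumed customers and orders tables:

-- LEFT OUTER JOIN: every customer is returned, even those with no orders (order columns come back NULL).
SELECT c.customer_id, o.order_id
FROM customers AS c
LEFT OUTER JOIN orders AS o ON o.customer_id = c.customer_id;

-- INNER JOIN: only customers with at least one matching order are returned.
SELECT c.customer_id, o.order_id
FROM customers AS c
INNER JOIN orders AS o ON o.customer_id = c.customer_id;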

Next, familiarize yourself with the WHERE clause versus the HAVING clause. Use the WHERE clause to filter records before any groupings are made, while the HAVING clause filters records after grouping. This understanding will enable you to filter your data correctly. 
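For example (table and column names assumed):

-- WHERE filters individual rows before grouping; HAVING filters the groups after aggregation.
SELECT customer_id, SUM(sales_amount) AS total_sales
FROM sales_data
WHERE sale_date >= '2023-01-01'      -- row-level filter, applied before the GROUP BY
GROUP BY customer_id
HAVING SUM(sales_amount) > 10000;    -- group-level filter, applied after the GROUP BY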

Also, understand the difference between the UNION and UNION ALL statements. UNION removes duplicate records, whereas UNION ALL includes all records, duplicates and all.

Be ready to discuss Common Table Expressions (CTEs) versus temporary tables as ways to simplify query writing and avoid complicated subqueries. CTEs are best used for readability, while temp tables are physically materialized in tempdb (allowing you to add indexes if necessary to optimize performance).
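A minimal sketch of the two approaches (table and column names assumed):

-- CTE: defined inline and available only to the statement that immediately follows it.
WITH top_customers AS (
    SELECT customer_id, SUM(sales_amount) AS total_sales
    FROM sales_data
    GROUP BY customer_id
)
SELECT customer_id, total_sales
FROM top_customers
WHERE total_sales > 10000;

-- Temp table: materialized in tempdb, reusable across statements, and indexable.
SELECT customer_id, SUM(sales_amount) AS total_sales
INTO #top_customers
FROM sales_data
GROUP BY customer_id;

CREATE NONCLUSTERED INDEX IX_TopCustomers_TotalSales ON #top_customers (total_sales);

SELECT customer_id, total_sales
FROM #top_customers
WHERE total_sales > 10000;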

Understanding clustered indexes versus non-clustered indexes is vital for query performance. Clustered indexes sort and store data rows in the table based upon key values, while non-clustered indexes create a separate structure with efficient pointers back to the base table. 

Lastly, ensure your queries are sargable (Search ARGument ABLE). Sargable queries use indexes efficiently, improving performance, unlike non-sargable queries, which slow down data retrieval. 

I appreciate everyone who has supported this blog and my YouTube channel via merch. Please check out the logo shop here.

Stay in contact with me through my various social media presences.

Join my YouTube channel to get access to perks:
https://www.youtube.com/channel/UCL2ls5uXExB4p6_aZF2rUyg/join

SQL Window Functions – Lead and Lag

The LEAD and LAG window functions in SQL offer the simple yet powerful ability to analyze data beyond the current row. They allow you to access values from rows before and after the current one, without using complex self-joins or subqueries. You can use these two functions to compare values across rows, calculate differences, and perform other operations that require looking ahead or behind in your dataset.

Back when I used Microsoft Access as my main data tool and needed to access information from a previous row to do calculations, I would use VBA coding to implement solutions. At the time, it did not occur to me that I could achieve similar functionality using SQL Server’s LEAD and LAG window functions.

In this highly informative video (if I do say so myself; I’m biased, but I’m correct), I show you a use case for the LAG function: calculating the average time it takes an invoice to move through approver workflow steps. At a high level, this entails subtracting the prior row’s approval date from the current row’s approval date.

LEAD & LAG Window Function Syntax

The LAG window function allows you to access data from a prior row, letting you “look back” dynamically as you query data. This aids in calculating deltas, running totals, and more. The syntax of the LAG function is as follows:

LAG(column_name, offset, default) OVER (PARTITION BY column_name ORDER BY column_name)

The LEAD window function allows you to access data from an upcoming row to “look ahead” dynamically as you query data. The syntax of the LEAD function is as follows:

LEAD(column_name, offset, default) OVER (PARTITION BY column_name ORDER BY column_name)

The PARTITION BY clause is key; it ensures we only look back (or ahead) within a given value of the partitioning column (for example, as shown in the video, an invoice number). Otherwise, transitions between groups would produce invalid lookups.
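As a rough sketch of the invoice use case described above (the table and column names, such as dbo.invoice_approvals, invoice_number, and approval_date, are assumptions for illustration, not the exact objects from the video):

-- For each approval step, look back to the prior step's approval date within the same invoice
-- and compute the number of days between the two steps.
SELECT
    invoice_number,
    approval_step,
    approval_date,
    DATEDIFF(
        DAY,
        LAG(approval_date) OVER (PARTITION BY invoice_number ORDER BY approval_date),
        approval_date
    ) AS days_since_prior_step
FROM dbo.invoice_approvals;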

Why Use Lead and Lag functions in SQL?

Using the LEAD and LAG functions in SQL can help you analyze your data faster and more easily, because you can:

  • Perform calculations and comparisons across rows without using self-joins or subqueries, which can be complex and slow.
  • Avoid exporting your data to Excel or other tools, which can be time-consuming and error-prone.
  • Simplify your code and improve its readability and maintainability.

As I mentioned earlier, if I had known about these two functions years ago when I was trying to access prior values via VBA and loops, I could have saved myself a lot of time, increased my efficiency and made my approach much easier to read in the resulting documentation.

Conclusion

I believe that learning from examples is critical for mastering both logical concepts and SQL syntax. By watching the linked video, you will gain the knowledge to tackle similar problems in your own data analysis scenarios.

Additionally, LEAD and LAG provide simpler, set-based solutions for cross-row data comparisons that would otherwise require procedural coding.

LEAD and LAG functions allow you to merge the past and the future!

I appreciate everyone who has supported this blog and my YouTube channel via merch. Please check out the logo shop here.

If you want to learn all the latest tips and tricks in core data analysis tools, stay in contact with me through my various social media presences.

Thank you!!

Anthony B Smoak

Learn GROUP BY vs PARTITION BY in SQL

TLDR

If you are using SQL, you will eventually come across the GROUP BY and PARTITION BY clauses. While the GROUP BY clause is fairly standard in SQL, most people do not understand when to use the PARTITION BY clause. This easy-to-understand video uses some NBA season data to make the point very clear! I will show you the best use case for applying PARTITION BY.

Bonus content begins at the 10:03 mark, where I demonstrate a use case for the fundamentals I teach you earlier in the video.

The reviews are in, make sure to watch the whole video!

GROUP BY EXAMPLE

To begin, I demonstrate how to use GROUP BY in SQL Server Express to understand basic NBA team statistics based strictly upon the conference. I have to tell the database what to GROUP BY to generate all of the aggregate statistics. I select the conference and want to generate the aggregate sum of points, average points, and max points. I also order the results by the sum of points in a descending fashion.
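Here is a minimal sketch of that query (the nba_team_stats table and column names are assumptions for illustration, not necessarily the exact names used in the video):

-- Aggregate team points by conference; one row comes back per conference.
SELECT conference,
       SUM(pts) AS sum_pts,
       AVG(pts) AS avg_pts,
       MAX(pts) AS max_pts
FROM nba_team_stats
GROUP BY conference
ORDER BY SUM(pts) DESC;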

When I run the query, the results show how the data points have been grouped by conference. The results show the sum of points for the two values in the conference field (Western and Eastern), along with the average points and the max points. I can also see that there are no individual team rows in this result, which is expected when using the GROUP BY clause with aggregate functions like MIN, MAX, SUM, and COUNT.

If I want to further break down the results and throw in a division, I need to also select the division field and add it to the GROUP BY statement as well. This action further slices the data points into the specific conferences and divisions that the teams play in.

PARTITION BY

Now, let’s talk about the OVER and PARTITION BY sub-clause. In this next example, I select the conference and bring in the points, which is our measure. Just like with GROUP BY, I sum the points, but I add “OVER” and the sub-clause “PARTITION BY.” This is where the magic happens because I tell SQL what data point to Partition by (i.e., conference) to show the total sum of points by conference.
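A minimal sketch of that query, using the same assumed table and column names as above:

-- Keep every individual team row while adding conference-level aggregates alongside the raw points.
SELECT team,
       conference,
       pts,
       SUM(pts) OVER (PARTITION BY conference) AS sum_pts,
       AVG(pts) OVER (PARTITION BY conference) AS avg_pts,
       MAX(pts) OVER (PARTITION BY conference) AS max_pts
FROM nba_team_stats
ORDER BY pts DESC;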

When I run the query, the results show the sum of points by conference, and I can order the results by individual team points in descending order. The results show a breakdown of the sum of points by conference, but there are still individual rows in the results.

Here is a continuation of the same results for the Western conference teams:

This is an important distinction!! By using OVER and PARTITION BY, I can have data at the most granular level (unaggregated points at an individual team level i.e., the PTS field) combined with data at a higher granularity (points summed, averaged and the maximum points scored at an overall conference level).

The higher granularity of the data at the conference level makes the values for the last three statistics columns repeat (i.e., 135611 for SUM_PTS, 9040 for AVG_PTS and 9470 for MAX_PTS in the Eastern Conference). Similar data is returned for the Western conference.

I’m essentially allowed to have my data cake and eat it too with this best of both worlds approach!

IN SUMMARY

The GROUP BY statement is used to group rows that have the same values in a specific column or set of columns. When used with aggregate functions such as SUM, AVG, MAX, MIN, COUNT, etc., the GROUP BY statement allows us to calculate summary statistics for each group. The result will yield one row for each group. Typically, a GROUP BY statement will reduce the number of rows returned by your SQL.

On the other hand, the PARTITION BY statement is used to divide the data into partitions or subsets based on a specific column or set of columns (like conference in our case). Unlike GROUP BY, PARTITION BY does not reduce the number of rows returned in the result set. Instead, it adds a new column that shows the result of the aggregate function (e.g., SUM, AVG, MAX, etc.) for each partition.

LET PAT BEV COOK

So remember, when it comes to GROUP BY and PARTITION BY in SQL, just like how the Minnesota Timberwolves balanced an effective array of shots to lead the league in total points scored, understanding the nuances of when to use each statement can make all the difference in winning that crucial play-in game against your data! Yes you have to watch the video to understand this reference.

Also, I’m not mad at Pat Bev for going over the top, as this win was against a former employer who recently traded him. Success is always the best revenge!!

Happy querying!!

I appreciate everyone who has supported this blog and my YouTube channel via merch. Please check out the logo shop here.

Stay in contact with me through my various social media presences.

Thank you!!

Anthony B Smoak

Use Excel to Connect to SQL Server Data

TLDR

Connect directly to SQL Server data from within Excel. Also learn how to add and modify SQL statements from within Excel and pass them to SQL Server for data retrieval. If you need to quickly refresh data from SQL Server without hassle, then you need to watch this video!!

Intro

As a data professional, I am always looking for ways to optimize my workflow and increase efficiency. One of the techniques that I have found particularly helpful is making a direct connection between Excel and SQL Server. In this video, I will show you how to set up this direct connection and explain the benefits of using it.

What is a Direct Connection Between SQL and Excel?

First of all, let me explain what I mean by a direct connection between these two tools. Typically, when working with data in Excel, you would export the data from SQL Server into a .csv file and then import that file into Excel. This process can be time-consuming and cumbersome, especially if you are working with large datasets. With a direct connection, you can access the data in SQL Server directly from Excel, without the need for any intermediate steps.

What Do I Demonstrate in the Video?

To demonstrate this in the video, I walk you through an example. I am using the Wide World Importers DW sample database from Microsoft, which you can easily import into SQL Server. Within this database, I am looking at the fact.order table, which has over 230,000 rows and many columns. In Excel, I start with a blank sheet and navigate to the Data ribbon. From there, I select “From Database” and then “From SQL Server Database.”

This prompts me to enter a server name and a database name. If you have access to SQL Server, you can find the server name by connecting to the database engine. Otherwise, you may need to reach out to your database administrator to obtain this information (always stay on your DBA’s good side, if you know what’s good for you). Once you enter the server and database names, you can hit “OK” and Excel will work its magic to establish a connection.

At this point, you can preview the data from the table that you want to import. Excel will give you the option to transform the data if necessary, but we’ll just hit “Load.” Excel will then create a connection and query the SQL Server database and load the data directly into Excel. This means that you can always access the most up-to-date version of the data, without having to worry about exporting and importing files!!

Advantages of Connecting SQL and Excel

Now, let’s say that you need to update the data in Excel at a later time. Perhaps you have some ad hoc processes that reference this data and you need to ensure that you always have the latest version. With a direct connection, this is easy to do. You can simply go to the Data ribbon and select “Refresh.” Excel will connect to SQL Server and update the data in your Excel sheet with the latest data from the database.

This is incredibly powerful because it means that you can share your Excel sheet with others without worrying about whether they have the latest version of the data. As long as they have access to the SQL Server database, they will always see the most up-to-date version of the data when they open the Excel sheet.

One thing to keep in mind is that this type of direct connection is best suited for ad hoc purposes!! In other words, you should not use this to create production worksheets that will be used by others, because the direct connection depends on having access to the SQL Server database. If that access is lost, the Excel sheet will no longer be able to connect to the database and refresh the data. Therefore, it is best to use this type of connection for temporary analysis and reporting purposes.

Once you have successfully set up the direct connection between Excel and SQL Server, you can easily refresh the data whenever you need it. To refresh the data, all you have to do is go to the Data tab, and click on the “Refresh All” button. This will refresh all the data connections in your workbook, including the connection to SQL Server.

Powerful Excel Functionality (PivotTables and Pivot Charts)

I don’t reference this in the video, but you can also use Excel’s PivotTables and PivotCharts to analyze and visualize the data. PivotTables allow you to group and summarize data in many different ways, while PivotCharts provide a visual representation of the data that is easy to understand. It may be easier for you to manipulate this data in Excel and extract additional insights than in SQL Server.

Financial analysts, in particular, should avoid taking manual inputs from any and everywhere (especially ungoverned data sources) and using this type of refresh for production purposes. As a recovering financial analyst I know your management hates automation and loves when you cut and paste random information from Bob in division finance. I also know they want to see you work 12 hours a day because the CFO needs that monthly IT spend variance to budget!! Please do yourself a favor and meditate hard for serenity every Sunday night.

Conclusion

In conclusion, using a direct connection between Excel and SQL Server can greatly improve your workflow when working with large datasets. By leveraging the full power of SQL Server’s querying capabilities and Excel’s Pivot tools, you can create powerful ad hoc reports and analyze data in ways that would be difficult or impossible with other tools. Consider setting up a direct connection to SQL Server to streamline your ad hoc workflow and improve your productivity.

Additional References to Maximize Your Learning

All views and opinions are solely my own and do not necessarily reflect those of my employer

I appreciate everyone who has supported this blog and my YouTube channel via merch. Please check out the logo shop here.

Thank you!!

Anthony B Smoak

P.S. I respect your hustle Financial Analysts. I ask that you respect those who want to make your life easier with report automation!!

Build Dynamic SQL with SQL Server and Excel

In this video you will learn how to use the SQL CASE statement to add “filter flexibility” to your front-end Excel worksheet, thus taking your SQL + Excel skills to the next level. I’ll reveal the UPDATED code I used to build a dynamic SQL statement using SQL Server and Excel.

By using Excel as a tool to pass cell values to SQL Server queries, you’ll be able to generate dynamic SQL statements on the fly, saving time and reducing the risk of errors in your code. Building upon the previous video “Call a SQL Server Stored Procedure using Excel Parameters”, let’s enhance your SQL skills and streamline your workflow.

Here is a screenshot of the front end Excel worksheet we setup in previous videos. This Excel sheet will execute a stored procedure call with parameters supplied from cells on the sheet.

Below is the stored procedure I use to enhance the code from previous videos. I set up a static SQL string that serves as the base of the SQL statement. I then use the CASE statement to evaluate the cell values incoming from the Excel worksheet (with some slight manipulation for empty and default date values coming from Excel).

Depending upon those values, the filter clause is dynamically built and appended to the base SQL string, which is then executed with the sp_executesql command. This command has many advantages with respect to protecting your code from a SQL injection attack.
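The full procedure is shown in the video; as a rough, simplified sketch of the pattern (the procedure, parameter, table, and column names here are assumptions for illustration), it looks something like this:

-- Simplified sketch of the dynamic-filter pattern; the actual procedure in the video differs.
CREATE PROCEDURE dbo.usp_GetOrders
    @CustomerName NVARCHAR(100),
    @StartDate    DATE
AS
BEGIN
    DECLARE @sql NVARCHAR(MAX) =
        N'SELECT order_id, customer_name, order_date FROM dbo.orders WHERE 1 = 1';

    -- Append each filter only when Excel passed a usable value.
    SET @sql += CASE WHEN @CustomerName IS NOT NULL AND @CustomerName <> N''
                     THEN N' AND customer_name = @CustomerName' ELSE N'' END;
    SET @sql += CASE WHEN @StartDate IS NOT NULL
                     THEN N' AND order_date >= @StartDate' ELSE N'' END;

    -- sp_executesql keeps the values parameterized, which helps guard against SQL injection.
    EXEC sys.sp_executesql
        @sql,
        N'@CustomerName NVARCHAR(100), @StartDate DATE',
        @CustomerName = @CustomerName,
        @StartDate    = @StartDate;
END;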

If you need a breakdown of the code and the worksheet functionality, make sure to watch the video below.

Additional References to Maximize Your Learning

I always have fun creating this type of content and sharing with you, my YouTube channel followers.

Stay in contact with me through my various social media presences.

All views and opinions are solely my own and do not necessarily reflect those of my employer

I appreciate everyone who has supported this blog and my YouTube channel via merch. Please check out the logo shop here.

Thank you!!

Anthony B Smoak

Passing Parameter Values from Excel to SQL Server

If you’re working with data in Excel and need to connect to a SQL Server database, there are a couple of ways to pass parameter values from Excel to SQL Server. In my first video, “Passing Parameter Values from Excel to SQL Server,” I show you how to connect to SQL Server and read values from a cell and pass those values to a native SQL query without using parameters.

Advantage: The first approach has the advantage of being quick and easy to implement. This is because it does not require any additional setup or configuration, such as creating stored procedures in SQL Server. Instead, the values are passed directly to the query, which can be executed immediately.

However, this approach can also be prone to SQL injection attacks, where a hacker inputs malicious SQL code into an input field in order to gain unauthorized access to the database.

Verdict: Speed over Security. Good for Ad-Hoc personal use.

In my second video, “Call a SQL Server Stored Procedure using Excel Parameters,” I demonstrate how to connect to SQL Server from Excel and pass cell values from Excel to SQL Server using a stored procedure. This approach is more secure because the values are passed to a stored procedure rather than a query. It simply requires the setup of a stored procedure in SQL Server, and I show you two ways to accomplish this feat.
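As a rough sketch of what the SQL Server side of that setup might look like (the procedure, table, and column names are assumptions for illustration; the video walks through the real setup):

-- Minimal sketch of a parameterized stored procedure that Excel can call with cell values.
CREATE PROCEDURE dbo.usp_GetSalesByRegion
    @Region    NVARCHAR(50),
    @StartDate DATE
AS
BEGIN
    SELECT order_id, region, order_date, sales_amount
    FROM dbo.sales_orders
    WHERE region = @Region
      AND order_date >= @StartDate;
END;

-- From Excel, the connection would then issue a call along the lines of:
-- EXEC dbo.usp_GetSalesByRegion @Region = N'West', @StartDate = '2023-01-01';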

Advantage: Stored procedures provide an added layer of security because they can be set to execute with specific permissions, and can be audited for changes and usage. This makes it harder for an attacker to gain unauthorized access to the database or to execute malicious SQL commands.

However, the potential small disadvantage of this approach is that the stored procedures will need to be updated and managed separately from the Excel file.

If you’re new to working with SQL Server and Excel, I recommend watching both videos. The first video will give you a good overview of the basics, while the second video will show you a more secure way to pass parameter values.

I always have fun creating this type of content and sharing with you, my YouTube channel followers.

You can also follow my dapper data adventures on Instagram.

All views and opinions are solely my own and do not necessarily reflect those of my employer

I appreciate everyone who has supported this blog and my YouTube channel via merch. Please check out the logo shop here.

Thank you!!

Anthony B Smoak