Historically, creating Sankey charts in Tableau has been a time-consuming process, often requiring the use of complex templates. However, the team at Tableau Public has introduced a game-changing functionality that allows us to create Sankey charts effortlessly. This feature, currently in beta and available for a limited time (like the McRib of data visualizations), enables us to author and publish Sankey charts directly to our Tableau Public profiles.
In this blog post, I’ll briefly walk you through the process of creating one using Tableau Public.
What is a Sankey Chart?
Before we delve into the specifics of this new Tableau Public feature, let’s take a moment to understand what a Sankey chart is and why it’s such a powerful visualization tool. A Sankey chart is a flow diagram that illustrates the movement of data, be it goods, energy, or even money. With a Sankey chart, you can effortlessly compare different data points and identify patterns that might remain hidden in traditional charts or tables.
Testing Out the New Feature
Assuming you already have a Tableau Public profile (and if you don’t, I highly recommend creating one—it’s an incredible platform for sharing your data visualizations with the world). You’ll need to create a visualization directly from your Tableau Public page.
Once you’ve created your visualization, navigate to the “Connect to Data” section. As we’re uploading data from our computer, select the “Upload from Computer” option. Choose the dataset you want to work with—I’ll be using the “Sample Superstore” dataset for this example. After confirming that your data has been successfully imported, select the “Sankey” chart type.
Now, here’s where the magic happens. You’ll notice a “Level” and “Link” section that appears. To define the flow in the Sankey chart, let’s select a dimension like “Segment” and drag it into the “Level” area. Next, grab another dimension—I’ll choose “Region”—and place it in the “Level” area as well. Finally, to quantify the flow, let’s choose a measure like “Sales” and place it in the “Link” area.
Voila! With just a few clicks, we’ve created a Sankey chart. Impressive, isn’t it? You’ll notice the flow between the segments and regions instantly come to life. But we’re not done yet—let’s keep the party going by adding another level.
For the sake of experimentation, let’s grab a dimension like “Ship Status” and drop it into the visualization. Now we have an additional sub-level in our Sankey chart. To avoid overcrowding, we can uncheck the “Allow Labels to Overlap” option, ensuring our chart remains clean and legible.
Keep Innovating for The Analytics Core Audience
Tableau Public’s decision to incorporate this feature highlights their commitment to democratizing data visualization. While the addition of features like Sankey charts to Tableau Public is fantastic, it’s essential that the overlords at Salesforce remember Tableau’s core audience—the general analytics users who are generally decoupled from Salesforce usage. Let’s keep hoping for bigger and better things to come with the tool. This beta signals that they’re heading in the right direction.
Yes I put an AI version of myself on the thumbnail. I obviously “Quantum Leaped” from the future to teach you these Advanced Tableau table skills that you’ll encounter in the accompanying video.
Be warned, this is Highly Advanced Tableau!! In the main video, we’ll explore how to generate advanced tables in Tableau (step by step), complete with multiple chart elements displayed on the same table row. It’s OK, you can click the area below since it leads to a YouTube short.
As a data enthusiast and Tableau user, I always strive to learn new things, experiment with different techniques and share my knowledge with others. Recently, I came across a visualization by Zainab Ayodimeji that caught my attention. Zainab is a Tableau Ambassador and her work is always top-notch, so I reached out to her and asked if I could reverse engineer one of her vizzes for a video. She was cool with it, so I got to work.
The visualization that caught my eye was an advanced Tableau visualization that used normalized data to create sales and profit sparklines for using standard Superstore data. Zainab’s visualization featured a variety of different chart elements, all on the same row, and looked incredibly cool.
I was immediately intrigued and wanted to see if I could reproduce something similar myself, but with a different data set other than the ubiquitous Superstore. So, I got to work on reverse engineering and came up with my own take on Zainab’s visualization.
I discovered that the technique used in her viz was innovated by Sam Parsons, so I also checked out his video on this technique and found it ingenious; very MacGyver like. Sam’s innovative video is the inspirational source for all of these techniques. Watch his video for the concepts, watch my video for practical hands on building.
Watch the Step by Step Re-creation Video to Learn this Advanced Technique
In the video below, I will explain step by step how I used Tableau to create a compelling chart example that will help my viewers understand the Advanced Tableau calculations and concepts it takes to visualize multiple types of charts on one table row.
The dataset that I worked with contained information about the sales and profits of different products sold at a coffee shop as opposed to Superstore data. Recreating the data with a different dataset forced me to understand the concepts better than just copying and pasting the existing code in Zainab’s visualization.
The Reviews are In
Y-Axis Positioning Trick – (How this Process Works)
One of the coolest concepts in this process is the positioning of the chart elements on the same Y-Axis. Again, a big shoutout to Sam Parsons for coming up with these techniques!
The y-axis position is critical because it determines where each data point will be plotted on the chart. As a result of the ingenious calculation, Tableau places all non-line chart elements at a y-axis position of 0.5, which is the middle of the y-axis. However, for line chart elements, the y-axis position is calculated based on the normalized sales or profits value.
To normalize the data, we make the values of the sales and profit of each product fit between a range of 0 and 1 for a consistent Y axis. This allows us to see the trends of the sales and profits of each product at a standard consistent height on the visual.
The sales or profit axis test (a calculated field) determines whether the normalized sales or profits value should be plotted if the chart element is a line. If the test returns a value of 1, Tableau will plot the normalized sales value. If it returns a value of 0, Tableau will plot the normalized profits value. This is determined by checking whether the sales access product field is present in the detail section of the view.
I just realized that I used Quantum Leap and MacGyver references in the same blog post (gettin’ Ziggy with it). After watching my video above, you’ll be able to create an insightful visualization using clever and unconventional methods (not unlike MacGyver making a jetpack out of a toothpick and a piece of gum).
Again, big thanks to Zainab and Sam for influencing this work so I could teach you how to Quantum Leap forward in your Tableau skills (Ok I’ll stop with the puns). Keep doing great things with your data!
Are you a data professional looking to start a new data project?
Then you need to review my 10-point checklist to make sure you’re on the right track. Starting a new data project can be overwhelming. But don’t worry, with my 20 years of experience, I’m here to guide you through it.
Typically when I start to perform a new data related task or analysis for a project, I have to make sure that I meet the expected objectives, which often include identifying patterns, trends, and insights that can be used to drive business decisions.
10 Point Checklist
Point 1: First things first, you need to understand the nature of the deliverable that’s being asked for. Is it a new report, database table, column, data visualization, calculation, or a change to any of the above? In a similar fashion you also need to understand the technologies in play that you have to work with. This could be anything from Tableau, Power BI, SQL Server, Oracle, Teradata or even Microsoft Access (yes, people still use this tool).
Point 2: It’s crucial to know the desired delivery time frame for your project. You don’t want to end up with a longer timeline than what the project manager or client had in mind. Communication is key in this situation.
Point 3: Who is the intended audience for this deliverable? If it’s for an executive audience, you may need to roll the numbers up and take out some detail. If it’s for an analyst or operational audience, you may want to leave in more detail.
Point 4: How much historical data is required? What is the anticipated volume of data that your deliverable is going to generate? Don’t get caught in a situation where your solution can’t handle the trending analyses for a 2 year time frame when you only pulled data for the last 6 months.
Point 5: Understand the volume of data that your solution will generate. For example, a 5 million row output is not conducive to a 100% Excel approach. You will definitely be in the land of database analyses. However you may later present the data at an aggregated level (see point 3) via Excel but hopefully using a real data visualization tool .
Point 6: You need to understand if there’s any Personally Identifiable Information (PII) or sensitive data that you need to access in order to carry out the request. This could include social security numbers, passport numbers, driver’s license numbers, or credit card numbers.
Point 7: It’s important to understand the business processes behind the request. As data people, we tend to focus only on the data piece of the puzzle, but understanding more about the relevant business process can help you deliver the better results for your end users.
Point 8: Try to find and understand any relevant KPIs associated with the business processes on which your data project/task is affecting.
Point 9: Perform data profiling on your datasets! This can’t be stated enough. Profiling leads to understanding data quality issues and can help lead you to the source of the issues so they can be stopped.
Here are a few data profiling videos I’ve created over the years to give you a sense of data profiling in action.
Point 10: Understand how your solution will impact existing business process. By changing a column or calculation, how does this impact upstream or downstream processes? Keep your email inbox clean of those headache emails that are going to ask why the data looks different than it did last week. Most likely there was not a clear communication strategy to inform everyone of the impact of your changes.
Here are a few bonus considerations since you had the good fortune of reading this blog post and not just stopping at the video.
Bonus Point 1: Consider any external factors that could impact your data project. For example, changes in regulations can impact the data that you collect, analyze, and use. If the government imposes stricter regulations on data privacy (see point 6 above), you may need to change your data sources or analysis methods to comply with these regulations.
Bonus Point 2: Consider internal organizational politics when starting on a project. If you work in a toxic or siloed organization (it happens), access to data can be a challenge. For example, if the marketing department controls customer data, accessing that data for a sales analysis project may be challenging due to internal strife and/or unnecessary burdensome roadblocks.
Internal politics can also lead to potential conflicts of interest, such as when stakeholders have different goals or agendas. For example, if your data analyses could impact a department’s budget, that department may have an incentive to influence your work outcome to their advantage (or try to discredit you or your work by any means necessary).
Bonus Point 3: Finally, make sure to document everything. This includes the business requirements, technical requirements, saved emails, and any changes that were made along the way.
When I started my first office position as an intern at a well known Fortune 500 company, my mentor told me the first rule of corporate life was to C.Y.A. I’m sure you know what that means to cover. Having solid documentation of your work and an email trail for decisions made along the way can keep you out of hot water.
And there you have it, my 10-point checklist for starting a new data project. By following these steps, you’ll be well on your way to delivering high-quality results. Don’t forget to like and subscribe for more data-related content!
If you are using SQL, you will eventually come across the GROUP BY and PARTITION BY clauses. While the Group BY clause is fairly standard in SQL, most people do not understand when to use the PARTITION BY clause. This easy to understand video uses some NBA season data to make the point very clear! I will show you the best use case of when to apply PARTITION BY.
Bonus content begins at the 10:03 mark, where I demonstrate a use case for the fundamentals I teach you earlier in the video.
To begin, I demonstrate how to use GROUP BY in SQL Server Express to understand basic NBA team statistics based strictly upon the conference. I have to tell the database what to GROUP BY to generate all of the aggregate statistics. I select the conference and want to generate the aggregate sum of points, average points, and max points. I also order the results by the sum of points in a descending fashion.
When I run the query, the results show how the data points have been grouped by conference. The results show the sum of points by the two values in the conference field (Western and Eastern), the average points, and the max points. I can also see that there are no individual rows in this result, which is expected when using the GROUP BY clause with aggregate functions like MIN, MAX, SUM, and COUNT.
If I want to further break down the results and throw in a division, I need to also select the division field and add it to the GROUP BY statement as well. This action further slices the data points into specific conference and divisions that the teams play in.
Now, let’s talk about the OVER and PARTITION BY sub-clause. In this next example, I select the conference and bring in the points, which is our measure. Just like with GROUP BY, I sum the points, but I add “OVER” and the sub-clause “PARTITION BY.” This is where the magic happens because I tell SQL what data point to Partition by (i.e., conference) to show the total sum of points by conference.
When I run the query, the results show the sum of points by conference, and I can order the results by individual team points in descending order . The results show a breakdown of the sum of points by conference, but there are still individual rows in the results.
Here is a continuation of the same results for the Western conference teams:
This is an important distinction!! By using OVER and PARTITION BY, I can have data at the most granular level (unaggregated points at an individual team level i.e., the PTS field) combined with data at a higher granularity (points summed, averaged and the maximum points scored at an overall conference level).
The higher granularity of the data at the conference level makes the values for the last three statistics columns repeat (i.e., 135611 for SUM_PTS, 9040 for AVG_PTS and 9470 for MAX_PTS in the Eastern Conference). Similar data is returned for the Western conference.
I’m essentially allowed to have my data cake and eat it too with this best of both worlds approach!
The GROUP BY statement is used to group rows that have the same values in a specific column or set of columns. When used with aggregate functions such as SUM, AVG, MAX, MIN, COUNT, etc., the GROUP BY statement allows us to calculate summary statistics for each group. The result will yield one row for each group. Typically, a GROUP BY statement will reduce the number of rows returned by your SQL.
On the other hand, the PARTITION BY statement is used to divide the data into partitions or subsets based on a specific column or set of columns (like conference in our case). Unlike GROUP BY, PARTITION BY does not reduce the number of rows returned in the result set. Instead, it adds a new column that shows the result of the aggregate function (e.g., SUM, AVG, MAX, etc.) for each partition.
LET PAT BEV COOK
So remember, when it comes to GROUP BY and PARTITION BY in SQL, just like how the Minnesota Timberwolves balanced an effective array of shots to lead the league in total points scored, understanding the nuances of when to use each statement can make all the difference in winning that crucial play-in game against your data! Yes you have to watch the video to understand this reference.
Also, I’m not bad at Pat Bev for going over the top, as this win was against a former employer who recently traded him. Success is always the best revenge!!
In this video I will build out a Tableau Prep flow that demonstrates one approach to perform fuzzy matching. We’ll perform the fuzzy matching technique on two lists of companies. One list contains a standardized list and the other contains a non-standardized list, and we will match them together using Tableau’s fuzzy algorithm.
As a working data professional, one of my main responsibilities is to ensure the accuracy and consistency of the data that I work with. One of the challenges I face is dealing with different variations of company names that may exist in different datasets. In the video above, I walk you through a fuzzy match join that I recently performed using Tableau Prep, a data preparation tool, to reconcile these variations.
What is Fuzzy Matching?
First, I want to introduce you to the concept of fuzzy matching. It’s a technique used to match data when there are slight differences in how the data is presented (most likely as a result of bad data governance). For example, if you have two data sets with company names, one may list a company as “Apple Inc.” while the other may list the same company as “Apple Incorporated.” Fuzzy matching would help you match these two records, even though the names are slightly different.
In Tableau Prep, unfortunately fuzzy matching is not a straightforward process like it is in Excel or other tools like Power BI or Alteryx. However, we can use a workaround to achieve a somewhat similar result. Tableau Prep allows you to automatically group values together using fuzzy-match algorithms that find similar values.
High Level Flow Process
To get started, let’s say we have two lists of companies. One is a master list of companies that we want to use as our reference, or “golden” copy. The other is a list of companies that may be misspelled or unstandardized. We want to join these two lists together using fuzzy matching (although technically we employ fuzzy grouping options to enable traditional matching).
Within Tableau Prep I start with two groups of data, a clean “golden copy” of company names and a list of company names that contain poor data quality (i.e., user-entered data, which may have misspelled or unstandardized company names).
Once we have imported both data sets, we can append them using a union step. The union will combine the two lists of company names into one.
Next, we need to perform a fuzzy algorithm on the list of company names. Remember that both the golden copy and the misspelled names are stacked into one column. In Tableau Prep, we’ll use the fuzzy grouping capability to group together similar values, even if they’re not exact matches. We’ll use this technique to group the misspelled company names with their correct counterparts in the “golden” copy.
Granted this approach does not scale terribly well over a large dataset because we need to eyeball each grouping to ensure it is acceptable, but it is a good start. I don’t know of any fuzzy algorithm that guarantees 100% results, as “fuzzy” is inherent in the name of the approach.
Here are a couple of must read posts from the Tableau Knowledge Base for Fuzzy Grouping if you want to understand these automated grouping options:
After the values in the combined column are grouped together into a standardized clean format, we simply need to deduplicate our values so we are left with a 1 to 1 mapping between the incorrect data and the cleaned up result of the fuzzy grouping. We can use this “translation table” as a means to join our Golden Copy dataset to the less standardized dataset (which has the sales data we need to analyze).
By using fuzzy match and Tableau Prep, I was able to successfully reconcile the variations in the company names and match the sales quotes in the “poor data quality” company names data set to their corresponding IDs in the clean golden copy names data set. If you’re stuck in this kind of a scenario at work, use this process to advocate for a cleanup of the data quality at the source of entry or creation.
In conclusion, fuzzy matching is a powerful technique that can help you match data even when the data may not be spelled correctly or may have slight variations. Tableau Prep may not have a straightforward fuzzy matching feature (a la Excel), but we can use fuzzy grouping to achieve similar results.
Here is a pic of the flow we build in the video. I add additional steps for this unique case but ideally to perform the fuzzy grouping we could stop much earlier in the process.
Connect directly to SQL Server data from within Excel. Also learn how to add and modify SQL statements from within Excel and pass them to SQL Server for data retrieval. If you need to quickly refresh data from SQL Server without hassle, then you need to watch this video!!
As a data professional, I am always looking for ways to optimize my workflow and increase efficiency. One of the techniques that I have found particularly helpful is making a direct connection between Excel and SQL Server. In this video, I will show you how to set up this direct connection and explain the benefits of using it.
What is a Direct Connection Between SQL and Excel?
First of all, let me explain what I mean by a direct connection between these two tools. Typically, when working with data in Excel, you would export the data from SQL Server into a .csv file and then import that file into Excel. This process can be time-consuming and cumbersome, especially if you are working with large datasets. With a direct connection, you can access the data in SQL Server directly from Excel, without the need for any intermediate steps.
What Do I Demonstrate in the Video?
To demonstrate this in the video, I walk you through an example. I am using the Wide World Importers DW sample database from Microsoft, which you can easily import into SQL Server. Within this database, I am looking at the fact.order table, which has over 230,000 rows and many columns. In Excel, I start with a blank sheet and navigate to the Data ribbon. From there, I select “From Database” and then “From SQL Server Database.”
This prompts me to enter a server name and a database name. If you have access to SQL Server, you can find the server name by connecting to the database engine. Otherwise, you may need to reach out to your database administrator to obtain this information (always stay on your DBA’s good side, if you know what’s good for you). Once you enter the server and database names, you can hit “OK” and Excel will work its magic to establish a connection.
At this point, you can preview the data from the table that you want to import. Excel will give you the option to transform the data if necessary, but we’ll just hit “Load.” Excel will then create a connection and query the SQL Server database and load the data directly into Excel. This means that you can always access the most up-to-date version of the data, without having to worry about exporting and importing files!!
Advantages of Connecting SQL and Excel
Now, let’s say that you need to update the data in Excel at a later time. Perhaps you have some ad hoc processes that reference this data and you need to ensure that you always have the latest version. With a direct connection, this is easy to do. You can simply go to the Data ribbon and select “Refresh.” Excel will connect to SQL Server and update the data in your Excel sheet with the latest data from the database.
This is incredibly powerful because it means that you can share your Excel sheet with others without worrying about whether they have the latest version of the data. As long as they have access to the SQL Server database, they will always see the most up-to-date version of the data when they open the Excel sheet.
One thing to keep in mind is that this type of direct connection is best suited for ad hoc purposes!! In other words, you should not use this to create production worksheets that will be used by others. This is because the direct connection is dependent on having access to the SQL Server database. If that access is lost, the Excel sheet will no longer be able to connect to the database and the data will be lost. Therefore, it is best to use this type of connection for temporary analysis and reporting purposes.
Once you have successfully set up the direct connection between Excel and SQL Server, you can easily refresh the data whenever you need it. To refresh the data, all you have to do is go to the Data tab, and click on the “Refresh All” button. This will refresh all the data connections in your workbook, including the connection to SQL Server.
Powerful Excel Functionality (PivotTables and Pivot Charts)
I don’t reference this in the video, but you can also use Excel’s PivotTables and PivotCharts to analyze and visualize the data. PivotTables allow you to group and summarize data in many different ways, while PivotCharts provide a visual representation of the data that is easy to understand. It may be easier for you to manipulate this data in Excel and extract additional insights than in SQL Server.
Financial analysts, in particular, should avoid taking manual inputs from any and everywhere (especially ungoverned data sources) and using this type of refresh for production purposes. As a recovering financial analyst I know your management hates automation and loves when you cut and paste random information from Bob in division finance. I also know they want to see you work 12 hours a day because the CFO needs that monthly IT spend variance to budget!! Please do yourself a favor and meditate hard for serenity every Sunday night.
In conclusion, using a direct connection between Excel and SQL Server can greatly improve your workflow when working with large datasets. By leveraging the full power of SQL Server’s querying capabilities and Excel’s Pivot tools, you can create powerful AD-HOC reports and analyze data in ways that would be difficult or impossible with other tools. Consider setting up a direct connection to SQL Server to streamline your ad-hoc workflow and improve your productivity.
In this video you will learn how to use the SQL CASE statement to add “filter flexibility” to your front-end Excel worksheet, thus taking your SQL + Excel skills to the next level. I’ll reveal the UPDATED code I used to build a dynamic SQL statement using SQL Server and Excel.
By using Excel as a tool to pass cell values to SQL Server queries, you’ll be able to generate dynamic SQL statements on the fly, saving time and reducing the risk of errors in your code. Building upon the previous video “Call a SQL Server Stored Procedure using Excel Parameters”, let’s enhance your SQL skills and streamline your workflow.
Here is a screenshot of the front end Excel worksheet we setup in previous videos. This Excel sheet will execute a stored procedure call with parameters supplied from cells on the sheet.
Below is the stored procedure I use to enhance the code from previous videos. I setup a static SQL string that will serve as the base of the SQL statement. I then use the CASE statement to evaluate the cell values incoming from the Excel worksheet (with some slight manipulation for empty and default date values incoming from Excel).
Depending upon those values, the filter clause is dynamically built and appended to the base of the SQL string, which is then executed with the sp_executedqsl command. This command has many advantages with respect to protecting your code from a SQL injection attack.
If you need a breakdown of the code and the worksheet functionality, make sure to watch the video below.
Have you ever wanted to disable the default Tableau highlighting effect when you select a mark on your chart and then remove the filter? Even when the filter is removed via the “Remove All Filters” process, it can be confusing for the user experience when all values remain “greyed out”, tricking the user into thinking that their filter is still applied. This video will help you remedy this issue and improve your dashboard user experience.
Fortunately, there is a solution to this problem that is simple and easy to implement. In this video I will show you how to use a simple calculated field and highlight action to remedy the issue. This should be default behavior in Tableau, (help us out here Tableau!)
The solution approach involves creating a boolean calculated field and setting it initially to TRUE. Then, placing this calculated field on the detail of the chart that has a filter applied. Next, adding a highlight action to the same chart that you want to remove the “greyed out” effect for. In the “Add Highlight Action” pop-up box, the Source Sheet and the Target Sheet should be the same and the Selected Fields option should have the boolean calculated field checked.
By following these steps, you will be able to remove the greyed out effect on your Tableau chart when the “Remove All Filters” process is applied.
This not only improves the appearance of your dashboard but also makes it easier to understand the data.
★☆★ THESE ADDITIONAL FILTERING VIDEOS IN TABLEAU ARE WORTH YOUR TIME ★☆★
Don’t let the greyed out effect on your Tableau charts hold you back any longer. Watch the video and follow the steps outlined in this blog post to improve the appearance and functionality of your Tableau dashboards.
You can also follow my dapper data adventures on Instagram.
If you’re working with data in Excel and need to connect to a SQL Server database, there are a couple of ways to pass parameter values from Excel to SQL Server. In my first video, “Passing Parameter Values from Excel to SQL Server,” I show you how to connect to SQL Server and read values from a cell and pass those values to a native SQL query without using parameters.
Advantage: The first approach has the advantage of being quick and easy to implement. This is because it does not require any additional setup or configuration, such as creating stored procedures in SQL Server. Instead, the values are passed directly to the query, which can be executed immediately.
However, this approach can also be prone to SQL injection attacks, where a hacker inputs malicious SQL code into an input field in order to gain unauthorized access to the database.
Verdict: Speed over Security. Good for Ad-Hoc personal use.
In my second video, “Call a SQL Server Stored Procedure using Excel Parameters,” I demonstrate how to connect to SQL Server from Excel and pass cell values from Excel to SQL Server using a stored procedure. This approach is more secure because the values are passed to a stored procedure rather than a query. It simply requires the setup of a stored procedure in SQL Server, and I show you two ways to accomplish this feat.
Advantage: Stored procedures provide an added layer of security because they can be set to execute with specific permissions, and can be audited for changes and usage. This makes it harder for an attacker to gain unauthorized access to the database or to execute malicious SQL commands.
However, the potential small disadvantage of this approach is that the stored procedures will need to be updated and managed separately from the Excel file.
If you’re new to working with SQL Server and Excel, I recommend watching both videos. The first video will give you a good overview of the basics, while the second video will show you a more secure way to pass parameter values.
I always have fun creating this type of content and sharing with you, my YouTube channel followers.
You can also follow my dapper data adventures on Instagram.
All views and opinions are solely my own and do not necessarily reflect those of my employer
I’m proud to announce that I recently passed the Tableau Certified Data Analyst certification. If you found this article, most likely you are looking for a perspective on the exam and how to pass and earn this certification yourself. Here is the story of my journey, which may differ from the typical experience.
I had a New Year’s resolution to add the Tableau Certified Data Analyst certification to my resume because the Tableau Desktop Certified Associate certification I held was due to expire. If you want to read up on how I passed that older exam, you can find my insights here. Some of those insights will also serve you well on passing the current exam.
I believe that certification has its advantages. It’s a way to signal to potential and current employers that you have some defined level of competency in a targeted skill. It’s also a means to strengthen the case to your employer that you deserve additional compensation (if you are under-compensated). Fortunately, I am compensated fairly now, but this has not always been the case (shout-out to highly competent middle office IT pros toiling away underappreciated, but I digress). Finally, studying allows you to stay up-to-date on the latest tools and trends in your chosen domain.
How Much Experience Do You Need?
The official exam guide states, “The best preparation is role experience and time with the product. To be prepared, candidates are strongly encouraged to have at least 6 months of experience.” I would tend to agree with this if you have used the tool extensively during this time frame. Otherwise, I would recommend at least 1 to 2 years experience with the tool and as a data analyst before attempting to sit for this one. Focus on obtaining the Tableau Specialist certification (it never expires) first before attempting this exam.
Why Did I Get Certified?
For my purposes as a senior manager in a consulting practice, certification certainly has benefits with respect to establishing credibility quickly on new projects. I may hold a manager title but you’ll never pry my hands away from keyboard-centric hands-on technical work, as I enjoy being a technical subject matter expert (and teaching/mentoring others).
Other than employment and signaling purposes, an additional benefit of certification is the personal growth and esteem benefits that you gain from tackling a goal. My body of work is visible online and I have years of relevant experience, thus certification is not something I necessarily needed but something I desired.
The main difference between the new Certified Data Analyst exam and the older Desktop Certified Associate exam is that you will now be tested on Tableau Prep, Tableau Server and Tableau Online. Having to understand aspects of Server and Online were initial concerns I held before taking this test.
I have about 7 years of experience between Tableau Public & Desktop and about a year of experience with Tableau Prep so that was not an issue. I have used Tableau Server to publish my dashboards while on a project at a large Fortune 500 company, but I would by no means consider myself a server expert. I’ve used Prep to transform data for clients without issue as it is easy to pick up with exposure and usage. Look at this listing of domain items covered on the exam.
My strategy to compensate for a lack of deep hands on experience in Domain 4 was to perform really well on all the other domains. Using this strategy, I could still potentially score 91% max (assuming I miss every Domain 4 question which would be highly improbable). If you are like me and have deep knowledge of Tableau Desktop, then you should be fine. Do not use a lack of server experience as an excuse to avoid certification. Simply read up on publishing content at these links and you should have a fighting chance. Personally, I found the Certified Data Analyst exam to be somewhat easier than the Desktop Certified Associate exam. Not easy, just a little bit easier with respect to the Tableau Desktop asks.
Another difference between the Certified Data Analyst exam and the older Desktop Certified Associate exam is the presence of a hands-on lab portion. I honestly found this to be the easiest section on the test, although your mileage may vary. There was one question that had me stumped only because I wasn’t sure what was being asked so I built a visual that probably did not reflect the ask. Other than that 1 question, I felt that I nailed this section.
The official exam guide states, “Candidates are encouraged to comment on items in the exam. Feedback from all comments is considered when item performance is reviewed prior to the release of new versions of exam content.” In hindsight, I should have left a comment on the question stating “unclear”.
For the hands on lab (I’m not sharing anything that isn’t already on the exam guide), definitely be familiar with filter and highlight actions, how to use a TOP N filter, how to use parameters with filters, labels, and how to add reference lines and perform custom sorting.
How Did I Prepare?
Honestly, I meant to prepare for at least a week beforehand, but life got in the way. Thus, I literally crammed my review into the span of 7 hours the Saturday before sitting the exam. I do not recommend this if you are not well versed in the tool. I simply needed to review some concepts. The listing at this website provides great links to official Tableau documentation for the subject areas covered on the exam.
I completed the exam with about 35 minutes to spare. After I submitted my results online, I only had to wait an hour before I received an email stating that I had a score available. This is in stark contrast to when the beta exam was in effect. I heard that results would take months to process. I cleared the 75% hurdle despite studying for only a few hours and not having deep experience with Tableau server. I could have easily scored higher given more study time, but I was happy to pass the exam given the meager study time I allotted to the task. I’m not saying that the test was easy, I’m just fortunate that I’ve had enough experience with Desktop that I could “sacrifice” in other areas and still make it across the finish line. This strategy may not work for you if you have under a year’s experience with the tool.
Focus on These Subject Areas:
However, here is the section you came for, this is my abridged list of test focus areas. Make sure to focus on these subject areas to give yourself a good shot at passing the exam.
Start here: Here are 5 useful videos from my catalog that you should review to level up for the exam. I promise they are worth your time and will help you prepare. Do me a favor and like the videos to help others find the content as well!
Best of luck to you. I know that you can pass this test if you have decent hands on experience with the tool. For those of you without a Tableau license, use Tableau Public to study and fill in gaps by reading blogs, watching videos and using Tableau official documentation. I believe in you!
Need Personal Data Tutoring?
Are you a beginner that needs help understanding data topics in Tableau (or Excel/SQL) and would like someone with experience to discuss your problem? If so, contact me here to schedule a 1 on 1 virtual meetup. Make sure to describe the concept that you are trying to learn in the message so I can understand if I can help. Depending upon your ask and time required we can discuss cost. Access to Tableau Public will cover most of your study needs regarding the Tableau Desktop sections and lucky for you, that is a FREE tool.
About Me (Data background):
Experience: 15 Years Industry + 8 Years Analytics Consulting