How to Drill into Data Using Set Actions in Tableau

Drilling with Set Actions

If you’ve ever tried to use the default drill functionality within Tableau, you know that it could be a more user friendly experience. The default table drill functionality opens all of the options at the next drill level which can force a user to lose sight of the data upon which they’re focusing. A more user-friendly option enables the user to only drill into a specific selected value where focus and attention can be maintained. This is otherwise known as asymmetric drill down.

Fortunately as of version 2018.3, Tableau has added Set Actions as a new functionality. At a high level, developers can take an existing set and update its values based upon a user’s actions in the visualization. The set can be employed via a calculated field within the visualization, via direct placement in the visualization or on the marks card property.

In lay terms this means empowering a user with more interactivity to impact their analyses.

In this first video, I’ll demonstrate a use of set actions on an NBA data set. We’ll drill from Conference to Division to Team to Player. This tip will be easily applicable to your Tableau data. And with the bonus tree-map tip you’ll release your inner Piet Mondrian.

Feel free to interact with the set action example on Tableau Public and then download and dissect the workbook.


Drilling with Level of Detail (LOD) Calculations
If you want to stay with a classic approach, a nice Level of Detail (LOD) workaround can be employed to drill into the next level. Here is a tip that accomplishes a similar outcome where I demonstrate a technique originally presented by Marc Rueter at Tableau Conference 2017.

Now that I’ve equipped you with the knowledge to incorporate customized drilling functionality into your analyses, go forth and do some great things with your data!

References:

https://onlinehelp.tableau.com/current/pro/desktop/en-us/actions_sets.htm
https://www.tableau.com/learn/tutorials/on-demand/set-actions
https://www.basketball-reference.com/leagues/NBA_2018.html
https://www.youtube.com/watch?v=d22A4XVoUEs

Image Copyright dzxy on 123rf.com

Advertisements

T-SQL Tips and Quick Reference

Whenever I have to fire up SQL Server to perform some analyses there are a few functions, keywords and capabilities that I always find myself referring to in order to analyze my data. As is the case with most T-SQL users, even those of us that have been using T-SQL for over a decade, in our heads we always know what we want to do but will refer to our favorite syntax reference sources in order to progress. I decided to make a handy reference sheet for myself and then decided to post it here for anyone else.

How to Create a Temporary Table in T-SQL / SQL Server

Temporary (i.e., temp) tables enable the storage of result sets from SQL scripts yet require less record locking overhead and thus increase performance. They remain in effect until they are explicitly dropped, or until the connection that created them is discontinued.

As I see it, their main benefit is that they preclude me from writing difficult to comprehend nested queries since I can place a result set inside a temp table and then join it back to a normal table at-will.

In this example, the results of permanent table ‘TABLE1’ will be placed into global temporary table ##TEMPTABLE:

SELECT 
     FIELDNAME1,
     FIELDNAME2,
     FILEDNAME3 
INTO ##TEMPTABLE 
FROM TABLE1

Temp tables are stored in the tempdb system database.

TempDB

“The tempdb system database is a global resource that is available to all users connected to the instance of SQL Server or connected to SQL Database.”

Additional Reference:

What do the Hashtags Mean in T-SQL Temp Table Creation?

 The number of hash signs “#” preceding the name of the temp table affects whether the scope of the table is local or global.

  • If you precede the temp table name with “#”, then the table will be treated as a local temp table.
  • If you precede the temp table with “##”, then the table will be treated as a global temp table.

“You can create local and global temporary tables. Local temporary tables are visible only in the current session, and global temporary tables are visible to all sessions. Temporary tables cannot be partitioned. Prefix local temporary table names with single number sign (#table_name), and prefix global temporary table names with a double number sign (##table_name).”

Additional References:


How to Drop a Temp Table
in T-SQL / SQL Server

 There are times when you will need to rerun code that creates a temp table. If the temp table has already been created, you will encounter an error.

“There is already an object named ‘##TEMP_TABLE_NAME’ in the database.”

Place the following code above the creation of your temp tables to force SQL Server to drop the temp table if it already exists. Change ##TEMP_TABLE_NAME to your table name and use the correct number of hashtags as applicable to a local (#) or global (##) temp table.

IF OBJECT_ID('tempdb..##TEMP_TABLE_NAME') IS NOT NULL
DROP TABLE ##TEMP_TABLE_NAME

How to Add a New Field to a Temp Table in T-SQL / SQL Server (ALTER TABLE)

Here is example T-SQL that illustrates how to add a new field to a global temp table. The code below adds a simple bit field (holds either 1 or 0) named FIELD1 to the temp table, declares it as NOT NULL (i.e., it must have a value) and then defaults the value to 0.

ALTER TABLE ##TEMP_TABLE
ADD FIELD1 Bit NOT NULL DEFAULT (0)

The following code changes the data type of an existing field in a global temp table. FIELD1 has its data type changed to NVARCHAR(2) and is declared as NOT NULL.

ALTER TABLE ##TEMP_TABLE
ALTER COLUMN FIELD1 NVARCHAR(20) NOT NULL;

Additional References:


How to Use a CASE Statement in T-SQL / SQL Server

The following information on the CASE statement is direct from Microsoft:

The CASE expression evaluates a list of conditions and returns one of multiple possible result expressions. The CASE expression has two formats:

  • The simple CASE expression compares an expression to a set of simple expressions to determine the result.
  • The searched CASE expression evaluates a set of Boolean expressions to determine the result.

Both formats support an optional ELSE argument.

CASE can be used in any statement or clause that allows a valid expression. For example, you can use CASE in statements such as SELECT, UPDATE, DELETE and SET, and in clauses such as select_list, IN, WHERE, ORDER BY, and HAVING.

Examples from Microsoft:

SELECT
ProductNumber,
Category =
CASE ProductLine
WHEN 'R' THEN 'Road'
WHEN 'M' THEN 'Mountain'
WHEN 'T' THEN 'Touring'
WHEN 'S' THEN 'Other sale items'
ELSE 'Not for sale'
 END,
Name
FROM Production.Product
ORDER BY ProductNumber;


SELECT
ProductNumber,
Name,
"Price Range" =
CASE
WHEN ListPrice =  0 THEN 'Mfg item - not for resale'
WHEN ListPrice = 50 and ListPrice = 250 and ListPrice < 1000 THEN 'Under $1000'
ELSE 'Over $1000'
END
FROM Production.Product
ORDER BY ProductNumber ;

Here is a link to great post that highlights some of the unexpected results when using the CASE statement.


How to Use the Cast Function in T-SQL / SQL Server

When you need to convert a data field or expression to another data type then the cast function can be helpful. I typically have the need to take imported text fields and evaluate them as a datetime. The cast statement below helps me resolve this issue.

Select cast(txtOrder_Date as datetime) as Order_Date

This statement can also be used in a WHERE clause to filter the text as if it were a true datetime field/.

Where cast(txtOrder_Date as datetime)) between '20170101' and '20181231'

Furthermore, you can cast a literal string to an integer or decimal as needed.

Select cast(‘12345’ as int) as Integer_Field
Select cast(‘12345.12’’ as decimal (9,2)) as Decimal_Field

When your FIELDNAME is a text value, you can use the cast function to change its data type to an integer or decimal, and then sum the results. Here are a few examples I have had to use in the past with the sum function.

sum(cast(FIELDNAME as int)) as Sum_Overall_Qty

sum(cast(ltrim(rtrim(FIELDNAME2)) as decimal(38,2))) as Sum_Sales_Price

Additional Reference:


Using the REPLACE Function in T-SQL / SQL Server

The Replace function is useful when you need to replace all occurrences of one character or substring with another character or substring. The following select will replace the string ‘Anthony’ with ‘Tony’.

Select REPLACE(‘My name is Anthony’, ‘Anthony’, ‘Tony’);

REPLACE Function

Additional Reference:


How to Convert a Negative Text Number in Parenthesis Format to a Numeric Data Type (T-SQL / SQL Server)

I’ve found this particular expression useful when trying to convert a negative number in text format to a decimal value when the text is enclosed in parentheses; i.e., changing (123.45) to -123.45

It makes use of the REPLACE function to find the leading parenthesis and replace it with a negative sign. This first REPLACE is nested inside another REPLACE function in order to find the trailing parenthesis and replace

Select cast(replace(replace('(123.45)','(','-'),')','') as money);

REPLACE function2

You can also use the convert function to accomplish the same result. Below I used this line of code to sum the negative formatted text (represented by FIELD_NAME) by converting it to the money data type after replacing the parenthesis.

sum(convert(money,replace(replace(FIELD_NAME,'(','-'),')',''))) as Sum_Domestic_Price


COALESCE Function in T-SQL / SQL Server

The COALESCE function is very useful when replacing NULL field values with a substitute value. Per Microsoft, the COALESCE function evaluates in order a comma delimited list of expressions and returns the current value of the first expression that initially does not evaluate to NULL.

For example,

SELECT COALESCE(NULL, NULL, 'third_value', 'fourth_value');

returns the third value because the third value is the first value that is not null. I will use the COALESCE function at times to replace NULL values with 0 for use in calculations.

Select COALESCE(NULL_FIELD, 0)

https://docs.microsoft.com/en-us/sql/t-sql/language-elements/coalesce-transact-sql?view=sql-server-2017

Yet Another Market Basket Analysis in Tableau

This video represents part two in my Market Basket Analysis series.

The steps in the post were inspired by the book Tableau Unlimited written by former co-worker of mine, Chandraish Sinha. I wasn’t planning to construct another market basket analysis video but when I saw the approach outlined in his book, I felt like it warranted sharing with my readers and followers.

In this version we’ll use default Tableau Superstore data to show the relationship between sub-categories on an Order; all without using a self table join. The visualization and analysis is driven by a user selection parameter.

Once the user selects a sub-category, the bar chart visualization updates to reflect the number of associated sub-category items on the same order.

Sample Superstore Data 2

Watch the video and as always get out there and do some great things with your data!

Feel free to also check out Part 1 here where we create a simpler correlation matrix version that shows all the sub-category relationships in one visual.

 

 

 

Market Basket Analysis in Tableau

 

A favored analysis technique employed by retailers to help them understand the purchase behavior of their customers is the market basket analysis. When you log on to Amazon, most likely you’ve noticed the “Frequently Bought Together” section where Jeff Bezos and company would like to cross-sell you additional products based upon the purchase history of other people who have purchased the same item.

Market Basket Analysis influences how retailers institute sales promotions, loyalty programs, cross-selling/up-selling and even store layouts.

If a retailer observes that most people who purchase Coca-Cola also purchase a package of Doritos (I know they’re competing companies), then it may not make sense to discount both items at once as the consumer might have purchased the associated item at full price anyhow. Understanding the correlation between products is powerful information.

In this video, we’ll use Tableau Superstore data to perform a simple market basket analysis.

Sample Superstore Data 2

Feel free to interact with this market basket analysis on Tableau Public and then download and dissect the workbook.

Watch the video and as always get out there and do some great things with your data.

Feel free to also check out Part 2 here where we’ll create an analysis driven by a user selection parameter.

When Corporate Layoffs Don’t Work

“When downsizing is a knee-jerk reaction, it has long-term costs. Employees and labor costs are rarely the true source of the problems facing an organization. Workers are more likely to be the source of innovation and renewal.” [1]

Case in Point: Circuit City Laid Off Employees for Over-performance

There were a combination of factors that lead to the demise of former electronics retailer Circuit City. A number of these reasons were self-inflicted wounds. The company located its stores in subprime locations, stopped selling appliances to cut warehouse storage and distribution costs and underinvested in its web presence at a time when consumer preferences were beginning to shift online.

However, the company’s biggest blunder was its decision to layoff its most experienced and knowledgeable sales persons while trying to compete in the competitive electronics retail marketplace. In March of 2007, Circuit City announced a scheme to layoff 3,400 hourly workers (roughly 8% of its workforce), while offering a severance package with the ability to reapply to former jobs at a reduced salary. Any reapplications had to occur after a mandatory 10 week cooling off period. Circuit City practiced genteelism by branding its cost cutting and de-skilling scheme a “wage management initiative”.

Management decided to staff its stores with fewer people, with fewer skills, making less money and expected this combination to yield long term positive results. As a result of the layoffs, Circuit City placed knowledgeable, experienced sales staff on a platter and served them to its main competitor, Best Buy. Additionally, where did Circuit City expect to find quality people who would work for a company that did not value loyalty, experience and wage increases?

“From a strategy perspective, customer-facing sales personnel would appear to be a core resource and potential differentiator for a consumer products retailer,” he [Kevin Clark, an assistant professor of management at Villanova School of Business] says. “Especially in an era of rapidly changing and more complex consumer electronics, knowledgeable sales personnel who are perceived by customers as ‘experts’ can be a source of competitive advantage.” [2]

Reportedly, “employees who were paid more than 51 cents above a set pay range for their departments were fired.” [3] However, solidifying the trope of senior executives reaping the gains without the pains, the CEO and Chairman of Circuit City received almost $10 million in various kinds of compensation for steering the company to its imperiled state. [4]

In under two years (i.e., November 2008), Circuit City announced it was going out of business. By laying off its highest paid hourly workers and replacing them with cheaper less skilled workers, in-store customer service levels plummeted which negatively impacted customer perception and sales.

Southwest Airlines Gets it Right

Waving flag of Southwest Airlines editorial 3D rendering

Treating employees as mere cogs and judging employees by costs and not by the overall value they create is self-defeating.

Some companies don’t understand that making workers happy leads to elevated productivity and higher retention levels. High employee morale should be table-stakes, instead it is a strategic key differentiator. Southwest Airlines has never had a layoff in its 47 plus years of existence. That’s laudable when you consider that airlines endured the fallout from 9/11 and the Great Recession (when oil prices spiked over $100 a barrel). As a well deserved consequence, Southwest Airlines routinely leads domestic airlines in customer satisfaction.

Consider this example of how Southwest Airlines treated its recruiting team during the global financial crisis:

“At one point, however, Southwest Airlines was staring at a tough time financially and it did ‘corporate redeployment’. It had 82 employees in the recruiting team. When the company put [in] a hiring freeze, it also wondered what to do with 82 of its employees in this particular team. The company utilised them for customer service. The result: Customer satisfaction went up as a result of this team’s enhanced skill set. When the economy recovered, the team went back to its original job; only this time, they had an additional skill set, which helped the company and the customers alike.” [1]

If you were in the airline industry would you rather work for Southwest Airlines or another domestic competitor (that I mercifully will not name) which embodies layoffs, labor strife and toxic mismanagement of employees?

The Negative Impact of Layoffs

There is a time and place for layoffs. However, more often than not, companies layoff employees during down times in the business cycle to simply lessen the impact on profits, not to avoid a collapse of the business. Against their own best interests, companies also announce layoffs during times of rising profits which causes their best people to head for greener pastures. Any expected cost savings are negated by lower productivity (when the best performers leave), lower innovation and a remaining demoralized workforce subjected to the negative effects of survivor syndrome (i.e., the feeling of guilt after seeing longtime co-workers discarded).

Additionally companies are impacted by “Brand equity costs—damage to the company’s brand as an employer of choice.” [1]. Sites like Glassdoor offer unfairly laid off employees the opportunity to share their sense of betrayal online which can significantly impact a company’s reputation.

Shortsighted management typically operates under the assumption that layoffs will positively impact shareholders. While financial analysts may cheer downsizing efforts, research indicates that layoffs have negative effects on share prices.

“A recent analysis of 41 studies covering 15,000 layoff announcements in more than a dozen countries over 31 years concluded that layoff announcements have an overall negative effect on stock-market prices. This remains true whatever the country, period of time or type of firm considered.”[1]

It should come as no surprise that Circuit City’s stock price fell 4% the day after the company pulled the plug on its most experienced employees. [5]

References:

[1] Employment Downsizing and its Alternatives. Retrieved from https://www.shrm.org/foundation/ourwork/initiatives/resources-from-past-initiatives/Documents/Employment%20Downsizing.pdf

[2] Circuit City plan: Bold strategy or black eye? NBC News. April 2, 2007. Retrieved from http://www.nbcnews.com/id/17857697/ns/business-careers/t/circuit-city-plan-bold-strategy-or-black-eye/

[3] Circuit City Cuts 3,400 ‘Overpaid’ Workers: Washington Post. March 29, 2007. Retrieved from http://www.washingtonpost.com/wp-dyn/content/article/2007/03/28/AR2007032802185.html

[4] Thousands Are Laid Off at Circuit City. What’s New?. New York Times. April 2, 2007 https://www.nytimes.com/2007/04/02/business/media/02carr.html

[5] It’s the Workforce, Stupid! The New Yorker. April 30, 2007. Retrieved from https://www.newyorker.com/magazine/2007/04/30/its-the-workforce-stupid

Circuit City Image Copyright : nazdravie

Use the Power BI Switch Function to Group By Date Ranges

In this latest video, I’ll explain how to use a handy DAX function in Power BI in order to group dates together for reporting. We’ll examine a dashboard that contains fields corresponding to purchase item, purchase date and purchase cost. We’ll then create a calculated column and use the SWITCH function in Power BI to perform our date grouping on the purchase date.

Watch the video to learn how to group dates into the following aging buckets, which can be customized to fit your specific need.

  • 0-15 Days
  • 16-30 Days
  • 31-59 Days
  • 60+ Days

If you are familiar with SQL, then you’ll recognize that the SWITCH function is very similar to the CASE statement; which is SQL’s way of handling IF/THEN logic.

Even though we’re creating a calculated column within Power BI itself, best practice is to push calculated fields to the source when possible. The closer calculated fields are to the underlying source data, the better the performance of the dashboard.

My Submission to the University of Illinois at Urbana-Champaign’s Data Visualization Class

I’m a huge fan of MOOCs (Massive Open Online Courses). I am always on the hunt for something new to learn to increase my knowledge and productivity; and because I run a blog, MOOCs provide fodder for me to share what I learn.

I recently took the Data Visualization class offered by the University of Illinois at Urbana-Champaign on Coursera. The class is offered as part of the Data Mining specialty of six courses that when taken together can lead to graduate credit in its online Master of Computer Science Degree in Data Science.

Ok enough with the brochure items. For the first assignment I constructed a visualization based upon temperature information from NASA’s Goddard Institute for Space Studies (GISS).

Data Definition:

In order to understand the data, you have to understand why temperature anomalies are used as opposed to raw absolute temperature measurements. It is important to note that the temperatures shown in my visualization are not absolute temperatures but rather temperature anomalies.

Basic Terminology

Here’s an explanation from NOAA:

“In climate change studies, temperature anomalies are more important than absolute temperature. A temperature anomaly is the difference from an average, or baseline, temperature. The baseline temperature is typically computed by averaging 30 or more years of temperature data. A positive anomaly indicates the observed temperature was warmer than the baseline, while a negative anomaly indicates the observed temperature was cooler than the baseline.”

Interpreting the Visualization

The course leaves it up to the learner to decide which visualization tool to use in order to display the temperature change information. Although I have experience with multiple visualization programs like Qlikview and Power BI, Tableau is my tool of choice. I didn’t just create a static visualization, I created an interactive dashboard that you can reference by clicking below.

From a data perspective, I believe the numbers in the file that the course provides is a bit different than the one I am linked to here but you can see the format of the data that needs to be pivoted in order to make an appropriate line graph.

All of the data in this set illustrates that temperature anomalies are increasing from the corresponding 1951-1980 mean temperatures as years progress. Every line graph of readings from meteorological stations shows an upward trend in temperature deviation readings. The distribution bins illustrate that the higher temperature deviations occur in more recent years. The recency of years is indicated by the intensity of the color red.

Let’s break down the visualization:

UIUC Top Portion

Top Section Distribution Charts:

  • There are three sub-sections representing global, northern hemisphere and southern hemisphere temperature deviations
  • The x axis represents temperature deviations in bins of 10 degrees
  • The y axis is a count of the number of years that fall between the binned temperature ranges
    • For example, if 10 years have a recorded temperature anomaly between 60 and 69 degrees, then the x axis would be 60 and the y axis would be 10

UIUC Distribution Focus.png

  • Each 10 degree bin is comprised of the various years that correspond to a respective temperature anomaly range
    • For example in the picture above, the year 1880 (as designated by the tooltip) had a temperature anomaly that was 19 degrees lower than the 30 year average. This is why the corresponding box for the year 1880 is not intensely colored.
    • Additionally, the -19 degree anomaly is located in the -10 degree bin (which contains anomalies from -10 to -19 degrees)
    • These aspects are more clearly illustrated when interacting with the Tableau Public dashboard
  • The intensity of the color of red indicates the recency of the year; for example year 1880 would be represented as white while year 2014 would be indicated by a deep red color

Bottom Section Line Graph Chart:

UIUC Bottom Portion

  • The y axis represents the temperature deviation from the corresponding 1951-1980 mean temperatures
  • Each line represents the temperature deviation at a specific geographic location during the 1880-2014 period
  • The x axis represents the year of the temperature reading

UIUC Gobal Average

In the above picture I strip out the majority of lines leaving only the global deviation line. Climate science deniers may want to look away as the data clearly shows that global temperatures are rising.

Bottom Line:

All in all I thought it was a decent class covering very theoretical issues regarding data visualization. Practicality is exclusively covered in the exercises as the class does not provide any instruction on how to use any of the tools required to complete the class. I understand the reason as this is not a “How to Use a Software Tool” class.

I’d define the exercises as “BYOE” (i.e., bring your own expertise). The class forces you to do your own research in regards to visualization tool instruction. This is especially true regarding the second exercise which requires you to learn how to visualize graphs and nodes. I had to learn how to use a program called Gephi in order to produce a network map of the cities in my favorite board game named Pandemic. The lines between the city nodes are the paths that one can travel within the game.

UIUC Data Viz Week 3

If you’re looking for more practicality and data visualization best practices as opposed to hardcore computer science topics take a look at the Coursera specialization from UC Davis called “Visualization with Tableau”.

In case you were wondering I received at 96% grade in the UIUC course.

My final rating for the class is 3 stars out 5; worth a look.