My Journey to Obtaining the Certified Business Intelligence Professional (CBIP) Certification

As of the date of this blog post, I can proudly say that I have completed the suite of exams that comprise the Certified Business Intelligence Professional (CBIP) designation. My aim in pursuing the certification was threefold.

  1. Discover how my knowledge and experience stacked up against professional standards issued by a reputable body in data and computing.
  2. Find additional motivation to constantly educate myself regarding data and business intelligence since the certification requires renewal.
  3. Bolster my credentials, because it never hurts your bottom line to show that you have expertise in your profession.

If you’ve found this page via search, you’re no doubt already acquainted with this certification offered by The Data Warehouse Institute (TDWI). I started with what I thought would be the most difficult test based upon my research: the Information Systems Core (IS Core). However, this was not the case; in my opinion, the specialty exam turned out to be the most difficult.

Test 1: Information Systems Core (i.e. IS Core):

12/15/17: I wish I could share some detailed information about the test but that is not allowed per CBIP guidelines. All I can say is that the scope of information covered is very broad.

“The IS Core examination (Information Systems Technology) covers the base 4 year model curriculum from ACM and AIS for information systems – the entire spectrum of organizational and professional skills, teams and supervision, strategic organizational systems development and project management, systems development, web development, databases and systems integration – the subject matter, testing your ability to recognize, differentiate, and understand the definitions of the concepts covered.” – CBIP Examinations Guide

For adequate preparation, you’ll first need to spend $135 on the examinations guide. Unfortunately, the examinations guide is not something you can simply study and then go sit for the test. It is basically a reference book that points you to other sources to consider for test preparation. The guide also outlines the various subject areas that will appear on the test. Let me stress that you should not sit for this test without pertinent work experience and education. You will need to draw upon your knowledge and experiences to have a legitimate shot at passing.

My intent was to devote about three weeks’ worth of study time to the IS Core, but work severely got in the way of that plan. I ended up devoting only ten hours of study time in total, which was certainly not by design.

First I took the sample test of 42 questions in the examinations guide and fared pretty well. This gave me some confidence to continue with my scheduled exam date when I found out that my work was going to shorten my available study time.

The test was difficult. I’m not going to sugarcoat this aspect. While taking the proctored exam, I could count on two hands the number of questions (out of 110) where I was confident that I had chosen the correct answer. Part of the difficulty is that you are presented with four choices, at least two of which could be a satisfactory answer.

Test 2: Data Foundations

12/31/17: I performed much better on the Data Foundations test, scoring well above the mastery-level threshold of 70%. Buoyed by my performance on the Information Systems Core test, I scheduled only about 10 hours of study time in preparation for Data Foundations and used one reference book to prepare. My advice for this test: have a solid understanding of metadata concepts (a subject area already cited in the CBIP Examinations Guide), and make the DAMA Guide to the Data Management Body of Knowledge your best friend. I used the 1st edition rather than the 2nd edition in my preparation since I already had the 1st edition in my possession.

Test 3: Specialty Exam: Data Management

1/14/18: This was the most difficult of the three exams I sat. That may have been a function of my limited preparation, as I only put in about 3 hours of study time. The scope of topics on this exam is so broad that I planned to again leverage my experience and knowledge to power me through. The majority of questions required narrowing the choices down to the two best answers and then selecting one; there is a persistent overlap between what could be acceptable and what the exam decrees is the one right answer. I’m not giving away anything that isn’t already on the outline shared by TDWI, but you’ll really need to brush up on your knowledge of data governance, data management, data warehousing and master/reference data.

My Background:

Not to be immodest (I only want to share my mindset for sitting the exams with somewhat minimal study), but I’ve been working with data for 15-plus years and hold both an MBA and a Master’s in Information Management. Before becoming a BI/data and analytics consultant, I worked in the back office of a bank supporting the monthly update of three credit risk data marts. Thankfully, all of that hard-won experience working in a financial institution’s back office paid off. Surprisingly, the number of right answers I gained from study time was minimal. Your mileage may vary in this regard.

Reference Material:

Here are the reference materials I used in my preparation; fortunately, with the exception of the CBIP Examinations Guide, I already had them in my library from graduate studies. Depending upon your level of experience, you may need to supplement your effort with additional books. I will say that both Wikipedia and Search Business Analytics were very helpful for looking up unfamiliar terms.

 

Best of luck to you on your journey to CBIP certification!

Photo Copyright: dragonimages / 123RF Stock Photo


Create A Barbell/DNA Chart in Tableau with NBA Data

 

A Barbell, Dumbbell or DNA chart should be considered when you want to illustrate the difference or degree of change between two data points. In this video I will use NBA data from the 2016-2017 season (courtesy of basketball-reference.com) to illustrate the difference between team wins and losses.

If you’re interested in Business Intelligence & Tableau, subscribe and check out my videos either here on this site or on my YouTube channel.

Data Profiling with Experian Pandora

 

Experian Pandora is a robust data profiling and data quality tool that enables users to quickly obtain useful statistics related to their data. Once you load your data into the interface, you can identify statistical anomalies and outliers in the data within seconds. To gain these types of insights, normally I have to write SQL scripts or use the SSIS Data Profiling Tool against SQL Server data. Experian Pandora is much easier to use against data in .csv files or Excel spreadsheets since you can simply drag and drop those items into the interface.
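For context, here is the kind of hand-written profiling query that a tool like this replaces. This is only a minimal T-SQL sketch against a hypothetical customers table with an email column; the table and column names are illustrative and not part of the tool.

-- Basic column profile: row count, null count, distinct values and value-length range
SELECT COUNT(*)                                        AS total_rows,
       SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END)  AS null_emails,
       COUNT(DISTINCT email)                           AS distinct_emails,
       MIN(LEN(email))                                 AS min_email_length,
       MAX(LEN(email))                                 AS max_email_length
FROM   customers;

Multiply that effort across dozens of columns and tables and the appeal of a drag-and-drop profiler becomes obvious.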

A lack of data profiling leads to poor data quality, which in turn leads to inaccurate information and poor business performance. I believe you will find this tool a worthy addition to your data toolbox.

Download the Free Data Profiler: https://www.edq.com/experian-pandora/data-profiling/

If you’re interested in Business Intelligence & Tableau, subscribe and check out all my videos either here on this site or on my YouTube channel.

Anthony B. Smoak

General Motors’ Information Technology: IT’s Complicated

General Motors’ relationship with technology has been one of innovation followed by periods of stagnation. Its technology staffing strategy of choice has been acquisition, followed by pure outsourcing, until it settled on its current insourcing approach. Startups like Tesla and Uber are having a profound effect on a rapidly evolving automotive industry, and GM as an industry incumbent must embrace new trends regarding autonomous vehicles, along with all the requisite software and technology, to remain viable. The company currently believes that an insourced IT staff can help it develop competitive advantages.

The EDS Acquisition

General Motors has a long history of employing Electronic Data Systems Corporation (EDS) to service its information technology needs. The $2.5 billion acquisition of EDS from billionaire Ross Perot in June of 1984 was a move to help impose structure upon GM’s unorganized maze of data-processing systems and IT infrastructure. From the start there were culture clashes between the two organizations, although EDS saw significant revenue increases after the acquisition. The management styles of brash, outspoken EDS founder Ross Perot and the bureaucratic GM CEO Roger Smith were incompatible.

“Problems surfaced within a year when the differences in management style between Perot and Smith became evident. The August 1984 issue of Ward’s Auto World suggested ‘Mr. Perot is a self-made man and iconoclast used to calling his own shots … Roger B. Smith [is] a product of the GM consensus-by-committee school of management, never an entrepreneur.’” [1]

Additionally, six thousand GM employees were transferred from GM to EDS at lower pay [2], which served to stoke the fires of the culture clash.

From 1984 until it was eventually spun off in 1996, EDS was a wholly owned subsidiary of GM. Although there was an ownership separation, the two behemoths remained tightly coupled in regard to technology staffing. Divesting EDS was a strategic decision by GM to focus on its core competency of vehicle manufacturing; EDS, in turn, gained the freedom to win additional technology contracting work from other organizations.


HP Enterprise Services (formerly EDS, Electronic Data Systems) corporate headquarters in Plano, Texas (Wikipedia)

Post EDS Spin-Off

Post spin-off, General Motors continued to contract with EDS for technology services as it still accounted for a third of EDS’s revenues at the time. Perceived as Texas “outsiders” by the Detroit incumbents, EDS found it difficult to deal with the fragmented nature of GM’s systems across various business units and divisions. While EDS had the requisite technical expertise, it did not always have enough internal influence to navigate GM’s intense political landscape. Obtaining consensus amongst business units in regard to technology decisions was a challenging endeavor. In an attempt to address these issues, incoming GM CIO Ralph Szygenda spearheaded the creation of an internal matrixed organization called Information Systems & Services (IS&S).

IS&S was created as a matrix organization consisting of internal GM technology executives and various other technologists (e.g. business and systems analysts). The new organizational structure consisted of a dual reporting relationship; IS&S members simultaneously reported to the CIO organization and to their local business unit leadership.

Generally, matrix organizations are instituted in order to promote integration. The advantage of the matrix organization is that it allows members to focus on local initiatives in their assigned business unit and it enables an information flow from the local units to the central IT organization. General Motors is a famously siloed global organization. With the creation of IS&S, members could now promote information sharing between different functions within GM and address the cross-organizational problems that had challenged EDS.

The matrix structure is not without weaknesses. To quote a famous book, “No man can serve two masters.” Employees in a matrix organization often deal with additional frustrations as they attempt to reconcile their allegiances and marching orders from conflicting authorities.  “Matrix organizations often make it difficult for managers to achieve their business strategies because they flood managers with more information than they can process” [3]. From my own personal experiences of working with IS&S while employed at GM subsidiary Saturn, I observed that members were inundated with meetings as they tried to stay up to date with the plans and initiatives of the central IT organization while trying to remain focused on their internal business units.

A Return to Insourcing

From the creation of IS&S in 1996 until 2012, GM relied upon a variety of outsourced contractors and vendors, such as Capgemini, IBM, HP and Wipro, to deliver information technology services. In 2010 GM renewed an existing technology outsourcing contract with the old EDS (now HP) for $2 billion.

The general wisdom in regard to outsourcing is that companies will seek to focus on those core activities that are essential to maintain a competitive advantage in their industry. By focusing on core competencies, companies can potentially reduce their cost structure, enhance product or service differentiation and focus on building competitive advantages.

In a reversal of its longstanding IT sourcing strategy, GM made headlines in 2012 with the decision to insource and hire thousands of technologists to supplement its bare-bones IT staff. New GM CIO Randy Mott reasoned that an internal technical staff would be more successful working with business units and would meet the company’s technology needs at a lower cost than outside providers. Those savings could then be used to drive IT innovation and fund the capabilities needed to compete in a rapidly evolving automotive industry.

“By the end of this year (2012) GM will employ about 11,500 IT pros, compared with 1,400 when Mott started at the company four years ago, flipping its internal/external IT worker ratio from about 10/90 to about 90/10, an astounding reversal” [4].

GM decided to hire over 3,000 workers from HP who were already working at GM as part of its Global Information Technology Organization. The move could be considered an act of “getting the band back together,” as HP had purchased EDS in 2008 for $13.9 billion. Randy Mott was the CIO of HP before assuming the same position at GM, and it is plausible that this fact factored into GM’s insourcing calculus.

It should be noted that insourcing IT personnel is not without risks. Insourcing requires a company to compete for technical resources which can be difficult in cutting edge technology areas. Furthermore, the complexities of running IT in house “requires management attention and resources that might better serve the company if focused on other value-added activities” [3].

GM’s Information Technology Transitions from Commodity to Innovation

The automotive industry is embarking upon significant changes as it deals with innovations and disruptions from the likes of Uber and Tesla. To illustrate this point, Tesla (founded in 2003) had a higher market capitalization than both GM and Ford for a period of three months in 2017. Auto industry incumbents like GM are focusing on automating and streamlining commoditized processes as well as applying IT to more innovative value-added functions (e.g. computerized crash testing, simulations to shorten vehicle development times and data analysis for profit optimization).

In its early years, GM was widely perceived as an innovator before a series of missteps harmed that reputation. The company took an early technology lead in the electric vehicle space, only to fall behind on hybrid engine development. It defunded its lauded EV1 offering in the early 2000s to appease the bean counters, and it starved the innovative upstart Saturn of the funds needed to introduce new models for a period of five years.


2000-2002 Saturn SL2 (Wikipedia) The innovative Saturn subsidiary was starved of funds.

“G.M.’s biggest failing, reflected in a clear pattern over recent decades, has been its inability to strike a balance between those inside the company who pushed for innovation ahead of the curve, and the finance executives who worried more about returns on investment” [6].

After a government bailout in 2009, the company promised to emerge leaner and commit itself to technology leadership. Automakers are now focusing on software development as a source of competitive advantage. As a result, GM has opened four information technology innovation centers in Michigan, Texas, Georgia and Arizona. These locations were chosen in order to be close to recent college graduates from leading computer science programs.

GM Opens Fourth IT Innovation Center in Chandler, Arizona

One of GM’s 4 new Information Technology Innovation Centers 

Additionally, GM purchased Cruise Automation, which is developing autonomous driving software and hardware and is even testing a ride-sharing app for autonomous vehicles. The purchase will bolster GM’s technology staff and efforts in an emerging space.

“Harvard Business School professor Alan MacCormack, an expert in product development management within the software sector, says that outsourcing even routine software development can carry risks for companies that are seeking innovation in that area. He notes that today’s vehicles have more software and computing power than the original Apollo mission. ‘Everybody can make a decent enough powertrain. But what differentiates you is what you can do with your software,’ he says of car makers generally. ‘Companies have to be careful that they don’t outsource the crown jewels’” [6].

The company also developed an internal private cloud, nicknamed Galileo, to improve its business and IT operations, and consolidated twenty-three outsourced data centers into two insourced facilities [7].

With its new cadre of insourced technologists, GM will need to find a way to bridge the ever-persistent culture gaps between innovative technologists, bureaucratic management and the Excel zealots in finance.

“IT is core, I think, to GM’s revival, and I think it will be core to their success in the future,” – Former GM CEO Dan Akerson [7]

References:

[1] http://www.fundinguniverse.com/company-histories/electronic-data-systems-corporation-history/

[2] Nauss, D.  (May 20, 1994). Pain and Gain for GM : EDS Spinoff Would Close Stormy, Profitable Chapter. Los Angeles Times. Retrieved from http://articles.latimes.com/1994-05-20/business/fi-60133_1_gm-employees

[3] Pearlson, K. E., Saunders, C. S., & Galletta, D. F. (December 2015). Managing and Using Information Systems: A Strategic Approach, 6th edition. Wiley Publishing, ©2016.

[4] Preston, R. (April 14, 2016). General Motors’ IT Transformation: Building Downturn-Resistant Profitability. ForbesBrandVoice. Retrieved from https://www.forbes.com/sites/oracle/2016/04/14/general-motors-it-transformation-building-downturn-resistant-profitability/#67b37d551222

[5] Boudette, N. (July 6, 2017). Tesla Loses No. 1 Spot in Market Value Among U.S. Automakers. The New York Times. Retrieved from https://www.nytimes.com/2017/07/06/business/tesla-stock-market-value.html

[6] Leber, J. (November 5, 2012). With Computerized Cars Ahead, GM Puts IT Outsourcing in the Rearview Mirror. MIT Technology Review. Retrieved from https://www.technologyreview.com/s/506746/with-computerized-cars-ahead-gm-puts-it-outsourcing-in-the-rearview-mirror/

[7] Wayland, M. (September 18, 2017). GM plots next phase of IT overhaul: Unlocking the potential of a vast data empire. Automotive News. Retrieved from http://www.autonews.com/article/20170918/OEM06/170919754/gm-it-randy-mott

Featured Image Copyright: akodisinghe / 123RF Stock Photo

Tableau Filtering Actions Made Easy

This is a guest post provided by Vishal Bagla, Chaitanya Sagar, and Saneesh Veetil of Perceptive Analytics.

Tableau is one of the most advanced visualization tools available on the market today. It is consistently ranked as a ‘Leader’ in Gartner’s Magic Quadrant. Tableau can process millions of rows of data and perform a multitude of complex calculations with ease. But sometimes analyzing large amounts of data can become tedious if not performed properly. Tableau provides many features that make our lives easier with respect to handling datasets big and small, which ultimately enables powerful visualizations.

Tableau’s filtering actions are useful because they create subsets of a larger dataset, enabling data analysis at a more granular level. Filtering also aids user comprehension of the data. Within Tableau, data can be filtered at the data source level, sheet level or dashboard level. The application’s filtering capabilities support data cleansing and can also increase processing efficiency. Furthermore, filtering helps remove unnecessary data points and enables the creation of user-defined date or value ranges. The best part is that all of these filtering capabilities can be accessed by dragging and dropping; absolutely no coding or elaborate data science capabilities are required to use these features in Tableau.

In this article, we will touch upon the common filters available in Tableau and how they can be used to create different types of charts. After reading this article, you should be able to understand the following four filtering techniques in Tableau:

  1. Keep Only/Exclude Filters
  2. Dimension and Measure Filters
  3. Quick Filters
  4. Higher Level Filters

We will use the sample ‘Superstore’ dataset built into Tableau to understand these various functions.

1. Keep Only/Exclude Filters in Tableau

These filters are the easiest to use in Tableau. You can filter individual/multiple data points in a chart by simply selecting them and choosing the “Keep Only” or “Exclude” option. This type of filter is useful when you want to focus on a specific set of values or a specific region in a chart.

While using the default Superstore dataset within Tableau, if we want to analyze sales by geography, we’d arrive at the following chart.

1.png

However, if we want to keep or exclude data associated with Washington state, we can just select the “Washington” data point on the map. Tableau will then offer the user the option to “Keep Only” or “Exclude”. We can then simply choose the option that fits our need.

2.png

2. Dimension and Measure Filters

Dimension and measure filters are the most common filters used while working with Tableau. These filters enable analysis at the most granular level. Let’s examine the difference between a dimension filter and a measure filter.

Dimension filters are applied to data points which are categorical in nature (e.g. country names, customer names, patient names, products offered by a company, etc.). When using a dimension filter, we can individually select each of the values that we wish to include or exclude. Alternatively, we can identify a pattern for the values that we wish to filter.

Measure filters can be applied to data points which are quantitative in nature (e.g. sales, units, etc.). For measure filters, we generally work with numerical functions such as sum, average, standard deviation, variance, minimum or maximum.
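If you are more comfortable in SQL, a rough (and admittedly imperfect) analogy is that a dimension filter behaves like a WHERE clause on a categorical column, while a measure filter behaves like a HAVING clause on an aggregate. The sketch below is purely illustrative and assumes a hypothetical superstore_sales table; it is not something you need to write to use Tableau’s filters.

-- Dimension-style filter: drop customers whose names start with 'T'
-- Measure-style filter: keep only customers with at least $50,000 in total sales
SELECT customer_name,
       SUM(sales) AS total_sales
FROM   superstore_sales
WHERE  customer_name NOT LIKE 'T%'
GROUP  BY customer_name
HAVING SUM(sales) >= 50000;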

Let’s examine dimension filters using the default Tableau Superstore dataset. The chart below displays a list of customers and their respective sales.

3.png

Let’s examine how to exclude all customers whose names start with the letter ‘T’ and then subsequently keep only the top 5 customers by Sales from the remaining list.

One way would be to simply select all the customers whose names start with ‘T’ and then use the ‘Exclude’ option to filter out those customers. However, this is not a feasible approach when we have hundreds or thousands of customers. We will use a dimension filter to perform this task.

When you move the Customer Name field from the data pane to the filters pane, a dialogue box like the one shown below will appear.

4.png

As shown in the above dialogue box, you can select all the names starting with “T” and exclude them individually. The dialogue box should look like the one shown below.

5.png

The more efficient alternative is to go to the Wildcard tab in the dialogue box and select the “Exclude” check box. You can then choose the relevant option “Does not start with”.

6.png

To filter the top 5 customers by sales, right click on “Customer Name” in the Filters area, select “Edit Filter” and then go to the “Top” tab in the filter dialogue box. Next, choose the “By Field” option. Make your selections align to the following screenshot.

top-5-customers-by-sales-filter

After performing the necessary steps, the output will yield the top 5 customers by sales.

top 5 customers by sales

Let’s move on to measure filtering within the same Tableau Superstore dataset. We’re going to filter the months where 2016 sales were above $50,000. Without a measure filter applied, our sales data for 2016 would look like the following:

9.png

To keep only the months where sales were more than $50,000, move the sales measure from the data pane to the filter pane. Observe the following:

10.png

Here, we can choose any one of the filter options depending upon our requirement. Let’s choose sum and click on “Next”. As shown below, we are provided with four different options.

11.png

We can then choose one of the following filter options:

  • Enter a range of values;
  • Enter the minimum value that you want to display using the “At least” tab;
  • Enter the maximum value that you want to display using the “At most” tab;
  • From the Special tab, select “all values”, “null values” or “non-null” values;

Per our example, we want to filter for sales that total more than $50,000. Thus, we will choose the “At least” tab and enter a minimum value of 50,000.

12.png

In the output, we are left with the six months (i.e. March, May, September, October, November, December) that have a sum of sales that is greater than $50,000.

13.png

Similarly, we can choose other options such as minimum, maximum, standard deviation, variance, etc. for measure filters. Dimension and measure filters make it very easy to analyze our data. However, if the dataset is very large, measure filters can lead to slow performance since Tableau needs to analyze the entire dataset before it filters out the relevant values.

3. Quick Filters

Quick filters are radio buttons or check boxes that enable the selection of different categories or values that reside in a data field. These filters are very intuitive and infuse your visualizations with additional interactivity. Let’s review how to apply quick filters in our Tableau sheet.

In our scenario, we have sales data for different product segments and different regions from 2014 to 2019. Our data looks like the following:

14.png

We want to filter the data by segment and see data for only two segments (Consumer and Corporate). One way to do this would be to use a dimension filter, but what if we want to compare segments and change the selected segments every now and then? In this scenario, a quick filter is a useful addition to the visualization. To add a quick filter, right click on the “Segment” dimension on the Marks card and choose “Show Filter”.

15.png

Once we click on “Show Filter”, a box will appear on the right side of the Tableau screen containing all constituent values of the Segment dimension. At this point, we can filter on any segment value available in the quick filter box. If we select only the Consumer and Corporate values, Tableau will display two charts instead of three.

16

Similarly, we can add other quick filters for region, country, ship status or any other dimension.

17.png

4. Higher Level Filters

Dimension, measure and quick filters are very easy to use and make the process of analyzing data hassle free. However, when multiple filters are used on a large data source, processing becomes slow and inefficient. Application performance degrades with each additional filter.

The right way to begin working with a large data source is to filter it when first making the connection to the data. Once the data is filtered at this stage, any further analysis is performed on the remaining subset, which makes data processing more efficient. These filters are called macro filters or higher-level filters. Let’s apply a macro-level filter to our main data source.

We can choose the “Add” option under the Filters tab in the top right corner of the Data Source window.

18.png

Once we click on “Add”, Tableau opens a window which presents an option to add various filters.

19.png

Upon clicking “Add” in the Edit Data Source Filters dialogue box, we’re presented with the entire list of variables in the dataset and can add a filter to any field we select. Let’s say we want to add a filter to the Region field and include only the Central and East regions in our data.

20.png

Observe that our dataset is now filtered at the data source level: only those data points where the region is either Central or East will be available for our analyses. Let’s turn our attention back to the sales forecast visualization that we used to understand quick filters.

21

 

In the above window, we observe options for only “Central” and “East” in the Region Filter pane. This means that our filter applied at the data source level was successful.

Hopefully after reading this article you are more aware of both the importance and variety of filters available in Tableau. However, using unnecessary filters in unorthodox ways can lead to performance degradation and impact overall productivity. Therefore, always assess if you’re adding unnecessary options to your charts and dashboards that have the potential to negatively impact performance.

Author Bio:

This article was contributed by Vishal Bagla, Chaitanya Sagar, and Saneesh Veetil of Perceptive Analytics.

Use Parameters in Tableau to Enhance Your Tables

When you receive a requirement to build a boring, Excel-style table in Tableau, consider spicing it up by incorporating parameters. One clever use of parameters enables you to incorporate user-defined rows and columns into a Tableau table layout. As a user selects a parameter value (representing a column or row), the table dynamically updates to show the column or row that was selected.

“Parameters are useful when you want to add interactivity and flexibility to a report, or to experiment with what-if scenarios. Suppose you are unsure which fields to include in your view or which layout would work best for your viewers. You can incorporate parameters into your view to let viewers choose how they want to look at the data.

When you work with parameters, consider the following two things that are important in making them useful:

They need to be used in calculations.
The parameter control needs to be displayed so that viewers can interact with it.”
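To make the first point concrete, here is a minimal sketch of the kind of calculated field that ties a parameter to the view. It assumes a string parameter named [Select Measure] with the allowable values “Sales” and “Profit”; the parameter and field names are illustrative, not taken from the video.

// Calculated field: swaps the measure shown based on the parameter selection
CASE [Select Measure]
    WHEN "Sales"  THEN SUM([Sales])
    WHEN "Profit" THEN SUM([Profit])
END

Place the calculated field in the view, show the parameter control, and the table updates as the viewer changes the selection.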

In this video I will show you how to infuse an otherwise boring table with some parameter driven interactivity. Enjoy!

Reference: Tableau Online Help

Benford’s Law Visualization in Tableau

Benford’s law, also called the first-digit law, is an observation about the frequency distribution of leading digits in sets of numerical data. The law states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small [1]. For example, in sets that obey the law, the number 1 appears as the most significant digit about 30% of the time, and the percentages decrease all the way down to a leading digit of 9, which appears 4.6% of the time.
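For the mathematically inclined, the expected proportion of values whose leading significant digit is d is log10(1 + 1/d). Plugging in d = 1 gives log10(2) ≈ 30.1%, while d = 9 gives log10(10/9) ≈ 4.6%, which is where the percentages above come from.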

Why Run This Analysis?

When fraudsters fabricate data, they may not know to manufacture their fake numbers in a manner that conforms to Benford’s Law. Constructing a Benford’s Law visualization in Tableau can help you determine whether your numerical data is fake, or at least raise doubts about its authenticity.
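If you want to sanity-check a numeric column before building the Tableau view, a quick first-digit tally can be run directly in the database. This is only a rough T-SQL sketch against a hypothetical invoices table with a positive numeric amount column; adapt the names to your own data.

-- Tally the leading significant digit of each value and compare each digit's
-- share of the total to the Benford expectation of LOG10(1 + 1/digit)
SELECT CAST(FLOOR(amount / POWER(10.0, FLOOR(LOG10(amount)))) AS INT) AS leading_digit,
       COUNT(*)                                                       AS occurrences,
       100.0 * COUNT(*) / SUM(COUNT(*)) OVER ()                       AS pct_of_total
FROM   invoices
WHERE  amount > 0
GROUP  BY CAST(FLOOR(amount / POWER(10.0, FLOOR(LOG10(amount)))) AS INT)
ORDER  BY leading_digit;

A healthy, Benford-conforming column should show roughly 30% of rows with a leading digit of 1, tapering down to under 5% for a leading digit of 9.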

In short, remember that one isn’t always the loneliest number!

If you’re interested in Business Intelligence & Tableau, subscribe and check out all my videos either here on this site or on my YouTube channel.

[1] https://en.wikipedia.org/wiki/Benford%27s_law

 

Return Unmatched Records with SQL and Microsoft Access

Over many years of building SQL scripts, I’ve often helped SQL novices perform the set difference operation on their data. This post will not provide in-depth coverage of SQL execution plans and tuning minutiae, but I do want to provide a high-level overview for the novice.

If we define set A as the three numbers {1, 2, 3} and set B as the numbers {2, 3, 4} then the set difference, denoted as A \ B, is {1}. Notice that the element 1 is only a member of set A.

A picture is worth a thousand words as they say. A Venn diagram will be effective at illustrating what we’re trying to accomplish in this post.

Venn Diagram Difference

This blog post will cover using SQL and Microsoft Access to address capturing the shaded records in set A. If you have a database table named A and wanted to determine all of the rows in this table that DO NOT reside in another table named B, then you would apply the set difference principle.

LEFT OUTER JOIN & IS NULL SYNTAX

There are multiple ways to implement the set difference principle. It helps if there is a common join key between both sets of data when performing this analysis.

Suppose I were working with two tables, one containing inventory data and one containing order data. I could write the following SQL script to return all the inventory rows that do not reside in the orders table.

SELECT table_inventories.*
FROM   table_inventories
       LEFT OUTER JOIN table_orders
                    ON table_inventories.id = table_orders.id
WHERE  table_orders.id IS NULL  

MICROSOFT ACCESS EXAMPLE

Consider the following tables in Microsoft Access. Observe that table_orders has fewer records than table_inventories.

Access Example Inventory Access Example Orders

We can construct a set difference select query using these tables to return all of the products in table_inventories that have not been ordered. Create a query in Microsoft Access in a similar fashion as shown below.

Access SQL Difference Join

The result of this query would produce the following two products that are not in table_orders.

Access Example Query Result

The Microsoft Access Query & View Designer would automatically generate the following SQL if you cared to open the Access SQL editor.

SELECT table_inventories.*
FROM   table_inventories
LEFT JOIN table_orders
ON table_inventories.id = table_orders.id
WHERE  (( ( table_orders.id ) IS NULL ));

Notice that LEFT JOIN is automatically created instead of LEFT OUTER JOIN. In Microsoft Access, the OUTER operation is optional. Also notice that Access loves to add additional parentheses for reasons known only to Microsoft.

Per Microsoft Access SQL Reference:

Use a LEFT JOIN operation to create a left outer join. Left outer joins include all of the records from the first (left) of two tables, even if there are no matching values for records in the second (right) table [1].

NOT EXISTS SYNTAX

Let’s step away from Microsoft Access for the remainder of this post. The NOT EXISTS approach provides similar functionality and is often more performance-friendly than the LEFT OUTER JOIN & IS NULL syntax. Note that the subquery is correlated back to the outer table on the join key.

SELECT table_inventories.*
FROM   table_inventories
WHERE  NOT EXISTS (SELECT 1
                   FROM   table_orders
                   WHERE  table_orders.id = table_inventories.id);

EXCEPT SYNTAX (T-SQL)

Alternatively, we could use the SQL EXCEPT operator which would also accomplish the task of returning inventory ids that do not reside in the orders table (i.e. inventory items that were never ordered). This syntax would be appropriate when using SQL Server.

SELECT table_inventories.id
FROM   table_inventories
EXCEPT
SELECT table_orders.id
FROM   table_orders

Per Microsoft:

EXCEPT
Returns any distinct values from the query to the left of the EXCEPT operator that are not also returned from the right query [2].

MINUS SYNTAX (ORACLE)

The following script will yield the same result as the T-SQL syntax. When using Oracle, incorporate the MINUS operator instead of EXCEPT.

SELECT table_inventories.id
FROM   table_inventories
MINUS
SELECT table_orders.id
FROM   table_orders

Now take this tip and get out there and do some good things with your data.

Anthony Smoak

 

References:

[1] Access 2007 Developer Reference. https://msdn.microsoft.com/en-us/library/bb208894(v=office.12).aspx

[2] Microsoft T-SQL Docs. Set Operators – EXCEPT and INTERSECT (Transact-SQL). https://docs.microsoft.com/en-us/sql/t-sql/language-elements/set-operators-except-and-intersect-transact-sql

[3] Oracle Help Center. The UNION [ALL], INTERSECT, MINUS Operators. http://docs.oracle.com/cd/B19306_01/server.102/b14200/queries004.htm

Venn diagram courtesy of http://math.cmu.edu/~bkell/21110-2010s/sets.html

General Electric’s Digital Pivot

Digital technologies will touch and transform every business sector. No industry is completely safe from nimble, venture-capital-backed startups or disruptive “new economy” organizations born of the digital age. Industrial companies founded at the tail end of the 19th century would not be expected to reside at the forefront of digital change and experimentation, yet this is exactly where we find General Electric. The 125-year-old company has embarked upon a remarkable transformation of its people, technology and overall business model to embrace an impressive digital strategy in a relatively short period of time.

However, I’d be remiss not to mention that the digital shift has not resulted in immediate changes to operating results. In successful companies, continual reinvention is the name of the game even when business is good. General Electric, however, has endured a number of challenges over the past decade (e.g. the financial crisis, share price pressure, executive shakeups, activist investors), but the company deserves credit for its impressive shift to digital while addressing a myriad of ongoing challenges.

GE has invested significantly in machine learning, artificial intelligence, open source software development, 3D printing, the internet of things (IoT), internet-connected drones and all the accompanying personnel required to make its digital ambitions feasible. General Electric has stated an audacious plan to become one of the world’s top ten software companies, with sales and services worth as much as $15 billion by 2020 [1]. The company will have invested $6.6 billion from 2011 through the end of this year in transforming itself into a “digital industrial” company [2].

Bringing Good Things to Life

Back in 2009, then-CEO Jeff Immelt was speaking with scientists working on the development of new jet engines. From this experience he learned that the engine sensors were generating large quantities of data on every flight, but that the company was not maximizing the potential benefits of that accumulated data. Traditionally, sensor data was analyzed in real time by a technician to gauge current performance and then discarded.

Immelt envisioned a future where the company’s sensor data would be worth as much as the machinery itself [2]. Two years later, the company established GE Digital as a separate corporate unit headquartered in San Ramon, California. Immelt realized that a separate unit was needed to keep it from being stifled by the legacy organization.

“The San Ramon complex, home to GE Digital, now employs 1,400 people. The buildings are designed to suit the free-range working ways of software developers: open-plan floors, bench seating, whiteboards, couches for impromptu meetings, balconies overlooking the grounds and kitchen areas with snacks” [1].

The company even embarked upon an ad campaign where it poked fun at its stodgy corporate reputation. The campaign was intended to showcase General Electric as a destination for software engineers and other technical talent.

The aim of the company’s ambitious new unit was to house the requisite personnel and lay the groundwork for GE to pivot toward a “digital-industrial” strategy. By developing software that links together sensor data from a bevy of industrial machines, General Electric could then sell additional services, like predictive maintenance, to its industrial customers. As machines become smarter, they can run more efficiently, use less fuel and raise alerts before costly breakdowns occur. Smarter machines lead to longer-lived equipment.

GE Digital was eventually merged with the company’s corporate IT department. In this manner, algorithms or other development work could be leveraged by multiple business units which helped minimize duplication of effort across many silos. Pre-GE Digital, business units were making choices based upon local conditions which resulted in an inefficient bundle of technologies and platforms.

The Centerpiece of General Electric’s Digital Strategy – Predix

The most ambitious offering from GE Digital’s strategy is the development of an open source industrial operating system named Predix. The company has poured significant investment capital (more than a billion dollars) into its creation. Predix was partially developed to head off competition from traditional tech players who have been considering forays into the industrial internet space. Just as fellow “old economy” blue chip Wal-Mart has had to revamp its digital strategy to compete with Amazon, General Electric realized that it needed to go on the offensive to head off potential competition from more mature tech savvy players such as IBM and Google. Additionally, Siemens, a more traditional industrial competitor, has a competing product known as MindSphere which is also angling for a piece of the industrial internet pie.

Predix was developed as a cloud-based platform-as-a-service that enables asset performance management. The product aims to be the “Microsoft Windows” of the industrial internet, in that it functions as an operating system and enables third-party application development. Just as Amazon expanded the market for cloud computing with its Web Services offering, GE is betting that there is a similar market opportunity for Predix in the industrial internet space to derive value from the cloud, data and analytics.

General Electric has also partnered with Apple to create a Predix software development kit for iOS. This move expands Predix’s potential reach to a popular mobile operating system which powers millions of iPhones and iPads [3].

“The basic idea is that G.E. and outside software developers will write programs to run on Predix. This software might, for instance, monitor the health and fine-tune the operation of equipment like oil-field rigs and wind-farm turbines, improving performance, reducing wear and adapting to changing environmental conditions. It amounts to software delivering the equivalent of personalized medicine for machines” [1].

One component of Predix involves creating a virtual “digital twin” of a highly complex industrial machine (e.g. aircraft engine, turbine, or locomotive engine). This digital twin is a real time virtual model which displays a host of performance metrics. The digital twin leverages machine learning to gain insights from simulations and other machines and then marries that information with human input. “Though digital twins are primarily lines of software code, the most elaborate versions look like 3-D computer-aided design drawings full of interactive charts, diagrams, and data points” [4].

The company predicts that it will have developed over one million digital twins by the end of 2017 [5].

GE Digital

Image courtesy of 2016 GE Annual Report

 

Considerations

GE has a built-in competitive advantage in its attempt to become the dominant software player for the industrial internet: the company already has a sizable customer base that uses its industrial equipment. However, GE must generate enough of an outside developer following to make the Predix platform sustainable. The company hopes to generate up to $4 billion in Predix-based revenue by 2020. In order to meet this goal, GE will need to cultivate an ecosystem of third-party applications running on its Predix platform.

Non-GE equipment typically forms the majority of machines in a company’s facilities. Therefore, GE needs to incentivize other OEMs to write applications on Predix that can analyze data from their own equipment. There is additional value gained when customers can analyze data from GE’s sensors in concert with data from third-party machinery; this combination of capabilities has the potential to give customers a more holistic view of their asset ecosystem. MIT’s Sloan Review indicates that GE has encountered challenges with this step.

“..everyone loves the idea of benefiting from everyone else’s data, but is far less excited about sharing their own — a tragedy of the commons. The potential is there, but incentives are not well aligned” [6].

Consider that GE earns little from selling hardware and that a majority of its revenue comes from selling services [6]. The company is betting that its Predix investments will profitably augment its current array of service offerings. Contractual service agreements for a large piece of industrial equipment can run up to 30 years, and meeting agreed-upon machine uptime metrics results in bonus payouts to GE. However, there is a potential downside to longer-running machines: namely, reduced demand for additional GE machines.

Internally, Predix has allowed the company to garner significant productivity gains, but those gains have been reinvested back into Predix and associated applications. Thus, these internal productivity gains (up to a billion dollars) have not shown up in the company’s earnings [7].

Additionally, the company’s large sales force must learn to incorporate software services into their repertoire as opposed to solely pitching traditional hardware products. Chief information officers and chief technology officers will now have a seat at the buyers’ table along with the traditional operational heads and plant managers.

General Electric’s Additional Digital Investments

In addition to Predix, GE has made other digital investments and formed partnerships:

  • In 2015, GE launched a startup named “Current” which is focused on industrial scale smart lighting.
  • A new GE subsidiary named Avitas Systems will use “internet-connected drones and other robots to perform high-risk equipment inspections in industries like oil and gas” [7].
  • GE spent $1.4 billion to acquire two European 3D printing companies, Arcam AB from Sweden and SLM Solutions Group from Germany. The company has spent $1.5 billion on 3D printing investments since 2010, meaning the acquisitions will double what the company has invested in the last five years [8].
  • GE partners with Intel for sensor technology as well as Cisco for network hardware and Amazon Web Services for cloud delivery [9]. GE must be careful to ensure that a substantial portion of its digital offerings do not become non-proprietary.

Conclusion

Outgoing CEO Jeffrey Immelt’s transformation exploits have been criticized for sacrificing short term profit maximization for future earnings. Immelt doesn’t receive enough credit for running a leaner GE which enabled a massive investment in digital transformation. Immelt has stated that his initial goal was to hire a thousand software engineers to support the transformation [10]. This prodigious commitment to ramping up digital demonstrates that General Electric was anticipating disruptive innovation to negatively impact its business. History has taught us that industry incumbents have been caught off guard at best and rendered obsolete at worst by rapid changes in technology. Immelt forced the company to see the need for change as “existential”.

“Half measures are death for big companies, because people can smell lack of commitment. When you undertake a transformation, you should be prepared to go all the way to the end. You’ve got to be all in. You’ve got to be willing to plop down money and people. You won’t get there if you’re a wuss” [10]. – Jeffrey Immelt

It is a huge bet to shut down all of the company’s analytics-based software ventures and redirect those efforts to one platform. It takes guts to open up your software to competitors, enabling them to reap benefits. It is audacious to infuse senior personnel with new leadership from outside the company who specialize in a non-core, forward-looking discipline. As a result, there are now dedicated digital organizations and chief digital officers positioned inside each of GE’s businesses. Even the marketing organization is attracting individuals who can speak and sell digital.

However, bumps on the road to becoming a successful digital business should be expected. The desired financial gains from digital investments have not yet materialized for General Electric, as its share price indicates. Per Gartner:

“The IoT is emerging as a key enabler of our digital future, and global spending on IoT – including all hardware, software and services – will increase in the next five years. However, the path to capturing benefits will not be a straight line. It will have many twists and turns as companies pursue big plans, hit roadblocks, learn and adjust. Some will give up, while others will follow through and realize the transformational potential the IoT can have in helping them become a successful digital business” [11].

Jeffrey Immelt has moved on and new CEO John Flannery has signaled an intent to pick up and run with the digital baton. In my opinion, General Electric has more favorably positioned itself to compete and win in a challenging new environment that extends beyond physical engines and turbines.

References:

[1] Lohr, S. Aug, 27 2016. G.E., the 124-Year-Old Software Start-Up. New York Times. https://www.nytimes.com/2016/08/28/technology/ge-the-124-year-old-software-start-up.html

[2] Lohr, S. June 27, 2017. G.E. Results Show Next Chief’s Challenges at Revamped Company. New York Times. https://www.nytimes.com/2017/07/21/business/ge-john-flannery.html

[3] Bruno, G. October 18 2017. Apple, GE Announce Partnership to Develop Industrial IoT Apps. TheStreet. https://www.thestreet.com/story/14347334/1/apple-ge-announced-partnership-to-develop-industrial-iot-apps.html

[4] Woyke, E. June 27, 2017. General Electric Builds an AI Workforce. https://www.technologyreview.com/s/607962/general-electric-builds-an-ai-workforce/

[5] General Electric. 2016. Leading A Digital Industrial Era. 2016 Annual report

[6] Winig, L. February 18, 2016. GE’S BIG BET ON DATA AND ANALYTICS. Seeking opportunities in the Internet of Things, GE expands into industrial analytics. MIT Sloan Review. https://sloanreview.mit.edu/case-study/ge-big-bet-on-data-and-analytics/

[7] Scott, A. May 12, 2017. GE’s Immelt bets big on digital factories, shareholders are wary. Reuters. http://www.reuters.com/article/us-ge-factory-idUSKBN1880K4

[8] Geuss, M. September 6, 2016. GE buys two 3D printing companies at $1.4 billion. A Swedish and a German company join the fold to make industrial components. Ars Technica. https://arstechnica.com/information-technology/2016/09/general-electric-doubles-investment-in-3d-printing-with-1-4-billion-purchase/

[9] Iansiti, M & Lakhani, K. November, 2014. Digital Ubiquity: How Connections, Sensors, and Data Are Revolutionizing Business. Harvard Business Review. https://hbr.org/2014/11/digital-ubiquity-how-connections-sensors-and-data-are-revolutionizing-business

[10] Immelt, J. September 2017. How I Remade GE. Harvard Business Review. https://hbr.org/2017/09/inside-ges-transformation

[11] Laney, D. & Jain, A. June 20, 2017. 100 Data and Analytics Predictions Through 2021. Gartner.

Header image courtesy of 123rf.com

Use Separate Legends in Tableau (Distinct Column Coloring)

Here’s a handy tip for Tableau 10.2 and above: learn to create an individual color legend for each measure in the view and then assign a different color palette to each column. This was very difficult to do prior to Tableau 10.2, but now you can apply different color palettes to individual columns with ease!

If you need to know how to apply conditional color formatting to dimensional values watch this video: How to Conditionally Format Text Cell Color in Tableau

If you’re interested in Business Intelligence & Tableau, subscribe and check out all my videos either here on this site or on my YouTube channel.