Free SQL Data Profiling Tool: Idera SQL Data Profiler

In this tip I will provide an overview of a completely free data profiling tool (at least as of the time of this post) that is easy to use if you are in the Microsoft database stack. You can download IDERA SQL Data Profiler and immediately put it to work to perform basic column profiling and analysis. The tool displays a summary of the data contained in a selected table and each of its columns.

Use this tool on the following systems:

  • Microsoft SQL Server 2008 R2, 2012, 2014, and 2016; SQL Server 2017 on Windows & Linux (provisional); Express, Standard, and Enterprise editions
  • Microsoft Azure SQL Database
  • Amazon Relational Database Service (RDS)
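
If you want a feel for what basic column profiling covers, here is a minimal sketch in Python with pandas. It is not how the IDERA tool works internally, just an illustration of the per-column statistics such a tool reports; the sample table and column names are hypothetical.

```python
# Minimal column-profiling sketch (illustrative only; not the IDERA tool's internals).
# In practice you would load a SQL Server table into a DataFrame first,
# e.g. with pandas.read_sql and your own connection string.
import pandas as pd

def profile_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column row counts, null rates, cardinality, and min/max."""
    stats = []
    for col in df.columns:
        s = df[col]
        stats.append({
            "column": col,
            "dtype": str(s.dtype),
            "rows": len(s),
            "nulls": int(s.isna().sum()),
            "null_pct": round(100 * s.isna().mean(), 2),
            "distinct": int(s.nunique(dropna=True)),
            "min": s.min() if s.notna().any() else None,
            "max": s.max() if s.notna().any() else None,
        })
    return pd.DataFrame(stats)

# Hypothetical sample data standing in for a real table.
orders = pd.DataFrame({
    "customer_id": [101, 102, 102, None],
    "state": ["WA", "OR", None, "WA"],
})
print(profile_columns(orders))
```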

IDERA SQL Data Profiler has some minor quirks, but you can’t beat the price. Check out my review in the video above.

Just remember that data profiling should always be done before you start analyzing a new dataset or designing a new visualization. Always start with the basics.

If you find this type of instruction valuable, make sure to subscribe to my YouTube channel.


Easily Unpivot Your Data in Excel Using Power Query

Use the unpivot functionality in Power Query (a free Excel add-in) to easily turn your cross-tabbed data into a more normalized structure. A normalized structure gives you the flexibility to build additional analyses more efficiently.
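
Power Query’s Unpivot Columns command is the star here, but if you like to see the same reshaping spelled out in code, below is a rough equivalent in Python using pandas.melt. The column names are hypothetical stand-ins for a cross-tabbed report.

```python
# Rough code equivalent of Power Query's "Unpivot Other Columns" (illustration only).
import pandas as pd

# Cross-tabbed input: one column per month (hypothetical layout).
wide = pd.DataFrame({
    "Region": ["East", "West"],
    "Jan": [100, 80],
    "Feb": [120, 90],
    "Mar": [110, 95],
})

# Unpivot: keep "Region" fixed and turn the month columns into attribute/value rows.
long_form = wide.melt(id_vars=["Region"], var_name="Month", value_name="Sales")
print(long_form)
```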

See also:

Download: Power Query Excel Add-In

If you’re interested in Business Intelligence & Tableau, please subscribe and check out my videos either here on this site or on my YouTube channel.

Perform Fuzzy Lookups / Approximate String Matching in Excel

Most likely you have a love/hate relationship with spreadsheets. This highly useful tip will make you fall back in love with Excel. Microsoft does a great job of providing a solid add-in that makes fuzzy lookups relatively easy to perform.

Are there more powerful approximate string matching tools out there? Of course. But if you’re working in Excel, reach for this add-in before applying more complicated methods.
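
The add-in itself is point-and-click, but if you’re curious about the underlying idea, here is a tiny sketch of approximate string matching using Python’s standard-library difflib. It only shows matching on similarity rather than exact equality; it is not how the Microsoft add-in works internally, and the names are made up.

```python
# Tiny approximate-string-matching sketch (not the Microsoft Fuzzy Lookup add-in's logic).
from difflib import get_close_matches

master = ["Acme Corporation", "Globex Corporation", "Initech", "Umbrella Corp"]
messy = ["Acme Corp.", "Globex Corportion", "Intech LLC"]

for name in messy:
    # cutoff is the minimum similarity ratio (0-1) required to accept a match.
    match = get_close_matches(name, master, n=1, cutoff=0.6)
    print(f"{name!r} -> {match[0] if match else 'no match'}")
```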

Everyone loves visualizations but sometimes you have to roll up your sleeves and clean up the data.

Enjoy!

See also: Get Out of the Spreadsheet Abyss

Download: Fuzzy Lookup Add-In for Excel


The Data Quality Face-Off: Who Owns Data Quality, the Business or IT?

This article is also posted on LinkedIn.

Data is the lifeblood of organizations. By now you’ve probably heard the slogans “data is the new oil,” “data is the new water,” or “data is the new currency.” Type “data is the new” into the Google search bar and the first result delivered is “data is the new bacon.” I’m not sure how apt that slogan is, except to say that both can be highly fulfilling.

With the exception of a well-known Seattle-based retailer, most enterprises experience substantial data quality issues, as data quality work is typically an exercise in “pass the buck”. An InformationWeek article shrewdly commented on the risks associated with the lack of data quality:

“There are two core risks: making decisions based on ‘information fantasy,’ and compliance. If you’re not representing the real world, you can be fined and your CFO can be imprisoned. It all comes down to that one point: If your systems don’t represent the real world, then how can you make accurate decisions?” [3]

The Handbook of Data Quality: Research and Practice has reported that 60% of enterprises suffer from data quality issues, that 70% of manufacturing orders contain poor-quality data, and that poor data quality management costs companies roughly $1.4 billion every year [4].

Organizational Hamilton/Burr-style face-offs occur in which IT and the business are at loggerheads over the state of data quality and its ownership. The business typically believes that since data is co-mingled with the systems that IT already manages, IT should own any data issues. Given the high costs of poor data quality cited above, and the risk of nimble disrupters using data more efficiently to attack incumbents’ market share, IT and the business need to be on the same team with regard to data quality, for the organization’s sake.

“The relationship between IT and the business is a source of tension in many organizations, especially in relation to data management. This tension often manifests itself in the definition of data quality, as well as the question of who is responsible for data quality.” [5]

Anecdotally, IT units have no desire to be held responsible for “mysterious” data and/or systems that they had no hand in standing up. In my opinion, the enterprise IT mindset is to make sure the data arrives in the consolidated Enterprise Data Warehouse or other centralized repository; if downstream users don’t raise concerns about data quality issues, all the better for IT. Garbage in, garbage out. If the checksums or record counts from source to target match, then it’s time to call it a day.
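
To make that last point concrete, here is a small hypothetical example: the source-to-target record counts reconcile perfectly, yet the loaded data still carries nulls and placeholder values that only profiling would surface. The table and column names are invented for illustration.

```python
# Hypothetical example: record counts reconcile, but the data is still poor quality.
import pandas as pd

source = pd.DataFrame({
    "order_id": [1, 2, 3],
    "ship_date": ["2015-01-05", "2015-01-07", "2015-01-09"],
})
target = pd.DataFrame({
    "order_id": [1, 2, 3],
    "ship_date": ["2015-01-05", "1900-01-01", None],
})

# The source-to-target check that often closes the ticket:
print("Record counts match:", len(source) == len(target))          # True

# The profiling questions downstream users actually care about:
print("Null ship dates:", int(target["ship_date"].isna().sum()))
print("Placeholder dates:", int((target["ship_date"] == "1900-01-01").sum()))
```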

The developer or analyst mindset is to dive in immediately and start building applications or reports, with the potential to deliver sub-optimal results because the data was misunderstood or mistakenly treated as the “golden copy”. Up-front data profiling isn’t in the equation.

Gartner has suggested that the rise of the Chief Data Officer (particularly in banking, government and insurance industries) has been beneficial towards helping both IT and the business with managing data [2]. The strategic usage of a CDO has the potential to free up the CIO and the enterprise IT organization so they can carry on with managing infrastructure and maintaining systems.

However, most experts will agree that the business needs to define what constitutes high-quality, acceptable data and that the business should “own the data”. At the same time, IT is typically the “owner” of the systems that house that data. Thus, a mutually beneficial organizational relationship would involve IT having a better understanding of data content so as to ensure a higher level of data quality [5].

From a collaboration perspective, I find the matrix from Allen & Cervo (2015) helpful in depicting the risks that arise from one-sided efforts: data profiling without business context, and business context without profiling. It illustrates how both “business understanding and data profiling are necessary to minimize any risk of incorrect assumptions about the data’s fitness for use or about how the data serves the business” [1]. Although originally offered in a Master Data Management context, I find the example fitting in illustrating how business and IT expertise should work together.

[Figure: business knowledge vs. data profiling matrix] From: Allen, M. & Cervo, D. (2015). Multi-Domain Master Data Management: Advanced MDM and Data Governance in Practice. Morgan Kaufmann Publishers. Chapter 8 – Data Integration.
  • In the bottom left quadrant, low business knowledge and inadequate data profiling activities leave the enterprise in a less than optimal position. This is not a quadrant an organization should languish in.
  • The top left quadrant illustrates that business context is high but “knowledge” is unsubstantiated because of the lack of understanding of data quality via profiling exercises. Cue quality guru W. Edwards Deming stating, “Without data, you’re just a person with an opinion.”
  • The bottom right quadrant illustrates the opposite problem where data profiling without business knowledge doesn’t yield enough context for meaningful analyses. Cue a “bizarro” W. Edwards Deming stating, “Without an opinion, you’re just a person with data.”
  • The “Goldilocks” quadrant in the upper right yields appropriate understanding of data quality and the necessary context in which to conduct meaningful analyses.

The more engaged the business and IT are in finding common ground with respect to understanding existing data and solving data quality issues, the better positioned the organization is to avoid regulatory fines, departmental strife, threats from upstart firms, and overall bad decision making. Refuse to become complacent in the data space and give data the attention it deserves; your organization’s survival just may depend upon it.

References:

[1] Allen, M. & Cervo, D (2015) Multi-Domain Master Data Management: Advanced MDM and Data Governance in Practice. Morgan Kaufmann Publishers. Chapter 8 – Data Integration.

[2] Gartner (January 30, 2014). By 2015, 25 Percent of Large Global Organizations Will Have Appointed Chief Data Officers.
http://www.gartner.com/newsroom/id/2659215

[3] Morgan, Lisa (October 14, 2015). 8 Ways To Ensure Data Quality. InformationWeek.
http://www.informationweek.com/big-data/big-data-analytics/8-ways-to-ensure-data-quality/d/d-id/1322239

[4] Sadiq, Shazia (Ed.) (2013). Handbook of Data Quality: Research and Practice. Springer. Chapter: Cost and Value Management for Data Quality.

[5] Sebastian-Coleman, L. (2013). Measuring Data Quality for Ongoing Improvement: A Data Quality Assessment Framework. Morgan Kaufmann Publishers. Chapter 2 – Data, People, and Systems.

Picture Copyright: wavebreakmediamicro / 123RF Stock Photo