In this tip I will provide an overview of a completely free data profiling tool (at least as of the time of this post) that is easy to use if you are in the Microsoft database stack. You can download the IDERA SQL Profiling tool and immediately put it to work to perform basic column profiling and analyses. The tool will display a summary of the data contained in a selected table and each of its columns.
Use this tool on the following systems:
Microsoft SQL Server: 2008 R2, 2012, 2014, 2016; 2017 Windows & Linux (provisional); Express, Standard, Enterprise editions
Microsoft Azure SQL Database
Amazon Relational Database Service (RDS)
IDERA SQL Data Profiler has some minor quirks but you can’t beat the price. Check out my review in the video above.
Just remember that data profiling should always be done initially before you start analyzing a new dataset or designing a new visualization. Always start with the basics.
If you find this type of instruction valuable make sure to subscribe to my Youtube channel.
Experian Pandora is a robust data profiling and data quality tool that enables users to quickly obtain useful statistics related to their data. Once you load your data into the interface, you can identify statistical anomalies and outliers in the data within seconds. To gain these types of insights, normally I have to write SQL scripts or use the SSIS Data Profiling Tool against SQL Server data. Experian Pandora is much easier to use against data in .csv files or Excel spreadsheets since you can simply drag and drop those items into the interface.
A lack of data profiling leads to poor data quality which leads to inaccurate information and poor business performance. I believe you will find this tool a worthy add to your data toolbox.
Most likely you have a love/hate attitude towards spreadsheets. This highly useful tip will make you fall back in love with Excel. Microsoft does a great job at providing a solid add-in that makes fuzzy lookups relatively easy to perform.
Are there more powerful approximate string matching tools out there? Of course. But if you’re using Excel, this tool should be used before applying more complicated methods.
Everyone loves visualizations but sometimes you have to roll up your sleeves and clean up the data.
Data Profiling is necessary when trying to gain an understanding of a given data set. A data profiling assessment should begin before any reporting or application development work begins. My video will demonstrate how to create a basic SSIS Data Profiling Task using SQL Server Data Tools.
According to the DAMA Guide to the Data Management Body of Knowledge:
“Before making any improvements to data, one must be able to distinguish between good and bad data…. A data analyst may not necessarily be able to pinpoint all instances of flawed data. However, the ability to document situations where data values look like they do not belong provides a means to communicate these instances with subject matter experts, whose business knowledge can confirm the existences of data problems.”