10 Items to Know When Starting A New Data Project

Are you a data professional looking to start a new data project?

Then you need to review my 10-point checklist to make sure you’re on the right track. Starting a new data project can be overwhelming. But don’t worry, with my 20 years of experience, I’m here to guide you through it.

Typically, when I start a new data-related task or analysis for a project, I have to make sure I meet the expected objectives, which often include identifying patterns, trends, and insights that can be used to drive business decisions.

10 Point Checklist

Point 1: First things first, you need to understand the nature of the deliverable being asked for. Is it a new report, database table, column, data visualization, or calculation, or a change to any of the above? In a similar fashion, you also need to understand the technologies in play that you have to work with. This could be anything from Tableau, Power BI, SQL Server, Oracle, or Teradata to even Microsoft Access (yes, people still use this tool).

Point 2: It’s crucial to know the desired delivery time frame for your project. You don’t want to end up with a longer timeline than what the project manager or client had in mind. Communication is key in this situation.

Point 3: Who is the intended audience for this deliverable? If it’s for an executive audience, you may need to roll the numbers up and take out some detail. If it’s for an analyst or operational audience, you may want to leave in more detail.

Point 4: How much historical data is required? Don’t get caught in a situation where your solution can’t handle a trending analysis over a 2-year time frame because you only pulled data for the last 6 months.

Point 5: Understand the volume of data that your solution will generate. For example, a 5 million row output is not conducive to a 100% Excel approach; you will definitely be in the land of database analyses. You may later present the data at an aggregated level (see Point 3) via Excel, but hopefully using a real data visualization tool.
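To make this concrete, here is a minimal sketch using Python’s built-in sqlite3 as a stand-in for a real warehouse (the table and column names are hypothetical): push the aggregation into the database so only the small rollup ever travels to Excel or your visualization tool.

```python
import sqlite3

# Hypothetical example: an in-memory database standing in for your
# warehouse; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("East", 120.0), ("East", 80.0), ("West", 200.0)],
)

# Aggregate in the database; only the small rollup leaves it.
rollup = conn.execute(
    "SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue "
    "FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(rollup)  # [('East', 2, 200.0), ('West', 1, 200.0)]
```

The same principle applies whatever the engine: let SQL Server, Oracle, or Teradata do the heavy lifting, and hand the presentation layer a few hundred aggregated rows instead of 5 million raw ones.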

Point 6: You need to understand if there’s any Personally Identifiable Information (PII) or sensitive data that you need to access in order to carry out the request. This could include social security numbers, passport numbers, driver’s license numbers, or credit card numbers.
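As a starting point for spotting this kind of data, here is a rough, hypothetical sketch in Python. Real PII detection requires far more rigor (checksum validation, context, dedicated tooling), so treat these patterns as illustrative only.

```python
import re

# Illustrative patterns only; production PII detection needs much more
# than regexes. The sample values below are fabricated.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b"),
}

def flag_pii(values):
    """Return (index, kind) pairs for values that look like PII."""
    hits = []
    for i, value in enumerate(values):
        for kind, pattern in PII_PATTERNS.items():
            if pattern.search(value):
                hits.append((i, kind))
    return hits

sample = ["John Doe", "123-45-6789", "4111-1111-1111-1111"]
print(flag_pii(sample))  # [(1, 'ssn'), (2, 'credit_card')]
```

Even a crude scan like this, run early, tells you whether you need to loop in security or data governance before you start pulling data.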

Point 7: It’s important to understand the business processes behind the request. As data people, we tend to focus only on the data piece of the puzzle, but understanding more about the relevant business process can help you deliver better results for your end users.

Point 8: Try to find and understand any relevant KPIs associated with the business processes that your data project or task affects.

Point 9: Perform data profiling on your datasets! This can’t be stressed enough. Profiling surfaces data quality issues and can help you trace them back to their source so they can be stopped.

Here are a few data profiling videos I’ve created over the years to give you a sense of data profiling in action.
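As a complement to the videos, here is a minimal profiling sketch in plain Python over a hypothetical dataset; in practice you would run equivalent queries against the database or use a profiling tool.

```python
from collections import Counter

# Hypothetical dataset; in real work this would be a database table.
data = [
    {"customer_id": 1, "state": "GA", "amount": 50.0},
    {"customer_id": 2, "state": "ga", "amount": None},
    {"customer_id": 2, "state": "NY", "amount": 75.0},
]

def profile(rows, column):
    """Basic profile: row count, null count, distinct count, top value."""
    values = [r[column] for r in rows]
    non_null = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "top": Counter(non_null).most_common(1),
    }

print(profile(data, "state"))
# {'rows': 3, 'nulls': 0, 'distinct': 3, 'top': [('GA', 1)]}
```

Notice how even this tiny profile hints at a quality issue: “GA” and “ga” count as distinct values, flagging inconsistent casing that would skew any group-by downstream.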

Point 10: Understand how your solution will impact existing business processes. If you change a column or a calculation, how does that affect upstream or downstream processes? Keep your inbox clear of those headache emails asking why the data looks different than it did last week; they usually mean there was no clear communication strategy to inform everyone of the impact of your changes.
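One way to keep those emails at bay is to diff the new output against the old before you release a change, so you can tell stakeholders exactly what moved. A hypothetical sketch:

```python
# Hypothetical regional totals: "old" is last week's published output,
# "new" is the output after your calculation change.
old = {"north": 1000.0, "south": 500.0, "west": 250.0}
new = {"north": 1000.0, "south": 525.0, "west": 250.0}

# Report only the keys whose values changed, as (before, after) pairs.
changed = {
    k: (old[k], new[k])
    for k in old
    if k in new and old[k] != new[k]
}
print(changed)  # {'south': (500.0, 525.0)}
```

A summary like this, sent before go-live, turns “why does the data look different?” into “as communicated, the south region recalculates to 525.”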

Bonus Considerations:

Here are a few bonus considerations since you had the good fortune of reading this blog post and not just stopping at the video.

Bonus Point 1: Consider any external factors that could impact your data project. For example, changes in regulations can impact the data that you collect, analyze, and use. If the government imposes stricter regulations on data privacy (see point 6 above), you may need to change your data sources or analysis methods to comply with these regulations.

Bonus Point 2: Consider internal organizational politics when starting on a project. If you work in a toxic or siloed organization (it happens), access to data can be a challenge. For example, if the marketing department controls customer data, accessing that data for a sales analysis project may be challenging due to internal strife and/or unnecessarily burdensome roadblocks.

Internal politics can also lead to potential conflicts of interest, such as when stakeholders have different goals or agendas. For example, if your data analyses could impact a department’s budget, that department may have an incentive to influence your work outcome to their advantage (or try to discredit you or your work by any means necessary).

Bonus Point 3: Finally, make sure to document everything. This includes the business requirements, technical requirements, saved emails, and any changes that were made along the way.

When I started my first office position as an intern at a well-known Fortune 500 company, my mentor told me the first rule of corporate life was to C.Y.A. I’m sure you know what that acronym means. Having solid documentation of your work and an email trail for decisions made along the way can keep you out of hot water.

Conclusion

And there you have it, my 10-point checklist for starting a new data project. By following these steps, you’ll be well on your way to delivering high-quality results. Don’t forget to like and subscribe for more data-related content!

I appreciate everyone who has supported this blog and my YouTube channel via merch. Please check out the logo shop here.

Stay in contact with me through my various social media channels.

Keep doing great things with your data!

Anthony B. Smoak


How I Passed the Tableau Certified Data Analyst Exam

I’m proud to announce that I recently passed the Tableau Certified Data Analyst certification. If you found this article, most likely you are looking for a perspective on the exam and how to pass and earn this certification yourself. Here is the story of my journey, which may differ from the typical experience.

I had a New Year’s resolution to add the Tableau Certified Data Analyst certification to my resume because the Tableau Desktop Certified Associate certification I held was due to expire.  If you want to read up on how I passed that older exam, you can find my insights here. Some of those insights will also serve you well on passing the current exam.

I believe that certification has its advantages. It’s a way to signal to potential and current employers that you have some defined level of competency in a targeted skill. It’s also a means to strengthen the case to your employer that you deserve additional compensation (if you are under-compensated). Fortunately, I am compensated fairly now, but this has not always been the case (shout-out to highly competent middle office IT pros toiling away underappreciated, but I digress). Finally, studying allows you to stay up-to-date on the latest tools and trends in your chosen domain.

How Much Experience Do You Need?

The official exam guide states, “The best preparation is role experience and time with the product. To be prepared, candidates are strongly encouraged to have at least 6 months of experience.” I would tend to agree with this if you have used the tool extensively during that time frame. Otherwise, I would recommend at least 1 to 2 years of experience with the tool, and as a data analyst, before attempting to sit for this one. Focus on obtaining the Tableau Specialist certification first (it never expires) before attempting this exam.

Why Did I Get Certified?

For my purposes as a senior manager in a consulting practice, certification certainly has benefits with respect to establishing credibility quickly on new projects. I may hold a manager title, but you’ll never pry my hands away from keyboard-centric, hands-on technical work, as I enjoy being a technical subject matter expert (and teaching/mentoring others).

Beyond employment and signaling purposes, certification also brings the personal growth and esteem that come from tackling a goal. My body of work is visible online and I have years of relevant experience, so certification is not something I necessarily needed, but something I desired.

The main difference between the new Certified Data Analyst exam and the older Desktop Certified Associate exam is that you will now be tested on Tableau Prep, Tableau Server, and Tableau Online. Having to understand aspects of Server and Online was an initial concern I held before taking this test.

I have about 7 years of experience between Tableau Public & Desktop and about a year of experience with Tableau Prep so that was not an issue. I have used Tableau Server to publish my dashboards while on a project at a large Fortune 500 company, but I would by no means consider myself a server expert. I’ve used Prep to transform data for clients without issue as it is easy to pick up with exposure and usage. Look at this listing of domain items covered on the exam.

My strategy to compensate for a lack of deep hands-on experience in Domain 4 was to perform really well on all the other domains. Using this strategy, I could still potentially score a maximum of 91% (assuming I missed every Domain 4 question, which would be highly improbable). If you are like me and have deep knowledge of Tableau Desktop, then you should be fine. Do not use a lack of server experience as an excuse to avoid certification. Simply read up on publishing content at these links and you should have a fighting chance. Personally, I found the Certified Data Analyst exam to be somewhat easier than the Desktop Certified Associate exam. Not easy, just a little bit easier with respect to the Tableau Desktop asks.

This Tableau Prep link could prove useful as well:

Another difference between the Certified Data Analyst exam and the older Desktop Certified Associate exam is the presence of a hands-on lab portion. I honestly found this to be the easiest section on the test, although your mileage may vary. There was one question that had me stumped, only because I wasn’t sure what was being asked, so I built a visual that probably did not reflect the ask. Other than that one question, I felt that I nailed this section.

The official exam guide states, “Candidates are encouraged to comment on items in the exam. Feedback from all comments is considered when item performance is reviewed prior to the release of new versions of exam content.” In hindsight, I should have left a comment on the question stating “unclear”.

For the hands-on lab (I’m not sharing anything that isn’t already on the exam guide), definitely be familiar with filter and highlight actions, the TOP N filter, using parameters with filters, labels, adding reference lines, and performing custom sorting.

How Did I Prepare?

Honestly, I meant to prepare for at least a week beforehand, but life got in the way. Thus, I literally crammed my review into the span of 7 hours the Saturday before sitting the exam. I do not recommend this if you are not well versed in the tool. I simply needed to review some concepts. The listing at this website provides great links to official Tableau documentation for the subject areas covered on the exam.

Results

I completed the exam with about 35 minutes to spare. After I submitted my results online, I only had to wait an hour before I received an email stating that my score was available. This is in stark contrast to when the beta exam was in effect, when I heard results could take months to process. I cleared the 75% hurdle despite studying for only a few hours and not having deep experience with Tableau Server. I could have scored higher given more study time, but I was happy to pass given the meager study time I allotted to the task. I’m not saying the test was easy; I’m just fortunate that I’ve had enough experience with Desktop that I could “sacrifice” in other areas and still make it across the finish line. This strategy may not work for you if you have under a year’s experience with the tool.

Focus on These Subject Areas:

Here is the section you came for: my abridged list of test focus areas. Make sure to focus on these subject areas to give yourself a good shot at passing the exam.

Start here: Here are 5 useful videos from my catalog that you should review to level up for the exam. I promise they are worth your time and will help you prepare. Do me a favor and like the videos to help others find the content as well!

I used this link to acquire access to a free practice exam: https://savvy-data-science.ck.page/1ec0f2d5a8

Additionally focus on these areas from the exam study guide:

  • INDEX function
  • Parameters
  • TOP N Filter
  • Context Filters / Data Source Filters
  • RANK_DENSE
  • Exporting Options
  • Sets
  • Extracts
  • DATETRUNC, DATEPART, DATENAME
  • Map Density
  • Percent Difference
  • Know How to Interpret a Box-Plot
  • Know How to Build Dual Axis Charts
  • Understand FIXED LODs
  • Understand TOTAL vs SUM
  • Understand Hierarchies
  • Understand Show Hide Container Functionality
  • Design for Mobile Layouts
  • Blending Data
  • Know How to Add Totals to Charts
  • SPLIT Function
  • Row Level Shading

Also follow Jared Flores as he has a great YouTube channel focused on Tableau Prep.

Best of luck to you. I know that you can pass this test if you have decent hands-on experience with the tool. For those of you without a Tableau license, use Tableau Public to study and fill in gaps by reading blogs, watching videos, and using official Tableau documentation. I believe in you!

Need Personal Data Tutoring?

Are you a beginner who needs help understanding data topics in Tableau (or Excel/SQL) and would like someone with experience to discuss your problem? If so, contact me here to schedule a 1-on-1 virtual meetup. Make sure to describe the concept you are trying to learn in the message so I can determine whether I can help. Depending upon your ask and the time required, we can discuss cost. Access to Tableau Public will cover most of your study needs for the Tableau Desktop sections, and lucky for you, it is a FREE tool.

About Me (Data background):

  • Experience: 15 Years Industry + 8 Years Analytics Consulting
  • Tableau Certified Data Analyst
  • 2X Tableau Ambassador
  • MBA – Georgia Institute of Technology
  • M.S. Information Management – Syracuse University
  • B.S. Computer Science – Clark Atlanta University
  • Certified Business Intelligence Professional
  • YouTube 2.5 Million Views on my Analytics Channel

Image: @anthonysmoakdata (Instagram)

All views and opinions are solely my own and do not necessarily reflect those of my employer


Thank you!!

Anthony B Smoak

How to Become a Data Analyst

I’ve been working with data for some 20-plus years as of the writing of this post. In the video below, I capture my thoughts on the hard and soft skills it takes to succeed as a data analyst. If you are looking to start your career in data, whether you have not yet graduated or have tangential work experience, then this video will serve you well.

Do You Need a Computer Science Degree to be a Data Analyst?

This question is frequently asked by people like yourself looking to make a move into data. The answer is no: you do not need a computer science degree to have a very successful data career. In the video I give my thoughts on computer science, but the reality is that although it may be helpful from a “getting a first job” perspective, it is not a requirement to succeed. Although I have an undergraduate computer science degree from Clark Atlanta University (shout-out to HBCU alums), some of the brightest minds I’ve worked with in the data space do not have one. Bottom line: a formal computer science degree certainly helps, but it is by no means necessary. All you need is the willingness to learn the tools and the perseverance to get your first data opportunity.

Do You Need a Computer Science Degree for a Data Career?

Hard Skills Required (View Video)

I’ll give you a hint, data visualization skills are a must and Tableau is the tool of choice for me.

Soft Skills Required

I’ll keep it short here and simply state that you should always look for ways to differentiate yourself and not just be seen as an interchangeable commodity worker. To paraphrase famed Harvard professor Michael Porter, a differentiation strategy advocates that a business must offer products or services that are valuable and unique to buyers above and beyond a low price. In this metaphor, think of yourself as a business and you bring multiple skill sets to your employer (other than being a single focus technical employee who can be easily outsourced for a lower price).

To be a differentiator, do not think of yourself as just a tool-specific analyst. Learn how to take requirements, communicate well, and develop exceptional writing skills for business emails and documentation. Finally, learn how to present your analyses to people several pay grades above yourself when required. You want differentiation to be your competitive advantage. You do not want “low cost” to be your advantage, for obvious reasons (if you’re like me, you want to be paid fairly for the value you provide).

Future Career Paths

In our jobs we desire mastery, autonomy, and purpose. After a certain point in your career, you may want to take a leap from the descriptive analytics path toward a predictive analytics path. Descriptive analytics (think data analyst or traditional business intelligence reporting analyst) deals with what has happened in the past, while predictive analytics focuses on what will most likely happen in the future. In order to level up in predictive analytics, you will need Python, statistics, probability, and/or machine learning skills.

If you want to make the leap from data into management, you can consider obtaining an MBA or a master’s degree in Management Information Systems. I happen to have an MBA from the Georgia Institute of Technology and a master’s degree in Information Management from Syracuse. This may seem like overkill, but I work in consulting, where credentials are overly appreciated by clients (and I am a lifelong learner).

Interact with my Tableau resume here.

Conclusion

A career in data can be fun (in the early learning phases) and lucrative (mid to late career). In my case it has been a fulfilling career ever since I started work as a data analyst at General Motors many years ago. I turned myself from a commodity to a differentiator by not only learning the basics but also adding business understanding and a willingness to share what I know on this blog and my YouTube channel. I know that you can do the same. If you put in the time to learn along with the perseverance to land that first data role, you won’t need much luck at all to accomplish your goals.

Looking to land that first role or trying to move ahead in your current role? Then check out this post for the Keys for a Successful Career as a Data Analyst.

-Anthony Smoak


The Data Quality Face-Off: Who Owns Data Quality, the Business or IT?

This article is also posted on LinkedIn.

Data is the lifeblood of organizations. By now you’ve probably heard the comparative slogans “data is the new oil,” “data is the new water,” or “data is the new currency.” Type “data is the new” into the Google search bar and the first result delivered is “data is the new bacon.” I’m not sure how apt this slogan is, except to say that both can be highly fulfilling.

With the exception of a well-known Seattle based retailer, most enterprises experience substantial data quality issues as data quality work is typically an exercise in “pass the buck”. An Information Week article shrewdly commented on the risks associated with the lack of data quality:

“There are two core risks: making decisions based on ‘information fantasy,’ and compliance. If you’re not representing the real world, you can be fined and your CFO can be imprisoned. It all comes down to that one point: If your systems don’t represent the real world, then how can you make accurate decisions?” [3]

The Handbook of Data Quality: Research and Practice reports that 60% of enterprises suffer from data quality issues, that 70% of manufacturing orders contain poor-quality data, and that poor data quality management costs companies roughly $1.4 billion every year [4].

Organizational Hamilton/Burr style face-offs occur in which IT and the business are at loggerheads over the state of data quality and ownership. The business typically believes that since data is co-mingled with the systems that IT already manages, IT should own any data issues. With the high costs of poor data quality I just cited, and the risks of nimble disrupters utilizing data more efficiently to attack incumbents’ market share, both IT and the business need to be on the same team with regard to data quality for the organization’s sake.

“The relationship between IT and the business is a source of tension in many organizations, especially in relation to data management. This tension often manifests itself in the definition of data quality, as well as the question of who is responsible for data quality.” [5]

Anecdotally, IT units do not have the desire to be held responsible for “mysterious” data and/or systems that they had no hand in standing up. In my opinion, the enterprise IT mindset is to make sure the data arrives into the consolidated Enterprise Data Warehouse or other centralized data repository; and if downstream users don’t raise concerns about data quality issues, all the better for IT. Garbage-In, Garbage-Out. If the checksums or record counts from source to target match, then it’s time to call it a day.
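That mindset can be sketched in a few lines. The reconciliation below passes every check even though it says nothing about whether the values are correct in a business sense (table names and numbers are illustrative):

```python
# Hypothetical (key, amount) rows: "source" is the operational system,
# "target" is the warehouse after the load.
source = [(1, 100.0), (2, 250.0), (3, 75.0)]
target = [(1, 100.0), (2, 250.0), (3, 75.0)]

def reconcile(src, tgt):
    """Typical load-time checks: counts, a column checksum, key coverage."""
    return {
        "row_count": len(src) == len(tgt),
        "amount_sum": sum(r[1] for r in src) == sum(r[1] for r in tgt),
        "key_set": {r[0] for r in src} == {r[0] for r in tgt},
    }

print(reconcile(source, target))
# {'row_count': True, 'amount_sum': True, 'key_set': True}
```

All three checks can be true while the source itself is full of garbage, which is exactly why technical reconciliation alone cannot substitute for business-defined data quality.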

The developer or analyst related mindset is to immediately dive in and start building applications or reports with the potential to deliver sub-optimal results because the data was misunderstood or misinterpreted as the “golden copy”. Up-front data profiling isn’t in the equation.

Gartner has suggested that the rise of the Chief Data Officer (particularly in banking, government and insurance industries) has been beneficial towards helping both IT and the business with managing data [2]. The strategic usage of a CDO has the potential to free up the CIO and the enterprise IT organization so they can carry on with managing infrastructure and maintaining systems.

Most experts will agree that the business needs to define what constitutes high-quality, acceptable data and that the business should “own the data.” However, IT is typically the “owner” of the systems that house such data. Thus, a mutually beneficial organizational relationship would involve IT having a better understanding of data content so as to ensure a higher level of data quality [5].

From a working together perspective, I find this matrix from Allen & Cervo (2015) helpful in depicting the risks arising from one sided data profiling activities without business context and vice versa. It illustrates how both “business understanding and data profiling are necessary to minimize any risk of incorrect assumptions about the data’s fitness for use or about how the data serves the business” [1]. Although originally offered in a Master Data Management context, I find the example fitting in illustrating how business and IT expertise should work together.

From: Allen, M. & Cervo, D. (2015). Multi-Domain Master Data Management: Advanced MDM and Data Governance in Practice. Morgan Kaufmann Publishers. Chapter 8 – Data Integration.
  • From the bottom left quadrant, low business knowledge and inadequate data profiling activities leaves the enterprise in a less than optimal position. This is not the quadrant an organization needs to languish within.
  • The top left quadrant illustrates that business context is high but “knowledge” is unsubstantiated because of the lack of understanding of data quality via profiling exercises. Cue quality guru W. Edwards Deming stating, “Without data, you’re just a person with an opinion.”
  • The bottom right quadrant illustrates the opposite problem where data profiling without business knowledge doesn’t yield enough context for meaningful analyses. Cue a “bizarro” W. Edwards Deming stating, “Without an opinion, you’re just a person with data.”
  • The “Goldilocks” quadrant in the upper right yields appropriate understanding of data quality and the necessary context in which to conduct meaningful analyses.

The more engaged the business and IT are in finding common ground with respect to understanding existing data and solving data quality issues, the better positioned the organization is to avoid regulatory fines, departmental strife, threats from upstart firms, and overall bad decision making. Refuse to become complacent in the data space and give data the attention it deserves; your organization’s survival just may depend upon it.

References:

[1] Allen, M. & Cervo, D (2015) Multi-Domain Master Data Management: Advanced MDM and Data Governance in Practice. Morgan Kaufmann Publishers. Chapter 8 – Data Integration.

[2] Gartner, (January 30, 2014). By 2015, 25 Percent of Large Global Organizations Will Have Appointed Chief Data Officers
http://www.gartner.com/newsroom/id/2659215

[3] Morgan, Lisa. (October 14, 2015). Information Week, 8 Ways To Ensure Data Quality. Information Week.
http://www.informationweek.com/big-data/big-data-analytics/8-ways-to-ensure-data-quality/d/d-id/1322239

[4] Sadiq, Shazia (Ed.) (2013). Handbook of Data Quality: Research and Practice: Cost and Value Management for Data Quality.

[5] Sebastian-Coleman, L. (2013). Measuring Data Quality for Ongoing Improvement: A Data Quality Assessment Framework. Morgan Kaufmann Publishers. Chapter 2 – Data, People, and Systems.


The IT Department Needs To Market Its Value or Suffer the Consequences

This article is also published on LinkedIn.

By now it’s an all too common cliché that the IT department does not garner the respect it deserves from its counterpart areas of the business. This perceived respect deficiency can manifest itself in the lack of upfront involvement in business strategy (we’ll call you when it breaks), unreasonable timelines (do it yesterday), rampant budget cuts and layoffs (do more with less) and/or limited technical promotional tracks (promotions are for business areas only).

IT pros tend to believe that if they’re adding value, delivering difficult solutions within reasonable timeframes, and providing it all in a cost-efficient manner, the recognition and gratitude will follow. Typical IT and knowledge worker responsibilities fall under the high-level categories of “keep things running” (you’re doing a great job, so we don’t notice) or “attend to our technical emergency” (drop what you’re doing).

It’s fair to say that there is a perception gap between the true value and the perceived value of what IT brings to the table. Anecdotally, there certainly seems to be a disconnect between the perceived lack of difficulty in business asks and the actual difficulty in delivering solutions. This perception gap can occur not only between IT and the “business” but also between the non-technical IT manager and the technical rank and file.

In this era of automation, outsourcing and job instability, there is an element of danger in one’s contributions going unnoticed, underappreciated and/or misunderstood. Within IT, leaders and the rank and file need to overcome their stereotypical introverted nature and do a better job of internally marketing their value to the organization. IT rank and file need to better market their value to their managers, and in turn the IT department collectively needs to better market its value to other areas of the business.

Perception matters, but IT must deliver the goods as well. If the business misperceives the actual work that the IT department provides and equates it to commoditized functions such as “fix the printers” or “print the reports,” then morale dips and the IT department can expect to compete with external third parties (vendors, consulting firms, outsourcing outfits) who do a much better job of finding the ear of influential higher-ups and convincing these decision-makers of their value.

I once worked on an extremely complex report automation initiative that required assistance from ETL developers, architects, report developers, and business unit team members. The purpose was to gather information from disparate source systems, perform complex ETL on the data, and then store it in a data mart for downstream reporting. Ultimately the project successfully met its objective of automating several reports, which in turn saved the business a week’s worth of manual Excel report creation. After phase 1 completion, one thanks I received was genuine gratitude from the business analyst whose job I made easier. The other thanks I received was “where’s phase 2, this shouldn’t be that hard” from the business manager whose technology knowledge was limited to cutting and pasting into Excel.

Ideally, my team should have better marketed the value of this win and helped the business partner understand the appropriate timeline (given the extreme complexity), instead of just being glad to move forward after solving a difficult problem for the business.

I believe Dan Roberts accurately paraphrases the knowledge worker’s stance in his book Marketing IT’s Value.

“’What does marketing have to do with IT? Why do I need to change my image? I’m already a good developer!’ Because marketing is simply not in IT’s comfort zone, they revert to what is more natural for them, which is doing their technical job and leaving the job of marketing to someone else, which reinforces the image that ‘IT is just a bunch of techies.’”

The IT department needs to promote better awareness of its value before the department is shut out of strategic planning meetings, the budget gets cut, project timelines start to shrink, and morale starts to dip. IT workers need to promote the value they bring to the table by touting their wins and remaining up to date on education, training, and certifications as necessary. At-will employment works both ways: if the technical staff feels stagnant, undervalued, and underappreciated, there is always a better situation around the corner, especially considering the technical skills shortage in today’s marketplace.

“It’s not about hype and empty promises; it’s about creating an awareness of IT’s value. It’s about changing client perceptions by presenting a clear, consistent message about the value of IT. After all, if you don’t market yourself, someone else will, and you might not like the image you end up with [1]”

References:

[1] Colisto, Nicholas R. (© 2012). The CIO Playbook: Strategies and Best Practices for IT Leaders to Deliver Value.

[2] Roberts, Dan. (© 2014). Unleashing the Power of IT: Bringing People, Business, and Technology Together, second edition.