Are you a data professional looking to start a new data project?
Then you need to review my 10-point checklist to make sure you’re on the right track. Starting a new data project can be overwhelming. But don’t worry, with my 20 years of experience, I’m here to guide you through it.
Typically when I start to perform a new data related task or analysis for a project, I have to make sure that I meet the expected objectives, which often include identifying patterns, trends, and insights that can be used to drive business decisions.
10 Point Checklist
Point 1: First things first, you need to understand the nature of the deliverable that’s being asked for. Is it a new report, database table, column, data visualization, calculation, or a change to any of the above? In a similar fashion you also need to understand the technologies in play that you have to work with. This could be anything from Tableau, Power BI, SQL Server, Oracle, Teradata or even Microsoft Access (yes, people still use this tool).
Point 2: It’s crucial to know the desired delivery time frame for your project. You don’t want to end up with a longer timeline than what the project manager or client had in mind. Communication is key in this situation.
Point 3: Who is the intended audience for this deliverable? If it’s for an executive audience, you may need to roll the numbers up and take out some detail. If it’s for an analyst or operational audience, you may want to leave in more detail.
Point 4: How much historical data is required? What is the anticipated volume of data that your deliverable is going to generate? Don’t get caught in a situation where your solution can’t handle the trending analyses for a 2 year time frame when you only pulled data for the last 6 months.
Point 5: Understand the volume of data that your solution will generate. For example, a 5 million row output is not conducive to a 100% Excel approach. You will definitely be in the land of database analyses. However you may later present the data at an aggregated level (see point 3) via Excel but hopefully using a real data visualization tool .
Point 6: You need to understand if there’s any Personally Identifiable Information (PII) or sensitive data that you need to access in order to carry out the request. This could include social security numbers, passport numbers, driver’s license numbers, or credit card numbers.
Point 7: It’s important to understand the business processes behind the request. As data people, we tend to focus only on the data piece of the puzzle, but understanding more about the relevant business process can help you deliver the better results for your end users.
Point 8: Try to find and understand any relevant KPIs associated with the business processes on which your data project/task is affecting.
Point 9: Perform data profiling on your datasets! This can’t be stated enough. Profiling leads to understanding data quality issues and can help lead you to the source of the issues so they can be stopped.
Here are a few data profiling videos I’ve created over the years to give you a sense of data profiling in action.
- Create an SSIS Data Profiling Task in SQL Server
- Free Data Profiling Tool: IDERA SQL Profiler
- Data Profiling with Experian Pandora
Point 10: Understand how your solution will impact existing business process. By changing a column or calculation, how does this impact upstream or downstream processes? Keep your email inbox clean of those headache emails that are going to ask why the data looks different than it did last week. Most likely there was not a clear communication strategy to inform everyone of the impact of your changes.
Here are a few bonus considerations since you had the good fortune of reading this blog post and not just stopping at the video.
Bonus Point 1: Consider any external factors that could impact your data project. For example, changes in regulations can impact the data that you collect, analyze, and use. If the government imposes stricter regulations on data privacy (see point 6 above), you may need to change your data sources or analysis methods to comply with these regulations.
Bonus Point 2: Consider internal organizational politics when starting on a project. If you work in a toxic or siloed organization (it happens), access to data can be a challenge. For example, if the marketing department controls customer data, accessing that data for a sales analysis project may be challenging due to internal strife and/or unnecessary burdensome roadblocks.
Internal politics can also lead to potential conflicts of interest, such as when stakeholders have different goals or agendas. For example, if your data analyses could impact a department’s budget, that department may have an incentive to influence your work outcome to their advantage (or try to discredit you or your work by any means necessary).
Bonus Point 3: Finally, make sure to document everything. This includes the business requirements, technical requirements, saved emails, and any changes that were made along the way.
When I started my first office position as an intern at a well known Fortune 500 company, my mentor told me the first rule of corporate life was to C.Y.A. I’m sure you know what that means to cover. Having solid documentation of your work and an email trail for decisions made along the way can keep you out of hot water.
And there you have it, my 10-point checklist for starting a new data project. By following these steps, you’ll be well on your way to delivering high-quality results. Don’t forget to like and subscribe for more data-related content!
I appreciate everyone who has supported this blog and my YouTube channel via merch. Please check out the logo shop here.
Stay in contact with me through my various social media presences.
Keep doing great things with your data!
Anthony B. Smoak