Data Cleaning Verification: A Checklist

 


Correct the most common problems

Make sure you identified the most common problems and corrected them, including:

  • Sources of errors: Did you use the right tools and functions to find the source of the errors in your dataset?

  • Null data: Did you search for NULLs using conditional formatting and filters?

  • Misspelled words: Did you locate all misspellings?

  • Mistyped numbers: Did you double-check that your numeric data has been entered correctly?

  • Extra spaces and characters: Did you remove extra spaces using the TRIM function, and strip out any other unwanted characters?

  • Duplicates: Did you remove duplicates in spreadsheets using the Remove Duplicates function or DISTINCT in SQL?

  • Mismatched data types: Did you check that numeric, date, and string data are typecast correctly?

  • Messy (inconsistent) strings: Did you make sure that all of your strings are consistent and meaningful?

  • Messy (inconsistent) date formats: Did you format the dates consistently throughout your dataset?

  • Misleading variable labels (columns): Did you name your columns meaningfully?

  • Truncated data: Did you check for truncated or missing data that needs correction?

  • Business logic: Did you check that the data makes sense given your knowledge of the business?
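
Several of the checks above can be partly automated. The following is a minimal sketch in Python, not a complete verification tool: the sample data, the column names (signup_date, amount), the audit function, and the expected date format are all hypothetical and chosen only for illustration. It flags rows with missing values, extra spaces, exact duplicates, inconsistent dates, and non-numeric amounts.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample data containing deliberate problems.
RAW = """customer_id,name,signup_date,amount
1, Alice ,2023-01-15,100.50
2,Bob,15/01/2023,abc
2,Bob,15/01/2023,abc
3,,2023-02-01,75
"""


def audit(rows, date_column, numeric_column, date_format="%Y-%m-%d"):
    """Map each check name to the 0-based indexes of the rows that fail it."""
    report = {
        "null": [],
        "extra_spaces": [],
        "duplicate": [],
        "bad_date": [],
        "bad_number": [],
    }
    seen = set()
    for i, row in enumerate(rows):
        values = tuple(row.values())

        # Null data: empty or missing cells.
        if any(v is None or v.strip() == "" for v in values):
            report["null"].append(i)

        # Extra spaces: leading or trailing whitespace in any cell.
        if any(v is not None and v != v.strip() for v in values):
            report["extra_spaces"].append(i)

        # Duplicates: a row identical to one already seen.
        if values in seen:
            report["duplicate"].append(i)
        seen.add(values)

        # Inconsistent date formats: anything that does not match date_format.
        try:
            datetime.strptime((row[date_column] or "").strip(), date_format)
        except ValueError:
            report["bad_date"].append(i)

        # Mistyped numbers / mismatched types: values that are not numeric.
        try:
            float((row[numeric_column] or "").strip())
        except ValueError:
            report["bad_number"].append(i)
    return report


rows = list(csv.DictReader(io.StringIO(RAW)))
report = audit(rows, "signup_date", "amount")
for check, indexes in report.items():
    print(check, indexes)
```

A report like this only points to rows that need attention. Misspellings, misleading column names, and business logic still require a manual review by someone who knows the data.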


Review the goal of your project

Once you have finished these data cleaning tasks, review the goal of your project and confirm that your data is still aligned with that goal. This is a continuous process that you will repeat throughout your project, but here are three steps to keep in mind:

  • Confirm the business problem

  • Confirm the goal of the project

  • Verify that the data can solve the problem and is aligned with the goal
