Data Cleaning Verification: A Checklist
Correct the most common problems
Make sure you identified the most common problems and corrected them, including:
Sources of errors: Did you use the right tools and functions to find the source of the errors in your dataset?
Null data: Did you search for NULLs using conditional formatting and filters?
Misspelled words: Did you locate all misspellings?
Mistyped numbers: Did you double-check that your numeric data has been entered correctly?
Extra spaces and characters: Did you remove extra spaces using the TRIM function and strip out any other stray characters?
Duplicates: Did you remove duplicates in spreadsheets using the Remove Duplicates function or DISTINCT in SQL?
Mismatched data types: Did you check that numeric, date, and string data are typecast correctly?
Messy (inconsistent) strings: Did you make sure that all of your strings are consistent and meaningful?
Messy (inconsistent) date formats: Did you format the dates consistently throughout your dataset?
Misleading variable labels (columns): Did you name your columns meaningfully?
Truncated data: Did you check for truncated or missing data that needs correction?
Business logic: Did you check that the data makes sense given your knowledge of the business?
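Several of the checks above can also be automated. The following is a minimal sketch in Python, not a definitive implementation: the function name audit_rows, the column names (name, amount, signup), the sample rows, and the expected date format are all hypothetical and chosen only for illustration. It counts nulls, untrimmed strings, duplicate rows, values that do not parse as numbers, and dates that do not match a single format.

```python
from collections import Counter
from datetime import datetime


def audit_rows(rows, numeric_cols=(), date_cols=(), date_format="%Y-%m-%d"):
    """Count a few common data-quality problems in a list of dict rows."""
    report = Counter()
    seen = set()
    for row in rows:
        # Duplicates: compare rows after trimming, so "Ann " and "Ann" match.
        key = tuple(
            (col, val.strip() if isinstance(val, str) else val)
            for col, val in sorted(row.items())
        )
        if key in seen:
            report["duplicate_rows"] += 1
        seen.add(key)

        for col, value in row.items():
            # Null data: treat None and blank strings as missing.
            if value is None or (isinstance(value, str) and value.strip() == ""):
                report["nulls"] += 1
                continue
            # Extra spaces: leading or trailing whitespace.
            if isinstance(value, str) and value != value.strip():
                report["untrimmed"] += 1
            # Mismatched data types: numeric columns must parse as numbers.
            if col in numeric_cols:
                try:
                    float(value)
                except (TypeError, ValueError):
                    report["bad_numbers"] += 1
            # Inconsistent date formats: date columns must match one format.
            if col in date_cols:
                try:
                    datetime.strptime(str(value).strip(), date_format)
                except ValueError:
                    report["bad_dates"] += 1
    return dict(report)


# Hypothetical sample data, invented for this sketch.
rows = [
    {"name": "Ann", "amount": "42.50", "signup": "2023-01-15"},
    {"name": "Ann ", "amount": "42.50", "signup": "2023-01-15"},  # duplicate, extra space
    {"name": "Bob", "amount": "4O.00", "signup": "01/20/2023"},   # letter O, other date format
    {"name": None, "amount": "17", "signup": "2023-02-01"},       # missing name
]

report = audit_rows(rows, numeric_cols=("amount",), date_cols=("signup",))
print(report)
```

A script like this only flags the problems that can be detected mechanically. Misspellings, misleading column labels, and business logic still need a human review.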
Review the goal of your project
Once you have finished these data cleaning tasks, it is a good idea to review the goal of your project and confirm that your data is still aligned with that goal. This is a continuous process that you will repeat throughout your project, but here are three steps to keep in mind:
Confirm the business problem
Confirm the goal of the project
Verify that the data can solve the problem and is aligned with the goal