FOUNT Courselets

Courselet Title: Data Cleaning

Author: David Koop
Contact Email: dakoop@niu.edu

Description:

This courselet shows how to clean data with pandas. Data cleaning is the process of identifying and correcting errors, inconsistencies, and inaccuracies in a dataset, and it is an important step in data preparation before analysis or modeling. The process can be time-consuming, especially for large datasets, but the payoff is significant: high-quality data can improve decision-making, increase efficiency, and reduce costs, whereas poor data quality can lead to inaccurate results, incorrect conclusions, and costly mistakes. Common techniques include handling missing values, removing duplicates, converting data types, and checking for consistency, all of which help ensure that the data is accurate and reliable for analysis. This courselet uses a modified version of the Kaggle Netflix dataset.
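
The four techniques named above can be illustrated with a minimal pandas sketch. It is not taken from the courselet itself: the sample rows, the column names (title, type, release_year, date_added), and the clean helper are all hypothetical, chosen only to resemble a streaming-catalog table.

```python
import pandas as pd

# Hypothetical sample rows; the column names and values are assumptions
# for illustration, not the courselet's actual dataset.
raw = pd.DataFrame(
    {
        "title": ["Show A", "Show A", "Movie B", "Movie C", "Show D"],
        "type": ["TV Show", "TV Show", "movie", "Movie ", None],
        "release_year": ["2019", "2019", "2020", "n/a", "2021"],
        "date_added": ["2020-01-05", "2020-01-05", "2021-03-10", "2021-07-01", None],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply four basic cleaning steps to a copy of df (hypothetical helper)."""
    out = df.copy()

    # 1. Consistency: trim whitespace and map case variants to one label each.
    labels = {"tv show": "TV Show", "movie": "Movie"}
    out["type"] = out["type"].str.strip().str.lower().map(labels)

    # 2. Duplicates: drop rows that are identical across all columns.
    out = out.drop_duplicates()

    # 3. Data types: unparseable values become missing instead of raising.
    out["release_year"] = pd.to_numeric(
        out["release_year"], errors="coerce"
    ).astype("Int64")
    out["date_added"] = pd.to_datetime(out["date_added"], errors="coerce")

    # 4. Missing values: fill where a default is sensible, drop where it is not.
    out["type"] = out["type"].fillna("Unknown")
    out = out.dropna(subset=["release_year"])

    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Whether to fill or drop a missing value depends on the analysis. Here a missing type is given a placeholder label while a row without a usable release year is discarded, but the opposite choices could be equally reasonable for a different question.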

Link to Artifact: https://www.chameleoncloud.org/experiment/share/2dcbb773-a2a6-4275-a421-de878e68624d

Apply for a FOUNT badge to add your courselet to the table!