CompressPdf #473
open
Data Cleaning and Pre-processing for Machine Learning
Start date:
10/07/2024
Due date:
10/08/2024
% Done:
100%
Estimated time:
16:00 h
Description:
The data collected from the remote database must be cleaned and pre-processed before it is used to train the machine learning model. This issue outlines the steps for data extraction, data cleaning, data standardization, skewness checks, correlation analysis, and feature engineering.
Task List:
Data Extraction
- Extract the data from the remote database.
- Verify the completeness and consistency of the extracted data.
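The extraction and verification steps above could be sketched roughly as follows. This is only an illustration: an in-memory SQLite database stands in for the remote database, and the table name `jobs` and its columns (`pages`, `input_kb`, `output_kb`) are hypothetical, not taken from the project.

```python
import sqlite3


def extract_rows(conn, table):
    """Fetch every row of `table` as a list of dicts."""
    conn.row_factory = sqlite3.Row
    # The table name is interpolated directly, so it must come from trusted code.
    cur = conn.execute(f"SELECT * FROM {table}")
    return [dict(row) for row in cur.fetchall()]


def check_extract(rows, required, expected_count):
    """Return a list of problems; an empty list means the extract looks complete."""
    problems = []
    if len(rows) != expected_count:
        problems.append(f"row count {len(rows)} != expected {expected_count}")
    for i, row in enumerate(rows):
        missing = [c for c in required if c not in row]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
    return problems


def null_counts(rows, columns):
    """Count NULL values per column, as input for the cleaning step."""
    return {c: sum(1 for r in rows if r.get(c) is None) for c in columns}


# Hypothetical stand-in for the remote database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, pages INTEGER, "
    "input_kb REAL, output_kb REAL)"
)
conn.executemany(
    "INSERT INTO jobs (pages, input_kb, output_kb) VALUES (?, ?, ?)",
    [(3, 120.0, 80.5), (10, 900.0, None), (1, 40.0, 31.0)],
)

columns = ["id", "pages", "input_kb", "output_kb"]
expected = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
rows = extract_rows(conn, "jobs")
problems = check_extract(rows, columns, expected)
nulls = null_counts(rows, columns)
print(problems, nulls)
```

Comparing the fetched row count against a `COUNT(*)` on the source is one cheap completeness check; the per-column NULL counts feed directly into the missing-value handling in the next step.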
Data Cleaning
- Handle missing values.
- Remove or address outliers.
- Ensure that all features are in the appropriate format for analysis.
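One possible shape for the cleaning step is sketched below with pandas. The choices here are assumptions made for illustration, not the method used in this issue: values are coerced to numeric, missing values are filled with the column median, and outliers are clipped to the 1.5 × IQR fences rather than dropped. The column names are hypothetical.

```python
import pandas as pd


def clean(df, numeric_cols, k=1.5):
    """Coerce to numeric, impute medians, and clip outliers to the IQR fences."""
    out = df.copy()
    # Unparseable entries such as "n/a" become NaN instead of raising.
    for col in numeric_cols:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Median imputation: less sensitive to extreme values than the mean.
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    # Clip each column to [Q1 - k*IQR, Q3 + k*IQR].
    q1 = out[numeric_cols].quantile(0.25)
    q3 = out[numeric_cols].quantile(0.75)
    iqr = q3 - q1
    out[numeric_cols] = out[numeric_cols].clip(
        lower=q1 - k * iqr, upper=q3 + k * iqr, axis=1
    )
    return out


# Hypothetical raw extract with a bad string, a missing value, and an outlier.
raw = pd.DataFrame(
    {
        "pages": ["3", "10", "n/a", "4", "5"],
        "input_kb": [120.0, 900.0, 150.0, None, 50000.0],
    }
)
cleaned = clean(raw, ["pages", "input_kb"])
print(cleaned)
```

Clipping keeps every row, which matters when the dataset is small; dropping outlier rows instead is equally defensible if the extreme values are known to be measurement errors.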
Data Standardization
- Standardize data to ensure all features are on a comparable scale.
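As a minimal sketch of standardization, assuming z-score scaling is what is meant here, each feature can be shifted to zero mean and scaled to unit variance. The skewness helper covers the skewness check mentioned in the description; it uses the population (biased) estimator, and the sample values are made up.

```python
import numpy as np


def skewness(x):
    """Population skewness: the mean of the cubed z-scores."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**3))


def standardize(X):
    """Column-wise z-score. Returns the scaled data plus the fitted mean and std."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Constant columns would divide by zero; leave them at 0 after centering.
    std = np.where(std == 0, 1.0, std)
    return (X - mean) / std, mean, std


# Hypothetical feature matrix: columns are pages and input size in KB.
X = np.array([[3, 120.0], [10, 900.0], [1, 40.0], [4, 150.0]])
Z, mean, std = standardize(X)

print(Z.round(3))
print([round(skewness(col), 3) for col in X.T])
```

The returned `mean` and `std` should be fitted on the training split only and reused for validation and test data, so that no information leaks from held-out samples. A strongly skewed column may benefit from a transform such as `np.log1p` before scaling.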
Correlation Analysis
- Analyze the correlation between input parameters and output size.
- Visualize the correlation matrix for better understanding.
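The correlation analysis might look like the sketch below, which computes a Pearson correlation matrix with pandas and ranks the inputs by the strength of their correlation with the output size. The column names and values are invented for illustration. Plotting is left out; the `corr` matrix can be passed to any heatmap routine for the visualization step.

```python
import pandas as pd

# Hypothetical cleaned dataset: three input parameters and the output size.
df = pd.DataFrame(
    {
        "pages": [1, 3, 4, 10, 12],
        "input_kb": [40.0, 120.0, 150.0, 900.0, 1100.0],
        "image_ratio": [0.9, 0.2, 0.5, 0.1, 0.7],
        "output_kb": [31.0, 80.5, 95.0, 610.0, 720.0],
    }
)

# Full pairwise Pearson correlation matrix.
corr = df.corr(method="pearson")

# Correlation of each input with the target, strongest first (by absolute value).
target_corr = (
    corr["output_kb"].drop("output_kb").sort_values(key=abs, ascending=False)
)

print(corr.round(2))
print(target_corr.round(2))
```

Pearson correlation only captures linear relationships; `method="spearman"` is a drop-in alternative when a monotonic but non-linear dependence is suspected.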
Feature Engineering
- Create new features based on correlations of existing parameters.
- Validate the significance of the newly created features.
- Add features that improve the model’s predictive performance.
- Ensure the dataset is ready for the next stage of the machine learning pipeline (e.g., model training, validation).
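One way to validate a newly engineered feature is to compare held-out performance with and without it. The sketch below does this with ordinary least squares on purely synthetic data: the feature `kb_per_page`, the data-generating formula, and the 150/50 split are all assumptions chosen so that the example is self-contained, and they say nothing about the real dataset.

```python
import numpy as np


def fit_r2(X_train, y_train, X_test, y_test):
    """Fit least squares with an intercept on train; return R^2 on test."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    resid = y_test - A_test @ coef
    ss_res = np.sum(resid**2)
    ss_tot = np.sum((y_test - y_test.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic data (assumption): the target depends on size per page.
rng = np.random.default_rng(0)
n = 200
pages = rng.integers(1, 50, size=n).astype(float)
input_kb = pages * rng.uniform(20, 200, size=n)
kb_per_page = input_kb / pages  # engineered feature
output_kb = 0.1 * input_kb + 10.0 * kb_per_page + rng.normal(0, 10, size=n)

base = np.column_stack([pages, input_kb])
extended = np.column_stack([pages, input_kb, kb_per_page])

# Simple hold-out split: first 150 rows train, last 50 rows test.
train, test = slice(0, 150), slice(150, n)
r2_base = fit_r2(base[train], output_kb[train], base[test], output_kb[test])
r2_ext = fit_r2(extended[train], output_kb[train], extended[test], output_kb[test])

print(f"R^2 without feature: {r2_base:.3f}")
print(f"R^2 with feature:    {r2_ext:.3f}")
```

Scoring on held-out rows matters here: in-sample R² can never decrease when a column is added, so it cannot show whether a feature is genuinely useful. A feature would be kept only if the held-out score improves by a margin that holds up across several splits.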