CompressPdf #473

open

Data Cleaning and Pre-processing for Machine Learning

Added by Zahid Hassan over 1 year ago. Updated over 1 year ago.

Status:
Complete
Priority:
High
Assignee:
Category:
poc
Target version:
Start date:
10/07/2024
Due date:
10/08/2024
% Done:

100%

Estimated time:
16:00 h

Description

The data collected from the remote database must be cleaned and pre-processed before it is used to train the machine learning model. This issue outlines the steps for data extraction, data cleaning, data standardization, skewness checks, correlation analysis, and feature engineering.

Task List:

Data Extraction

  • Extract the data from the remote database.
  • Verify the completeness and consistency of the extracted data.
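
The extraction and verification steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the table name `compression_runs` and the column names are hypothetical, and an in-memory SQLite database stands in for the remote database, whose type and schema are not stated in this issue.

```python
import sqlite3

# Hypothetical schema: the real table and column names are not given in this issue.
REQUIRED_COLUMNS = ("page_count", "image_count", "input_size_kb", "output_size_kb")


def extract_rows(conn, table="compression_runs"):
    """Fetch every row of the (hypothetical) table as a list of dicts."""
    conn.row_factory = sqlite3.Row
    cursor = conn.execute(f"SELECT * FROM {table}")
    return [dict(row) for row in cursor.fetchall()]


def completeness_report(rows, required=REQUIRED_COLUMNS):
    """Count missing (None) values per required column and exact duplicate rows."""
    missing = {col: sum(1 for r in rows if r.get(col) is None) for col in required}
    seen, duplicates = set(), 0
    for r in rows:
        key = tuple(r.get(col) for col in required)
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}


# In-memory stand-in for the remote database, filled with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compression_runs ("
    "page_count INTEGER, image_count INTEGER, "
    "input_size_kb REAL, output_size_kb REAL)"
)
conn.executemany(
    "INSERT INTO compression_runs VALUES (?, ?, ?, ?)",
    [
        (10, 2, 500.0, 210.0),
        (25, None, 1200.0, 480.0),  # missing image_count
        (10, 2, 500.0, 210.0),      # exact duplicate of the first row
    ],
)

rows = extract_rows(conn)
report = completeness_report(rows)
print(report)
```

A report like this gives a quick view of how many values are missing per column and how many rows are repeated before any cleaning is attempted.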

Data Cleaning

  • Handle missing values.
  • Remove or address outliers.
  • Ensure that all features are in the appropriate format for analysis.
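
One possible way to carry out the cleaning steps above is sketched here. The choices are assumptions, since the issue does not say which methods were used: missing values are filled with the median, outliers are clipped to the 1.5 × IQR fences, and raw values are coerced to `float`. The sample values are made up.

```python
import statistics


def to_float(value):
    """Convert a raw value to float; return None if it is missing or unparseable."""
    if value is None:
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def clip_iqr(values, k=1.5):
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, low), high) for v in values]


# Made-up raw column: strings, a missing value, a bad token, and one extreme value.
raw = ["500", "520", None, "480", "510", "n/a", "9000"]

numeric = [to_float(v) for v in raw]  # format fix: everything becomes float or None
filled = impute_median(numeric)       # missing values replaced by the median
cleaned = clip_iqr(filled)            # extreme value pulled back to the upper fence
print(cleaned)
```

Clipping keeps the row count unchanged; dropping outlier rows instead is an equally valid option when the extreme values are known to be measurement errors.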

Data Standardization

  • Standardize data to ensure all features are on a comparable scale.
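
As an illustration of standardization and of the skewness check mentioned in the description, the sketch below uses z-score scaling and the population skewness (the mean of the cubed z-scores). Both are assumptions about the method; the issue does not name the scaler or the skewness statistic that was used, and the sample values are made up.

```python
import statistics


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population std. deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no scale information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    return statistics.fmean(z ** 3 for z in standardize(values))


# Made-up file sizes in KB with one large value that skews the distribution.
sizes = [100, 120, 130, 150, 900]

scaled = standardize(sizes)
print([round(z, 3) for z in scaled])
print("skewness:", round(skewness(sizes), 3))
```

After scaling, the column has mean 0 and standard deviation 1. A clearly positive skewness, as in this sample, suggests a long right tail, which is often handled with a log or similar transform before scaling.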

Correlation Analysis

  • Analyze the correlation between input parameters and output size.
  • Visualize the correlation matrix for better understanding.
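
The correlation analysis above could look like the following sketch, which assumes Pearson correlation and prints the matrix as a text table in place of a plotted heatmap. The column names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return float("nan")  # undefined when a column is constant
    return cov / (norm_x * norm_y)


def correlation_matrix(columns):
    """Build a nested dict {a: {b: r_ab}} for every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def format_matrix(matrix):
    """Render the matrix as an aligned text table."""
    names = list(matrix)
    width = max(len(n) for n in names) + 2
    lines = [" " * width + "".join(n.rjust(width) for n in names)]
    for a in names:
        cells = "".join(f"{matrix[a][b]:+.2f}".rjust(width) for b in names)
        lines.append(a.ljust(width) + cells)
    return "\n".join(lines)


# Hypothetical input parameters and output size; the values are made up.
columns = {
    "page_count": [10, 25, 40, 5, 60],
    "input_size_kb": [500.0, 1200.0, 2100.0, 260.0, 3000.0],
    "output_size_kb": [210.0, 480.0, 900.0, 100.0, 1250.0],
}

matrix = correlation_matrix(columns)
print(format_matrix(matrix))
```

The row for the output column shows at a glance which input parameters move most closely with output size.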

Feature Engineering

  • Create new features based on correlations of existing parameters.
  • Validate the significance of the newly created features.
  • Add features that improve the model’s predictive performance.
  • Ensure the dataset is ready for the next stage of the machine learning pipeline (e.g., model training, validation).
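
A minimal sketch of the feature engineering steps above: it derives one ratio feature and ranks all features by their absolute Pearson correlation with the target. The feature `kb_per_page`, the column names, and the sample rows are hypothetical, and correlation ranking is only a rough screen for significance, not the validation method recorded in this issue.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def add_ratio_feature(rows, numerator, denominator, name):
    """Return new rows with name = numerator / denominator (0.0 if denominator is 0)."""
    result = []
    for row in rows:
        new_row = dict(row)
        denom = row[denominator]
        new_row[name] = row[numerator] / denom if denom else 0.0
        result.append(new_row)
    return result


def rank_by_target_correlation(rows, target):
    """Sort features by absolute correlation with the target, strongest first."""
    features = [key for key in rows[0] if key != target]
    y = [row[target] for row in rows]
    scores = {f: abs(pearson([row[f] for row in rows], y)) for f in features}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical cleaned rows; the values are made up.
rows = [
    {"page_count": 10, "input_size_kb": 500.0, "output_size_kb": 210.0},
    {"page_count": 25, "input_size_kb": 1200.0, "output_size_kb": 480.0},
    {"page_count": 40, "input_size_kb": 2100.0, "output_size_kb": 900.0},
    {"page_count": 5, "input_size_kb": 260.0, "output_size_kb": 100.0},
    {"page_count": 60, "input_size_kb": 3000.0, "output_size_kb": 1250.0},
]

rows = add_ratio_feature(rows, "input_size_kb", "page_count", "kb_per_page")
ranking = rank_by_target_correlation(rows, "output_size_kb")
for feature, score in ranking:
    print(f"{feature}: {score:.3f}")
```

A new feature that ranks well here is only a candidate; whether it improves predictive performance still has to be confirmed by training the model with and without it on held-out data.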

#1

Updated by Redmine Admin over 1 year ago

  • Status changed from To Do to Complete