This content originally appeared on DEV Community and was authored by Zaryab Ahmad
Project Overview
This project builds a machine learning model that predicts car prices from vehicle specifications.
Data & Preprocessing
- 205 cars with 16 features (engine specs, dimensions, fuel type, etc.)
- Encoded categorical variables using Label Encoding
- Scaled features with StandardScaler for better model performance
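The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the original notebook: the rows and the column names (`fueltype`, `carbody`, `enginesize`, `curbweight`, `price`) are hypothetical stand-ins, since the post does not list the actual columns.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical toy rows; column names and values are illustrative only.
df = pd.DataFrame({
    "fueltype": ["gas", "diesel", "gas", "gas"],
    "carbody": ["sedan", "hatchback", "wagon", "sedan"],
    "enginesize": [130, 152, 109, 136],
    "curbweight": [2548, 2823, 2337, 2824],
    "price": [13495, 16500, 13950, 17450],
})

X = df.drop(columns="price").copy()
y = df["price"]

# Label Encoding: each category in a column is replaced by an integer code.
categorical_cols = ["fueltype", "carbody"]  # assumed categorical columns
encoders = {}
for col in categorical_cols:
    encoders[col] = LabelEncoder()
    X[col] = encoders[col].fit_transform(X[col])

# StandardScaler: shift each column to zero mean and scale to unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.shape)
```

Keeping the fitted `encoders` and `scaler` around matters, because new cars have to be transformed with the same mappings before prediction. Note that scikit-learn documents `LabelEncoder` as intended for target labels; `OrdinalEncoder` is its counterpart for feature columns and may be the safer choice in a pipeline.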
Models Compared
Model | Training Score | Testing Score |
---|---|---|
Linear Regression | 84.5% | 79.4% |
Decision Tree | 91.7% | 85.3% |
SVR | -10.8% | -9.9% |
Random Forest | 98.2% | 95.6% |
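A comparison like the one in the table can be reproduced in outline with a loop over scikit-learn regressors. The sketch below uses synthetic data from `make_regression` as a stand-in for the car dataset, default hyperparameters, and an 80/20 split; all of these are assumptions, so the printed scores will not match the table.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (NOT the car dataset); the shape is an arbitrary choice.
X, y = make_regression(n_samples=205, n_features=15, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "SVR": SVR(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    # For regressors, .score() returns the R^2 coefficient of determination.
    train_score = model.score(X_train_s, y_train)
    test_score = model.score(X_test_s, y_test)
    scores[name] = (train_score, test_score)
    print(f"{name}: train={train_score:.3f}, test={test_score:.3f}")
```

A large gap between the training and testing columns is a sign of overfitting, which is why both are worth reporting side by side.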
Results
- Random Forest performed best, with a testing score of 95.6%
- Mean Absolute Error: $1,313
- Model can predict prices for new car specifications
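Mean Absolute Error is the average size of the prediction errors, in the same unit as the target (dollars here). The short sketch below shows the calculation on three made-up price pairs; the numbers are illustrative and are not taken from the project.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all samples."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up prices for illustration only.
actual_prices = [13495, 16500, 13950]
predicted_prices = [12995, 17500, 14250]

# Absolute errors are 500, 1000 and 300, so the mean is 1800 / 3.
mae = mean_absolute_error(actual_prices, predicted_prices)
print(mae)  # 600.0
```

Read this way, an MAE of $1,313 means the predictions were off by about $1,313 on average.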
Key Insights
- Ensemble methods (Random Forest) handled the complex patterns better than a single Decision Tree (95.6% vs. 85.3% testing score)
- Engine specs and dimensions are major price factors
- Proper data preprocessing is crucial for success
- Some models may not suit a given dataset as configured: SVR scored below zero on both splits (-10.8% training, -9.9% testing)
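A negative score can look like a bug, so it helps to see where one comes from. Assuming the scores reported here are R² values (which is what scikit-learn regressors return from `.score()`, though the post does not say so explicitly), a result below zero means the model did worse than simply predicting the mean price every time. The sketch below uses made-up numbers to show this.

```python
def r2_score(actual, predicted):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up prices for illustration only.
actual = [10000, 15000, 20000, 25000]

# Always predicting the mean price (17500) gives R^2 of exactly 0.
mean_baseline = [17500] * 4
r2_baseline = r2_score(actual, mean_baseline)

# A constant prediction far from the mean is worse than the baseline: R^2 < 0.
off_target = [12000] * 4
r2_off_target = r2_score(actual, off_target)

print(r2_baseline, r2_off_target)
```

One possible cause in practice is running SVR with default hyperparameters on an unscaled target, so tuning `C` and `epsilon` or scaling the prices may be worth trying before ruling the model out.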
Takeaway
Random Forest was the strongest of the four models on this regression problem, pairing the highest testing score (95.6%) with the ability to capture the complex relationships in car pricing data.