# ft_linear_regression

A simple linear regression project in Python, written for educational purposes. The goal is to predict the price of a car from its mileage using linear regression, and to visualize the results along with an R² score that shows how well the model fits the data.

## Features

- Reads car mileage and price data from CSV datasets.
- Learns the optimal parameters (`theta0`, `theta1`) using gradient descent.
- Normalizes the input data for effective training.
- Exports the learned parameters to a CSV file.
- Predicts the car price for any given mileage.
- Calculates R² (coefficient of determination), which the project reports as its confidence score.
- Visualizes the regression line, the evolution of the cost function, and the confidence score.
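The gradient descent step listed above can be sketched as follows. This is a minimal illustration, not the code in `model.py`: the function name `train`, its default `learning_rate` and `epochs`, and the use of plain Python lists instead of numpy arrays are all assumptions made to keep the example dependency-free. It expects mileage values that have already been normalized.

```python
def train(xs, ys, learning_rate=0.1, epochs=1000):
    """Fit y ~ theta0 + theta1 * x with batch gradient descent.

    xs is assumed to be normalized already (for example scaled to [0, 1]).
    Returns (theta0, theta1, cost_history). Hypothetical helper, not the
    project's actual API.
    """
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    cost_history = []
    for _ in range(epochs):
        # Prediction error for every sample with the current parameters
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Halved mean squared error, recorded once per epoch
        cost_history.append(sum(e * e for e in errors) / (2 * m))
        # Partial derivatives of the cost with respect to each parameter
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients use the old parameters
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1, cost_history
```

The returned `cost_history` is the kind of series a cost-versus-epochs plot would draw; a curve that keeps falling and then flattens suggests the learning rate is reasonable.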

## Project Structure

```
ft_linear_regression/
│
├── datasets/
│   ├── data.csv                  # Default dataset
│   ├── big_data.csv
│   ├── negative_data.csv
│   ├── nonlinear_data.csv
│   ├── perfectpositive_data.csv
│   ├── small_data.csv
│   └── variance_data.csv
│
├── model.py                      # Main file: trains, saves, and visualizes the model
├── estimate.py                   # Script to estimate a price given a mileage
├── confidence.py                 # Calculates R² confidence score
└── thetas.csv                    # Saved learned parameters after running model.py
```

## Usage

### 1. Train the Model

Run:

```bash
python model.py
```

This script:

- Trains the linear regression model on `datasets/data.csv`.
- Saves the learned parameters to `thetas.csv`.
- Displays a plot with the data, the regression line, the confidence score (R²), and the cost per epoch.

### 2. Estimate Car Price

After training, estimate the price of a car from its mileage:

```bash
python estimate.py
```

- Enter the mileage when prompted.
- The script prints the predicted price.
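The estimation step amounts to loading the saved parameters and evaluating `theta0 + theta1 * mileage`. The sketch below shows one way to do that; it is not the contents of `estimate.py`. The function names, the assumed layout of `thetas.csv` (a header line `theta0,theta1` followed by one row of values expressed in raw mileage units), and the fallback to zero parameters when no file exists are all assumptions.

```python
import csv


def load_thetas(path="thetas.csv"):
    """Read (theta0, theta1) from a one-row CSV.

    Assumed layout: header "theta0,theta1" plus one data row.
    """
    try:
        with open(path, newline="") as f:
            row = next(csv.DictReader(f))
        return float(row["theta0"]), float(row["theta1"])
    except (FileNotFoundError, StopIteration, KeyError, TypeError, ValueError):
        # Assumption: without usable parameters, behave like an untrained model
        return 0.0, 0.0


def estimate_price(mileage, theta0, theta1):
    """Apply the linear hypothesis to a single mileage value."""
    return theta0 + theta1 * mileage
```

A script built on these helpers would read the mileage with `input()`, convert it with `float()`, and print the result of `estimate_price`.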

### 3. Confidence Score

The R² (confidence) score is printed on the regression plot. It indicates how well the model fits the data: a value close to 1 means the regression line explains most of the variation in price.
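For reference, R² compares the model's squared error with the squared error of always predicting the mean price. A minimal sketch of that calculation follows; the function name `r_squared` is hypothetical and this is not necessarily how `confidence.py` is written.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left over after the model's predictions
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: error of a baseline that always predicts the mean
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot
```

A perfect fit gives 1, a model no better than the mean gives 0, and a model worse than the mean gives a negative value.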

## Custom Datasets

To train on a different dataset, change the `data_path` variable in `model.py` so that it points to another file, either one of the bundled datasets or your own CSV file placed in the `datasets/` folder. The CSV must have the columns `km` and `price` (header line `km,price`).
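Before training on a custom file, it can help to check that it has the expected columns. The sketch below does this with the standard `csv` module so that it runs without extra dependencies; the project itself lists pandas as a requirement and may load data differently. The function name `load_dataset` is hypothetical.

```python
import csv


def load_dataset(path):
    """Load the km and price columns from a CSV file as two lists of floats."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        columns = set(reader.fieldnames or [])
        if not {"km", "price"} <= columns:
            raise ValueError("CSV must have the columns 'km' and 'price'")
        kms, prices = [], []
        for row in reader:
            # float() raises ValueError on non-numeric cells
            kms.append(float(row["km"]))
            prices.append(float(row["price"]))
    if not kms:
        raise ValueError("CSV contains no data rows")
    return kms, prices
```

For example, `load_dataset("datasets/small_data.csv")` would return the mileage and price lists for that file, or raise `ValueError` if the header does not match.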

## Requirements

- Python 3.x
- pandas
- numpy
- matplotlib

Install dependencies with:

```bash
pip install pandas numpy matplotlib
```

## Notes

- The model uses simple linear regression with a single feature (`km`).
- The data is normalized so that gradient descent is more stable and converges faster.
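The README does not say which normalization scheme is used. As one possibility, the sketch below applies min-max scaling to the mileage only and then converts parameters learned on the scaled values back to raw kilometers, so they can be used directly on user input. The function names and the choice of min-max scaling are assumptions.

```python
def normalize(values):
    """Min-max scale values to [0, 1]; also return the bounds used."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant column")
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def denormalize_thetas(theta0_n, theta1_n, lo, hi):
    """Convert parameters fitted on scaled mileage to raw-mileage parameters.

    From price = theta0_n + theta1_n * (km - lo) / (hi - lo):
    the slope is divided by the range and the intercept shifted accordingly.
    """
    theta1 = theta1_n / (hi - lo)
    theta0 = theta0_n - theta1 * lo
    return theta0, theta1
```

Scaling keeps the mileage (tens of thousands of kilometers) in a small range, which lets gradient descent use a moderate learning rate without diverging.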