Bangalore House Price Prediction (Real-Time Deployment) and Key Learnings from the Project

 

Building a Web App for House Price Prediction: Deploying the Machine Learning Model

In our previous blog, we explored Exploratory Data Analysis (EDA) and built a predictive model to estimate house prices. After testing multiple regression models, XGBoost Regressor proved to be the most accurate, and we saved the trained model using pickle for future use. Now, it's time to take the next step—deploying our model as a web application!

A web app allows users to interact with the model in real-time, making predictions based on their input. In this blog, we will:

  • Set up a web framework using Streamlit to build an interactive UI.
  • Load the saved model and use it for predictions.
  • Create a user-friendly interface where users can enter property details and get instant price estimates.
  • Deploy the web app so it can be accessed by anyone online.

Step-by-Step Implementation

1. Import Required Libraries

First, we import the necessary libraries:

import streamlit as st
import pandas as pd
import pickle
import sklearn   # not used directly, but must be installed for pickle to load the pipeline
import xgboost   # not used directly, but must be installed for pickle to load the pipeline

2. Load the Machine Learning Model

The trained model is saved in a Model.pkl file and loaded into the app with pickle:

with open('Model.pkl', 'rb') as f:
    model = pickle.load(f)

3. Load the Dataset

We load the cleaned dataset to populate options for dropdown menus:

house=pd.read_csv('Cleaned_data.csv')
area_type = house['area_type'].unique()
availability = house['availability'].unique()
location = house['location'].unique()

4. Set Up the Streamlit Interface

We use Streamlit widgets like selectbox and number_input to collect user input:

st.title("House Price Predictor")

area_type_1 = st.selectbox("Which area type does your house belong to?", area_type)
availability_1 = st.selectbox("Is it available now, or will it be available soon?", availability)
location_1 = st.selectbox("Which location is the house in?", location)
bath_1 = st.number_input("How many bathrooms does the house have?", value=None, placeholder="Type a number...")
bhk_1 = st.number_input("How many bedrooms (BHK) does the house have?", value=None, placeholder="Type a number...")
sqft_1 = st.number_input("What is the square footage of the house?", value=None, placeholder="Type a number...", min_value=300)

5. Make Predictions

When the "Predict" button is clicked, the app collects the inputs and passes them to the

trained model for prediction:

if st.button('Predict'):
    # Build a one-row DataFrame; wrapping the inputs in np.array would
    # coerce the numeric values to strings, so pass them directly instead.
    input_df = pd.DataFrame(
        [[area_type_1, availability_1, location_1, bath_1, bhk_1, sqft_1]],
        columns=["area_type", "availability", "location", "bath", "bhk", "sqft"],
    )
    prediction = model.predict(input_df)
    # The model predicts the price in lakhs; convert to rupees for display.
    st.text("The house may cost approximately Rs. " + str(int(prediction[0] * 100000)))

Deploying the App

You can deploy the app using Streamlit Cloud or other platforms:

  1. Save the code as app.py
  2. Run locally with:

streamlit run app.py
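
For Streamlit Cloud, the repository also needs Model.pkl, Cleaned_data.csv, and a requirements.txt listing the app's dependencies. A minimal example based on the imports above (versions unpinned here; pin them if you need reproducible deployments):

streamlit
pandas
scikit-learn
xgboost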

Example Output

Here’s how the app might look when deployed:

  1. User Input Section
  • Select the area type, availability status, and location, and enter details like square footage, bathrooms, and bedrooms.
  2. Prediction Result
  • On clicking "Predict," the app displays the estimated house price.

Key Learnings from the Bangalore House Price Prediction Blog Series

1. Data Cleaning and Preprocessing is Crucial

  • Handling Missing Values: Cleaning missing or inconsistent data is the foundation of a reliable model. For instance, we addressed missing entries in the bath column and extracted numerical values from ranges in the total_sqft column (sketched in the code after this list).
  • Outlier Removal: Filtering unrealistic data, such as outliers in price-per-square-foot or total square footage, helps improve model accuracy.
  • Feature Engineering: Adding derived features, such as price per square foot, provides more meaningful insights for the model.
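
A minimal sketch of these cleaning steps. The raw file name, the median fill, and the outlier thresholds are illustrative assumptions, not necessarily the exact values used in the series:

import pandas as pd

def convert_sqft(value):
    # total_sqft entries like "2100 - 2850" are ranges; use the midpoint.
    try:
        return float(value)
    except ValueError:
        parts = str(value).split('-')
        if len(parts) == 2:
            return (float(parts[0]) + float(parts[1])) / 2
        return None  # unparseable units (e.g. "34.46Sq. Meter") get dropped

df = pd.read_csv('Bengaluru_House_Data.csv')          # assumed raw file name
df['bath'] = df['bath'].fillna(df['bath'].median())   # fill missing bath counts
df['total_sqft'] = df['total_sqft'].apply(convert_sqft)
df = df.dropna(subset=['total_sqft'])
df['price_per_sqft'] = df['price'] * 100000 / df['total_sqft']  # price is in lakhs
df = df[df['price_per_sqft'].between(1000, 50000)]    # illustrative outlier filter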

2. Exploratory Data Analysis (EDA) Informs Feature Selection

  • Visualizing data distribution and relationships using tools like histograms and correlation heatmaps uncovers patterns and dependencies in the dataset.
  • For example, correlations between features like bath, bhk, and price influenced our feature selection for the predictive model.
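
For instance, a correlation heatmap over the numeric columns takes only a few lines (a sketch; the column names are assumed from the cleaned dataset loaded earlier):

import matplotlib.pyplot as plt
import seaborn as sns

# Correlations between the numeric features and the target
numeric = house[['bath', 'bhk', 'sqft', 'price']]
sns.heatmap(numeric.corr(), annot=True, cmap='coolwarm')
plt.title('Feature correlations')
plt.show()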

3. Choosing the Right Model Matters

  • Model Comparison: Testing multiple regression algorithms, such as Linear Regression, Decision Trees, and XGBoost, ensures that we select the best-performing model (a sketch follows this list).
  • XGBoost: In this project, XGBoost delivered the highest R² score, making it the ideal choice for deployment.
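
A sketch of what that comparison might look like, using the cleaned dataset loaded earlier. The feature and target column names, the train/test split, and the encoding step are assumptions for illustration:

from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor

X = house[['area_type', 'availability', 'location', 'bath', 'bhk', 'sqft']]
y = house['price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# One-hot encode the categorical columns so every model sees numeric input
encode = ColumnTransformer(
    [('cat', OneHotEncoder(handle_unknown='ignore'),
      ['area_type', 'availability', 'location'])],
    remainder='passthrough')

for name, reg in [('Linear Regression', LinearRegression()),
                  ('Decision Tree', DecisionTreeRegressor(random_state=42)),
                  ('XGBoost', XGBRegressor(random_state=42))]:
    pipe = make_pipeline(encode, reg)
    pipe.fit(X_train, y_train)
    print(name, r2_score(y_test, pipe.predict(X_test)))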

4. Model Deployment with Streamlit is Straightforward

  • Integration: The trained XGBoost model was serialized using pickle and loaded into the Streamlit app, enabling real-time predictions (the saving side is sketched after this list).
  • User-Friendly Interface: Dropdown menus, number inputs, and buttons make the app intuitive for users to interact with.
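
The saving side of that round trip is a one-liner at the end of the training notebook (assuming pipe is the fitted pipeline):

import pickle

# Serialize the fitted pipeline so the Streamlit app can reload it
with open('Model.pkl', 'wb') as f:
    pickle.dump(pipe, f)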

5. Reproducibility is Key

  • By saving the model pipeline, preprocessing steps, and cleaning logic, the workflow becomes replicable for future iterations or for expanding the app to other cities.

6. Web Application Deployment Completes the Workflow

  • Deploying the app on platforms like Streamlit Cloud ensures the solution is accessible to end-users, bridging the gap between data science and real-world applications.

These learnings emphasize the importance of a comprehensive workflow that spans data cleaning, EDA, model selection, and deployment, culminating in an interactive and practical tool for predicting house prices in Bangalore.

Summary

Comprehensive Data Preprocessing

Handling missing data, inconsistent formats, and outliers ensured the dataset was clean and ready for modeling.

Features such as price_per_sqft were engineered, and categorical variables (e.g., area_type and location) were handled using one-hot encoding.

Pipeline Creation and Model Selection

A Pipeline was created using ColumnTransformer, StandardScaler, and XGBRegressor (sketched below).

XGBoost was selected for its high R² score, demonstrating its suitability for the task.
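
A minimal sketch of that pipeline, with column names assumed from the app's input schema and hyperparameters omitted:

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from xgboost import XGBRegressor

preprocess = ColumnTransformer([
    ('cat', OneHotEncoder(handle_unknown='ignore'),
     ['area_type', 'availability', 'location']),
    ('num', StandardScaler(), ['bath', 'bhk', 'sqft']),
])
pipe = Pipeline([('preprocess', preprocess), ('model', XGBRegressor())])
pipe.fit(X_train, y_train)  # X_train, y_train as in the training notebook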

Real-Time Web Application

A user-friendly Streamlit interface enabled interactive predictions based on inputs like area type, location, and square footage.

The trained model (Model.pkl) was integrated into the Streamlit app, allowing real-time predictions and effectively bridging the gap between machine learning and end-user interaction.

Conclusion

This project highlights the synergy between data science and web development:

Clean data and powerful models like XGBoost deliver accurate predictions.

Streamlit apps enable non-technical users to interact with machine learning models seamlessly.

The structured pipeline ensures scalability and easy adaptability for other datasets or cities.
