Bangalore House Price Prediction (EDA and Model building)
Exploring EDA and Model Building: Unveiling Patterns in Data
In our previous blog, we laid the foundation for data handling—understanding how to clean, preprocess, and prepare raw data for analysis. Now, we take the next step in our data science journey by diving into Exploratory Data Analysis (EDA) and model building.
EDA serves as the critical bridge between raw data and meaningful insights. It helps us uncover patterns, detect anomalies, and gain a deeper understanding of the dataset through visualization and statistical analysis. With a well-explored dataset, we can then move forward to model building, where we apply machine learning algorithms to make predictions and extract valuable knowledge.
Exploratory Data Analysis
Distribution of Prices
A visualization of the target variable helps identify patterns and outliers:
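The original plot is not reproduced here, so the following is a minimal sketch of how such a distribution plot could be drawn with matplotlib. The `price` column name and the synthetic data are assumptions standing in for the cleaned dataset, and the IQR rule at the end is just one common way to count potential outliers, not necessarily the approach used originally.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the cleaned dataset; "price" is an assumed column name.
rng = np.random.default_rng(0)
df = pd.DataFrame({"price": rng.lognormal(mean=4.3, sigma=0.6, size=500)})

# Histogram of the target variable
fig, ax = plt.subplots(figsize=(8, 4))
counts, bin_edges, _ = ax.hist(df["price"], bins=40, edgecolor="black")
ax.set_xlabel("price")
ax.set_ylabel("Number of listings")
ax.set_title("Distribution of house prices")

# Flag potential high-price outliers with the 1.5 * IQR rule
q1, q3 = df["price"].quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)
n_outliers = int((df["price"] > upper_fence).sum())
print("Listings above the upper fence:", n_outliers)
```

In a notebook, dropping the `matplotlib.use("Agg")` line and calling `plt.show()` displays the figure inline.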
Correlation Heatmap:
Exploring relationships between numerical features:
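As a rough illustration of this step, the sketch below computes the pairwise Pearson correlations of the numeric columns and draws them as a heatmap. The column names (`total_sqft`, `bhk`, `bath`, `price`, `location`) and the synthetic data are assumptions. Plain matplotlib is used here to keep dependencies small; `seaborn.heatmap` is a common alternative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data; all column names are assumptions.
rng = np.random.default_rng(0)
n = 300
total_sqft = rng.uniform(500, 3000, n)
bhk = np.clip(np.round(total_sqft / 600 + rng.normal(0, 0.5, n)), 1, None)
bath = np.clip(bhk + rng.integers(-1, 2, n), 1, None)
price = 0.05 * total_sqft + 10 * bhk + rng.normal(0, 15, n)
df = pd.DataFrame({
    "location": rng.choice(["Whitefield", "Hebbal"], n),  # non-numeric, excluded below
    "total_sqft": total_sqft,
    "bhk": bhk,
    "bath": bath,
    "price": price,
})

# Correlation matrix of the numeric columns only
corr = df.select_dtypes(include="number").corr()

fig, ax = plt.subplots(figsize=(6, 5))
image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
# Write each coefficient into its cell
for i in range(corr.shape[0]):
    for j in range(corr.shape[1]):
        ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")
fig.colorbar(image, ax=ax, label="Pearson correlation")
ax.set_title("Correlation heatmap")
fig.tight_layout()
```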
Building the Predictive Model
Splitting the Data
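The code for this step is not shown, so here is a minimal sketch of a typical split with scikit-learn's `train_test_split`. The 80/20 ratio, the `random_state`, the column names, and the synthetic data are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned dataset; column names are assumptions.
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "location": rng.choice(["Whitefield", "Indiranagar", "Hebbal"], n),
    "total_sqft": rng.uniform(500, 3000, n),
    "bath": rng.integers(1, 4, n),
    "bhk": rng.integers(1, 5, n),
    "price": rng.uniform(30, 250, n),
})

# Separate the features from the target
X = df.drop(columns="price")
y = df["price"]

# Hold out 20% of the rows for evaluation (the ratio is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```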
Importing the dependencies to build and evaluate the predictive model:
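The model names in the results further down suggest imports along the following lines. This is a reconstruction, not the original import list. `XGBRegressor` comes from the separate `xgboost` package, so the sketch guards that import in case the package is not installed.

```python
# Preprocessing and pipeline utilities
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Regression algorithms named in the results
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor

# Evaluation metrics
from sklearn.metrics import mean_squared_error, r2_score

# xgboost is a separate package from scikit-learn
try:
    from xgboost import XGBRegressor
except ImportError:
    XGBRegressor = None

# Quick sanity check of the two metrics on toy values
perfect_r2 = r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])  # 1.0 for a perfect fit
toy_mse = mean_squared_error([3.0, 5.0], [2.0, 7.0])  # (1 + 4) / 2 = 2.5
```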
Utilizing a ColumnTransformer to standardize and transform numerical features efficiently:
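A minimal sketch of such a transformer is shown below, under assumed column names. Scaling the numeric columns with `StandardScaler` follows the description above; one-hot encoding a `location` column is an additional assumption about how a categorical feature might be handled.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny illustrative frame; column names and values are assumptions.
X = pd.DataFrame({
    "location": ["Whitefield", "Hebbal", "Whitefield", "Indiranagar"],
    "total_sqft": [1200.0, 950.0, 1800.0, 1500.0],
    "bath": [2, 2, 3, 3],
    "bhk": [2, 2, 3, 3],
})

numeric_features = ["total_sqft", "bath", "bhk"]
categorical_features = ["location"]

preprocessor = ColumnTransformer(
    transformers=[
        # Standardize numeric columns to zero mean and unit variance
        ("num", StandardScaler(), numeric_features),
        # One-hot encode the categorical column; unseen categories become all zeros
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
    ],
    sparse_threshold=0,  # always return a dense array
)

X_transformed = preprocessor.fit_transform(X)
print(X_transformed.shape)  # 3 scaled numeric columns + 3 one-hot columns
```

Keeping the transformer as a single object means the same fitted scaling and encoding can later be reused unchanged at prediction time.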
Creating the dictionary of the above objects:
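One way to build such a dictionary is sketched below. The keys mirror the labels that appear in the results; the default hyperparameters and the `random_state` values are assumptions. `XGBRegressor` is added only if the `xgboost` package is available.

```python
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor

# Map a display name to an unfitted estimator
models = {
    "Linear Regression": LinearRegression(),
    "Lasso": Lasso(),
    "Ridge": Ridge(),
    "RandomForestRegressor": RandomForestRegressor(random_state=42),
    "AdaBoostRegressor": AdaBoostRegressor(random_state=42),
    "gradient Boost Regressor": GradientBoostingRegressor(random_state=42),
    "Decision Tree Regressor": DecisionTreeRegressor(random_state=42),
}

# xgboost is a separate package, so add it only when it can be imported
try:
    from xgboost import XGBRegressor
    models["XGBRegressor"] = XGBRegressor(random_state=42)
except ImportError:
    pass

print(list(models))
```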
Running a for loop over the dictionary to fit each algorithm and find the best model:
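The loop might look roughly like the sketch below, which fits each model behind the same preprocessing and prints output in the same shape as the results that follow. The synthetic data, column names, and the reduced three-model dictionary are assumptions made to keep the example short, so the scores it prints will not match the ones reported in this post.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names and the price formula are assumptions.
rng = np.random.default_rng(42)
n = 400
location = rng.choice(["Whitefield", "Indiranagar", "Hebbal"], n)
total_sqft = rng.uniform(500, 3000, n)
bhk = rng.integers(1, 5, n)
premium = pd.Series(location).map(
    {"Whitefield": 0.0, "Indiranagar": 40.0, "Hebbal": 15.0}
).to_numpy()
price = 0.05 * total_sqft + 8 * bhk + premium + rng.normal(0, 5, n)

X = pd.DataFrame({"location": location, "total_sqft": total_sqft, "bhk": bhk})
y = pd.Series(price, name="price")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

def make_preprocessor():
    """Return a fresh, unfitted preprocessor for each model."""
    return ColumnTransformer([
        ("num", StandardScaler(), ["total_sqft", "bhk"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["location"]),
    ])

# Reduced dictionary for brevity; the post compares eight models.
models = {
    "Linear Regression": LinearRegression(),
    "Ridge": Ridge(),
    "RandomForestRegressor": RandomForestRegressor(random_state=42),
}

results = {}
for name, model in models.items():
    pipe = make_pipeline(make_preprocessor(), model)
    pipe.fit(X_train, y_train)
    y_pred = pipe.predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, y_pred),
        "mse": mean_squared_error(y_test, y_pred),
    }
    print("For", name)
    print("R2 Score", results[name]["r2"])
    print("MSE", results[name]["mse"])
    print("=" * 50)

# Pick the model with the highest R2 score on the held-out data
best_name = max(results, key=lambda model_name: results[model_name]["r2"])
print("Best model:", best_name)
```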
And these are the results:
For Linear Regression
R2 Score 0.7970679542666947
MSE 1945.7399545076323
==================================================
For Lasso
R2 Score 0.7875594982465522
MSE 2036.908319352207
==================================================
For Ridge
R2 Score 0.7970501849908871
MSE 1945.9103287320524
==================================================
For XGBRegressor
R2 Score 0.8508123625187537
MSE 1430.4312850980166
==================================================
For RandomForestRegressor
R2 Score 0.8285199494851324
MSE 1644.1739621856352
==================================================
For AdaBoostRegressor
R2 Score 0.7070223697603215
MSE 2809.109221140365
==================================================
For gradient Boost Regressor
R2 Score 0.8289161842782232
MSE 1640.3748092943463
==================================================
For Decision Tree Regressor
R2 Score 0.7453190035472872
MSE 2441.915906682901
==================================================
Because the XGBoost regressor has the highest R2 score and the lowest MSE, it was selected and wrapped in a pipeline:
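A pipeline of this kind can be sketched as follows, chaining a preprocessor and the chosen regressor so that raw rows go in and predictions come out. The step names, column names, and synthetic data are assumptions. If the `xgboost` package is not installed, the sketch falls back to scikit-learn's `GradientBoostingRegressor`, which is a substitute and not the model selected above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

try:
    from xgboost import XGBRegressor
    regressor = XGBRegressor(random_state=42)
except ImportError:
    # Fallback so the sketch still runs; not the model chosen in the post
    from sklearn.ensemble import GradientBoostingRegressor
    regressor = GradientBoostingRegressor(random_state=42)

# Synthetic stand-in data; column names and the price formula are assumptions.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "location": rng.choice(["Whitefield", "Indiranagar", "Hebbal"], n),
    "total_sqft": rng.uniform(500, 3000, n),
    "bhk": rng.integers(1, 5, n),
})
y = 0.05 * X["total_sqft"] + 8 * X["bhk"] + rng.normal(0, 5, n)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["total_sqft", "bhk"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["location"]),
])

# Preprocessing and model in one object, fitted together
pipe = Pipeline([
    ("preprocessor", preprocessor),
    ("model", regressor),
])
pipe.fit(X_train, y_train)
test_r2 = r2_score(y_test, pipe.predict(X_test))

# Predict for a single new listing given as raw, untransformed values
new_listing = pd.DataFrame(
    {"location": ["Hebbal"], "total_sqft": [1400.0], "bhk": [3]}
)
prediction = pipe.predict(new_listing)
print("Test R2:", test_r2, "Predicted price:", prediction[0])
```

Bundling the preprocessor into the pipeline means the caller never has to scale or encode inputs by hand before predicting.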
The pipeline was then serialized with the pickle module so that the web application can load it to make predictions:
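The save-and-load round trip can be sketched as below. A small `Ridge` pipeline stands in for the real one to keep the example short, and the file name `pipeline.pkl` is hypothetical. The example writes to a temporary directory, whereas a real project would use a path the web application can read.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small stand-in pipeline fitted on synthetic data (an assumption for brevity)
rng = np.random.default_rng(0)
X = rng.uniform(500, 3000, size=(100, 2))
y = 0.05 * X[:, 0] + 0.01 * X[:, 1] + rng.normal(0, 1, 100)
pipe = make_pipeline(StandardScaler(), Ridge())
pipe.fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "pipeline.pkl"  # hypothetical file name

    # Serialize the fitted pipeline to disk
    with open(model_path, "wb") as f:
        pickle.dump(pipe, f)

    # Later, for example inside the web application: load it back
    with open(model_path, "rb") as f:
        loaded_pipe = pickle.load(f)

# The restored pipeline reproduces the original predictions
same_predictions = np.allclose(loaded_pipe.predict(X), pipe.predict(X))
print("Round trip OK:", same_predictions)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code, and it is safest to load them with the same library versions that were used to create them.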
In this blog, we explored Exploratory Data Analysis (EDA) and model building, uncovering patterns in the dataset through visualization and statistical analysis.
We examined the distribution of house prices, identified relationships between numerical features using a correlation heatmap, and trained several regression algorithms to build a predictive model.
Of the eight regression models evaluated, the XGBoost Regressor performed best, with the highest R² score (about 0.85) and the lowest Mean Squared Error (about 1430).
This model was then incorporated into a pipeline that bundles preprocessing with prediction, and the pipeline was saved using pickle for deployment in a web application.
This structured approach, from data handling to a deployable model, shows how EDA and machine learning work together to turn raw data into price predictions.
With the trained model now ready, it can be integrated into real-world applications to assist in data-driven decision-making.
Stay tuned for the next blog, where we will explore deploying this model in a web application for real-time predictions!

