Predicting Apartment Prices in Buenos Aires

This project predicts apartment prices in Buenos Aires from a dataset of property listings. The primary goal is to build a machine learning model that accurately predicts the price of an apartment in U.S. dollars.

The process involves several key stages:

Data Exploration and Cleaning: The author begins by loading and examining the dataset, which includes property type, location (neighborhood, latitude, longitude), surface area (total and covered), number of rooms, and price. Cleaning then involves dropping columns with too many missing values, removing outliers, and imputing missing values for important features.
Feature Engineering and Preparation: Categorical features like property_type and place_name (neighborhood) are converted into a numerical format using one-hot encoding. The data is then split into training and testing sets.
Modeling and Evaluation: Several regression models are trained and compared, including Linear Regression, Ridge, Lasso, Random Forest, Gradient Boosting, and XGBoost. The models are evaluated based on Mean Absolute Error (MAE) and the R-squared (R²) score.
Results and Conclusion: The XGBoost Regressor performed best, achieving the lowest MAE and the highest R² score. Feature importances from this model indicate that the most influential factors in an apartment’s price are its total surface area, covered surface area, and location.
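
The stages above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the author's actual code: the data is synthetic and generated on the spot, the column names surface_total_in_m2, rooms, and price_usd are hypothetical (only property_type and place_name appear in the summary), and the neighborhood price levels are made up. Lasso and XGBoost are left out to keep the example to scikit-learn and free of tuning; the scores it prints say nothing about the project's real results.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the listings dataset (NOT the real data).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "property_type": ["apartment"] * n,
    "place_name": rng.choice(["Palermo", "Belgrano", "Caballito"], size=n),
    "surface_total_in_m2": rng.uniform(30, 150, size=n),  # hypothetical column
    "rooms": rng.integers(1, 5, size=n).astype(float),    # hypothetical column
})
# Made-up price levels per square metre, plus noise.
usd_per_m2 = df["place_name"].map(
    {"Palermo": 3000.0, "Belgrano": 2700.0, "Caballito": 2200.0}
)
df["price_usd"] = df["surface_total_in_m2"] * usd_per_m2 + rng.normal(0, 10_000, size=n)
# Knock out some values so the imputation step has something to do.
df.loc[rng.choice(n, size=40, replace=False), "rooms"] = np.nan

categorical = ["property_type", "place_name"]
numeric = ["surface_total_in_m2", "rooms"]

# One-hot encode the categorical columns, median-impute the numeric ones.
preprocess = ColumnTransformer(
    [
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
        ("num", SimpleImputer(strategy="median"), numeric),
    ],
    sparse_threshold=0,  # always return a dense array
)

X = df[categorical + numeric]
y = df["price_usd"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
    "GradientBoosting": GradientBoostingRegressor(random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
results = {}
for name, model in models.items():
    pipe = Pipeline([("preprocess", preprocess), ("model", model)])
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    results[name] = {
        "mae": mean_absolute_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    }

# Report models from lowest to highest MAE.
for name, scores in sorted(results.items(), key=lambda kv: kv[1]["mae"]):
    print(f"{name:18s} MAE={scores['mae']:,.0f} USD  R2={scores['r2']:.3f}")
```

Wrapping the encoder and imputer in a pipeline means they are fitted on the training split only, so statistics from the test set do not leak into training. Whether the original project structured its preprocessing this way is not stated in the summary.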