Evaluating the Performance of Random Forest Algorithm in Classifying Property Sale Amount Categories in Real Estate Data
Abstract
This study evaluates the Random Forest algorithm for classifying property categories in real estate sales data. The dataset comprises over one million property sale records from 2001 to 2022 and includes features such as sale amount, assessed value, sales ratio, property type, and residential type. The primary objective is to determine how well the algorithm predicts property categories and to assess how these predictions can aid market segmentation and property valuation. After preprocessing the data by removing irrelevant columns and handling missing values, we applied the Random Forest classifier to predict five key property types: 'Single Family', 'Residential', 'Condo', 'Two Family', and 'Three Family'. The model achieved an accuracy of 82.98%, with high recall for categories such as 'Single Family' and 'Condo'. It struggled with 'Residential', whose recall was lower, which we attribute to the diverse nature of that category. The findings suggest that the Random Forest algorithm performs well for certain property types, but that improvements are needed for categories with more variation. Sale amount and assessed value were the most influential features in determining property type, which highlights the importance of feature selection. Real estate professionals can use such models for more accurate market segmentation, leading to better pricing and marketing strategies. The study has limitations, including the complexity of the 'Residential' category and potential class imbalance in the data. Future research could incorporate additional features, such as location-specific data or detailed property descriptions, and test alternative algorithms to further improve classification accuracy.
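The workflow summarized in the abstract (drop irrelevant columns, handle missing values, fit a Random Forest, then report accuracy, per-class recall, and feature importance) can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the authors' code: the data are synthetic, and the column names (`sale_amount`, `assessed_value`, `sales_ratio`, `remarks`, `property_type`), the 80/20 split, and the hyperparameters are all assumptions made for the example. Its scores will not match the reported 82.98%.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# The five property types named in the abstract.
CLASSES = ["Single Family", "Residential", "Condo", "Two Family", "Three Family"]
FEATURES = ["sale_amount", "assessed_value", "sales_ratio"]  # hypothetical names


def make_synthetic_sales(n=2000, seed=0):
    """Build a synthetic stand-in for the sales records (assumption, not real data)."""
    rng = np.random.default_rng(seed)
    label = rng.choice(CLASSES, size=n)
    # Give each class a different typical price so the task is learnable.
    typical = {c: 100_000 * (i + 1) for i, c in enumerate(CLASSES)}
    base = np.array([typical[c] for c in label], dtype=float)
    sale_amount = base * rng.lognormal(mean=0.0, sigma=0.2, size=n)
    assessed_value = sale_amount * rng.uniform(0.5, 0.9, size=n)
    df = pd.DataFrame({
        "sale_amount": sale_amount,
        "assessed_value": assessed_value,
        "sales_ratio": assessed_value / sale_amount,
        "remarks": ["n/a"] * n,  # stands in for an irrelevant column
        "property_type": label,
    })
    # Blank out 5% of assessed values to mimic missing data.
    missing = rng.choice(n, size=n // 20, replace=False)
    df.loc[missing, "assessed_value"] = np.nan
    return df


def preprocess(df):
    """Drop the irrelevant column, then drop rows with missing values."""
    return df.drop(columns=["remarks"]).dropna()


def fit_and_evaluate(df, seed=0):
    """Fit a Random Forest and return accuracy, per-class recall, importances."""
    X, y = df[FEATURES], df["property_type"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, pred)
    recalls = recall_score(y_test, pred, labels=CLASSES, average=None)
    per_class_recall = dict(zip(CLASSES, recalls))
    importances = dict(zip(FEATURES, model.feature_importances_))
    return accuracy, per_class_recall, importances


if __name__ == "__main__":
    acc, recall, importance = fit_and_evaluate(preprocess(make_synthetic_sales()))
    print(f"accuracy: {acc:.4f}")
    for name, value in recall.items():
        print(f"recall[{name}]: {value:.4f}")
    for name, value in importance.items():
        print(f"importance[{name}]: {value:.4f}")
```

Reporting recall per class, rather than accuracy alone, is what exposes a weak category such as 'Residential'. The stratified split keeps class proportions similar in the training and test sets, which matters when the classes are imbalanced.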
Article Details

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with International Journal for Applied Information Management agree to the following terms: Authors retain copyright and grant the International Journal for Applied Information Management right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike License (CC BY-SA 4.0) that allows others to share (copy and redistribute the material in any medium or format) and adapt (remix, transform, and build upon the material) the work for any purpose, even commercially, with an acknowledgement of the work's authorship and initial publication in International Journal for Applied Information Management. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in International Journal for Applied Information Management. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).