Clustering Netflix Shows Based on Features Using K-means and Hierarchical Algorithms to Identify Content Patterns

B Herawan Hayadi
Eko Priyanto

Abstract

This study explores clustering patterns within Netflix's movie catalog by applying K-means and hierarchical clustering algorithms. The primary objective is to identify distinct content groups based on features such as movie duration, release year, and content rating. The dataset, which includes 5,185 movies, was preprocessed by handling missing values, one-hot encoding categorical variables, and standardizing numerical features. Four distinct clusters were identified, each with its own characteristics. Cluster 0 primarily consists of longer, family-friendly movies rated TV-14, while Cluster 1 contains shorter, mature movies rated TV-MA. Cluster 2 represents a diverse range of TV-MA movies with moderate durations, and Cluster 3 consists of longer, adult-oriented movies rated R. These findings offer insight into Netflix's content strategy, highlighting the platform's ability to cater to different audience segments based on content type and viewer preferences. The results suggest that Netflix could use these clustering patterns to improve its recommendation system and content acquisition strategy. However, the study is limited by the absence of user-specific data and its reliance on basic metadata features. Future research could integrate additional features such as user ratings and apply deep learning techniques for more sophisticated clustering.
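
The preprocessing and clustering steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy records, the function names (`standardize`, `one_hot`, `build_features`, `kmeans`), and the choice of k = 3 are assumptions made for illustration (the study itself reports four clusters on 5,185 movies), and only the Python standard library is used so that the steps stay visible.

```python
import random
from statistics import mean, pstdev

# Hypothetical toy records (duration in minutes, release year, rating).
# These values are invented for illustration and are NOT from the study's dataset.
movies = [
    (125, 2010, "TV-14"), (130, 2012, "TV-14"), (118, 2015, "TV-14"),
    (85, 2018, "TV-MA"), (80, 2019, "TV-MA"), (90, 2020, "TV-MA"),
    (140, 2005, "R"), (135, 2008, "R"), (150, 2003, "R"),
]

def standardize(column):
    """Z-score a numeric column: subtract the mean, divide by the std. deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma if sigma else 0.0 for x in column]

def one_hot(column):
    """Turn each category into a 0/1 indicator vector (categories sorted)."""
    categories = sorted(set(column))
    return [[1.0 if value == c else 0.0 for c in categories] for value in column]

def build_features(records):
    """Combine standardized numeric features with one-hot encoded ratings."""
    durations = standardize([r[0] for r in records])
    years = standardize([r[1] for r in records])
    ratings = one_hot([r[2] for r in records])
    return [[d, y] + r for d, y, r in zip(durations, years, ratings)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: assign points to the nearest centroid, then update centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [mean(dim) for dim in zip(*members)]
    return labels, centroids

features = build_features(movies)
labels, centroids = kmeans(features, k=3)
for movie, label in zip(movies, labels):
    print(label, movie)
```

Standardizing duration and release year keeps either feature from dominating the Euclidean distance purely because of its scale, which is why that step precedes clustering. In practice a library such as scikit-learn (for example its `KMeans` and `AgglomerativeClustering` classes) would likely replace the hand-written loop; the abstract does not say which implementation the authors used.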

How to Cite
[1] B. H. Hayadi and E. Priyanto, “Clustering Netflix Shows Based on Features Using K-means and Hierarchical Algorithms to Identify Content Patterns”, Int. J. Appl. Inf. Manag., vol. 5, no. 2, pp. 98–110, Jul. 2025.