Normalized Discounted Cumulative Gain (NDCG) — The Ultimate Ranking Metric
NDCG — the rank-aware metric for evaluating recommendation systems
Recommendation systems are everywhere. Since you’re reading this article, there’s a good chance Medium recommended it on your feed. This article will explore NDCG — Normalized Discounted Cumulative Gain, the rank-aware metric for evaluating any recommendation system model.
What are Recommendation Systems?
Recommendation systems help users discover relevant items like products, profiles, posts, videos, ads, or information based on their preferences or behavior. These platforms handle millions of items, and displaying the most relevant ones is key to boosting user engagement and business metrics. Companies such as Amazon, LinkedIn, Twitter, Instagram, Reddit, Spotify, YouTube, Netflix, Medium, and Quora use recommendation systems in their apps.
These systems typically have two stages: a retrieval model followed by a ranking model. The retrieval model narrows millions of items down to a much smaller set of candidates using a similarity metric and passes them to the ranking model. The ranking model then scores and orders those candidates at a more granular level.
For example, when a user searches for “blue jeans” in an e-commerce app, the retrieval model finds similar items, and the ranking model, trained on interaction data (e.g., clicks or orders), predicts the likelihood of user engagement. The ranked items are then shown in descending order based on their predicted scores.
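To make the two stages concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the item names, the toy 2-d embeddings, the use of cosine similarity for retrieval, and the `predicted` engagement scores, which stand in for the output of a trained ranking model. Real systems use learned embeddings, approximate nearest-neighbor search, and a learned ranker.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, catalog, k):
    """Stage 1: keep the k items whose embeddings are most similar to the query."""
    by_similarity = sorted(
        catalog.items(),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [item_id for item_id, _ in by_similarity[:k]]

def rank(candidates, engagement_score):
    """Stage 2: order candidates by predicted engagement, highest first."""
    return sorted(candidates, key=engagement_score, reverse=True)

# Hypothetical catalog with made-up 2-d embeddings.
catalog = {
    "blue_jeans_slim": [0.9, 0.1],
    "blue_jeans_wide": [0.8, 0.2],
    "black_jeans": [0.6, 0.4],
    "red_dress": [0.1, 0.9],
}
query = [1.0, 0.0]  # assumed embedding of the query "blue jeans"

# Hypothetical ranker output: predicted probability of a click or order.
predicted = {
    "blue_jeans_slim": 0.30,
    "blue_jeans_wide": 0.55,
    "black_jeans": 0.20,
    "red_dress": 0.70,
}

candidates = retrieve(query, catalog, k=3)
ranked = rank(candidates, lambda item: predicted[item])
print(ranked)
```

Note that `red_dress` has the highest predicted score but never reaches the ranker, because retrieval filtered it out as dissimilar to the query. The ranker only reorders what retrieval hands it.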
How do we evaluate the ranking models? Why NDCG?
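Accuracy, precision, recall, F1-score, and ROC AUC are popular metrics for evaluating ML models. But are they suitable for evaluating a ranking model? For a ranker, what matters is the order induced by the predicted scores of the impressions, not the class each impression is assigned to. Accuracy, precision, recall, and F1 are computed on thresholded class labels, so they ignore where an item appears in the list. ROC AUC does depend on how the scores order positives relative to negatives, but it penalizes a mistake at the top of the list no more than one at the bottom. All of these metrics also treat relevance as binary, which isn't sufficient when some items are more relevant than others.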
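A small sketch shows the problem. The two rankings below are made up: each list holds hypothetical binary relevance labels in the order the items were shown (1 = relevant, 0 = not relevant). The helper names are illustrative, and reciprocal rank is used only as one simple example of a position-aware quantity.

```python
def precision(labels):
    """Fraction of shown items that are relevant; ignores their order."""
    return sum(labels) / len(labels)

def relevant_positions(labels):
    """1-based positions at which the relevant items appear."""
    return [pos for pos, rel in enumerate(labels, start=1) if rel == 1]

def reciprocal_rank(labels):
    """1 / position of the first relevant item, or 0.0 if there is none."""
    positions = relevant_positions(labels)
    return 1.0 / positions[0] if positions else 0.0

# Same five items, same labels, different order (hypothetical data).
ranking_a = [1, 1, 0, 0, 0]  # relevant items shown first
ranking_b = [0, 0, 0, 1, 1]  # relevant items buried at the bottom

print(precision(ranking_a), precision(ranking_b))
print(relevant_positions(ranking_a), relevant_positions(ranking_b))
print(reciprocal_rank(ranking_a), reciprocal_rank(ranking_b))
```

Both rankings get the same precision, even though a user would clearly prefer the first one. Only a quantity that looks at positions tells them apart.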
Rankers need metrics like NDCG, MRR (Mean Reciprocal Rank), or Precision@K that evaluate the position and…