The five-star review system has long been the standard for evaluating products, services, and especially apps in the digital marketplace. But a growing chorus of critics argues that the system is fundamentally broken. The latest exhibit comes from Terry Godier, founder of a new RSS reader called Current. In a series of posts, Godier pointed out a paradox that many app developers know all too well: in the current five-star paradigm, anything below five stars is practically a disaster, rendering the entire system meaningless.
Godier noted that many 4-star reviews for his app include glowing praise like "This is my favorite app!" or "Gamechanger!" Yet these positive 4-star ratings actually hurt the app's overall rating. Every rating below 5 pulls the average down, so an app with mostly 5-star ratings and a steady stream of 4s sees its average slide toward 4.5, and a handful of lower ratings on top can push it below that mark, making it less visible in app store rankings. This is the broken heart of the five-star system.
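The arithmetic behind this can be checked in a few lines of Python. The sketch below is illustrative only: the rating counts are hypothetical, not Current's actual data, and `mean_rating` is a helper name invented for this example.

```python
def mean_rating(counts):
    """Return the mean star rating for a {stars: number_of_ratings} histogram."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no ratings to average")
    return sum(stars * n for stars, n in counts.items()) / total


# Hypothetical rating histograms (assumed for illustration, not real data).
scenarios = {
    "all 5s": {5: 100},
    "80% 5s, 20% 4s": {5: 80, 4: 20},
    "50% 5s, 50% 4s": {5: 50, 4: 50},
    "50% 5s, 45% 4s, 5% 1s": {5: 50, 4: 45, 1: 5},
}

for label, counts in scenarios.items():
    print(f"{label}: {mean_rating(counts):.2f}")
```

With only 4s and 5s in the mix, the average cannot fall below 4.5 until 4-star ratings outnumber 5-star ones. In the last scenario, it is the small share of low ratings on top of the 4s that tips the average under that mark.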
The problem is not unique to Current. The App Store, Google Play, Amazon, Yelp, and countless other platforms rely on a scale that is effectively binary: a 5-star rating means success, while anything less is often perceived as a failure. Users have been conditioned to reserve 5 stars for exceptional experiences, while treating 4 stars as a good rating. But algorithms and customer psychology treat 4 stars as a warning sign. The result is a system that punishes honest feedback and incentivizes rating inflation.
The Psychology Behind the Stars
Psychologists and behavioral economists have studied rating scales for decades. The five-star system suffers from what is known as "scale compression." Because raters rarely use the lower half of the scale, the majority of ratings cluster around 4 and 5. This creates a situation where the difference between a 4.0 and a 4.5 average can be profound in terms of visibility and trust. A 4.0 average might appear mediocre, while a 4.5 average signals excellence. Yet the two apps may differ only in how many of their satisfied users chose 4 stars instead of 5.
Furthermore, a well-documented "negativity bias" causes people to weigh negative experiences more heavily than positive ones. Arithmetic compounds the effect: when an app's average already sits near 5, a single 1-star review drags it down far more than a single 5-star review can lift it. The five-star system also extends this penalty in a counterintuitive way: a 4-star review, which is positive, acts as a negative whenever the app's average is above 4, because it pulls that average down.
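The asymmetry follows from how a mean updates: adding one rating r to n existing ratings with average a moves the average by (r - a) / (n + 1). Here is a minimal sketch, assuming a hypothetical app with 49 ratings averaging 4.8 stars; the function name and the numbers are invented for illustration.

```python
def shift_from_one_rating(current_avg, n_ratings, new_stars):
    """Change in the mean when one more rating joins n_ratings existing ones."""
    return (new_stars - current_avg) / (n_ratings + 1)


# Hypothetical app: 49 ratings averaging 4.8 stars (assumed numbers).
current_avg, n_ratings = 4.8, 49

for stars in (5, 4, 1):
    delta = shift_from_one_rating(current_avg, n_ratings, stars)
    print(f"one {stars}-star rating moves the average by {delta:+.3f}")
```

Under these assumptions, the 5-star rating lifts the average only slightly, the 4-star rating lowers it, and the 1-star rating lowers it by many times the 5-star gain.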
Additionally, cultural differences come into play. In some countries, a 4-star rating is considered excellent because 5 stars are seen as perfection reserved for the divine. In others, anything less than 5 is a sign of dissatisfaction. App developers must compete globally, yet the rating system does not account for these cultural nuances.
The Current Case Study
Terry Godier's Current app is a well-designed RSS reader that has garnered praise for its simplicity and speed. In its early days, it accumulated a mix of 5-star and 4-star reviews. The 4-star reviewers often left comments like "Great app, but I wish it had dark mode" or "Excellent, just needs a few tweaks." Those reviewers intended to be helpful, not harmful. They thought they were giving a high rating while leaving constructive feedback. Instead, they inadvertently lowered the app's average below the threshold for being featured in Apple's "Best New Apps" or Google's "Editors' Choice."
This is not an isolated incident. Many developers report similar experiences. The developer of a popular weather app, for example, saw his rating drop from 4.8 to 4.6 after a wave of 4-star reviews from users who loved the app but wanted more widget customization. The developer was torn between appreciating the feedback and suffering the ranking penalty. Eventually, he resorted to begging users to leave 5-star ratings if they liked the app, a practice that feels desperate but is common.
The Current example is just the latest in a long line of cases that expose the system's flaws. Experts have criticized the five-star system for years, but platforms have been slow to change because of the inertia of existing user behavior and the difficulty of finding a better alternative.
Historical Context: How We Got Here
The five-star rating system traces its roots to the early days of e-commerce. Amazon popularized it in the late 1990s as a simple way to gauge product quality. At the time, it was revolutionary because it gave consumers a voice that could be aggregated. Before that, ratings were often binary (thumbs up/down) or absent entirely.
However, as online reviews proliferated, the system became gamified. Sellers and app developers realized that 5-star ratings were critical for success. This led to review manipulation, fake reviews, and even entire industries built around boosting ratings. Legitimate apps struggle to compete with those that engage in shady practices. Meanwhile, users become skeptical of any app with a perfect 5.0 average, suspecting review fraud.
Platforms have attempted to mitigate these issues. Apple introduced the option for developers to prompt users for ratings after a certain number of launches, but this also led to prompt fatigue. Google Play uses a rolling average and accounts for recent reviews more heavily. Yet neither has solved the core problem: the scale is too coarse for the nuance of human opinion.
Alternatives and Potential Fixes
What might replace the five-star system? Some propose a simple binary like/dislike approach, as seen on YouTube and many social platforms. This eliminates the ambiguity of the middle stars but loses the ability to convey degrees of satisfaction. Others suggest a 10-point scale, which provides more granularity but may overwhelm users and still face compression.
A more promising idea is the "upvote" system used on Reddit and Stack Overflow, where users can upvote or downvote, and the aggregate score is not an average but a net score. This avoids the problem of 4-star ratings being averaged down. However, this system can be gamed through voting rings.
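To see why a net score sidesteps the averaging problem, consider this minimal sketch. The mapping from stars to votes (4 or 5 counts as an upvote, 1 or 2 as a downvote, 3 as an abstention) is an assumption made here for comparison, not how Reddit or Stack Overflow work, and the sample ratings are invented.

```python
def stars_to_vote(stars):
    """Map a star rating to a vote: +1 up, -1 down, 0 abstain (assumed mapping)."""
    if stars >= 4:
        return 1
    if stars <= 2:
        return -1
    return 0


def net_score(star_ratings):
    """Net score = upvotes minus downvotes; no averaging involved."""
    return sum(stars_to_vote(s) for s in star_ratings)


# Hypothetical sample of ratings.
ratings = [5, 5, 4, 4, 4, 3, 1]
print("mean rating:", round(sum(ratings) / len(ratings), 2))
print("net score:  ", net_score(ratings))
```

Under this mapping, a happy user who would have left 4 stars adds to the score exactly as a 5-star user does, so positive feedback can never count against the app.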
Another concept is to display ratings as a distribution rather than a single average. For example, an app store could show that 60% of users gave 5 stars, 25% gave 4 stars, etc. This gives users more context and reduces the impact of a few lower ratings. Some sites already do this, but it is not mainstream in app stores.
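A distribution view is straightforward to compute. The sketch below uses a hypothetical set of 20 ratings chosen to match the 60% and 25% figures in the example above; `rating_distribution` is a name invented here, not any store's API.

```python
from collections import Counter


def rating_distribution(star_ratings):
    """Return {stars: percent_of_ratings} for stars 5 down to 1."""
    total = len(star_ratings)
    if total == 0:
        raise ValueError("no ratings to summarize")
    counts = Counter(star_ratings)
    return {stars: 100 * counts.get(stars, 0) / total for stars in range(5, 0, -1)}


# Hypothetical ratings: twelve 5s, five 4s, two 3s, one 1.
ratings = [5] * 12 + [4] * 5 + [3] * 2 + [1]

for stars, pct in rating_distribution(ratings).items():
    bar = "#" * round(pct / 5)  # one '#' per 5 percentage points
    print(f"{stars} stars | {bar:<20} {pct:.0f}%")
```

A reader scanning this output sees at a glance that most reviewers were enthusiastic, which a bare average of the same ratings would not convey.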
Yet another approach is to personalize ratings based on the reader's preferences or past behavior. A 4-star review from someone whose tastes match the reader's might be weighted more heavily than a 5-star review from someone with very different preferences. But this introduces complexity and raises privacy concerns.
Some companies have experimented with removing reviews entirely for a period, as Apple did with its Podcasts app for a while. The result was that discoverability suffered even more. So reviews are necessary, but they must be improved.
One practical fix that many developers advocate is to change the labeling of the five-star scale. Instead of 1=Hate, 2=Dislike, 3=Neutral, 4=Like, 5=Love, they propose: 1=Poor, 2=Fair, 3=Good, 4=Great, 5=Excellent. This might shift user psychology so that 4 is seen as a very positive rating that does not penalize the app. But the algorithms would still treat 4 as a subtraction from 5.
The fundamental issue is that any average-based system is sensitive to outliers: a single 1-star review can be devastating. A better approach might be to report the median instead of the mean, since the median is far less affected by extreme ratings. But median scores are less intuitive to users, and on a five-point scale they can only move in coarse steps.
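The difference between the two summaries is easy to demonstrate with Python's standard `statistics` module. The ratings below are hypothetical, and `summarize` is a helper name invented for this sketch.

```python
from statistics import mean, median


def summarize(star_ratings):
    """Return (mean, median) for a list of star ratings."""
    return mean(star_ratings), median(star_ratings)


# Hypothetical app: nine 5-star ratings and one 1-star outlier.
with_outlier = [5] * 9 + [1]
avg, med = summarize(with_outlier)
print(f"with outlier -> mean {avg:.2f}, median {med}")

# The trade-off: swapping a single rating can move the median a whole star.
print("median of [5, 5, 4, 4, 4]:", median([5, 5, 4, 4, 4]))
print("median of [5, 5, 5, 4, 4]:", median([5, 5, 5, 4, 4]))
```

The single outlier lowers the mean but leaves the median untouched. The last two lines show the cost: two nearly identical sets of ratings land a full star apart.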
Another idea is to implement a "recommend or not recommend" binary system and display the percentage of reviewers who recommend the product. Some platforms, such as Amazon, already pose a "Would you recommend this product?" question along these lines. It simplifies decision-making and avoids the star confusion.
Finally, artificial intelligence could play a role in analyzing review text to detect sentiment and adjust the rating accordingly. For example, a review that says "I love this app, but it needs one small feature" with a 4-star rating might be reweighted to reflect the positive sentiment. However, this opens the door to algorithmic bias and errors.
The five-star review system is broken, and the Current example is exhibit 472,304. It is a problem that affects developers, consumers, and platforms alike. While no perfect solution exists, the conversation is necessary. The fact that a well-regarded app like Current can be harmed by positive reviews is a sign that the system needs an overhaul. Until then, developers must continue to navigate a landscape where a 4-star rating is often a kiss of death.
In the meantime, users can help by understanding the impact of their ratings. If you truly love an app, give it 5 stars. Reserve 4 stars for apps that are good but have notable flaws. And if you want to give feedback, use the comment box instead of reducing your star rating. This small behavioral shift can make a big difference in a broken system.
Source: The Verge News