The trouble with review scores

Following Eurogamer’s announcement in February regarding changes to the way they review video games, Kim asks: is dropping review scores the way to go?


Kim says…

Earlier this year, a number of video game journalism websites took the radical decision to remove scores from their reviews of new releases. The most recent was Eurogamer in early February and before that the now-defunct Joystiq in January; and other sites such as Kotaku have long been advocates of a no-score policy. Journalists seem to be moving away from percentages and numbers towards broader, vaguer awards and recommendations.

When you consider that the review industry – not just for video games but for other purchases also – has long been based on numerical scores, the decision to drop them now seems like a pretty drastic one. So why are so many websites going in this direction and does it help or hinder the community overall? As a first step towards figuring this out, let’s take a look at the review policies of some major gaming sites to gain clarity on where this change has come from.

Changing landscapes, changing reviews

In place of scores, Eurogamer is now going for ‘one-line summaries for every review’ along with an awards structure. They say: “The first and most important component of our new system is a punchy summary that will appear at the top of every review… Beyond this, we still wanted a way to highlight the games that we feel most strongly about. So some – but [not all] – games will be flagged as Recommended, Essential or Avoid.”

Joystiq went for a similar approach before the company’s closure: “At the bottom of every review, you’ll find a quick summary of its important points, which we’re calling ‘In Other Words’. In a few sentences (a paragraph at most), we’ll tell you what you need to know. Furthermore, we’ll give you the Breakdown to help answer common questions like ‘Does it have a season pass?’ and ‘Is there split screen?’” They also introduced the Joystiq Excellence Award to help ‘recognise the best of the best’.

Kotaku have been following the same policy since January 2012, originally set out by Stephen Totilo: “I believe that our writers should have the flexibility to review in the way that best suits the game. They can review the game by writing an essay or writing bullet points; they can review a game with a poem or comic strip. The format for the main part of their review will conform to whichever approach best suits the reviewer’s voice and the game they’re writing about.” A summary box with an answer of Yes, No, or Not Yet is displayed to show whether gamers should play the title in question.

The main reason for dropping review scores seems to be the fact that the way video games are made and distributed has changed almost beyond recognition. In the 1990s it was taken for granted that a copy given to a reviewer was the same as that which would be picked up in shops by consumers; but this is no longer the case. As Eurogamer Editor Oli Welsh put it: “In the last few years in particular, the rise of digital distribution and the assumption that most consoles and all computers are connected to the internet has resulted in a much more fluid game development. Some games might evolve right up to the moment of their commercial release, with a day one update. Some games are released commercially long before they are finished, via ‘early access’ versions. Some games never stop evolving.”

This means that narrowing down something as fluid as a video game and sticking a number on it isn’t as easy as it used to be or necessarily useful. Richard Mitchell of Joystiq said: “Between pre-released reviews, post-release patching, online connectivity, server stability and myriad other unforeseeable possibilities, attaching a concrete score to a new game just isn’t practical. More importantly, it’s not helpful to our readers.”

Totilo from Kotaku explains why they don’t like statistics: “We’ve long avoided putting scores on games because, as a team, we’re not interested in describing the quality of a piece of artwork in terms of a number.” He even admits that some of his team ‘despise review scores’ and doesn’t seem bothered by potentially losing readers: “I respect the fact that some gamers want to see a number as a useful shorthand. It’s not one we’ll provide, but you can find plenty of numerically-scored game reviews elsewhere.”

Divorcing Metacritic

Something these sites have in common is that they’ve removed their listings from Metacritic following their decisions. Metacritic is a website that aims to ‘help consumers make an informed decision about how to spend their time and money on entertainment’ by providing an aggregated score, believing that ‘multiple opinions are better than one, user voices can be as important as critics, and opinions must be scored to be easy to use.’

The way the site’s ‘Metascores’ are calculated is a closely-guarded secret but they provide the following explanation: “Creating our proprietary Metascores is a complicated process. We carefully curate a large group of the world’s respected critics, assign scores to their reviews, and apply a weighted average to summarise the range of opinions. The result is a single number that captures the essence of critical opinion in one Metascore. Each movie, game, television show and album featured on Metacritic gets a Metascore once we’ve collected at least four critics’ reviews.”
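
To make the idea of a weighted average concrete, here’s a minimal sketch in Python. The scores and weights below are invented purely for illustration: Metacritic doesn’t publish its real weights, so this shows only the general technique and not the site’s actual formula.

```python
def weighted_average(reviews):
    """Return the weighted mean of (score, weight) pairs, rounded to a whole number."""
    total_weight = sum(weight for _, weight in reviews)
    if total_weight == 0:
        raise ValueError("at least one review must carry a non-zero weight")
    return round(sum(score * weight for score, weight in reviews) / total_weight)


# Hypothetical critics: (score out of 100, assumed weight)
reviews = [
    (90, 1.5),  # assumed: a large outlet that counts for more
    (80, 1.0),
    (70, 1.0),
    (60, 0.5),  # assumed: a small outlet that counts for less
]

# A plain average treats every critic equally; the weighted one does not.
plain_mean = round(sum(score for score, _ in reviews) / len(reviews))
print("plain:", plain_mean, "weighted:", weighted_average(reviews))
```

With these made-up figures the same four reviews produce a different headline number depending on whose voice counts for more, which is why the undisclosed weights attract criticism.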

As an example, let’s look at a recent release: first-person shooter Evolve by Turtle Rock Studios. At the time of writing, this game had achieved a Metascore of seventy-eight based on opinions from twenty-six critics with sites such as IGN and Polygon giving it favourable reviews; but it had only achieved a 4.4 User Score based on unfavourable critiques from 613 people. If the website’s goal was to ‘provide multiple opinions’, they certainly achieved it with this one.

Eurogamer explains their removal with the reasoning that there isn’t a fair way to interpret their new review system in Metacritic’s hundred-point scale: “We don’t want to do it ourselves and we don’t want Metacritic doing it for us. For many game creators, far too much is riding on a Metascore – good or bad – for us to allow it to be influenced by a rating that we don’t think represents us fairly, or that we don’t have full control over.”

Joystiq gave a similar reason: “When converted to Metacritic’s 100 point scale, each of [our] five stars translated to a whopping 20 points. Games with recognisable flaws but redeemable gameplay – three stars, according to our guidelines – showed up as 60/100. Wonderful games that were just shy of greatness – not quite good enough to get five stars – saw their 4-star ratings converted to 80/100. We weren’t comfortable with the way some games were represented.”
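
The arithmetic behind Joystiq’s complaint can be sketched in a few lines of Python. The function name is hypothetical and a straight linear conversion is assumed, which is what the quote describes.

```python
def stars_to_hundred(stars, max_stars=5):
    """Scale a whole-star rating linearly onto a 100-point scale."""
    if not 0 <= stars <= max_stars:
        raise ValueError("star rating out of range")
    return round(stars * 100 / max_stars)


# Each extra star is worth a full 20 points on the aggregate scale.
for stars in range(1, 6):
    print(stars, "stars ->", stars_to_hundred(stars))
```

A five-step scale simply has no way to land on, say, 70 or 90, so every review is pushed to the nearest multiple of twenty.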

Kotaku is also pushing against the aggregate site: “We don’t give numerical scores, letter grades or star ratings on our reviews. We don’t get included in Metacritic.” So why are so many gaming journalism websites turning their backs on what was once seen as an excellent traffic driver and the best resource for gamers everywhere?

The Metascore monster

The first reason is criticism around the way Metascores are calculated. It’s been suggested that Metacritic converts each review into a percentage the website decides for itself; that it manually assesses the tone of reviews that have no explicit score and then assigns one itself. Alongside this, weighting is also applied so major gaming sites may have a greater influence than smaller ones (although the company refuses to reveal the weights applied to each publication). If this is the case, the numbers shown on Metacritic could be heavily biased and possibly far from the truth.

The Xentax Foundation analysed the website’s game-related data to investigate a primary hypothesis: that Metascores and User Scores differ significantly, based on the statistical flaws described above and the idea that gamers rate titles in a different way to journalists. Author Mike Zuurman published his conclusions in a paper in January 2014, stating: “The data at Metacritic leaves a lot to be desired and seems to be heavily biased… Caution is necessary when using Metacritic information to guide your own decision to buy or not buy a game. Until there is more transparency on how this process takes place at Metacritic, more transparency on the flow of funding from which sources, and the observed biases are removed, the database is of limited use for the end-users.”

Zuurman’s paper is a thought-provoking read and there are a number of interesting points to be taken from it. It shows that in the majority of cases Metascores don’t match User Scores: the latter are higher across all platforms overall, but for the video games receiving high marks the majority of gamers score them lower, indicating possible overrating by critics. Reviews listed at Metacritic are dominated by IGN, GameSpot and GameZone. When you consider that the titles possibly overrated are usually those by big American publishers such as Rockstar, Blizzard Entertainment and EA, it raises a question: is there a US bias?

Metacritic’s influence on the gaming industry

The second reason is criticism of the fact that Metacritic holds a certain amount of sway over the gaming industry. It’s believed that the website has become such a powerful force, it doesn’t just impact the nature of reviews: it directly influences development and marketing, as well as how developers are paid for their work. There are a number of high-profile examples of this.

Back in 2012, Activision was involved in a legal battle with Call of Duty creators Vince Zampella and Jason West and had to reveal details of its original contract with Bungie for Destiny as part of the court case. In section 10.3 of the public documents was this statement: “Activision shall pay to Licensor a quality bonus (the ‘Quality Bonus’) in the amount of Two Million Five Hundred Thousand Dollars ($2,500,000) should Destiny Game #1 achieve a rating of at least 90 as determined by gamerankings.com (or equivalent suitable services if gamerankings.com is no longer in service) as of thirty (30) days following the commercial release of Destiny Game #1 on Xbox 360.”

This instance may not involve Metacritic but it does show just how much importance publishers place on review scores by adding clauses around aggregated website rankings into contracts with developers. If they’re about to enter into a multi-million dollar deal with a development studio, they want to make sure they’re going to see a worthwhile return on their investment; and such agreements could be a way to minimise any business risk. Linking a developer’s bonus to a game’s Metascore is nothing new, and the practice has been widely reported in the industry for years.

Ars Technica’s April 2014 analysis found a direct relationship between scores and Steam sales, their estimates showing that a title with a Metascore of ninety or more will sell fifty times as well as one with a number of less than thirty. However, they also noted that individual releases can buck this trend: for instance, first-person shooter Orion: Dino Horde had managed a respectable 314,000 estimated sales despite being shown as a thirty-six on Metacritic. They concluded by saying: “Overall, our data shows that, all things being equal, a game with better review scores has a better chance of selling well than one with worse review scores. That’s especially true at extreme ends of the review scale, where games come out as almost guaranteed success or failures based on critical consensus.”

In an article for Kotaku in April 2013 about how review scores hurt video games, Jason Schreier said: “An employee of a well-known game studio told me about a recent pitch meeting with a publisher, during which the publisher brought up the studio’s last two Metacritic scores, which were both average. The studio employee asked that I not name the parties involved, but claimed the publisher used the Metascores as leverage against the studio, first to negotiate for less favourable terms, and then to turn down the pitch entirely.”

Many believe that such practices lead to a lack of creativity and innovation in big-budget titles, as the slightest drop in an average score can have disastrous consequences for the developer. For example, following the launch of The Secret World in July 2012, Funcom’s share price decreased significantly. In an Investor Relations update on their website they declared: “The company attributes this mostly to the aggregate review score, the ‘Metascore’, for the game at Metacritic together with other public sources for tracking the performance of games.”

As Eurogamer puts it: “Over the years, we’ve come to believe that the influence of Metacritic on the games industry is not a healthy one (and we’re not alone in this opinion in the industry, either). This is not the fault of Metacritic itself or the people who made it, who just set out to create a useful resource for readers. It’s a problem caused by the over-importance attached to Metascores by certain sectors of the games business and audience… The result has been conservatism in mainstream game design and a stifling of variety in critical voices. In short: it’s meant less interesting and innovative games.”

So based on what we’ve found out above, it seems that gaming journalism websites are dropping review scores because they can no longer be applied in a modern world where video game development is so fluid. This in turn means that they can no longer be listed on Metacritic because recommendations can’t be easily translated into a hundred-point scale. And they don’t like the aggregate website anyway, because the Metascores it calculates are possibly biased and publishers place too much emphasis on a number which could be inaccurate.

Playing the game

But even with Eurogamer’s change in direction, they still couldn’t escape scores entirely: “When searching for reviews in Google however, you will still see star ratings attached to [our] reviews: five stars for Essential, four for Recommended, one for Avoid, three for everything else.” And the reason for this: “Google is a very important source of traffic for us, and it’s vital that our reviews are made easy to find by being featured as prominently as possible. The star ratings help a great deal with this, and we feel that the scheme [we’ve] just described is a pretty close match for our system that won’t misrepresent our reviews.” So the website that said the ‘influence of Metacritic on the games industry is not a healthy one’ now has a biased rating system itself, as none of its critiques will be represented by two stars; and apparently the stars shouldn’t be ‘misinterpreted as [Eurogamer] sneaking a numerical score out there by stealth.’
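
The mapping Eurogamer describes is small enough to write out in full. This is only a sketch: the dictionary and function names are hypothetical, and the star values are taken directly from the quote above.

```python
# Star values as stated in Eurogamer's quote; the names here are hypothetical.
STARS_BY_AWARD = {"Essential": 5, "Recommended": 4, "Avoid": 1}
DEFAULT_STARS = 3  # "three for everything else"


def google_stars(award=None):
    """Return the star rating shown in search results for a given award (or none)."""
    return STARS_BY_AWARD.get(award, DEFAULT_STARS)


# Collect every rating the scheme can produce.
possible = sorted({google_stars(a) for a in ("Essential", "Recommended", "Avoid", None)})
print(possible)  # [1, 3, 4, 5]
```

Two stars never appears in the output, so the scheme amounts to a four-step scale presented in a five-star wrapper.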

Their new stance seems understandable when you consider the industry’s increased – and perhaps damaging – focus on Metascores. The number at the bottom of a review is an imprecise way of distilling a colourful opinion into a single number, when it’s actually the content that comes before it that matters; and writers can potentially find themselves responsible for a developer’s future. Obsidian CEO Feargus Urquhart said in a conversation with Schreier of Kotaku back in December 2012: “A lot of times when we’re talking to publishers – and this is no specific publisher – there are conversations I’ve had in which the royalty that we could get was based upon getting a 95.”

But this shouldn’t be a consideration in the critical process: writers need to carefully consider their words and stand by their opinions, giving a release the evaluation they believe it truly deserves. And it’s not just the publishers that play the game. For example, in Schreier’s article on how Metacritic hurts video games he said: “One developer – a high-ranking studio employee who we’ll call Ed – told me he hired someone to write a mock review [before the title’s release], then just shredded it. Ed didn’t care what was inside. He just wanted to make sure the reviewer – a notoriously fickle scorer – couldn’t review his studio’s game [after its release]. Ed knew that by eliminating at least one potentially-negative review score from contention, he could skew the Metascore higher.”
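
Ed’s reasoning comes down to simple arithmetic, sketched below in Python. The scores are invented and a plain unweighted mean is assumed, since Metacritic’s real weighting isn’t public.

```python
from statistics import mean

# Invented review scores for one game; the last comes from a harsh reviewer.
scores = [85, 82, 80, 78, 55]

with_all = mean(scores)             # every review counted
without_lowest = mean(scores[:-1])  # the harsh review never gets published

print(with_all, without_lowest)
```

With only a handful of reviews in the pool, taking a single low score out of contention lifts the average by several points, which could be the difference between hitting and missing a contractual target.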

Metacritic’s FAQ page shows that Eurogamer’s Italian and Polish sites are still included within the aggregate calculations, despite the company announcing they would no longer be listed. So while not scoring new releases, some of their sites based in countries outside of the UK (although it should be noted this isn’t the majority) could be critiquing the same titles and slapping a numerical rating on them – meaning the company still technically appears on Metacritic, and receives traffic from that direction along with any advertising revenue generated from it.

Everybody has their own opinion

So should scores be removed from video game reviews altogether? I asked several colleagues and friends, and here are their reactions.

Ben, our Community Manager: “I’ve been an advocate for not having review scores for years now and make no apologies for it. I can definitely see the attraction to a reader of being able to skip to a number and make a snap judgement on a game but at the end of the day it’s just an arbitrary figure. To really understand a review you have to read it, understand the reviewer’s thought processes and reasoning behind why they like (or don’t like) certain aspects of a game. Read as many opinions as you can, absorb the views of a number of sources and make up your own mind if it’s a game you want to play or not.”

Phil, our Webmaster: “I can’t see this whole ‘dropping review scores’ thing getting anywhere in the future, it’s just one big bandwagon built for those who spend more than a few hours a week on the internet. It’s like a meme: just not funny and rather boring. These websites seem to be changing their review policies for the sake of sounding hip and fresh, which others copy and readers blindly back as though it’s ‘the next cool thing’.”

Tim, our partner at GeekOut South-West: “Video game journalism is an art, which can’t always be graded with a traditional mathematical equation. What you score each aspect of a game is irrelevant, as you have to look at the overall picture. Much like a grand painting, you have to look at a game from all angles to truly appreciate it.”

Joel, our partner at Quotes from the Tabletop: “One word: Metacritic. If gaming magazines remove scoring on their reviews, then Metacritic ultimately becomes useless and it is THE best rating you can get on a game. One man’s opinion don’t mean anything against the opinions of the whole.”

Pete, designer and friend of 1001Up: “I go to a site such as Eurogamer to get an idea of whether a game is worth spending my hard-earned cash on. While a nice long review of the game is good and it’s informative to read what the writer finds positive and negative, it’s all based on their personal opinion. That’s no good to me when parting with my £50 that’s taken a couple of hours to earn in my windowless hell-hole of a workplace. What I like to see at the end of that review is something that indicates a scalable buy-or-not-to-buy judgement based on a set of parameters that all the titles reviewed are based on and which are agreed on by a team. What those parameters are depends on which site reviews I read.”

The case for review scores

Personally, I have a fondness for review scores: they suit the way I think and I like being able to get an impression of a writer’s overall opinion before delving into the detail of their review. I understand that they shouldn’t be taken in isolation and the same number can have wildly-varying meanings on different websites. Based on the information my research for this article uncovered, I can understand why some journalists want to drop them; but I don’t necessarily believe it’s the right way to go.

Eurogamer argues that: “The way video games are made and distributed has changed almost beyond recognition… All of this has made the job of reviewing games much more unpredictable and complex.” But isn’t that part of what makes this job fun? One of the reasons for enjoying gaming as much as we do is because nothing stands still and there’s always something new to capture our interest; and playing video games should mean we’re pretty adept at figuring out solutions to problems.

Picking a score for a review isn’t easy and a number can only be appropriately assigned once all of the words have been written. But these challenges come with the territory: it’s a reviewer’s responsibility to explain to a fellow gamer whether it’s worth picking up a certain release. Rating a title on an easily-understandable numerical scale means we have to carefully consider the paragraphs that come beforehand and stand by our opinions; and if we can’t do this, then surely we’re not doing our job properly? Dropping scores becomes unnecessary if you can ensure your writers put effort into their work and provide sensible reasoning to support their views, and your policy clarifies that scores apply to games as they were at the time they were played.

Eurogamer’s next point is that a number is ‘a very reductive way to represent a nuanced, subjective opinion, and that the arguments started by scores aren’t productive.’ In part I agree with this: video games are somewhat unique in just how much focus is placed on a number and gamers can have a tendency to debate these to the point of obsession. Joystiq agreed with this sentiment too, saying: “A score can’t tell you what a critic liked or disliked about a game, or why. It can’t tell you what qualities are most valued by the review’s author.”

But criticism will never be perfect, regardless of whether it’s presented via a numerical score, simple recommendation or a full-blown essay; and while arguments aren’t useful, discussion is good. Reviews are subjective and not everyone will agree with a certain view, but nobody can deny they’re a part of our discourse and the dialogue of our community. Scores are only as good as the words they’re trying to reflect and can’t give any true detail when taken by themselves; however, when consistently applied they can create a standard and add structure to opinion.

Despite dropping the numbers, Eurogamer ‘still wanted to highlight the games that [they] feel most strongly about’. But their move to Recommended, Essential and Avoid awards could be interpreted as a change from a ten-point to a four-point scale (if you include those video games that aren’t given an award at all); a new system that’s vaguer, doesn’t allow for granularity and could potentially cause confusion. Several titles may be given a ‘Recommended’, but without a number it’s hard to tell whether they received an enthusiastic seal of approval or barely made the grade.

A number can be a pretty good indicator for regular readers. If you already know that a particular writer has a similar taste in video games to you, then their review scores can become a useful shorthand for how well you’re likely to receive certain titles. This is why we encourage readers to follow the 1001Up team individually on Twitter so they can find out what makes each of us tick – and sometimes we might just say something funny or post pictures of kittens.

You can’t force gamers to do anything

During my research for this article, I found that the main reason why both journalists and readers are in favour of removing scores is that it will force gamers to read through the entire review rather than just skipping ahead to the number at the bottom. But in my experience, forcing gamers to do anything has never resulted in anything positive. Those who used to scroll forward to the score won’t suddenly start reading every word within an article; they’ll just find another website that can quantify the quality of a release in the way they want.

Joystiq believed that ‘without the full context of a review to explain it, about the only thing a score is good for is deciding whether you want to take your time to read the review in the first place.’ But that one number can say it all. If the video game you’ve been looking forward to scores a nine, you probably won’t bother reading the article to find out why – you’ll just go and buy it. But if it receives a number you weren’t expecting, you’ll want to find out more to see if whatever it was the reviewer didn’t like could be something that would annoy you too. So in some cases, the numbers can actually encourage gamers to spend time on the details.

And there’s evidence to suggest we actually want numerical scores. Polygon’s Reviews Editor Arthur Gies was quoted in an article by Ars Technica as saying: “We had access to peers at other outlets who had gotten rid of review scores after having them, who saw the interest in their review content suffer considerably, and sites who added review scores and saw a commensurate increase in interest in their reviews… The anecdotal accounts and experiences we had suggested that readers want them, whether they admit to it or not.”

There’s one other thing that strikes me here, although I’m no marketing expert and so can’t confirm whether it’s true or not. If removing review scores is a way of ensuring a gamer reads through the whole article, then readers will remain on the same website for a longer period of time while they take in the detail. And if this is the case, wouldn’t that website then appear a lot more attractive to potential investors and advertisers? Hmm.

Don’t give up on review scores

I believe scores do matter, but the reason why they’re important has become clouded. They should be seen as a rough guide or a summary of a well-thought-out opinion, not the review as a whole or as a means for publishers to secure their profit. A number’s value lies not in the number itself but in the thought and judgement behind it.

Video games aren’t cheap, especially when compared to other forms of entertainment. So if you’re about to spend £50 on a new big-budget release, checking out review scores may not be a perfect exercise but it can give you a good idea of whether to part with your money. That said, everyone’s definition of ‘value’ is different; you’ll only get a good impression of ‘worth’ if you do your research, look at a range of opinions from different writers, and take in the detail when you need it.

Everybody is different when it comes to reviews. Some like a full evaluation with plenty of detail on a video game, while others don’t have the time or inclination to sit through a lengthy article. We wouldn’t be doing our job properly if we didn’t try to cater for both preferences and provide information in a format that’s accessible. As a gaming journalism website, that’s just good business sense; and as press, it’s my responsibility to present enough evidence about the title I’m reviewing to allow readers to formulate their own conclusions.

Here at 1001Up, we’d obviously love for you to read our articles instead of skipping ahead to the grade, as a static review score has a limited shelf-life and can never fully represent the reviewer’s assessment. But scores do have their uses: they complement the text when not considered in isolation, provide a general indication of a title’s quality, and give readers an understanding of where we’re coming from. Above all they help us to determine which titles should be included in our own list of the greatest video games in the world.

It simply wouldn’t be practical for us to complete our 1001 Project without applying a score to video games. But it’s not just a case of slapping an arbitrary number on a title: a single score is potentially meaningless because a game can be appreciated on a number of different levels. So since starting in February 2013 we’ve been basing our ratings on six separate factors to try and give as comprehensive a picture as possible, more information on which can be found in our guide.

Perhaps the review industry is giving up too easily by saying that scores can’t possibly work. As long as gamers need a concise way to get an opinion – especially a series of them – scores will have a place and can exist in conjunction with written detail. There are people interested in arguing numbers and people who are more concerned with discussing the points raised within a review; and the two aren’t mutually exclusive.
