On Tetris

A commenter to one of my earlier L/N posts asked what N-Score I would give Tetris. It is an interesting question and one I considered myself when I first began thinking about L/N.

To begin with, the N-Score would depend on the specific version of Tetris being reviewed. Any version of Tetris with a decent soundtrack and nice, crunchy sound effects would score somewhere in the average range of 9 to 12. Missing or low-quality sound effects or music could bring the N-Score below average, while a truly excellent soundscape could pull it a little above average. Remember, I score games relative to their concept, so a complex narrative isn't a requirement for a puzzle game like Tetris. However, there are a number of things that a Tetris game could do to earn a significantly better than average N-Score.

For one, a dynamic soundtrack could do a lot to improve the "narrative experience" of Tetris. If the music matched the state of your current game, whether frantic and tense as you play on the knife's edge of defeat or triumphal and celebratory after landing a well set up tetris, you would certainly be more immersed in the game.

A Tetris game could also take a few cues from mahjong games and add opponents or antagonists that you play against. Instead of pieces dropping without explanation from the top of the screen, they could be positioned and dropped by an animated character. Tougher opponents would drop them faster and with different algorithms.

A Tetris game which did both of these could potentially earn a very high N-Score. So really, high N-Scores are not limited to certain narrative-centric genres. Any game which maximizes the artistic and narrative potential of its concept can earn a high N-Score.

L/N Implementation Details

In an earlier post I described a dual metric for reviewing video games called the L/N system. That first post details both my philosophy of assigning quantitative scores and the qualities to be measured by each metric. Now it is finally time to discuss the specifics of my implementation of the L/N system. I say 'my implementation' because the L/N concept itself only applies to the what and doesn't extend to the how. Other reviewers are encouraged to develop their own implementations of the system.

A Problem of Scale

Probably the most common set of complaints I see regarding quantitative scores relates to the scale being used, which is almost always linear. Sometimes there is confusion in interpretation because the scale used is numerically linear but semantically nonlinear. For instance, the difference between a 6 and a 7 may be read as greater or less than the difference between a 9 and a 10, even though both are a single point. Another common source of misinterpretation is what constitutes an average score. Percentage scales and 10 point scales are particularly susceptible to this type of misinterpretation. Many readers will interpret a 7 or 70% as an average score even though it is well above the scale's midpoint of 5 or 50%. The print magazine EGM recently moved from a 10 point scale to a letter grade scale to mitigate this sort of ambiguity. To avoid confusion, the scale used should be numerically and semantically consonant. The numerical mean should match the semantic mean and the relative numerical differences should be consistent with their semantic interpretation.

Another common complaint related to numerical scales is that the differences between scores often seem arbitrary. Percentage scales, for instance, are much more precise than any reviewer can justify. Having too much precision in a scale will undermine it to the point that readers will begin to question the accuracy as well. Linear scales compound the precision problem because the greatest works are often much, much more impressive than the average, but extending a linear scale far enough to adequately represent that may make the scale appear to be overly precise.

My Solution

So, I want to avoid linear scales and use a nonlinear one that gamers have an intuitive grasp of. It is also best if the scale is not easily confused with a linear one. My solution is to use a scale based on 3d6. Yep, the very same scale used to measure ability scores in Dungeons and Dragons (for those not in the know, D&D players generate their characters' ability scores by rolling three 6-sided dice and summing the results). The biggest numerical advantage to using 3d6 is that it is distributed approximately normally, i.e. the graph of possible values closely resembles a bell curve. The major non-numerical advantage is that it is familiar, and even if you don't know what a normal distribution is, you probably realize that rolling an 18 is quite a bit harder than rolling a 17. The scores themselves will be represented visually by images of three dice. This serves to add another layer of information based on the specific dice chosen to represent the score. A 13 could be represented by 5-4-4 or 6-6-1, for example. A score of 6-6-1 would indicate potential greatness brought down by one or more serious flaws.
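To get a feel for how much room the dice images leave for expression, the possible faces for a given score can be enumerated. This is only a small Python sketch, not part of any review tooling; the function name dice_faces is made up for illustration, and it assumes the order in which the dice are shown doesn't matter.

```python
from itertools import combinations_with_replacement

def dice_faces(score):
    """Return every unordered set of three d6 faces that sums to score."""
    faces = range(6, 0, -1)  # highest face first, so 6-6-1 reads naturally
    return [combo for combo in combinations_with_replacement(faces, 3)
            if sum(combo) == score]

# List the dice images that could stand for a score of 13.
for combo in dice_faces(13):
    print("-".join(str(face) for face in combo))
```

For a 13 this lists five options, running from the lopsided 6-6-1 to the even 5-4-4, while an 18 can only ever be shown as 6-6-6.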

To provide a little more detail, the median score is 10.5. Since I'm obviously only using whole numbers, this means the average score ranges from 9 to 12. Scores within this range account for roughly 48% of the population. Less than half of one percent of the population would have an 18. Of course I am not going to assign scores with the sole aim of fitting a probability distribution, but the distribution does help define the relative difference between scores.
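The percentages above can be checked by counting all 216 equally likely rolls of three dice. The following is a minimal Python sketch of that count; the names ways and share are my own choices for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 216 equally likely rolls produce each total.
ways = Counter(sum(roll) for roll in product(range(1, 7), repeat=3))
total_rolls = sum(ways.values())

def share(low, high):
    """Fraction of all rolls whose total falls between low and high, inclusive."""
    return Fraction(sum(ways[t] for t in range(low, high + 1)), total_rolls)

print(f"9 to 12: {float(share(9, 12)):.1%}")
print(f"18:      {float(share(18, 18)):.2%}")
print(f"17:      {float(share(17, 17)):.2%}")
```

The 9 to 12 range covers 104 of the 216 rolls, or about 48.1%. An 18 comes up once in 216 rolls (about 0.46%), which makes it a third as likely as a 17.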

Guidelines For Assigning Scores

Now that I have defined the scale, I'd like to put forth a few guidelines I will follow when assigning scores. A number of questions have been raised by readers so far and hopefully these guidelines will paint a clearer picture of my scoring criteria.

First, context is important when scoring a game. The very highest scores are reserved for those games which are both incredibly well executed and groundbreaking in some way, and innovation is meaningless outside of the context of when the game was released. The original DOOM would score higher than the many clones which followed, even though some may have been just as well executed. Technological context is also important. Were a new 16 bit console game released today, its technology-related aspects would not be compared against current gen consoles.

In addition to historical and technological context, the concept behind the game is also important. Many reviewers already rate games relative to their genre, but I would like to be explicit in my belief that reviewers should take it a step further and make an effort to divine the developers' specific ideals and goals. I am a firm believer that not all games, no matter how excellent, will appeal to all players. If the very concept of a game and what it is trying to achieve doesn't appeal to a player, then they probably will not like it. A game should be judged in light of these factors. However, this doesn't mean that the concept itself is beyond reproach. Lack of ambition, for example, will certainly prevent a game from achieving the highest scores.

For determining the N-Score, a useful guideline is to consider how interesting the game would be if you were watching someone else play it. This doesn't perfectly capture my concept of the N-Score, but it is still useful to consider because many elements measured by the N-Score can be appreciated by someone other than the player, such as music, art design, story, etc. There are some crucial differences though. Player immersion is something that the N-Score should be concerned with, but immersion is hard to measure if you aren't the one playing. The visual feedback provided by video games in response to user input can make the user feel like a hero in ways that movies and other passive media cannot. This type of immersion isn't captured by the L-Score because the visual feedback often has little or no impact on actually playing the game. It is a complex topic, but hopefully these examples clarify the concept of the N-Score.

And with that, I believe I have written more about my video game review philosophy and metrics than any mainstream gaming web site or print mag. Kind of a shame considering that I haven't written a single review and they've written thousands.

EDIT: Added a paragraph on 'concept' to the guidelines.

EDIT: Eurogamer actually has a fairly detailed description of their scoring policy. It doesn't address the same problems as my system, but at least it provides a detailed semantic description of their scale.

A Defense of Game Review Metrics

The subject of video game reviews turned out to be a hot topic last week. In addition to my post Reviewing and Scoring Video Games, there was a column on GameSetWatch by Simon Parkin and an interesting article at PopMatters by L.B. Jeffries. Jeffries makes an excellent point in distinguishing the majority of game reviews today from the type of real criticism that the industry could use a lot more of -- reviews are targeted towards consumers making purchasing decisions while criticism is targeted towards the game makers themselves. According to Jeffries, most reviews today don't go beyond "this game is/isn't fun" to explore the whys which could help game developers make better games. Parkin also has a few points to make concerning consumer-oriented reviews. He contrasts video game reviews with consumer electronics reviews, noting that an objective measure of quality isn't possible for games in the way that it is for consumer electronics. Instead, Parkin says, game review scores are really a measure of how well a game lives up to its pre-release hype, although consumers still view them as an objective measure of quality.

In reading the comments to these articles and others, I've gathered that many people agree with Parkin's point that attempts to objectively rate games are fundamentally flawed. While I agree that pure objectivity is impossible, there are a number of reasons why quantitative metrics are still worthwhile:

1) Quantitative metrics allow for searching and sorting. If a reader finds a critic he often agrees with, he can quickly find all games that the critic rated highly without having to skim the text of every review.

2) Quantitative metrics allow for algorithmic processing and analysis. Even though "wisdom of crowds" aggregating sites such as Metacritic are often flawed, one shouldn't condemn the entire concept. Metacritic has a lot of problems, but most are related to the site's implementation. A critic's review scores could even be used to rate the critic himself. The potential applications are endless.

3) Multidimensional metrics provide a framework for the reviewer, hopefully improving consistency when assigning scores. Flaws in games are naturally more apparent when the game is judged from different perspectives, reducing the likelihood of a reviewer reflexively handing out a perfect score to a flawed game merely because it does some things better than any game which came before it.

Furthermore, enough people like review scores to prevent them from going away anytime soon, so we might as well spend a little time thinking about creating better metrics.
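As a rough illustration of the first two points, here is what sorting and aggregating scores might look like in Python. Everything in it is hypothetical: the Review record, the helper names top_rated and average_l_score, and the critics, titles and scores, which are placeholders invented for the example. The plain average is also just the simplest possible aggregate, not a recommendation.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass(frozen=True)
class Review:
    critic: str
    title: str
    l_score: int
    n_score: int

# Placeholder data, invented purely for illustration.
reviews = [
    Review("critic_a", "Game A", 15, 9),
    Review("critic_a", "Game B", 10, 14),
    Review("critic_b", "Game A", 13, 11),
    Review("critic_b", "Game C", 8, 12),
]

def top_rated(reviews, critic, minimum):
    """One critic's reviews with an L-Score of at least minimum, best first."""
    picks = [r for r in reviews if r.critic == critic and r.l_score >= minimum]
    return sorted(picks, key=lambda r: r.l_score, reverse=True)

def average_l_score(reviews, title):
    """Mean L-Score for a title across all critics (a naive aggregate)."""
    return mean(r.l_score for r in reviews if r.title == title)

print([r.title for r in top_rated(reviews, "critic_a", 10)])
print(average_l_score(reviews, "Game A"))
```

None of this is possible with prose alone, which is the point: once scores exist, a reader can query them however suits their own tastes.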

Out of all the critics of game review metrics, the group I most respect is those, like Jeffries, calling for more insightful criticism and fewer consumer-oriented reviews. I also consider this to be a very real problem, but it doesn't entirely preclude the use of metrics. Certainly there are many focused pieces of criticism which wouldn't have anything to gain by applying a numerical rating, but more macroscopic pieces which analyze the entire work could still gain a lot from quantitative metrics. I, for one, plan on writing reviews that utilize both metrics and, hopefully, insight.

Reviewing and Scoring Video Games

I've been considering writing a few game reviews for my blog, which inevitably leads to thinking about scoring systems. Assigning a concrete score to any creatively produced work isn't something to take lightly. If a grade is assigned, it naturally creates an aura of objectivity and carries the weight of perceived authority. In many cases, the grade assigned carries more weight than the content of the review itself. The final score also opens the critic to criticism as well. If the critic desires the air of authority that concrete scores engender, he must take as much responsibility for the score assigned as he does for the content of his review.

It is for all these reasons that a critic should think carefully about any scoring system that he adopts. It is vital that the system used is consistent with the critic's philosophy of judging the medium in question. For me, the act of assigning a score of some sort is important because I believe that works of art CAN be judged objectively. I wouldn't bother with criticism at all if I didn't feel that this was the case. The challenge is to devise a scoring system which is informative enough to allow readers with their own varying predilections to make their own interpretations of quality without sacrificing the objectivity and finality of assigning a 'final' score. It is difficult to do this with a single, one dimensional metric.

In the old days game magazines would rate games on graphics, sound, difficulty, etc. Breaking the score down to this level of granularity is problematic for a couple of reasons. First, I might not be an expert in every category I might determine is necessary to judge. I feel much more qualified to judge a game's graphical quality than I do its sound design, for example. Second, it is important for a critic to take a stand on excellence, to make a final judgment. A myriad of small judgments certainly doesn't carry the same weight as one definitive score. And finally, the metrics used to describe one work's greatness may not paint an accurate picture of another.

Speaking of video games specifically, academics in the field of game studies can be roughly divided into two different camps, the narrativists and the ludologists (the Wikipedia entry for ludology has a brief description of the differences for the uninitiated). I have yet to see a game scoring metric which synthesizes the current academic discussion on games. Therefore, I am proposing the use of a system which consists of two scores, one measuring the game's excellence from a ludological perspective and the other rating the narrative as it applies to the game. For lack of better terminology I will refer to these as the L-Score and N-Score, respectively. I am personally more of a ludologist, but that doesn't obviate the importance of narrative elements. After all, people play games for different reasons.

The L-Score is the score which is most closely related to the uniqueness of the medium. I have argued before that games are different from art because they aren't simply admired, they are also played. It is the interactive nature of games which ludologists emphasize, and so one can think of the L-Score as a metric for gameplay and game design. Mechanics, systems, and level design are the key components measured by the L-Score.

If the L-Score is a measure of a game's design, then the N-Score is a measure of its artistic achievement. The narrative, in this case, is defined rather broadly. It consists of the game's music, writing, visual style, sound design, overall setting, etc. All of these factors influence the player's involvement in the game and are therefore important even if they don't have much of a direct impact on the actual gameplay.

There may be some overlap between the components measured by the L-Score and the N-Score. For instance, the sound design in a first person shooter may provide an increased level of information and awareness to the perceptive player. Such a feature could be considered relevant to both the N-Score and the L-Score. Likewise, in an exploration intensive RPG interesting environments may be necessary to realize the goals of the game's design, making those environments important from a design perspective as well as an artistic one. Despite any overlap between what is being measured by the two metrics, each metric is still able to stand on its own.

All games are scored relative to what they are trying to achieve, with the very highest scores reserved for true innovation. The traits that make a good RPG are simply quite different from those of an action game, and so the game's concept must of course be kept in mind when considering the quality of the game's design. Similarly, when judging a game's narrative it would be silly to expect the same level of exposition from a shmup as from an RPG. The narrative of a shmup is less about plot and more about evoking a certain feeling through music and visual presentation. Genres which are more narratively focused will in some ways be held to a higher standard. The fact that many story-focused RPGs require 40+ hours to finish places a huge burden on developers to create a consistently strong narrative and interesting setting. A five stage shmup should not be punished for having less content (unless more content would make for a better shmup).

There has been a lot of debate recently concerning game review scores, with several print magazines altering or eliminating their review scoring systems (EGM and Play, respectively). I believe the main reason for dissatisfaction with most current game review metrics is that they no longer accurately reflect gamers' increasingly sophisticated view of the medium. Games are simply more complex than other forms of consumer entertainment, and as video game consumers continue to become more sophisticated they will demand more sophistication from video game critics. The solution is for video game critics to draw from the emerging field of game studies. My proposed L/N scoring system is the first step toward applying game studies research to video game review scores.

