Sunday, June 15, 2008

L/N Implementation Details

In an earlier post I described a dual metric for reviewing video games called the L/N system. That first post details both my philosophy of assigning quantitative scores and the qualities to be measured by each metric. Now it is finally time to discuss the specifics of my implementation of the L/N system. I say 'my implementation' because the L/N concept itself only applies to the what and doesn't extend to the how. Other reviewers are encouraged to develop their own implementations of the system.

A Problem of Scale

Probably the most common set of complaints I see regarding quantitative scores relates to the scale being used, which is almost always linear. Sometimes there is confusion in interpretation because the scale is numerically linear but semantically nonlinear. For instance, the difference between a 6 and a 7 may be greater or less than the difference between a 9 and a 10, even though both pairs are one point apart. Another common source of misinterpretation is what constitutes an average score. Percentage scales and 10 point scales are particularly susceptible to this. Many readers will interpret a 7 or 70% as an average score even though it is well above the midpoint of 5 or 50%. The print magazine EGM recently moved from a 10 point scale to a letter grade scale to mitigate this sort of ambiguity. To avoid confusion, the scale used should be numerically and semantically consonant. The numerical mean should match the semantic mean, and the relative numerical differences should be consistent with their semantic interpretation.

Another common complaint related to numerical scales is that the differences between scores often seem arbitrary. Percentage scales, for instance, are much more precise than any reviewer can justify. Too much precision undermines a scale to the point that readers begin to question its accuracy as well. Linear scales compound the precision problem because the greatest works are often much, much more impressive than the average, but extending a linear scale far enough to represent that adequately may make the scale appear overly precise.

My Solution

So, I want to avoid linear scales and use a nonlinear one that gamers have an intuitive grasp of. It is also best if the scale is not easily confused with a linear one. My solution is to use a scale based on 3d6. Yep, the very same scale used to measure ability scores in Dungeons and Dragons (for those not in the know, D&D players generate their characters' ability scores by rolling three 6-sided dice and summing the results). The biggest numerical advantage to using 3d6 is that it is approximately normally distributed, i.e. the graph of how likely each total is looks like a bell curve. The major non-numerical advantage is that it is familiar, and even if you don't know what a normal distribution is, you probably realize that rolling an 18 is quite a bit harder than rolling a 17. The scores themselves will be represented visually by images of three dice. This adds another layer of information based on the specific dice chosen to represent the score. A 13 could be represented by 5-4-4 or 6-6-1, for example. A score of 6-6-1 would indicate potential greatness brought down by one or more serious flaws.
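For the curious, here is a minimal Python sketch of how the candidate dice faces for a given total could be listed. The function name `dice_faces_for` is my own invention for illustration and is not part of the L/N system; it simply enumerates every unordered combination of three six-sided dice that adds up to the score.

```python
from itertools import combinations_with_replacement


def dice_faces_for(total, dice=3, sides=6):
    """Return every unordered set of dice faces summing to `total`.

    Hypothetical helper for illustration only. Each result is a tuple
    with the highest die first, e.g. (6, 6, 1).
    """
    return [
        tuple(sorted(faces, reverse=True))
        for faces in combinations_with_replacement(range(1, sides + 1), dice)
        if sum(faces) == total
    ]


# List the ways a 13 could be drawn as three dice.
for faces in dice_faces_for(13):
    print("-".join(str(face) for face in faces))
```

For a 13 this lists five options, including the 5-4-4 and 6-6-1 mentioned above. Which one gets shown is still a judgment call made while writing the review.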



To provide a little more detail, the mean (and median) of 3d6 is 10.5. Since I'm obviously only using whole numbers, I treat scores from 9 to 12 as the average range. Scores within this range account for roughly 48% of the population. Less than half of one percent of the population would have an 18. Of course I am not going to assign scores with the sole aim of fitting a probability distribution, but the distribution does help define the relative difference between scores.
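These figures can be checked by brute force, since three dice only have 216 possible rolls. The following is a small sketch in Python; the function name `distribution` is just an illustrative choice, and exact fractions are used so that no rounding creeps in until printing.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def distribution(dice=3, sides=6):
    """Map each possible total to its exact probability (illustrative helper)."""
    rolls = list(product(range(1, sides + 1), repeat=dice))
    counts = Counter(sum(roll) for roll in rolls)
    return {total: Fraction(n, len(rolls)) for total, n in sorted(counts.items())}


dist = distribution()
mean = sum(total * p for total, p in dist.items())
average_band = sum(dist[total] for total in range(9, 13))  # totals 9..12

print(f"mean     = {float(mean)}")             # 10.5
print(f"P(9..12) = {float(average_band):.1%}")  # 48.1%
print(f"P(17)    = {float(dist[17]):.2%}")      # 1.39%
print(f"P(18)    = {float(dist[18]):.2%}")      # 0.46%
```

An 18 comes up once in 216 rolls while a 17 comes up three times, which is the "quite a bit harder" gap between the top two scores.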

Guidelines For Assigning Scores

Now that I have defined the scale, I'd like to put forth a few guidelines I will follow when assigning scores. A number of questions have been raised by readers so far and hopefully these guidelines will paint a clearer picture of my scoring criteria.

First, context is important when scoring a game. The very highest scores are reserved for those games which are both incredibly well executed and groundbreaking in some way, and innovation is meaningless outside of the context of when the game was released. The original DOOM would score higher than the many clones which followed, even though some may have been just as well executed. Technological context is also important. Were a new 16-bit console game released today, its technology-related aspects would not be compared against those of games on current-gen consoles.

In addition to historical and technological context, the concept behind the game is also important. Many reviewers already rate games relative to their genre, but I would like to be explicit in my belief that reviewers should take it a step further and make an effort to divine the developers' specific ideals and goals. I am a firm believer that not all games, no matter how excellent, will appeal to all players. If the very concept of a game and what it is trying to achieve doesn't appeal to a player, then they probably will not like it. A game should be judged in light of these factors. However, this doesn't mean that the concept itself is beyond reproach. Lack of ambition, for example, will certainly prevent a game from achieving the highest scores.

For determining the N-Score, a useful guideline is to consider how interesting the game would be if you were watching someone else play it. This doesn't perfectly capture my concept of the N-Score, but it is still useful to consider because many elements measured by the N-Score can be appreciated by someone other than the player, such as music, art design, story, etc. There are some crucial differences though. Player immersion is something that the N-Score should be concerned with, but immersion is hard to measure if you aren't the one playing. The visual feedback provided by video games in response to user input can make the user feel like a hero in ways that movies and other passive media cannot. This type of immersion isn't captured by the L-Score because the visual feedback often has little or no impact on actually playing the game. It is a complex topic, but hopefully these examples clarify the concept of the N-Score.


And with that, I believe I have written more about my video game review philosophy and metrics than any mainstream gaming web site or print mag. Kind of a shame considering that I haven't written a single review and they've written thousands.

EDIT: Added a paragraph on 'concept' to the guidelines.

EDIT: Eurogamer actually has a fairly detailed description of their scoring policy. It doesn't address the same problems as my system, but at least it provides a detailed semantic description of their scale.


4 Comments:

At 1:42 PM, Anonymous Anonymous said...

So, you're suggesting a review would now consist of the following: L/N = 4-5-5/3-4-4 = 14/11 ?

That's a lot of digits... you may have crossed beyond the threshold of easy/understandable with this notion of 3 dice for each score of L and N...

I think the definition of the normal distribution is key, and a welcome piece of this vision - but perhaps just define the distribution as you have and stick with L/N itself as the result (e.g. 14/11, and not the breakdown of how you *get* 14 and 11).

And is 3d3 not enough of a spread (and less misleading (than 18), as the top score would be a 9)? Is 3d6, itself, not a victim of being too precise? You are, after all, defining *two* 3d6 scores per game.

 
At 2:17 PM, Blogger Jon said...

My hope was that the visual graphic of three dice would reduce the complexity. Perhaps I'm wrong on that count though. Also, is it clear that the precise dice chosen don't have an exact meaning? It is meant to merely be a vague glimpse into the text of the review. Another reason for using dice is to make it clearer that the distribution is normal/D&D-like.

As for precision, I chose 3d6 primarily because it is familiar. I agree that precision is potentially a problem, but I do want to adequately capture the "high end" of achievement. Do you think odds of 1/27 are adequate for the highest score? I would like the max score to be a really rare event.

I am amenable to change though. This is what comments are for! I could always just use 4 stars, but I like complexity I guess :)

 
At 3:34 PM, Blogger statuskuo said...

Love your posts. Funny thing is I found it at penny-arcade.com but I just arrived here in Taipei about 8 hours ago. Keep em coming while I'm here :)

 
At 3:06 PM, Anonymous Anonymous said...

I didn't think you would make it more complicated, when I read your previous post.

As I said in an earlier comment: Keep the 10 point scale -- it's familiar and easy to generate meta-scores like on Metacritic.

The L/N score will be two independent 10 point scales. All you need to add is Buffs like Genre Buff and Sequel Buff, which will just be +1 each.

Each reviewer would give a Naked Score, say, a 7/10. Each reader would voluntarily add in the Buffs if they like the Genre or liked the Original Game.

 
