I haven't said much about metrics here for about eight months, but that doesn't mean I haven't had more to say. I've just been sorting it out. :)
As I mentioned last time, there are different kinds of metrics to meet different needs. Operational metrics can often be gathered into 5-minute averages as they come out, those can be aggregated into hour-long averages a few hours after they're generated, and a week or two after that they can be rolled into 24-hour data points - the amount of data you need to actually store is tiny. It's trivial to store this data in MySQL, never mind every other relational database on the planet. Most of the work is even already done; just feed it to things like Cacti or MRTG and you're done.
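To make the roll-up idea concrete, here is a minimal sketch in Python. The `roll_up` helper and the sample data are hypothetical, not taken from Cacti or MRTG; the sketch only shows how 5-minute averages collapse into hourly and then daily points.

```python
from collections import defaultdict
from statistics import mean

def roll_up(samples, bucket_seconds):
    """Average (timestamp, value) samples into fixed-width time buckets.

    Hypothetical helper for illustration; timestamps are seconds since
    some epoch, and each bucket is labelled by its start time.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % bucket_seconds].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]

# Made-up data: one day of 5-minute averages (288 points).
five_minute = [(i * 300, float(i % 12)) for i in range(288)]

hourly = roll_up(five_minute, 3600)   # 288 points -> 24 points
daily = roll_up(hourly, 86400)        # 24 points -> 1 point
```

Averaging averages like this is only exact when every bucket holds the same number of samples, which is the case for evenly spaced operational metrics.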
The data that is ultimately visible to players on the web is also a relatively small amount of data: keeping the last 1,000 character events around for a blog system, a full character sheet, PvP rankings, etc. could all be stored in a handful of GB. Any relational database you care to name can probably handle it, on a cheap server from whatever vendor you like, on any storage solution you like - the whole thing can even be backed up onto a $20 flash drive if you want backups. Again, this is not a big deal.
...And then there's the meaty metrics, the things that let you generate pretty graphs of "death by level and zone" for the past month of game activity. Turns out, that's quite a bit of data. Darius Kazemi identified storage as one of the big hurdles in dealing with gameplay metrics, in his talk with Larry Mellon at Austin GDC: Wake Up and Smell the Metrics! From the informal survey I've performed, a terabyte of data per month is easy to generate with decent coverage and a decent number of players; the field of suitable databases suddenly narrows.
To handle a terabyte of data (assuming you can condense most of the data down to reports once it's a month old), you have to invest a bit more in your database server: a RAID of high-end disks is the tip of the iceberg, 8GB RAM is a good starting point, and 4 cores is a minimum (CPU is almost certainly not your first bottleneck). You're probably shelling out big money for your database license now, and adding capacity is far from a linear exercise.
In game development, we're often inclined to point to these kinds of problems and proudly declare that we're working on problems no one else is, so we probably can't use anyone else's expertise. It's quite often true in game AI programming, graphics, and so on, but in this case the opposite is true: we've got options on who to emulate. The approach I outlined above is close to a low-end enterprise data warehousing model; they probably use more Java and have bigger budgets, but I think it's recognizable. MMOG servers can resemble enterprise software so thoroughly at times that even Sun Microsystems has started getting involved!
...But it's not the only model to follow. As it turns out, web developers deal with very similar problems: the access log on web servers gathers usage data at the same kind of granularity as "every time a character swings their axe," and moderately popular sites probably see data quantities on par with most MMOGs. Facebook claims its logging system handles tens of billions[1] of messages per day, which is hundreds of thousands of messages per second.
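The per-second figure is easy to check. This is a quick sketch that assumes "tens of billions" means somewhere between 10 and 90 billion messages per day; the exact number isn't given here.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_second(messages_per_day):
    """Convert a daily message count into an average per-second rate."""
    return messages_per_day / SECONDS_PER_DAY

# Assumed bounds for "tens of billions" of messages per day.
low = per_second(10_000_000_000)    # roughly 116 thousand per second
high = per_second(90_000_000_000)   # roughly 1.04 million per second
```

Anywhere in that range, the average rate works out to hundreds of thousands of messages per second or more.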
And you know what? These guys aren't logging all data to a central server immediately, because it doesn't scale. They aren't even trying to parse and load into databases. They aren't buying really expensive machines to choke down all of that data. Instead, they're going distributed, and utilizing something like Google's MapReduce - often in the form of Hadoop.
For comparison to the aforementioned "baseline beefy database server," I took a look at what a small cluster of decent Dell machines could provide. $10,000 - what could be, by itself, the price of a database license - can build a small cluster with 8TB of triply redundant storage[2]. With a system like MapReduce, you can actually utilize all that distributed horsepower to chew through data analysis. What's more, storage and processing capability scales somewhat linearly with cost![3]
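The 8TB figure follows from the hardware listed in footnote 2. Here is a small sketch of the arithmetic, using the terabyte-per-month estimate from earlier as the ingest rate; the variable names are just for illustration.

```python
# Figures from footnote 2: 24 one-terabyte disks spread across 4 machines.
machines = 4
total_disks = 24
disk_size_tb = 1
replication_factor = 3       # each datum lives on three of the four machines

raw_tb = total_disks * disk_size_tb          # 24 TB of raw disk
usable_tb = raw_tb // replication_factor     # 8 TB after triple redundancy

# At the roughly 1 TB/month estimated earlier, uncondensed data fits for:
ingest_tb_per_month = 1
months_of_raw_data = usable_tb // ingest_tb_per_month   # 8 months
```

That ignores filesystem overhead and scratch space for analysis jobs, so the real headroom would be somewhat lower.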
I haven't been able to put any of this to a serious test just yet, but I have some ideas. Time-to-level data, deaths per level, and even the relative popularity of different zones should all be straightforward to generate. It's a very different paradigm from relational databases, but then SQL presents its own unique challenges too. It's not a full solution yet by any means, but on the other hand the ability to store that much data, and process it however you like... it has a lot of potential.
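As a rough illustration of how a question like "deaths by level and zone" fits the MapReduce shape, here is a single-process sketch in plain Python. It does not use Hadoop's API. The tab-separated log format (timestamp, event, character, level, zone) and all function names are assumptions made up for this example.

```python
from collections import defaultdict

def map_death(line):
    """Map step: emit ((level, zone), 1) for each death event, nothing otherwise.

    Assumes a hypothetical log line of five tab-separated fields:
    timestamp, event type, character id, level, zone.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 5 and fields[1] == "death":
        _, _, _, level, zone = fields
        yield (int(level), zone), 1

def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_count(key, values):
    """Reduce step: total the counts for one (level, zone) key."""
    return key, sum(values)

def run(lines):
    pairs = (pair for line in lines for pair in map_death(line))
    return dict(reduce_count(k, v) for k, v in shuffle(pairs).items())

# Made-up sample log.
log = [
    "1228200000\tdeath\tc1\t12\tMarsh",
    "1228200005\tswing\tc2\t30\tCitadel",
    "1228200009\tdeath\tc3\t12\tMarsh",
    "1228200042\tdeath\tc2\t30\tCitadel",
]
deaths = run(log)
```

On a real cluster the map step would run on each machine against its local slice of the logs, and only the small per-key counts would cross the network to the reducers.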
1. High Scalability has the article where I first heard the number, although their math is off by an order of magnitude. Some of my arguments here are echoes of the conclusions over there.
2. You can easily hit 16GB RAM total, 32 cores, and 24 1TB disks spread out across 4 machines. Each datum being stored exists on three of the four machines, so whatever kind of hardware failure you have - up to and including one of the machines completely failing - data isn't lost. You also have higher total I/O bandwidth.
3. Some things, like network bandwidth or data center footprint or heat output, cause plateaus where the next step up requires only a modicum of computer hardware cost, but significant other expenses. So it's not completely linear, but compare it to scaling up a database server :-)
EDIT: added a footnote, and re-worded a few things - nothing big. :-)
Tuesday, December 2. 2008
Metrics: Planning
Trackbacks
HadoopDB: worth investigating
I've written a bit here before about giving up on traditional relational databases for gathering, storing, and analyzing game metrics data. All three steps have serious scalability issues for the "one big machine" model. However, SQL is often much more
Weblog: Anson the Gnome
Tracked: Jul 23, 06:11


Being familiar with scalable enterprise-class data warehousing backends and their cost, your 'small cluster of decent Dell machines' to do essentially the same thing is mind-boggling.
I like it.
I suppose you can start with the basics early on and only report on, say, enemies faced, loot found, items created, etc. Then as CPU power becomes available, include things like DPS delivered/taken, healing done, resources harvested, etc.
It's a fun problem :)
With a cluster of machines all sharing the archived data, and all with horsepower to do analysis on at least sections of it (which is what Hadoop basically boils down to), chugging through the last month of data to answer a new question isn't that hard. The new hard part is formulating the question in such a way that MapReduce can answer it. :-)
Hadn't seen Hadoop, similar to BigTable?