Scaling Game AI Using Collaborative P2P Overlay

Neha Singh
IIT Bombay, Mumbai, India
neha_ns@cse.iitb.ac.in

ABSTRACT
For rich player experience, the games today incorporate complex artificial intelligence (AI) of the computer generated characters. Incorporating this intelligent behaviour being computationally intensive, it is not possible to scale it on a limited resource centralized server as more and more players are added. Rather we need organic scaling of resources. In this paper, we propose an efficient technique for scaling computationally intensive game AI by using the resources of the end-systems. Firstly, we propose a dynamic partitioning index based on quadtree for the virtual space and its mapping to the peer set with heterogeneous resource availability. Secondly, we propose a distributed event delivery model to get neighbouring state updates while minimizing communication overhead and latency. Thirdly, we introduce an efficient technique for finding aggregates over mobile objects with data stored in a distributed system by extending the multi-resolution aggregate trees to work in a distributed system and over mobile objects. Lastly, we propose a multi-phase update algorithm that is highly concurrent but also atomic with respect to read queries. The analysis and experimental results prove the efficacy of the algorithms.

1. PROBLEM AND MOTIVATION
Since the early days of SIMNET [2], a U.S. government project for large scale combat simulations, to the recent boom of Massively Multiplayer Online Games (MMOG), efforts to allow people to interact in realistic, immersive virtual environments have gone a long way. Networked Virtual Environment or NVE is a shared synthetic environment distributed across several computers and linked by a network where players interact with computer controlled characters. Interesting artificial intelligence(AI) of these characters is one the key factors for the success of such NVE applications. Such computer controlled characters are widely used in commercial games such as War of WorldCraft, RealTime Strategy games, Virtual Training Simulators (e.g., for military training such as Close Combat Tactical Trainer), etc. One of the most popular games -“World of Warcraft” boasts more than 6,000,000 players with 500,000 players being connected at any time [1].

Currently most MMOGs have a centralized architecture - either a single server or a cluster of server. The hosted infrastructure for a commercial grade MMOG requires the deployment of hundreds (or even thousands) of servers. For instance, Second Life where players develop their own identities and add elements to the game as they go, uses nearly 1,000 dual processor/dual- core systems [1]. So at 4,000 processors, this is a pretty intensive system. For games involving rich artificial intelligence (AI), this becomes an even greater cause for concern as it requires computationally intensive scripts.

There are two common strategies employed for scaling. First is by deploying larger number of server clusters. However as the capacity requirement increases, the increase in resource does not provide a linear increase in the server capacity. Second is by introducing multiple worlds by zoning. Zoning refers to the technique of tiling the virtual environment into areas small enough for a single server to handle. Only players within a common zone are allowed to interact, typically by forming an event broadcast group. However this multiple world architecture is not truly one coherent virtual world. It also lacks flexibility and limits the deployment of user-designed extensions. The server has to be over-provisioned to handle peak loads and this approach is costly for server-side bandwidth, hardware, and maintenance.

The solution to scaling NVEs to such massive proportion calls for an architectural change. Thus rather than a priori allocation of resource, it needs to increase resources in an organic fashion with the number of players and so P2P systems offer a natural solution. The first challenge is to split the set of data points across the end systems or peers. Also this partitioning needs to be dynamic, as the hot spots change as game progresses, and also need to take care of heterogeneous resource availability at end systems. Now that we have split the data onto multiple peers, the second problem is of answering aggregate queries over a given space containing mobile point data (representing moving game players) when the data is stored in a distributed system. This is required because scripts controlling computer controlled characters require data in aggregate rather than raw data. For instance scripts like, “run in fear if the number of enemies exceed the number of friends around” or “find the minimum health player”, etc., require aggregates like count and min respectively. In a centralized system, we can address this issue by building a dynamic index for each type of aggregate as done in [6]. In a distributed computing system however, the data now is split across various systems. Every peer has a local state and they communicate asynchronously with each other by passing messages over the communication channel. As entities move from one node’s region to another, they are transferred using messages. Thus the problem of getting the aggregate over a sub space reduces to getting partial aggregates from multiple peers and combining them to get a consistent result. In a distributed computing system absence of a global common memory or clock makes it difficult to record the global state of the system efficiently. Combining local snapshots may lead to inconsistent answer due to indeterminate transmission delays in the communication channel. Thus recording a consistent global state of a distributed system is not a trivial task.

In this paper, we present an efficient solution for running such computationally intensive game AI on a peer-to-peer overlay thus utilizing the processing resource and bandwidth of the peer systems. The key contributions of this paper are as follows. Firstly, we propose a new architecture to partition the NVE and map the regions on to a peer-to-peer overlay. This partitioning preserve the locality of data points, the key to minimizing communication overhead. Since the entities keep changing their position, the partitioning method allows dynamic load balancing over the set of live peers.It also handles heterogeneous resource availability across peers. Secondly, we present an efficient distributed event delivery model for the peers to exchange the state information. To test this we evaluated multiple communication models differing in their latency, communication overhead and processing overhead. Thirdly, we describe an efficient method to get such spatial aggregates over mobile data points. We extend the multi-resolution tree, wherein aggregates are stored with decreasing resolutions as the depth of the partitioning tree increases, to support a dynamic object set. Our read query and aggregate tree update protocols ensure that updates are atomic to the read. We first present the naive update protocol and then a highly concurrent multi-phase update protocol. We then present the analysis and experimental results to substantiate our claims.

2. RELATED WORK
To the best of our knowledge, there has been no work on scaling game AI on a peer-to-peer overlay and on index structures for aggregate data over moving objects in a distributed system. The need for organic scaling of resources for supporting multi-players games has lead to many recent research work based on using peer-to-peer system like in [2], [5], [6], etc. However, in them the common approach is by sending unicast or multicast messages to send out object state information. Unicast does not scale well as the number of concurrent users increase. Multicast is able to scale when combined with some partitioning method for instance in [5]. In this the whole space is statically partitioned and divided into a fixed number of regions and a peer is chosen to be its coordinator. This system does not handle effectively changes in the hotspots in the virtual space. It proposes either fixing limits on the player density, static partition of different sizes or dynamically repartition the whole region. The first two methods reduce flexibility and the third leads to high overhead. In [10], the game space is dynamically divided depending on the positions of player characters using Voronoi graph partition algorithm. There has been some recent work on scaling processor intensive games on a centralized architecture using multi-query optimization [6]. However the techniques suggest in this work are complimentary to our work and can be used to further scale our system by using multi-query optimization on each peer. Getting spatial aggregates for Multi-Resolution Aggregate (MRA) tree has been proposed in some papers like in [3]. However [3] considers a centralized architecture with static data and its key proposal is an iterative algorithm that gives aggregate answer with increasing confidence interval as more nodes of the MRA tree are explored. This work is again complementary to our model with dynamic data and can be used to extend our work for getting approximate aggregate with increasing quality.

3. BACKGROUND
In this section we discuss the DHT used, AI in games and the simulation engine. Pastry [9] is a peer-to-peer routing technique that is efficient, scalable, fault resilient and self-organizing. It is used to map the quadtree regions on to the set of live end systems or peers. The peer nods are assigned a nodeId. It is prefix based; given a key, Pastry routes an associated message towards the node whose nodeId is numerically closest that of the key, among all live nodes. The game consists of mainly two classes of entities - the entities controlled by the human players called player entities (PE) and the computer controlled entities (CCE) - entities created by the game designer and controlled by the computer. The action of the CCEs are controlled by scripts [11]. AI in games is used for generating intelligent and complex behavior for these entities. These scripts take the information of the environment around the entity’s location as input and calculate their next move. They are computationally intensive and hence are usually not scalable to more than a handful of entities in a centralized system [11].  Also though the NVEs give a feeling for continuous motion, the processing of the game AI scripts is done by a discrete event simulator and the continuous effect is given by the rendering engine [11]. Thus all computer games are architected so that the AI engine processes its objects in clock ticks while still giving a feeling of continuous motion to the players.

4. PROPOSED SOLUTION
In this section we present the algorithms used for partitioning the virtual space onto the set of live peers, the distributed event delivery model for getting regular updates of the state of entities in the neighbouring partitions and the protocols used to get consistent aggregate over this set of distributed mobile entities.

4.1. PARTITIONING THE VIRTUAL SPACE
The global space is divided onto the distributed set of live peers. There are two main criteria for evaluating the partitioning data structure - dynamic load balancing and locality of data. The hot spots on the virtual space change as the entities change their position, the spatial partitioning needs to be dynamic. We note that the entities are interested not in the whole global state of the NVE but only a small region around its position called the area of influence (AoI). Area of influence (AoI) of a player is defined as region which it takes into account for calculating its next move. Preserving locality of data while mapping the entities onto the peer-to-peer overlay greatly reduces the communication overhead.

There are a variety of data structures available for partitioning the virtual space across different peers like hashing techniques, space filling curves [4] [8], tree-based structures [10] [11]. Among these, the locality of data is best preserved in tree-based data structures. The most common among these are R-trees [11], K-D trees[10], Quad-tree. Out of them, dynamic load balancing is very expensive in terms of communication overhead in the first two. Quad-tree (Figure 2) offers an efficient way of regular decomposition of the space. The partitions are independent of the order in which the data points (or entities) are inserted and the decomposition is implicitly known by all the peers in the system. The data points are stored in the leaf nodes of the quad-tree. Since these points are mobile, in case the location of a data point moves out of a peer’s region, it passes the object’s data to the relevant peer using message communication. We assume these peers to be processes connected by a bidirectional channel. Message communication is FIFO, asynchronous, delivered reliably with finite but arbitrary time delay.

4.2 DISTRIBUTED EVENT DELIVERY MODEL

Without redundant processing
Overlap model with redundant processing of the fringe region on neighboring node
Figure 1: (a) Without redundant processing p2 requires state information from region A (b) Overlap model with redundant processing of the fringe region on neighboring node
In this section we discuss a framework for efficient data dissemination in distributed simulation environment in a scalable fashion. Running scripts for the CCEs require the surrounding environment as input which may get split across multiple peers due to partitioning of the virtual space. Hence we require a framework for getting updates from neighbours at each clock tick. The key idea exploited here is the temporal and spatial locality of the users in the virtual world to limit size of maximum relevant region i.e., each entity is only concerned about the state of the environment in its AoI. We define the fringe region as the entire region who’s state information can be relevant to the neighboring partition for processing the next move of the entities in it. We first discuss the two existing models for this and then present our overlap model. Let RTT be the average round-trip time between the peers and γ be the average processing time for each peer.

4.2 DISTRIBUTED MULTI-RESOLUTION AGGREGATE TREE
For getting the aggregates, we construct a multi-resolution aggregate tree (MRA tree) on top of our partitioning tree structure. A MRA is a multi-dimensional index structure in which the leaf nodes store the data points and aggregates whereas the non-leaf nodes store just the aggregates of all data points (or entities) indexed by the node. In this case, the leaf nodes store aggregates <sum, count, min, max>, whereas the intermediate nodes store the aggregate in the form <sum, count,minarray,max array>, where minarray and max array is the list of min and max values for each of the child nodes. In the following section we present the protocol to get the aggregate set over any region of space.

4.2.1 Reader's Protocol
Consider an aggregate read query fired to get the aggregates set over the query region Q ⊆ Rspace. The query traverses the index structure top-down using message passing and selectively exploring the nodes. Given a query region Q and a node N, there can be the following possible relations between Q and N - contained, partially overlapping, enclosing and disjoint. The node’s aggregate is not relevant for the query in case it is disjoint with Q while the aggregate over all data points of N is needed in case Q encloses it. In both these case further traversal is not needed. It is needed only if Q is partially overlaps or is contained in N.

Distibuted quad-tree
Figure 2: (a) Partitioned virtual space (b) Distributed quad-tree index

We use locks for concurrency control. A naive way to ensure concurrency control is to acquire locks on the nodes while traversing down the tree and release the locks only after the read is complete. However this reduces the index structure concurrency. To ensure both concurrency and atomicity, we use the well known crabbing protocol. In crabbing protocol, the root is first locked in shared mode. After acquiring lock on all required children in shared mode, the lock on the node is released. This prevents an update coming from top-down to overtake the read.

4.3 CONCURRENCY CONTROL IN DISTRIBUTED MULTI-RESOLUTION AGGREGATE TREE
The points being indexed by the distributed MRA tree are mobile and hence their location may change. Also new data points may be added or deleted. All these actions generate an updates which require the distributed MRA tree to be updated and the aggregates stored at the nodes indexing those data points to be modified. Updates percolate from leaf nodes to higher levels of the hierarchy. Our aim is to update the distributed MRA index such that these modifications are atomic with respect to the aggregate read queries. We define the following terms:

In the following sections, we present two algorithms for maintaining the distributed aggregate index structure such that the updates are atomic for concurrent the read queries. In both we use locking for concurrency control.

4.3.1 Naive Update Protocol
For acquiring and releasing locks, we use the concept of the update tree. Update tree represents all the nodes that can be affected by that transaction. Thus the naive way is to get an X-lock on all the nodes of this tree and then update them. Now there are two ways to acquire the locks – either top-down or bottom-up the update tree. Acquiring the lock bottom-up can lead to a deadlock with the read query coming from top-down. Hence we acquire the locks top-down. The protocol consists of two phases called the acquire lock phase wherein locks are acquired top-down the update tree nodes and the update phase, wherein the aggregate tree is updated bottom-up and the nodes release locks after updating aggregate. Root node releases lock only after update over in both legs. Bottom-up propagation of updates is essential for aggregates like min and max.

4.3.2 Multi-phase Update Protocol

Compatibility Matrix
Figure 3: Compatibility Matrix
The above naive protocol faces the problem of low concurrency and higher read time because it keeps the update tree root node X-locked for the entire duration. Greater concurrency is difficult because read query comes top-down, and updates go bottom-up. Root node is the last to be updated and first to be read. To address this, we propose a highly concurrent multi-phase update protocol. The key modifications introduced in this protocol are as follows. First, the nodes are updated top-down rather than bottom up as in the naive case. This is done by splitting the update in three phases namely acquire lock phase, propagate phase and refresh phase. Second, we hold the X-lock on the nodes for a very small duration during the update phase. Concurrent read queries (using S-locks) are allowed while acquiring locks and updating other nodes. This is done by introducing a new locking mode that we call U-lock, which is compatible with the S-lock. Third, to prevent read query from overtaking top-down update and reading inconsistent values, we use crabbing protocol for acquiring X-locks to update nodes.

Comparative analysis
Figure 4: Timeline of the state of update tree’s root node under the update protocols

The new U-lock (Figure 3), is basically to lock nodes for possible future modifications. U-S modes are compatible which signifies that the read query can proceed while the update is modifying other nodes of the update tree. However U-U modes are incompatible which means that conflicting updates need to wait for each other. The first phase of the multi-phase update protocol is the acquire lock phase, wherein locks are acquired top-down the update tree, similar to the naive case. However here the nodes get locked in U-lock mode rather than in X-lock mode. Top-down order needed to ensure no deadlock between concurrent conflicting updates. The second phase is propagate phase wherein update gets propagated bottom-up from leaf nodes. But they are not reflected in the nodes, rather stored as pendingUpdates. After this ends in both update tree legs, root node starts the refresh phase. In this phase, the U-lock of the nodes get upgraded to X-locks and the stored pendingUpdates are reflected top-down. Crabbing protocol is used for acquiring and releasing the X-locks on the nodes to prevent read to overtake top-down update and read inconsistent values.

The serialization order for a read and an update transaction is the order of acquire of S-lock by read and X-lock by update. For concurrent intersecting update queries, the serial order is same as the lock points at the unique highest node of the common update tree twig pattern. Figure 4 shows the difference in the relative time for which the update tree root node remains locked in each protocol. This is an important indicator of their relative concurrency because the read query accesses the aggregate tree top down and cannot read any other intersecting node until it reads the root node.

5. UNIQUENESS OF THE APPROACH
We address the problem of scaling game AI on a P2P overlay. With respect to partitioning the virtual, though there have been some methods proposed in literature, but none to do dynamic partitioning (as the hot-spots change in the game) over a dynamic peer set with heterogeneous resource availability. For the distributed event delivery model, there is no parallel to the proposed overlap model which combines the benefit of both the push and the pull models. There has also been no work, to the best of our knowledge, for finding aggregates over distributed system with mobile date points. We have proposed a highly concurrent multi-phase update protocol for the same and have proved its serializability w.r.t. concurrent read and updates.

6. EXPERIMENTAL RESULTS
Comparative analysis of the message overhead in the three models
Figure 5: Comparative analysis of the message overhead in the three models
Figure 6: Time taken by the read query for different workloads of the (a) Naive Update Protocol (b) Multi-phase Update Protocol. F stands for the frequency of updates. Here F2 > F1

The experimental set-up consists of synthetic data set with non-uniform distribution of data points. The DHT used is FreePastry implementation of Pastry DHT. The maximum depth of the quadtree is set to 14. Every peer node specifies its threshold as count of the number of entities it can support, which can be as low as 0. The total number of data points varied from 100 to 10000, organized into 10-100 clusters of points. Each object’s data value is randomly taken from distribution (µ = 100, σ = 50) with care to avoid negative values. Figure 5 shows that the overlap model has low message overhead, comparable to that of pull model. Figure 6 shows that the average read time taken increases much less for multi-phase update protocol as compared to naive protocol as the frequency of updates increase.

7. CONCLUSION
For rich user experience in immersive virtual environments, it is important to increase the number of computer generated characters with intelligent behavior while also providing a cost effective solution. In this paper, we proposed a model to scale such computationally intensive networked virtual environment (NVE) using P2P overlay. For dynamically partitioning the space over mobile data points and heterogeneous peers, we use a quadtree based distributed data structure. Secondly, we proposed an overlap model having low latency and low overhead, for distributed event delivery. Further, we address the problem of finding aggregate over mobile point data. We extend the multi-resolution aggregate index to support mobile data and also run on a distributed system. The protocols proposed here can be used for supporting an aggregate index over mobile point data in a centralized system as well. The key function is to update the aggregate index when the location of the data objects change, while ensuring that the updates are atomic with respect to the read queries. We propose a multiphase locking update protocol which shows high concurrency for the read queries, while ensuring that read and multiple simultaneous updates to the index structure are serializable with respect to each other. The experimental results prove our claims.

8. REFERENCES
[1] Tom gibbs. avatars and grid 2.0. in grid today, july 2006. http://www.gridtoday.com/grid/720347.html.
[2] J. M. Calvin, A. Dickens, B. Gaines, P. Metzger, D. Miller, and D. Owen. The simnet virtual world architecture. In VR 1993.
[3] I. Lazaridis and S. Mehrotra. Progressive approximate aggregate queries with a multi-resolution tree structure. In SIGMOD '01
[4] H. V. Jagadish. Linear clustering of objects with multiple attributes, In SIGMOD 1990.
[5] B. Knutsson, H. Lu, W. Xu, and B. Hopkins. Peer-to-peer support for massively multiplayer games, 2004
[6] W. White, A. Demers, C. Koch, J. Gehrke, and R. Rajagopalan. Scaling games to epic proportions. In SIGMOD ’07
[7] M. R. Macedonia, M. Zyda, D. R. Pratt, P. Barham, and S. Zeswitz. Npsnet: A network software architecture for large scale virtual environments. In Presence '94.
[8] J. A. Orenstein and T. H. Merrett. A class of data structures for associative searching. In PODS '84
[9] A. I. T. Rowstron and P. Druschel. Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems. In Middleware ’01
[10] P. Ganesan, B. Yang, and H. Garcia-Molina. One torus to rule them all: multi-dimensional queries in p2p systems, WebDB ’04
[11] A. Mondal, Y. Lifu, and M. Kitsuregawa. P2PR-Tree: An R-Tree-Based Spatial Index for Peer-to-Peer Environments. Springer- Verlag New York, Inc, '04.