Stress Testing the OpenSimulator Virtual World Server

Introduction

OpenSimulator (http://opensimulator.org) is an open source project building a general purpose virtual world simulator. As part of a larger effort to research scaling of virtual worlds, Intel Labs has been using OpenSimulator as a test case to understand the design requirements for the server portion of a multi-user virtual world system. The previous article explained OpenSimulator's architecture, and this article shows how the architecture affects the operation of various workloads.

Workloads

One way to measure the limits of a simulator is stress testing. This gives some idea of the upper bounds of the simulator operation and possible weaknesses of its implementation. Three stress tests were developed: scripts, physics, and avatars.

Scripting is stressed by dynamically creating scripts until the CPU execution is saturated. Our particular test creates groups of scripted cubes until the creation time of the cubes exceeds some limit. While it is possible to write compute-bound scripts, in general scripts written inside objects in a virtual world tend to be small and timer- or sensor-driven. In order to measure the limits of many small scripts, we built a test which created small, timer-driven scripted objects. The actual script operation is simple (change color and rotate the object).

Physics is stressed by dynamically creating physics objects which interact with other physical objects and noting how many physical objects can be interacting before the frame rate drops below an acceptable level. The dynamic creation of physical objects means the number of objects slowly increases until the frame rate limit is reached.

The avatar stress test introduces active, moving avatars into a simulator until the simulator frame rate drops below an acceptable level.
The active avatars move to random waypoints: they pick a destination and begin walking to that destination until it is reached, then choose another random destination and begin walking there. The operation is performed by an avatar-driving routine that simulates a user performing the walk forward command until the avatar reaches the destination. This simulates both the execution and communication load on the simulator for many active avatars.

Scripts

The scripted objects are created in groups of 4, and they are created until the time it takes to create the 4 objects exceeds some threshold. This is a rough measure of when the CPU is saturated.
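
The ramp described above (create a group, time it, stop once creation becomes too slow) can be sketched in a few lines. This is only an illustration under stated assumptions, not the test harness used for the measurements: `ramp_until_saturated`, `FakeWorld`, the 0.5 second threshold, and the linear cost model are all hypothetical names and values chosen for the example.

```python
import time
from typing import Callable


def ramp_until_saturated(create_group: Callable[[int], None],
                         group_size: int = 4,
                         threshold_s: float = 1.0,
                         max_groups: int = 10_000,
                         clock: Callable[[], float] = time.perf_counter) -> int:
    """Create groups of scripted objects until one group is slow to create.

    Returns the total number of objects created, including the group
    whose creation time exceeded the threshold.
    """
    created = 0
    for _ in range(max_groups):
        start = clock()
        create_group(group_size)      # hypothetical: rez group_size scripted cubes
        elapsed = clock() - start
        created += group_size
        if elapsed > threshold_s:     # creation got slow: treat CPU as saturated
            break
    return created


class FakeWorld:
    """Stand-in for a simulator (an assumption for this example only).

    Creation cost grows linearly with the number of objects already
    running scripts, and time is simulated rather than measured.
    """

    def __init__(self, cost_per_object_s: float = 0.01) -> None:
        self.now = 0.0
        self.objects = 0
        self.cost_per_object_s = cost_per_object_s

    def clock(self) -> float:
        return self.now

    def create_group(self, n: int) -> None:
        self.objects += n
        self.now += self.cost_per_object_s * self.objects


if __name__ == "__main__":
    world = FakeWorld()
    total = ramp_until_saturated(world.create_group, threshold_s=0.5,
                                 clock=world.clock)
    print("objects created before saturation:", total)
```

Passing the clock in as a parameter keeps the ramp logic separate from the timing source, so the same loop can be driven by wall-clock time against a real simulator or by simulated time as here.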
As can be seen in Figure 1, as the number of scripted objects increases, the CPU becomes busier and busier until it reaches nearly 100% of available compute resources. This 100% of CPU is full utilization of all 16 hardware threads in a dual quad-core server (a dual Intel Xeon E5540 processor-based server), so it demonstrates the multi-threading of the script engine and its ability to utilize all available processing power to execute the multiple scripts. The frame rate did not change, suggesting the script engine is scheduled independently from the main simulator heartbeat loop.

Figure 1. Percent CPU utilization as the number of scripted objects is increased. (Axes: number of scripted objects; percent total CPU utilization.)

Physics

A Galton box (http://wikipedia.org/wiki/galton_box) is a regular arrangement of pins on a board. Balls are dropped onto one location at the top of the board, bounce down the pins, and drop out the bottom of the board in a binomial distribution. To test the limits of the physics engine, we built a 3D Galton box in our virtual world and dropped hundreds of balls into the top. This created many physical interactions, many individual physics actors, and a method of testing whether all the interactions are correct. The balls are scripted to disappear when they leave the Galton box so the physical objects on the ground do not affect the test results.

For stress testing, we're not interested in the correctness, but in how many physical balls can be added to the Galton box before the physics engine becomes overloaded. In OpenSimulator, the physics engine performance is measured with a scaled frame rate. As described in the previous article, the physics engine is invoked every simulator heartbeat period and, for compatibility reasons, this is scaled up and reported as 46. Some testing has shown that when the physics frame rate drops below 30, the overall performance of the simulator degrades.
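
As an aside on the correctness check mentioned above (which is not the goal of the stress test), the binomial exit distribution of an idealized Galton box can be reproduced with a small simulation. The sketch below assumes each pin deflects a ball left or right with equal probability; the function names, row count, and ball count are hypothetical and do not describe the 3D box built for the test.

```python
import random
from math import comb
from typing import List


def galton_counts(rows: int, balls: int, seed: int = 0) -> List[int]:
    """Drop `balls` through `rows` of pins and count balls per exit slot.

    Assumption: every pin sends the ball right with probability 0.5,
    so the exit slot is the number of rightward bounces.
    """
    rng = random.Random(seed)
    counts = [0] * (rows + 1)
    for _ in range(balls):
        slot = sum(1 for _ in range(rows) if rng.random() < 0.5)
        counts[slot] += 1
    return counts


def binomial_expected(rows: int, balls: int) -> List[float]:
    """Expected balls per slot under a Binomial(rows, 0.5) distribution."""
    return [balls * comb(rows, k) / 2 ** rows for k in range(rows + 1)]


if __name__ == "__main__":
    observed = galton_counts(rows=8, balls=10_000, seed=1)
    expected = binomial_expected(rows=8, balls=10_000)
    for slot, (o, e) in enumerate(zip(observed, expected)):
        print(f"slot {slot}: observed {o:5d}  expected {e:8.1f}")
```

A physics engine that resolves the pin collisions correctly should produce slot counts close to the expected column; large systematic deviations would point to errors in the simulated interactions.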
The physics stress test metric is therefore the number of physics-enabled balls that can be interacting in the Galton box when the reported physics frame rate drops below 30.

As can be seen in Figure 2, as the number of physical objects in the Galton box increases, the frame rate decreases. New physical balls are created and enter the top of the Galton box. This goes on for some period of time and then new balls stop being added. As the balls leave the Galton box, they are scripted to delete themselves. The effect is for the number of physically interacting balls to grow and then decrease as the balls stop being added and the exiting balls disappear, which explains the shape of the Objects curve. Inverted from that curve is the simulator frame rate, which falls as the number of physical objects increases and then recovers as they decrease.

Figure 2. Simulator frames-per-second and percent CPU utilization as the number of physical objects increases and decreases.

The line at the bottom of Figure 2 is the percent of total CPU utilization (percent of 16 hardware threads). The physics engine is utilizing only one hardware thread, thus the <10% total CPU utilization. One way to interpret the graph is that physical objects are added until the one CPU thread is totally utilized, then the simulator frame rate falls as more physical objects are added. At around 400 physics objects, the physics frame rate drops below the threshold.

Avatars

As mentioned above, the avatar stress test consists of adding wandering avatars until the simulator performance begins to falter. The avatar creation and driving routines try to mimic the operations of a human by creating and operating the avatar with the normal login and navigation mechanisms.
So the move forward action is performed by making the same protocol request as if a user pressed the key which makes the avatar move forward. The avatars are kept active by wandering around. This means that messages are being sent from the client to the server to operate each avatar, and update information is also being sent from the server to all the clients. When an avatar moves, a position update must be sent to all clients. This partially simulates the load and networking requirements of real users.

Figure 3 shows the simulator frame rate as avatars are added to a scene. As the number of avatars is increased, eventually the simulator frame rate begins to fall. Avatars are logged in 25 at a time to spread out the overhead of initialization. For OpenSimulator running on a quad-core server, about 35 wandering avatars start the degradation of the simulator's responsiveness, with the simulator frame rate dropping below 30 with about 45 wandering avatars.

Figure 3. Simulator frames-per-second as the number of active avatars is increased.

Observations

The scripting and physics workloads offer a view of contrasting architectures. Two observations can be made about the script engine. First, it is multi-threaded and uses all the CPU threads to execute the scripts, and second, it is not tied to the running of the heartbeat thread. Thus, the script engine can execute many scripts without affecting the responsiveness of the simulator. This is shown in Figure 1, where as the number of scripts increases, the CPU utilization goes to 100% while the simulator frame rate does not change.

The physics workload, on the other hand, utilized only 7% of a dual quad-core (16 hardware threads) server. One fully busy hardware thread out of 16 is 1/16, or about 6.25%, of the total, so this figure suggests that the physics engine is single-threaded. As discussed in the previous article, the physics engine is invoked on the simulator's heartbeat loop (a central loop which invokes object updates and physics several times a second). This means that the OpenSimulator physics implementation suffers from two design problems: 1) it does not take advantage of multiple available hardware threads, and 2) because execution of physics happens on the heartbeat thread, an overloaded physics engine means slow simulator execution.

This has several lessons for virtual world server design. The various functions of the simulator (physics, scripts, communication, etc.) should be multi-threaded to utilize all of the hardware threads available in modern servers. Additionally, the functions must be scheduled independently so their operation does not affect other functions. This leads to a server design of multiple independently scheduled modules which rely on locking of the central data structures.
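
The design described above, independently scheduled modules that coordinate through locks on shared data, can be sketched as follows. This is a minimal illustration and not OpenSimulator code: the `Scene` class, the two module functions, and the single coarse lock are assumptions made for the example. Note also that CPython threads do not execute Python bytecode in parallel, so the sketch shows the scheduling structure only, not the CPU scaling measured in the tests.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Scene:
    """Central scene data guarded by one lock (hypothetical, simplified)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.positions = {}   # object id -> x coordinate
        self.colors = {}      # object id -> color index

    def move(self, obj_id: int, dx: float) -> None:
        with self._lock:
            self.positions[obj_id] = self.positions.get(obj_id, 0.0) + dx

    def cycle_color(self, obj_id: int) -> None:
        with self._lock:
            self.colors[obj_id] = (self.colors.get(obj_id, 0) + 1) % 3


def physics_module(scene: Scene, obj_ids, steps: int) -> None:
    """Runs on its own thread, so a slow step does not stall other modules."""
    for _ in range(steps):
        for oid in obj_ids:
            scene.move(oid, 0.1)


def script_module(scene: Scene, obj_ids, ticks: int, workers: int = 4) -> None:
    """Fans timer-style script events out over a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(ticks):
            # list() waits for every script event of this tick to finish
            list(pool.map(scene.cycle_color, obj_ids))


def run_modules(scene: Scene, obj_ids, steps: int, ticks: int) -> None:
    """Start each module on its own thread and wait for both to finish."""
    threads = [
        threading.Thread(target=physics_module, args=(scene, obj_ids, steps)),
        threading.Thread(target=script_module, args=(scene, obj_ids, ticks)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    scene = Scene()
    run_modules(scene, range(10), steps=50, ticks=7)
    print(scene.positions[0], scene.colors[0])
```

A single scene-wide lock keeps the example short; a real server would likely use finer-grained locking so that modules touching different objects do not contend with each other.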
Conclusion

In this article, we stress tested OpenSimulator and found the need for multi-threading in the sub-systems and the utility of independent task scheduling. Both of these promote scaling of the virtual world server. The next article will explore platform power and networking as they relate to a virtual world server.

About the Author

Robert Adams is a software engineer in Intel Labs and is a member of the Virtual World Infrastructure team investigating systems architectures for scalable virtual environments.
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. This white paper, as well as the software described in it, is furnished under license and may only be used or copied in accordance with the terms of the license. The information in this document is furnished for informational use only, is subject to change without notice, and should not be construed as a commitment by Intel Corporation. Intel Corporation assumes no responsibility or liability for any errors or inaccuracies that may appear in this document or any software that may be provided in association with this document. Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See www.intel.com/products/processor_number for details. 
The Intel processor/chipset families may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request. Copies of documents, which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's Web Site. Intel, Core, Atom, Pentium and the Intel Logo are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others. Copyright 2010, Intel Corporation. All rights reserved.