v0.16.0 Broad Phase Benchmarks

Welcome to the first BEPUphysics development blog post!

Today's post is about broad phases and their performance characteristics.  The BroadPhase (accessible in the Space.BroadPhase property) prunes object collisions based on their axis-aligned bounding boxes.  Some also accelerate queries, like ray casts and bounding box tests.

In v0.16.0, the main broad phase has been rewritten and a few new options are available.  The data in this post was collected on the latest development version, which can always be downloaded on the development fork over on codeplex:   http://bepuphysics.codeplex.com/SourceControl/network/Forks/RossNordby/Development

The four broad phases being tested are:

  • Old DH: The old, formerly default DynamicHierarchy. Relies on a bounding box tree which is dynamically and incrementally updated for overlap detection.  Included for comparison purposes.
  • New DH:  The old dynamic hierarchy had some known inefficiencies which spurred the creation of a new version.  It's an improvement across the board compared to the old version, but is conceptually similar.
  • SAS1D: The third option being tested is SortAndSweep1D.  It uses a simple, standard implementation of the sort and sweep (sweep and prune) algorithm along a single axis.  Only the x axis is used in the implementation.  The interval list is incrementally sorted with insertion sort. 
  • Grid 2D SAS: Finally, Grid2DSortAndSweep combines one-axis sort and sweep with a 2d grid.  The grid helps prevent the clustering problem associated with fixed one-axis approaches and provides an easy route to parallelization.

Note that for every test in these benchmarks, the boxes are all 1x1x1 and randomly distributed.  This tends to give sort and sweep and grid-based approaches an advantage, but it should still be pretty close to practical situations.  The "MT" suffix means that multithreading is enabled.  The CPU used for the PC tests is a Q6600@2.4ghz.

The first test is an increasing number of boxes in a fixed volume.  

That's a bit of a line-splosion, but there's a few main things to notice.  First, SAS1D starts out great (in fact, the fastest at 100 objects), but as object density increases, it shoots upward off the graph.  Second, the New DH wins everywhere against the Old DH, moreso with multithreading enabled.  Grid 2D SAS multithreaded still wins overall by a bit, though.  New DH MT is able to push over 45,000 boxes in 1/60th of a second, and Grid 2D SAS MT hits 60,000 boxes in the same amount of time.

So, how do the multithreaded versions of the broad phases compare to their single threaded versions on my Q6600? 

All of the broad phases start below 1 due to the extra overhead introduced by using multithreading.  Most quickly overcome the overhead and scale decently.  However, the old dynamic hierarchy can't even get over 1 until a few thousand boxes are in the broad phase.  That means, for some mid-range simulations, the multithreaded version of the old hierarchy is actually a performance loss.  Yikes!

SAS1D clearly wins the scaling war for these immobile boxes, achieving a near linear speedup.  Grid 2D SAS comes close, but falls back at the higher end.

Those scaling numbers will change a bit depending on how the simulation is organized.  The SAS/Grid based methods are benefiting from the uniform distribution of tiny boxes here.

How does sparsity affect the performance?

SAS1D's problem with clustering is helped a lot as the density decreases.  Other broad phases are helped too, but to a lesser extent.

Some of the benefit associated with sparse scenes is counteracted in the Grid 2D SAS by the fact that it has to manage far more grid cells.  Increasing the grid cell size can improve performance a bit. 

Let's look at another configuration that accentuates clustering. 

New DH doesn't change at all.  SAS1D, on the other hand, explodes once again.  With all those boxes sharing axis-space, it gets closer and closer to brute force.  Grid 2D SAS manages to stay just as fast thanks to its division of space.

All of the previous tests used immobile boxes.  How about 10,000 moving boxes?  Note that these tests do not use temporally expanded bounding boxes which, if used, would probably benefit dynamic hierarchies relative to grid/SAS methods.  Under normal conditions, the engine does temporally expand bounding boxes.

All methods fare well on velocities below 10 units per second, but after that, they diverge significantly.  The formerly consistent and speedy Grid 2D SAS suffers a horrible fate on this jitter test.  The grid cell update process bottlenecks it into oblivion.  Oddly enough, the multithreaded version performs much worse than the single threaded version at high speeds due to contention on the grid cell set.

SAS1D also suffers.  It uses insertion sort, so as speeds get really high, it becomes closer to an O(n^2) operation.  Additionally, the sort is not multithreaded (the current test implementation adds too much overhead to be useful on practical numbers of objects), so its previously excellent scaling falters.

The DynamicHierarchy keeps chugging along just fine, though.

Fast jittering isn't a very common situation, though.  How about all 10,000 boxes moving uniformly?

Much better! They're practically identical.  But wait...

Ouch!  That graph is cut off, too; the actual bars for Grid 2D SAS extend about 10x higher.  What happened?

Grid 2D SAS focused on iteration performance over add/remove/search performance for the cell set.  For static scenes, this is good.  For a ton of fast moving objects, this is horrible.

SAS1D doesn't change at all though since the order of objects in the list does not change.

Enough of movement tests- how about memory usage?

The old dynamic hierarchy was quite a hog.  New DH manages to beat Grid 2D SAS, but SAS1D is extremely cheap.  If you're memory bound, SAS1D might be attractive despite its corner cases. 

QueryAccelerator Performance

Now for queries!  (Most) BroadPhases support queries through their QueryAccelerator property.  These include ray casts and bounding box tests.  Unfortunately, SAS1D does not support queries as it includes no acceleration structure.  Grid 2D SAS supports most queries, but cannot perform infinite ray casts or frustum tests.

Grid 2D SAS's cell scanning is noticeable as ray cast length increases.  Sticking to short rays is advisable.  Note, however, that the time scale is now in nanoseconds- if you really need to, spending 20 microseconds on a 100 length ray cast probably won't hurt much.

The hierachy-based approaches scale up with the number of objects that come close to being hit.  As the ray becomes longer than the volume containing boxes, the increase in time drops to almost zero.  The new hierarchy wins over the old one in queries, just like in every other test.

All three options scale up by box count decently, but the new hierarchy wins once again. 

Now for bounding box queries!

At small scales, Grid 2D SAS is very competitive.  However, the cell scanning needed at larger sizes can't compete with the hierarchies. 

The new hierachy handily wins at scaling once again.  Note that a bounding box query on a relatively normal amount of objects (625) takes less than half a microsecond. 

Other Platforms

That does it for the PC tests.  Here's a look at the Xbox360 running the very first test for comparison.

Grid 2D SAS's win is a bit more pronounced here, but the overall dynamic is similar. 

SAS1D still wins out in scaling, but Grid 2D SAS isn't far behind.

And finally, for the last test, a Samsung Focus doing the same test:
 

The new DH wins in the long term, but a phone isn't really going to be running 1,000 objects (let alone 45,000) in any reasonable simulation.  At small values, SAS1D wins.  On the other hand, at 100 objects, the slowest option (Grid 2D SAS in this test) only takes 400 microseconds.  That's not usually a dealbreaker for most simulations compared to other costs, but it's something to keep in mind.

Conclusion

The default broad phase in v0.16.0 will be the new DynamicHierarchy.  It has consistent behavior in all configurations and has speedy queries.  It doesn't always win, but it never fails explosively.

However, if your simulation has guarantees that avoid dangerous corner cases, it may be worth looking into SortAndSweep1D or Grid2DSortAndSweep.  Games targeting the phone in particular could stand to benefit from the low memory usage and speed at low object counts of SortAndSweep1D. 

If you need queries, SortAndSweep1D is not a good option unless you have your own custom query accelerator.  If you need frustum tests or infinite rays, stick with the hierarchy.

Grid2DSortAndSweep has potential.  In many scenes, it could be the best option.  The cell set implementation can most likely be improved.  Since it's the limiting factor in every test, it may become an excellent all-around option.  That will have to wait for a future version, though.

Got questions or comments? Visit the forum!