BEPUphysics in a CoreCLR World

A lot of exciting stuff has happened in the .NET world over the last year, and BEPUphysics is approaching some massive breaking changes. It seems like a good time to condense the plans in one spot.

First, expect v1.4.0 to get packaged up as a stable release in the next couple of months. At this time, I expect that v1.4.0 will likely be the last version designed with XNA platform compatibility in mind.

Following what seems to be every other open source project in existence, BEPUphysics will probably be moving to github after v1.4.0 is released.

Now for the fun stuff:


BEPUphysics v2.0.0

High Level Overview:

Performance drives almost everything in v2.0.0. Expect major revisions; many areas will undergo total rewrites. Applications may require significant changes to adapt. The revisions follow the spirit of the DX11/OpenGL to DX12/Vulkan shift. The engine will focus on providing the highest possible performance with a minimal API.

Expect the lowest level engine primitives like Entity to become much 'dumber', behaving more like simple opaque data blobs instead of a web of references, interfaces, and callbacks. The lowest layer will likely assume the user knows what they're doing. For example, expect a fundamental field like LinearVelocity to be exposed directly and without any automatic activation logic. "Safe" layers that limit access and provide validation may be built above this to give new users fewer ways to break everything.

Features designed for convenience will be implemented at a higher level explicitly separated from the core simulation or the responsibility will be punted to the user.

Some likely victims of this redesign include:
-Internal timestepping. There is really nothing special about internal timestepping- it's just one possible (and very simple) implementation of fixed timesteps that could, and probably should, be implemented externally.
-Space-resident state buffers and state interpolation. Users who need these things (for asynchronous updates or internal timestepping) have to opt in anyway, and there's no reason to have them baked into the engine core.
-All deferred collision events, and many immediate collision events. The important degrees of access will be retained to enable such things to be implemented externally, but the engine will do far less.
-'Prefab' entity types like Box, Sphere, and so on are redundant and only exist for legacy reasons. Related complicated inheritance hierarchies and generics to expose typed fields in collidables will also likely go away.
-'Fat' collision filtering. Some games can get by with no filtering, or just bitfields. The engine and API shouldn't be hauling around a bunch of pointless dictionaries for such use cases.
And more. 

Platform Support:

Expect older platforms like Xbox360 and WP7 to be abandoned. The primary target will be .NET Core. RyuJIT and the new SIMD-accelerated numeric types will be assumed. Given the new thriving open source initiative, I think this is a safe bet.

Going forward, expect the engine to adopt the latest language versions and platform updates more rapidly. The latest version of VS Community edition will be assumed. Backwards compatibility will be limited to snapshots, similar to how v1.4.0 will be a snapshot for the XNA-era platforms.

Areas of Focus:

1) Optimizing large simulations with many inactive or static objects

In v1.4.0 and before, a common recommendation is to avoid broadphase pollution. Every static object added to the Space is one more object to be dynamically handled  by the broad phase. To mitigate this issue, bundling many objects into parent objects like StaticGroups is recommended. However, StaticGroups require explicit effort, lack dynamic flexibility, and are not as efficient as they could be.

Inactive objects are also a form of broadphase pollution, but unlike static objects, they cannot be bundled into StaticGroups. Further, these inactive objects pollute most of the other stages. In some cases, the Solver may end up spending vastly more time testing activity states than actually solving anything.

Often, games with these sorts of simulations end up implementing some form of entity tracking to remove objects outside of player attention for performance reasons. While it works in many cases, it would be better to not have to do it at all.

Two large changes are required to address these problems:
-The BroadPhase will be aware of the properties of static and inactive objects. In the normal case, additional static or inactive objects will incur almost no overhead. (In other words, expect slightly less overhead than the StaticGroup incurs, while supporting inactive dynamic objects.)
-Deactivation will be redesigned. Persistent tracking of constraint graphs will be dropped in favor of incremental analysis of the active set, substantially reducing deactivation maintenance overhead. Stages will only consider the active set, rather than enumerating over all objects and checking activity after the fact.

On the type of simulations hamstrung by the current implementation, these changes could improve performance hugely. In extreme cases, a 10x speedup without considering the other implementation improvements or SIMD should be possible.

2) Wide parallel scaling for large server-style workloads

While the engine scales reasonably well up to around 4 to 6 physical cores, there remain sequential bottlenecks and lock-prone bits of code. The NarrowPhase's tracking of obsolete collision pairs is the worst sequential offender. More speculatively, the Solver's locking may be removed in favor of a batching model if some other changes pan out.

The end goal is decent scaling on 16-64 physical cores for large simulations, though fully achieving this will likely require some time.

3) SIMD

With RyuJIT's support for SIMD types comes an opportunity for some transformative performance improvements. However, the current implementation would not benefit significantly from simply swapping out the BEPUutilities types for the new accelerated types. Similarly, future offline optimizing/autovectorizing compilers don't have much to work with under the current design. As it is, these no-effort approaches would probably end up providing an incremental improvement of 10-50% depending on the simulation.

To achieve big throughput improvements, the engine needs cleaner data flow, and that means a big redesign. The solver is the most obvious example. Expect constraints to undergo unification and a shift in data layout. The Entity object's data layout will likely be affected by these changes. The BroadPhase will also benefit, though how much is still unclear since the broad phase is headed for a ground up rewrite.

The NarrowPhase is going to be the most difficult area to adapt; there are a lot of different collision detection routines with very complicated state. There aren't as many opportunities for unification, so it's going to be a long case-by-case struggle to extract as much performance as possible. The most common few collision types will most likely receive in-depth treatment, and the remainder will be addressed as required.

Miscellaneous Changes:

-The demos application will move off of XNA, eliminating the need for a XNA Game Studio install. The drawer will be rewritten, and will get a bit more efficient. Expect the new drawer to use DX11 (feature level 11_0) through SharpDX. Alternate rendering backends for OpenGL (or hopefully Vulkan, should platform and driver support be promising at the time) may be added later for use in cross platform debugging. 

-As alluded to previously, expect a new broad phase with a much smoother (and generally lower) runtime profile. Focuses on incremental refinement; final quality of tree may actually end up higher than the current 'offline' hierarchies offered by BEPUphysics.

-StaticGroup will likely disappear in favor of the BroadPhase just handling it automatically, but the non-BroadPhase hierarchies used by other types like the StaticMesh should still get upgraded to at least match the BroadPhase's quality.

-Collision pair handlers are a case study in inheritance hell. Expect something to happen here, but I'm not yet sure what.

-Wider use of more GC-friendly data structures like the QuickList/QuickSet to avoid garbage and heap complexity.

-Convex casts should use a proper swept test against the broad phase acceleration structure. Should make long unaligned casts much faster.

-More continuous collision detection options. Motion clamping CCD is not great for all situations- particularly systems of lots of dynamic objects, like passengers on a plane or spaceship. The existing speculative contacts implementation helps a little to stabilize things, but its powers are limited. Granting extra power to speculative contacts while limiting ghost collisions would be beneficial.

-The CompoundShape could use some better flexibility. The CompoundHelper is testament to how difficult it can be to do some things efficiently with it.

Schedule Goals:

Variable. Timetable depends heavily on what else is going on in development. Be very suspicious of all of these targets.

Expect the earliest changes to start showing up right after v1.4.0 is released. The first changes will likely be related the debug drawer rewrite.

The next chunk may be CCD/collision pair improvements and the deactivation/broadphase revamp for large simulations. The order of these things is uncertain at this time because there may turn out to be some architectural dependencies. This work will probably cover late spring to mid summer 2015.

Early attempts at parallelization improvements will probably show up next. Probably later in summer 2015.

SIMD work will likely begin at some time in late summer 2015. It may take a few months to adapt the Solver and BroadPhase.

The remaining miscellaneous changes, like gradual improvements to collision detection routines, will occur over the following months and into 2016. I believe all the big changes should be done by some time in spring 2016.

This work won't be contiguous; I'll be hopping around to other projects throughout.

Future Wishlist:

-The ancient FluidVolume, though slightly less gross than it once was, is still very gross. It would be nice to fix it once and for all. This would likely involve some generalizations to nonplanar water- most likely procedural surfaces that would be helpful in efficiently modeling waves, but maybe to simple dynamic heightfields if the jump is short enough.

-Fracture simulation. This has been on the list for a very long time, but there is still a chance it will come up. It probably won't do anything fancy like runtime carving or voronoi shattering. More likely, it will act on some future improved version of CompoundShapes, providing different kinds of simple stress simulation that respond to collisions and environmental effects to choose which parts get fractured. (This isn't a very complicated feature, and as mentioned elsewhere on the forum, I actually implemented something like it once before in a spaceship game prototype- it just wasn't quite as efficient or as clean as a proper release would require.)

On GPU Physics:

In the past, I've included various kinds of GPU acceleration on the development wishlist. However, now, I do not expect to release any GPU-accelerated rigid body physics systems in the foreseeable future. BEPUphysics itself will stay exclusively on the CPU for the foreseeable future.

I've revisited the question of GPU accelerated physics a few times over the last few years, including a few prototypes. However, GPU physics in games is still primarily in the realm of decoration. It's not impossible to use for game logic, but having all of the information directly accessible in main memory with no latency is just a lot easier. 

And implementing individually complicated objects like the CharacterController would be even more painful in the coherence-demanding world of GPUs. (I would not be surprised if a GPU version of a bunch of full-featured CharacterControllers actually ran slower due to the architectural mismatch.) There might be a hybrid approach somewhere in here, but the extra complexity is not attractive.

And CPUs can give pretty-darn-decent performance. BEPUphysics is already remarkably quick for how poorly it uses the capabilities of a modern CPU.

And our own game is not a great fit for GPU simulation, so we have no strong internal reason to pursue it. Everything interacts heavily with game logic, there are no deformable objects, there are no fluids, any cloth is well within the abilities of CPU physics, and the clients' GPUs are going to be busy making pretty pictures.

This all makes implementing runtime GPU simulation a bit of a hard sell.

That said, there's a small chance that I'll end up working on other types of GPU accelerated simulation. For example, one of the GPU prototypes was a content-time tool to simulate flesh and bone in a character to automatically generate vertex-bone weights and pose-specific morph targets. We ended up going another direction in the end, but it's conceivable that other forms of tooling (like BEPUik) could end up coming out of continued development.

 

Have some input? Concerned about future platform support? Want to discuss the upcoming changes? Post on the forum thread this was mirrored from, or just throw tweets at me.