GPU Particles

Introduction
Team Size: Just me
Engine: TRK (Our Custom C++ Engine)
Duration: 100 hours 
Project Information: As part of the specialization course I had 100 hours to dive into an interesting topic and create a website. So I did what I do best: make particles.

Goal
Collisions and performance: those were the two areas where I set out to improve on the old particle system. The plan from the outset was for particles to collide with the depth buffer, giving cheap but effective collisions. I also set a goal of simulating a million particles without harming performance, and it was on this basis that I began planning and researching.
Background
Particles can give a lot of visual feedback, but large numbers of them quickly become a drain on resources. That is why I decided to create and simulate particles on the GPU. Simulating particles on the GPU also unlocks access to resources which allow for more complex, and impressive, behaviours.
To set the stage, our custom engine used my old particle system, which was separate from the game world and handled creation, simulation and sorting on the CPU. This had two major issues: poor performance and limited interaction with the world.
While a CPU system can be optimized to a large extent, relying on the CPU quickly becomes a bottleneck. That is where the GPU and its architecture become a natural choice. While typical CPUs are limited to around 8 to 32 threads, graphics cards can often run thousands at once. This makes simulating and sorting millions of particles in parallel far more efficient.
Process
Moving To The GPU
Implementing the particle system was tricky but straightforward. It required compute shader support, storing the particle data on the GPU and sorting all particles. Step by step, computation was moved from the CPU to the GPU: first simulation, then emitting particles, then creating vertex buffers and lastly sorting.
Sorting was the real hurdle, due to the complexity of implementing a sorting algorithm that can make use of the many threads of a graphics card. I took a bitonic sort implementation from a repository by GPUOpen-LibrariesAndSDKs.
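To show why bitonic sort suits the GPU, here is a minimal CPU sketch of the sorting network. It is not the code from the repository mentioned above. The SortKey struct and the function name are hypothetical, and the sketch assumes the key count is a power of two (a real implementation would pad the list).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sort key: distance to the camera plus the particle it belongs to.
struct SortKey {
    float distance;
    std::uint32_t particleIndex;
};

// Sorts keys so the furthest particle comes first (back to front).
void bitonicSortBackToFront(std::vector<SortKey>& keys) {
    const std::size_t n = keys.size();
    assert(n != 0 && (n & (n - 1)) == 0);  // assumption: power-of-two size

    for (std::size_t k = 2; k <= n; k <<= 1) {          // size of the sequences being merged
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {  // distance between compared elements
            // Every iteration of this inner loop is independent of the others,
            // so on the GPU each one can be handled by its own thread.
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner <= i) continue;  // each pair is handled once
                const bool descending = (i & k) == 0;
                const bool outOfOrder = descending
                    ? keys[i].distance < keys[partner].distance
                    : keys[i].distance > keys[partner].distance;
                if (outOfOrder) std::swap(keys[i], keys[partner]);
            }
        }
    }
}
```

The two outer loops run in sequence, one pass each, while the work inside a pass is a fixed pattern of compare-and-swap operations that never touch the same element twice. That fixed pattern is what lets thousands of threads share the work.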
The Pipeline
The simulation consists of four steps: emitting, simulating, sorting and creating render arguments. Emitting a particle means pulling a dead particle index from a consume buffer and initializing that slot in the particle pool. Each particle is then simulated on its own thread and appended to a live list if it is still alive. Lastly, the living particles are sorted by their distance to the camera and added to a command buffer, which is used to render them.
Simulating in shaders makes it possible to efficiently sample textures which describe the world. One method used in my simulation is checking collisions against the depth buffer. As the depth buffer is already calculated every frame, it can simply be reused, in combination with the normal texture from the graphics buffer, to get a complete representation of the depth and normals of every visible surface. Using the particle's normalized device coordinates, the depth can be sampled and compared with the particle's own depth to check whether it is colliding. If it is, the particle's velocity is reflected against the surface normal to simulate a collision.
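The emit and simulate steps can be sketched on the CPU as follows. This is only an illustration of the dead list and live list idea, not the actual compute shaders. The Particle layout and all names are hypothetical, and plain vectors stand in for the GPU consume and append buffers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical particle layout; a real system stores more attributes.
struct Particle {
    float position[3];
    float velocity[3];
    float age;       // seconds since emission
    float lifetime;  // seconds until death, 0 marks an unused slot
};

struct ParticlePool {
    std::vector<Particle> pool;            // fixed-size storage for every particle
    std::vector<std::uint32_t> deadList;   // free slots (the consume buffer)
    std::vector<std::uint32_t> aliveList;  // living particles (the append buffer)

    explicit ParticlePool(std::uint32_t capacity) : pool(capacity) {
        deadList.reserve(capacity);
        for (std::uint32_t i = 0; i < capacity; ++i) deadList.push_back(i);
    }

    // Emit: consume a free index and initialise that slot.
    // Returns false when the pool is exhausted.
    bool emit(const Particle& initial) {
        assert(initial.lifetime > 0.0f);
        if (deadList.empty()) return false;
        const std::uint32_t index = deadList.back();
        deadList.pop_back();
        pool[index] = initial;
        pool[index].age = 0.0f;
        return true;
    }

    // Simulate: on the GPU each loop iteration would be one thread.
    void simulate(float dt) {
        aliveList.clear();
        for (std::uint32_t i = 0; i < pool.size(); ++i) {
            Particle& p = pool[i];
            if (p.lifetime <= 0.0f) continue;  // unused slot
            p.age += dt;
            if (p.age >= p.lifetime) {
                p.lifetime = 0.0f;      // mark the slot as free
                deadList.push_back(i);  // hand the index back for reuse
                continue;
            }
            for (int axis = 0; axis < 3; ++axis) {
                p.position[axis] += p.velocity[axis] * dt;
            }
            aliveList.push_back(i);
        }
    }
};
```

Because the pool never grows or shrinks, no memory is allocated during simulation. Only the two index lists change from frame to frame.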
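The collision check can be sketched in C++ like this. It is a simplified CPU stand-in rather than the shader itself. SceneBuffers, collideWithDepth, the thickness tolerance and the restitution factor are all assumptions made for the example. The sketch also assumes that larger depth values are further from the camera and that the stored normals are in the same space as the velocity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Mirror v around the plane with unit normal n.
Vec3 reflect(Vec3 v, Vec3 n) {
    const float d = 2.0f * dot(v, n);
    return {v.x - d * n.x, v.y - d * n.y, v.z - d * n.z};
}

// Hypothetical CPU stand-in for the depth and normal textures.
struct SceneBuffers {
    int width, height;
    std::vector<float> depth;  // per pixel, larger means further away (assumption)
    std::vector<Vec3> normal;  // per pixel unit normal, same space as the velocity

    // Map NDC x/y in [-1, 1] to a pixel index, with y pointing up in NDC.
    std::size_t pixelAt(float ndcX, float ndcY) const {
        assert(depth.size() == static_cast<std::size_t>(width) * height);
        const int px = std::min(static_cast<int>((ndcX * 0.5f + 0.5f) * width), width - 1);
        const int py = std::min(static_cast<int>((0.5f - ndcY * 0.5f) * height), height - 1);
        return static_cast<std::size_t>(py) * width + px;
    }
};

// Returns true and reflects the velocity if the particle is just behind a visible surface.
bool collideWithDepth(const SceneBuffers& scene, float ndcX, float ndcY,
                      float particleDepth, Vec3& velocity,
                      float thickness = 0.1f, float restitution = 1.0f) {
    // Off-screen particles have no depth information to collide with.
    if (ndcX < -1.0f || ndcX > 1.0f || ndcY < -1.0f || ndcY > 1.0f) return false;

    const std::size_t pixel = scene.pixelAt(ndcX, ndcY);
    const float surfaceDepth = scene.depth[pixel];
    // The thickness keeps particles far behind a surface from colliding with it.
    const bool insideSurface = particleDepth >= surfaceDepth &&
                               particleDepth <= surfaceDepth + thickness;
    const Vec3 n = scene.normal[pixel];
    // Only bounce particles that are moving into the surface.
    if (!insideSurface || dot(velocity, n) >= 0.0f) return false;

    const Vec3 r = reflect(velocity, n);
    velocity = {r.x * restitution, r.y * restitution, r.z * restitution};
    return true;
}
```

The depth buffer only describes the nearest visible surface, so it cannot tell how thick an object is. The thickness tolerance is a guess at that missing information.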
Design Issues
In the end it became apparent that this was not a long-term solution for a particle system, because the single particle pool was rendered all at once. This meant that different particle types could not use different shaders, blend states or textures without a large and unwieldy übershader.
This example is of a particle effect which contains particles with four different materials. They all use different textures and one uses a separate pixel shader.
Buckets O' Particles
To solve the issue of multiple materials, particles are rendered in one bucket per material. Instead of filling a list with the index and depth of each living particle, the list is filled with the index and material type. Particle indices of the same type are then grouped together in buckets.
When the particles are bucketed this way, the textures and shaders can be changed between draw calls. Now the particle effect which previously only showed one kind of particle can display many.
Because each bucket is drawn in its own draw call, particles can no longer be ordered by depth across materials, so sorting can no longer be used to handle transparency. Without the bitonic sort, another solution is required.
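One way to group indices by material is a counting sort, sketched below on the CPU. This is an illustration under assumed names (AliveEntry, Bucket, bucketByMaterial) and not necessarily how the buckets are built in the project. Each resulting bucket is a contiguous range that could back one draw call.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical entry written for each living particle.
struct AliveEntry {
    std::uint32_t particleIndex;
    std::uint32_t material;
};

// A contiguous range of the grouped index list, one per material.
struct Bucket {
    std::uint32_t offset;
    std::uint32_t count;
};

struct BucketedParticles {
    std::vector<std::uint32_t> indices;  // particle indices grouped by material
    std::vector<Bucket> buckets;         // buckets[m] is the range for material m
};

BucketedParticles bucketByMaterial(const std::vector<AliveEntry>& alive,
                                   std::uint32_t materialCount) {
    BucketedParticles result;
    result.buckets.assign(materialCount, Bucket{0, 0});

    // Pass 1: count the particles of each material.
    for (const AliveEntry& entry : alive) {
        assert(entry.material < materialCount);
        ++result.buckets[entry.material].count;
    }

    // Pass 2: a running sum turns the counts into start offsets.
    std::uint32_t running = 0;
    for (Bucket& bucket : result.buckets) {
        bucket.offset = running;
        running += bucket.count;
    }

    // Pass 3: scatter each index into its bucket's range.
    result.indices.resize(alive.size());
    std::vector<std::uint32_t> cursor(materialCount, 0);
    for (const AliveEntry& entry : alive) {
        const Bucket& bucket = result.buckets[entry.material];
        result.indices[bucket.offset + cursor[entry.material]++] = entry.particleIndex;
    }
    return result;
}
```

With the ranges known, rendering becomes a loop over the buckets: bind the material's shader and textures, then draw count particles starting at offset.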
Transparency Without Sorting
Order Independent Transparency (OIT) is a technique that imitates the way light behaves when passing through multiple transparent volumes and, as the name says, the result does not depend on draw order. To achieve this, instead of rendering and blending the particles directly to a texture, the render process is split in two: a transmittance pass followed by a colour pass.
The transmittance pass works by creating a polynomial function that most closely follows the curve of light absorption for a ray of light sent through that pixel. A very opaque object rendered close to the camera gives the function a steep dropoff early on, while one further away has less impact (see the graph).
During the colour pass, this absorption function is used to calculate how much colour each rendered fragment adds to the final texture, based on its depth.
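The order independence comes from how the transmittance pass accumulates its data. Below is a minimal sketch of that accumulation step only, based on the general idea of moment-based OIT rather than on the project's shaders. The Fragment struct, the choice of five moments and the function names are assumptions, and the reconstruction of the full absorption curve from the moments is left out.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical transparent fragment covering one pixel.
struct Fragment {
    float depth;  // normalised depth along the view ray, in [0, 1]
    float alpha;  // opacity, in [0, 1)
};

// Power moments b_0..b_4 of the absorbance along one pixel's view ray.
using Moments = std::array<float, 5>;

// Adds one fragment. Only sums are involved, so the order of calls does not
// matter (up to floating point rounding), which is what additive blending
// gives on the GPU.
void addFragment(Moments& moments, const Fragment& fragment) {
    assert(fragment.alpha >= 0.0f && fragment.alpha < 1.0f);
    // Transmittances multiply, so their negative logarithms add up.
    const float absorbance = -std::log(1.0f - fragment.alpha);
    float depthPower = 1.0f;
    for (float& moment : moments) {
        moment += absorbance * depthPower;  // b_k += absorbance * depth^k
        depthPower *= fragment.depth;
    }
}

// The zeroth moment alone gives the light that passes through every fragment.
float totalTransmittance(const Moments& moments) {
    return std::exp(-moments[0]);
}
```

The higher moments are what a full implementation would use to estimate how much light has been absorbed in front of any given depth, which is the function the colour pass needs.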
End of the line
With the limited time available for this particle system some things had to be cut, and the clock ran out with order independent transparency only partially implemented.
Results
In the end the initial goals were met and turned out very well, but the expanded requirements still need some work.
Collisions
Depth buffer collisions were deceptively simple to implement once the entire pipeline had been moved to the GPU and gave a striking result. Interactions with the environment, like these collisions, don't have to be flashy to give the particles a greater sense of belonging to the scene.
Performance Improvement
Performance is one of the key factors for a particle system, so determining the success of the new system requires comparing it to the old one.
Although it is hard to spot, the graph contains not only the GPU system but also the much weaker CPU alternative. When aiming for the target of 144 fps, the old particle system managed a steady 28 thousand particles, which is rather less impressive next to the almost one million of the new system. The difference widens further at 60 fps, where the GPU system handles about 47 times as many particles: 2.8 million compared to the 60 thousand of the CPU system.
Future Work
Order Independent Transparency - Already mostly implemented. If the GPU particle system were to be used in a future game I would prioritize finishing OIT for a significant improvement in modularity and visuals.
Force Fields - Right now the only way for the particles to interact with the world is through depth buffer collisions, but force fields can be used to create entirely different types of effects. Simulating winds could lend believability to windy environments and giving a slight gravitational pull to a mystical figure could create an ominous atmosphere.
Vectorfields - Besides world interaction, the particles have a limited set of movement controls. Vectorfields could make the movement more interesting and customizable. Effects such as jagged electricity or a slowly creeping darkness could easily be tailor-made to the situation at hand.
Optimization - A final improvement is optimization. While the new system is leagues better than the previous one, very little time was afforded to optimizing the new simulation pipeline. Reducing the size of particle vertices and instance data, or culling particles which are outside the camera frustum, would be a good start.
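As an illustration of the frustum culling idea, a point can be tested against the frustum after transforming it to clip space. This is a sketch of a possible approach, not something implemented in the project. It assumes a row-major matrix applied to column vectors and a Direct3D-style clip space where depth runs from 0 to w. The names are hypothetical.

```cpp
#include <array>

struct Vec4 { float x, y, z, w; };

// Row-major 4x4 matrix applied to column vectors: clip = M * v (assumed convention).
using Mat4 = std::array<float, 16>;

// Transform a world-space point (w = 1) into clip space.
Vec4 toClipSpace(const Mat4& m, float x, float y, float z) {
    return {
        m[0]  * x + m[1]  * y + m[2]  * z + m[3],
        m[4]  * x + m[5]  * y + m[6]  * z + m[7],
        m[8]  * x + m[9]  * y + m[10] * z + m[11],
        m[12] * x + m[13] * y + m[14] * z + m[15],
    };
}

// True if the point lies inside the view frustum.
// Assumes clip space with -w <= x, y <= w and 0 <= z <= w.
bool insideFrustum(const Mat4& viewProjection, float x, float y, float z) {
    const Vec4 c = toClipSpace(viewProjection, x, y, z);
    return c.x >= -c.w && c.x <= c.w &&
           c.y >= -c.w && c.y <= c.w &&
           c.z >= 0.0f && c.z <= c.w;
}
```

A particle that fails this test could be left out of the live list, although a real version would also need to account for the particle's size so that large particles near the screen edge are not dropped too early.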
References & Resources
Going into this project without ever having touched a compute shader, I relied on three sources quite extensively. The first is a great GDC talk about GPU particles simulating collisions. The second is the open source repository accompanying the GDC talk. The third is a very detailed paper about moment-based order independent transparency.
Without these great resources I would not have gotten nearly as far as I did, so all props to them for having shared such helpful information!
The fantastic boat models used to showcase collisions were created by Sebastian Claesson.
