Dynamic generation of baked models
asiekierka opened this issue · 21 comments
See #39
The current proposal is basically:
- Have a new FabricBakedModel interface that extends the vanilla BakedModel and provides an additional getQuads method taking a nullable block view and position on top of the existing parameters.
- Add a mixin for BlockRenderManager::tesselateBlock which, if the model is a FabricBakedModel instance, calls the extended getQuads with the additional parameters.
- FabricBakedModel would also provide a default implementation of getQuads(BlockState, Direction, Random) that simply calls the extended getQuads with a null block view and position.
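For illustration, a minimal sketch of what that interface could look like (assuming the vanilla BakedModel/BakedQuad/BlockView/BlockPos types; names and exact signatures just follow the proposal above and are not a finalized API):
public interface FabricBakedModel extends BakedModel
{
    /**
     * Extended variant; the block view and position are null when the model
     * is rendered outside of chunk building (e.g. as an item).
     */
    List<BakedQuad> getQuads(BlockState state, Direction side, Random random,
                             @Nullable BlockView view, @Nullable BlockPos pos);

    /**
     * Vanilla entry point simply delegates with a null block view and position.
     */
    @Override
    default List<BakedQuad> getQuads(BlockState state, Direction side, Random random)
    {
        return getQuads(state, side, random, null, null);
    }
}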
The current concerns are primarily about preventing common pitfalls with thread safety: while BlockState accesses in this scenario are generally thread-safe (for crying out loud, even Mojang does them for biomes), BlockEntity accesses are not (the call is passed down to the underlying chunk and no copy is made).
An interesting approach would be allowing BlockEntities to provide an immutable object which is then copied to the client-side chunk cache as it is being built, but I'm not sure if it's worth the effort and how to balance performance with it.
In fact, I would even go as far as to provide a custom, wrapped block view here which (a) exposes said immutable object map and (b) more importantly, prevents certain ways of shooting yourself in the foot by handling non-thread-safe calls more gracefully?
Eh, the thread safety for BlockEntities is probably overkill. All of this, by definition, is going to run client-side on loaded chunks. A call to retrieve a BlockEntity, while perhaps not strictly supported, is unlikely to cause a fault.
Concurrency within the BlockEntity is something that mod authors would have to manage. In most cases they won't be mutating, or if they are, the result is going to be the same across multiple threads (because they are deriving values from neighbor block states or whatever), so concurrent updates won't have any practical consequence.
In other words, typical use case is looking up static values or refreshing and storing cached attributes that will be effectively static across multiple threads that are using the same derivation logic and same inputs.
Not really. A chunk render rebuild can easily happen, say, while a network packet is being read by the block entity, or while it's being ticked on the client side. It might be overkill, but it's definitely far from impossible.
What I'd propose is to simply allow passing an object (of your choice!) to the client-side chunk cache - if you want to just pass your BlockEntity and not bother with immutable cached objects, so be it. This would also let us replace the array(chunk)+hashmap(entity) lookups with array-only lookups in the cache, which is a performance win (and not a huge memory hit - the class already holds at least two such arrays, one for BlockStates and one for FluidStates).
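For illustration, the array-only lookup could be as simple as indexing a flat per-section array by local coordinates (a sketch with hypothetical field and method names, imagined as members of the client-side chunk cache class):
// Hypothetical: one flat array per 16x16x16 section held by the client-side chunk cache.
private final Object[] renderData = new Object[16 * 16 * 16];

private static int index(BlockPos pos)
{
    return (pos.getY() & 15) << 8 | (pos.getZ() & 15) << 4 | (pos.getX() & 15);
}

public Object getRenderData(BlockPos pos)
{
    return renderData[index(pos)];
}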
The rough way I would envision the system as it is now:
- The dynamic renderer largely mimics BakedModel but has an extra getQuads(), which receives a DynamicCacheView interface (extends BlockView) and a BlockPos.
- BlockEntities can implement DynamicDataProvider, which lets them return an arbitrary object.
- When creating a client-side chunk cache (now implementing DynamicCacheView!), all block entities in all relevant chunks are iterated (we iterate over block entities rather than coordinates to save CPU time), and if a block entity implements the DynamicDataProvider interface, its object (any object, including the block entity itself if you want) is copied over to the cache and made accessible via DynamicCacheView. There is also a default implementation which wraps any BlockView and just gets the data from getBlockEntity() in case we end up off-cache.
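A rough sketch of the two interfaces described above (names from this discussion; signatures are illustrative, not final):
public interface DynamicDataProvider
{
    /**
     * Called on the client thread while the chunk cache is being built.
     * May return the block entity itself, an immutable snapshot, or null.
     */
    @Nullable Object getDynamicData();
}

public interface DynamicCacheView extends BlockView
{
    /**
     * Returns the object captured from the DynamicDataProvider at this
     * position when the cache was built, or null if there was none.
     */
    @Nullable Object getCachedData(BlockPos pos);
}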
As for the hash lookup on BlockState, why not simply replace it with a field added to BlockState via a client-side mixin? That'd be a better place to put it and would also solve the problem for "static" models. Win-win! (In fact, I should implement it in a Fabric version of FoamFix, come to think of it...)
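A sketch of what that could look like with a duck interface plus a client-side mixin (purely illustrative; the prefixed names are hypothetical):
public interface ModelCacheHolder
{
    BakedModel fabric_getCachedModel();
    void fabric_setCachedModel(BakedModel model);
}

@Mixin(BlockState.class)
public abstract class MixinBlockState implements ModelCacheHolder
{
    @Unique
    private BakedModel fabric_cachedModel;

    @Override
    public BakedModel fabric_getCachedModel()
    {
        return fabric_cachedModel;
    }

    @Override
    public void fabric_setCachedModel(BakedModel model)
    {
        fabric_cachedModel = model;
    }
}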
You make some really good points.
In order for a chunk render rebuild to be initiated, the chunks have to already be loaded client side, but it is possible the client thread could update a loaded BlockEntity in response to a network packet.
I like where you are headed with this.
Yes, but it is also possible that a modder knows what they're doing and, say, builds the immutable render state on receiving a network packet rather than on every chunk update, or ensures that everything accessed by the render thread is replaced with sufficient atomicity - and just returns the BlockEntity.
I think this approach is the best of all worlds.
Agreed, asie's proposed solution is good.
Though, I'd suggest changing the names to Render{CacheView, DataProvider}, as that makes it clearer what the data is used for.
Yes. I wasn't quite sure on the names.
I'll wait for a few more eyes to go through this and then I'll get to the implementation, I guess.
A bit in the weeds, but how expensive are instanceof checks for interfaces? I vaguely recall they were non-trivial, at least enough to matter in hot loops, though that may have been in earlier JVMs. I eliminated some instanceof checks a couple years back based on profiler results and it made a difference.
Would it be worth extending BakedModel so that it implements the extended getQuads method and ignores the new parameters by default? That way, no check is ever needed and the BlockTesselator always calls the extended method. Mod authors override the extended method if they need dynamic handling.
Similarly, BlockEntity could be made to implement DynamicDataProvider by default, and return null. Only non-null values would be cached, obviously. This would mean one call instead of two when the interface is actually implemented.
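For the BlockEntity half, a sketch of the "implement by default, return null" idea (hypothetical mixin; only non-null results would ever be stored in the cache):
@Mixin(BlockEntity.class)
public abstract class MixinBlockEntity implements DynamicDataProvider
{
    @Override
    public @Nullable Object getDynamicData()
    {
        // Default: no render data; block entities override this to opt in.
        return null;
    }
}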
Would it be worth extending BakedModel so that it implements the extended getQuads method and ignores the new parameters by default?
Fabric API is meant to be relatively non-invasive; I wanted to "circumvent" most of the default BlockModelRenderer by branching out of the main tesselate() method and coming back to the vanilla pipeline only at tesselateQuads*(List<>). This means we can use an Inject rather than a Redirect (which only one mod can apply at a time), and it also means that potential patches specific to static model rendering won't affect this pipeline.
As such, I think we'd need an instanceof check regardless.
Similarly, BlockEntity could be made to implement DynamicDataProvider by default, and return null. Only non-null values would be cached, obviously. This would mean one call instead of two when the interface is actually implemented.
That is more reasonable, but I think we might be "prematurely optimizing" a little here.
I think the current proposal should be fine. If it turns out to have performance issues, we can always change. Being on snapshots has the benefit of being able to make breaking changes more frequently.
Not to mention, designing with interfaces in mind means we can always introduce a faster path and just deprecate the slower one for a while.
I think this is something that should wait until Mojang finishes BlazeAPI since it has to do with rendering and Blaze is likely going to bring about a lot of changes. Unless, of course, you want to write a new rendering system from scratch(ish).
Citing Player
For carrying extra state: getQuads with another java.lang.Object parameter may work well; Forge's approach is overly complicated and sees a lot of abuse. The extra state should - if possible - be obtained only once for each block, before visiting the various sides or layers. It's likely most adequate to query the model instance for the state using block state+world+pos. This extra state shouldn't affect model lookup or anything else; it merely gets passed to all getQuads() calls. The semantics/creation/consumption of the "Object" are completely internal to the baked model implementation. It nests nicely for nested baked models by putting the parent model's extra state in a field of itself.
Why the model? My proposal was to gather the rendering information from Blocks or BlockEntities, cache it, and just let the model perform a cheap array lookup on the client-side chunk cache when building a chunk.
Primary motivation is to avoid new object allocation during chunk rebuild as much as possible. Many hours with the profiler have convinced me this is key to avoiding frame rate stuttering from garbage collection in complex modded worlds with a moderate-to-high frequency of block updates.
Not to mention, the IPipelinedQuadConsumer would need to be an object with all of those fields populated, so you're not really saving much over just passing them via a method call.
That's why the consumer should be ThreadLocal or a field in ChunkRenderWorker. It has to be reused and also thread-safe or this approach doesn't solve any problem.
Keep in mind that, for static models (and, often, even dynamic models! by using a cache), the expectation is not to produce new List<>s every time, but to keep a List<> in memory and just pass it as-is. This is why Mojang designed it the way they have.
Dynamic models may cache subparts and/or geometry that aren't, by themselves, a working quad list suitable for getQuads. So you either end up with a proliferation of redundant List objects (and their backing arrays) on the heap to avoid reinstantiating them each time, or you create a garbage factory. SubLists can help somewhat to reduce heap size because the backing array is reused, but that typically requires you to have your entire list pre-built vs. lazily building it.
IMO too invasive for Fabric. Fabric is supposed to hook vanilla, not rewrite the rendering pipeline - if you're doing that, you can really, really go crazy with optimizations, and half-solutions are unnecessary.
This is the best argument against it. I am rewriting the rendering pipeline, and was hoping the hooks could be similar vs look completely different. But that is perhaps selfish on my part.
Primary motivation is to avoid new object allocation during chunk rebuild as much as possible. Many hours with the profiler have convinced me this is key to avoiding frame rate stuttering from garbage collection in complex modded worlds with a moderate-to-high frequency of block updates.
Again, nothing says the render data object has to be allocated during chunk rebuild or even before every chunk rebuild; however, the model has no preconception of the data that needs to be acquired, while the block/block entity does...
That's why the consumer should be ThreadLocal or a field in ChunkRenderWorker. It has to be reused and also thread-safe or this approach doesn't solve any problem.
Which means its fields have to be constantly updated to match. I don't think this decision in particular solves much over simply passing all the necessary arguments in the method call.
Dynamic models may cache subparts and/or geometry that aren't, by themselves, a working quad list suitable for getQuads. So you either end up with a proliferation of redundant List objects (and their backing arrays) on the heap to avoid reinstantiating them each time, or you create a garbage factory. SubLists can help somewhat to reduce heap size because the backing array is reused, but that typically requires you to have your entire list pre-built vs. lazily building it.
True. On the other hand, dynamic models may also cache complete models with the "render data object" as the key in a cache; this is my approach in most of 1.9+ Charset's content.
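For illustration, that pattern boils down to something like this (assuming the key type is immutable and implements equals/hashCode; the names are hypothetical):
// Cache complete quad lists keyed by the render data object.
private final Map<RenderKey, List<BakedQuad>> modelCache = new ConcurrentHashMap<>();

public List<BakedQuad> quadsFor(RenderKey key)
{
    // buildQuadsFor is a hypothetical method that bakes the geometry for a given key.
    return modelCache.computeIfAbsent(key, this::buildQuadsFor);
}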
This is the best argument against it. I am rewriting the rendering pipeline, and was hoping the hooks could be similar vs look completely different. But that is perhaps selfish on my part.
Opened a PR; maybe we can find a common ground that works decently in both cases.
I think this is something that should wait until Mojang finishes BlazeAPI since it has to do with rendering and Blaze is likely going to bring about a lot of changes. Unless, of course, you want to write a new rendering system from scratch(ish).
Most of the Blaze3D changes are likely to involve what happens after getQuads - in the lighting and draw calls. Mojang has to ditch the ancient fixed-function render pipeline and use shaders, VAOs and uniforms. Memory-mapped buffers could also allow chunk uploads to happen off the client thread - I've implemented that on my own and it mostly works, but it is fussy.
And ideally they would not be tied exclusively to OpenGL - I believe there are LWJGL bindings now for Metal and DirectX. OpenGL as an API is craaaazy. That's mostly a product of history but I pity the engineers who have to maintain OpenGL drivers.
Ultimately all of that still requires vertex data packed into a buffer, which is what getQuads is about.
Player makes a really useful point here...
The extra state should - if possible - be obtained only once for each block, before visiting the various sides or layers. It's likely most adequate to query the model instance for the state using block state+world+pos.
Passing the cache view and data provider to getQuads for each face could mean the data provider has to cache or re-derive whatever state lookups it does. And if we instead first call the model to obtain a dynamic state object we could end up causing allocation overhead and exacerbating garbage collection stutters.
For my own rendering API (which I am going to port to Fabric) it worked better to use IoC and not have getQuads return a list, but instead have the model accept a Consumer. I've put some snippets below, edited to remove Voldemappings and simplify a bit...
public interface IPipelinedBakedModel
{
    /**
     * If your model has a performant way to know if it may have quads
     * in a given block layer, you can shortcut processing by overriding
     * this method. Otherwise the consumer will simply filter by quad.
     */
    public boolean mightRenderInLayer(BlockRenderLayer forLayer);

    /**
     * Default implementation simply casts BakedModel getQuads() output and routes it to the consumer.
     * Some model implementations could be more efficient and/or want to do different things.<p>
     *
     * If your model segregates quads by layer, query the provided consumer for the render layer to improve efficiency.
     */
    public void produceQuads(IPipelinedQuadConsumer quadConsumer);
}
public interface IPipelinedQuadConsumer extends Consumer<IPipelinedQuad>
{
    /**
     * If your model has quads co-planar with a block face, then you should
     * exclude them if this method returns false for that face.<p>
     *
     * This is in lieu of the getQuads(Direction) pattern used in BakedModel.
     */
    public boolean shouldOutputSide(Direction side);

    /**
     * If your model already segregates quads by block layer you can reduce
     * processing time by passing only quads in this layer to the consumer.<p>
     *
     * If your model just has one big list of quads, you can simply pass them all to the consumer.
     * The consumer still checks and will skip quads not in the target layer.
     */
    public @Nullable BlockRenderLayer targetLayer();

    /**
     * Provides access to the in-world block position for model customization.
     */
    public BlockPos pos();

    /**
     * Provides access to the world view for model customization.
     */
    public ExtendedBlockView world();

    /**
     * Provides access to the block state for model customization.<br>
     * Is what is normally passed to BakedModel.
     */
    public @Nullable BlockState blockState();

    /**
     * Deterministically pseudo-random bits based on block position.<br>
     * Will be the same as what is normally passed to BakedModel, but is computed
     * lazily - it will not be calculated if never retrieved.
     */
    public long positionRandom();
}
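For illustration, a fully dynamic model using these interfaces might look roughly like this (a sketch only; MyDynamicModel and buildQuadFor are hypothetical):
public class MyDynamicModel implements IPipelinedBakedModel
{
    @Override
    public boolean mightRenderInLayer(BlockRenderLayer forLayer)
    {
        // All quads in this example model are solid.
        return forLayer == BlockRenderLayer.SOLID;
    }

    @Override
    public void produceQuads(IPipelinedQuadConsumer quadConsumer)
    {
        // World/state lookups happen exactly once, before any quads are emitted.
        BlockState state = quadConsumer.blockState();
        BlockPos pos = quadConsumer.pos();
        ExtendedBlockView world = quadConsumer.world();

        for (Direction side : Direction.values())
        {
            if (quadConsumer.shouldOutputSide(side))
            {
                // buildQuadFor derives geometry from the values captured above.
                quadConsumer.accept(buildQuadFor(side, state, world, pos));
            }
        }
    }

    private IPipelinedQuad buildQuadFor(Direction side, BlockState state,
                                        ExtendedBlockView world, BlockPos pos)
    {
        // Model-specific geometry construction not shown.
        throw new UnsupportedOperationException("geometry construction not shown");
    }
}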
Here's what's good about this approach:
- The baked model can query the consumer it receives and do any neighbor state or BlockEntity lookups it needs only once.
- There's a good chance all of this can be done without new object allocation. For example, if the model is truly dynamic, why build a list that will be iterated once and thrown away? Internally, it's possible all the state lookups can be held on the stack, depending on data types.
- The model can also shortcut per-face checks or other wasteful calls based on internal knowledge of model geometry. (A sphere model never needs to check sides - all sides will always be produced.)
Not so good:
- More invasive. We'd need to provide vanilla implementations of the consumer for both flat and AO renders and reroute the current calls to them. They would need to be ThreadLocal instances, or perhaps a field within ChunkRenderWorker. (Not particularly hard, but non-trivial.)
- As-is, this doesn't address the concern about BlockEntity concurrent access. (But could be adapted to do so.)
- The state lookups may need to be repeated if the model renders in different layers. My rendering API avoids this most of the time by rendering translucent overlay textures as part of the solid layer - passing the extra layers via an extended vertex format to a shader that handles the blending in one pass with no transparency sorting. But that's too invasive for the core Fabric API. Need to think about this one some more.
I would be willing to make an attempt at a PR if there is interest in seeing it worked out.
Passing the cache view and data provider to getQuads for each face could mean the data provider has to cache or re-derive whatever state lookups it does. And if we instead first call the model to obtain a dynamic state object we could end up causing allocation overhead and exacerbating garbage collection stutters.
the model
Why the model? My proposal was to gather the rendering information from Blocks or BlockEntities, cache it, and just let the model perform a cheap array lookup on the client-side chunk cache when building a chunk.
Anyhow, the snippets...
There's a good chance all of this can be done without new object allocation. For example, if the model is truly dynamic, why build a list that will be iterated once and thrown away?
That is absolutely true - we could, regardless, provide a Consumer variant of getQuads, with a default implementation that simply (and reasonably efficiently) populates a List.
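Something along these lines, as a sketch of such a default added to the proposed FabricBakedModel (illustrative only):
public interface FabricBakedModel extends BakedModel
{
    /**
     * Consumer variant. The default drains the list-returning getQuads into
     * the consumer, so existing models keep working unchanged; dynamic models
     * can override this and emit quads directly, never building a List.
     */
    default void produceQuads(BlockState state, Direction side, Random random,
                              Consumer<BakedQuad> consumer)
    {
        for (BakedQuad quad : getQuads(state, side, random))
        {
            consumer.accept(quad);
        }
    }
}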
More invasive. We'd need to provide vanilla implementations of the consumer for both flat and AO renders and reroute the current calls to them. They would need to be ThreadLocal instances, or perhaps a field within ChunkRenderWorker. (Not particularly hard, but non-trivial.)
IMO too invasive for Fabric. Fabric is supposed to hook vanilla, not rewrite the rendering pipeline - if you're doing that, you can really, really go crazy with optimizations, and half-solutions are unnecessary.
Not to mention, the IPipelinedQuadConsumer would need to be an object with all of those fields populated, so you're not really saving much over just passing them via a method call.
I'd say from here we could derive two baked model interfaces that could belong in Fabric:
- FabricBakedModel extends BakedModel, which can filter by BlockRenderLayer early and have it passed properly?
- DynamicFabricBakedModel extends FabricBakedModel, which can additionally receive dynamic information and circumvents part of the vanilla pipeline.
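A very rough sketch of that split (names and signatures are only illustrative):
public interface FabricBakedModel extends BakedModel
{
    /**
     * Early layer filter: the renderer would only query this model for layers
     * where this returns true.
     */
    default boolean mightRenderInLayer(BlockRenderLayer layer)
    {
        return true;
    }
}

public interface DynamicFabricBakedModel extends FabricBakedModel
{
    /**
     * Dynamic variant that additionally receives world access and a position,
     * circumventing part of the vanilla pipeline.
     */
    List<BakedQuad> getQuads(BlockState state, Direction side, Random random,
                             DynamicCacheView view, BlockPos pos);
}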
Keep in mind that, for static models (and, often, even dynamic models! by using a cache), the expectation is not to produce new List<>s every time, but to keep a List<> in memory and just pass it as-is. This is why Mojang designed it the way they have.
The general feature is now implemented via #189