Implement Codec Support

Question

Implement Codec Support

forgetmenot13579 opened this issue a year ago · 7 comments

forgetmenot13579 commented a year ago

At the moment, (de)serialization is done using Component#writeToNbt and Component#readFromNbt, which requires users to write out their (de)serialization logic explicitly. This is not only an arduous process, but also introduces a large potential for error.

Let's look at a simple example of serialization logic.

public class Racecar {
  public enum EngineType {
      COMBUSTION, STEAM, KINETIC, ELECTRIC;
  }

  public record WheelInfo(float wheelDiameter, float tireDiameter, float tireThickness) {
  }

  private String name;
  private WheelStats frontWheels;
  private WheelStats rearWheels;
  private int primaryPaintColor;
}

The (de)serialization code for this class with the current scheme would look like this:

@Override
public void readFromNbt(NbtCompound tag) {
    this.name = tag.getString("name");
    
    this.frontWheels = new WheelInfo(
            tag.getFloat("frontWheelDiameter"),
            tag.getFloat("frontTireDiameter"),
            tag.getFloat("frontTireThickness")
    );

    this.rearWheels = new WheelInfo(
            tag.getFloat("rearWheelDiameter"),
            tag.getFloat("rearTireDiameter"),
            tag.getFloat("rearTireThickness")
    );
    
    this.engineType = EngineType.valueOf(tag.getString("engineType"));

    try {
        // Colors should be base 16 strings prefixed with #, so we trim that off
        this.primaryPaintColor = Integer.parseInt(tag.getString("primaryPaintColor").substring(1), 16);
    } catch (NumberFormatException ignored) {
    }
}

@Override
public void writeToNbt(NbtCompound tag) {
    tag.putString("name", this.name);
    tag.putFloat("frontWheelDiameter", this.frontWheels.wheelDiameter);
    tag.putFloat("frontTireDiameter", this.frontWheels.tireDiameter);
    tag.putFloat("frontTireThickness", this.frontWheels.tireThickness);
    tag.putFloat("rearWheelDiameter", this.rearWheels.wheelDiameter);
    tag.putFloat("rearTireDiameter", this.rearWheels.tireDiameter);
    tag.putFloat("rearTireThickness", this.rearWheels.tireThickness);
    tag.putString("engineType", this.engineType.name());
    tag.putString("primaryPaintColor", "#" + Integer.toString(this.primaryPaintColor, 16));
}

There are a few very obvious things that can go wrong with manual serialization logic like this:

Mistype attribute keys
Mismatching attribute keys between serializing and deserializing methods
Incorrect field references (particularly a problem with many fields of the same type, as above)
Forgetting to (de)serialize a particular attribute

And these are just for the simple example listed above. There are much more complicated component implementations out there. So what does the alternative look like, in the world of codecs?

// We make our EngineType enum StringIdentifiable to make use of Minecraft's built-in codec helpers
public enum EngineType implements StringIdentifiable {
    COMBUSTION, STEAM, KINETIC, ELECTRIC;

    @Override
    public String asString() {
        return this.name();
    }
}

public record WheelInfo(float wheelDiameter, float tireDiameter, float tireThickness) {
    static Codec<WheelInfo> CODEC = RecordCodecBuilder.create(instance -> instance.group(
            Codecs.POSITIVE_FLOAT.fieldOf("wheelDiameter").forGetter(WheelInfo::wheelDiameter),
            Codecs.POSITIVE_FLOAT.fieldOf("tireDiameter").forGetter(WheelInfo::tireDiameter),
            Codecs.POSITIVE_FLOAT.fieldOf("tireThickness").forGetter(WheelInfo::tireThickness)
    ).apply(instance, WheelInfo::new));
}

public static Codec<CarConfiguration> CODEC = RecordCodecBuilder.create(instance -> instance.group(
    Codec.STRING.fieldOf("name").forGetter(CarConfiguration::name),
    StringIdentifiable.createCodec(EngineType::values).fieldOf("engineType").forGetter(CarConfiguration::engineType),
    WheelInfo.CODEC.fieldOf("frontWheels").forGetter(CarConfiguration::frontWheels),
    WheelInfo.CODEC.fieldOf("rearWheels").forGetter(CarConfiguration::rearWheels),
    Codec.STRING.comapFlatMap(colorString -> {
                try {
                    // Colors should be base 16 strings prefixed with #, so we trim that off
                    return DataResult.success(Integer.parseInt(colorString.substring(1), 16));
                } catch (NumberFormatException ignored) {
                    return DataResult.error(() -> "String is not a valid hex color code");
                }
            }, colorValue -> "#" + Integer.toString(colorValue, 16)).fieldOf("primaryPaintColor")
            .forGetter(CarConfiguration::primaryPaintColor)
).apply(instance, CarConfiguration::new));

This can look like a lot of nonsense to somebody who hasn't worked with codecs, but I won't go into too much detail with how creating codecs works here. What's important to know is that each field of our component is mapped and can be safely serialized and deserialized. We have created our (de)serialization logic in such a way that we can be sure each field will be correctly written to and read from.

Implications

With the Component#writeToNbt and Component#readFromNbt methods being ingrained so deeply into CCA's codebase and API surface, there's no reasonable way to add codec support without a major version bump. We could hack it and have a CodecComponent whose serialization methods just threw exceptions if called, but I would not consider that an elegant solution. Instead, we should implement codec components in a way that makes sense and accept that people will have to put in some work to adapt their existing projects.

The first thing to do is remove Component#writeToNbt and Component#readFromNbt from the base Component class. Obviously allowing mods to keep their existing serialization logic is going to be important, so instead these methods will move to a new LegacyComponent class (name tbd). The base Component class will be just a marker interface from then on.

There are essentially two options for associating codecs and components. The first would be to add a Component#codec method and get the codec from existing (or default) instances of the component. Due to the limitations of Java's generic system (namely no self typing) this would be a bit ugly. My preferred implementation would be to store the codec on each components RegistryKey, and rename that class to RegistryType to be more representative of this change in behavior. We would then add two methods <C extends Component> C RegistryType#readFromNbt(ComponentContainer, NbtCompound) and void writeToNbt(ComponentContainer, NbtCompound), which would take the place in the API of the existing (de)serialization methods (and be responsible for calling those methods on LegacyComponents).

Because Component is now a marker interface, we need to do some work to ensure that people's components are serializable. Thus, we would replace the current registration method ComponentRegistry#getOrCreate

public static <C extends Component> ComponentKey<C> getOrCreate(Identifier componentId, Class<C> componentClass);

With two methods

public static <C extends Component & LegacyComponent> ComponentType<C> getOrCreate(Identifier componentId, Class<C> componentClass);
public static <C extends Component> ComponentType<C> getOrCreate(Identifier componentId, Class<C> componentClass, Codec<C> componentCodec);

Another change that would be necessary is to the ticking logic. Currently, CCA's tick methods do not take parameters, as components are expected to keep a reference to their provider around if they want to access it. There's not really a good way to pass the provider to a component created with codecs during initialization, so I'd propose that the ticking components are given a generic for their provider type, and that new tick, clientTick, and serverTick methods are created that are passed that type as an argument.

#163

forgetmenot13579 · Answer 1 · 2023-10-26T18:21:12.000Z

Just to clarify, I'll be happy to contribute a PR implementing this functionality, I just wanted to have a discussion on what the API should look like before doing so.

Pyrofab · Answer 2 · 2023-10-28T15:56:49.000Z

That is indeed a pretty big change, but I believe it to be a good one. In my experience having to hold onto the provider passed in the constructor also seems to be a bit confusing for new users; passing the provider in the tick method may be more intuitive.
This will make it harder to use the auto-sync API though, as now setters will need to take the provider as argument, unless we overhaul that API as well.
P.S. you wrote RegistryKey and RegistryType in a few places, I believe that should be ComponentKey and ComponentType ?

forgetmenot13579 · Answer 3 · 2023-10-28T16:43:18.000Z

Why would the auto sync setters need to take the provider as an argument? Shouldn't the syncing be based entirely on the data in the buffer, not doing any kind of outside logic?

P.S. you wrote RegistryKey and RegistryType in a few places, I believe that should be ComponentKey and ComponentType ?

Ah, yep, thanks for catching that. Fixed.

Pyrofab · Answer 4 · 2023-10-28T17:27:05.000Z

Why would the auto sync setters need to take the provider as an argument?

Not the writers, the components' own setters. It's a common pattern that calling e.g. component.setMana(int) triggers a sync using the held provider, which would not exist anymore.

As for the actual decisions :

I would be in favour of doing as you said with putting the Codec in ComponentKey.
I do not have much of a preference for the name of the latter. Keeping ComponentKey means slightly fewer changes, but ComponentType would probably be more in line with Minecraft's own registrable object types.
The serialisation changes mean component instances would go from constants to volatile. This may be an issue for e.g. screens, which may currently hold on those instances. Maybe we should have an isValid method in Component to have a way to tell that an instance is void ?
I would suggest SelfSerializingComponent instead of LegacyComponent, unless you want to emphasise that its use is discouraged in new code.

forgetmenot13579 · Answer 5 · 2023-10-28T18:05:58.000Z

LegacyComponent was just a stand-in name, I think SelfSerializingComponent is a fine actual name.

Regarding volatility of components, we could enforce implementation of CopyableComponent to avoid that problem. Partial syncing changes will also make that less of a problem.

Pyrofab · Answer 6 · 2023-10-29T12:13:22.000Z

Actually, serialization methods may need a World access as well, for access to e.g. dynamic registries. How would that work with codecs ?

forgetmenot13579 · Answer 7 · 2023-10-30T16:07:51.000Z

Vanilla's solution is to hold onto RegistryKeys or RegistryEntryLists and only resolve them when the value is actually needed, i.e. the tick method where the context is known and the holder passed, or any logic outside of the component class that accesses the component and therefore would have the holder (and presumably the world) for context. There's a good amount of scaffolding in Vanilla for this pattern as well.

Share to

Implications