[Proposal] - Artemis Item Encoding Standard
kristofbolyai opened this issue · 30 comments
EDIT: The originally proposed format can be found here. The format below is always updated to reflect the current standard.
Artemis Item Encoding Standard
The purpose of this new standard is to provide the basis for a new system for encoding Wynncraft items into strings, and vice-versa. Unlike the chat item system, this format is not limited to identified gear items, and can encode items of any (supported) type.
Encoding
Encoding is a two-layer process. The first layer translates between byte arrays and encoded text values (UTF-16 or base64). The second layer translates any kind of game data to and from an array of bytes.
Encoding a byte array to an encoded value
This section describes how the standard encodes byte arrays into text. This encoding is not to be changed once implemented.
Encoding to UTF-16:
- A UTF-16 character in the Supplementary Private Use Area-A can encode any value between 0x0 and 0xFFFD. If we borrow the first two characters (U+100000-U+100001) from the Supplementary Private Use Area-B, we can encode exactly 2 bytes of data into a single character. If our number of bytes needs to be padded, we use the Supplementary Private Use Area-B to encode the first value, with a 0xEE byte for padding, which is ignored when decoding blocks with lengths.
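A minimal Python sketch of the UTF-16 layer, under stated assumptions: byte pairs are read high byte first, values 0x0000-0xFFFD map to U+F0000 + value, 0xFFFE and 0xFFFF map to U+100000 and U+100001, and a trailing odd byte is written as U+100000 + (byte << 8 | 0xEE). That last mapping is one reading of the padding rule, not a confirmed detail of the reference implementation, and the function names are hypothetical. Python strings index by code point, so each encoded character is one str element even though it takes a surrogate pair in UTF-16.

```python
SPUA_A = 0xF0000   # start of Supplementary Private Use Area-A
SPUA_B = 0x100000  # start of Supplementary Private Use Area-B
PADDING = 0xEE     # padding byte for odd-length input


def encode_utf16(data: bytes) -> str:
    """Encode a byte array, two bytes per private-use character."""
    chars = []
    for i in range(0, len(data) - 1, 2):
        value = (data[i] << 8) | data[i + 1]
        if value <= 0xFFFD:
            chars.append(chr(SPUA_A + value))
        else:
            # 0xFFFE and 0xFFFF borrow U+100000 and U+100001
            chars.append(chr(SPUA_B + value - 0xFFFE))
    if len(data) % 2 == 1:
        # Assumed layout: last byte as the high byte, 0xEE as the low byte
        chars.append(chr(SPUA_B + ((data[-1] << 8) | PADDING)))
    return "".join(chars)


def decode_utf16(text: str) -> bytes:
    """Inverse of encode_utf16."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if SPUA_A <= cp <= SPUA_A + 0xFFFD:
            value = cp - SPUA_A
        elif cp in (SPUA_B, SPUA_B + 1):
            value = cp - SPUA_B + 0xFFFE
        elif SPUA_B + 2 <= cp <= SPUA_B + 0xFFFD:
            # Padded character: keep only the high byte
            value = cp - SPUA_B
            if value & 0xFF != PADDING:
                raise ValueError(f"bad padding in U+{cp:X}")
            out.append(value >> 8)
            continue
        else:
            raise ValueError(f"unexpected character U+{cp:X}")
        out += bytes((value >> 8, value & 0xFF))
    return bytes(out)
```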
Encoding to base64:
- A byte array can be encoded into base64 easily, as base64 is designed to represent bytes as text.
Format
The encoded string format is made up of different kinds of blocks. The order of the blocks is unspecified, with the exception of the start and end blocks, which must be first and last, respectively. A "type" block must also be present, representing the type of the item being encoded. See the format of each block below.
Stability of the format
The format of the encoding itself is not to be changed once implemented. However, the blocks themselves can change their format between versions and blocks may be added and removed. See more information about versioning below.
Block Formats
A block starts with a unique header id in the range 0-255 (256 possible values, 1 byte). The following bytes are the block's data. Because a block's data length is not explicitly encoded, the data must be decoded while reading to know where the block ends. See the versions below for the specific block formats and their unique header ids.
Encoding data
Some blocks use pre-defined ways of encoding data. These are described in this section.
Encoding a string:
Encoding a string is done by encoding the string's ASCII representation into bytes, and terminating with a null byte.
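A small sketch of the string encoding (helper names are hypothetical). Non-ASCII text is rejected here because the text only specifies ASCII; how other characters should be handled is not stated.

```python
def encode_string(text: str) -> bytes:
    """ASCII bytes followed by a terminating null byte."""
    data = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    if b"\x00" in data:
        raise ValueError("string must not contain a null byte")
    return data + b"\x00"


def decode_string(buf: bytes, offset: int = 0) -> tuple[str, int]:
    """Return the decoded string and the offset just past its null byte."""
    end = buf.index(b"\x00", offset)  # ValueError if there is no terminator
    return buf[offset:end].decode("ascii"), end + 1
```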
Encoding a variable sized integer:
A variable sized integer is encoded the following way: 7 bits are stored in the first byte. If the data fits in 7 bits, set the highest bit to 0. If the data does not fit in a single byte, set the highest bit to 1 and add another byte, following the same process. Repeat until all of the data is encoded into the required number of bytes. Zigzag encoding is used to handle negative values.
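A sketch of the variable sized integer encoding in Python. The text does not say which 7-bit group comes first; this assumes the lowest group is written first, as in Protocol Buffers varints, and uses the usual zigzag mapping (0, -1, 1, -2, ... to 0, 1, 2, 3, ...). Function names are hypothetical.

```python
def zigzag(n: int) -> int:
    """Map signed to unsigned: 0, -1, 1, -2, 2 ... to 0, 1, 2, 3, 4 ..."""
    return n * 2 if n >= 0 else -n * 2 - 1


def unzigzag(z: int) -> int:
    return z // 2 if z % 2 == 0 else -(z + 1) // 2


def encode_varint(n: int) -> bytes:
    """Zigzag, then 7 bits per byte, lowest group first (assumed order)."""
    z = zigzag(n)
    out = bytearray()
    while True:
        group = z & 0x7F
        z >>= 7
        if z:
            out.append(group | 0x80)  # highest bit 1: another byte follows
        else:
            out.append(group)         # highest bit 0: last byte
            return bytes(out)


def decode_varint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Return (value, offset just past the varint)."""
    z = shift = 0
    while True:
        byte = buf[offset]
        offset += 1
        z |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return unzigzag(z), offset
```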
Versions
Versions are only incremented if there is a breaking change in the format of 1 or more blocks, and/or if 1 or more blocks are added or removed.
Block Formats - Version 1.0
Start block
ID: 0
Integer size: 8 bits
Data: A single byte, encoding the version of the data.
End block
ID: 255
Type block
ID: 1
Integer size: 8 bits
Data: A single byte, representing the type of item being encoded. This byte works as a mapping key; each type has an id representing it.
Description: Each type represents a single type of item. In most cases these ids are separated in the same way as Artemis' item classes, however exceptions might be allowed in cases where it is logical.
Key to Type Mapping Table
Key | Type | Required blocks | Optional blocks |
---|---|---|---|
0 | Gear Item | Name | Identifications, Powders, Shiny, Reroll |
1 | Tome Item | Name | Identifications |
2 | Charm Item | Name | Identifications |
3 | Crafted Gear Item | Custom Gear Type, Durability, Requirements | Name, Damage, Defense, Custom Identifications, Powders |
4 | Crafted Consumable Item | Custom Consumable Type, Uses, Requirements | Effects, Name, Custom Identifications |
5 | Crafted Item from Recipe | TODO | TODO |
Name block
Header: 2
Data: The name of the item, encoded as a string.
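To show how the blocks fit together, here is a hypothetical encoder and decoder for a gear item that carries only the Start, Type, Name and End blocks. The version byte value is a placeholder assumption. The decoder also illustrates why block lengths matter: because they are not encoded, it must understand every block it meets and cannot skip an unknown one.

```python
START_BLOCK, TYPE_BLOCK, NAME_BLOCK, END_BLOCK = 0, 1, 2, 255
GEAR_ITEM = 0  # key 0 in the Key to Type Mapping Table
VERSION = 0    # placeholder: the actual version byte for 1.0 is an assumption


def encode_named_gear(name: str) -> bytes:
    """Start, Type and Name blocks, then the End block."""
    out = bytearray([START_BLOCK, VERSION, TYPE_BLOCK, GEAR_ITEM, NAME_BLOCK])
    out += name.encode("ascii") + b"\x00"  # null-terminated ASCII string
    out.append(END_BLOCK)
    return bytes(out)


def decode_named_gear(buf: bytes) -> tuple[int, int, str]:
    """Return (version, item type, name); blocks may appear in any order."""
    if buf[0] != START_BLOCK:
        raise ValueError("missing start block")
    version, pos = buf[1], 2
    item_type = name = None
    while buf[pos] != END_BLOCK:
        header = buf[pos]
        pos += 1
        if header == TYPE_BLOCK:
            item_type = buf[pos]
            pos += 1
        elif header == NAME_BLOCK:
            end = buf.index(b"\x00", pos)
            name = buf[pos:end].decode("ascii")
            pos = end + 1
        else:
            # Block lengths are not encoded, so an unknown block cannot be skipped.
            raise ValueError(f"unsupported block header {header}")
    if item_type is None or name is None:
        raise ValueError("missing type or name block")
    return version, item_type, name
```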
Identifications block
Header: 3
Data: The first byte contains the number of non-pre-identified ids (referred to as N) for the item. The second byte is the identification type flag, which determines how the following identification entries are encoded. The following bytes contain all identification info; the size of a single entry depends on the identification type flag.
Identification Type Flag
The purpose of this flag is to let ID encoding serve multiple purposes and to give clients and users control over the stability and size of the encoded data. Choosing the longer, extended encoding flag allows clients to decode the shared data without any external sources, such as APIs, and makes the encoded data "stable", even if the current item specifications change. Choosing the shorter, normal encoding flag is preferred where the data only needs to stay valid for a short period and a compact encoding matters, such as in-game chat.
Normal encoding
The byte-flag of this encoding is 0.
- Pre-identified stats: Pre-identified stats are not encoded. Injecting them back is an implementation detail for the client.
- Normal stats: Each identification takes 2 bytes to encode. The first byte is the numerical key of the ID (from a single, open-source, shared, mutually agreed upon source). The second byte is the calculated internal roll of the item.
Extended encoding
The byte-flag of this encoding is 1. The next byte is the number of pre-identified stats.
- Pre-identified stats: The first byte is the numerical key of the ID (from a single, open-source, shared, mutually agreed upon source). The following bytes are the encoded variable sized integer of the base value. Internal roll is not sent, as it does not make sense for pre-identified stats.
- Normal stats: The first byte is the numerical key of the ID (from a single, open-source, shared, mutually agreed upon source). The following bytes are the encoded variable sized integer of the base value. The last byte is the calculated internal roll of the item.
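A sketch of building an Identifications block under both flags. The id keys in the usage are made up (real keys come from the shared ID-map table), and the varint helper assumes zigzag with the lowest 7-bit group first. The text also does not say whether pre-identified entries precede normal ones in extended mode; this sketch assumes they do.

```python
IDENTIFICATIONS_BLOCK = 3
NORMAL, EXTENDED = 0, 1  # identification type flags


def varint(n: int) -> bytes:
    """Zigzag-encoded variable sized integer, lowest 7-bit group first (assumed)."""
    z = n * 2 if n >= 0 else -n * 2 - 1
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_identifications(stats, preids=(), flag=NORMAL) -> bytes:
    """stats: (id_key, base_value, internal_roll) per non-pre-identified stat.
    preids: (id_key, base_value) per pre-identified stat (extended flag only)."""
    out = bytearray([IDENTIFICATIONS_BLOCK, len(stats), flag])
    if flag == NORMAL:
        # Pre-identified stats are omitted; each stat is a key byte and a roll byte.
        for key, _base, roll in stats:
            out += bytes([key, roll])
    elif flag == EXTENDED:
        out.append(len(preids))
        # Assumption: pre-identified entries come before normal entries.
        for key, base in preids:
            out.append(key)
            out += varint(base)
        for key, base, roll in stats:
            out.append(key)
            out += varint(base)
            out.append(roll)
    else:
        raise ValueError(f"unknown identification type flag {flag}")
    return bytes(out)
```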
Powder block
ID: 4
Data: The first byte is the number of powder slots on the item. The next byte is the number of bytes in the following binary blob, which is padded to the nearest byte with 0 bits. A powder is encoded in 5 bits, with the following math: element * 6 + tier, where tier is 1-6, giving values 1-30. The elements follow an ETWFA order. Five 0 bits are used to represent that no powder is present at the slot.
If the block is absent, all powder slots are assumed to be unpowdered.
If it is present, but its length does not match the number of powder slots of the item, the rest of the slots are assumed to be unpowdered.
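A sketch of the Powder block, assuming powders are packed most significant bit first (the text does not fix the bit order); function names are hypothetical. Decoding stops at the slot count, so trailing zero bits and missing data both read as unpowdered slots, matching the rules above.

```python
POWDER_BLOCK = 4
ELEMENTS = "ETWFA"  # Earth, Thunder, Water, Fire, Air


def encode_powders(slots: int, powders: list[tuple[str, int]]) -> bytes:
    """powders: (element letter, tier 1-6) for each filled slot, in slot order."""
    bits = ""
    for element, tier in powders:
        if not 1 <= tier <= 6:
            raise ValueError(f"invalid tier {tier}")
        bits += format(ELEMENTS.index(element) * 6 + tier, "05b")
    bits += "0" * (-len(bits) % 8)  # pad with 0 bits to a whole byte
    blob = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return bytes([POWDER_BLOCK, slots, len(blob)]) + blob


def decode_powders(block: bytes) -> list:
    """Return one (element, tier) tuple or None (unpowdered) per slot."""
    if block[0] != POWDER_BLOCK:
        raise ValueError("not a powder block")
    slots, length = block[1], block[2]
    bits = "".join(format(b, "08b") for b in block[3:3 + length])
    result = []
    for slot in range(slots):
        chunk = bits[slot * 5:slot * 5 + 5]
        value = int(chunk, 2) if len(chunk) == 5 else 0  # missing data: unpowdered
        if value == 0:
            result.append(None)
        elif value <= 30:
            element, tier = divmod(value - 1, 6)
            result.append((ELEMENTS[element], tier + 1))
        else:
            raise ValueError(f"invalid powder value {value}")
    return result
```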
Rerolls block
ID: 5
Data: A single byte encoding the number of rerolls.
Shiny block
ID: 6
Data: The first byte is the id of the shiny stat (from a single, open-source, shared, mutually agreed upon source). ID 0 is reserved for "Unknown". The next bytes are the encoded variable sized integer of the shiny value.
Custom Gear Type block
ID: 7
Data: The data is a single byte, containing the id of the type of the item. See the ID map below.
Gear Type map
ID | Type |
---|---|
0 | Spear |
1 | Wand |
2 | Dagger |
3 | Bow |
4 | Relik |
5 | Ring |
6 | Bracelet |
7 | Necklace |
8 | Helmet |
9 | Chestplate |
10 | Leggings |
11 | Boots |
12 | Weapon* |
13 | Accessory* |
(*) Fallback types
Durability block
ID: 8
Data: The first byte is the overall effectiveness of the identifications (the percentage next to the name for crafted items). The next bytes are the encoded variable sized integer of the maximum durability. The next bytes are the encoded variable sized integer of the current durability.
Requirements block
ID: 9
Data: The first byte is the level requirement. The second byte is the class requirement, represented with an id. The next byte is the number of skill requirements. Each skill requirement is encoded as an id byte representing the skill (ETWFA order), followed by the encoded variable sized integer of the requirement value.
Class Requirement map
ID | Type |
---|---|
0 | None |
1 | Mage |
2 | Archer |
3 | Warrior |
4 | Assassin |
5 | Shaman |
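A sketch of the Requirements block using the class map above. The varint helper assumes zigzag with the lowest 7-bit group first, and all names are hypothetical.

```python
REQUIREMENTS_BLOCK = 9
CLASSES = ["None", "Mage", "Archer", "Warrior", "Assassin", "Shaman"]
SKILLS = "ETWFA"


def varint(n: int) -> bytes:
    """Zigzag-encoded variable sized integer, lowest 7-bit group first (assumed)."""
    z = n * 2 if n >= 0 else -n * 2 - 1
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_requirements(level: int, class_name: str, skills: dict[str, int]) -> bytes:
    """skills maps a skill letter (ETWFA) to its requirement value."""
    out = bytearray([REQUIREMENTS_BLOCK, level, CLASSES.index(class_name), len(skills)])
    for skill, value in skills.items():
        out.append(SKILLS.index(skill))  # skill id byte
        out += varint(value)             # requirement value
    return bytes(out)
```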
Damage block
ID: 10
Data: The first byte is the id of the attack speed of the item. The next byte is the number of attack damages present on the item. An attack damage is encoded the following way: The first byte is the id of the skill (ETWFAN order, where N represents Neutral). The next bytes are the encoded variable sized integer of the minimum damage. The next bytes are the encoded variable sized integer of the maximum damage.
Attack Speed map
ID | Type |
---|---|
0 | Super Fast |
1 | Very Fast |
2 | Fast |
3 | Normal |
4 | Slow |
5 | Very Slow |
6 | Super Slow |
Defense block
ID: 11
Data: The first bytes are the encoded variable sized integer of the health value. The next byte is the number of defense stats present on the item. A defense stat is encoded the following way: The first byte is the id of the skill (ETWFA order). The next bytes are the encoded variable sized integer of the defense value.
Custom Identifications block
ID: 12
Data: The first byte is the number of identifications. The identifications are encoded the following way: The first byte is the id of the identification. The next bytes are the encoded variable sized integer of the max value. For crafted items, the max values can be used to calculate the minimum values (10% of the maximum, rounded) and the current values (from the overall effectiveness).
Custom Consumable Type block
ID: 13
Data: The data is a single byte, containing the id of the type of the item. See the ID map below.
Consumable Type map
ID | Type |
---|---|
0 | Potion |
1 | Food |
2 | Scroll |
3 | Consumable* |
(*) Fallback types
Uses block
ID: 14
Data: The first byte is the remaining uses for the item. The second byte is the maximum uses for the item.
Effects block
ID: 15
Data: The first byte is the number of effects. An effect is encoded the following way: The first byte is the id of the effect. The next bytes are the encoded variable sized integer of the effect's value.
Consumable Effect map
ID | Type |
---|---|
0 | Heal |
1 | Mana |
2 | Duration |
Referenced data files
Shiny Stat Table
https://github.com/Wynntils/Static-Storage/blob/main/Data-Storage/shiny_stats.json
Identification ID-map Table
https://github.com/Wynntils/Static-Storage/blob/main/Reference/id_keys.json
Ok... One thing that was a problem for us before was the order of stats. In legacy, they had a (somewhat arbitrary) way of ordering stat types, and then stats were sent as an ordered list of the values for the stats. This meant that both parties had to agree to the order. I'm not sure how you suggest to solve this. In fact, I don't see that you even address this..?
We can keep sending the stat values in order, but then we have to specify it very clearly. Or we can send stats as key-value pairs, so we give each stat kind a numeric id, and then basically, if we have Dexterity +4, we send 0x31:0x04, if 0x31 were the code for Dexterity. Or whatever. This will essentially double the amount of data needed to be transferred, though, so will cause more issues for vanilla players.
Hm, I was thinking, can we maybe inject some control characters to make it appear less bad for Vanilla players? In the "good old days", you'd have stored a ^H
(backspace) as every other character, that way all but one of the "unknown squares" characters would have been overwritten. I don't know if that trick is possible in Minecraft chat, but it's worth exploring.
Or, maybe there is some new fancy Unicode stuff we can use. I'm pretty certain there are a lot of control codes meaning "combine the following letters".
If we can minimize the visual impact on Vanilla players, I see no real need to keep the string to an absolute minimum. Then it would be better to encode things in a way that is more self-describing and thus stable.
@magicus The gear item encoding sketch is complete, perhaps it would be the time for you to look at this? I know you had some stronger opinions on this before..
This is just a draft until the V3 item API is migrated (waiting on Wynn). This gives us time to discuss even major changes, if you don't agree with some parts.
As for crafted encoding, I don't plan to work on that until the item encoding itself is fully complete.
For those following this proposal, I've updated the issue description to reflect the current state of the format. Many discussions happened outside of Github, but hopefully all the changes we've agreed upon are implemented in the format now.
In an update to the format, 2 other types are planned: custom items (crafted gear, custom normal items) and crafted items as recipes.
xxrxxxrxrx
I realize I've never responded to these suggestions. Vanilla players seeing a lot of unknown characters is a problem, but the main reason to keep the encoding short is so that multiple items can fit into the relatively short maximum chat length (128 chars) that Minecraft sets.
Byte Based Encoding Proposal:
Basically I was thinking it would be nice if we could work in a more standard format.
This would then be converted to/from unicode (base-4096) and/or base64 (wynnbuilder's "native format") easily using a bit of boilerplate.
| represents concatenation. The size of each data entry is written in square brackets [].
An encoded thing is of the following form, a list of blocks:
version[1 byte] | itemtype[1 byte] | header | block | block | ... | END
(hopefully we don't need more than 256 versions...)
END is the literal 255, which means each itemtype is allowed 255 legal block types.
Technically the END block is optional if you're just encoding one item, but it's useful for other applications (ex. wynnbuilder full build encoding).
itemtype can be one of the following:
0 item with optional rolls
1 crafted item description (from ingredients)
2 complete item description
Each block has the following format:
type[1 byte] | data[0+ bytes]
where the meaning of the type byte depends on the itemtype of the entire encoded object.
The header specification may also differ between itemtypes.
Basically, it represents the "required fields" for that kind of item.
Item with optional rolls
The header for this is two bytes:
item_id[2 bytes]
item_id is the numeric ID of the item.
NOTE: we need to decide on a standard item ID map. We also need to decide on a canonical ordering of the stats for any item.
Available block types:
0 rolls buffer
1 stars buffer
8 powder buffer
9 shiny data
10 rerolls
32 wynn api version
rolls buffer
A byte array, where every byte corresponds to a roll value (30-130).
The ordering of item rolls is dependent on an (external) canonical ordering, TBD.
Only the rolls that are actually on this item are stored.
NOTE: Intentionally not storing 0-100 rolls, because in the past wynn has changed this in a janky manner.
If it is absent, max rolls for every stat are assumed (or base rolls, for fixed stat items).
If it is present, its length must match the base item's number of stats.
stars buffer
A packed byte array, where each byte is formatted as follows:
star[2 bits] | star[2 bits] | star[2 bits] | star[2 bits]
where star is a number from 0-3 indicating the number of stars.
The array is right padded with zeros to align with the byte boundary.
If it is absent, stars are computed from the rolls buffer.
If it is present, its length must match the base item's number of stats.
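A sketch of packing and unpacking the stars buffer, assuming the first star sits in the two highest bits of each byte (the proposal shows the fields left to right but does not name a bit order); function names are hypothetical. Because padding bits read as zero stars, the unpacker needs the stat count from the base item.

```python
def pack_stars(stars: list[int]) -> bytes:
    """Pack 2-bit star counts four per byte, first star in the high bits (assumed)."""
    out = bytearray()
    for i in range(0, len(stars), 4):
        byte = 0
        for j, star in enumerate(stars[i:i + 4]):
            if not 0 <= star <= 3:
                raise ValueError(f"invalid star count {star}")
            byte |= star << (6 - 2 * j)
        out.append(byte)  # unused low bits stay 0 (right padding)
    return bytes(out)


def unpack_stars(buf: bytes, count: int) -> list[int]:
    """count is the base item's number of stats; padding reads as 0 stars."""
    return [(buf[i // 4] >> (6 - 2 * (i % 4))) & 0b11 for i in range(count)]
```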
powder buffer
A binary blob, padded out to the nearest byte.
Every 5 bits corresponds to a powder, via the following algorithm:
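```python
from typing import NamedTuple, Optional


class Powder(NamedTuple):
    element: int  # 0-4, element order: ETWFA
    tier: int     # 1-6


NULL_POWDER = None  # special value for "No Powder"


def decode(powder_num: int) -> Optional[Powder]:
    """Decode a powder number.

    Accepts a number from 0 to 31. 0 is a special value for "No Powder",
    31 is invalid, and 1-30 represent the 30 powders in Wynncraft.
    """
    if powder_num == 0:
        return NULL_POWDER
    if not 1 <= powder_num <= 30:
        # 31 (or anything outside 0-31) is invalid
        raise ValueError(f"invalid powder number: {powder_num}")
    powder_num -= 1
    element = powder_num // 6
    tier = (powder_num % 6) + 1
    return Powder(element, tier)
```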
If it is absent, all powders are assumed to be NULL_POWDER.
If it is present, its length must match the base item's number of powder slots.
shiny data
Of the following format:
type[1 byte] | counter[8 bytes]
counter is a single unsigned 64-bit number indicating the value of the shiny data.
type selects from a table of possible shiny entries:
Table TODO: hpp does not know what shiny stats are like.
If it is absent, we assume there is no shiny data.
rerolls
Of the following format:
rerolls[1 byte]
If it is absent, we assume 0 rerolls.
wynn api version
Of the following format:
wynn_version[2 bytes]
(can we bet on there being fewer than 64k api updates? I hope so...)
If it is absent, we assume the latest version of the wynn api.
Crafted item (from ingredients)
The header for this is fifteen (15) bytes:
ing1[2 bytes] | ing2 | ing3 | ing4 | ing5 | ing6 | recipe[2 bytes] | meta[1 byte]
where ing1 through ing6 correspond to the ingredient slots:
1 2
3 4
5 6
recipe is a craft recipe ID (from a standard list) that contains information about the craft level, type, and base stats (hp, damage, number of charges).
meta breaks down as follows:
tier1[2 bits] | tier2[2 bits] | unused[1 bit] | atkspd[3 bits]
where tier1 and tier2 are the tiers for material 1 and material 2 (from the recipe), and atkspd is the attack speed (for weapons).
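A sketch of the 15-byte crafted header using Python's struct module. Big-endian byte order and the most-significant-first layout of meta are assumptions, and the IDs in the usage are hypothetical.

```python
import struct

# ing1..ing6 and recipe (2 bytes each), then meta (1 byte): 15 bytes in total.
# Big-endian byte order is an assumption; the proposal does not state it.
CRAFTED_HEADER = struct.Struct(">6HHB")


def pack_meta(tier1: int, tier2: int, atkspd: int) -> int:
    """tier1[2 bits] | tier2[2 bits] | unused[1 bit] | atkspd[3 bits], MSB first."""
    if not (0 <= tier1 <= 3 and 0 <= tier2 <= 3 and 0 <= atkspd <= 7):
        raise ValueError("field out of range")
    return (tier1 << 6) | (tier2 << 4) | atkspd  # unused bit stays 0


def unpack_meta(meta: int) -> tuple[int, int, int]:
    return (meta >> 6) & 0b11, (meta >> 4) & 0b11, meta & 0b111


def pack_crafted_header(ingredients: list[int], recipe: int,
                        tier1: int, tier2: int, atkspd: int) -> bytes:
    if len(ingredients) != 6:
        raise ValueError("expected 6 ingredient IDs")
    return CRAFTED_HEADER.pack(*ingredients, recipe, pack_meta(tier1, tier2, atkspd))
```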
Available block types:
8 powder buffer
11 item name
12 item lore
32 wynn api version
item name
A null-terminated string. (any number of nonzero bytes, followed by a zero byte to mark the end of the string.)
item lore
Same format as item name.
Complete item description
The header for this is three bytes:
num_ids[1 byte] | meta[2 bytes]
meta is defined as follows:
num_powder_slots[6 bits] | unused[1 bit] | item_type[5 bits] | rarity[4 bits]
item_type table:
0 helmet
1 chestplate
2 leggings
3 boots
4 ring
5 bracelet
6 necklace
7 wand
8 spear
9 bow
10 dagger
11 relik
12 potion
13 scroll
14 food
15 weaponTome
16 armorTome
17 guildTome
rarity table:
0 Normal
1 Unique
2 Rare
3 Legendary
4 Fabled
5 Mythic
6 Set
7 Crafted
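A sketch of the three-byte header for a complete item description, packing meta most significant bit first (an assumption) with the item_type and rarity tables above; function names are hypothetical.

```python
ITEM_TYPES = [
    "helmet", "chestplate", "leggings", "boots", "ring", "bracelet",
    "necklace", "wand", "spear", "bow", "dagger", "relik", "potion",
    "scroll", "food", "weaponTome", "armorTome", "guildTome",
]
RARITIES = ["Normal", "Unique", "Rare", "Legendary", "Fabled", "Mythic", "Set", "Crafted"]


def pack_complete_header(num_ids: int, powder_slots: int, item_type: str, rarity: str) -> bytes:
    """num_ids[1 byte] | meta[2 bytes], with meta packed MSB first (assumed):
    num_powder_slots[6 bits] | unused[1 bit] | item_type[5 bits] | rarity[4 bits]"""
    if not 0 <= powder_slots < 64:
        raise ValueError("powder slots must fit in 6 bits")
    meta = (powder_slots << 10) | (ITEM_TYPES.index(item_type) << 4) | RARITIES.index(rarity)
    return bytes([num_ids]) + meta.to_bytes(2, "big")


def unpack_complete_header(header: bytes) -> tuple[int, int, str, str]:
    meta = int.from_bytes(header[1:3], "big")
    return header[0], meta >> 10, ITEM_TYPES[(meta >> 4) & 0x1F], RARITIES[meta & 0xF]
```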
Available block types:
1 stars buffer
2 stat ID buffer
3 stat length buffer
4 max stats buffer
5 min stats buffer
8 powder buffer
9 shiny data
10 rerolls
11 item name
12 item lore
13 item description
14 item texture
15 current durability
stars buffer, powder buffer, shiny data and rerolls are identical to parsing an Item with optional rolls.
item name and item lore are identical to parsing a Crafted item (from ingredients).
stat ID buffer
A list of the stat IDs for each stat on this item.
Should contain no duplicates, or else the behavior is undefined.
TODO: need an agreed-upon list. Maybe wynn API if it's stable?
NOTE: this is not optional!
stat length buffer
A list of the length (in bytes) for encoding each stat.
The min and max stats will each be encoded using the same length.
Lengths are packed (4 bits each), and the result is right padded with zero if needed.
(The actual length is one more than the value in the array, since length 0 will never be used. This allows stats to be up to 16 bytes long.)
NOTE: this is not optional!
max stats buffer
A binary blob containing data about the max stats for this item.
The order is given by the 'stat ID buffer', but each entry can have variable size in bytes.
The size is given by the 'stat length buffer'.
Unlike normal buffers, this buffer stores numbers using two's complement.
This allows negative numbers to be stored as well!
NOTE: level, attackspeed, hp, max durability/charges, stat req, are all counted as "stats".
So they can go here.
attackspeed lookup table:
0 SUPER_SLOW
1 VERY_SLOW
2 SLOW
3 NORMAL
4 FAST
5 VERY_FAST
6 SUPER_FAST
NOTE: this is not optional!
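A sketch of the stat length buffer and the two's-complement stats buffers. It picks the smallest width that fits both the min and max value of each stat, stores width - 1 in each 4-bit entry, and writes values big-endian; the byte order, the nibble order (first entry in the high half) and the function names are assumptions.

```python
from typing import Optional


def width(value: int) -> int:
    """Smallest number of bytes holding value in two's complement (1-16)."""
    n = 1
    while not -(1 << (8 * n - 1)) <= value < (1 << (8 * n - 1)):
        n += 1
    if n > 16:
        raise ValueError("stat does not fit in 16 bytes")
    return n


def encode_stats(max_stats: list[int], min_stats: Optional[list[int]] = None):
    """Return (stat length buffer, max stats buffer, min stats buffer or None).

    Min and max share one length per stat, so each width must fit both."""
    if min_stats is not None and len(min_stats) != len(max_stats):
        raise ValueError("min and max stats must have the same count")
    lengths = [
        max(width(hi), width(min_stats[i]) if min_stats is not None else 1)
        for i, hi in enumerate(max_stats)
    ]
    nibbles = [n - 1 for n in lengths]  # stored value is length - 1
    if len(nibbles) % 2:
        nibbles.append(0)  # right pad with zero
    length_buf = bytes((nibbles[i] << 4) | nibbles[i + 1] for i in range(0, len(nibbles), 2))
    max_buf = b"".join(v.to_bytes(n, "big", signed=True) for v, n in zip(max_stats, lengths))
    min_buf = None
    if min_stats is not None:
        min_buf = b"".join(v.to_bytes(n, "big", signed=True) for v, n in zip(min_stats, lengths))
    return length_buf, max_buf, min_buf


def decode_stats(length_buf: bytes, blob: bytes, count: int) -> list[int]:
    values, pos = [], 0
    for i in range(count):
        nibble = length_buf[i // 2] >> 4 if i % 2 == 0 else length_buf[i // 2] & 0xF
        n = nibble + 1
        values.append(int.from_bytes(blob[pos:pos + n], "big", signed=True))
        pos += n
    return values
```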
min stats buffer
Same format as max stats, but it's optional.
If left out then max stats are used and the item is assumed to be a fixed ID item.
item description
Extra string field to accommodate things like event items.
item texture
TODO
current durability
Single byte, storing the current durability/charges of this item.
magicus made excellent points. I also like the idea of sending key-value pairs for stats, and if a stat order list is still required, I think it'd be feasible to use the order from the v3 item API, or the order of individual item identifications (assuming they don't change it).
The byte-based encoding hpp described above is nice, especially for the crafted-item encoding, it's a straightforward way
Identification hash-check: Since the encoded values depend on other factors, mainly the API's identifications list and base values, we can't be sure that the sender and the receiver have the same understanding of the item. In a perfect world, we could send the identification names along with the base values in the encoded message. However, doing this would considerably increase the encoded message's length, making this option impractical.
For this reason, an identification hash character is included in the encoded data. Hash collisions are quite likely to happen in the real world, but this should still give clients a way to catch issues in most cases.
Doing this hash-check on the receiver side is an optional implementation detail, however all senders must include this data.
This should answer the whys and why-nots of doing key-value pairs. As for having both parties agree, that is what the hash is used for. It is basically an error-checking system, but not an error-correcting one.
+1 to the k-v pairs (essentially what my idea has; but like "unrolled" into a few separate buffers. I think this makes it easier to make entries of the mapping optional, or add new entries (if for whatever reason that is needed)).
To be clear, when encoding for 3rd parties, I am more than happy to include ID keys. But for chat encoding itself, I do think it would be too long and/or too redundant.
wynnbuilder internally uses implicit order too (defined in an external data file). I think it would be best if we could rely on implicit order as much as possible and use a stable "item ID lookup table" and "stat ID lookup table" to define the ordering.
The version I see being implemented may just be a third version, combining the good aspects of both proposals.
To come to an agreement, in a timely manner, there is a really simple first step to take: Agree on a mutual base class for a "character" / "data block" / "byte", basically the smallest chunk of data we share. Creating this class would give us easy ways of encoding and decoding, in a clear, unit testable and even sharable format.
I also think that we should first focus only on encoding gear items. This is the easiest case, and gives us valuable info, before working on the custom, and much more complex items, like crafted and "unique" items.
What I like from your format is the simplicity of encoding for some parts of the blocks. I would like to use it, or something similar to it. As for the "common building block", it could be written in base 16, as a Unicode character basically encodes 4 hex digits. However, thinking in hex is much harder than thinking in bytes. Since 16^4 is exactly 2^16, we could instead make our "common building block" 2 bytes. That would give us a really straightforward way of encoding to both base64 and Unicode. (It would also allow Wynnbuilder to decode/convert chat items from Unicode, as you would have to do almost nothing to extract the data into a byte format.)
What do you think @hppeng-wynn @RawFish69?
We do have an implicit internal order too. A "legacy" one is used for chat items, and Artemis has 3 custom orders. Any of those could be used for agreeing on a common order or id-key map.
I chose a single byte because it's basically the default "bit of data" across computers in general
2 bytes could also work I guess but there's already a lot of fields that are much smaller than that (most IDs will fit in like 5 bits lol) so I think it would be wasteful
for "just gear items" do you mean like, just the normal rolled items?
i mean that's pretty simple; our two encoding proposals are basically identical, i guess (though i separated out stars into a separate buffer to make it easier to include as an optional entry for applications that don't need it)
Also, I like the generality behind @hppeng-wynn's proposal, that we can have a common binary format, and then encode that binary string into chat using unicode characters. However:
1. it will most certainly lead to longer strings in chat
2. it is very much unclear what would be gained by anyone of us having a common binary format in the chat...
On the contrary, a binary format encoded in Unicode is probably going to be shorter (if you design it without much padding), since there's much less wastage
ex. in @kristofbolyai 's original specification, the rolls + stars are being encoded using 12 bits (1 unicode char, 4096 values); but they really only take up 10 bits (including the 30 offset). So the binary code would be more efficient by nearly 20%
@kristofbolyai We seem to talk just past each other. Your hash check is for the stat values ("ids"). I'm talking about the stat types. Your check would help in a situation where Wynn has e.g. nerfed the base value of a certain stat. But it would not help if there is a misunderstanding in the order of stats.
I spent a sh*tload of hours trying to clean up the stat handling from Legacy to Artemis. And the "ordering" of stats was a common pain point. In the end, I had to create a special ordering just to accommodate the old "item chat protocol". And I realized it would be terribly broken for all new stat types that had been introduced since it was created.
So, I am very very skeptical towards any idea of "assumed" ordering. If you choose to go down that route, you will basically need to bump the version number each time Wynn adds a new stat type. If, on the other hand, you choose key-value pairs, and have a way to assign numeric ids to the stats (here you can use whatever order you agree on and just enumerate the stat types from that list), then you are safe for all future. If a client receives a number it does not understand, it can just say: "Unknown Stat: 7".
tbf I think adding new stat types is pretty rare and i'd be OK with bumping the version number when that happens
or honestly just make the "stat mapping" append only -- that way older parsers would just ignore the out of bounds stat ID
minor fixes to my proposal comment:
- fix item_type field to be 5 bits (slightly more annoying to parse, but now actually fits all the item types)
- change the way string parsing is handled (standard null-terminated string, like from C)
What do you mean? It sounds like you are talking about values, not types? I am still mostly worried about matching the correct type.
Also, I like the generality behind @hppeng-wynn's proposal, that we can have a common binary format, and then encode that binary string into chat using unicode characters. However:
- it will most certainly lead to longer strings in chat
- it is very much unclear what any of us would gain from having a common binary format in the chat...
@magicus hppeng works on/is the creator of Wynnbuilder. This is basically full integration with parts of Wynnbuilder, and an overall format for anyone to encode Wynn items in the future.
I am working on a concept that would reduce the length of the encoded strings, even with bytes.
Also:
On the contrary, a binary format encoded in Unicode is probably going to be shorter (if you design it without much padding), since there's much less wastage.
That sounds great! As I said, I am all in favor of standardizing formats. Not sure how it helps us, but as long as it doesn't pose a problem for us, just go for it.
The way Wynnbuilder has done it, we basically have a big list of stats in a fixed order, and whenever Wynn adds new stats to the game, those stats always get appended to the end of the list
(talking about the stat mapping, not the stats for any given item.)
That ensures that the old stats always stay in the same order, and new stats simply get new IDs.
That's one way to create "implicit backwards compatibility" without much effort. However it comes at a small cost to readability (for example, the damage stats are not all next to each other in this list.)
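A minimal sketch of the append-only rule, using hypothetical stat lists where the list index is the stat ID. The check confirms that a newer list only extends the older one, and the decoder shows an older parser skipping an ID that lies past the end of its own list.

```python
# Hypothetical stat lists: the list index is the stat ID.
STATS_V1 = ["rawStrength", "rawDexterity", "healthRegen"]
STATS_V2 = STATS_V1 + ["manaSteal"]  # a new stat only ever gets the next free ID


def is_append_only(old: list[str], new: list[str]) -> bool:
    """True if every stat in the old list keeps its ID in the new one."""
    return new[:len(old)] == old


def decode_ids(ids: list[int], mapping: list[str]) -> list[str]:
    """Resolve IDs with a (possibly older) mapping, skipping IDs past its end."""
    return [mapping[i] for i in ids if 0 <= i < len(mapping)]
```

Reordering the list (for example, to group all damage stats together) would make is_append_only fail, which is exactly the readability trade-off mentioned above.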
I am working on a concept that would reduce the length of the encoded strings, even with bytes.
Basically I am thinking of using bytes as smallest data chunk, but with a trick to make it really efficient in chat:
Each block type would define (in the standard, not in the encoding) its "requested" data size: 8, 16, 32 or 64 bits. This would work easily with both encoding formats. Unicode characters in the Supplementary Private Use Area-A occupy the code points 0xF0000 to 0xFFFFD (and with some tricks we can also cover 0xFFFFE-0xFFFFF, although I am not exactly sure how at this time). As for the Wynnbuilder base64 encoding, encoding a byte array is pretty straightforward.
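A minimal sketch of the per-character mapping for one 16-bit value, following the version of the standard summarized at the top of this issue: values 0x0000-0xFFFD map to U+F0000-U+FFFFD, and the remaining two values borrow U+100000-U+100001 from Supplementary Private Use Area-B. The constant and function names are hypothetical.

```python
PUA_A_START = 0xF0000   # Supplementary Private Use Area-A
PUA_A_END = 0xFFFFD     # last usable code point of Area-A
PUA_B_START = 0x100000  # Supplementary Private Use Area-B


def encode_word(value: int) -> str:
    """Map one 16-bit value to a single private-use code point."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in 16 bits")
    if value <= 0xFFFD:
        return chr(PUA_A_START + value)       # U+F0000..U+FFFFD
    return chr(PUA_B_START + value - 0xFFFE)  # borrowed U+100000..U+100001


def decode_word(char: str) -> int:
    """Inverse of encode_word; rejects code points outside the two ranges."""
    code = ord(char)
    if PUA_A_START <= code <= PUA_A_END:
        return code - PUA_A_START
    if PUA_B_START <= code <= PUA_B_START + 1:
        return 0xFFFE + code - PUA_B_START
    raise ValueError(f"U+{code:X} is not an encoded 16-bit value")
```

All of these code points lie outside the Basic Multilingual Plane, so in a UTF-16 string (such as a Java String) each one is stored as a surrogate pair of two code units.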
Blocks would not only define their "integer" size, but also their length, so there would be no need to reserve any characters for block types, and no need to reserve/use a character for separating parts. As for the block headers themselves, 1 byte would represent the type, and 1 byte would give us the size of the block (divided by the block's data size).
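A sketch of the header idea, assuming a hypothetical table that fixes each block type's word size in the standard (so it is never written into the data) and big-endian words. The second header byte counts words, not bytes, which is what lets a parser jump from block to block without any separator characters.

```python
# Hypothetical word sizes (in bytes) per block type; the proposal would fix
# these in the standard, so they are never written into the encoded data.
WORD_SIZE = {0: 1, 1: 2, 2: 4, 3: 8}


def write_block(block_type: int, words: list[int]) -> bytes:
    size = WORD_SIZE[block_type]
    if len(words) > 255:
        raise ValueError("the length byte can count at most 255 words")
    body = b"".join(w.to_bytes(size, "big") for w in words)
    # Header: 1 byte for the type, 1 byte for the length counted in words.
    return bytes([block_type, len(words)]) + body


def read_blocks(data: bytes) -> list[tuple[int, list[int]]]:
    """Split concatenated blocks; no separator characters are needed."""
    blocks, pos = [], 0
    while pos < len(data):
        block_type, count = data[pos], data[pos + 1]
        size = WORD_SIZE[block_type]
        pos += 2
        words = [int.from_bytes(data[pos + i * size:pos + (i + 1) * size], "big")
                 for i in range(count)]
        pos += count * size
        blocks.append((block_type, words))
    return blocks
```

Block type 3 here carries 64-bit words natively, so a value like 2**40 needs no special handling.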
I think all of us understand the benefit of having variable sized blocks, but let me state an obvious case. If we support 64-bit integers natively in the standard, there is no black magic needed when encoding such values. Also, supporting lower bit sizes, like 8 and 16, allows us to efficiently bundle information like identification key-value pairs.
And the best of all of this is that the Unicode representation would be close to being the most efficient it can be (practically, not theoretically).
What do you think? If we all agree here, we can go ahead and agree on the standard for encoding normal gear items, and implement that while getting the encoders/decoders written in the process. Once we know encoding/decoding is stable, we can move to working on the "fun" parts.
So, if you agree, please react with an emoji :)
The version I see being implemented may just be a third version, combining the good aspects of both proposals.
To come to an agreement, in a timely manner, there is a really simple first step to take: Agree on a mutual base class for a "character" / "data block" / "byte", basically the smallest chunk of data we share. Creating this class would give us easy ways of encoding and decoding, in a clear, unit testable and even sharable format.
I also think that we should first focus only on encoding gear items. This is the easiest case, and it gives us valuable info before working on the custom and much more complex items, like crafted and "unique" items.
What I like about your format is the simplicity of encoding for some parts of the blocks. I would like to use it, or something similar. As for the "common building block", one option is to think in base 16, since each Unicode character here basically carries 4 hex digits. However, thinking in hex is much harder than thinking in bytes. Since 16^4 is exactly 2^16, we could make our "common building block" 2 bytes. That would give us a really straightforward way of encoding to both base64 and Unicode. (It would also allow Wynnbuilder to decode/convert chat items from Unicode, as you would have to do almost nothing to extract the data into a byte format.)
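A sketch of why a 2-byte building block makes conversion cheap: each 16-bit word becomes exactly one private-use code point, so turning a chat string back into base64 is a direct unpacking. Big-endian word order, even-length input, and all function names are assumptions of this sketch; the odd-length padding rule of the final standard is not reproduced here.

```python
import base64


def word_to_char(value: int) -> str:
    # 0x0000..0xFFFD -> U+F0000..U+FFFFD; 0xFFFE..0xFFFF -> U+100000..U+100001
    return chr(0xF0000 + value) if value <= 0xFFFD else chr(0x100000 + value - 0xFFFE)


def char_to_word(char: str) -> int:
    # Inverse of word_to_char (no validation in this sketch).
    code = ord(char)
    return code - 0xF0000 if code <= 0xFFFFD else code - 0x100000 + 0xFFFE


def bytes_to_unicode(data: bytes) -> str:
    """Every 2-byte big-endian word becomes exactly one chat character."""
    if len(data) % 2:
        raise ValueError("this sketch only handles an even number of bytes")
    return "".join(word_to_char(int.from_bytes(data[i:i + 2], "big"))
                   for i in range(0, len(data), 2))


def unicode_to_bytes(text: str) -> bytes:
    return b"".join(char_to_word(c).to_bytes(2, "big") for c in text)


def unicode_to_base64(text: str) -> str:
    """Convert a chat-encoded item into the base64 form."""
    return base64.b64encode(unicode_to_bytes(text)).decode("ascii")
```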
What do you think @hppeng-wynn @RawFish69?
It looks good, I only do decode and it shouldn't matter since the concept @hppeng-wynn proposed is similar enough.
I would rather avoid a common order list that both parties have to agree upon, if possible.
The stat names and list size may vary as the game updates, causing exactly the pain @magicus mentioned above, so 3rd-party receivers would benefit from an alternative. If there's a reason to, picking an order from the 4 existing ones is also fine, whichever is more convenient in the long run.
maybe I'm confused now. I was thinking of using the space you had reserved for encoded numbers -- is that not good for chat display? If that's the case, then this is a much harder problem... why did they give you only 4094 options... technically it's still doable with something like BigInteger, but that's much, much more annoying.
I don't understand why the blocks need a "preferred data size". Fundamentally, the byte encoding would just run over Unicode character boundaries, as follows:
There is no need to specify the "external" word size. In fact, the byte word size is pretty arbitrary (as mentioned by mahakadema on Discord), and honestly a pure binary format might work better. I haven't really measured the inefficiency we incur by using this word size.
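A minimal sketch of a pure bit-stream layout: fields of arbitrary width are concatenated, and the resulting stream is cut into 16-bit words (one per chat character) without regard to where fields end. The (value, width) field list and the zero padding of the tail are assumptions for illustration; the powder block, for example, pads with 1 bits instead.

```python
def pack_bits(fields: list[tuple[int, int]]) -> list[int]:
    """Concatenate (value, width) fields into one bit stream, then cut the
    stream into 16-bit words regardless of where the fields end."""
    acc, nbits = 0, 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        acc = (acc << width) | value
        nbits += width
    pad = -nbits % 16  # zero-pad the tail up to a whole word
    acc <<= pad
    nbits += pad
    return [(acc >> (nbits - 16 * (i + 1))) & 0xFFFF for i in range(nbits // 16)]
```

With fields of 5, 10 and 3 bits (18 bits in total), the 3-bit field straddles the boundary between the first and second word, and only 14 bits of padding are spent for the whole sequence.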
Powder block
ID: 4
Data: The data is a binary blob, padded with 1 bits to the nearest multiple of 8 bits. A powder is encoded in 5 bits using the following math: element * 6 + tier. The elements follow ETWFA order. Five 0 bits represent an empty powder slot.
If it is absent, all powder slots are assumed to be unpowdered.
If it is present, but its length does not match the number of powder slots of the item, the remaining slots are assumed to be unpowdered.
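A minimal sketch of the powder encoding as specified above: each slot becomes a 5-bit code element * 6 + tier (with tiers assumed to run from 1 to 6, so the all-zero code stays free for an empty slot), the codes are concatenated, and the tail is padded with 1 bits. The function name and the use of None for an empty slot are illustrative.

```python
from typing import Optional

ELEMENTS = "ETWFA"  # element order used by the powder block


def encode_powders(slots: list[Optional[tuple[str, int]]]) -> bytes:
    """Pack each slot as a 5-bit code (element * 6 + tier, or 0 for an empty
    slot), then pad the bit stream with 1 bits up to a whole byte."""
    acc, nbits = 0, 0
    for slot in slots:
        if slot is None:
            code = 0  # five 0 bits: no powder in this slot
        else:
            element, tier = slot
            if not 1 <= tier <= 6:
                raise ValueError("tier must be between 1 and 6")
            code = ELEMENTS.index(element) * 6 + tier  # 1..30, fits in 5 bits
        acc = (acc << 5) | code
        nbits += 5
    pad = -nbits % 8
    acc = (acc << pad) | ((1 << pad) - 1)  # the padding bits are 1s
    return acc.to_bytes((nbits + pad) // 8, "big")
```

Running it on the W4, W4, W4, W4, T4 sequence from the example that follows yields the bytes 132, 33, 5, 127.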
Here, for the Powder block, we are missing information about the number of powders. For a universal standard, where items are not limited to official Wynncraft items (whose number of powder slots is known), the only way to determine where the powder block stops is the start of the Rerolls block.
This will result in a problem: with the right combination of powders, the byte array will contain a 5 as part of the powder data, which a decoder could mistake for the start of the next block. Here is a piece of deliberately fabricated data where the 5 appears:
W4, W4, W4, W4, T4
-> 16, 16, 16, 16, 10
-> 10000 10000 10000 10000 01010
-> 0b10000100, 0b00100001, 0b00000101, 0b01111111
-> 132, 33, 5, 127
I've thought about this being an issue, but I shrugged it off, since I've only written the encoding part. A simple solution is to have a "null" byte at the end of the list, or to send the powder count. Both solutions use a single byte. I would lean toward sending the count, which is a common pattern in the standard.
Do you have a better idea perhaps?
My first thought was to make the powder block the last block, which would naturally give it a termination, but then I realized this is a terrible solution without any robustness. And by the entropy argument, the only way is to send one more byte. I prefer sending the count, because this avoids culling the 1 bits used for padding.
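A sketch of the count-prefixed variant preferred here, assuming a one-byte powder count placed before the packed 5-bit codes. Because the decoder knows how many codes to read, the block length follows from the count alone: no byte inside the powder data can be mistaken for the start of the next block, and the padding bits are simply skipped. Names and the exact byte layout are hypothetical.

```python
from typing import Optional

ELEMENTS = "ETWFA"
Slot = Optional[tuple[str, int]]  # None marks an empty slot


def encode_powder_block(slots: list[Slot]) -> bytes:
    """One count byte, then 5-bit codes (element * 6 + tier) padded with 1 bits."""
    if len(slots) > 255:
        raise ValueError("the count must fit in one byte")
    acc, nbits = 0, 0
    for slot in slots:
        code = 0 if slot is None else ELEMENTS.index(slot[0]) * 6 + slot[1]
        acc, nbits = (acc << 5) | code, nbits + 5
    pad = -nbits % 8
    acc = (acc << pad) | ((1 << pad) - 1)
    return bytes([len(slots)]) + acc.to_bytes((nbits + pad) // 8, "big")


def decode_powder_block(data: bytes) -> tuple[list[Slot], int]:
    """Return the slots and the number of bytes the block occupied.

    The count fixes the block length, so the padding bits are never examined."""
    count = data[0]
    nbytes = (count * 5 + 7) // 8
    acc = int.from_bytes(data[1:1 + nbytes], "big")
    slots: list[Slot] = []
    for i in range(count):
        code = (acc >> (nbytes * 8 - 5 * (i + 1))) & 0b11111
        if code == 0:
            slots.append(None)
        else:
            slots.append((ELEMENTS[(code - 1) // 6], (code - 1) % 6 + 1))
    return slots, 1 + nbytes
```

On the fabricated W4, W4, W4, W4, T4 example this produces a count byte followed by the same four bytes, and the decoder consumes exactly those five bytes whatever follows them.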