Text page can't handle text written in scriptio continua (i.e. text without whitespace as a word divider)

Question

3TUSK opened this issue 6 years ago · 7 comments

Synopsis

Text page can't handle scriptio continua. That is, given an entry with a single text page, if the content of that page does not use whitespace as a word divider, the page cannot wrap lines where necessary.

Reproduction

Environment is Forge 14.23.5.2772 & Patchouli 1.0-7.9.
assets/[modid]/patchouli_books/[book_name]/en_us/entries/preface/preface.json - the page used for comparison:

{
  "name": "Preface",
  "icon": "minecraft:knowledge_book",
  "category": "preface",
  "pages": [
    {
      "type": "text",
      "text": "Cuisine is a Minecraft Mod about cuisine, and to a certain extent, gastronomy. In this mod, you can cook food in a realistic-esque manner. Think about how would you prepare food in your kitchen - this is what this Mod provides."
    }
  ]
}

assets/[modid]/patchouli_books/[book_name]/zh_cn/entries/preface/preface.json - the translated version of the page mentioned above; it is also the page in question:

{
  "name": "序",
  "icon": "minecraft:knowledge_book",
  "category": "preface",
  "pages": [
    {
      "type": "text",
      "text": "Cuisine 是一个关于烹饪及（在某种意义上）美食学的 Minecraft Mod。这个 Mod 实现了一套相对写实的烹饪玩法。想想看平时你在厨房是怎么做菜的——这就是这个 Mod 提供的内容了。"
    }
  ]
}

Adjust the JSON if necessary (category, ...). Set the language to zh_cn (Simplified Chinese) and restart the game to reload the book. You should see the following:

For comparison, the expected behavior looks like the following (screenshot taken in the English (US) locale):

where all text is correctly wrapped onto new lines, and everything fits on the left page.

Analysis

BookTextParser::parse seems to "wrap" the content string based on occurrence of whitespace:
https://github.com/Vazkii/Patchouli/blob/028953b044fbae5a31ea0ac2f5632aaa5b0f463c/src/main/java/vazkii/patchouli/client/book/text/BookTextParser.java#L169-L173
The assumption that text uses whitespace as a word divider does not hold for the scripts of some languages (specifically Chinese (both zh_cn and zh_tw) and Japanese (ja_jp), if only languages supported by Vanilla Minecraft are considered).
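To make the assumption concrete, here is a minimal sketch (not Patchouli's actual code; `WhitespaceSplitSketch` and `splitOnSpaces` are hypothetical names) of what happens when wrap units are found by splitting on spaces. The Chinese sample is a clause from the zh_cn page above, written with Unicode escapes so the file compiles under any source encoding.

```java
import java.util.Arrays;
import java.util.List;

final class WhitespaceSplitSketch {
    /** Splits text into wrap units the naive way: at every space character. */
    static List<String> splitOnSpaces(String text) {
        return Arrays.asList(text.split(" "));
    }

    public static void main(String[] args) {
        String english = "In this mod, you can cook food in a realistic-esque manner.";
        // A clause from the zh_cn page; it contains no space at all.
        String chinese = "\u60f3\u60f3\u770b\u5e73\u65f6\u4f60\u5728\u53a8\u623f"
                + "\u662f\u600e\u4e48\u505a\u83dc\u7684";

        // English yields many short units a wrapper can distribute over lines.
        System.out.println("English units: " + splitOnSpaces(english).size());
        // Chinese yields a single unit as long as the whole clause,
        // so a wrapper that only breaks between units can never break it.
        System.out.println("Chinese units: " + splitOnSpaces(chinese).size());
    }
}
```

A wrapper that can only break between such units has nowhere to break the Chinese clause, which matches the overflow described in the reproduction.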

Final words

Given the complexity of BookTextParser, I am not sure how complex a proper fix would be, but I am willing to open a Pull Request if one is welcome.

stanhebben · Answer 1 · 2018-11-11T09:09:59.000Z

I hadn't considered Chinese yet 😀. I can take care of this, but what would be the best way to determine whether text is scriptio continua?

We could do so based on the characters and recognize character ranges. Maybe we can use Character.isIdeographic() for that, which can check for Chinese, Japanese, Korean and Vietnamese, but would that handle all cases of scriptio continua?

Alternatively, it could be done according to the book's language, but then foreign language snippets inside such books would not be formatted properly.
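As a quick way to probe the character-range idea, the sketch below calls `Character.isIdeographic(int)` on one sample code point per script. The class and method names are hypothetical. As far as I understand the Unicode Ideographic property, it covers Han ideographs but not kana, hangul or Thai letters, so a check built only on it would probably still miss kana-only Japanese runs and Thai.

```java
final class IdeographCheckSketch {
    /** True if any code point in the text has the Unicode Ideographic property. */
    static boolean containsIdeograph(String text) {
        return text.codePoints().anyMatch(Character::isIdeographic);
    }

    public static void main(String[] args) {
        String[] labels = {"Han", "Hiragana", "Hangul", "Thai", "Latin"};
        // One representative code point per script:
        // U+4E2D, U+3042, U+D55C, U+0E01 and the ASCII letter a.
        int[] samples = {0x4E2D, 0x3042, 0xD55C, 0x0E01, 'a'};
        for (int i = 0; i < samples.length; i++) {
            System.out.println(labels[i] + ": " + Character.isIdeographic(samples[i]));
        }
    }
}
```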

3TUSK · Answer 2 · 2018-11-11T09:47:52.000Z

Off-topic: that BookTextParser is more like a BookTextTokener to me.

Character.isIdeographic might work - but...

~~Modern Vietnamese uses a romanized script. So the issue described does not apply to Vietnamese.~~ One less thing to consider, which is good.
~~Modern Korean uses hangul - usually it uses whitespace to separate words; otherwise there is no readability at all.~~ Another thing crossed out.
Within the set of languages that Vanilla Minecraft supports, the only case where Character.isIdeographic may not work is probably Thai. After all, modern languages rarely use scriptio continua. Unfortunately, I have no knowledge of how the Thai language actually works, and worse, I am not sure whether vanilla Minecraft can handle the rendering of Thai script...

I personally believe that this issue must be solved via boundary analysis (for example, java.text.BreakIterator can do that; Mojang also uses icu4j, but I couldn't get that to work).
A few months ago, I wrote this because I felt that vanilla isn't doing line wrapping correctly either, and I also wrote an explanation of what I did and why I did so. I was thinking of adapting my work into BookTextParser, but I soon realized that command handling makes the situation even trickier...
At least, I hope that can provide some hints.
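For readers unfamiliar with boundary analysis, here is a rough sketch of line wrapping driven by `java.text.BreakIterator.getLineInstance`. It is not the code referred to above and not Patchouli's fix: it measures width in UTF-16 chars rather than rendered pixels, ignores commands and styles entirely, and the names `LineWrapSketch` and `wrap` are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class LineWrapSketch {
    /** Greedily wraps text so that no line exceeds maxChars UTF-16 chars. */
    static List<String> wrap(String text, int maxChars, Locale locale) {
        if (maxChars < 1) throw new IllegalArgumentException("maxChars must be positive");
        BreakIterator breaks = BreakIterator.getLineInstance(locale);
        breaks.setText(text);
        List<String> lines = new ArrayList<>();
        int lineStart = breaks.first();
        int lastBreak = lineStart; // last legal break seen on the current line
        for (int b = breaks.next(); b != BreakIterator.DONE; b = breaks.next()) {
            if (b - lineStart > maxChars) {
                // Text up to b no longer fits: end the line at the previous break.
                if (lastBreak > lineStart) {
                    lines.add(text.substring(lineStart, lastBreak));
                    lineStart = lastBreak;
                }
                // A single unbreakable run wider than the line is cut by force.
                // (This naive cut could split a surrogate pair.)
                while (b - lineStart > maxChars) {
                    lines.add(text.substring(lineStart, lineStart + maxChars));
                    lineStart += maxChars;
                }
            }
            lastBreak = b;
        }
        if (lineStart < text.length()) lines.add(text.substring(lineStart));
        return lines;
    }

    public static void main(String[] args) {
        // Trailing spaces stay attached to the line they end, and count toward its width.
        System.out.println(wrap("Think about how you would prepare food", 12, Locale.ENGLISH));
    }
}
```

The point of the design is that the break positions come from the iterator rather than from searching for spaces, so scripts whose line-break rules allow breaks between characters need no special casing in the wrapping loop.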

stanhebben · Answer 3 · 2018-11-11T10:08:44.000Z

Yeah, command handling makes it difficult to implement a solution with BreakIterator, since text is expanded on the fly (and styles need to be applied to words) so we can't simply feed the input text to a BreakIterator. Even a custom CharacterIterator may be difficult to implement, but I'm thinking about it.

I don't understand your reply concerning Hangul; is it crossed out because Korean does use spaces; or because Hangul isn't assumed Ideographic by Character.isIdeographic?

stanhebben · Answer 4 · 2018-11-11T10:17:07.000Z

I think I may have a solution in mind using the BreakIterator, provided that processing of commands and positioning of text are performed in separate steps. Command handlers can first generate a list of annotated spans. Once these spans are determined, I can have a custom character iterator iterate over these spans, splitting them into lines and performing positioning. Since the BreakIterator doesn't insert or delete characters, I can look up the spans in the original list using the character indices, applying styles appropriately.
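The two-step idea can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the fix that was eventually merged: `Span`, `breakOffsets` and `spanAt` are hypothetical names, styles are plain strings, and the span texts are concatenated into one `String` instead of being exposed through a custom `CharacterIterator`.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class SpanBreakSketch {
    /** A run of text with a single style, as a command handler might emit it. */
    static final class Span {
        final String text;
        final String style;
        Span(String text, String style) {
            this.text = text;
            this.style = style;
        }
    }

    /** Legal line-break offsets, measured in the concatenated text of all spans. */
    static List<Integer> breakOffsets(List<Span> spans, Locale locale) {
        StringBuilder all = new StringBuilder();
        for (Span span : spans) all.append(span.text);
        BreakIterator breaks = BreakIterator.getLineInstance(locale);
        breaks.setText(all.toString());
        List<Integer> offsets = new ArrayList<>();
        breaks.first();
        for (int b = breaks.next(); b != BreakIterator.DONE; b = breaks.next()) {
            offsets.add(b);
        }
        return offsets;
    }

    /** Maps a character index in the concatenated text back to the span that owns it. */
    static Span spanAt(List<Span> spans, int index) {
        int offset = 0;
        for (Span span : spans) {
            if (index < offset + span.text.length()) return span;
            offset += span.text.length();
        }
        throw new IndexOutOfBoundsException("index " + index);
    }

    public static void main(String[] args) {
        List<Span> spans = Arrays.asList(
                new Span("Hello ", "plain"), new Span("bold", "bold"), new Span(" world", "plain"));
        System.out.println(breakOffsets(spans, Locale.ENGLISH));
        System.out.println(spanAt(spans, 7).style);
    }
}
```

Because the break iterator never inserts or deletes characters, the offsets it reports stay valid as indices into the span list, which is what makes the lookup in `spanAt` possible.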

3TUSK · Answer 5 · 2018-11-11T10:22:39.000Z

> I don't understand your reply concerning Hangul; is it crossed out because Korean does use spaces; or because Hangul isn't assumed Ideographic by Character.isIdeographic?

You can safely ignore my comments regarding hangul. All I wanted to say is that "Modern Korean does not have this issue because of the use of whitespace". Apologies for the confusion.

stanhebben · Answer 6 · 2018-11-12T00:45:07.000Z

A fix for this has been implemented and should be available in the next release.

3TUSK · Answer 7 · 2018-11-13T18:19:57.000Z

For future reference: fixed by #17.
