Can the graphics.library/Text() routine please be modified for the following...
Testing for UTF8 encoded multibyte characters, where bit 8 (the high bit, 0x80) is set on every octet of the character, and the first octet begins with a run of 1 bits (highest to lowest) equal in number to the octets that make up the character.
CodePoint 4E00 is 0xE4B880, the Ideograph meaning "1" in Chinese and Japanese
CodePoint 4E09 is 0xE4B889, the Ideograph meaning "3" in Chinese and Japanese
CodePoints 3041 through 30FF cover the Hiragana and Katakana Unicode ranges
Hex E38181 through E381BF, E38280 through E38296 for valid Hiragana display.
Hex E382A0 [CodePoint 30A0] through E382BF continued, E38380 to E383BA [ CodePoint 30FA ] for valid Katakana.
Validation always starts from the first octet, which has bit 8 set and must have a clear bit directly after its leading 1 bits: it compares identical to itself when masked with "CF"/"DF" for a two-octet sequence (110xxxxx),
with "EF" for a 3-octet encoding (1110xxxx) and with "F7" for a 4-octet encoded sequence (11110xxx).
Could there be some (optional?) means of rendering detected UTF8 sequences?
I have several text files I need to edit and I would like to at least clearly see the Ideographs I am working with.
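The lead-octet test described above can be sketched in portable C as follows. This is only an illustration, not how graphics.library or utility.library does it: the function names `utf8_seq_len` and `utf8_decode` are made up for this example, and the masks used are the usual equivalent form of the test (for instance `(octet & 0xE0) == 0xC0` for a two-octet lead).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sequence length implied by a lead octet,
 * or 0 if the octet cannot start a sequence (e.g. a continuation octet). */
static int utf8_seq_len(uint8_t lead)
{
    if ((lead & 0x80) == 0x00) return 1;   /* 0xxxxxxx: plain ASCII  */
    if ((lead & 0xE0) == 0xC0) return 2;   /* 110xxxxx               */
    if ((lead & 0xF0) == 0xE0) return 3;   /* 1110xxxx               */
    if ((lead & 0xF8) == 0xF0) return 4;   /* 11110xxx               */
    return 0;                              /* 10xxxxxx or 11111xxx   */
}

/* Hypothetical helper: decode one code point from s (n octets available).
 * Returns the number of octets consumed, or 0 on malformed input. */
static int utf8_decode(const uint8_t *s, size_t n, uint32_t *cp)
{
    /* Smallest code point that legitimately needs each length. */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    uint32_t v;
    int len, i;

    assert(s != NULL && cp != NULL);
    if (n == 0) return 0;

    len = utf8_seq_len(s[0]);
    if (len == 0 || (size_t)len > n) return 0;
    if (len == 1) { *cp = s[0]; return 1; }

    v = (uint32_t)(s[0] & (0xFF >> (len + 1)));   /* payload bits of the lead octet */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;      /* must be 10xxxxxx */
        v = (v << 6) | (uint32_t)(s[i] & 0x3F);
    }

    if (v < min_cp[len]) return 0;                /* overlong encoding */
    if (v > 0x10FFFF) return 0;                   /* beyond Unicode    */
    if (v >= 0xD800 && v <= 0xDFFF) return 0;     /* surrogate range   */

    *cp = v;
    return len;
}
```

Feeding it the octets E4 B8 80 yields code point 4E00, and E4 B8 89 yields 4E09, matching the values quoted above.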
UTF8 Glyph/Ideograph CodePoint validation and Rendering...
-
- Posts: 314
- Joined: Mon May 14, 2012 10:26 pm
- Location: 日本千葉県松戸市 / Matsudo City, Chiba, Japan
- Contact:
- colinw
- AmigaOS Core Developer
- Posts: 207
- Joined: Mon Aug 15, 2011 9:20 am
- Location: Brisbane, QLD. Australia.
Re: UTF8 Glyph/Ideograph CodePoint validation and Rendering.
Belxjander wrote: Can the graphics.library/Text() routine please be modified for the following... [...] I have several text files I need to edit and I would like to at least clearly see the Ideographs I am working with.
Me too, join the club, it's hard to read the words to my favorite Klingon Opera....
It's going to take more than a couple of mods to the Text() function and friends, I'm afraid.
We need a proper rendering engine to provide Unicode support. Just decoding a UTF-8 byte stream is not going
to do it; that's the really easy part and it's already built in to the latest beta version of utility.library.
You'll just have to wait until the powers that be raise Unicode rendering to a higher priority; there are somewhat
more pressing issues to solve ATM.
Re: UTF8 Glyph/Ideograph CodePoint validation and Rendering.
colinw wrote: It's going to take more than a couple of mods to the Text() function and friends, I'm afraid. [...]
Well, I am currently working with TimberWolf for the rendering through Cairo, and have inconsistent results from my IME work so far.
I'm just trying to pin down a large dataset into something I can work with reversibly at the moment.
After that I'm hoping to make it a lot more reliable (and the system only requires the rendering in graphics.library so far).
One issue I am running into is entirely down to timing, and I have a couple of workarounds I need to check out.
The initial Hiragana encoding, for example...
[ぁ][あ] [ぃ][い] [ぅ][う] [ぇ][え] [ぉ][お]
[か][が] [き][ぎ] [く][ぐ] [け][げ] [こ][ご]
[さ][ざ] [し][じ] [す][ず] [せ][ぜ] [そ][ぞ]
[た][だ] [ち][ぢ] [っ][つ][づ] [て][で] [と][ど]
[な] [に] [ぬ] [ね] [の]
[は][ば][ぱ] [ひ][び][ぴ] [ふ][ぶ][ぷ] [へ][べ][ぺ] [ほ][ぼ][ぽ]
[ま] [み] [む] [め] [も]
[ゃ][や] [ゅ][ゆ] [ょ][よ]
[ら] [り] [る] [れ] [ろ]
[ゎ][わ] [ゐ] [ゑ] [を]
[ん]
[ゔ] [ゕ] [ゖ]
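The table above runs from ぁ (CodePoint 3041) to ゖ (CodePoint 3096). As a rough sketch of how those CodePoints map onto the UTF8 octets quoted earlier in the thread, something like the following could be used. The names `utf8_encode`, `is_hiragana` and `is_katakana` are invented for this example and are not part of any AmigaOS library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8.
 * Returns the number of octets written to out, or 0 if cp is not encodable. */
static int utf8_encode(uint32_t cp, uint8_t out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* surrogates are not characters */
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Range checks using the bounds quoted in this thread. */
static int is_hiragana(uint32_t cp) { return cp >= 0x3041 && cp <= 0x3096; }
static int is_katakana(uint32_t cp) { return cp >= 0x30A0 && cp <= 0x30FF; }
```

Encoding 3041 gives E3 81 81 and 3096 gives E3 82 96, the two ends of the Hiragana range listed in the first post.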
The main delay at the moment is the 2nd translation from Hiragana to Kanji,
with the following example in Unicode order from 4E00 (non-consecutive iteration)
[一]=[ひとつ] [丁]=[ひのと] [七]=[ななつ] [万]=[よろず] [丈]=[たけ] [三]=[みっつ]
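To show the shape of that second translation step, here is a very small sketch of a reading-to-Kanji table built from the six pairs above. It assumes the source file is saved as UTF-8 so the string literals carry the same octets as the input, and it returns only the first match, whereas a real dictionary would need to return many candidates per reading. `ReadingEntry` and `kanji_for_reading` are names made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table entry: a Hiragana reading and the Kanji code point it maps to. */
struct ReadingEntry {
    const char *reading;   /* UTF-8 encoded Hiragana */
    uint32_t    kanji;     /* Unicode code point     */
};

/* The six examples from the post, in Unicode order from 4E00. */
static const struct ReadingEntry reading_table[] = {
    { "ひとつ", 0x4E00 },  /* 一 */
    { "ひのと", 0x4E01 },  /* 丁 */
    { "ななつ", 0x4E03 },  /* 七 */
    { "よろず", 0x4E07 },  /* 万 */
    { "たけ",   0x4E08 },  /* 丈 */
    { "みっつ", 0x4E09 },  /* 三 */
};

/* Linear search by reading; returns 0 when the reading is not in the table. */
static uint32_t kanji_for_reading(const char *reading)
{
    size_t i;

    assert(reading != NULL);
    for (i = 0; i < sizeof reading_table / sizeof reading_table[0]; i++) {
        if (strcmp(reading_table[i].reading, reading) == 0)
            return reading_table[i].kanji;
    }
    return 0;
}
```

A linear scan is fine for six entries; for a full dictionary a sorted table with binary search, or a hash keyed on the reading, would be the more likely choice.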
Re: UTF8 Glyph/Ideograph CodePoint validation and Rendering.
Belxjander wrote: Can the graphics.library/Text() routine please be modified for the following...
I'm still waiting for that IME which installs into the input stream and outputs UTF-8 sequences.
I would appreciate it if you focused only on the IME as we agreed.
The rest will follow but we must have that IME.
ExecSG Team Lead
Re: UTF8 Glyph/Ideograph CodePoint validation and Rendering.
I've been focusing on the IME, and looking at Unicode CodePoints and UTF8 encodings trying to think of a short method to index Japanese Kanji by readings.
I specifically have "chord"ing of input strings now completed and an initial alpha test build is on os4depot.net for anyone to check out.
There is "KDEBUG()" output on whatever channel Sashimi listens to ...
Just don't modify or change the IME files in LIBS: or LOCALE: while it is running... that causes an immediate crash and I don't yet know why.
The [Chord=XXXXXXXX] values are the search keys into a TagItem array, which is used for lookup of Unicode CodePoints right now.
Just put Perception.Library into LIBS: and the Japanese.Language file into Locale:Languages/
Select Japanese in the "Locale" preferences for a preferred language...and it will be active on the next restart.
I also have code set aside for the UTF8 conversion.
Do I need to present the UTF8 as a modified "deadkey" InputEvent, or by some other method?
P.S. I still have further code to add on top of what is already there; I've been focused on getting layered chording happening properly.
I've deliberately offloaded the IME into its own process, separate from input.device, so that it is possible to push events back through input.device if that is the way to go.
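For readers unfamiliar with the [Chord=XXXXXXXX] lookup mentioned above, the idea can be sketched in portable C roughly as follows. This is not the Perception.Library code: `ChordTag` merely mirrors the tag/data pair layout of an AmigaOS TagItem, the chord key values are invented, and on AmigaOS itself utility.library calls such as FindTagItem() or GetTagData() would presumably do the search instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct TagItem: a key and its associated data word. */
struct ChordTag {
    uint32_t ti_Tag;    /* chord key, as printed in [Chord=XXXXXXXX] */
    uint32_t ti_Data;   /* Unicode code point for that chord          */
};

#define CHORD_TAG_DONE 0u   /* list terminator, like TAG_DONE */

/* Walk the array until the terminator; return fallback if the chord is unknown. */
static uint32_t chord_lookup(const struct ChordTag *tags, uint32_t chord,
                             uint32_t fallback)
{
    assert(tags != NULL);
    for (; tags->ti_Tag != CHORD_TAG_DONE; tags++) {
        if (tags->ti_Tag == chord)
            return tags->ti_Data;
    }
    return fallback;
}

/* Hypothetical demo data: the key values here are made up for illustration. */
static const struct ChordTag demo_chords[] = {
    { 0x80000001u, 0x3042 },   /* あ */
    { 0x80000002u, 0x304B },   /* か */
    { 0x80000003u, 0x4E00 },   /* 一 */
    { CHORD_TAG_DONE, 0 }
};
```

Calling `chord_lookup(demo_chords, 0x80000002u, 0xFFFD)` returns 304B, while an unknown key falls back to FFFD, the Unicode replacement character.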
Re: UTF8 Glyph/Ideograph CodePoint validation and Rendering.
Belxjander wrote: Do I need to present the UTF8 as a modified "deadkey" InputEvent? or by some other method?
Having the IME produce a valid UTF-8 sequence output to anywhere (e.g. serial, stdout, file) is sufficient for now. Nothing more is needed yet.
Re: UTF8 Glyph/Ideograph CodePoint validation and Rendering.
ssolie wrote: Having the IME produce a valid UTF-8 sequence output to anywhere (e.g. serial, stdout, file) is sufficient for now. Nothing more is needed yet.
Does the Sashimi debug channel qualify...???