MobileRead Forums - View Single Post - Japanese dictionary for Pocketbook 626 Touch Lux 2

GreenAirplane · 02-14-2016, 02:26 PM

My apologies, I should have been more precise about the format. I meant to write that I used Notepad++ to convert the file from a format called "UTF-8 without BOM" to UTF-8.
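For anyone who wants to do the same conversion without Notepad++, here is a minimal sketch in Python. It assumes that the only difference between the two encodings is the three-byte mark (EF BB BF) at the start of the file, which is my understanding of what Notepad++ does. The function names and file paths are made up for illustration.

```python
import codecs


def add_utf8_bom(data):
    """Return the bytes prefixed with the UTF-8 BOM, unless it is already there."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


def convert_file(src_path, dst_path):
    """Copy src_path to dst_path, adding a BOM (paths are hypothetical)."""
    with open(src_path, "rb") as src:
        data = src.read()
    # Fail early if the input is not valid UTF-8 in the first place.
    data.decode("utf-8")
    with open(dst_path, "wb") as dst:
        dst.write(add_utf8_bom(data))
```

Calling `convert_file("dict.xdxf", "dict_bom.xdxf")` would write a copy with the mark added. Both file names are placeholders.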
As for the collates.txt file, I think I know how it works now. Problem is, I haven't the faintest idea how to apply this to Japanese. You see, Japanese uses 3 different writing systems. Actually, 4 if you count the Latin alphabet.
Two of these are phonetic: they represent syllables. (They're called hiragana and katakana, and they look like this: ひらがな and カタカナ.) Each has about 50 basic characters. Then there's the kanji (漢字). There are literally thousands of these rascals.
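To give an idea of how the two syllabaries line up, here is a small Python sketch that lists them from their Unicode ranges. The ranges (U+3041 to U+3096 for hiragana, U+30A1 to U+30F6 for katakana) come from the Unicode code charts. They include the small and voiced variants, so each list is longer than the roughly 50 basic characters. The helper name `kana_range` is my own.

```python
def kana_range(first, last):
    """Return every character from first to last, inclusive, in code point order."""
    return [chr(cp) for cp in range(ord(first), ord(last) + 1)]


HIRAGANA = kana_range("ぁ", "ゖ")  # U+3041 .. U+3096
KATAKANA = kana_range("ァ", "ヶ")  # U+30A1 .. U+30F6

# Each katakana sits exactly 0x60 code points after its hiragana twin.
for h, k in zip(HIRAGANA[:10], KATAKANA[:10]):
    print(h, k, hex(ord(k) - ord(h)))
```

The fixed offset means the two syllabaries can be paired up mechanically, character by character.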
The entries in the dictionary can be pure hiragana:

Code:

<ar><k>である</k>
(v5r) to be (formal, literary)</ar>

pure katakana:

Code:

<ar><k>ディーゼルエンジン</k>
(n) diesel engine</ar>

or, probably the most frequent, kanji with the pronunciation written in hiragana in square brackets:

Code:

<ar><k>電波 [でんぱ]</k>
(n) electro-magnetic wave (P)</ar>
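One way to think about ordering these three kinds of headwords is to reduce each one to a hiragana reading first. Below is a rough Python sketch of that idea. It assumes the reading, when present, is whatever sits between the square brackets, and that katakana can be folded to hiragana by the fixed 0x60 code point offset. The names `sort_key` and `READING` are made up, and I don't know whether collates.txt can express anything like this.

```python
import re

# Map katakana ァ..ヶ (U+30A1..U+30F6) onto the matching hiragana.
KATAKANA_TO_HIRAGANA = {cp: cp - 0x60 for cp in range(0x30A1, 0x30F7)}

# Reading in square brackets, e.g. the でんぱ in "電波 [でんぱ]".
READING = re.compile(r"\[([^\]]+)\]")


def sort_key(headword):
    """Return a hiragana-only key: the bracketed reading if any, else the headword."""
    match = READING.search(headword)
    reading = match.group(1) if match else headword
    # The long vowel mark ー is outside the mapped range and is left as it is.
    return reading.translate(KATAKANA_TO_HIRAGANA)


headwords = ["電波 [でんぱ]", "である", "ディーゼルエンジン"]
for word in sorted(headwords, key=sort_key):
    print(sort_key(word), "<-", word)
```

This sorts by raw code point, which is only an approximation of proper dictionary order (voiced sounds and small kana would need extra rules).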

How do I make a collate file for this?
