09-14-2023, 10:28 AM #11
kandwo
Addict
Posts: 360
Karma: 10703708
Join Date: Dec 2020
Device: Kindle Paperwhite 3
Quote:
Originally Posted by nezih
Code:
python .\add_inflections.py --dict-file '.\dicts\Wiktionary Russian-Russian\Wiktionary Russian-Russian.ifo' -j .\inflection_data\Russian.json.gz
I was able to add inflections to the dictionary with the command above. Note also that I moved the files closer to the script; you can try that too.

With the unmunched data in the inflection_data folder, 1,940,291 synwords were added to the dictionary. Here is the .ifo file of the output:
Spoiler:
StarDict's dict ifo file
version=3.0.0
bookname=Wiktionary Russian-Russian
wordcount=430770
idxfilesize=12875217
synwordcount=1940291
description=
I didn't realize I had to target the .ifo file specifically. Once I did, it all worked: it created a new folder containing the original files plus an additional .syn file.
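For anyone who wants to sanity-check the generated .syn file, here is a minimal Python sketch. It assumes the StarDict .syn layout described in the StarDict file format documentation: each record is a NUL-terminated UTF-8 word followed by a 32-bit unsigned big-endian index into the .idx file. The function names are my own hypothetical choices and are not part of add_inflections.py.

```python
import struct
from typing import Iterator, Tuple


def iter_syn_entries(data: bytes) -> Iterator[Tuple[str, int]]:
    """Yield (synonym, idx_index) pairs from the raw bytes of a .syn file.

    Assumed record layout: UTF-8 word, a NUL byte, then a 4-byte
    big-endian unsigned index pointing at the headword's entry in .idx.
    """
    pos = 0
    while pos < len(data):
        end = data.index(b"\0", pos)                  # end of the synonym string
        word = data[pos:end].decode("utf-8")
        (index,) = struct.unpack(">I", data[end + 1:end + 5])
        yield word, index
        pos = end + 5                                 # skip the NUL and the 4-byte index


def count_syn_entries(path: str) -> int:
    """Count the records in a .syn file (the path is whatever your file is called)."""
    with open(path, "rb") as f:
        return sum(1 for _ in iter_syn_entries(f.read()))
```

If the layout assumption holds, `count_syn_entries("Wiktionary Russian-Russian.syn")` should match the `synwordcount=` value in the .ifo (1,940,291 here), assuming the .syn file sits next to the .ifo with the same base name, as StarDict expects.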

I've tested it briefly and it seems to work well in most cases. The dictionary itself isn't the best, though, due to poor formatting and missing stress marks.

I noticed that the Russian Wiktionary that can be downloaded from within KOReader itself seems better and has an even bigger .syn file (almost twice as big). However, some words just aren't found for whatever reason. Since that dictionary already comes with a .syn file, I suppose it would be superfluous to run this script on it too?
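One quick way to see whether a dictionary already ships with synonyms is to read its .ifo: per the StarDict format documentation, a `synwordcount=` line is required whenever a .syn file exists. Below is a minimal sketch, assuming a plain key=value .ifo like the one quoted above; `parse_ifo` and `has_synonyms` are hypothetical helper names. It only tells you that a .syn file is declared, not whether it covers the words that aren't being found.

```python
def parse_ifo(text: str) -> dict:
    """Parse the text of a StarDict .ifo file into a key -> value dict."""
    lines = text.splitlines()
    # The first line is the fixed magic header; tolerate a leading BOM.
    if not lines or lines[0].lstrip("\ufeff").strip() != "StarDict's dict ifo file":
        raise ValueError("not a StarDict .ifo file")
    info = {}
    for line in lines[1:]:
        key, sep, value = line.partition("=")
        if sep:                                       # skip lines without '='
            info[key.strip()] = value.strip()
    return info


def has_synonyms(info: dict) -> bool:
    """True if the .ifo declares a non-zero synwordcount, i.e. a .syn file is expected."""
    return int(info.get("synwordcount", "0")) > 0
```

Running `has_synonyms(parse_ifo(open(path, encoding="utf-8").read()))` on each dictionary's .ifo, and comparing the two `synwordcount` values, gives a rough idea of how much inflection coverage each one already has.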

I'll have to experiment further when I have the time.