09-07-2021, 02:30 AM | #1 |
Connoisseur
Posts: 59
Karma: 10
Join Date: Mar 2019
Device: Kindle 3 Paperwhite
|
soft hyphens in docx conversion output
Soft hyphen marks (character U+00AD, or the entities &#173; or &shy;) that exist in the source HTML are exported to DOCX as literal SHY characters (code 00AD).
This is not quite the desired behaviour, because MS Word implements optional word breaks differently, and the 00AD characters themselves are simply displayed (they look much like standard hyphens). An exported DOCX document containing SHY characters can be repaired by searching for them (using the symbol ^0173) and replacing them: either with Word's "optional word break" (^-), or (mostly in my case) just deleting them by replacing them with nothing... Anyway: is this export behaviour intentional? Or is it, maybe, unavoidable for some reason? Is there any way to have SHY characters replaced with the MS Word "optional word break" as part of the conversion?
Last edited by quinta@ebf.cz; 09-07-2021 at 02:40 AM. |
09-07-2021, 05:13 AM | #2 |
creator of calibre
Posts: 44,019
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Well, just use the search and replace feature in the conversion dialog to replace the soft hyphen character with whatever you like (IIRC the zero width non-joiner is what Word uses for optional spaces).
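For anyone who wants to try the pattern outside calibre first, here is a minimal sketch. It assumes the search field in the conversion dialog accepts ordinary Python regular expressions, so the same pattern could be pasted there; the function name `strip_soft_hyphens` and the sample string are made up for illustration.

```python
import re

# U+00AD SOFT HYPHEN; as a regex it can also be written as \xad
SOFT_HYPHEN = "\u00ad"


def strip_soft_hyphens(text: str) -> str:
    """Delete every soft hyphen (replace it with nothing)."""
    return re.sub(SOFT_HYPHEN, "", text)


sample = "hy\u00adphen\u00adation"
print(strip_soft_hyphens(sample))
```

Leaving the replacement empty corresponds to the "just delete them" option from the first post.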
Last edited by kovidgoyal; 09-07-2021 at 05:16 AM. |
09-07-2021, 05:52 AM | #3 |
creator of calibre
Posts: 44,019
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
And note that in DOCX soft hyphens are represented as a special tag, not as a Unicode character. From the next release calibre will convert soft hyphens to that tag automatically. https://github.com/kovidgoyal/calibr...c9948658be0db8
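To illustrate what "a special tag" means here: WordprocessingML has a `w:softHyphen` element that sits inside a run, between the text pieces. The sketch below is not calibre's actual implementation (see the linked commit for that); it is only a rough illustration, and the helper name `run_with_soft_hyphens` is hypothetical.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, conventionally bound to the "w" prefix
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def run_with_soft_hyphens(text):
    """Build a w:r element in which each U+00AD becomes a w:softHyphen child."""
    run = ET.Element(f"{{{W_NS}}}r")
    for i, part in enumerate(text.split("\u00ad")):
        if i > 0:
            # One optional-hyphen element per soft hyphen in the input
            ET.SubElement(run, f"{{{W_NS}}}softHyphen")
        if part:
            t = ET.SubElement(run, f"{{{W_NS}}}t")
            t.text = part
    return run


run = run_with_soft_hyphens("hy\u00adphen")
print(ET.tostring(run, encoding="unicode"))
```

A real exporter would also have to carry run properties and whitespace handling (`xml:space="preserve"`) along, which this sketch leaves out.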
|
09-07-2021, 10:04 AM | #4 | ||
Connoisseur
Posts: 59
Karma: 10
Join Date: Mar 2019
Device: Kindle 3 Paperwhite
|
My attempts to create optional word breaks (OWB) using the suggested calibre "search and replace" export feature have not been successful so far. Well, all I tried was replacing with the expression \u200C (the Unicode value of the suggested "zero width non-joiner") and with the expression \u001F (hex for decimal 31)... Excuse my naive approach. : ) A possibly good reason for converting soft hyphens to OWB as calibre's default export behaviour: MS Word itself behaves that way. OWBs are converted to SHY when exporting to HTML, and vice versa (just tested in Word 2010).
|