05-30-2015, 12:44 PM | #1 |
Connoisseur
Posts: 89
Karma: 190508
Join Date: May 2014
Device: Android
|
Q: Regex Find and Replace delete surrounding tags
Hi all,
I have no clue what I'm doing, and now I'm stuck. I apparently understand the Find part of the Find and Replace function: it finds what I want it to find, but the Replace section won't cooperate, so there's something I don't understand. Any help would be appreciated.
I converted a docx to epub using Calibre, and I am now using the Edit Book app in Calibre to do final polishing. Calibre created a ton of <span class="calibre5"> tags. I'm learning to use css to format ebooks (and learning css), and my css already does what the <span class="calibre5"> tag is doing. So I want to delete the opening and closing span tags surrounding a string and leave the string itself untouched. In the code I have this: Code:
<span class="calibre5">The law is inevitable.</span> Code:
The law is inevitable. Code:
<span class="calibre5">[^<>]+</span> So if I put: Code:
(?<=<span class="calibre5">)(.*?)(?=<\/span>) Code:
The law is inevitable. I get: Code:
(?<=<span class="calibre5">)(.*?)(?=<\/span>) And if I put Code:
/1 Code:
/1 How can I learn what I'm doing wrong? |
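A minimal sketch of what seems to be going on with the patterns above, written in Python because its `re` module uses a very similar regex syntax. The sample string and the names `strip_with`, `lookaround` and `capturing` are made up for illustration, and the editor's own engine may differ in details. A lookbehind/lookahead only checks that the tags are there without making them part of the match, so replacing the match cannot remove them; and `/1` is plain text, not a backreference.

```python
import re

html = '<p><span class="calibre5">The law is inevitable.</span></p>'

# Lookarounds assert that the tags are present but do not consume them,
# so the match covers only the inner text.
lookaround = re.compile(r'(?<=<span class="calibre5">)(.*?)(?=<\/span>)')

# Here the tags are part of the match and only the text is captured.
capturing = re.compile(r'<span class="calibre5">([^<>]+)</span>')


def strip_with(pattern, text):
    """Replace each match with the contents of its first capture group."""
    return pattern.sub(r'\1', text)


unchanged = strip_with(lookaround, html)  # tags survive: they were never matched
cleaned = strip_with(capturing, html)     # tags removed, text kept
literal = capturing.sub('/1', html)       # '/1' is inserted as literal text

print(unchanged)
print(cleaned)
print(literal)
```

The capturing form is the one that deletes the tags, because the whole tag pair is matched and only group 1 is put back.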
05-30-2015, 01:24 PM | #2 |
Grand Sorcerer
Posts: 27,595
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Your slash is backwards in your replace expression.
It should be \1 and not /1. I liked your first expression best: Code:
<span class="calibre5">[^<>]+</span> Code:
<span class="calibre5">([^<>]+)</span> **Note that your expression will ignore any nested span situations (or any situations with <i> or <b> or the like included in the span). In other words it won't match anything in: Code:
<span class="calibre5">The law is <span>oops</span> inevitable. My suggestion (and I know this is going to sound like rampant self-promotion, but this is exactly why I created it in the first place)? I'd use my editor plugin, which allows you to remove an opening span and its matching closing tag without having to worry about nesting and the like. Last edited by DiapDealer; 05-30-2015 at 01:36 PM. |
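To make the note about nesting concrete, here is a small Python sketch. The sample strings are illustrative, the nested one has its outer closing tag added so that it is complete, and `unwrap` is a hypothetical helper name. Because `[^<>]+` refuses to cross any tag, spans that contain `<i>` or another `<span>` are simply left alone.

```python
import re

PATTERN = re.compile(r'<span class="calibre5">([^<>]+)</span>')


def unwrap(text):
    """Drop calibre5 spans whose content contains no other tags."""
    return PATTERN.sub(r'\1', text)


plain = '<span class="calibre5">The law is inevitable.</span>'
with_italic = '<span class="calibre5">The law is <i>quite</i> inevitable.</span>'
nested = '<span class="calibre5">The law is <span>oops</span> inevitable.</span>'

for sample in (plain, with_italic, nested):
    print(unwrap(sample))
```

Only the first sample changes; the other two come back untouched, which is the safe failure mode described in the post.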
05-30-2015, 01:47 PM | #3 | |||
Connoisseur
Posts: 89
Karma: 190508
Join Date: May 2014
Device: Android
|
Thanks for responding.
I was just about to do a "Eureka! I got it" kind of post, but I really appreciate you responding. It taught me some things I wasn't seeing properly for some reason. Quote:
Quote:
Here's what I ended up doing: Find: Code:
<span class="calibre5">(?<mygroup>([^<>]+))</span> Replace: Code:
\g<mygroup> Now I have to sit and think about: Quote:
Why would you do that to someone? Why? JK Thank you much. What's your tool btw? I'm going to click your profile and find out but I felt it would probably be polite to ask you first. |
|||
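For comparison, a hedged Python sketch of the named-group approach next to the numbered one. Python's standard `re` module spells a named group `(?P<name>...)`, so that spelling is used here instead of the `(?<mygroup>...)` form from the post; the sample string and variable names are illustrative.

```python
import re

NAMED = re.compile(r'<span class="calibre5">(?P<mygroup>[^<>]+)</span>')
NUMBERED = re.compile(r'<span class="calibre5">([^<>]+)</span>')

sample = '<p><span class="calibre5">The law is inevitable.</span></p>'

by_name = NAMED.sub(r'\g<mygroup>', sample)   # refer to the group by name
by_number = NUMBERED.sub(r'\1', sample)       # refer to the group by position
also_by_number = NAMED.sub(r'\1', sample)     # a named group is still group 1

print(by_name)
print(by_number)
print(also_by_number)
```

All three produce the same text, so the choice between names and numbers is mostly about readability when a pattern has many groups.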
05-30-2015, 01:49 PM | #4 |
Connoisseur
Posts: 89
Karma: 190508
Join Date: May 2014
Device: Android
|
Didn't notice that you'd linked it.
Nevermind. Frustration and euphoria have combined to make me drunk. Such it always is. |
05-30-2015, 04:01 PM | #5 | |
Grand Sorcerer
Posts: 27,595
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
Anything (OK not anything--but for the purpose of this exercise) in parentheses is a capture group. So in my expression: Code:
<span class="calibre5">([^<>]+)</span> Your way (named groups) will work, but there's no need to go to all that trouble, IMO. I learned nearly everything I know about regex from http://www.regular-expressions.info/ Last edited by DiapDealer; 05-30-2015 at 04:03 PM. |
|
05-31-2015, 12:14 AM | #6 | |
Wizard
Posts: 2,091
Karma: 8796704
Join Date: Jun 2010
Device: Kobo Clara HD,Hisence Sero 7 Pro RIP, Nook STR, jetbook lite
|
Quote:
bernie |
|
06-03-2015, 10:52 AM | #7 |
The Fumbler
Posts: 66
Karma: 10
Join Date: Jun 2015
Device: android 4.2/fbreader
|
You have probably already resolved this, but I simply use:
Search: <span class="calibre5">(.*?)</span> Replace: \1 Make sure you have selected "regex" as the search method, and that is it. Now, nested <span> tags could cause a problem, but that is for another discussion. Best of luck |
06-03-2015, 11:22 AM | #8 | |
Grand Sorcerer
Posts: 27,595
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
Code:
<span class="calibre5">([^<>]+)</span> Code:
<span class="calibre5">(.*?)</span> |
|
06-03-2015, 12:12 PM | #9 |
The Fumbler
Posts: 66
Karma: 10
Join Date: Jun 2015
Device: android 4.2/fbreader
|
Fumbling
Interesting, I am new to this and was not familiar with ^. The concern I have is that your search would not find:
<span class="calibre5">Now is the time for all <i>good men</i> to enjoy a coke.</span> or: <span class="calibre5">Now is the time for all <em>good men</em> to enjoy a coke.</span> This search would: <span class="calibre5">(((?!<span).)*?)</span> Code:
Unrelated, how do you get those "code boxes" to show up in your comment? In case I sound stupid, remember that I am just learning. |
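A short Python sketch of how the `((?!<span).)*?` construct behaves, using the sentences from this post plus one nested-span variant. `TEMPERED` and the variable names are made up for illustration. Each character is accepted only if a new `<span` does not start at that position, so `<i>` and `<em>` pass through while a nested span blocks the match.

```python
import re

TEMPERED = re.compile(r'<span class="calibre5">(((?!<span).)*?)</span>')

with_i = ('<span class="calibre5">Now is the time for all '
          '<i>good men</i> to enjoy a coke.</span>')
with_nested_span = ('<span class="calibre5">Now is the time for all '
                    '<span class="italic">good men</span> to enjoy a coke.</span>')

i_result = TEMPERED.sub(r'\1', with_i)
nested_result = TEMPERED.sub(r'\1', with_nested_span)

print(i_result)
print(nested_result)
```

The first string loses its calibre5 wrapper and keeps the `<i>` tags; the second is returned unchanged because the inner span is never crossed.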
06-03-2015, 02:56 PM | #10 |
Wizard
Posts: 1,079
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
I really like DiapDealer's plug in
https://www.mobileread.com/forums/sho...d.php?t=251365 You might look into it since I think it might handle even complex cases like that |
06-03-2015, 05:19 PM | #11 | |
Grand Sorcerer
Posts: 27,595
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
But it will also not match: Code:
<span class="calibre5">Now is the time for all <span class="italic">good men</span> to enjoy a coke.</span> <span class="calibre5">Now is the time for all <span class="italic">good men</span> to enjoy a coke.</span> and create malformed html when replaced. I'd rather have to make a second pass for the calibre5 spans that didn't get picked up with the first search than have the first search botch my html. I usually try to avoid using regex for matching ending tags with opening tags entirely (especially in potentially nested situations). |
|
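The difference between the two searches can be checked with a small Python sketch on the nested example from this post (variable names are illustrative). The lazy `.*?` stops at the first `</span>` it meets, which belongs to the inner span, so the wrong closing tag is removed; `[^<>]+` just declines to match.

```python
import re

LAZY = re.compile(r'<span class="calibre5">(.*?)</span>')
STRICT = re.compile(r'<span class="calibre5">([^<>]+)</span>')

nested = ('<span class="calibre5">Now is the time for all '
          '<span class="italic">good men</span> to enjoy a coke.</span>')

lazy_result = LAZY.sub(r'\1', nested)
strict_result = STRICT.sub(r'\1', nested)

print(lazy_result)    # the italic span now runs to the end of the sentence
print(strict_result)  # unchanged, left for a later pass
```

In the lazy result the remaining `</span>` ends up closing the italic span, so the markup no longer says what the original did; in a more tangled document the same mistake can leave tags improperly nested.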
06-03-2015, 08:44 PM | #12 | |
The Fumbler
Posts: 66
Karma: 10
Join Date: Jun 2015
Device: android 4.2/fbreader
|
So you are suggesting multiple passes to clear <span> tags. I like the idea.
In your example: Quote:
Code:
<span class="italic">(((?!<span).)*?)</span> Code:
<span class="calibre5">(((?!<span).)*?)</span> Great idea, I will give it a try. I think it might take more than two passes in some cases. |
|
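A rough Python sketch of the two-pass idea. The replacement used for the first pass is not shown in the post, so turning the italic span into `<i>...</i>` is an assumption made here for illustration, as are the names `INNER`, `OUTER` and `two_pass`.

```python
import re

INNER = re.compile(r'<span class="italic">(((?!<span).)*?)</span>')
OUTER = re.compile(r'<span class="calibre5">(((?!<span).)*?)</span>')


def two_pass(text):
    # Pass 1 (assumed replacement): turn the innermost italic span into <i>.
    text = INNER.sub(r'<i>\1</i>', text)
    # Pass 2: with no nested <span left inside, the outer wrapper can go.
    return OUTER.sub(r'\1', text)


nested = ('<span class="calibre5">Now is the time for all '
          '<span class="italic">good men</span> to enjoy a coke.</span>')

print(two_pass(nested))
```

Working from the inside out means each pass only ever touches a span that has no other span inside it, which is why deeper nesting needs more passes.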
06-03-2015, 08:52 PM | #13 |
Grand Sorcerer
Posts: 27,595
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
|
06-16-2015, 09:59 PM | #14 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
@Thom*-- See: s&r for paired tags
I recommended something like: Code:
<span(?: class="calibre5")?>((?:(?!<span).)*?)</span> But really, the OP should avoid the problem to begin with, by opening his DOCX in the editor directly. It will be converted into an EPUB for editing -- without having the CSS flattened! Or use Toxaris' Word addin. calibre's conversion is meant for finished products, to move to another device. And usually by end-users, not content creators. That is why the editor was designed to avoid the creation of tag soup. Last edited by eschwartz; 06-16-2015 at 10:01 PM. |
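As a sketch only, the pattern above can be applied repeatedly until nothing is left to replace, which handles any depth of nesting one layer per pass. This is written in Python; `unwrap_until_stable` and its `max_passes` safety cap are hypothetical names, and the sample strings are invented.

```python
import re

PAIRED = re.compile(r'<span(?: class="calibre5")?>((?:(?!<span).)*?)</span>')


def unwrap_until_stable(text, max_passes=20):
    """Strip innermost matching spans repeatedly until a pass changes nothing."""
    for _ in range(max_passes):
        text, count = PAIRED.subn(r'\1', text)
        if count == 0:
            break
    return text


deep = ('<span class="calibre5">a <span class="calibre5">b '
        '<span>c</span></span> d</span>')
mixed = '<span class="calibre5">x <span class="italic">y</span> z</span>'

print(unwrap_until_stable(deep))
print(unwrap_until_stable(mixed))
```

Note that a span with a different class, like the italic one in the second sample, is not matched by this pattern and still shields its calibre5 parent, so that case is left untouched rather than damaged.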
06-16-2015, 11:16 PM | #15 |
The Fumbler
Posts: 66
Karma: 10
Join Date: Jun 2015
Device: android 4.2/fbreader
|
I thought I was smart but you had this figured out a year ago.
Foiled again. |
|