MobileRead Forums > E-Book Software > Calibre > Editor
05-30-2015, 12:44 PM   #1
hidden.platypus
Connoisseur
Posts: 89
Karma: 190508
Join Date: May 2014
Device: Android
Q: Regex Find and Replace delete surrounding tags

Hi all,

I have no clue what I'm doing, and now I'm stuck.

I apparently understand the Find part of the Find and Replace function. It finds what I want it to find, but the Replace part won't cooperate, so there's something I don't understand. Any help would be appreciated.

I converted a docx to epub using Calibre.

I am now using the Edit Book app in Calibre to do final polishing.

Calibre created a ton of <span class="calibre5"> tags. I'm learning to use CSS to format ebooks (and learning CSS). What I'm attempting to do is delete the opening span tag and the closing span tag and leave what's inside the tag alone. I am using CSS to do what the <span class="calibre5"> tag is already doing.

So I want to remove the span tags surrounding a string and leave the string untouched.

In the code I have this:

Code:
<span class="calibre5">The law is inevitable.</span>
I want to do a search and replace that leaves only this:

Code:
The law is inevitable.
My search string is:
Code:
<span class="calibre5">[^<>]+</span>
But anything I put in the Replace box gets literally replaced.

So if I put:
Code:
(?<=<span class="calibre5">)(.*?)(?=<\/span>)
Then instead of getting
Code:
The law is inevitable.
when I push Replace

I get:
Code:
(?<=<span class="calibre5">)(.*?)(?=<\/span>)

And if I put

Code:
/1
I get
Code:
/1
So you get the picture.

How can I learn what I'm doing wrong?
05-30-2015, 01:24 PM   #2
DiapDealer
Grand Sorcerer
Posts: 27,595
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Your slash is backwards in your replace expression.

Should be \1 and not /1

I liked your first expression best:
Code:
<span class="calibre5">[^<>]+</span>
You just forgot to capture the span contents...
Code:
<span class="calibre5">([^<>]+)</span>
... so you could stick them back in without the span tags using just \1 for the replace.
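To see what the capture group does outside the editor, here is a minimal sketch that uses Python's standard `re` module as a stand-in for calibre's search (an assumption: for a pattern this simple I'd expect the same behavior). The replacement is written as a raw string so `\1` reaches the regex engine as a backreference, while `/1` is just two ordinary characters.

```python
import re

# Find: the span tags, with the text between them captured as group 1.
FIND = r'<span class="calibre5">([^<>]+)</span>'
# Replace: \1 means "whatever group 1 matched".
REPLACE = r'\1'

html = '<p><span class="calibre5">The law is inevitable.</span></p>'
result = re.sub(FIND, REPLACE, html)
print(result)  # <p>The law is inevitable.</p>

# With a forward slash there is no backreference, so "/1" goes in literally.
literal = re.sub(FIND, '/1', html)
print(literal)  # <p>/1</p>
```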

**Note that your expression will ignore any nested span situations (or any situations with <i> or <b> or the like included in the span). In other words it won't match anything in:
Code:
<span class="calibre5">The law is <span>oops</span> inevitable.
Nested tags can cause problems when trying to use regex alone to alter them if you're not careful.

My suggestion (and I know this is going to sound like rampant self-promotion, but this is exactly why I created it in the first place)? I'd use my editor plugin, which lets you remove an opening span and its matching closing tag without having to worry about nesting and the like.

Last edited by DiapDealer; 05-30-2015 at 01:36 PM.
05-30-2015, 01:47 PM   #3
hidden.platypus
Connoisseur
Thanks for responding.

I was just about to do a "Eureka! I got it!" kind of post, but I really appreciate you responding.

Taught me some things I wasn't seeing properly for some reason.

Quote:
Originally Posted by DiapDealer View Post
Your slash is backwards in your replace expression.

Should be \1 and not /1
In my frustration I wouldn't have noticed that, so thank you. I'll burn into my memory which slash is the right one.

Quote:
Originally Posted by DiapDealer View Post
You just forgot to capture the span contents...
Code:
<span class="calibre5">([^<>]+)</span>
I don't understand what that means. Where do I learn that?

Here's what I ended up doing:

Find:
Code:
<span class="calibre5">(?<mygroup>([^<>]+))</span>
Replace:
Code:
\g<mygroup>
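For comparison, a small Python sketch of the same named-group idea. One caveat, stated as an assumption about the tools rather than a fact from this thread: Python's standard `re` spells a named group `(?P<name>...)`, while the `(?<name>...)` form above is what calibre's editor accepted here. The sketch also shows that the extra inner parentheses in that pattern create a second, unnamed group holding the same text.

```python
import re

html = '<span class="calibre5">The law is inevitable.</span>'

# Python's re uses (?P<name>...) for a named group.
named = re.compile(r'<span class="calibre5">(?P<mygroup>([^<>]+))</span>')
m = named.search(html)

# Named groups are numbered too: mygroup is group 1, the inner (...) is group 2.
print(m.group('mygroup'))
print(m.group(1), '|', m.group(2))  # both groups hold the same text

# \g<mygroup>, \g<1> and \1 all insert that captured text in a replacement.
for repl in (r'\g<mygroup>', r'\g<1>', r'\1'):
    assert named.sub(repl, html) == 'The law is inevitable.'
```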

Now I have to sit and think about:

Quote:
Originally Posted by DiapDealer View Post
**Note that your expression will ignore any nested span situations (or any situations with <i> or <b> or the like included in the span). In other words it won't match anything in:
Code:
<span class="calibre5">The law is <span>oops</span> inevitable.

Why would you do that to someone?
Why?

JK

Thank you much.

What's your tool btw? I'm going to click your profile and find out but I felt it would probably be polite to ask you first.
05-30-2015, 01:49 PM   #4
hidden.platypus
Connoisseur
Didn't notice that you'd linked it.

Never mind. Frustration and euphoria have combined to make me drunk.
Such it always is.
05-30-2015, 04:01 PM   #5
DiapDealer
Grand Sorcerer
Quote:
Find:
Code:
<span class="calibre5">(?<mygroup>([^<>]+))</span>
Replace:
Code:
\g<mygroup>
Not really worth going to the trouble of creating a named group in this case.

Anything in parentheses (OK, not anything, but for the purpose of this exercise) is a capture group.

So in my expression:
Code:
<span class="calibre5">([^<>]+)</span>
Everything matched within the parentheses (a capture group) can be substituted in the Replace expression using \1. If you had more than one capture group, they would be \2, \3, and so on.

Your way (named groups) will work, but there's no need to go to all that trouble, IMO.
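Groups are numbered by the position of their opening parenthesis, counting from the left. A short Python sketch (using `re` as a stand-in for the editor; the class-capturing pattern and the `<!-- was ... -->` replacement are hypothetical, chosen only to show `\2` in action):

```python
import re

html = '<span class="calibre5">The law is inevitable.</span>'

# Group 1 captures the class name, group 2 the text inside the span.
pattern = re.compile(r'<span class="([^"]+)">([^<>]+)</span>')
m = pattern.search(html)
print(m.group(1))  # calibre5
print(m.group(2))  # The law is inevitable.

# In the replacement, \1 and \2 can be used in any order, or not at all.
replaced = pattern.sub(r'<!-- was \1 -->\2', html)
print(replaced)  # <!-- was calibre5 -->The law is inevitable.
```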

I learned nearly everything I know about regex from http://www.regular-expressions.info/

Last edited by DiapDealer; 05-30-2015 at 04:03 PM.
05-31-2015, 12:14 AM   #6
gbm
Wizard
Posts: 2,091
Karma: 8796704
Join Date: Jun 2010
Device: Kobo Clara HD,Hisence Sero 7 Pro RIP, Nook STR, jetbook lite
Quote:
Originally Posted by hidden.platypus View Post
So I want to remove the span tags surrounding a string and leave the string untouched.
I just do a search and replace on the opening span tag, then use Tools > Fix HTML - all files to remove the closing span tag.

bernie
06-03-2015, 10:52 AM   #7
Thom*
The Fumbler
Posts: 66
Karma: 10
Join Date: Jun 2015
Device: android 4.2/fbreader
You have probably already resolved this, but I simply use:

Search: <span class="calibre5">(.*?)</span>
Replace: \1

Make sure you have selected "regex" as the search mode, and that is it.

Now, nested <span> tags could cause a problem, but that is for another discussion.

Best of Luck
06-03-2015, 11:22 AM   #8
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by Thom* View Post
You have probably already resolved this, but I simply use:

Search: <span class="calibre5">(.*?)</span>
Replace: \1

Make sure you have selected "regex" as the search mode, and that is it.

Now, nested <span> tags could cause a problem, but that is for another discussion.

Best of Luck
Code:
<span class="calibre5">([^<>]+)</span>
Is infinitely less potentially destructive than
Code:
<span class="calibre5">(.*?)</span>
precisely because of nested spans.
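A quick way to see the difference, sketched in Python with `re` standing in for calibre's search (an assumption; I'd expect the same result for these patterns). The lazy `.*?` stops at the first `</span>`, which belongs to the inner span, so the outer opening tag and the inner closing tag are the ones removed. In this particular example the tags still balance afterwards, but the italic span now runs to the end of the sentence, which is exactly the kind of silent damage being warned about.

```python
import re

nested = ('<span class="calibre5">Now is the time for all '
          '<span class="italic">good men</span> to enjoy a coke.</span>')

lazy = r'<span class="calibre5">(.*?)</span>'
safe = r'<span class="calibre5">([^<>]+)</span>'

# Lazy dot: matches up to the INNER </span> and mangles the markup.
print(re.sub(lazy, r'\1', nested))
# Now is the time for all <span class="italic">good men to enjoy a coke.</span>

# [^<>]+ cannot cross any tag, so the nested case is simply left alone.
print(re.sub(safe, r'\1', nested) == nested)  # True
```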
06-03-2015, 12:12 PM   #9
Thom*
The Fumbler
Fumbling

Interesting, I am new to this and was not familiar with ^. The concern I have is that your search would not find:

<span class="calibre5">Now is the time for all <i>good men</i> to enjoy a coke.</span> or:
<span class="calibre5">Now is the time for all <em>good men</em> to enjoy a coke.</span>

This search would:

<span class="calibre5">(((?!<span).)*?)</span>
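A side note on `^`: at the start of a character class, `[^<>]` means "any character except `<` or `>`". The pattern above is a "tempered" dot: it consumes one character at a time but refuses to step onto `<span`. A Python sketch (again with `re` standing in for the editor's engine, assumed equivalent here) comparing the two on the `<i>` example and on a nested span:

```python
import re

safe = r'<span class="calibre5">([^<>]+)</span>'
# Tempered dot: any character, as long as "<span" doesn't start at it.
tempered = r'<span class="calibre5">(((?!<span).)*?)</span>'

with_italics = ('<span class="calibre5">Now is the time for all '
                '<i>good men</i> to enjoy a coke.</span>')
nested = ('<span class="calibre5">Now is the time for all '
          '<span class="italic">good men</span> to enjoy a coke.</span>')

# [^<>]+ refuses the <i> tags; the tempered version lets them through.
print(re.sub(safe, r'\1', with_italics) == with_italics)  # True (no match)
print(re.sub(tempered, r'\1', with_italics))
# Now is the time for all <i>good men</i> to enjoy a coke.

# Neither pattern touches a calibre5 span that contains another span.
print(re.sub(tempered, r'\1', nested) == nested)  # True
```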

Code:
Unrelated, how do you get those "code boxes" to show up in your comment?
Ah, I figured it out.

In case I sound stupid, remember that I am just learning.
06-03-2015, 02:56 PM   #10
phossler
Wizard
Posts: 1,079
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
I really like DiapDealer's plugin:

https://www.mobileread.com/forums/sho...d.php?t=251365

You might look into it, since I think it can handle even complex cases like that.
Attached thumbnail: Capture.JPG
06-03-2015, 05:19 PM   #11
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by Thom* View Post
Interesting, I am new to this and was not familiar with ^. The concern I have is that your search would not find:

<span class="calibre5">Now is the time for all <i>good men</i> to enjoy a coke.</span> or:
<span class="calibre5">Now is the time for all <em>good men</em> to enjoy a coke.</span>
Correct.
But it will also not match:
Code:
<span class="calibre5">Now is the time for all <span class="italic">good men</span> to enjoy a coke.</span>
Whereas your first expression, <span class="calibre5">(.*?)</span>, might match:

<span class="calibre5">Now is the time for all <span class="italic">good men</span> to enjoy a coke.</span>

and create malformed html when replaced.

I'd rather have to make a second pass for the calibre5 spans that didn't get picked up with the first search than have the first search botch my html.

I usually try to avoid using regex for matching ending tags with opening tags entirely (especially in potentially nested situations).
06-03-2015, 08:44 PM   #12
Thom*
The Fumbler
So you are suggesting multiple passes to clear <span> tags. I like the idea.

In your example:
Quote:
<span class="calibre5">Now is the time for all <span class="italic">good men</span> to enjoy a coke.</span>
I could first run
Code:
<span class="italic">(((?!<span).)*?)</span>
to get the inner tags, then run
Code:
<span class="calibre5">(((?!<span).)*?)</span>
to clear the outer tags.
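Running the innermost-first passes repeatedly until nothing changes handles deeper nesting without having to count passes. A minimal Python sketch; `strip_spans` is a hypothetical helper, `re` stands in for the editor, and like the passes described here it removes the spans entirely, so any styling the inner spans carried (the italics, in this example) is lost too.

```python
import re


def strip_spans(html, classes):
    """Hypothetical helper: unwrap <span class="X"> for each X in classes.

    Each pass only matches spans whose content contains no further "<span"
    (the innermost ones); passes repeat until a full pass changes nothing.
    As with re's default, "." here does not cross line breaks.
    """
    patterns = [
        re.compile(r'<span class="' + re.escape(c) + r'">((?:(?!<span).)*?)</span>')
        for c in classes
    ]
    while True:
        changes = 0
        for pattern in patterns:
            html, count = pattern.subn(r'\1', html)
            changes += count
        if changes == 0:
            return html


nested = ('<span class="calibre5">Now is the time for all '
          '<span class="italic">good men</span> to enjoy a coke.</span>')

# The order of the classes doesn't matter: the outer span is caught on a later pass.
print(strip_spans(nested, ['calibre5', 'italic']))
# Now is the time for all good men to enjoy a coke.
```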

Great idea, I will give it a try.
I think it might take more than two passes in some cases.
06-03-2015, 08:52 PM   #13
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by Thom* View Post
Great idea, I will give it a try.
I think it might take more than two passes in some cases.
Do it for the practice/experience ... then try my editor plugin (which would get all the calibre5 spans in one go--regardless of any nesting).
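This is not the plugin's code, just a sketch of the general idea behind handling nesting "in one go": pair each closing tag with its own opening tag by parsing, instead of guessing with a pattern. It uses Python's built-in `html.parser`; the names `SpanUnwrapper` and `unwrap_spans` are hypothetical, and it assumes the span carries exactly `class="calibre5"` (a span with several classes would not be recognized).

```python
from html.parser import HTMLParser


class SpanUnwrapper(HTMLParser):
    """Drop <span class="TARGET"> and its matching </span>; keep everything else.

    A stack records, for every open <span>, whether it was dropped, so each
    </span> is paired with its own opening tag however deeply spans nest.
    """

    def __init__(self, target_class):
        super().__init__(convert_charrefs=False)  # keep &amp; etc. as written
        self.target_class = target_class
        self.out = []
        self.stack = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            dropped = dict(attrs).get("class") == self.target_class
            self.stack.append(dropped)
            if dropped:
                return
        self.out.append(self.get_starttag_text())  # original tag text, verbatim

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>, kept as-is

    def handle_endtag(self, tag):
        if tag == "span" and self.stack and self.stack.pop():
            return  # this </span> closes a dropped span
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")


def unwrap_spans(html, target_class="calibre5"):
    parser = SpanUnwrapper(target_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


nested = ('<span class="calibre5">Now is the time for all '
          '<span class="italic">good men</span> to enjoy a coke.</span>')
print(unwrap_spans(nested))
# Now is the time for all <span class="italic">good men</span> to enjoy a coke.
```

Unlike the multi-pass regex approach, the inner italic span survives here, because only spans with the target class are dropped.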
06-16-2015, 09:59 PM   #14
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
@Thom*-- See: s&r for paired tags

I recommended something like:
Code:
<span(?: class="calibre5")?>((?:(?!<span).)*?)</span>
to match all content inside those span tags, unless there is another span tag in the content.
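A quick check of what that pattern does, sketched in Python with `re` standing in for calibre's search (assumed equivalent for this pattern): the optional `(?: class="calibre5")?` means bare `<span>` tags are unwrapped as well, while a span with any other class is left alone.

```python
import re

# Optional class attribute, then a tempered dot that refuses nested "<span".
pattern = re.compile(r'<span(?: class="calibre5")?>((?:(?!<span).)*?)</span>')

samples = [
    '<span class="calibre5">The law is inevitable.</span>',
    '<span>plain span</span>',
    '<span class="italic">kept</span>',
]
results = [pattern.sub(r'\1', s) for s in samples]
for line in results:
    print(line)
# The law is inevitable.
# plain span
# <span class="italic">kept</span>
```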

But really, the OP should avoid the problem to begin with, by opening his DOCX in the editor directly. It will be converted into an EPUB for editing -- without having the CSS flattened!
Or use Toxaris' Word addin.


calibre's conversion is meant for finished products, to move to another device. And usually by end-users, not content creators. That is why the editor was designed to avoid the creation of tag soup.

Last edited by eschwartz; 06-16-2015 at 10:01 PM.
06-16-2015, 11:16 PM   #15
Thom*
The Fumbler
I thought I was smart, but you had this figured out a year ago.

Foiled again.
All times are GMT -4.