MobileRead Forums > E-Book Software > Calibre > Recipes
06-16-2012, 09:38 AM   #1
abrazor
Junior Member
abrazor began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Nov 2009
Device: Kindle2
Aggregate Alternative News

Can someone please make a recipe for www.sott.net? I have absolutely no skill at this kind of thing. I would really appreciate it! Pretty please? Thank you in advance, maybe?
06-17-2012, 02:03 PM   #2
terminalveracity
Member
terminalveracity got an A in P-Chem.
 
Posts: 18
Karma: 6000
Join Date: Jun 2012
Device: Kindle
Here's a recipe for Signs of the Times (sott.net) using their main RSS feed:

Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class SignOfTheTimes(BasicNewsRecipe):
    title          = u'Signs of the Times'
    language       = 'en'
    __author__     = 'TerminalVeracity'
    oldest_article = 31  # days
    max_articles_per_feed = 50
    use_embedded_content = False

    extra_css             = """
                               h2{font-size: large; margin: .2em 0; text-decoration: none;}
                               .image-caption{font-size: medium; font-style:italic; margin: 0 0 1em 0;}
                               .article-info{font-size: small; font-style:italic; margin: 0 0 .5em 0;}
                            """

    remove_stylesheets = True
    remove_tags = [
       dict(name='div', attrs={'class':['article-icon','article-print','article-footer']}),
       dict(name='span', attrs={'class':['tiny']}),
    ]

    feeds          = [('Signs', 'http://www.sott.net/xml_engine/signs_rss'),]

    def preprocess_html(self, soup):
        story = soup.find(name='div', attrs={'class':'article'})
        soup = BeautifulSoup('<html><head><title>t</title></head><body></body></html>')
        body = soup.find(name='body')
        body.insert(0, story)
        return soup
The only niggling problem is that there's a stray line break (<br />) between the image and the caption text that I haven't figured out how to remove. Any hints?
Code:
<div class="image-caption"><br /><span class="caption">Microsoft has to buy patents for new Operating system? </span>
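For anyone wondering what oldest_article = 31 and max_articles_per_feed = 50 do: calibre ignores feed entries older than oldest_article days and downloads at most max_articles_per_feed articles from each feed. The following is only a rough imitation of that filtering using the Python standard library. It assumes an RSS 2.0 feed whose items carry a pubDate. SAMPLE_RSS, select_articles() and its arguments are hypothetical, and calibre's real feed handling (date parsing, ordering, undated entries) may differ.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET

# Hypothetical sample feed; the real one lives at the URL in the recipe's feeds list.
SAMPLE_RSS = """<rss version="2.0"><channel>
<item><title>Fresh story</title><link>http://example.com/a</link>
<pubDate>Sat, 16 Jun 2012 09:00:00 GMT</pubDate></item>
<item><title>Old story</title><link>http://example.com/b</link>
<pubDate>Mon, 02 Apr 2012 09:00:00 GMT</pubDate></item>
</channel></rss>"""


def select_articles(rss_text, now, oldest_article=31, max_articles_per_feed=50):
    """Return (title, link) pairs at most oldest_article days old, capped per feed."""
    cutoff = now - timedelta(days=oldest_article)
    selected = []
    for item in ET.fromstring(rss_text).iter('item'):
        # RFC 822 dates such as "Sat, 16 Jun 2012 09:00:00 GMT" parse to aware datetimes
        published = parsedate_to_datetime(item.findtext('pubDate'))
        if published >= cutoff:
            selected.append((item.findtext('title'), item.findtext('link')))
        if len(selected) >= max_articles_per_feed:
            break  # per-feed cap reached
    return selected


now = datetime(2012, 6, 17, tzinfo=timezone.utc)
print(select_articles(SAMPLE_RSS, now))  # [('Fresh story', 'http://example.com/a')]
```

Raising oldest_article to 90 in this sketch would let the April story through as well, while max_articles_per_feed=1 would cut the result back to a single entry.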
06-17-2012, 11:41 PM   #3
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
 
Posts: 44,006
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Code:
# Drop every <br> found inside an image-caption div
for div in soup.findAll(attrs={'class':'image-caption'}):
    for br in div.findAll('br'):
        br.extract()
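Kovid's loop relies on calibre's bundled BeautifulSoup. To try the same clean-up outside calibre, here is a rough stand-in built on Python's standard html.parser module instead (a swapped-in technique, not what calibre uses): it re-emits the markup and drops any <br> that appears inside an element whose class list contains image-caption. CaptionBreakStripper and strip_caption_breaks are hypothetical names, and the depth counting assumes well-formed markup in which every non-void element inside the caption is explicitly closed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the nesting depth
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'param', 'source', 'track', 'wbr'}


class CaptionBreakStripper(HTMLParser):
    """Re-emits HTML, dropping <br> tags inside any element classed 'image-caption'."""

    def __init__(self):
        # Keep entity and character references verbatim instead of decoding them
        super().__init__(convert_charrefs=False)
        self.out = []
        self.depth = 0  # open elements inside the current caption (0 = not in a caption)

    def handle_starttag(self, tag, attrs):
        if self.depth and tag == 'br':
            return  # the stray break: skip it
        if tag not in VOID_TAGS:
            if self.depth:
                self.depth += 1
            elif 'image-caption' in (dict(attrs).get('class') or '').split():
                self.depth = 1  # entering a caption element
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br /> never open a new nesting level
        if self.depth and tag == 'br':
            return
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1  # reaches 0 when the caption element itself closes
        self.out.append(f'</{tag}>')

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f'&{name};')

    def handle_charref(self, name):
        self.out.append(f'&#{name};')

    def handle_comment(self, data):
        self.out.append(f'<!--{data}-->')

    def handle_decl(self, decl):
        self.out.append(f'<!{decl}>')


def strip_caption_breaks(html):
    parser = CaptionBreakStripper()
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)


snippet = ('<div class="image-caption"><br /><span class="caption">'
           'Microsoft has to buy patents for new Operating system? </span></div>'
           '<p>Body text<br />next line</p>')
print(strip_caption_breaks(snippet))
```

Only the break inside the caption disappears; the <br /> in the ordinary paragraph is left alone, which matches what the BeautifulSoup loop does when it is limited to image-caption divs.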
06-22-2012, 06:04 AM   #4
abrazor
Junior Member
abrazor began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Nov 2009
Device: Kindle2
doesn't work

I pasted the code above, but it isn't accepted. Where should I insert Kovidgoyal's code?
06-22-2012, 11:57 AM   #5
NotTaken
Connoisseur
NotTaken is fluent in JavaScript as well as Klingon.
 
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
Just stick it after the body.insert(0, story) line in preprocess_html. Remember to respect Python's indentation rules.
06-28-2012, 12:48 PM   #6
terminalveracity
Member
terminalveracity got an A in P-Chem.
 
Posts: 18
Karma: 6000
Join Date: Jun 2012
Device: Kindle
Here's the fixed recipe. Thanks for the hint, Kovid.


Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class SignOfTheTimes(BasicNewsRecipe):
    title          = u'Signs of the Times'
    language       = 'en'
    __author__     = 'TerminalVeracity'
    oldest_article = 31  # days
    max_articles_per_feed = 50
    use_embedded_content = False

    extra_css             = """
                               h2{font-size: large; margin: .2em 0; text-decoration: none;}
                               .image-caption{font-size: medium; font-style:italic; margin: 0 0 1em 0;}
                               .article-info{font-size: small; font-style:italic; margin: 0 0 .5em 0;}
                            """

    remove_stylesheets = True
    remove_tags = [
       dict(name='div', attrs={'class':['article-icon','article-print','article-footer']}),
       dict(name='span', attrs={'class':['tiny']}),
    ]

    feeds          = [('Signs', 'http://www.sott.net/xml_engine/signs_rss'),]

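    def preprocess_html(self, soup):
        # Keep only the article body; leave the page untouched if the layout differs
        story = soup.find(name='div', attrs={'class':'article'})
        if story is None:
            return soup
        soup = BeautifulSoup('<html><head><title>t</title></head><body></body></html>')
        body = soup.find(name='body')
        body.insert(0, story)
        # Remove the stray <br /> between each image and its caption
        for div in soup.findAll(attrs={'class':'image-caption'}):
            for br in div.findAll('br'):
                br.extract()
        return soup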
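The first half of preprocess_html keeps only the <div class="article"> and moves it into a bare skeleton page. The sketch below mimics that step offline with Python's standard html.parser in place of BeautifulSoup. ArticleExtractor and isolate_article are hypothetical names. The sketch assumes the article div is well-formed, matches the class as a whitespace-separated token (BeautifulSoup's own matching rules may differ), and, as an assumption of this sketch, returns the page unchanged when no article div is found.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the nesting depth
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'param', 'source', 'track', 'wbr'}


class ArticleExtractor(HTMLParser):
    """Collects the raw markup of the first <div> whose class list contains 'article'."""

    def __init__(self):
        super().__init__(convert_charrefs=False)  # keep &amp; and friends verbatim
        self.parts = []
        self.depth = 0      # open elements inside the article div (0 = outside it)
        self.done = False   # True once the first article div has been closed

    def _keep(self, text):
        if self.depth:
            self.parts.append(text)

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag not in VOID_TAGS:
                self.depth += 1
        elif tag == 'div' and 'article' in (dict(attrs).get('class') or '').split():
            self.depth = 1  # found the story container
        self._keep(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self._keep(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self._keep(f'</{tag}>')  # keep the close tag before leaving the level
            self.depth -= 1
            self.done = self.depth == 0

    def handle_data(self, data):
        self._keep(data)

    def handle_entityref(self, name):
        self._keep(f'&{name};')

    def handle_charref(self, name):
        self._keep(f'&#{name};')


def isolate_article(html):
    """Wrap the first article div in a bare page; return html unchanged if there is none."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    if not parser.parts:
        return html
    return ('<html><head><title>t</title></head><body>'
            + ''.join(parser.parts) + '</body></html>')


page = ('<html><body><div class="menu">Nav</div>'
        '<div class="article"><h2>Headline</h2><p>Text &amp; more<br>line</p></div>'
        '<div class="article-footer">Footer</div></body></html>')
print(isolate_article(page))
```

The menu and the article-footer div are dropped because only the first div carrying the exact class token article is copied into the new page.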


All times are GMT -4.

