Aggregate Alternative News

abrazor · 06-16-2012, 09:38 AM

Can someone please make a recipe for www.sott.net? I have absolutely no skill at this kind of thing. I would really appreciate it! Pretty please? Thank you in advance maybe?

terminalveracity · 06-17-2012, 02:03 PM

Here's a recipe for Signs of the Times using their main feed:

Code:

import re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class SignOfTheTimes(BasicNewsRecipe):
    title          = u'Signs of the Times'
    language       = 'en'
    __author__     = 'TerminalVeracity'
    oldest_article = 31  # days
    max_articles_per_feed = 50
    use_embedded_content = False

    extra_css             = """
                               h2{font-size: large; margin: .2em 0; text-decoration: none;}
                               .image-caption{font-size: medium; font-style:italic; margin: 0 0 1em 0;}
                               .article-info{font-size: small; font-style:italic; margin: 0 0 .5em 0;}
                            """

    remove_stylesheets = True
    remove_tags = [
       dict(name='div', attrs={'class':['article-icon','article-print','article-footer']}),
       dict(name='span', attrs={'class':['tiny']}),
    ]

    feeds          = [('Signs', 'http://www.sott.net/xml_engine/signs_rss'),]

    def preprocess_html(self, soup):
        story = soup.find(name='div', attrs={'class':'article'})
        soup = BeautifulSoup('<html><head><title>t</title></head><body></body></html>')
        body = soup.find(name='body')
        body.insert(0, story)
        return soup

The only niggling problem is that there's a stray break between the image and the caption text that I haven't figured out how to remove. Any hints?

Code:

<div class="image-caption"><br /><span class="caption">Microsoft has to buy patents for new Operating system? </span>

kovidgoyal · 06-17-2012, 11:41 PM

Code:

for div in soup.findAll(attrs={'class':'image-caption'}):
    for br in div.findAll('br'):
        br.extract()
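
For anyone who wants to see what that loop does without running calibre, here is a rough standalone sketch. It is not the BeautifulSoup approach from the post: it swaps in plain regular expressions from the standard library, and it assumes the image-caption divs are never nested and always carry exactly class="image-caption". The function name strip_caption_breaks and the sample markup (including the closing </div>) are made up for illustration.

```python
import re

# Matches one caption container; assumes image-caption divs are not nested.
CAPTION_DIV = re.compile(r'(<div class="image-caption">)(.*?)(</div>)', re.DOTALL)
# Matches <br>, <br/> and <br />.
BREAK_TAG = re.compile(r'<br\s*/?>', re.IGNORECASE)


def strip_caption_breaks(html):
    """Remove <br> tags inside image-caption divs, leaving other breaks alone."""
    def clean(match):
        opening, inner, closing = match.groups()
        return opening + BREAK_TAG.sub('', inner) + closing
    return CAPTION_DIV.sub(clean, html)


# Hypothetical sample: one break in ordinary text, one stray break in a caption.
sample = (
    '<p>Intro<br />text</p>'
    '<div class="image-caption"><br /><span class="caption">'
    'Microsoft has to buy patents for new Operating system? </span></div>'
)
cleaned = strip_caption_breaks(sample)
print(cleaned)
```

Inside a recipe the BeautifulSoup loop is the better fit, since it works on the parsed tree and does not care how the attributes are written.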

abrazor · 06-22-2012, 06:04 AM

I pasted the code above, but calibre does not accept it. Where should I insert the code by kovidgoyal?

NotTaken · 06-22-2012, 11:57 AM

Just stick it after the body.insert(0, story) in preprocess_html. Remember to respect Python's indentation rules.
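
To show why the indentation matters, here is a small sketch that only uses the standard library. It feeds two versions of the method to Python's built-in compile(): one with the loop indented to match the method body, and one with the loop pasted at the left margin. The helper name check_syntax and both source strings are assumptions for illustration, not part of the recipe.

```python
# Loop indented to the same level as the other statements in the method.
GOOD = (
    "def preprocess_html(self, soup):\n"
    "    body = soup.find(name='body')\n"
    "    for div in soup.findAll(attrs={'class': 'image-caption'}):\n"
    "        for br in div.findAll('br'):\n"
    "            br.extract()\n"
    "    return soup\n"
)

# Same code with the loop pasted at column 0, as copied straight from a post.
BAD = (
    "def preprocess_html(self, soup):\n"
    "    body = soup.find(name='body')\n"
    "for div in soup.findAll(attrs={'class': 'image-caption'}):\n"
    "    for br in div.findAll('br'):\n"
    "        br.extract()\n"
    "    return soup\n"
)


def check_syntax(source):
    """Return (True, '') if the source compiles, else (False, error message)."""
    try:
        compile(source, '<recipe>', 'exec')
    except SyntaxError as err:  # IndentationError is a subclass of SyntaxError
        return False, str(err)
    return True, ''


good_ok, good_msg = check_syntax(GOOD)
bad_ok, bad_msg = check_syntax(BAD)
print(good_ok, good_msg)
print(bad_ok, bad_msg)
```

The second version is rejected because the unindented loop ends the method early, so the return that follows no longer sits inside a function.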

terminalveracity · 06-28-2012, 12:48 PM

Here's the fixed recipe. Thanks for the hint, Kovid.

Code:

import re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class SignOfTheTimes(BasicNewsRecipe):
    title          = u'Signs of the Times'
    language       = 'en'
    __author__     = 'TerminalVeracity'
    oldest_article = 31  # days
    max_articles_per_feed = 50
    use_embedded_content = False

    extra_css             = """
                               h2{font-size: large; margin: .2em 0; text-decoration: none;}
                               .image-caption{font-size: medium; font-style:italic; margin: 0 0 1em 0;}
                               .article-info{font-size: small; font-style:italic; margin: 0 0 .5em 0;}
                            """

    remove_stylesheets = True
    remove_tags = [
       dict(name='div', attrs={'class':['article-icon','article-print','article-footer']}),
       dict(name='span', attrs={'class':['tiny']}),
    ]

    feeds          = [('Signs', 'http://www.sott.net/xml_engine/signs_rss'),]

    def preprocess_html(self, soup):
        # Keep only the main article container and drop the rest of the page.
        story = soup.find(name='div', attrs={'class':'article'})
        soup = BeautifulSoup('<html><head><title>t</title></head><body></body></html>')
        body = soup.find(name='body')
        body.insert(0, story)
        # Remove the stray line breaks between each image and its caption.
        for div in soup.findAll(attrs={'class':'image-caption'}):
            for br in div.findAll('br'):
                br.extract()
        return soup
