07-15-2023, 07:15 AM | #1 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Aug 2022
Device: PC
|
The Bloomberg Weekly recipe is also failing: it cannot fetch the article content, and the output is full of broken icons.
Please take a look when you have time. Thank you very much.
|
07-17-2023, 12:01 AM | #2 |
Evangelist
Posts: 465
Karma: 82692
Join Date: May 2021
Device: kindle
|
https://github.com/unkn0w7n/calibre/...ef128ea65a5d3c
https://github.com/unkn0w7n/calibre/...ebf0c8faa5580e
Fixed both for now, but there must be a better way to do this. I can't get the graphs/data images like before, or the lists/tables from the JSON, and the hyperlink tags are also missing. If anyone knows how to improve it, feel free to change the recipe and submit it here or on GitHub. |
07-17-2023, 06:03 AM | #3 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Aug 2022
Device: PC
|
At least the articles can be extracted now. Thanks!
|
07-18-2023, 01:10 AM | #4 |
Evangelist
Posts: 465
Karma: 82692
Join Date: May 2021
Device: kindle
|
|
07-18-2023, 06:47 AM | #5 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Aug 2022
Device: PC
|
There is a problem: the download often fails partway through. I increased the delay to 20 seconds and still get the same problem. Only a few of the articles are extracted, not all of them. Please take another look. Thank you very much.
|
07-18-2023, 09:08 AM | #6 |
Evangelist
Posts: 465
Karma: 82692
Join Date: May 2021
Device: kindle
|
I also ran into this when testing, so I increased the delay, and it seems to be working now. 20 seconds is too much, but I think delay is the only tool we have. Maybe we could include a random pause.
It is also a lot faster than before, as the HTML is now built locally. Last edited by unkn0wn; 07-18-2023 at 09:28 AM. |
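To illustrate the random-pause idea, here is a minimal sketch in plain Python. It is not the code from the recipe: `random_pause`, `fetch_all`, and the 1 to 5 second range are hypothetical names and values chosen for the example. It only shows the general technique of sleeping for a randomly chosen interval between requests instead of a fixed one.

```python
import random
import time


def random_pause(min_s=1.0, max_s=5.0, sleep=time.sleep, rng=random):
    """Sleep for a random duration in [min_s, max_s] seconds and return it.

    The bounds are illustrative assumptions, not values from the recipe.
    """
    if min_s < 0 or max_s < min_s:
        raise ValueError("need 0 <= min_s <= max_s")
    pause = rng.uniform(min_s, max_s)
    sleep(pause)
    return pause


def fetch_all(urls, fetch, pause=random_pause):
    """Fetch each URL in turn, pausing a random interval between requests."""
    results = []
    for i, url in enumerate(urls):
        if i:  # no pause before the first request
            pause()
        results.append(fetch(url))
    return results
```

Passing `sleep` and `pause` as parameters keeps the helper easy to try out without actually waiting, for example `random_pause(0.5, 1.5, sleep=lambda s: None)`.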
07-18-2023, 09:39 AM | #7 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Aug 2022
Device: PC
|
Bloomberg's bot detection is very strong. After about 20% of the content has been fetched, the download fails and none of the remaining content can be fetched.
|
07-18-2023, 09:52 AM | #8 |
Evangelist
Posts: 465
Karma: 82692
Join Date: May 2021
Device: kindle
|
Give this a test, using a VPN or some other new IP address.
|
07-18-2023, 10:18 AM | #9 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Aug 2022
Device: PC
|
It worked, thanks a lot. There were no failures partway through, but I don't know yet whether it is stable.
|
07-18-2023, 10:44 AM | #10 |
Evangelist
Posts: 465
Karma: 82692
Join Date: May 2021
Device: kindle
|
Hmm, it also worked for me. I'll implement this. Here the delay is random, and it is longer for articles with more text and images.
|
07-19-2023, 01:30 AM | #11 |
Evangelist
Posts: 465
Karma: 82692
Join Date: May 2021
Device: kindle
|
https://github.com/unkn0w7n/calibre/...c43dcccd1a4570
I don't know how this is working: the delay is removed and 5 articles are downloaded simultaneously, yet somehow the pauses make things random enough that Bloomberg fails to detect it. |
07-19-2023, 01:33 AM | #12 |
creator of calibre
Posts: 44,019
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Note that you can also lower simultaneous_downloads to reduce the 5.
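For reference, a hedged fragment showing where these settings live in a recipe. The class name, title, and the value 2 are placeholders, and this is not the Bloomberg recipe itself: a real recipe also needs feeds or a parse_index method. In calibre, simultaneous_downloads defaults to 5 and delay defaults to 0, and the calibre documentation notes that simultaneous_downloads is automatically reduced to 1 when delay is greater than 0, which would explain why removing the delay brought back 5 parallel downloads.

```python
from calibre.web.feeds.news import BasicNewsRecipe


class BloombergSketch(BasicNewsRecipe):  # hypothetical class name
    title = 'Bloomberg (sketch)'  # placeholder title

    # Number of articles fetched in parallel; calibre's default is 5.
    # The value 2 is only an example.
    simultaneous_downloads = 2

    # Seconds to wait between consecutive downloads; calibre's default is 0.
    # Per the calibre docs, any delay above 0 forces simultaneous_downloads to 1.
    delay = 0
```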
|
07-19-2023, 02:25 AM | #13 |
Evangelist
Posts: 465
Karma: 82692
Join Date: May 2021
Device: kindle
|
I did try that, but this works and the whole recipe takes less than 15 minutes to fetch.
Earlier it took half an hour to fetch, and Bloomberg would still somehow detect it halfway through. Last edited by unkn0wn; 07-19-2023 at 02:33 AM. |