MobileRead Forums - Advice on how to scrape or use an api for thousands of books?

Ico · 01-05-2024, 10:26 AM

Quote:

Originally Posted by kovidgoyal

calibre does not use google apis it queries the same urls as you do when you use a browser. And google rate limits these queries to 50 odd a day.

That is great. That is what I actually meant to ask; sorry, I got a bit confused because I am trying to finish several projects at once.

Could you please describe the workflow in a sentence or two?
Which websites do you query, and would a headless browser driven by Selenium be enough?
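
One way to read "the workflow" as code: for each book, build a search query, fetch the results page, parse out the fields you want, and store them. The sketch below keeps the fetch step pluggable, so the same loop works whether pages come from plain HTTP requests or from a headless browser. All names here (`Book`, `build_query`, `run_pipeline`, the `fetch` and `parse` callables) are hypothetical, and this is not a description of how calibre is structured internally.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Book:
    title: str
    author: str
    # Fields we hope to fill in from the scraped page (hypothetical schema).
    metadata: Dict[str, str] = field(default_factory=dict)


def build_query(book: Book) -> str:
    # Quote title and author so the search engine treats each as a phrase.
    return f'"{book.title}" "{book.author}"'


def run_pipeline(
    books: Iterable[Book],
    fetch: Callable[[str], str],
    parse: Callable[[str], Optional[Dict[str, str]]],
) -> List[Book]:
    """Query, fetch and parse each book; books with no parse result are skipped."""
    done: List[Book] = []
    for book in books:
        html = fetch(build_query(book))  # plain HTTP or a headless browser
        fields = parse(html)
        if fields:
            book.metadata.update(fields)
            done.append(book)
    return done
```

In real use, `fetch` could turn the query into a search URL and load it with a Selenium WebDriver (`driver.get(url)`, then reading `driver.page_source`). A headless browser probably does not by itself get around rate limiting, so it still needs to be paired with throttling.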

I also wanted to use a browser, but I was worried that the site or the search engine might block me.
Someone praised my project and said it would help them with 50,000 books.
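
A common way to lower the risk of being blocked is to throttle yourself: wait a randomized delay between requests and stop once a daily budget is spent. The sketch below assumes a budget of 50 requests per day, taken from the rough figure quoted above for Google; the real limit is not documented and may differ by site. `Throttle` and its parameter values are hypothetical, and the clock and sleep functions are injectable so the logic can be tested without actually waiting.

```python
import random
import time
from typing import Callable


class Throttle:
    """Randomized delay between requests plus a hard per-day request budget."""

    def __init__(
        self,
        daily_budget: int = 50,   # assumption: ~50 queries/day, per the quote
        min_delay: float = 5.0,   # seconds; hypothetical values
        max_delay: float = 15.0,
        clock: Callable[[], float] = time.time,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.daily_budget = daily_budget
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.clock = clock
        self.sleep = sleep
        self.window_start = clock()
        self.used = 0

    def acquire(self) -> bool:
        """Return True if a request may be sent now, False if today's budget is spent."""
        now = self.clock()
        if now - self.window_start >= 86_400:  # start a new 24-hour window
            self.window_start = now
            self.used = 0
        if self.used >= self.daily_budget:
            return False
        if self.used > 0:
            # Sleep a random interval before each later request so they look less bursty.
            self.sleep(random.uniform(self.min_delay, self.max_delay))
        self.used += 1
        return True
```

The caller checks `acquire()` before each request and, when it returns False, saves progress and resumes in the next window.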

I immediately thought that my project couldn't handle that, and I have spent days trying to make it work, since it would be a nice feature for my portfolio.

I thought about the complexity of the algorithm, but even at O(n log n), 50,000 books come to under a million operations, which isn't much. If I run into trouble, I could port the Python code to Go and switch back once Python drops the GIL.
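
A quick back-of-the-envelope check suggests the CPU side is unlikely to be the bottleneck. It assumes n = 50,000 books, the roughly 50 queries per day quoted above, and one query per book (an assumption; a real lookup may need more than one).

```python
import math

n_books = 50_000
queries_per_day = 50  # rough figure from the quote; not a documented limit

# Even an O(n log n) pass over the whole library stays under a million steps.
cpu_steps = n_books * math.log2(n_books)

# With one query per book, the rate limit alone sets the wall-clock time.
days_needed = math.ceil(n_books / queries_per_day)

print(f"~{cpu_steps:,.0f} CPU steps, {days_needed} days of queries")
```

Under these assumptions the request budget, not the algorithm, dominates, so porting to Go or waiting for a GIL-free Python would likely not change the total time much.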

The logic I found here:
https://github.com/kovidgoyal/calibr...mazon.py#L1094
https://github.com/kovidgoyal/calibr...ngines.py#L177

goes way over my head.
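
Stripped down, a search-engine-based lookup amounts to: run a web search restricted to one site, then keep only the result links that point at that site's product pages. The sketch below shows just the link-filtering step with the standard-library `html.parser`. It is a simplified illustration, not a translation of the calibre code linked above; the domain, the `/dp/` path marker and the sample HTML are assumptions.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urlparse


class ResultLinkParser(HTMLParser):
    """Collect href values of <a> tags whose host is a target domain or a subdomain of it."""

    def __init__(self, domain: str, path_marker: str = "") -> None:
        super().__init__()
        self.domain = domain
        self.path_marker = path_marker
        self.links: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        parsed = urlparse(href)
        host = parsed.hostname or ""
        on_site = host == self.domain or host.endswith("." + self.domain)
        # Keep links to the target site whose path contains the marker; skip other hosts.
        if on_site and self.path_marker in parsed.path and href not in self.links:
            self.links.append(href)


def extract_links(html: str, domain: str, path_marker: str = "") -> List[str]:
    parser = ResultLinkParser(domain, path_marker)
    parser.feed(html)
    return parser.links
```

Real results pages often wrap links in redirect URLs, which this sketch does not unwrap; that kind of site-specific handling is much of what makes production scrapers long.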

I don't know whether I should focus exclusively on cached pages and instant searches, or whether I could simply search for {Title} AND {Author} and get {publisher}, {rating} and {rating_count} back in the results.
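
Fields such as publisher, rating and rating count are generally not search operators: they have to be parsed from the page you land on, so the query itself only needs the title and author. The sketch below builds such a query and wraps lookups in a small on-disk cache, so that a crash or a spent daily budget does not cost you the books already done. The search endpoint, the `site:` filter, the cache key and the JSON file are all assumptions for illustration.

```python
import json
import os
from typing import Callable, Dict, Optional
from urllib.parse import urlencode

# Assumption: DuckDuckGo's HTML endpoint with a "q" parameter; any search URL would do.
SEARCH_URL = "https://html.duckduckgo.com/html/"


def search_url(title: str, author: str, site: Optional[str] = None) -> str:
    """Build a search URL for a quoted title and author, optionally limited to one site."""
    # Most engines combine terms with AND by default, so no explicit AND is added.
    q = f'"{title}" "{author}"'
    if site:
        q += f" site:{site}"
    return SEARCH_URL + "?" + urlencode({"q": q})


class JsonCache:
    """Tiny on-disk cache so a re-run never spends a query on a book it already has."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.data: Dict[str, Dict[str, str]] = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.data = json.load(f)

    def get_or_fetch(self, key: str, fetch: Callable[[str], Dict[str, str]]) -> Dict[str, str]:
        if key not in self.data:
            self.data[key] = fetch(key)  # only calls fetch on a cache miss
            with open(self.path, "w", encoding="utf-8") as f:
                json.dump(self.data, f)
        return self.data[key]
```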