01-28-2011, 10:05 PM | #31 |
Calibre Plugins Developer
Posts: 4,653
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Actually just had another thought on the "scope" of the duplicates search. Presumably you could have an option allowing you to choose "books added today", "this week", "this month", "all books" and use that as your "start set" for comparison, rather than comparing every book in your database against every other book every time...
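To make the "start set" idea concrete, here is a minimal sketch. It assumes each book carries an "added" timestamp; the scope names, the `SCOPE_DAYS` table and `start_set` are hypothetical, not existing Calibre options, and "today" is approximated as the last 24 hours.

```python
from datetime import datetime, timedelta

# Hypothetical scope names mapped to a window in days (None = no limit).
SCOPE_DAYS = {"today": 1, "this week": 7, "this month": 30, "all books": None}

def start_set(books, scope, now):
    """Return the ids of books added within the chosen scope.

    `books` maps book id -> datetime the book was added (assumed layout).
    """
    days = SCOPE_DAYS[scope]
    if days is None:
        return set(books)
    cutoff = now - timedelta(days=days)
    return {book_id for book_id, added in books.items() if added >= cutoff}

now = datetime(2011, 1, 28, 22, 0)
books = {
    1: datetime(2011, 1, 28, 9, 0),
    2: datetime(2011, 1, 25, 9, 0),
    3: datetime(2010, 11, 1, 9, 0),
}
recent = start_set(books, "this week", now)
```

Only the books in the start set would then be compared against the rest of the library, so the cost grows with the number of recent additions rather than with the square of the library size.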
|
01-28-2011, 10:55 PM | #32 |
Well trained by Cats
Posts: 29,981
Karma: 56143930
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
I worry about "title only" matching. I have 2 or 3 'duplicate titles' that are not duplicates (different authors, different books).
Then there is the case of different 'editions' of a book, when it changes publisher and gets an edit job. I prefer an 'always ask' option (toss, make new entry, merge). Issues could be held in a queue so as not to interrupt the rest of the batch, and presented to the user near the end (like the current problem status, only allowing the user to browse the library before marking what to do). |
|
01-28-2011, 11:27 PM | #33 | |
US Navy, Retired
Posts: 9,865
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
Quote:
What they're talking about here is well past the "title only" matching level. |
|
01-29-2011, 08:25 AM | #34 |
Well trained by Cats
Posts: 29,981
Karma: 56143930
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
|
01-30-2011, 10:56 AM | #35 | |
Addict
Posts: 239
Karma: 237
Join Date: Jun 2010
Location: OH USA
Device: Sony PRS 900(gave it to my sister); Sony PRS-T1; onyx book note air
|
we all have our addictions
Quote:
My goal is to eventually have every one of those books plus whatever I buy new on my reader so 30,000 does not seem unreasonable to me. |
|
|
02-01-2011, 01:36 PM | #36 | |||||||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
You could just as easily check one of three options stored near the automerge option, and handle all incoming books according to that option (ignore, overwrite, or add as new dupe record) or you can present that question for each book (preferably with an option to do the selected thing for all the rest of the books). It's not too hard, as each book is being handled individually. Duplicate detection seems to me to be the harder case. All books are compared against all other books. You have to make groups of duplicates. You may have 3 copies of book 1, two copies of book 2, 4 copies of book 3, but one of the 4 copies of book 3 isn't really a dupe and needs to be excluded from the merge, etc. I suppose you could do duplicate detection the same way - individually check each book against the entire dataset, but that would be comparable to adding the entire library to itself - that does take a lot of time. |
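One way to avoid comparing every book against every other book is to bucket books by a normalised key in a single pass. This is a different technique from the all-against-all comparison described above, and only a sketch: the `(title, author)` key, the `exclusions` set for "not really a dupe" books, and the function name are all assumptions.

```python
from collections import defaultdict

def group_duplicates(books, exclusions=frozenset()):
    """Group book ids that share a (title, author) key.

    `books` maps id -> (title, author); `exclusions` holds ids the user
    has marked as not really duplicates. Both are assumed structures.
    """
    groups = defaultdict(list)
    for book_id, (title, author) in sorted(books.items()):
        if book_id in exclusions:
            continue
        groups[(title.lower(), author.lower())].append(book_id)
    # Only keys with more than one book form a duplicate group.
    return [ids for ids in groups.values() if len(ids) > 1]

books = {
    1: ("Book 1", "A"), 2: ("Book 1", "A"), 3: ("book 1", "a"),
    4: ("Book 2", "B"), 5: ("Book 2", "B"),
    6: ("Book 3", "C"),
}
all_groups = group_duplicates(books)
after_review = group_duplicates(books, exclusions={3})
```

Exact-key bucketing only catches books whose normalised title and author are identical, so it trades recall for speed.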
|||||||
02-01-2011, 04:20 PM | #37 | |||||
Calibre Plugins Developer
Posts: 4,653
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Quote:
Personally I think a popup dialog window will be the way to go: it focuses the dialog on the task at hand, and allows custom colouring to indicate the groups of duplicate books, a few columns more useful to duplicate resolution, etc. However my point was that doing that will mean a lot of functionality users may take for granted on the library view (such as customisable column displays, right-clicks for other actions etc) will not be available, initially at least. Quote:
Quote:
(1) If people wanted it (and Kovid etc were too busy on other things) I could develop it completely independently of any changes to Calibre source, unlike the changes to automerge, which require them. (2) There will be many users out there who have never found or intentionally not used the automerge option and have a library with duplicates they want help with identifying. (3) Once (if) the automerge suboptions get added and a user chooses the "duplicate format" suboption, they will be creating duplicates and not have a tool to help them identify them. Of course if you and Kovid happened to like the proposal enough to implement the automerge changes so they appeared in Calibre first, that would be just marvellous. As you say those changes are far less work to implement. Quote:
Quote:
And quite frankly if it is just you and me showing any interest in the idea here it won't be very high in my priority list to implement it. I would love more people to comment on whether they think it is a flawed/bad idea, or they would love to see it in Calibre. I won't be offended if they think it's a rubbish idea - on the contrary it would save me many hours of wasted effort. There is always "another way" - but today with Calibre your only choice for ensuring you don't accidentally throw away a better format of a book when adding is to either have automerge off (with various issues that creates) or intentionally give it a different name (requiring you to "know" it was a duplicate first). |
|||||
02-01-2011, 04:37 PM | #38 |
creator of calibre
Posts: 44,019
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Just so you know, my current development priorities are unlikely to include working on automerge/duplicates development, so don't wait for me.
And I vote for a separate dialog for duplicate detection, but my vote is not a veto for doing it in the book list, I just think it will be cleaner to code and have more functionality in a separate dialog. |
02-01-2011, 04:44 PM | #39 | |
Wizard
Posts: 3,450
Karma: 10484861
Join Date: May 2006
Device: PocketBook 360, before it was Sony Reader, cassiopeia A-20
|
Quote:
I can imagine that an initial search of each book against all other books would take forever and a bit, but with large libraries users could do this in batches (marking all already checked ebooks) or let it run overnight. The real challenge would be inventing a user interface that offers the user groups of identified duplicates and lets them accept or reject merging. Also the "fuzziness" of the search would have to be carefully balanced so that it finds duplicates where author name and title differ somewhat (Stephen_King_-_Pet_cemetery_The vs. King_s._-_The_pet_cemetery) and yet doesn't come up with too many false positives. |
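As a rough illustration of that kind of fuzziness, the sketch below treats two names as probable duplicates when the titles contain the same words in any order (ignoring articles) and the authors share at least one word longer than an initial. The `Author_-_Title` layout, the rules and the function names are assumptions chosen to fit the example above; the author rule in particular is loose and would pair different authors who share a surname.

```python
import re

ARTICLES = {"the", "a", "an"}

def tokens(text):
    """Lowercase, then split on anything that is not a letter or digit."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}

def split_name(filename):
    """Split 'Author_-_Title' into (author, title); assumed naming layout."""
    author, _, title = filename.partition("_-_")
    return author, title

def probably_same(name_a, name_b):
    author_a, title_a = split_name(name_a)
    author_b, title_b = split_name(name_b)
    # Titles: same words in any order, ignoring articles.
    same_title = tokens(title_a) - ARTICLES == tokens(title_b) - ARTICLES
    # Authors: share at least one word longer than a single initial.
    shared = {t for t in tokens(author_a) & tokens(author_b) if len(t) > 1}
    return same_title and bool(shared)

match = probably_same("Stephen_King_-_Pet_cemetery_The",
                      "King_s._-_The_pet_cemetery")
```

Tightening or loosening either rule is where the balance between missed duplicates and false positives would be tuned.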
|
02-01-2011, 07:22 PM | #40 | ||
Calibre Plugins Developer
Posts: 4,653
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Quote:
Quote:
If you have any further thoughts on what you would/would not want to see on this feel free to drop me an email or PM here if not on the thread. I'll be looking for further comments and feedback before I start coding anything anyways. |
||
02-03-2011, 07:57 AM | #41 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Any comments? |
|
02-03-2011, 10:03 AM | #42 |
creator of calibre
Posts: 44,019
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Fine by me
|
02-03-2011, 03:54 PM | #43 | |
Calibre Plugins Developer
Posts: 4,653
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Quote:
(1) Kovid's legacy code is an interactive prompted option, which I think it has to be if you are only matching on title. Personally I would never use it due to all the false positives from not comparing authors, but fair enough if others find it useful. However my question is: are you saying it will be an "automerge" option to automatically merge on title, or an "automerge" option to not actually automerge and instead be interactively prompted? (2) So am I right in saying your list will *not* (as yet at least) include the option that sparked this thread and several others, of creating a duplicate book entry when a duplicate format is encountered, but merging formats where they are missing? |
|
02-03-2011, 04:54 PM | #44 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
OK
Quote:
(The interface is done, and overwrite and ignore are done - I've still got some work to do on New Record creation for duplicate records.) Quote:
|
||
02-03-2011, 05:01 PM | #45 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
I'm amazed you can follow all these threads and still write code!
A question: If I write the automerge option box this way:
Code:
choices = [(_('Ignore'), 'ignore'), (_('Overwrite'), 'overwrite'), (_('New Record'), 'new record')]
r('automerge', gprefs, choices=choices)
Code:
if gprefs['automerge'] == 'overwrite':
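For reference, a rough sketch of how the three choices could be acted on when an incoming format already exists on a matched book. It is standalone Python that takes the mode as a plain string rather than reading calibre's `gprefs`, and `merge_format` and its data layout are hypothetical, not calibre code.

```python
def merge_format(existing_formats, fmt, path, mode):
    """Apply one incoming format to a matched book record.

    `existing_formats` maps format name -> file path for the matched book.
    Returns (formats for the existing record, formats for a new record or None).
    """
    if fmt not in existing_formats:
        # Missing format: always merged into the existing record.
        return {**existing_formats, fmt: path}, None
    if mode == "ignore":
        # Keep the existing file, drop the incoming one.
        return dict(existing_formats), None
    if mode == "overwrite":
        # Replace the existing file with the incoming one.
        return {**existing_formats, fmt: path}, None
    if mode == "new record":
        # Leave the existing record alone and start a duplicate record.
        return dict(existing_formats), {fmt: path}
    raise ValueError(f"unknown automerge mode: {mode!r}")

existing = {"EPUB": "old.epub"}
kept, extra = merge_format(existing, "EPUB", "new.epub", "new record")
```

The mode strings match the values in the `choices` list above, so the stored preference could be passed straight through as `mode`.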