04-15-2011, 09:52 AM | #61 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Just a short note to say I finally had time to run against my full library (16K books). It's great, and definitely should go into the trunk when it's done.
It makes me wonder if automerge should be changed or coordinated with Find Duplicates when it's available to all:
1) Perhaps remove it from "Copy to Library" or make it optional. I was always a bit leery of including it there. It's one thing to automerge new entries for the Calibre Library, but it's different to automerge entries that are already in one library and are merely being copied into another library. I keep worrying that I'll get a post asking why the new library only has 99 new entries when 100 were selected and copied. I was partly swayed into adding it because it provided a way to do automatic duplicate detection on existing entries.
2) Perhaps mirror the multiple "fuzzy match" options in Find Duplicates into automerge.
3) Remove it entirely? Keep Kovid's original warning about duplicate titles, and offer to run Find Duplicates if duplicates are added. The user can always Merge those he wants, or mark them as not duplicates.
Any thoughts? Personally, I think I need to do more playing with Find Duplicates, but what I've seen on my real data is great! |
04-15-2011, 10:35 AM | #62 |
Calibre Plugins Developer
Posts: 4,644
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Hey Starson17, sounds like you had a pretty positive experience, which is great to hear. I am sure you will like the next version even more, as it makes that search restriction/highlighting/sort stuff all happen automatically.
It is an interesting comment about automerge. For myself, I would be sad to see it removed completely. I've never been a fan of the "match on title only" default implementation, so having to go back to that would be a step backwards imho.
Re Copy to Library: what would worry me about automerge here is doing a "copy and delete", then finding that the book never actually made it to the destination because of an automerge setting. I don't have an immensely strong opinion on it because I can't see myself being in a scenario of using it, I'm afraid. I use Copy to Library a lot, but I am migrating unique sets of books author by author, so unless I screwed up by independently adding a book to my newer target library I won't hit that situation.
As for the algorithms, that's another interesting question, and I can see why you are bringing it up. For myself, I like the fairly conservative approach that automerge takes, and know that "worst case" I will end up with some "duplicates" from a slight variation of author name or whatever. So I don't think you would want to offer "Similar Name, Similar Author" as an option, in case it was a bit too aggressive? At least once this functionality is put into Calibre, as a user you will know that you can periodically check to see what duplicates you have (or do so after adding a bunch of new books).
For new formats of a book, having automerge automatically sort that out for me is brilliant and I don't want to lose that. For duplicate formats of a book, that is where (personally) I will want to create new book records and manually review them by comparing the EPUBs side by side or whatever before making my merge decision. I think even if you made no changes to automerge it does everything I would like from it, but that is just my opinion/needs of course |
04-15-2011, 11:21 AM | #63 | ||||||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
|
||||||
04-15-2011, 11:41 AM | #64 | ||
Calibre Plugins Developer
Posts: 4,644
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Quote:
Quote:
I guess there are other ideas I could throw into the mix as random thoughts. Like prompting the user during CTL when it detects there are duplicates and asking the user what to do (defaulting to your automerge settings but at least reminding the user what they currently are before doing damage). When CTL/automerge creates a new book record from a duplicate, what date does it give it? The date of creating that duplicate new book, or the date it was imported into the original library? Just curious how "difficult" it would be for a user using "Find Duplicates" to identify which record is their original and which was from the CTL action. |
||
04-15-2011, 12:10 PM | #65 |
Grand Sorcerer
Posts: 11,773
Karma: 7029857
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
@kiwidude: both the multisort and positionAtCenter changes are in trunk, and will be in today's release.
|
04-15-2011, 12:25 PM | #66 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
If Automerge is off, the old date arrives unchanged. (Sorry for hijacking your Find Duplicates thread a bit.) |
|
04-15-2011, 12:42 PM | #67 | |
Calibre Plugins Developer
Posts: 4,644
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Quote:
Thanks for all your Calibre changes supporting this plugin. Here's the new look find options dialog. |
|
04-15-2011, 12:57 PM | #68 |
Grand Sorcerer
Posts: 11,773
Karma: 7029857
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
I clicked on the OK button of this dialog box and got an error. What's wrong?
On a more serious note: you are welcome, I like the dialog, and thank you (!) for taking this on. As for the discussion of automerge vs this plugin (built-in eventually), I fall into the 'leave automerge off and check later' camp. Being naturally paranoid, I almost never enable options that automatically combine things. Merging from the results of this plugin is fine with me. Even better would be a merge that tells me what the result will look like before it does it, showing me which formats and metadata end up where, and what (if anything) will be deleted. |
04-15-2011, 01:15 PM | #69 |
Wizard
Posts: 3,450
Karma: 10484861
Join Date: May 2006
Device: PocketBook 360, before it was Sony Reader, cassiopeia A-20
|
I had installed the very first version of the plugin.
Then I installed v. 0.2.0. kiwidude says that you have to remove the "Find Duplicates.json" file first: https://www.mobileread.com/forums/sho...5&postcount=33
I couldn't find the file (and was too lazy to use a "search" tool on my filesystem, because I have a few large testing Calibre libraries), so I didn't remove Find Duplicates.json. Bad Things (TM) started to happen, like Calibre crashing.
So, if you run on Linux, like I do, go to ~/.config/calibre/plugins and remove the "Find Duplicates.json" file before installing v. 0.2.0. You can also remove the file *after* Calibre starts to crash on startup. Like I did ;-) |
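For anyone who would rather script the cleanup described above, here is a minimal Python sketch. The function name `remove_stale_settings` is made up for illustration; the only details taken from the post are the file name and the Linux plugins directory. Other platforms keep the Calibre configuration directory elsewhere, so treat the path as an assumption to check on your own system.

```python
from pathlib import Path


def remove_stale_settings(plugins_dir: Path, name: str = "Find Duplicates.json") -> bool:
    """Delete the plugin's old settings file if it exists.

    Hypothetical helper, not part of Calibre or the plugin.
    Returns True if a file was removed, False if there was nothing to remove.
    """
    target = plugins_dir / name
    if target.is_file():
        target.unlink()
        return True
    return False
```

With Calibre closed, calling `remove_stale_settings(Path.home() / ".config" / "calibre" / "plugins")` should cover the Linux case mentioned in the post.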
04-15-2011, 01:20 PM | #70 | |
Calibre Plugins Developer
Posts: 4,644
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Quote:
For instance, I would like to make it a little easier to handle the whole issue of things like:
Book 1 has EPUB, MOBI
Book 2 has EPUB, PDF
Now Book 1 might be your "master" record that you prefer the metadata on, but Book 2 has a better EPUB. You need to open the EPUB for both Book 1 and Book 2 to even find that out. Then you delete the EPUB format from Book 1, then do a merge of Book 2 into Book 1. And that is all after you have run the cursor up and down between the books to even see which formats overlap. I'm sure there must be a nicer way to help reduce the steps involved in that with a nice merge gui... |
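To make the format-overlap part of that scenario concrete, here is a rough Python sketch of what a merge preview could compute before anything is changed. `BookRecord` and `merge_preview` are hypothetical names for illustration only; this is not Calibre's data model or its merge code.

```python
from dataclasses import dataclass, field


@dataclass
class BookRecord:
    """Hypothetical stand-in for a library entry (not Calibre's own class)."""
    title: str
    formats: set[str] = field(default_factory=set)


def merge_preview(master: BookRecord, other: BookRecord) -> dict[str, list[str]]:
    """Describe what merging `other` into `master` would involve, changing nothing."""
    return {
        # Formats present in both records: the user has to pick one copy.
        "conflicting": sorted(master.formats & other.formats),
        # Formats only the other record has: these would be added to the master.
        "added": sorted(other.formats - master.formats),
        # Formats only the master has: these stay as they are.
        "kept": sorted(master.formats - other.formats),
    }


book1 = BookRecord("Book 1", {"EPUB", "MOBI"})
book2 = BookRecord("Book 2", {"EPUB", "PDF"})
preview = merge_preview(book1, book2)
```

For the two books in the example, the EPUB lands in "conflicting", the PDF in "added" and the MOBI in "kept", which is the information you currently have to find by moving the cursor between the two rows.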
|
04-15-2011, 04:36 PM | #71 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Automerge started as a simple: OMG, I'm never going to finish getting all my ebooks into Calibre. I had too many duplicates from global conversion to a new format. It met my personal needs, but it's showing its age as Calibre forges ahead. I traded control for quick and dirty. I really couldn't face opening multiple files, trying to decide which had better formatting, etc. I decided I was just going to keep originals, fix any formatting I didn't like, and go back to the originals if the formatting was so bad I couldn't fix it. I agree with kiwidude that a better Merge interface is needed (but I want it optional so it doesn't interfere or slow down the keyboard based "M" Merge (delete others) and Alt-M Safe Merge (keep others), which I use constantly). I also think we need more control in the Automerge for CTL. (I kind of like Automerge now for direct Add Books). Unfortunately, it's not going to happen soon for me. I've got a lot on my plate for the next few months. |
|
04-15-2011, 07:48 PM | #72 |
Calibre Plugins Developer
Posts: 4,644
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
v0.3 Beta
Ok folks, here's the latest. This has one notable omission - the new "Manage exemptions for book" dialog that Charles and I were posting about earlier. I know what needs to be done, just haven't had time and wanted some feedback on a huge number of other changes/additions.
This version will need Calibre 0.7.55 (released today). Main changes:
Definitely been an interesting plugin to work on - albeit utterly consuming my week. Once I get the manage screen sorted then other than any further suggestions that come up I think it is close to done. I would like to release it as a standalone plugin for a week or two before looking at merging it into Calibre - both to let it get thrashed a bit and to give me a break. Other outstanding possible ideas for it I have had:
Enjoy, look forward to your feedback as always... Last edited by kiwidude; 04-19-2011 at 03:57 AM. Reason: Removed attachment as later version on thread |
04-16-2011, 04:29 AM | #73 |
Grand Sorcerer
Posts: 11,773
Karma: 7029857
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Really getting there. This version works very well.
Some comments, none of which is very important:
- Did "find dups". It found the expected 4 groups. I switched to "show non_dups", selected them all, and said to remove exemptions. It did. I then pressed "find dups", expecting to see the now 6 groups. I saw the original 4. I suggest that pushing the find dups button after removing exemptions should redo the alg.
- Same as above, but pressed manually selected find dups. Saw all six, but the sort order is wrong. The two new groups are sorted at the bottom. This is probably caused by the multisort thinking it has already sorted the screen. I suggest that you force a multisort (don't set the ignore flag) after every find_dups.
- The 'are you sure' messages should offer the checkbox to not show the message again. In particular, the mark exempt dialog should have this checkbox. The confirm method in gui2.dialogs.confirm_delete.py should be able to do the job.
- A hygiene-factor thing. When the user selects 'mark group exempt', it might be a good idea to check if any books not in the group are selected. If they are, then the user is probably confused and thinks that the selection will be used in lieu of the group. (Guess who almost did that ... )
- Idea: if you connect to gui.search.cleared, you will be notified if the user clicks the clear button on the search bar. That would permit clearing the search to do a "clear duplicate results". Should it? |
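The hygiene check in the fourth comment could be sketched roughly as below. This is plain Python with made-up function names (`stray_selection`, `can_mark_group_exempt`); it does not use the plugin's or Calibre's actual code, and it assumes books are identified by integer ids.

```python
def stray_selection(selected_ids, group_ids):
    """Return the selected book ids that are not members of the current group."""
    return sorted(set(selected_ids) - set(group_ids))


def can_mark_group_exempt(selected_ids, group_ids):
    """Return (ok, message); ok is False when the selection strays outside the group.

    Hypothetical helper: a real dialog would show the message and let the
    user decide, rather than just returning it.
    """
    stray = stray_selection(selected_ids, group_ids)
    if stray:
        return False, (
            f"{len(stray)} selected book(s) are not in this duplicate group. "
            "The whole group will be marked exempt, not the selection."
        )
    return True, ""
```

Whether a stray selection should block the action or only warn is a design choice; the sketch just reports it.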
04-16-2011, 05:19 AM | #74 | |||||
Calibre Plugins Developer
Posts: 4,644
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Quote:
Quote:
Quote:
Quote:
Quote:
I've been thinking more about the title/author independent algorithm thing, and the more I think about it the more I am tempted. I would like to see two sliders with labels on the tickmarks (which sadly Qt cannot do out of the box). Each slider would have a range of values something like: "Identical", "Similar", "Vaguely Similar", "Ignore".
There would be one slider for each of title and author. If you set both title and author to "Ignore", then it does an ISBN match. A descriptive text box would summarise the combination you had selected, a little bit like it does now. A first time user would get it set to "Identical Title", "Identical Author". The "Vaguely Similar" (or "Fuzzy" or some better name!) author and title selections would do the fuzzier algorithms I suggested above.
Any thoughts? I'm just concerned about the permutations - break them apart and the problem goes away. |
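A rough Python sketch of how independent title and author levels might compose, so that no per-combination code is needed. Everything here is an assumption for illustration: the `Level` enum, the normalisation rules for "Identical" and "Similar", and the dict-based book records. The "Vaguely Similar" level is left out because its algorithm is not described in this post.

```python
import re
from collections import defaultdict
from enum import Enum


class Level(Enum):
    IDENTICAL = "identical"
    SIMILAR = "similar"
    IGNORE = "ignore"


def field_key(text, level):
    """Reduce one field to a comparison key for the chosen level (illustrative rules)."""
    if level is Level.IGNORE:
        return ""
    if level is Level.IDENTICAL:
        # Assumption: "identical" still ignores case and surrounding whitespace.
        return text.strip().casefold()
    # SIMILAR: additionally drop punctuation and collapse runs of whitespace.
    return " ".join(re.sub(r"[^\w\s]", " ", text.casefold()).split())


def group_duplicates(books, title_level, author_level):
    """Group book ids whose keys match; both levels on IGNORE falls back to ISBN."""
    groups = defaultdict(list)
    for book in books:
        if title_level is Level.IGNORE and author_level is Level.IGNORE:
            if not book.get("isbn"):
                continue  # no ISBN, so nothing to match on
            key = ("isbn", book["isbn"])
        else:
            key = (field_key(book["title"], title_level),
                   field_key(book["author"], author_level))
        groups[key].append(book["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


books = [
    {"id": 1, "title": "The Hobbit", "author": "J.R.R. Tolkien", "isbn": "111"},
    {"id": 2, "title": "The Hobbit!", "author": "J. R. R. Tolkien", "isbn": "111"},
    {"id": 3, "title": "Dune", "author": "Frank Herbert", "isbn": "222"},
]
```

Because each field has its own key function, adding a fuzzier level means adding one more branch to `field_key` rather than one more algorithm per title/author pairing.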
|||||
04-16-2011, 05:49 AM | #75 | |||
Grand Sorcerer
Posts: 11,773
Karma: 7029857
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
Quote:
Quote:
Putting aside the above concern, I am not convinced that sliders are the right interface. They imply a level of 'analog' behavior that isn't there, and also don't support tool tips and the like well. I would lean toward radio buttons, with two groups. Group 1 would have ISBN, then the title choices, with the first choice being ignore. Group 2 would have the author choices with the first choice being 'ignore', which would line up horizontally with the title group's ignore (nothing beside the ISBN choice). Choosing ISBN would force group 2 to ignore and disable it. Choosing any title option would enable group 2. Choosing ignore for both options can be an error, or can make one big group. |
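The enable/disable rules for the two radio groups can be written down independently of any GUI toolkit. The sketch below uses hypothetical names (`OptionState`, `resolve`) and assumes the choice labels shown; it also picks the "error" reading for ignore/ignore, where the post leaves open whether that should instead make one big group.

```python
from dataclasses import dataclass

# Assumed labels for the two radio groups; the real dialog may name them differently.
TITLE_CHOICES = ("isbn", "ignore", "identical", "similar")
AUTHOR_CHOICES = ("ignore", "identical", "similar")


@dataclass(frozen=True)
class OptionState:
    title_choice: str
    author_choice: str
    author_enabled: bool
    error: str = ""


def resolve(title_choice, author_choice):
    """Apply the group rules and return the resulting state of both groups."""
    if title_choice not in TITLE_CHOICES or author_choice not in AUTHOR_CHOICES:
        raise ValueError("unknown choice")
    if title_choice == "isbn":
        # ISBN forces the author group to 'ignore' and disables it.
        return OptionState("isbn", "ignore", False)
    if title_choice == "ignore" and author_choice == "ignore":
        # Assumption: treat ignore/ignore as an error rather than one big group.
        return OptionState(title_choice, author_choice, True,
                           "Choose a title or author option")
    # Any other title option leaves the author group enabled.
    return OptionState(title_choice, author_choice, True)
```

A dialog would call something like `resolve` whenever a radio button changes and then update the widgets from the returned state.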
|||
|