Respell pitches does not work for keys with many flats or sharps
Reported version
3.x-dev
Type
Functional
Frequency
Once
Severity
S4 - Minor
Reproducibility
Always
Status
active
Regression
No
Workaround
No
Project
1: Take a very tonal professional well-spelled piece with 4 or 5 sharps or flats
2: respell pitches
3: save result
4: compare 2 scores and observe that the algorithm is only about 85% accurate for all notes.
An example is attached, from the Open Well-Tempered Clavier project: Prelude and Fugue no. 18 in G# minor. There are well over 200 mistakes, which works out to 83% accuracy over all pitches.
In the paper the algorithm that is implemented promises >95% accuracy.
Attachment | Size |
---|---|
Prelude_and_Fugue_1_-_OpenWTC.mscz | 38.94 KB |
Prelude_and_Fugue_18_-_OpenWTC_respelled.mscz | 46.88 KB |
Comments
You seem to have attached Prelude and Fugue 1 by mistake. Prelude and Fugue 18 can be found at https://musescore.com/opengoldberg/scores/541016.
I am guessing that you are only respelling pitches on a professional well-spelled piece in order to prove a point, since doing so makes no sense otherwise.
The abstract of the paper which (I presume) you are referencing contains the following sentence:
"The algorithm was evaluated on 8 complete piano sonatas by Mozart and had a success rate that is greater than 96 % (10476 pitches were spelled correctly out of 10900 notes that required accidentals – overall number of pitches in 8 sonatas is 40058)."
To me, this does not suggest a promise of >95% accuracy for any given piece.
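The figures in that abstract can be checked with a few lines of arithmetic. This is only a sketch: the "all notes" rate assumes that every note which needed no accidental was spelled correctly, which the quoted sentence does not state outright.

```python
# Figures quoted from the abstract above.
correct_accidentals = 10476
notes_needing_accidentals = 10900
all_notes = 40058

errors = notes_needing_accidentals - correct_accidentals

# Success rate over the notes that required accidentals (the quoted "> 96 %").
rate_accidentals = correct_accidentals / notes_needing_accidentals

# Assumption: every note that needed no accidental was spelled correctly.
rate_overall = (all_notes - errors) / all_notes

print(f"errors: {errors}")                            # errors: 424
print(f"accidentals only: {rate_accidentals:.2%}")    # accidentals only: 96.11%
print(f"all notes (assumed): {rate_overall:.2%}")     # all notes (assumed): 98.94%
```

The two rates measure different things, so a per-piece figure such as "83% over all pitches" is not directly comparable to the 96% quoted for accidentals only.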
In reply to You seem to have attached… by mattmcclinch
Marc and I went through this recently (posted here). It seems (we agreed) that "respell pitches" makes decisions based on the direction of chromatic and diatonic motion and on bordering notes, and does not do what is expected, i.e., consider the circle-of-fifths distance from the key (or, better, "keys, matched major and minor"). This does not reconcile with your reported result. Is there a clear description of what the algorithm does somewhere? (There is always the code.)
If you want to read the paper, have at it. I provided a link to where you can download it.
See this test case with very few flats or sharps (and discussion with Marc). https://musescore.org/en/node/291546
In reply to See this test case with very… by [DELETED] 1831606
I routinely encounter (or am asked to comment on) works posted by beginners in either MuseScore or music or both, which routinely misspell easy accidentals in easy keys, such as my example attached -- five notes, Gb, A, D, D#, D in two flats, where clearly F# and Eb were intended. You'd think that'd be a very easy test for Respell Pitches, but no; it prefers correctly-published Mozart. Reposting example here.
To be clear, this issue is about the default spelling of any given pitch at any given time. The "Respell Pitches" command does exactly what it is supposed to do, which is to revert all pitches back to their default spellings.
In reply to To be clear, this issue is… by mattmcclinch
Really? Is Eb not the "default spelling" of the semitone between D and E when the key signature is not only two flats, but includes that very flat? I have not read the paper yet (I grabbed it), but it's possible that it needs more context than the five notes in this example, too, which is not clearly a merit.
"Is Eb not the 'default spelling' of the semitone between D and E when the key signature is not only two flats, but includes that very flat?"
It certainly should be. I am starting to see that the "Respell Pitches" command doesn't always behave predictably after all. There is certainly more going on here than I thought at first.
I'm glad that my fervent "pitch" reached you!
A command that enforced the 7 spellings demanded by the (regnant) key signature, used some simple heuristics for one sharp or flat in either direction, and occasionally made errors between (say) G# and Ab in a natural key signature, would be of more use than a command that agrees that Mozart is cool but can't conclude Eb in the key of Bb, and whose real effect no one could say without a day of study.
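A static, key-signature-only speller of the kind described above could be sketched as follows. This is not MuseScore's code. The function names are hypothetical, and the choice of window (the seven notes of the key, two extra positions on the flat side, three on the sharp side of the line of fifths) is an assumption made for illustration.

```python
LETTERS_BY_FIFTHS = "FCGDAEB"  # line of fifths: F = -1, C = 0, G = 1, ... B = 5

def fifths_to_name(f):
    """Name of the note at line-of-fifths position f (C = 0, each sharp adds 7)."""
    letter = LETTERS_BY_FIFTHS[(f + 1) % 7]
    accidentals = (f + 1) // 7  # > 0 means sharps, < 0 means flats
    return letter + ("#" * accidentals if accidentals > 0 else "b" * -accidentals)

def static_spelling(midi_pitch, key_sig, flat_margin=2):
    """Spell a MIDI pitch from the key signature alone.

    key_sig is the number of sharps (negative for flats). The key's own notes
    sit at positions key_sig - 1 .. key_sig + 5; the 12-wide window starts
    flat_margin positions below that (assumed default: 2).
    """
    lowest = key_sig - 1 - flat_margin
    # Exactly one position in any 12-wide window matches a given pitch class.
    f = lowest + ((midi_pitch * 7 - lowest) % 12)
    return fifths_to_name(f)

# The five-note example from this thread, in two flats.
print([static_spelling(p, -2) for p in (66, 69, 62, 63, 62)])
# ['F#', 'A', 'D', 'Eb', 'D']
```

With these assumed margins a natural key signature always gives G# rather than Ab, which is exactly the kind of occasional error the comment above is willing to accept.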
In reply to A command that enforced the… by [DELETED] 1831606
There are much better, well-documented and reasonably simple pitch spelling algorithms than what MuseScore currently has (regardless of bugs). I had a longer reply that got caught by the spam filter, but PS13 seems very accurate and still pretty easy. You can look up 'The ps13 pitch spelling algorithm' on Google Scholar.
To the best of my knowledge, respell is intended to do something very different from just restoring the default spelling. The default is static - the spelling of the note between D & E is always D# for certain keys, always Eb for other keys. Respell looks at context. However, as far as I know it only looks at horizontal context - previous and following notes on the same staff. As such I have a hard time believing it could really achieve anything close to 95% success on music with much harmonic complexity. But still, I can easily believe the algorithm could have been implemented incorrectly and that the proposed change in https://github.com/musescore/MuseScore/pull/5257 may well be an improvement.
In reply to To the best of my knowledge,… by Marc Sabatella
You are correct and the paper is quite clear, I think; I suspect there is an implementation error. My change does not fix any of the bugs talked about here, although it does fix a more trivial bug.
Would you guys accept a pull request containing an implementation of ps13? I might work on that. I think it could be a lot smaller in footprint than the current implementation, it has better runtime performance, and higher accuracy. I would obviously need some help with review & testing.
In reply to You are correct and the… by boblucassen
Have you implemented it yet? You should certainly try, and experiment with it...
I'm certainly open to replacements; to me the existing one seems all but useless.
I contacted Mr. David Meredith about the possibility of using his ps13 algorithm in MuseScore, and this was his response.
Hi Matt,
Thanks for your interest in the ps13 algorithm. I’d be very happy for you to use the algorithm in MuseScore. Let me know if you want any assistance in implementing it. If you want to use another implementation as a reference, you can look at the Java implementation in my OMNISIA software, which is available on GitHub here:
https://github.com/chromamorph/omnisia-recursia-rrt-mml-2019/blob/80fc4…
You should probably use PS13s1 rather than the full PS13 algorithm. PS13s1 performs better and is much simpler. In OMNISIA, I call the Notes.pitchSpell method with kPre = 10 and kPost = 42. There are other pairs of values that also seem to work very well (see p.145 of my 2006 JNMR paper attached or my PhD thesis here (p.312) for some examples). In OMNISIA, the pitch speller is called in line 194 of Notes.java to spell a MIDI file when it is loaded into a Notes object.
I’ve attached my 2006 JNMR paper.
Good luck with implementing the algorithm and don’t hesitate to get in touch if you need any further advice or assistance.
Kind regards,
Dave
Ok did a quick port to get things started: https://gist.github.com/boblucas/1b0b6bf68528708d6be1a91dff1ef3d2. The original code was a bit odd in places; it's about half the size now and ported to C++. I threw out the whole "Make morphetic pitch list" part, which seems to just do octavation and can be one line, but I might have missed something.
I do get the +90% accuracy on some things I tested but it's still quite naïve IMO.
In reply to Ok did a quick port to get… by boblucassen
Wow, great work! What about my trivial test case? I assume you tried both correctly-notated scores, and the same with deliberate enharmonic errors introduced? Can you characterize its performance? That massive thing boils down to 60 lines of C++?!?!
In reply to Wow, great work! What about… by [DELETED] 1831606
I currently run it outside MuseScore, because MuseScore builds too slowly for easy testing. I only input MIDI pitches, so no existing spelling information is available to the algorithm. Your notes are spelled as follows: "Gb A D Eb D". The performance is 'instant'. It's pretty much just basic counting; I don't think it's even worth optimizing, although you could. Yea, I know right.
I'll try and write my own pitch spelling algo this weekend. I think we can do better: it still makes mistakes because it doesn't understand major and minor voice leading, and the static window size is a problem too, I think. Among other things.
In reply to I currently run it outside… by boblucassen
I thought awareness of the key signature was key (so to speak) to this algorithm.
In reply to I thought awareness of the… by [DELETED] 1831606
It will try to figure out the local key by counting the frequency of all nearby pitches, and then it aligns notes to fit with that key. This is better than just trusting the key in many cases because most notes will follow the key anyway, and now you can handle cases of modulation much better. You can 'set the key' by not doing this initial counting but using values reminiscent of a specific key.
It's definitely worse for very few notes, like trivial test cases, as there will be too little information.
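The counting idea described above can be sketched roughly as follows. This is a simplified illustration, not ps13 itself and not the C++ port linked earlier, so its output on a given passage may differ from theirs. The function name, the `fixed_counts` parameter and the way each candidate tonic is spelled are assumptions of the sketch; only the window sizes (10 notes before, 42 after) come from the e-mail quoted in this thread.

```python
from collections import Counter

# Letter steps above a tonic in the harmonic chromatic scale
# (1, b2, 2, b3, 3, 4, #4, 5, b6, 6, b7, 7).
LETTER_STEPS = [0, 1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6]
LETTERS = "CDEFGAB"
NATURAL_PC = [0, 2, 4, 5, 7, 9, 11]

def spell(midi_pitches, k_pre=10, k_post=42, fixed_counts=None):
    pcs = [p % 12 for p in midi_pitches]
    names = []
    for i, pc in enumerate(pcs):
        # Local context: pitch-class frequencies around note i, unless the
        # caller "sets the key" by passing fixed_counts instead.
        window = pcs[max(0, i - k_pre): i + k_post]
        counts = fixed_counts if fixed_counts is not None else Counter(window)
        votes = [0] * 7
        for tonic_pc, weight in counts.items():
            # Assumption of this sketch: each candidate tonic is itself
            # spelled as in the harmonic chromatic scale on C.
            tonic_letter = LETTER_STEPS[tonic_pc]
            letter = (tonic_letter + LETTER_STEPS[(pc - tonic_pc) % 12]) % 7
            votes[letter] += weight  # frequent pitch classes vote harder
        best = max(range(7), key=votes.__getitem__)
        offset = (pc - NATURAL_PC[best] + 6) % 12 - 6  # semitones from the natural
        names.append(LETTERS[best] + ("#" * offset if offset > 0 else "b" * -offset))
    return names

print(spell([60, 62, 64, 65, 67, 69, 71, 72]))
# ['C', 'D', 'E', 'F', 'G', 'A', 'B', 'C']
```

Passing `fixed_counts={7: 1}` (all weight on G) spells MIDI pitch 66 as F#, while `fixed_counts={1: 1}` (all weight on Db) spells it as Gb, which is one way to picture "setting the key" without the initial counting.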
In reply to I thought awareness of the… by [DELETED] 1831606
Gb is even more wrong with no key signature.
In reply to Gb is even more wrong with… by [DELETED] 1831606
Another problem I noticed is that when you are, say, solidly in C major, it will highly prefer, say, Eb over D#. While Eb is way closer in circle-of-fifths terms, it doesn't make much sense in all those nice Mozart sonatas.
Some initial thoughts on the key finding part of spelling: https://gist.github.com/boblucas/b78024e96e68005bdf1abdb341fa3af3 Once you have a vector that tells you the probability of all keys at a given time spelling becomes a voice leading problem. I think even just avoiding reusing staff positions for different pitches while following the key as much as possible is a heuristic that will work better than either algorithm.
That's an interesting observation about not working well in short test cases. It makes me wonder if we don't need two algorithms, one for the "big picture" case of importing an entire score via MIDI, and another for respelling a single brief highly-chromatic passage in a more readable fashion. The latter is frankly the only use case I had ever considered, but I realize there are others and different algorithms might be better suited to different cases.
D# isn't "way closer" in C major, esp. when you consider that it doesn't know the difference between A minor and C major (or D Dorian or E Phr etc). D# is far more likely in A minor than is Eb. In either A minor or C major, or D Dorian, etc. F# and Bb are likely, Eb/D# and C#/Db iffier, and G#/Ab a tossup without the mode knowledge.
In reply to D# isn't "way closer" in C… by [DELETED] 1831606
I agree, I was speaking 'as if' I was the algorithm. C F Bb Eb, 3 steps; C G D A E B F# C# G# D#, 9 steps, so far away, much bad. This algorithm effectively only 'trains' itself on its local context; it knows nothing about D# being common in A minor, and there are no tables.
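The step counts above are distances from the tonic along the line of fifths. A minimal sketch, with hypothetical helper names:

```python
FIFTHS_OF_LETTER = {"F": -1, "C": 0, "G": 1, "D": 2, "A": 3, "E": 4, "B": 5}

def fifths_position(name):
    """Line-of-fifths position of a note name such as 'Eb' or 'D#' (C = 0)."""
    letter, accidentals = name[0], name[1:]
    return FIFTHS_OF_LETTER[letter] + 7 * (accidentals.count("#") - accidentals.count("b"))

def steps_from_tonic(name, tonic="C"):
    """Number of fifths between a note and the tonic."""
    return abs(fifths_position(name) - fifths_position(tonic))

print(steps_from_tonic("Eb"))  # 3
print(steps_from_tonic("D#"))  # 9
```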
In reply to I agree, I was speaking 'as… by boblucassen
Of course, I'm not speaking of A minor in particular, but of the third of the secondary dominant V of V in any minor key. There is such a note in every minor key (indistinguishable from its relative major in this context).
In reply to Of course, I'm not speaking of… by [DELETED] 1831606
I think we are talking past each other, I apologize. The model of this algorithm makes the spelling D# bad and unlikely in a piece that contains a bunch of white keys. This is a bad thing, we can all agree. You are correct on the role of D# in such contexts; I'm just not sure how the code as it is could be modified to include this notion.
In reply to I think we are talking past… by boblucassen
Is F# unlikely in a piece full of white keys? Or, in this algorithm, more likely than Gb?
I've never actually written an algorithm for this, but to me it doesn't appear to make sense to count D# as nine steps removed from the key of C. I don't think we should be counting from the tonic, but from the closest note - in this case B. Or, think in terms of key signatures.
I see Eb as two steps removed, D# as four. An obvious argument in favor of this type of thinking is to ask how close F# is to the key of C. To me, it seems about as closely related a note as there is, and in fact if I had to guess, I'd pick F# as the single most likely accidental to occur in the key of C major (surely that data exists, though, so no need to guess). Furthermore, most of the sharps seem more likely in a harmonic context than most of the flats, because most of them occur in secondary dominants (F# in V/V, C# in V/ii, G# in V/vi, D# in V/iii). On the flat side, only Bb occurs in a secondary dominant. Eb and Ab occur primarily because of chords borrowed from the parallel minor, which I'm betting are not as common as secondary dominants in general, and I might guess Ab (used in iv) is actually more common than Eb despite being further removed on the circle.
So anyhow, to me, if we are considering harmonic context, I think the closeness to the key on the circle counts, but there should perhaps also be a bias in favor of the sharp side. But none of this should be seen as discounting the role of voice leading, which could easily lead to Gb being chosen over F#, Db over C#, etc.
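Counting from the closest note of the key rather than from the tonic can be sketched like this. The helper names are hypothetical; the only convention assumed is that the seven notes of a key with `key_sig` sharps occupy positions `key_sig - 1` to `key_sig + 5` on the line of fifths.

```python
FIFTHS_OF_LETTER = {"F": -1, "C": 0, "G": 1, "D": 2, "A": 3, "E": 4, "B": 5}

def fifths_position(name):
    """Line-of-fifths position of a note name such as 'Eb' or 'D#' (C = 0)."""
    letter, accidentals = name[0], name[1:]
    return FIFTHS_OF_LETTER[letter] + 7 * (accidentals.count("#") - accidentals.count("b"))

def steps_from_key(name, key_sig=0):
    """Fifths between a note and the nearest note belonging to the key.

    key_sig is the number of sharps (negative for flats).
    """
    f = fifths_position(name)
    low, high = key_sig - 1, key_sig + 5
    if f < low:
        return low - f
    if f > high:
        return f - high
    return 0  # the note is in the key

for name in ("Eb", "D#", "F#", "Bb"):
    print(name, steps_from_key(name))
# Eb 2
# D# 4
# F# 1
# Bb 1
```

A bias toward the sharp side, as suggested above, could be layered on top by weighting the two branches differently; that is left out here.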
In reply to I've never actually written… by Marc Sabatella
(to Marc) For what it's worth, that's the way I think, too.
You guys are preaching to the choir, the choir knows his tonality.
We can just pick up music21, look at a whole lot of music, build a table of most common accidentals for each key signature and just apply it. It will surely be 99% accurate as most notes are within the key signature anyway. If you don't want data-driven & tables, this algorithm isn't half bad. It follows the key, it does do F# correctly, it's fine; it's 98-something % accurate as the paper claims, I'm sure (again, most notes are within the key or easy accidentals).
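The table-driven idea could look roughly like this. The sketch deliberately leaves out music21 and any real corpus: `build_table` and `spell` are hypothetical names, and the `toy` list is a hand-written stand-in for corpus observations, not measured data.

```python
from collections import Counter, defaultdict

def build_table(observations):
    """observations: iterable of (key_sig, midi_pitch, spelled_name) tuples."""
    counts = defaultdict(Counter)
    for key_sig, pitch, name in observations:
        counts[(key_sig, pitch % 12)][name] += 1
    # Keep only the most frequent spelling per (key signature, pitch class).
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}

def spell(table, key_sig, pitch, fallback="?"):
    """Look up the most common spelling, or fallback if never observed."""
    return table.get((key_sig, pitch % 12), fallback)

# Hand-written stand-in for a real corpus; not measured data.
toy = [(-2, 66, "F#"), (-2, 66, "F#"), (-2, 66, "Gb"), (-2, 63, "Eb")]
table = build_table(toy)
print(spell(table, -2, 78))  # F#
```

A table like this ignores context entirely, so it always gives the same answer for a given pitch class and key signature.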
If we want better than either we need to do some actual mathematical modeling, and I think even phrasing a given moment as being 'in C' is wrong, each moment of time can be expressed as a chroma vector that has some correlations to various keys. Let's talk about what might be good ways to measure these distances, and about defining voice leading etc.
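As a starting point for that discussion, here is about the crudest possible version of "a chroma vector with correlations to various keys". It is not the function from the gist: it uses a plain in-scale/out-of-scale template for the twelve major scales, with no time weighting, and the function names are hypothetical.

```python
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]

def chroma_vector(midi_pitches):
    """Count how often each of the 12 pitch classes occurs."""
    v = [0] * 12
    for p in midi_pitches:
        v[p % 12] += 1
    return v

def key_scores(chroma):
    """Share of the chroma mass lying inside each of the 12 major scales."""
    total = sum(chroma) or 1
    return [
        sum(chroma[(tonic + step) % 12] for step in MAJOR_SCALE) / total
        for tonic in range(12)
    ]

scores = key_scores(chroma_vector([60, 62, 64, 65, 67, 69, 71]))
print(scores[0])  # 1.0, every note lies in C major
print(scores[7])  # G major: 6 of the 7 notes (F is outside the scale)
```

Because the result is a score per key rather than a single answer, neighbouring keys such as F and G major stay visible as near matches, which is the property the comment above is after.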
I quite like my function for weighing pitches at various time distances (from the second gist). Combined with some voice leading logic, I'm sure we could properly spell D# in Mozart's "Twinkle, Twinkle" variations.
There's something to be said for being able to explain what the thing does, and that has to be balanced against the potential gains from an elaborate data-driven algorithm. There's something to be said for a small test sample being able to predict what a larger input would do, even if that is impossible in a "match broad surroundings" algorithm; it speaks against the latter. People who do not have broad or deep knowledge will use this and wonder why it occasionally gets simple things wrong. I guess that's the whole story of getting used to machine learning in the world.
The thing is, I would actually like an algorithm that did a better job of what I thought this was doing - a purely voice-leading analysis. It would be useful when you have a highly-chromatic passage in one or more parts in the score and you've spelled it "correctly" harmonically but then when looking at the transposed parts, you realize there are things like diminished thirds and what-not that should be respelled. But this is a totally different beast from what we're talking about here.
The ability I would like is to take scores entered with MIDI keyboard help by beginners or other people not competent in the meaning and history of tonality and notation, or who say "don't really care about the difference between F# and Gb, it's the same note anyway, isn't it [sic]", music usually not highly chromatic, and say "fix". Although I could forgive it if it "missed some", I would stop using it if I saw it "correcting" things that are right by making them wrong at a sufficient rate. The task of "spelling pitches" from MIDI is, in that way, very different from that of trying to locate and correct misspellings in a score with possible errors. Saying it "got 90% of a Mozart movement reduced to MIDI right" is impressive, but "it only corrupted 10% of a Mozart movement it was asked to correct" is ... not.
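The difference between the two tasks shows up if a respeller is scored as a corrector, counting what it fixed and what it broke separately instead of reporting one accuracy figure. A minimal sketch with a hypothetical function name, using the five-note example and the "Gb A D Eb D" output reported earlier in this thread:

```python
def correction_report(original, truth, respelled):
    """Score a respeller as a corrector.

    All three arguments are equal-length lists of note names: what the user
    entered, what was intended, and what the respeller produced.
    """
    report = {"fixed": 0, "broken": 0, "kept_right": 0, "still_wrong": 0}
    for before, want, after in zip(original, truth, respelled):
        if before == want:
            # The input was already right: did the respeller leave it alone?
            report["kept_right" if after == want else "broken"] += 1
        else:
            # The input was wrong: did the respeller repair it?
            report["fixed" if after == want else "still_wrong"] += 1
    return report

entered = ["Gb", "A", "D", "D#", "D"]
intended = ["F#", "A", "D", "Eb", "D"]
respelled = ["Gb", "A", "D", "Eb", "D"]
print(correction_report(entered, intended, respelled))
# {'fixed': 1, 'broken': 0, 'kept_right': 3, 'still_wrong': 1}
```

For the "fix" use case, the `broken` count is the one that decides whether the command is worth using at all.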
Absolutely, that's valuable too. Which is why I am thinking perhaps two different algorithms are called for.