Human Touch (#1)

• Jun 25, 2019 - 23:34

Quoting Marc Sabatella on a recent thread:
"Playback has been enhanced in some way pretty much each and every release. If you have specific ideas for further improvement, please discuss them in a new thread."

I agree that playback has seen substantial improvements with each release of MuseScore. Thank you to Marc and to all the coders (and testers) who are working to make MuseScore the music tool we all want it to be.

As to the second part of Marc's comment, I spend a lot of time thinking of how MuseScore can be improved. I have no coding skills whatsoever, and I struggle with finding the best words to express my ideas, so I (usually) keep my opinions to myself.

I have been mulling two ideas which could help bring a live or "human" feel to MuseScore playback.

The first of these is what was called "humanize" in the days of Atari sequencers.
"This function adds and subtracts a random amount of time to and from notes' time positions. Notes can thereby receive a certain amount of 'quasi-human imprecision'.... For example, '8/768ths' (768 was Notator's minimum time division) would mean that all the events would be randomized up to 4/768ths forward and 4/768th backward." (Notator SL manual)
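For anyone who wants to experiment, here is a minimal Python sketch of the function described in the manual quote. It is not Notator's or MuseScore's code. The function names are made up for illustration, and it assumes that "768" means 768 ticks per whole note and that the beat is a quarter note.

```python
import random

TICKS_PER_WHOLE = 768  # assumption: Notator's resolution, per whole note


def humanize(onsets, window=8, rng=None):
    """Shift each note-on by a random whole number of ticks.

    A window of 8 means up to 4 ticks early or 4 ticks late,
    matching the '8/768ths' example in the manual quote.
    """
    rng = rng or random.Random()
    half = window // 2
    # Clamp at 0 so nothing is scheduled before the start of the piece.
    return [max(0, t + rng.randint(-half, half)) for t in onsets]


def ticks_to_ms(ticks, bpm):
    """Convert ticks to milliseconds, assuming the beat is a quarter note."""
    ms_per_whole = 4 * 60000.0 / bpm
    return ticks * ms_per_whole / TICKS_PER_WHOLE


# A three-note chord on the downbeat, then three quarter notes.
onsets = [0, 0, 0, 192, 384, 576]
shifted = humanize(onsets, window=8, rng=random.Random(2019))
print(shifted)
print(round(ticks_to_ms(4, bpm=120), 1), "ms is the largest shift at 120 bpm")
```

Under these assumptions the largest displacement at 120 bpm is roughly 10 ms, which gives a sense of how small the effect is.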

A tiny randomization of note-on events breaks the rigidity of computer quantization just enough to add a bit of a human feel to the playback. Even the finest players in the finest orchestras do not play with the precision of a computer clock, nor should they. I do recognize that a whole generation of musicians has grown up with their musical sensibilities shaped by "perfect" computerized timing, so this function may not be appealing to all.

An added bonus of this function is that it displaces note-on events so that large groups of notes do not all start at precisely the same instant. This could reduce strain on the CPU, which often presents itself as stuttering in playback.

How difficult is this to achieve? Given that the entire Atari Notator program fit very nicely on a 720 KB disk, I don't think a lot of code went into this function. (But as I said, I have absolutely no knowledge in this department.) On the Atari, the effect was applied on a per-track basis; I can imagine a simple dialogue to apply the effect globally, to a specific part or to a section of music.

My second idea? That will have to wait for another day.

Attachment Size
Humanize.jpg 39.81 KB


I imagine one could make a plugin to this effect. However, I'm not entirely sure that simply applying random displacement would sufficiently "humanize" a score. As someone who spends a great deal of time doing exactly that, I find that tempo fluctuations generally make a larger difference than just modulating onTime (depending on context, at least, though I feel that it only really matters if the change is significant).

It'd probably be worth at least trying out such a plugin to see exactly how much of a difference it makes. I suspect it'd be too small to notice, though. It's also worth noting that you want this to be relative both to the length of the note and the speed of the piece: a small variation in a slow part is extremely noticeable (and possibly even incorrect), while in a fast part it is utterly indiscernible.

In reply to by LuuBluum

I agree that a "directed" tempo goes a long way towards improving the human sound of playback. Still, having used humanization many times, I can attest that these small imperfections in timing really do sound more human than perfectly synced notes, especially when used on stacked notes in various instruments or chords.
(I have another thought as regards playback of chords, but that will have to wait for another day.)

It should be noted that on the Atari, humanization was generally used to RESTORE imperfections on live performances that had then been subject to error correction functions.

Your suggestion that this could be employed as a plugin rather than part of the MuseScore core makes sense. Though it probably wouldn't require a lot of code, it is likely to be a function that is only of value to a relatively small section of users.

Making random deviations from the theoretical timing of each note would require a redesign of the way MIDI and score relate to each other. I'm not quite an expert on the internals of MuseScore, but I think currently each note duration (whole note, quarter note, ...) is translated to a corresponding and fixed number of MIDI ticks. To accomplish random deviation, each score would need two versions: one for the notation, as it currently has, and one for its MIDI realization.
An alternative solution that doesn't need such a redesign is one that many of us use to get improved interpretation of the music: adding frequent tempo changes. This can be accomplished with the resources currently available. The only drawback would be that it is impossible to get the desynchronization you mention, since there is only a single tempo for all the notes within a system.
It would be possible to stay within the limits of the current structure and still get an independent timing for each part/voice by providing a second invisible score with a lot of awkward figures. For instance, a whole note played slightly off time could be replaced by a 64th note rest followed by a 64th note tied with a 32nd note tied with a 16th note tied with an 8th note... This is what really happens when one imports a MIDI file generated by a human performance on a MIDI keyboard if no or very little quantization is applied.
I guess both solutions could be relatively easily implemented as a plugin.

In reply to by fmiyara

I see what you are saying, but I'm not suggesting a timing deviation anywhere near something that would make a difference in the notated values in the score. Changing the note-on point by +/- 4 ticks is well within the normal range for human performance. This, I suppose, is why the function is called "humanize". (Notator also allowed the option of quantizing note display, while leaving the MIDI timing as recorded.)

In the 30 or so years since Notator was first released, the sophistication of computers and music software has grown substantially, so I'm sure that a function like this could be designed to be a far more subtle and useful tool than it once was. For example, the bass notes or percussion could be assigned less deviation so that the timing of the music is less skewed, while other notes receive more humanizing.

In reply to by toffle

I think this idea might break down a bit when working with an orchestra score. It's highly unlikely that the entire viola section would be early or late all together. I think more realism comes from overuse of dynamics and hairpins, ones you wouldn't normally write. That and tempo changes. Seems to me that if you're going to "humanize" something, then it ought to be done by a human, not a random setting.

In reply to by bobjp

I agree that the humanizing function is best performed by a human, at first by trial and error and after some experience with a confident working knowledge. I recall when several years ago I used Sibelius, which had a humanizing feature: It was completely disappointing!
I also agree that the same random displacement of all instruments of a section is not what happens; rather, each one of them plays at a slightly different time (except when the conductor lags, but that is a completely different situation).
I don't say it isn't possible to automate some decisions, but I think it is not as easy as adding random deviations to the timing of each note or to its velocity. Serious work would require artificial intelligence. An area where automation could help a lot is in trills, tremolos and drumset rolls, but it would require some research (by analyzing the timing and velocity deviations of human players).

Even if not completely automatic, humanizing playback (by a human!) could be facilitated if some tools were provided:

1) A relative tempo feature. Tempo indications are currently absolute, and there is only one way of inputting them, whether for a humanizing purpose or not: the use of frequent invisible tempo marks (whose visibility and size reduction, to prevent overcrowding the score, must be adjusted one by one or by copying and pasting). Tempo marks should be reserved for the general or average tempo of a movement, section or passage, and a new type of mark, invisible and unprintable by default (though this could be changed as a preference), should be introduced. This would be a relative tempo, for instance a percentage of the general tempo of the passage (the previous general tempo mark). It could be called expression tempo. Currently they are absolute, which makes it a nightmare to change the general tempo (I don't mean the tempo feature in the play panel but the case where I originally wrote a tempo of 120 and then change my mind and prefer 116).

2) A tempo envelope drawing canvas. Tempo variations in human playing are of at least two types: general tempo variation, and agogic accent. In the first case, the general tempo simply accelerates or decelerates a bit. In the second case an individual note (or a few consecutive notes) is slowed down. If one could just draw the envelope or profile of the tempo it wouldn't be necessary to input the numbers one by one. The piano roll editor is where this could be implemented. Currently the tempo cannot be edited, even manually, from the piano roll view.

3) A different approach to articulation parameter control. If one reads any book on musical notation, such as Gardner Read's or Elaine Gould's, there is a certain agreement as to the duration of articulations like staccato, but nothing is said about the intensity. However, for at least three reasons intensity (reflected as velocity) is or may be part of an articulation: a) Short sounds elicit less loudness than longer sounds of the same intensity (full loudness requires about 200 ms to 500 ms to develop; this is a psychoacoustic fact); b) Because of the attack slope in the soundfont the peak intensity may not be reached in short sounds, so it may be necessary to compensate to prevent loss of loudness; c) Often a staccato carries not only a shorter duration but also some accent. This may be subtle and not explicitly notated. So the inspector for articulations should allow changing both duration (as a fraction of the nominal duration) and velocity.

4) A different (or selectable alternative) approach to velocity offset. Currently, the velocity of a note can be controlled from the inspector in two ways: offset and absolute. Absolute is the value of the velocity itself. To my knowledge, its only use is when importing a MIDI file. Offset offers something different from what is expected. An offset should be a fixed increment, such as 10 units of velocity. Instead, selecting 10 means a 10 % relative increment in velocity, so if the dynamic is p, we have a general velocity of 49, and 10 % means an increment of about 5, that is, an absolute velocity of 54. The difference is barely audible. If the dynamic is f, the general velocity is 96, so we end up with an increment of 10, and the final velocity is 106. The difference is more audible. Some tests I've performed with the piano patch reveal that the soundfont is calibrated so that equal velocity increments imply equal loudness (in the psychoacoustic sense) increments. That's why an increment of about 16 yields a roughly equal dynamic increase of one mark: p ---> mp, mp ---> mf and so on. Currently, a fixed percent increment (the so-called "offset") is barely audible at pp and quite a jump at ff. This is counterintuitive. My solution is to apply 50 % at pp and 15 % at f, but that is clearly a workaround; one would expect the same number to yield the same perceived accent.

5) A velocity envelope drawing canvas. This could be implemented in the piano-roll window. Many MIDI sequence editors (such as the free open software Sekaiju) have this feature, allowing slight crescendos and diminuendos that aren't worth notating.

6) This may be the most controversial and difficult-to-implement feature: A new approach to tempo, allowing different tempos for different staves (or even voices). Note that there would still be a general (vertical) tempo ruling the general flow of the music, but there would be deviations.
I am aware that this is complicated in a notation system based on measures and governed by synchronism of the beats of all homologous measures. One way to handle this is by an average synchronism (sort of a phase-locked loop, PLL, analogy) so that if an instrument or voice lags, eventually it will have to rush to compensate (so if there is sort of a ritenuto somewhere, then there must be a stringendo a bit later). How to do this consistently is a challenge, but it would allow true rubato in its purest definition: one hand has a different tempo from the other hand. A parameter to control this would be the maximum desynchronization allowed, measured either in MIDI ticks or as a suitably short musical figure like a 32nd note. This would be the equivalent of the maximum allowed phase displacement in the PLL analogy.
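The arithmetic in item 4 can be checked with a short Python sketch. It only reproduces the numbers quoted there (49 for p, 96 for f) and is not taken from MuseScore's source. The function names are hypothetical, and the clamp to 127 assumes the usual MIDI velocity ceiling.

```python
MAX_VELOCITY = 127  # upper limit of a MIDI velocity value


def percent_offset(base_velocity, percent):
    """Offset interpreted as a percentage of the dynamic's velocity."""
    return min(MAX_VELOCITY, round(base_velocity * (1 + percent / 100)))


def fixed_offset(base_velocity, units):
    """Offset interpreted as a fixed number of velocity units."""
    return min(MAX_VELOCITY, base_velocity + units)


# Velocities quoted in item 4 for the dynamics p and f.
DYNAMICS = {"p": 49, "f": 96}

for name, velocity in DYNAMICS.items():
    print(name, velocity, percent_offset(velocity, 10), fixed_offset(velocity, 10))
# p 49 54 59
# f 96 106 106
```

With the percentage reading, the same setting of 10 adds about 5 units at p but 10 units at f. With the fixed reading it adds 10 units at both, which is the behaviour item 4 asks for.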

In reply to by LuuBluum

I'm afraid this is not what I was saying. The fermata suspends the beat for a while and, as you say, it is in force until the next note. The attached score contains an example. The first version contains a fermata x3, so the tempo should be reduced by a factor of 3. (33.33 % according to my relative tempo suggestion). However, only the first half of that note is at 33.33 %. The second version is the desired result, where the tempo for the whole subsequent passage is 33.33% of the original. To fix this using fermatas every single note from then on would have to carry a fermata x3, which is impractical.
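The relation between a fermata stretch and the equivalent relative tempo can be sketched in a few lines of Python. This is only an illustration of the arithmetic. The function names are invented here, and "expression tempo" is the hypothetical percentage-based mark proposed earlier in the thread, not an existing MuseScore feature.

```python
def tempo_percent_from_stretch(stretch):
    """A note stretched by `stretch` plays at 100/stretch percent of the tempo."""
    return 100.0 / stretch


def stretch_from_tempo_percent(percent):
    """Inverse relation: duration is the reciprocal of tempo."""
    return 100.0 / percent


def expression_bpm(general_bpm, percent):
    """Resolve a relative (percentage) tempo against the general tempo mark."""
    return general_bpm * percent / 100.0


print(round(tempo_percent_from_stretch(3.0), 2))    # 33.33
print(round(stretch_from_tempo_percent(110.0), 3))  # 0.909

# Changing the general tempo rescales every relative mark automatically.
for general in (120, 116):
    print(general, expression_bpm(general, 90.0))
```

The last loop shows the point of the proposal: with relative marks, editing the single general tempo (120 to 116 in the earlier example) is enough, and no individual mark has to be retyped.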

Attachment Size
Test_fermata_as_temp_change.mscz 5.8 KB

In reply to by fmiyara

Fermatas worked the way you described back in 2.x. However, there was a massive drawback: you could cause relative changes to your already-relative tempos, resulting in the program failing to properly return to the original tempo.

Say I have a whole note in voice 1 with a half rest and then a half note in voice 2. If I place a 2.0 fermata over the first whole note in 2.x, it would slow the entire measure. If I then placed a 2.0 fermata over the half note, the second half of the passage would then play 4x slower. Depending on the configuration of fermatas, often enough upon leaving the measure the tempo would remain 2x slower than before. Furthermore, this made consistent changes within a measure outright impossible without pulling out a calculator to do all the calculations.

With the current setup, everything is relative to the base tempo, and no fermatas can overlap. This makes things far easier to handle, and gives much finer control.
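As a toy model of the two behaviours described in this post (not MuseScore's actual code; the function names and the base tempo of 120 are made up for illustration), the difference comes down to whether stretch factors multiply or each one is applied to the base tempo on its own:

```python
BASE_BPM = 120.0  # hypothetical base tempo for the example


def tempo_compounding(base_bpm, active_stretches):
    """2.x behaviour as described above: overlapping stretches multiply."""
    bpm = base_bpm
    for stretch in active_stretches:
        bpm /= stretch
    return bpm


def tempo_relative_to_base(base_bpm, stretch):
    """Current behaviour as described above: one stretch, relative to base."""
    return base_bpm / stretch


# Whole note with a 2.0 fermata, overlapped by a half note with a 2.0 fermata.
print(tempo_compounding(BASE_BPM, [2.0, 2.0]))  # 30.0, i.e. 4x slower
print(tempo_relative_to_base(BASE_BPM, 2.0))    # 60.0, i.e. 2x slower
```

In the compounding model the result depends on which other stretches happen to be active, which is why a calculator was needed. In the relative model each value can be read on its own.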

Besides, it's not like this takes more fermatas: at least in my scores, there's an invisible fermata over practically every single note. This was true in 2.x and remains true in 3.x.

In reply to by LuuBluum

Fermatas are a bad idea for "humanizing" a performance. The purpose of stretching a note is to make the instruments sound imperfect, that is, the note has a different start and/or stop time than in other instruments. Fermatas affect the stop time of every instrument that has a note on that beat, defeating that purpose. Fermatas are fine for extending orchestra-wide notes, like in a rubato section where you can use the fermatas to fine-tune the orchestra's tempo note by note.

In reply to by LuuBluum

Ispil, you are right, thanks for the clarification. Anyway, it is an awkward workaround since, as I commented, what one wants to control is tempo, and only very occasionally duration. Besides, it doesn't really implement what I requested (a relative tempo increase) but rather a duration change, and duration is the reciprocal of tempo. If I need a 10 % tempo increase (1.10 times the tempo) I need to reduce the duration to 1/1.10 = 0.909 times the original duration. The fermata is meant to provide a momentary suspension of the musical discourse,

In reply to by toffle

toffle, using what jeetee has said, yes. But I'm not completely sure deviations are just random. Strictly random means uncorrelated, i.e., no correlation between the deviation of one note and the next. This would mean using random numbers (such as Gaussian or uniformly distributed white noise) added to each note's duration. It would probably be more realistic to add deviations governed by Brownian noise instead of white noise.

In reply to by fmiyara

@fmiyara Thank you for your detailed and insightful analysis of this.
When I was posing my original question, I began to wonder how far into musical AI this may lead. You are quite correct that the randomness of human performance is not truly random. And though the method I proposed is applied to each note individually, in fact, it may be more effective to apply to groups of notes, measures or phrases. A human performer may lead the beat slightly or at times trail the beat, but it is unlikely that they would be randomly ahead/behind on consecutive notes. I'm not a coder or even well-schooled in maths, so my proposal was set in terms that were easiest for me to describe. My thought, though, was to suggest a function that required the least changes to the way MS interprets timing.

I purposely left off the idea of note emphasis/velocity, as this is something about which I have another proposal to make (if I can figure out how to frame the idea). Suffice it to say that note emphasis provides huge potential towards bringing a human feel to computer playback.

A soloist might speed up and get a tad louder on an ascending passage, softer and slower at the peak, and pick up again on the way back down. Maybe. An orchestra might do this at the bidding of the conductor. But these are two different situations. And not random. I think humanizing would be more like making playback more... uh... musical, emotional, and satisfying. Rather than introducing random mistakes.
But musical playback is the never ending quest of any notation software. People that use DAWs do this all the time.
We want to be able to do, on our computers, that which costs hundreds of thousands of dollars in a recording studio. Or at least come close. I'm not sure that is the goal of notation software.

I compose for the fun of it. Years ago, I wrote on paper. I stopped altogether because I had no outlet for my work, or any way to hear it. For decades I wrote nothing. Then my music teacher wife bought v 4 of Sibelius. I was amazed. I've been writing all kinds of stuff ever since. It is very therapeutic for me. It makes a huge difference that MuseScore now reads hairpins better. For me, next would be rits and accel. And, of course, the never-ending sound font improvement search.

In reply to by bobjp

As I said earlier, the finest musicians in the finest orchestras do not hit all their notes at the same instant (as they are currently played in MS). They strive for a unified sound, but invariably there will be sounds which are out of place by the merest of milliseconds. I'm not sure that I would call those slight differences in note attacks "mistakes"; they contribute to the richness of the music. This is a part of live music. I'm just wondering out loud what it might take to achieve something more human sounding from MuseScore.

In reply to by jeetee

It would be useful to have a version without such a requirement (even if it had no interactive graphics at all). As far as I can grasp from the images, the real graphic requirements are quite low, since there are many programs that have similar graphical controls and don't need any advanced graphics card. For instance, Audacity, or even CoolEdit, a program 20 years old.

In reply to by fmiyara

It's not the "graph" in there that needs it. It's everything graphical: the label texts, buttons, dropdown, …
Supporting OpenGL2 is a QML UI requirement.

It's also a Windows choice to only support up to 1.4 in their default Windows 10 driver, whilst the default driver for Win7 did support 2.1 for a lot of cards...

So yes, unfortunately, this is entirely out of our control.

I can tell that MuseScore has greatly improved its playback capabilities in the last two years or so, to the point where the real limitations are now on the side of the user.
Of course there is still room for improvement, particularly making certain actions easier or more intuitive or improving the efficiency in the resource usage, but the main features to get an awesome performance are all there (single note dynamics, freedom to choose or even adapt third party soundfonts, tempo and velocity control, piano roll, hairpin control with a variety of methods, effects).
Particularly I am grateful to the team for having had the ability and willingness to listen to the users' requests to improve playback. It is also very good that the formerly oft-cited phrase that "MuseScore is mainly a music notation application" has been steadily leaving the discourse from version to version.

In reply to by fmiyara

"MuseScore is mainly a music notation application" has been steadily leaving the discourse from version to version

To be very clear: that is still the going mantra. What has changed is that we've had some wonderful contributions on this front. As equally often stated, we're not against better playback, so if the contribution is solid, there's no reason to not include it.
And as notation improves, this indeed opens up some wiggle room for 'secondary' longstanding feature requests to become addressed.

You might be right in certain situations. But I doubt that anyone will notice if one of the second trumpet players in a concert band is a millisecond early. Maybe in a completely dry room (no reverb), but even then I don't think so. Where I think this might make more difference is in pizz. But even then, in the case of an orchestra, you would have to have an entire section off. Not realistic. Might work for a small ensemble. But you would want to use it sparingly. Too much isn't real, either.

In reply to by bobjp

Indeed, one of the problems with the pizzicato of the MuseScore_General_HQ soundfont is that some notes (not all) have sort of an echo, as if some players lagged by a noticeable fraction of a second.
The most noticeable effect of the subtle lack of synchronism and difference in tuning in the case of instruments such as bowed strings or winds is a smoothing of the timbre, because of partial cancellation of higher harmonics due to phasing. Certainly, it isn't possible to detect differences of a few milliseconds, due to a psychoacoustic effect called masking, where the onset of the sound coming from the instrument that is lagging is made inaudible by the fully developed sound of the instrument sounding earlier. So a single onset will be perceived. This holds only for bowed or wind instrument sounds, not for shorter sounds like the pizzicato.

In reply to by bobjp

"I doubt that anyone will notice if one of the second trumpet players in a concert band is a millisecond early.... But you would want to use it sparingly."

That is exactly what I'm suggesting. An effect such as this should NOT be apparent as separate attacks on notes. Applied in moderation, we hear a more realistic performance than perfectly synchronized notes.

I want to return to an earlier point, which may not be relevant with today's computers, but certainly made a difference with earlier MIDI applications. Distributing note-on events over a number of MIDI ticks was a way of reducing the strain on the CPU. Instead of 24 notes sounding at precisely the same tick, you may have only four at that specific point and several at neighboring ticks (or even some at 0.4.4.x - the last tick in the previous measure).

I know that some users are noticing "stuttering" in MS3 playback. (It happens on my 32-bit laptop, but not my 64-bit machine) This may be a way of improving playback where there are too many note-on events for the CPU to process efficiently. (Or it may be totally irrelevant with today's computers.)

In reply to by toffle

I'm just not convinced that a "random" introduction of timing variations is the way to go. Even if I was interested in it, I'd want more control. I would think that there are far better ways to imitate realness. When I think of live performance, I find it very difficult to attribute why it sounds live to timing variations. Phrasing, articulation, dynamic and tempo change, acoustics, presence, seem much more important. Sure, someone being off might be in there and may affect some of those other things. But I think that is more down the line. Besides, not much way to do it in a large group.
On my laptop playing a large score, CPU=15-19%, RAM=40%. If I only had 4 GB of RAM, I could see a problem. The CPU doesn't seem to be taxed at all. This is a 5-year-old run-of-the-mill box. Slow i5 CPU and 8 GB of RAM. To run anything any more I think 64-bit and a minimum of 8 GB is a must. And it's just going to get worse. Years ago, when I first looked into music production on a computer, the salesman shook his head. "Oh, you're going to need 64 MB of RAM and at least a one GB hard drive." Almost unheard of in a personal computer back then. And very expensive.

In reply to by bobjp

As stated above, randomness in human reaction is much more closely described by Brownian noise than by purely random numbers (white noise). Brownian noise has a touch of "memory", that is, the next deviation is influenced by the previous one, and even by several earlier deviations. I think that a first-order solution can be completely deterministic, but under the control of a user in possession of a trained musical taste. A further approximation could be attained by adding some Brownian noise.

Technical note: Brownian noise is the integration of white noise, a mathematical operation that enhances low-frequency variations. Using Brownian noise works better because phase is also the integration of instantaneous frequency. Tempo is sort of a frequency (the frequency of beat events, or of MIDI ticks), and the timing of such events is equivalent to phase. Human control is predictive, through an internal clock, so what one controls is the rate of future events, which is equivalent to tempo. That's why controlling tempo is so much more effective than controlling timing.
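The difference between the two kinds of noise can be seen in a small Python sketch. It is only an illustration under simple assumptions: Gaussian white noise, a plain running sum as the "integration", and lag-1 autocorrelation as a rough measure of memory. The function names and the 2 ms spread are invented for the example.

```python
import random


def white_deviations(n, sigma, rng):
    """Independent deviations: each note ignores the previous one."""
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def brownian_deviations(n, sigma, rng):
    """Running sum (integration) of white noise: each deviation
    starts from the previous one and moves a small random step."""
    deviations, current = [], 0.0
    for _ in range(n):
        current += rng.gauss(0.0, sigma)
        deviations.append(current)
    return deviations


def lag1_autocorrelation(values):
    """Correlation between each value and the next (near 0 = no memory)."""
    mean = sum(values) / len(values)
    num = sum((a - mean) * (b - mean) for a, b in zip(values, values[1:]))
    den = sum((v - mean) ** 2 for v in values)
    return num / den


rng = random.Random(42)
white = white_deviations(2000, sigma=2.0, rng=rng)        # e.g. milliseconds
brownian = brownian_deviations(2000, sigma=2.0, rng=rng)

print("white   :", round(lag1_autocorrelation(white), 2))
print("brownian:", round(lag1_autocorrelation(brownian), 2))
```

The white series should score close to 0 and the Brownian series close to 1, which is the "memory" described above. A plain running sum can drift arbitrarily far, so a practical tool would probably have to pull it back toward zero, in the spirit of the maximum desynchronization parameter suggested earlier in the thread.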

In a large group there are other effects which could be considered, such as the reaction time to indications of the conductor, which has individual variability, and the distinct delay due to sound propagation, which causes differences in the arrival time of sound to the conductor, which can vary from 10 ms to 30 ms or more. The sound arrival is the only feedback the conductor has to influence the next indications, which in turn will influence the next notes the players will play. This effect is deterministic, since it depends on the (fixed) distance of each player both to the director and to a particular listener in the audience. Modern miking and mixing techniques make it possible to compensate for these delays in a recording, generating a far more accurate sound than experienced in a live performance. But I believe we don't need to worry about these phenomena, since the conscious perception of music is, once more, concerned with tempo and not with timing. A timing error is quite perceivable because of the unexpected tempo change to which it is equivalent, not because of the time error in itself.
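To put the 10 ms to 30 ms figures in perspective, here is a small Python sketch that converts between distance and propagation delay. It assumes a speed of sound of about 343 m/s (dry air at roughly 20 °C); the function names are made up for the example.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 degrees C


def delay_ms(distance_m):
    """Time for sound to travel `distance_m` metres, in milliseconds."""
    return 1000.0 * distance_m / SPEED_OF_SOUND_M_PER_S


def distance_m(delay_in_ms):
    """Distance sound covers in `delay_in_ms` milliseconds, in metres."""
    return SPEED_OF_SOUND_M_PER_S * delay_in_ms / 1000.0


for ms in (10, 30):
    print(ms, "ms corresponds to about", round(distance_m(ms), 2), "m")
```

Under that assumption, 10 ms corresponds to a little over 3 m and 30 ms to a little over 10 m, which are plausible player-to-conductor distances on a stage.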

In reply to by fmiyara

"In a large group there are other effects which could be considered, such as the reaction time to indications of the conductor, which has individual variability, and the distinct delay due to sound propagation, which causes differences in the arrival time of sound to the conductor, which can vary from 10 ms to 30 ms or more."

Exactly. A trombone player awaiting his entry in the last movement of The Pines of Rome (usually from the balcony, or sometimes offstage) knows that he cannot wait to hear his cue, or even necessarily rely on the conductor's baton. He will have to anticipate the beat somewhat to compensate for the distance between his position and the rest of the ensemble.

It is because of the logistical problems of live performance that I don't like to call such differences in timing "mistakes", as has occurred a couple of times in this thread.

You are also correct that modern recording techniques compensate and correct for timing deviations. (Sometimes at the expense of the integrity of the performance)

Some people are quite hung up on the word "random" in my original post. I only used that term as a direct quote from the Notator manual from 30 years ago. I think a more intelligent/Brownian form of variance would be more effective for reasons discussed in other posts.

The point is that in true live performance, perfection is not possible. The question is, do we want our playback as precise as electronically (mechanically?) possible, or as precise as "humanly" possible? I know what my answer would be.

In reply to by fmiyara

Perhaps. But there is more going on than just the individual players reacting to the conductor. Players are trained also to listen to, react to, and move with each other. Plus, if the group has rehearsed together, what the director does is not so much a surprise. Yes, mistakes happen, but not so much with groups that have played together for a while.
I think the best way to make playback musical is to... make it more musical. Not introduce random mistakes that aren't all that likely to happen. Is it not the goal of every performance to be as musical as possible? Notice I didn't say perfect. Seems to me that IF our goal is to make a musical recording, then we should strive to introduce the aspects of music that make that recording musical, and not just human. That's what musicians do.
Let's face it, we've pretty much removed the human from the process. That gives us the opportunity to concentrate on the music.
