Volume in Fluidsynth

Posted 3 years ago

In this blog post I want to write about how the volume is calculated in Fluidsynth and explain why I think this solution is problematic.

Let's start at Fluidsynth's heart: the write method of a voice. This is the essential part that generates the actual audio: it generates n samples and adds them to the output. The method is called in a real-time context by the main synth process. Let's have a look at the part that deals with volume. (There is actually a very similar statement right next to this one; the only difference is that during the attack phase of the envelope the volume is treated somewhat differently, as a linear ramp.)
fluid/voice.cpp Voice::write(unsigned n, float* out, float* reverb, float* chorus)

            target_amp = fluid_atten2amp (attenuation)
               * fluid_cb2amp (960.0f * (1.0f - volenv_val)
               + modlfo_val * -modlfo_to_vol);
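Stripped of the lookup tables, the statement above can be sketched in C++ as follows. This assumes that both fluid_atten2amp() and fluid_cb2amp() map centibels of attenuation to gain as 10^(-cB/200), and it ignores their truncation to whole centibels and their upper clamp; atten_cb_to_gain and target_amp are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for fluid_cb2amp()/fluid_atten2amp(): centibels of
// attenuation to a linear gain, assuming gain = 10^(-cB / 200).
float atten_cb_to_gain(float cb)
{
    if (cb < 0.0f)
        return 1.0f;  // negative attenuation never amplifies
    return std::pow(10.0f, cb / -200.0f);
}

// Sketch of the volume statement in Voice::write(): two gains are multiplied,
// i.e. the generator attenuation and the envelope/LFO term add up in cB.
// A closed envelope (volenv_val = 0) contributes 960 cB (96 dB).
float target_amp(float attenuation, float volenv_val,
                 float modlfo_val, float modlfo_to_vol)
{
    assert(volenv_val >= 0.0f && volenv_val <= 1.0f);  // envelope is normalized
    return atten_cb_to_gain(attenuation)
         * atten_cb_to_gain(960.0f * (1.0f - volenv_val)
                            + modlfo_val * -modlfo_to_vol);
}
```

With the envelope fully open (volenv_val = 1) and no LFO, only the generator attenuation is left: 200 cB gives a gain of 0.1 (-20 dB).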

As you can see, the volume of the voice depends on the attenuation, the value of the amplitude envelope and the modulation LFO (modlfo). We're interested in attenuation, since that is what gets set by velocity, CC11 and CC7. But first: what does that call to fluid_atten2amp do?


/*
 * fluid_atten2amp
 *
 * in: a value between 0 and 1440, 0 is no attenuation
 * out: a value between 1 and 0
 *
 * Note: Volume attenuation is supposed to be centibels but EMU8k/10k don't
 * follow this.  That's the reason for separate fluid_cb2amp and fluid_atten2amp.
 */
float fluid_atten2amp(float atten)
      {
      if (atten < 0)
            return 1.0;
      else if (atten >= FLUID_ATTEN_AMP_SIZE)
            return 0.0;
      else
            return fluid_atten2amp_tab[(int) atten];
      }

As you can see, it basically looks up a value in a table. This table transforms an attenuation in centibels into a factor that can be used as a gain value. As you might know, a level in centibels (cB, a tenth of a decibel) is calculated as v_cB = 200 * log10(v / v_ref), where v_ref in digital systems is 0 dBFS: the maximum value that can be represented by the system, or simply 1.0 if your floating-point audio system expects values between -1.0 and 1.0. Fluidsynth works with values between -1.0 and 1.0.
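The equation and its inverse fit in a small C++ sketch. amp_to_level_cb and level_cb_to_amp are hypothetical helper names; v_ref defaults to 1.0, matching Fluidsynth's -1.0 to 1.0 sample range, and negative levels mean attenuation.

```cpp
#include <cassert>
#include <cmath>

// Level in centibels relative to v_ref: v_cB = 200 * log10(v / v_ref).
double amp_to_level_cb(double v, double v_ref = 1.0)
{
    assert(v > 0.0 && v_ref > 0.0);  // the log of zero is undefined
    return 200.0 * std::log10(v / v_ref);
}

// The same equation solved for v: v = v_ref * 10^(v_cB / 200).
double level_cb_to_amp(double level_cb, double v_ref = 1.0)
{
    return v_ref * std::pow(10.0, level_cb / 200.0);
}
```

For example, halving the amplitude gives amp_to_level_cb(0.5) ≈ -60.2 cB, i.e. about -6 dB.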

Let's look at how that table is constructed in Fluidsynth:
fluid/conv.cpp fluid_conversion_config()

      for (int i = 0; i < FLUID_ATTEN_AMP_SIZE; i++)
            fluid_atten2amp_tab[i] = (float) pow(10.0, (double) i / FLUID_ATTEN_POWER_FACTOR);

FLUID_ATTEN_AMP_SIZE is 1441 and FLUID_ATTEN_POWER_FACTOR is -200.0, so this is just the equation above solved for v/v_ref. The power factor is negative because the table index is an attenuation: a positive number of centibels stands for a negative level, so the resulting gain values are smaller than 1.0.
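As a self-contained sketch, the table and a lookup mirroring fluid_atten2amp() look like this. The two constants are the values quoted above; make_atten2amp_tab and the atten2amp signature (taking the table as a parameter) are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

constexpr int FLUID_ATTEN_AMP_SIZE = 1441;           // 0 .. 1440 cB
constexpr double FLUID_ATTEN_POWER_FACTOR = -200.0;

using AttenTable = std::array<float, FLUID_ATTEN_AMP_SIZE>;

// Build the table the same way as fluid_conversion_config().
AttenTable make_atten2amp_tab()
{
    AttenTable tab{};
    for (int i = 0; i < FLUID_ATTEN_AMP_SIZE; i++)
        tab[i] = static_cast<float>(std::pow(10.0, i / FLUID_ATTEN_POWER_FACTOR));
    return tab;
}

// Lookup with the same clamping as fluid_atten2amp(). Fractional values are
// truncated, so the resolution is 1 cB (0.1 dB).
float atten2amp(const AttenTable& tab, float atten)
{
    if (atten < 0)
        return 1.0f;
    if (atten >= FLUID_ATTEN_AMP_SIZE)
        return 0.0f;
    return tab[static_cast<int>(atten)];
}
```

Because of the truncation, an attenuation of 200.7 cB reads the same entry as 200 cB.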

But how is the attenuation value determined?
fluid/voice.cpp Voice::update_param(int _gen)

            case GEN_ATTENUATION:
                  attenuation = gen[GEN_ATTENUATION].val * ALT_ATTENUATION_SCALE + gen[GEN_ATTENUATION].mod + gen[GEN_ATTENUATION].nrpn;
                  /* Range: SF2.01 section 8.1.3 # 48
                   * Motivation for range checking:
                   * OHPiano.SF2 sets initial attenuation to a whopping -96 dB
                   */
                  attenuation = qBound(0.0f, attenuation, 1440.0f);
                  break;

So it is the sum of the generator value (weighted with a scale factor), the generator's modulation and its NRPN offset, clamped to the range 0 to 1440 cB. What does that mean? In the SoundFont specification, the parameters of a voice or channel are controlled by generators, which can in turn be changed by modulators. The generator value is the one stored in the SoundFont file itself. The modulation part is changed by sources like velocity or MIDI CCs. NRPNs are special MIDI messages that can change generators as well.
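A minimal sketch of the GEN_ATTENUATION case as a free function. The value of ALT_ATTENUATION_SCALE is not shown in the excerpt, so it is passed in as a parameter, and std::clamp (C++17) stands in for Qt's qBound; compute_attenuation is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>

// Generator value (scaled) + modulation + NRPN offset, clamped to 0..1440 cB.
float compute_attenuation(float gen_val, float gen_mod, float gen_nrpn,
                          float alt_attenuation_scale)
{
    assert(alt_attenuation_scale > 0.0f);
    const float attenuation = gen_val * alt_attenuation_scale + gen_mod + gen_nrpn;
    // SF2.01 section 8.1.3 #48: initial attenuation is limited to 0..1440 cB
    return std::clamp(attenuation, 0.0f, 1440.0f);
}
```

For example, a generator value of -960 cB (the OHPiano case from the comment) with no modulation is clamped to 0, i.e. no attenuation at all.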

Let's have a look at how these modulators work!
fluid/voice.cpp Voice::modulate(bool _cc, int _ctrl)

void Voice::modulate(bool _cc, int _ctrl)
      {
      for (int i = 0; i < mod_count; i++) {
            Mod* m = &mod[i];
            /* step 1: find all the modulators that have the changed controller
             * as input source. */
            if (m->has_source(_cc, _ctrl)) {
                  int g = m->get_dest();
                  float modval = 0.0;
                  /* step 2: for every changed modulator, calculate the modulation
                   * value of its associated generator */
                  for (int k = 0; k < mod_count; k++) {
                        if (fluid_mod_has_dest(&mod[k], g)) {
                              modval += mod[k].get_value(channel, this);
                              }
                        }
                  /* step 3: now that we have the new value of the generator,
                   * recalculate the parameter values that are derived from the
                   * generator */
                  ...
                  }
            }
      }

So it loops through all defined modulators and checks whether the received event (like a CC message or velocity) affects the modulator. If it does, it looks up the modulator's destination generator, loops through all modulators again to sum up the values of every modulator targeting that generator, and updates the generator's modulation. After the generator has changed, the parameters derived from it are updated.
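The three steps can be sketched with a deliberately simplified, hypothetical SimpleMod struct that stores each modulator's current contribution instead of computing it through get_value(), plus a plain vector of per-generator modulation sums. The real parameter update of step 3 is left out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified modulator.
struct SimpleMod {
    bool  cc;     // true if the source is a MIDI CC, false for e.g. velocity
    int   ctrl;   // CC number or internal source id
    int   dest;   // index of the destination generator
    float value;  // current contribution (amount * transformed source values)
};

void modulate(const std::vector<SimpleMod>& mods, std::vector<float>& gen_mod,
              bool cc, int ctrl)
{
    for (const SimpleMod& m : mods) {
        // step 1: only modulators fed by the changed controller
        if (m.cc != cc || m.ctrl != ctrl)
            continue;
        const int g = m.dest;
        assert(g >= 0 && g < static_cast<int>(gen_mod.size()));
        // step 2: re-sum every modulator that targets the same generator
        float modval = 0.0f;
        for (const SimpleMod& other : mods)
            if (other.dest == g)
                modval += other.value;
        // step 3: store the new modulation (parameter update omitted)
        gen_mod[g] = modval;
    }
}
```

If CC7, CC11 and velocity modulators all target the same generator, a CC11 change re-sums all three contributions, not just the CC11 one.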

Let's have a look at the defined modulators:
fluid/fluid.cpp (struct attributes are: dest, src1, flags1, src2, flags2, amount)

static const Mod defaultMod[] = {
      ...
         -2400 },
      { GEN_PITCH,
        ...
        12700.0 },
      ...
      };

I highlighted the interesting ones, but let's take a closer, aligned look at them.


So the attenuation is influenced by velocity, CC7 and CC11, as the SF spec requires. The first constant in the flags tells whether the source is a MIDI CC or an internal parameter. The other three describe the curve: its shape (e.g. concave), its polarity (unipolar or bipolar) and its direction (positive or negative). The last value is the amount by which the destination parameter changes. As you can see, none of the attenuation modulators uses the second source.
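The flags are bit fields. The sketch below assumes MuseScore's copy keeps the bit values of FluidSynth's fluid_mod_flags enum; describe_flags and transform_case are hypothetical helpers. Under that assumption the CC attenuation modulators use CC | concave | unipolar | negative = 21, and the low four bits of that value give 5, which is why the concave, unipolar, negative transform is numbered 5.

```cpp
#include <cassert>
#include <string>

// Flag bits as in FluidSynth's fluid_mod_flags (assumed unchanged in MuseScore).
// Several names share the value 0 on purpose.
enum fluid_mod_flags {
    FLUID_MOD_POSITIVE = 0,  FLUID_MOD_NEGATIVE = 1,   // direction
    FLUID_MOD_UNIPOLAR = 0,  FLUID_MOD_BIPOLAR  = 2,   // polarity
    FLUID_MOD_LINEAR   = 0,  FLUID_MOD_CONCAVE  = 4,   // curve shape
    FLUID_MOD_CONVEX   = 8,  FLUID_MOD_SWITCH   = 12,
    FLUID_MOD_GC       = 0,  FLUID_MOD_CC       = 16   // source kind
};

// Human-readable description, e.g. "CC, concave, unipolar, negative".
std::string describe_flags(int flags)
{
    assert(flags >= 0 && flags < 32);
    static const char* const shapes[] = { "linear", "concave", "convex", "switch" };
    std::string s = (flags & FLUID_MOD_CC) ? "CC" : "GC";
    s += ", ";
    s += shapes[(flags >> 2) & 3];
    s += (flags & FLUID_MOD_BIPOLAR) ? ", bipolar" : ", unipolar";
    s += (flags & FLUID_MOD_NEGATIVE) ? ", negative" : ", positive";
    return s;
}

// The transform switch presumably only looks at the low four bits
// (shape, polarity, direction); the CC bit selects the source instead.
int transform_case(int flags)
{
    return flags & 0x0f;
}
```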

To complete our overview, let's look at how the values of a modulator are gathered and calculated, starting with the gathering:
fluid/mod.cpp Mod::get_value(Channel* chan, Voice* voice)

      if (mod->src1 > 0) {
            if (mod->flags1 & FLUID_MOD_CC) {
                  v1 = chan->getCC(mod->src1);
                  }
            else {  /* source 1 is one of the direct controllers */
                  switch (mod->src1) {
                        case FLUID_MOD_NONE:         /* SF 2.01 8.2.1 item 0: src enum=0 => value is 1 */
                              v1 = range1;
                              break;
                        case FLUID_MOD_VELOCITY:
                              v1 = voice->vel;
                              break;
                        ...
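The gathering step for the two kinds of sources that matter for attenuation can be sketched like this. Channel and Voice are hypothetical minimal stand-ins, the constants are assumed to match FluidSynth's enums (FLUID_MOD_VELOCITY = 2, FLUID_MOD_CC = 16), and all other general-controller sources are omitted.

```cpp
#include <array>
#include <cassert>

// Hypothetical minimal stand-ins for MuseScore's Channel and Voice classes.
struct Channel {
    std::array<int, 128> cc{};                // last value received per CC
    int getCC(int n) const { return cc[n]; }
};
struct Voice {
    int vel = 0;                              // note-on velocity
};

constexpr int FLUID_MOD_VELOCITY = 2;         // assumed source id
constexpr int FLUID_MOD_CC = 16;              // assumed flag bit

// Raw value of modulator source 1, before any curve is applied.
float source1_value(int src1, int flags1, const Channel& chan, const Voice& voice)
{
    if (src1 <= 0)
        return 0.0f;                          // no first source: modulator yields 0
    if (flags1 & FLUID_MOD_CC) {
        assert(src1 < 128);
        return static_cast<float>(chan.getCC(src1));  // e.g. CC7 or CC11
    }
    switch (src1) {                           // internal ("general") controllers
        case FLUID_MOD_VELOCITY:
            return static_cast<float>(voice.vel);
        default:
            return 0.0f;                      // other sources omitted in this sketch
    }
}
```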

I only copied the lines that are important for attenuation. If the source is a MIDI CC, it is quite simple: it just gets the last CC value that was sent to the channel. If it is an internal parameter such as velocity, it reads it from the voice's internal data. Now, how do these MIDI values get translated into the actual parameters?
fluid/mod.cpp Mod::get_value(Channel* chan, Voice* voice)

                  case 5: /* concave, unipolar, negative */
                        v1 = fluid_concave(127 - v1);
                        break;

It all depends on which flags are set. For attenuation it is always a negative (here meaning reversed), unipolar, concave shape. fluid_concave() just makes sure the given value is within sane bounds (0 to 127) and looks up the value in a table, very similar to fluid_atten2amp() above.
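The clamping and the reversal of case 5 can be sketched independently of the actual curve by passing the table in. curve_lookup and concave_unipolar_negative are hypothetical names; the real fluid_concave() returns 0.0 and 1.0 outside the valid range, which are the end entries of the real concave table.

```cpp
#include <array>
#include <cassert>

using CurveTable = std::array<float, 128>;

// Like fluid_concave(): keep the index within 0..127, then look it up.
float curve_lookup(const CurveTable& tab, float val)
{
    if (val < 0.0f)
        return tab[0];
    if (val > 127.0f)
        return tab[127];
    return tab[static_cast<int>(val)];
}

// case 5 (concave, unipolar, negative): "negative" reverses the input, so
// MIDI 127 reads entry 0 (no attenuation) and MIDI 0 reads entry 127.
float concave_unipolar_negative(const CurveTable& concave_tab, float v1)
{
    return curve_lookup(concave_tab, 127.0f - v1);
}
```

A ramp table (entry i holds i) makes the reversal easy to see: a MIDI value of 100 reads entry 27.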

At the end, this method just returns the product of the amount and the two source values. If there is no first source, the modulator returns 0; if there is no second source, v2 is set to 1.
fluid/mod.cpp Mod::get_value(Channel* chan, Voice* voice)

      return mod->amount * v1 * v2;
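Putting the pieces together for attenuation, here is a sketch of how velocity, CC7 and CC11 turn into centibels and then into a gain. It evaluates the concave formula directly instead of through the table, assumes the 960 cB amount used by the SoundFont default modulators, and takes the end value 1.0 for input 0 from upstream FluidSynth; all function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Negative unipolar concave curve from the fluid_conversion_config() formula:
// MIDI 127 -> 0.0 (no attenuation), MIDI 0 -> 1.0 (full amount).
double concave_negative(int v)
{
    assert(v >= 0 && v <= 127);
    if (v == 0)
        return 1.0;   // the formula would diverge at 0
    if (v == 127)
        return 0.0;
    return -20.0 / 96.0 * std::log10((v * v) / (127.0 * 127.0));
}

// Mod::get_value() for one attenuation modulator: amount * v1 * v2,
// with v2 = 1 because there is no second source.
double mod_value(double amount, int midi_value)
{
    const double v1 = concave_negative(midi_value);
    const double v2 = 1.0;
    return amount * v1 * v2;
}

// The three contributions simply add up in centibels (960 cB amount assumed).
double attenuation_cb(int velocity, int cc7, int cc11)
{
    return mod_value(960.0, velocity) + mod_value(960.0, cc7) + mod_value(960.0, cc11);
}

// Attenuation in centibels to linear gain: 10^(-cB / 200).
double gain(double cb)
{
    return std::pow(10.0, -cb / 200.0);
}
```

Because the contributions are summed in centibels, their gains multiply. With a 960 cB amount this curve works out to a gain of exactly (v/127)², for example about 0.254 for velocity 64 with CC7 and CC11 at 127.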

I wrote a very simple Python script that constructs the tables in the same way as fluid/conv.cpp and plotted the values with matplotlib, so we can get an idea of what these shapes actually look like!

So here are the basic shapes:
[Figures: concave, convex, concave negative, convex negative, concave bipolar and convex bipolar curves]
And here is how they would add up:
[Figures: convex and concave curves added together (two plots)]
As you can see, the reversed concave curve and the convex curve are built from the same formula: one is just one minus the other. Have a look at the piece of code that constructs them:

fluid/conv.cpp fluid_conversion_config()

      /* There seems to be an error in the specs. The equations are
         implemented according to the pictures on SF2.01 page 73. */
      for (int i = 1; i < 127; i++) {
            double x = -20.0 / 96.0 * log((i * i) / (127.0 * 127.0)) / log(10.0);
            fluid_convex_tab[i] = (float) (1.0 - x);
            fluid_concave_tab[127 - i] = (float) x;
            }
I wonder what kind of error in the spec the comment in fluid_conversion_config() refers to. The main problem is this: even if we used one of the bipolar shapes for CC11 (to realize both crescendos and decrescendos without modifying the velocity), either the crescendo or the decrescendo half wouldn't be concave, because in the bipolar shapes one half is always concave and the other half convex (have a look at the bipolar images again). You could think of a concave-convex shape that mixes both of them and is always concave, like this one:
But even this would give us headaches if we wanted to reach the same volume level that a certain velocity produces: you would need to look up the values in these tables, add them up and find the closest match in the velocity table.
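To illustrate the lookup problem, here is a brute-force sketch that finds the velocity whose attenuation comes closest to a given target, for instance the combined velocity and CC11 attenuation at the end of a crescendo. It evaluates the concave formula directly and assumes a 960 cB amount; closest_velocity is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Negative unipolar concave curve from the fluid_conversion_config() formula.
double concave_negative(int v)
{
    assert(v >= 0 && v <= 127);
    if (v == 0)
        return 1.0;
    if (v == 127)
        return 0.0;
    return -20.0 / 96.0 * std::log10((v * v) / (127.0 * 127.0));
}

// Search all 128 velocities for the one whose attenuation
// (amount * curve value) is closest to target_cb.
int closest_velocity(double target_cb, double amount = 960.0)
{
    int best = 0;
    double best_err = std::fabs(amount * concave_negative(0) - target_cb);
    for (int v = 1; v <= 127; v++) {
        const double err = std::fabs(amount * concave_negative(v) - target_cb);
        if (err < best_err) {
            best = v;
            best_err = err;
        }
    }
    return best;
}
```

For example, a note at velocity 64 with CC11 also at 64 is matched best by velocity 32 alone.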

I would propose dropping that particular part about CC11 from the SF spec and instead using CC11 as a second source of the velocity-to-attenuation modulator, together with a new flag that modifies the velocity value before it is translated through its curve. This way you could simply add up MIDI values.

The calculation would be: v1 = v1 + v2, with v2 = (v2 - 63) * 2

So a CC11 value of 63 would not change the current velocity, and 127 would raise a velocity of 0 to 127 (0 + 128, clamped to the MIDI range).
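The proposed calculation is small enough to sketch directly. This is only the proposal from this post, not anything Fluidsynth implements; effective_velocity is a hypothetical name, and the clamp to the MIDI range 0 to 127 is an assumption needed so that CC11 = 127 on velocity 0 lands on 127.

```cpp
#include <algorithm>
#include <cassert>

// Proposed scheme: CC11 shifts the velocity before the velocity curve is
// applied; a CC11 value of 63 is neutral.
int effective_velocity(int velocity, int cc11)
{
    assert(velocity >= 0 && velocity <= 127);
    assert(cc11 >= 0 && cc11 <= 127);
    const int v2 = (cc11 - 63) * 2;             // -126 .. +128
    return std::clamp(velocity + v2, 0, 127);   // assumed clamp to MIDI range
}
```

Note the slight asymmetry: CC11 = 0 subtracts 126, so a velocity of 127 bottoms out at 1 rather than 0.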

What do you think of this idea?

EDIT: Other variants I could think of are dropping volume changes via velocity entirely and only using CC11, or using linear shapes, which would make the math much simpler. I think all three variants should work fine with external synths as well.


Do you have access to the SoundFont 2.04 spec?

If not, I can send you the PDFs.

My immediate reaction is that you have misunderstood the way CC#11 works.

It is not a bi-polar controller in the same way that say Pitch Wheel is.

And the concave negative curve is how I would expect it to behave in terms of attenuation, i.e. the lower the CC#11 value, the more attenuation is applied to the signal and, conversely, the higher the CC#11 value, the less attenuation is applied.

Unless I am missing something here?

Incidentally, is this code from the latest Fluid source, or is it from MuseScore's somewhat outdated version? I do feel that at some point it would be advisable to bring our version more into line with the mainstream version, but whether that should be part of your remit here is questionable.

In reply to by ChurchOrganist

I have the PDF.

Yes, I know it works as described in the SF definition. I just thought we could make it a bipolar controller to get proper (de)crescendos via CC11 while also keeping the velocity values.

This code is from MuseScore's version. I also think it is a good idea to bring it up to date with the mainstream version. Have you read this post?


EDIT: this Google Doc I created covers the differences between upstream and MuseScore.

In reply to by hpfmn

Great post. I believe I understood the problem but I'm not sure. Let me rephrase it and tell me if I'm wrong.

MuseScore currently only uses velocity and volume to affect the actual volume of the output. The volume is set in the mixer, which is the right place for CC7, so let's not talk about it. The velocity is a property of each note, and it is currently changed by dynamics and cresc/decresc lines. Expression would come on top of that, and if I understand correctly the resulting volume would be volume * (expression/127) * (velocity/127) (that is what you call non-bipolar; a bipolar version would use ((expression-64)/127)).

So the problem is this: if we keep using velocity, and the velocity is v1 at the start of a crescendo, and we use expression to increase the volume during the crescendo from e1 to e2, we need to find out what velocity v2 the next note should get at the end of the crescendo. Of course we need to reset the expression to e1 before playing the next note (otherwise the next crescendo would start at e2 and we would reach 127 very fast...). Finding the right velocity for the next note means finding the intersection. That's the headache you are talking about, because of the curve shapes, and it's why you propose an alternative with linear lines.

Please tell me if I have it right so far...

Now, another alternative you mention is to drop the velocity and use the expression controller all the way. So all the notes would have a 64 velocity, and we would start the track with 64 expression. Crescendos/diminuendos and dynamic markings would then just change the expression value, continuously or not. Accents or other articulations could still use a velocity increase to trigger different articulations. I kind of like this approach, but I wonder whether we should generalize it to all instruments or have a setting in instruments.xml to switch between the two modes of operation, "velocity based" and "expression based". I feel like a piano would use the velocity-based approach (and so wouldn't support a crescendo on a single note, which is fine, since a piano can't do that in real life), while a flute would use the expression-based approach.

What do you think? @churchorganist, does that make sense to you?

In reply to by Nicolas

I think you got it pretty good!

What I also thought: instead of dropping velocity generation altogether (which might not be so nice for some external synthesizers), we could just drop the volume impact of velocity in Fluidsynth. That would still allow proper velocity switching (which is actually pretty standard for synths today!) and keep intact the other aspects that velocity changes.

Maybe there could be an option to enable or disable these features per channel!

I'm also very interested in what ChurchOrganist thinks!

In reply to by Nicolas

"Now, another alternative you mention is to drop the velocity and use the expression controller all the way. So all the notes would have a 64 velocity, and we would start the track with 64 expression."

Now this would be the way I would handle it as a MIDI programmer. The main dynamic control on that channel would be CC#11 and velocity would remain the same all the way through.

I believe this is the simplest way of handling things, but only for appropriate instruments. As you say, such a system would be totally inappropriate for a piano or a guitar.

Maybe the way forward would be to add a tag in Instruments.xml indicating which form of dynamics to use? This could be part of the definition, along the lines of
<dynamic>expression</dynamic> and <dynamic>velocity</dynamic>. It would also be possible to set defaults for velocity and expression this way, in the same way that they are used in the Articulation definitions.

The problem with changing the curve, as I see it, would be that it may affect use with external synths. Whilst we are in a position to modify FluidSynth if we wish, we do not have that luxury with hardware synths and the plethora of VST instruments out there, most of which can route any controller you wish to the instrument's gain.

In reply to by ChurchOrganist

But there is actually no curve standard as I know?

It might be that many synths use a curved CC11, but not all of them. I played a little with Native Instruments Session Strings. By default it uses velocity and CC11 for volume (and, of course, velocity switching for changing samples), but the two add up linearly. You can change the overall velocity curve, so it is something like volume = curve[velocity + CC11],
and the volume curve is flat by default, but there is a switch to change it from concave through linear to convex. Have a look: [image]

In reply to by hpfmn

Well, TBH, I'm more used to the expression controller working in a linear fashion; you don't usually get an option to change it on hardware synths, although there have been occasions when it would have been handy.

There is, however, a big difference between a convex/concave curve and the bipolar curve you are suggesting.

In the end it will come down to what works, and there is no harm in experimenting with these things.

My only concern is that the CC#11 controller will behave the same no matter what synth it is being used for.

IMO you should forget about velocity - if necessary we can do this in the soundfont by telling that instrument to play a fixed velocity whatever is being input. In this case all dynamics would be controlled by CC#11 whatever velocity was being sent to the preset.

I must confess I have never tried this when designing a soundfont, so some experimentation may be necessary to get it right, but I have programmed hardware synths in the past to work on a fixed velocity most notably for organ and other pad sounds, and simply using CC#11 for dynamics works fine.

In reply to by hpfmn

But there is actually no curve standard as I know?

The SF 2.04 standard says:

The MIDI Continuous Controller 11 data value is used as a Negative Unipolar source; thus the input value of 0 is mapped to a value of 127/128, an input value of 127 is mapped to 0 and all other values are mapped between 127/128 and 0 in a concave fashion. There is no secondary source for this modulator; thus its effect is the same as the effect of multiplying the amount by 1. The amount of this modulator is 960 cB (or 96 dB) of attenuation.

Is that the curve you are talking about?