Suite of Test Scores

• Nov 24, 2020 - 03:36

I've got an idea:

For testing ahead of each new release, could the development team select a dozen scores (with different typesetting features)? Both they and internal testers would put the desktop and mobile apps through their paces using these before every release to catch regressions and other issues (helping maintain standards and functionality), some of which have previously prompted bug-fix releases.

Comments

chen lung

• Nov 24, 2020 - 03:36

I've been using my sole composition (produced especially with various 'bling') to try import/export and layout, for example.

Marc Sabatella

• Nov 24, 2020 - 05:55

We have a test suite involving hundreds of scores already, carefully designed to exploit as many different layout features as possible, with automatic comparison of differences in results on each and every change to the code.
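
As a rough illustration of that kind of automatic comparison (a sketch only, not the actual vtest scripts), the Python below hashes each rendered image in a reference directory and reports the ones whose counterpart in a fresh output directory is missing or different. The directory layout, the `.png` extension and the function names are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_renders(reference_dir: Path, current_dir: Path) -> list[str]:
    """List reference images that are missing or differ in current_dir.

    Hypothetical layout: both directories hold one PNG per test score,
    with matching file names.
    """
    changed = []
    for ref in sorted(reference_dir.glob("*.png")):
        cur = current_dir / ref.name
        # A missing file or any byte-level difference counts as a change.
        if not cur.exists() or file_digest(ref) != file_digest(cur):
            changed.append(ref.name)
    return changed
```

A byte-for-byte check like this flags any change at all; a real setup may prefer a pixel-level image diff so that a reviewer can see where the layout moved.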

chen lung

• Nov 24, 2020 - 06:21

Are you referring to the V-Tests, Marc?

Jojo-Schmitz

• Nov 24, 2020 - 07:41

And mtests. But for layout, indeed mainly vtests.

chen lung

• Nov 24, 2020 - 15:52

Can you remind me what mtests do?

jeetee

• Nov 24, 2020 - 15:57

Test logic.

For example, there are some 60 test files about repeats/voltas/jumps where we let MuseScore process the input test score and compare the playback measure order with the expected result.
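
To give a feel for what such a logic test checks, here is a minimal Python sketch, assuming a toy score model with only start/end repeat barlines where each repeat is played twice. It is not MuseScore code: the `Measure` class and `playback_order` function are hypothetical, and voltas and jumps are left out.

```python
from dataclasses import dataclass


@dataclass
class Measure:
    number: int
    repeat_start: bool = False
    repeat_end: bool = False


def playback_order(measures: list[Measure]) -> list[int]:
    """Unroll simple repeats into the order the measures are played."""
    order = []
    start = 0      # index to jump back to at the next repeat end
    taken = set()  # indices of repeat ends that were already taken once
    i = 0
    while i < len(measures):
        m = measures[i]
        if m.repeat_start:
            start = i
        order.append(m.number)
        if m.repeat_end and i not in taken:
            taken.add(i)
            i = start  # jump back and play the section a second time
        else:
            i += 1
    return order


# Test score: measure 1, then measures 2-3 inside a repeat, then measure 4.
score = [
    Measure(1),
    Measure(2, repeat_start=True),
    Measure(3, repeat_end=True),
    Measure(4),
]
expected = [1, 2, 3, 2, 3, 4]
assert playback_order(score) == expected
```

The test itself is just the final comparison: the score is a fixed input, the expected order is written down by hand, and any change to the repeat logic that alters the result makes the assertion fail.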

ecstrema

• Nov 26, 2020 - 05:38

While there are a lot of vtests, there definitely could be more.

chen lung

• Nov 27, 2020 - 21:17

I'm wondering if we have tests for drumsets, tablature, etc.?

ecstrema

• Nov 27, 2020 - 21:23

https://github.com/musescore/MuseScore/tree/master/vtest

There are all the scores. Feel free to add some. Detailed instructions are in the readme.

Jojo-Schmitz

• Nov 27, 2020 - 21:25

And even more so https://github.com/musescore/MuseScore/tree/master/mtest

ecstrema

• Nov 27, 2020 - 21:27

But like you say, there is only one vtest for tablature and a single vtest for drumsets. A PR with new scores would probably get merged super fast.

chen lung

• Nov 27, 2020 - 21:19

Automatic tests are great, but shouldn't they be used in conjunction with human testing? It would involve a few of us playing around with functionality and import/export, etc.

Jojo-Schmitz

• Nov 27, 2020 - 21:21

That is why we have nightly builds, alphas, betas, RCs...

chen lung

• Nov 27, 2020 - 22:44

Thanks for the details and links.

As well as these m- and v-tests (small, independent samples), I still think we should nominate scores which would then be chosen by anyone in the team for 'official' specific use/checking at these stages (giving everyone a shared focus); they would embody certain characteristics. Some examples (I'm sure you could think of more):
1. A piano score with complex typesetting (such as the Rachmaninov one featured in this comparison)
2. Rock score (tablature, synthesisers and drumsets)
3. Full orchestral score (traditional instruments ranging from strings and brass to percussion)

Marc Sabatella

• Nov 27, 2020 - 23:27

The problem with trying to do automatic testing with real-world scores is they are too dependent on too many different variables - if things change, it's difficult to pin down what is going on. Controlled tests are far superior for automated testing.

But for non-automated testing, yes, real-world scores are great, and as mentioned, that's exactly why we have nightly builds etc.

chen lung

• Nov 28, 2020 - 00:17

Thanks for the insight regarding implementation of full scores for automatic testing, but the idea was really for human testing. They would complement one another.

Because our testers appreciate different music, having a particular set of recommended scores (let's say 6 for now) in the library would mean all of us can familiarise ourselves with their intended appearance and sound to better detect discrepancies as part of regular testing. Incidentally, it might inspire us to create (small) samples from those for the automation side if there's a deficit.

Marc Sabatella

• Nov 29, 2020 - 18:14

And this is what I and others are saying can already happen: just find 6 users willing to install a nightly build and test one of their own scores. Since they are the ones familiar with the scores, they will be in the best position to evaluate any changes they see or hear.

ecstrema

• Nov 27, 2020 - 21:22

That's what public betas are for. In an ideal world though, automatic tests should cover everything.