OMR in MuseScore

• created 1 year ago
Version: 3.0-dev
Priority: normal
Status: closed
Component: Code
Category: task
Project:

Hi All,

I've forked MuseScore and have been working on reviving OMR for a while. Here are some results showing the current state:

1. https://github.com/liang-chen/MuseScore/tree/omr_dev/omr/data/IMSLP0845…
2. https://github.com/liang-chen/MuseScore/tree/omr_dev/omr/data/IMSLP5318…
3. https://github.com/liang-chen/MuseScore/tree/omr_dev/omr/data/IMSLP8650…

and the documentation:
https://github.com/liang-chen/MuseScore/blob/omr_dev/omr/README.md

I use Poppler to import the PDF and a graphical model to identify the system configuration (for cases where we don't know in advance how many staves there are per system). There is still a lot of room for improvement, such as building a more sophisticated bar line detector, but the current work provides an initial framework for lightweight OMR that MuseScore users can use directly. I've sent a PR with my OMR updates and hopefully it will be merged soon. Feel free to join the discussion if you're interested.

Liang


Comments

Looks good! It seems to be failing a few tests at the moment, and it requires a rebase as Jojo said, but I look forward to trying it out once it's ready to be merged. It looks like only staves and barlines are recognised at the moment, but that appears to be enough information to work out where you are in the score. That could be useful for transcribing a PDF by playing it in the new note entry mode I'm working on. (Of course, it would be great to have the OMR engine do the transcription for you, but I imagine that is still a fair way away!)

Thanks! I've looked at your GSoC project, which is very impressive! I'd like to try it out once it's incorporated into MuseScore. My work is built on the previous "alignment-after-recognition" framework (you already gave the link there), so an empty score is created automatically along with the recognized systems. I'm now working on this OMR-to-skeleton conversion step and solving problems such as adding page breaks and adjusting margins for good alignment. These are easier to solve than the harder OMR problem itself. Improving OMR requires training a stronger detector using more data and annotations for the symbols of interest. In a more sophisticated version, we'll try to recognize all the symbols simultaneously, so that the symbols impose useful constraints on each other. This will probably improve the accuracy a great deal. If this work can benefit other projects or work together with them, that'll be great :)

I just ran your PR on my local machine. The tracking is pretty good and works during note entry and playback too, which is awesome!

I see from your comment that you are going down the training / machine learning route. I actually submitted three proposals for GSoC, and one of them was to build up a set of training scores for OMR from the catalogue of scores on MuseScore.com. Instead of getting users to manually click on symbols to identify them, you could take an existing MSCZ file, export it as a PDF, perform OMR, and then compare the result to the original MSCZ and use the known locations of the symbols in the MSCZ to train the OMR engine. Of course, all of the scores on MuseScore.com use the same fonts and styles, but you could programmatically pick a random font and style before exporting as a PDF, and then add some random noise and skew to the PDF to make it look like a scanned image. This way you can do training without manual intervention. Here's the proposal if you are interested. It wasn't picked for GSoC, probably because the MIDI proposal was considered a "safer option", but you are welcome to borrow my ideas.
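To make the "noise and skew" step concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the page is a binary grid (1 = ink, 0 = paper), skew is approximated by a horizontal shear rather than a true rotation, and the names `degrade` and `shear_point` are hypothetical, not part of MuseScore or the proposal. The point it shows is that the known symbol locations have to be pushed through the same transform as the pixels.

```python
import random

def shear_point(x, y, shear):
    """Map a ground-truth symbol position through the same shear as the pixels."""
    return x + round(shear * y), y

def degrade(image, shear=0.05, noise=0.02, seed=0):
    """Return a sheared, salt-and-pepper-noised copy of a binary image.

    image: list of rows, 1 = ink, 0 = paper (an assumed representation).
    shear: horizontal shift in pixels per row, a crude stand-in for scan skew.
    noise: probability that any output pixel is flipped.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        shift = round(shear * y)  # rows further down are shifted further right
        for x in range(width):
            src = x - shift
            pixel = image[y][src] if 0 <= src < width else 0
            if rng.random() < noise:
                pixel = 1 - pixel  # flip: speckle on paper or a hole in ink
            out[y][x] = pixel
    return out
```

A training pair would then be the degraded image together with each symbol's label and its `shear_point`-mapped position.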

It's a very interesting yet difficult (from my own perspective) topic. The difficulty comes from the multi-voice structure of most music scores. It's necessary to model both horizontal and vertical spacing or alignment from the notation "context", and it is unclear how to define that context. For example, if we have two beamed groups on two staves, one an octave higher than the other but with the same onsets, then they should probably share the same spacing between note heads and the same orientation of the main beams, and they might be aligned horizontally as well. So the most direct "context" becomes the mutually aligned beams (and their adjacent notes etc.). It is not clear how we can combine all sorts of information at this level into a single joint statistical model.

There are notational conventions that dictate the positioning of symbols in many places, but these basic rules are also occasionally violated for some reason. The exceptions are so rare that they can hardly be captured by a statistical model that doesn't account for why they happen.

I think it's hard, but I really look forward to seeing a justification for applying such a model to prediction or to improving OMR.

Hi Liang,

While you're working on this, what do you think of calling it (in the UI) something like "PDF Copying Assistant"?

By the way, one area for improvement might be recognizing that if there are noteheads touching a barline, it's probably actually a stem.

This is a good name. The problem with note head detection is that the symbol is too small, so if we detect note heads independently, it's easy to generate false positives. Once a false positive overlaps a bar line, the bar line recognition fails at that spot. We also need data from different fonts to train a good detector. I tried a note detector at some point but it made the whole process very slow, maybe because the detection was performed multiple times on the same staff line. So basically we need to train a better detector and improve the detection speed.

I'll leave it to you if you want to rename the UI. In terms of barlines vs. notes, I was thinking along the lines (pun not intended) of "this is a straight vertical line with only the lines of the staff going across it—barline; this is a straight vertical line with other things connecting to or extremely near the sides—not barline." There are a whole lot of false positive barlines coming out at present.
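The rule of thumb above can be sketched roughly as follows. This is an illustration only, not the code in the PR: it assumes a binary page grid (1 = ink), a one-pixel-wide vertical stroke, one-pixel-thick staff lines at known rows, and the function name `looks_like_barline` is made up for this example.

```python
def looks_like_barline(image, x, top, bottom, staff_rows, margin=2):
    """Return True if the vertical stroke at column x is only crossed by staff lines.

    image: list of rows, 1 = ink, 0 = paper (assumed representation).
    top, bottom: first and last row of the stroke, inclusive.
    staff_rows: rows occupied by staff lines, which may legitimately cross a barline.
    margin: how many pixels to each side count as "extremely near".
    """
    width = len(image[0])
    staff = set(staff_rows)
    for y in range(top, bottom + 1):
        if y in staff:
            continue  # a staff line crossing the stroke is expected
        for dx in range(1, margin + 1):
            for nx in (x - dx, x + dx):
                if 0 <= nx < width and image[y][nx]:
                    return False  # notehead, beam or flag attached: likely a stem
    return True
```

On real scans the stroke and staff lines are several pixels thick and slightly tilted, so the row and column tests would need tolerances that this sketch leaves out.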

I support renaming to "PDF Transcription Assistant" or similar until note detection is in place.

I believe that Audiveris detects staff lines, bar lines and then notes, in that order. Once the lines have been detected they are effectively considered "grey" during note detection, i.e. their pixels are not treated as black or white and so neither inform nor inhibit note detection. Of course, ideally note detection and barline detection would complement, rather than ignore, each other, but perhaps that is too ambitious in the short term.
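The "grey" idea, as described in the comment above (not checked against what Audiveris actually does), could look something like this small sketch. The tri-state grid, the `GREY` marker and both function names are assumptions made for illustration.

```python
GREY = None  # "don't care" marker for pixels that belong to an already-detected line

def mask_lines(image, staff_rows=(), barline_cols=()):
    """Return a copy of a binary image with detected line pixels turned grey."""
    masked = [row[:] for row in image]
    for y in staff_rows:
        masked[y] = [GREY] * len(masked[y])
    for x in barline_cols:
        for row in masked:
            row[x] = GREY
    return masked

def match_score(image, template, top, left):
    """Fraction of non-grey pixels under the template that agree with it."""
    agree = counted = 0
    for dy, template_row in enumerate(template):
        for dx, expected in enumerate(template_row):
            pixel = image[top + dy][left + dx]
            if pixel is GREY:
                continue  # grey pixels neither support nor contradict the match
            counted += 1
            agree += (pixel == expected)
    return agree / counted if counted else 0.0
```

With this scoring, a staff line running under a candidate position no longer contributes a partial match by itself, and a line running through a real notehead no longer counts against it.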

I'd be very happy with just the systems, barlines, and time signatures, and all the measures otherwise empty.

The current system allows barlines and note heads to be recognized simultaneously, but the note head detection is both inaccurate and slow, so we need to improve both speed and accuracy. The first requires eliminating duplicate calculations and probably restricting detection to the only possible locations (lines and gaps, based on the staff positions); the second requires training the model and considering more information such as stems, accidentals etc. The training cannot easily be done until we have enough data and a working model.
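As a rough sketch of restricting detection to lines and gaps: given the five staff-line rows, the only rows worth testing for a notehead centre are the lines, the midpoints between them, and a few ledger positions above and below. The names `candidate_rows` and `scan_staff`, and the `score_at(x, y)` detector callback, are hypothetical and not taken from the OMR code.

```python
def candidate_rows(staff_lines, ledger=2):
    """Rows where a notehead centre can sit: lines, gaps and a few ledger steps.

    staff_lines: the five staff-line rows, top to bottom, assumed evenly spaced.
    ledger: extra half-space steps to allow above and below the staff.
    """
    first, last = staff_lines[0], staff_lines[-1]
    half_space = (last - first) / 8  # four spaces per staff, two steps per space
    steps = 9 + 2 * ledger           # 5 lines + 4 gaps, plus ledger steps each side
    return [first + (i - ledger) * half_space for i in range(steps)]

def scan_staff(score_at, staff_lines, x_positions, threshold=0.5, ledger=2):
    """Run a detector only at plausible (x, row) positions and keep the hits."""
    rows = candidate_rows(staff_lines, ledger)
    return [(x, y) for x in x_positions for y in rows if score_at(x, y) >= threshold]
```

If the same staff ends up being scanned more than once, wrapping `score_at` in `functools.lru_cache` would be one simple way to avoid repeating the expensive part.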

Clefs (and maybe key signatures as well, because their positions are so well constrained) are among the easiest symbols to detect, since their appearance is so different from the others. Some results here: https://github.com/liang-chen/Vintager/blob/master/demo/treble_clef_svm…

If we have a good enough note head detector, we can easily add it to the program and use it to impose negative constraints on bar lines. I've put in a placeholder (https://github.com/liang-chen/MuseScore/blob/omr/omr/omrpage.cpp#L292), which is currently commented out for speed reasons.
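One simple way to picture such a negative constraint is shown below. It is not the placeholder code in omrpage.cpp, just a hypothetical sketch in which barline candidates are x-positions and each detected note head is reduced to its horizontal extent.

```python
def suppress_barlines(barline_xs, noteheads, tolerance=1):
    """Drop barline candidates that coincide with a detected note head.

    barline_xs: candidate barline columns within one staff.
    noteheads: list of (left, right) column extents of detected note heads.
    tolerance: extra pixels on each side still counted as touching.
    """
    kept = []
    for x in barline_xs:
        touches = any(left - tolerance <= x <= right + tolerance
                      for left, right in noteheads)
        if not touches:
            kept.append(x)  # nothing attached here, so it can stay a barline
    return kept
```

For example, `suppress_barlines([10, 50, 90], [(48, 56)])` keeps the candidates at 10 and 90 and drops the one at 50. A joint model would weigh the two detections against each other instead of letting the note head always win, which is a simplification made here.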

A bird in the hand is worth two in the bush—recognizing only barlines/not barlines would be terrific.

Barlines should be the focus in the short term, but an ability to recognise other symbols is clearly useful for working out when to expect a barline.

In the longer term, "fully automatic" OMR would be nice, but I would be more than happy to specify some things manually in return for greater accuracy overall. MuseScore could do a "quick and dirty" OMR run and then ask the user to confirm some things before doing the full run:

  1. What is the time signature? (4/4 was detected. If this is wrong then enter the real one)
  2. What is the key signature? (D Major detected)
  3. How many measures are there? (100 detected. Skip this if your score doesn't show bar numbers.)
  4. etc...

This information can be used as constraints on a second OMR pass.
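As one possible reading of this, a confirmed measure count could be used to prune barline candidates on the second pass. The sketch below is an assumption-heavy illustration: it handles a single system, assumes every measure is closed by exactly one barline with none at the start of the system, and the name `constrain_barlines` is invented here.

```python
def constrain_barlines(candidates, measures):
    """Keep only as many barlines as the user-confirmed measure count allows.

    candidates: list of (x, score) pairs from the quick first pass.
    measures: number of measures the user confirmed for this system.
    Returns the x-positions of the strongest candidates, left to right.
    """
    strongest = sorted(candidates, key=lambda c: c[1], reverse=True)[:measures]
    return sorted(x for x, _ in strongest)
```

With candidates `[(10, 0.9), (25, 0.3), (40, 0.8), (70, 0.95)]` and three confirmed measures, the weak candidate at 25 is discarded. The time and key signature answers could narrow the search in a similar way, for instance by limiting which accidentals are expected.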

Component Code Miscellaneous
Status closed needs info

I tried to open a PDF in MuseScore 3.0.0, but there is no option to select a PDF file, only the usual "all supported files".

Component Miscellaneous Code
Status needs info closed

... which you will find includes PDF. If you need help, or you think something should be different, please start a new thread on the Technology Preview forum instead of commenting on a closed feature request.