GSoC 2018: Machine Learning Dataset for OMR - Week 8

Posted 5 years ago

Hi :D
Week 8 is done, and it was spent closely examining the results produced from the generated XMLs. We plan to split the 988 scores into chunks of 150 to 200 scores and closely examine each segment, so that by the time we reach the last segment everything is sorted out. I have shared a list of shapes that are missing in OMR on the issue tracker, and I will keep refining our code accordingly. Earlier, I shared the entire set of results with Herve, which consisted of 13,500 files, but as discussed with lasconic we need to limit the number of files to be examined in a single pass. Coming to this week's analysis:

Below is the segmented status of the project:

Current status of the project
We are done with:
1. Porting the OMR work from imeta to master.
2. Grace Notes Implementation for OMR tackling the issue: https://github.com/Audiveris/omr-dataset-tools/issues/27
3. Bracket Implementation for OMR tackling the issue: https://github.com/Audiveris/omr-dataset-tools/issues/26
4. Tuplet Implementation for OMR tackling the issue: https://github.com/Audiveris/omr-dataset-tools/issues/22
5. Time Signature Upper and Lower Halves annotation for OMR tackling the issue: https://github.com/Audiveris/omr-dataset-tools/issues/28
6. Rest Dot Implementation for OMR tackling the issue: https://github.com/Audiveris/omr-dataset-tools/issues/23
7. Simple Image URL for OMR tackling the issue: https://github.com/Audiveris/omr-dataset-tools/issues/30
8. Staccato Dot for OMR tackling the issue: https://github.com/Audiveris/omr-dataset-tools/issues/25
9. SMuFL symbols identifier for OMR tackling the issue: https://github.com/Audiveris/omr-dataset-tools/issues/29
10. Repeat dot implementation for OMR tackling the issue: https://github.com/Audiveris/omr-dataset-tools/issues/24
11. Crash fix during XML generation, an issue encountered while testing on different kinds of scores.
12. Testing of scores on MuseScore so that they generate XML. The application has been tested on a dataset of 988 scores and it works perfectly.
13. Tuplet Implementation made better.
14. Grace Note Implementation corrected. It now has the nested approach as discussed in the issue: https://github.com/Audiveris/omr-dataset-tools/issues/27. Some samples can be seen in the comments.
Clef Implementation made better. No Sym issue corrected. Made changes for using SMuFL names in the code.
Ported all these changes to nasehim7/imeta, which is rebased on nasehim7/2.3, which at this moment has the latest changes from MuseScore/2.3:
https://github.com/nasehim7/MuseScore/compare/2.3...nasehim7:imeta
15. Image format changed to support grayscale images, tackling the issue: https://github.com/Audiveris/omr-dataset-tools/issues/31. Tested and resolved the issues I came across, gave more structure to the code I wrote earlier, and reworked some of my previous commits before adding them to imeta.

Added: Started segmented testing of our test data set in chunks of 150 to 200 scores, depending on the complexity of the scores. The first 165 scores are done.
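
To make the segmented approach concrete, here is a minimal sketch in Python of how a list of 988 scores could be cut into segments. It is only an illustration: the function name, the file names, and the fixed segment size of 165 (borrowed from the size of the first segment) are assumptions, and in practice the segment size is chosen by hand depending on the complexity of the scores.

```python
from typing import List, Sequence


def split_into_segments(scores: Sequence[str], segment_size: int) -> List[List[str]]:
    """Cut the score list into consecutive segments of at most segment_size items."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [
        list(scores[start:start + segment_size])
        for start in range(0, len(scores), segment_size)
    ]


# Hypothetical file names standing in for the 988 test scores.
scores = [f"score_{n:04d}.mscz" for n in range(1, 989)]

# Assumption: a fixed size of 165; the real segments vary between 150 and 200.
segments = split_into_segments(scores, 165)

for index, segment in enumerate(segments, start=1):
    print(f"Segment {index}: {len(segment)} scores ({segment[0]} to {segment[-1]})")
```

With a fixed size of 165, the 988 scores fall into six segments, the last one being slightly smaller than the others.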

Key accomplishments this week
Ported changes to imeta. Created a PR from nasehim7/imeta to the current 2.3. Finished closely testing the first 165 scores and shared the results with Herve on the issue tracker.

Key tasks that stalled
None

Tasks in the upcoming week:
Test the next segments of our data, notify Herve and lasconic accordingly, and take their reviews. Work on handpicking some of the scores for Herve to examine. More testing and refining of the code is also needed. I will discuss with lasconic whether to proceed with this chunked approach for passing scores to Herve, so that we have subsets to examine rather than one large dataset in a single go.

It is a terrific project, and I am working on it with all my might to make things better each day. I hope we can keep going with this project even after GSoC.

Have a Good Day,
Animesh
Github: https://github.com/nasehim7