Multibyte chars are not deleted correctly in text

• Feb 21, 2017 - 08:23
Reported version
2.1
Type
Functional
Severity
S4 - Minor
Status
closed
Project

There are different problems but here is at least one.

  • Create a score
  • Create a staff text
  • Press F2, go to Unicode tab, select musical symbol
  • Add 6 times a double repeat measure text
  • Close the dialog
  • Delete once, a ? appears somewhere in the middle and the cursor goes to the start of the text
  • Delete a second time, the cursor move to the end

Comments

I made a mtest...this is what the 6 double-measure look like in the debugger before performing the delete:

			_text	"\154064\156417\154064\156417\154064\156417\154064\156417\154064\156417\154064\156417"	QString
				[0]	    	55348	0xd834	QChar
				[1]	    	56591	0xdd0f	QChar
				[2]	    	55348	0xd834	QChar
				[3]	    	56591	0xdd0f	QChar
				[4]	    	55348	0xd834	QChar
				[5]	    	56591	0xdd0f	QChar
				[6]	    	55348	0xd834	QChar
				[7]	    	56591	0xdd0f	QChar
				[8]	    	55348	0xd834	QChar
				[9]	    	56591	0xdd0f	QChar
				[10]	    	55348	0xd834	QChar
				[11]	    	56591	0xdd0f	QChar

hmm...I haven't figured out the right commands for test file...here is my function:

void TestText::testMultiByteChar()
      {
      Text* text = new Text(score);
      text->initSubStyle(SubStyle::DYNAMICS);

      text->setPlainText("

looks like my last comment didn't display properly because musescore.org apparently can't handle multi-byte chars either! :)

Anyway here was my test function that isn't working because deletePreviousChar() exits because cursor->line() is 0 (which it don't think it should be...)

void TestText::testMultiByteChar()
      {
      Text* text = new Text(score);
      text->initSubStyle(SubStyle::DYNAMICS);

      text->setPlainText("six double-repeat symbols");

      text->startEdit(0, QPoint());
      text->layout();
      text->moveCursorToEnd();

      text->deletePreviousChar();

      QCOMPARE(text->xmlText(), QString("six double-repeat symbols"));
      }

I'm also discovered the Ctrl+Z does not work to undo the deletion of the double repeat sign.

Ok, I figured out what what wrong with my mtest...here is the correct (currently failing) test:

void TestText::testMultiByteChar()
      {
      Text* text = new Text(score);
      text->initSubStyle(SubStyle::DYNAMICS);

      text->setPlainText(QString("six double-repeat symbols"));

      text->layout();
      text->startEdit(0, QPoint());
      text->moveCursorToEnd();
      text->deletePreviousChar();
      text->endEdit();

      QCOMPARE(text->xmlText(), QString("six double-repeat symbols"));
      }

This is what the text looks like after the end edit:

			_text	"\154064\156417\154064\156417\154064\154064\156417\154064\156417"	QString
				[0]	    	55348	0xd834	QChar
				[1]	    	56591	0xdd0f	QChar
				[2]	    	55348	0xd834	QChar
				[3]	    	56591	0xdd0f	QChar
				[4]	    	55348	0xd834	QChar
				[5]	    	55348	0xd834	QChar
				[6]	    	56591	0xdd0f	QChar
				[7]	    	55348	0xd834	QChar
				[8]	    	56591	0xdd0f	QChar

Notice that there are only 9 QChars, even though started off with 12 QChars to start with (where each 2 QChars of 0xd834 0xdd0f represent one double repeat sign)...

I'm going to look into what happens to the cursor...

so it should be noted that that the QChars 4 and 5 are both 0xd834 next to eachother...when the correct version should have an addition QChar of 0xdd0f in between those two.

I have a suspicion with what is going wrong with the original code:

                        if (c.isSurrogate())
                              i->text.remove(rcol, 2);
                        if (i->format.type() == CharFormatType::SYMBOL) {
                              i->ids.removeAt(idx);
                              if (i->ids.empty())
                                    _text.erase(i);
                              }
                        else {
                              i->text.remove(rcol, 1);
                              if (i->text.isEmpty())
                                    _text.erase(i);
                              }
                        simplify();
                        return;

Notice that if c is a surrogate, then there will be two i->text.remove operations...first inside teh first if, and then in the else later down below. I think there should be only know text.remove. I just need to figure the best way these if's and else should be...

I think this is how that TextBlock::remove code should look:

                        if (i->format.type() == CharFormatType::SYMBOL) {
                              i->ids.removeAt(idx);
                              if (i->ids.empty())
                                    _text.erase(i);
                              }
                        else {                              
                              if (c.isSurrogate())
                                    i->text.remove(idx, 2);
                              else
                                    i->text.remove(idx, 1);
                              if (i->text.isEmpty())
                                    _text.erase(i);
                              }
                        simplify();
                        return;

Need to test this out now... But according to what I see in the debugger, the text.remove function is removing QChars, not Unicode characters. (The difference between the two, I believe, is that a Unicode Char could be composed of one or two QChars...)

I see there are no mtests for the following if block:

if (i->format.type() == CharFormatType::SYMBOL) 

So I'm going to add a mtest...make sure I didn't screw up that...

FYI, I've made 3 tests named:

void testBMPDelete();
void testSMPDelete();
void testBMPSMPDelete();

where BMP means Basic Multilingual Plane https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane which are encoded with only single UTF16 characters.
SMP means Supplementary_Multilingual_Plane https://en.wikipedia.org/wiki/Plane_(Unicode)#Supplementary_Multilingua… which are encoded with surrogate pairs of UTF16 characters.
The testBMPSMPDelete() has both types together and tests delete of both.

Those pass. Now I just need to test delete with CharFormatType::SYMBOL.

I was looking for a test for symbol...I was adding a little to TestText::testSpecialSymbols() where have "&&", move cursor to end, delete previous char...but turns out that is not what is meant by CharFormatType::SYMBOL...so I still need to figure out how to make CharFormatType::SYMBOL test...

Status (old) patch (code needs review) fixed

Fixed in branch master, commit 1b67e23194

fix #176151 MultiByte char TextBlock::remove()

Previously TextBlock::remove() did not delete Supplementary Multilingual Plane Unicode chars correctly.

Added tests using deletePreviousChar() for SMP Unicode as well as regular BMP Unicode as well as for text that has mixed BMP, SMP, and SMUFL symbols.

Fixed in branch 2.1, commit 4a73a72e5c

fix #176151 MultiByte char TextBlock::remove()

Previously TextBlock::remove() did not delete Supplementary Multilingual Plane Unicode chars correctly.

Added tests using deletePreviousChar() for SMP Unicode as well as regular BMP Unicode as well as for text that has mixed BMP, SMP, and SMUFL symbols.