Multibyte chars are not inserted correctly in text

• Feb 22, 2017 - 16:03
Reported version
2.1
Type
Functional
Severity
S4 - Minor
Status
closed
Project
Tags

Supplemental issue for #176151: Multibyte chars are not deleted correctly in text but this is for insertion instead of deletion.

Minimum reproduction steps:

  1. new score
  2. add system text to meas 1
  3. press F2, select Unicode tab, select Musical Symbol
  4. insert any musical symbol
  5. move cursor to start of textbox
  6. insert any musical symbol

Result is that there is are 3 symbols displayed in the text box:

  1. a ? symbol
  2. one of the musical symbols
  3. a ? symbol

Should just display the two inserted musical symbols.

The cause is most like due to insertion code not properly understanding Supplemental Multilingual Plane characters in Unicode.


Comments

It took me a while to produce a failing mtest to work from...finally got it:

void TestText::testSMPCopyPaste()
      {
      Text* text = new Text(score);
      text->initSubStyle(SubStyle::DYNAMICS);
      text->setPlainText(QString("symbol"));
      text->layout();
      text->startEdit(0, QPoint());
      text->moveCursorToStart();
      text->paste();
      text->endEdit();
      QCOMPARE(text->xmlText(), QString("symbolsymbol"));
      text->moveCursorToEnd();
      text->paste();
      text->endEdit();
      QCOMPARE(text->xmlText(), QString("symbolsymbolsymbol"));
}

Actual : "\uD834\uD834\uDD0F\uDD0F"
Expected : "\uD834\uDD0F\uD834\uDD0F"

I've discovered the problem...in function:

void Text::insert(TextCursor* cursor, QChar c)
      {
      if (cursor->hasSelection())
            deleteSelectedText();
      if (cursor->line() >= _layout.size())
            _layout.append(TextBlock());
      if (c == QChar::Tabulation)
            c = QChar::Space;
      else if (c == QChar::LineFeed) {
            _layout[cursor->line()].setEol(true);
            cursor->setLine(cursor->line() + 1);
            cursor->setColumn(0);
            if (_layout.size() <= cursor->line())
                  _layout.append(TextBlock());
            }
      else {
            _layout[cursor->line()].insert(cursor, QString(c));
            cursor->setColumn(cursor->column() + 1);
            }
      cursor->clearSelection();
      }

The else part is where the insertion occurs. What happens is Text::insert() gets called twice in a row when inserting one SMP char. But this else block increases the cursor column by 1 for each insertion. That means when the 2nd half of the SMP char is inserted, it is inserted in the wrong spot.

Status (old) patch (code needs review) fixed

Fixed in branch master, commit 2d6ce9c635

fix #176601 Supplemental Unicode select, insert, paste

Any operations dealing with Supplemental Unicode must operate on high & low surrogates simulatenously, since they belong together as a pair:
-Text::insert method has a new version of which takes two QChars as input (the high & low surrogates), and inserts them together before incrementing the cursor.
-TextBlock::fragment method now has return parameter ridx (relative QChar index) in addition to rcol (relative cursor position). For Basic Unicode TextFragments, ridx will be same as rcol, since each Basic Unicode char is one QChar. But for text that has Supplemental Unicode, ridx will be be different since each Supplemental Unicode char is two QChars.
-TextBlock::text method now includes any high surrogate QChars in the returned string. Previously, this method simply skipped over QChars that were high surrogates. When iterating over the TextFragment, only increments col when comming across a QChar that is not a high surrogate.
-Text::createLayout, Text::paste and Text::drop methods now uses the new Text::insert method to insert both high & low surrogates together. Previously, the cursor position would get messed up when each QChar was added seperately.
-TextBlock::remove(int start, int n) will now add both QChars of the surrogate pair to the return string.

Also was issue with adding text after a SMUFL symbol if the cursor mode was already in SMUFL. Fixed by making sure to append a new TextFragment regardless of state of cursor, and also making TextFragment::TextFragment(TextCursor* cursor, const QString& s) constructor set the type to CharFormatType::TEXT.

Fixed in branch 2.1, commit 13a126e5ee

fix #176601 Supplemental Unicode select, insert, paste

Any operations dealing with Supplemental Unicode must operate on high & low surrogates simulatenously, since they belong together as a pair:
-Text::insert method has a new version of which takes two QChars as input (the high & low surrogates), and inserts them together before incrementing the cursor.
-TextBlock::fragment method now has return parameter ridx (relative QChar index) in addition to rcol (relative cursor position). For Basic Unicode TextFragments, ridx will be same as rcol, since each Basic Unicode char is one QChar. But for text that has Supplemental Unicode, ridx will be be different since each Supplemental Unicode char is two QChars.
-TextBlock::text method now includes any high surrogate QChars in the returned string. Previously, this method simply skipped over QChars that were high surrogates. When iterating over the TextFragment, only increments col when comming across a QChar that is not a high surrogate.
-Text::createLayout, Text::paste and Text::drop methods now uses the new Text::insert method to insert both high & low surrogates together. Previously, the cursor position would get messed up when each QChar was added seperately.
-TextBlock::remove(int start, int n) will now add both QChars of the surrogate pair to the return string.

Also was issue with adding text after a SMUFL symbol if the cursor mode was already in SMUFL. Fixed by making sure to append a new TextFragment regardless of state of cursor, and also making TextFragment::TextFragment(TextCursor* cursor, const QString& s) constructor set the type to CharFormatType::TEXT.