Bug with accents in filename when exporting to compressed XML

• Jul 15, 2013 - 12:30
Type
Functional
Severity
S4 - Minor
Status
closed
Project

When exporting to Compressed XML (.MXL), if the filename contains accents, the resulting file cannot be imported by MuseScore or any other software.
Checking the mxl file with 7-zip shows that the accents in the filename are converted to invalid characters.
Software:
MuseScore version 1.3
Windows 7 sp1 with all mandatory updates applied

Attachment Size
Bug MXL caractères.mxl 1.62 KB
Bug MXL caractères.mscz 1.68 KB

Comments

The mscz and the mxl file both contain files whose names use the same 'invalid' characters; I guess it uses UTF-8 for encoding?
Indeed, MuseScore 1.3 can't import the mxl file ("unexpected end of file at line 1 column 1"), although it can open the mscz.
It can, however, import the extracted xml file, so it doesn't look like a filename problem.
A current nightly build has the same problem.

Quick investigation shows at least two issues:

1) The filename (as stored in the mxl file) is encoded in UTF-8 but the language encoding flag (EFS) is not set. This probably explains the invalid characters in zip readers. It is caused by a QZipWriter bug: the version used in MuseScore never sets this flag.

2) File META-INF/container.xml (a required component of the mxl file) states it is encoded in UTF-8 but actually encodes the e with grave accent (è) in the filename as 0xE8 instead of 0xC3 0xA8, which is invalid UTF-8. This is a MuseScore bug. I expect this results in the MuseScore MusicXML importer being unable to extract the MusicXML file, as it cannot correctly interpret the filename.
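
To illustrate issue 1), here is a minimal Python sketch (not MuseScore or Qt code) that checks the language encoding flag, which is bit 11 (0x800) of the zip general purpose bit flag. It relies on Python's zipfile module, which sets this bit when it writes a non-ASCII name. The helper names are made up for this example, and the cp437 fallback shown is only one way a reader may interpret names when the flag is absent.

```python
import io
import zipfile

EFS_FLAG = 0x800  # bit 11 of the general purpose bit flag: names are UTF-8


def entry_flags(name: str) -> int:
    """Write one entry called `name` to an in-memory zip and return its flag bits."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, "<score-partwise/>")
    with zipfile.ZipFile(buf) as zf:
        return zf.infolist()[0].flag_bits


def has_efs(name: str) -> bool:
    """True if the entry written for `name` carries the UTF-8 name flag."""
    return bool(entry_flags(name) & EFS_FLAG)


def misread_as_cp437(name: str) -> str:
    """Simulate a reader that ignores UTF-8 and decodes the name bytes as cp437."""
    return name.encode("utf-8").decode("cp437")


print(has_efs("Bug MXL caractères.xml"))   # flag is set by Python's writer
print(has_efs("score.xml"))                # plain ASCII name, no flag needed
print(misread_as_cp437("caractères"))      # the kind of garbling a zip tool may show
```

A writer that stores UTF-8 bytes without setting this bit leaves the reader guessing, which would be consistent with the garbled names seen in 7-zip.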

Do we really need to store the actual name of the file in container.xml and in the zip file? Couldn't we use an ASCII name like score.xml, or replace non-ASCII letters with an ASCII equivalent (transliteration) or even "?"?
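
A rough sketch of what such a transliteration could look like, in Python rather than in MuseScore's own C++. The function name and the placeholder parameter are hypothetical; the approach (Unicode decomposition, then dropping combining marks) is just one possible choice and only handles letters that decompose into an ASCII base letter.

```python
import unicodedata


def ascii_filename(name: str, placeholder: str = "?") -> str:
    """Return an ASCII-only version of `name`.

    Accented letters are reduced to their base letter; anything else
    outside ASCII is replaced by `placeholder`.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the accent itself, keep the base letter
        out.append(ch if ord(ch) < 128 else placeholder)
    return "".join(out)


print(ascii_filename("Bug MXL caractères.xml"))
print(ascii_filename("Д.xml"))
```

Note that "?" is not a legal character in Windows filenames, so a placeholder such as "_" may be safer if the entry is ever extracted to disk.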

1) I think QZipWriter and QZipReader can't cope with UTF-8 in filenames on all platforms.

2) Opening container.xml in a text editor like PSPad reports an ANSI encoding... MuseScore does set the codec to UTF-8 but doesn't specify a BOM... https://github.com/musescore/MuseScore/blob/master/libmscore/xml.cpp#L… I do think that MuseScore can read back the filename correctly, but then it can't match it to the extracted filename because of 1).

Further testing shows 1) is not an issue: Finale does not set the language encoding flag either. Furthermore, the MusicXML spec explicitly states that filenames in the mxl file must be encoded in UTF-8.

The mxl file as is (as exported by MuseScore 1.3 and the current trunk) cannot be imported in Finale, MuseScore or Sibelius. By fixing 2) only, the file imports OK in all three. See attached fixed file.

As other MusicXML producers export mxl files with UTF-8 encoded filenames, MuseScore should be (and currently is!) able to import these files correctly.

Attachment Size
Bug MXL caractères fixed.mxl 1.73 KB
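
For reference, a minimal Python sketch of an mxl round trip in which container.xml really is UTF-8 and the importer locates the score through the rootfile entry. This is not how MuseScore, Finale or Sibelius implement it; write_mxl and read_mxl are hypothetical helpers, and the container is reduced to the bare rootfile element.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def write_mxl(score_name: str, score_xml: str) -> bytes:
    """Build an in-memory mxl: META-INF/container.xml plus one score file."""
    container = ET.Element("container")
    rootfiles = ET.SubElement(container, "rootfiles")
    ET.SubElement(rootfiles, "rootfile", {"full-path": score_name})
    # Serialise as genuine UTF-8, matching the encoding in the XML declaration.
    container_bytes = ET.tostring(container, encoding="utf-8", xml_declaration=True)

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("META-INF/container.xml", container_bytes)
        zf.writestr(score_name, score_xml.encode("utf-8"))
    return buf.getvalue()


def read_mxl(data: bytes) -> str:
    """Look up the score named in container.xml and return its text."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("rootfiles/rootfile")
        return zf.read(rootfile.get("full-path")).decode("utf-8")


mxl = write_mxl("Bug MXL caractères.xml", "<score-partwise/>")
print(read_mxl(mxl))
```

The lookup only succeeds when the name in container.xml and the name of the zip entry decode to the same string, which is the step that breaks when one of them is not valid UTF-8.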

1/ Unfortunately, there is no reason why the filename would be encoded in UTF-8 in the zip file, since QZipWriter is using toLocal8Bit https://github.com/musescore/MuseScore/blob/master/libmscore/qzip.cpp#L…. I believe this is bad :( and will cause cross-OS compatibility issues... As Leon stated, the MusicXML spec requires UTF-8 encoded filenames http://www.musicxml.com/tutorial/compressed-mxl-files/compressed-file-f…

2/ If I create a file with a text editor (Sublime Text 2) containing an "è" and save it as UTF-8, it doesn't get encoded as 0xC3 0xA8 but as 0xE8... MuseScore might indeed have a bug when writing the container file. It seems that setting the codec before setting the device on a QTextStream doesn't work well.
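
The two byte sequences discussed in these points can be reproduced with a short Python sketch. It assumes the local 8-bit encoding on the reporter's Windows machine is cp1252, which is a guess for a Western European setup, and it only mimics the effect of a local 8-bit conversion; it does not call Qt.

```python
name = "caractères"

utf8_bytes = name.encode("utf-8")     # what the MusicXML spec asks for
local_bytes = name.encode("cp1252")   # assumption: Western European Windows locale


def is_valid_utf8(data: bytes) -> bool:
    """True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def encodable_locally(text: str, codec: str = "cp1252") -> bool:
    """True if every character of `text` exists in the given 8-bit codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(utf8_bytes.hex(" "))         # contains c3 a8 for the è
print(local_bytes.hex(" "))        # contains a single e8 for the è
print(is_valid_utf8(local_bytes))  # the lone e8 is not valid UTF-8
print(encodable_locally("Д"))      # Cyrillic is not representable in cp1252
```

Because the local encoding differs between machines, a name written this way is only readable where the reader happens to assume the same code page.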

Despite the bug, I believe MuseScore is reading container.xml correctly but QZipReader fails to locate an entry because the entries are not in UTF-8.

In addition, on Windows, it's hell to debug because it's likely that at runtime MuseScore is using the QZip* classes from Qt and not the ones from MuseScore...

To add to the confusion ( :-) ), yesterday I rebuilt the current trunk (revision reported by "git rev-parse --short HEAD" as 4495b25) on Linux and cannot find incorrect file names anymore. A Cyrillic capital letter DE in the name correctly gets encoded as 0xD0 0x94, both in the local file header in the mxl file and in container.xml. This file imports OK in MuseScore.

This is WITHOUT pull request 422.

My previous experiments were on Mac using an older version of the trunk (several weeks old, still using Qt 4.8).

Attachment Size
CYRILLIC CAPITAL LETTER DE: 'Д'.mxl 1022 bytes
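
For anyone wanting to repeat the check on the local file header, here is a small Python sketch that reads the raw name bytes of the first entry in a zip. The helper names are made up for this example, and the test archive is produced by Python's own zipfile module rather than by MuseScore, so it only demonstrates the inspection technique.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a zip local file header (little-endian):
# signature, version, flags, method, time, date, crc32,
# compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")


def first_entry_name_bytes(data: bytes) -> bytes:
    """Return the raw filename bytes stored in the first local file header."""
    fields = LOCAL_HEADER.unpack_from(data, 0)
    signature, name_len = fields[0], fields[9]
    if signature != b"PK\x03\x04":
        raise ValueError("data does not start with a local file header")
    start = LOCAL_HEADER.size
    return data[start:start + name_len]


def make_zip(name: str) -> bytes:
    """Create a one-entry zip in memory (stand-in for an exported mxl)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, "<score-partwise/>")
    return buf.getvalue()


raw = first_entry_name_bytes(make_zip("Д.xml"))
print(raw.hex(" "))  # starts with d0 94 when the name is stored as UTF-8
```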