Please include Lyric MetaEvent, preferably in UTF-8

• Jul 20, 2010 - 23:09

This will really make MuseScore stand out!

Right now, MuseScore does not include lyrics in the midi files it produces. Sibelius does, but only in ANSI. Even though you can use Chinese, for example, as lyrics in the score, in the midi files produced by Sibelius the lyrics show up as a bunch of '?'. (At least that's the case up to Sibelius 5.)

For people like me, it would be a great reason to choose MuseScore over others if it would create midi files with UTF-8 lyrics. And, I think, it should be easy to implement. Since the Lyric MetaEvent has a length field, using UTF-8 in the text area should not break anything, as players that do not understand it can just skip over it. Use of UNICODE in Lyric MetaEvents is actually officially approved in MIDI's "Recommended Practice RP-026". (See http://www.midi.org/techspecs/rp26.php) I think a real 'Internationalization' of MuseScore should include this feature.
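To illustrate the point about the length field, here is a minimal Python sketch. It assumes the usual meta event layout (the byte FF, type 05 for lyrics, a variable-length length field, then the text bytes). The function names are made up for illustration; this is not MuseScore code.

```python
from typing import Tuple


def encode_vlq(value: int) -> bytes:
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    if value < 0:
        raise ValueError("value must be non-negative")
    out = [value & 0x7F]              # last byte has the high bit clear
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)  # earlier bytes have the high bit set
        value >>= 7
    return bytes(reversed(out))


def decode_vlq(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode a variable-length quantity at data[pos]; return (value, next position)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def lyric_meta_event(text: str, delta_ticks: int = 0) -> bytes:
    """Build one track event: delta time, FF 05, byte length, UTF-8 text."""
    payload = text.encode("utf-8")
    return encode_vlq(delta_ticks) + b"\xFF\x05" + encode_vlq(len(payload)) + payload


def skip_meta_event(data: bytes, pos: int) -> int:
    """Return the offset just past the meta event whose FF byte is at data[pos]."""
    if data[pos] != 0xFF:
        raise ValueError("not a meta event")
    length, body_start = decode_vlq(data, pos + 2)  # pos + 1 is the type byte
    return body_start + length


event = lyric_meta_event("你好")
print(event.hex(" "))
```

The length counts bytes, not characters, so a player that ignores lyrics never needs to know the encoding: `skip_meta_event(event, 1)` lands on the next event whatever the text contains.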

Ying-Da


Comments

Here is the same suggestion as yours.
http://musescore.org/en/node/3370

But could you please give an example of software that can save midi files, and of a midi player that can read text from midi files, following this "Recommended Practice RP-026"? I think I have used a lot of them, but I never noticed this.

In reply to by yingdat

[If this shows up in the forum, the spam filter must have relented.]

Right now, no notation software I know of produces midi files with Unicode lyrics - that's why I said it would really make MuseScore stand out. The beauty of Unicode is that, in one stroke, you make the implemented feature instantly available in all the major languages and there is no longer any need to ask what language the user wants. In cases where the specs are somewhat ambiguous or not specific enough (as is the case here in some aspects, see the last paragraph below), the early implementation becomes the "Industry Standard" :-). So being the first to do something is not necessarily a bad position to be in.

Many midi players (e.g., vanBasco's, KaraFun, Karaoke 5, SynthFont) display lyrics along with the music if the midi files contain lyrics. Unfortunately most (all?) of them do not handle UTF-8 at this point. This is one of those chicken-and-egg situations: They don't handle UTF-8 because nobody produces midi files with UTF-8; and music score programs don't produce such midi files because no players handle them. Once MuseScore breaks that bad cycle though, given the huge popularity of karaoke in countries like Japan, China and Taiwan, Karaoke players with UTF-8 will show up soon. (This is not even a problem for English lyrics since UTF-8 encoding of English comes out exactly the same as plain old ASCII.)

The situation is different with UltraStar Deluxe (available from SourceForge), which is a Karaoke game similar to PlayStation's SingStar. USD does handle UTF-8 lyrics starting with version 1.1. USD plays mp3 rather than midi, which is no problem since you can ask MuseScore to produce wave, flac or ogg audio files and use programs like Format Factory to convert them into mp3. Or you can make mp3 files from midi files using programs like SynthFont, which gives you many additional controls such as changing instrument, soundfont, individual preset, tempo, pitch, individual channel volume, VST effect, etc.

Creating text files of synchronized lyrics for USD is a rather tedious and frustrating process now. For mp3 files derived from midi files with lyrics, it should be quite straightforward to write a program converting the lyrics meta events into synchronized lyrics. (I am in fact considering writing one myself. I already have a program decoding events in midi files into readable text files. Adapting it to do this task is the kind of simple programming I can still do.)
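A rough Python sketch of the conversion step, under simplifying assumptions: the lyric events have already been extracted as (absolute tick, text) pairs, the tempo is constant, and the output uses generic "[mm:ss.cc]" timestamps rather than the actual UltraStar Deluxe file format. All names here are hypothetical.

```python
from typing import List, Tuple

# MIDI files without a tempo event default to 500,000 microseconds per quarter note (120 BPM).
DEFAULT_TEMPO_US = 500_000


def ticks_to_seconds(ticks: int, ticks_per_quarter: int,
                     tempo_us_per_quarter: int = DEFAULT_TEMPO_US) -> float:
    """Convert an absolute tick position to seconds, assuming one constant tempo."""
    return ticks * tempo_us_per_quarter / (ticks_per_quarter * 1_000_000)


def format_timestamp(seconds: float) -> str:
    """Format seconds as [mm:ss.cc] with centisecond resolution."""
    centis = round(seconds * 100)
    minutes, centis = divmod(centis, 6000)
    secs, centis = divmod(centis, 100)
    return f"[{minutes:02d}:{secs:02d}.{centis:02d}]"


def lyrics_to_timed_lines(events: List[Tuple[int, str]], ticks_per_quarter: int,
                          tempo_us_per_quarter: int = DEFAULT_TEMPO_US) -> List[str]:
    """Turn (absolute_tick, text) lyric events into timestamped text lines."""
    return [
        format_timestamp(ticks_to_seconds(tick, ticks_per_quarter, tempo_us_per_quarter)) + text
        for tick, text in events
    ]


for line in lyrics_to_timed_lines([(0, "Hel"), (480, "lo"), (960, "world")], 480):
    print(line)
```

A real converter would also have to follow tempo changes in the file and map the result onto whatever layout the target program expects.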

Or better yet, in my opinion, maybe some developer of midi to mp3 conversion software can be persuaded to include ID3 tags with the synchronized lyrics right in it. ID3 version 2.4 already has a frame specifically for synchronized lyrics (in a slightly different format from that of UltraStar Delux), and one probably can use the 'Comment' field in version 2.3 for that purpose.

The use of UNICODE in Lyric Meta Event is mentioned in Section 5 of the specs of RP-026 but nothing is said about encoding. I would suggest issuing a Lyric Meta Event at time 0 with the tag {@UTF-8} to indicate Unicode in UTF-8 encoding. I guess one could include the BOM in there somewhere too, but that seems rather redundant.

Ying-Da

In reply to by yingdat

I finally wrote a quick and dirty C program to put UTF-8 encoded lyrics into midi files. So now I can have midi files with Chinese lyrics, as well as lyrics in any other major language if I knew how to enter them into a text file (which of course I don't).

I also persuaded the author of MidiQuickFix to support UTF-8 encoded lyrics. MidiQuickFix is available on SourceForge. You have to contact the author for the new version not yet published.

Ying-Da

P.S. And I misspoke: ID3 ver 2.3 also has a synchronized lyrics frame. Unfortunately the only Unicode encoding format it refers to is UTF-16.

In reply to by yingdat

Even though RP-026 does not say which Unicode encoding is used, the BOM can settle that: "...the BOM serves to indicate both that it is a Unicode file, and which of the formats it is in..." http://www.unicode.org/faq/utf_bom.html
http://en.wikipedia.org/wiki/Byte_order_mark

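Although they say the BOM can be used to distinguish between encodings, the little-endian UTF-32 BOM (FF FE 00 00) begins with the little-endian UTF-16 BOM (FF FE), so UTF-16 text that starts with a NUL character could be mistaken for UTF-32. In practice this should be very rare.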

RP-026 says : "...if a byte order mark which specifies UNICODE such as 'FF FE' or 'FE FF' exists...". Those are the BOM in UTF-16.

So it should be safe to use either UTF-16 (mentioned in passing in the spec) or UTF-8 (not mentioned but adheres to the spec).
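For what it's worth, a reader could handle both cases along these lines. This is only a Python sketch: the order of the checks, the attempt at plain UTF-8 when no BOM is present, and the Latin-1 fallback are my assumptions, not anything RP-026 prescribes.

```python
import codecs

# UTF-32 is checked before UTF-16 because its little-endian BOM starts with FF FE.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_lyric(payload: bytes, fallback: str = "latin-1") -> str:
    """Decode the text bytes of a lyric meta event.

    A leading BOM selects the encoding. Without a BOM, try UTF-8 and fall
    back to a single-byte encoding (an assumption) if that fails.
    """
    for bom, encoding in BOMS:
        if payload.startswith(bom):
            return payload[len(bom):].decode(encoding)
    try:
        return payload.decode("utf-8")
    except UnicodeDecodeError:
        return payload.decode(fallback)


print(decode_lyric(codecs.BOM_UTF16_LE + "歌".encode("utf-16-le")))
print(decode_lyric("歌".encode("utf-8")))
```

Plain ASCII lyrics pass through the UTF-8 branch unchanged, so existing English files would not be affected.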

Hello Ying-Da,

I know this thread is very old but I hope you're still reading here...

I'm currently working on my own open source application that already supports Karaoke for MIDI files. (Only reading so far, writing will follow.)
Of course I want to support Unicode as well, both for reading from and for writing to MIDI files.

I'm thinking about a text event beginning with either @C (for 'charset') or @E (for 'encoding'). With @C a charset definition for UTF-8 would look like this:
@CUTF-8

That would extend the widely used .kar syntax for MIDI karaoke messages created by Tune 1000.
(@K: type, @V: version, @L: language, @T: title/author/copyright, @I: further information)
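To make the idea concrete, here is a small Python sketch of how a reader might look for such a charset declaration among a file's text events. It accepts both the @C form and the {@UTF-8} form suggested earlier in this thread. Neither is an established standard as far as I know, and the function name and the Latin-1 default are assumptions.

```python
import codecs
import re
from typing import Iterable

# Hypothetical tag syntaxes taken from this thread; neither is an established standard.
CHARSET_PATTERNS = [
    re.compile(r"^@C(?P<charset>[A-Za-z0-9._-]+)$"),     # e.g. "@CUTF-8"
    re.compile(r"^\{@(?P<charset>[A-Za-z0-9._-]+)\}$"),  # e.g. "{@UTF-8}"
]


def declared_charset(text_events: Iterable[bytes], default: str = "latin-1") -> str:
    """Return the codec name declared by a charset tag, or the default if none is found."""
    for raw in text_events:
        # The tags themselves are plain ASCII, so a lossy ASCII decode is enough here.
        text = raw.decode("ascii", errors="replace").strip()
        for pattern in CHARSET_PATTERNS:
            match = pattern.match(text)
            if not match:
                continue
            try:
                return codecs.lookup(match.group("charset")).name
            except LookupError:
                continue  # unknown charset name: ignore the tag and keep looking
    return default


header = [b"@KMIDI KARAOKE FILE", b"@LENGL", b"@CUTF-8"]
print(declared_charset(header))
```

Files without any such tag fall through to the default, so existing .kar files would be read as before.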

Ying-Da, you have proposed {@UTF-8} instead.
However I've never seen text messages with curly braces before. Is this some kind of standard or convention?
If such a convention already exists I would be happy to use it instead of making up my own charset definition.
Where can I find further information about this kind of notation with curly braces?

Best regards
Jan
