MusicXML was built on a strong technical foundation: the two leading academic formats for symbolic music, Hewlett’s MuseData format (1997) and Huron’s Humdrum format (1997). The first versions of MusicXML were essentially an XML version of MuseData, with key concepts from Humdrum added in. MusicXML soon added features for popular music and other areas that went beyond the original MuseData and Humdrum designs.
We knew from the outset that MusicXML was compatible with these academic formats. MuseData’s structure was in turn designed to be compatible with MIDI output, facilitating MuseData-to-MIDI translations. A key part of MusicXML development in its first two years was ensuring that its structure was also compatible with the major commercial symbolic music applications. We found that MusicXML’s basic structure matched the Finale and Sibelius data structures very well. This made it easier to build our own software for these applications, and gave us confidence that other music applications would find a similarly good fit. This indeed turned out to be true in practice.
A major reason NIFF was adopted only by music scanning applications is that its graphical orientation, while well suited to those applications, is quite foreign to other music software, making NIFF very difficult for most music software developers to work with. Sibelius’s NIFF reader, for instance, had problems reading simple pitch information correctly (Good, 2001).
Another reason MusicXML is compatible with leading applications is that it primarily models a document – a musical score – rather than an abstraction of a document. The best-known abstraction in the music area is SMDL’s division of music into logical, visual, gestural (performance), and analytical domains (Sloan, 1997). As we discuss below, this is a useful framework for thinking about different aspects of music representation. It extends the popular idea of separating presentation from content in a sensible way for music applications.
But strict separation of presentation and content can create more problems than it solves. This is especially true in music notation, where presentation conveys semantic meaning to the user. The horizontal spacing of musical notes and markings conveys information about how music is performed over time; it is not “just” a matter of presentation. Most major music applications recognize these interdependencies and integrate the different domains in a single format, separating the different concerns into different data structures with varying degrees of modularity. MusicXML follows this approach: the content domains (logical and analytical) are generally represented in elements, while the presentation domains (visual and gestural) are generally represented in attributes.
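As a minimal sketch of this division (the values here are illustrative, not drawn from a real score), consider a single MusicXML note. The pitch, duration, and note type that carry the logical content appear as child elements, while the default-x attribute (a visual position, measured in tenths of interline space) and the dynamics attribute (a performance loudness, expressed as a percentage of the standard forte level) carry the visual and gestural information:

    <note default-x="83" dynamics="88">
      <pitch>
        <step>C</step>
        <octave>4</octave>
      </pitch>
      <duration>2</duration>
      <type>quarter</type>
    </note>

An application concerned only with the logical content can ignore the attributes entirely, while notation and playback programs can read or write them as needed, without any domain being segregated into a separate file or document structure.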
Note that we are advocating structural compatibility with leading applications. This is not the same as adopting application-specific terminology or design decisions. It is tempting to do so, especially in situations like ours, where development of the language was tied closely to development of a translator for a single major application. But this temptation should be resisted to avoid biasing the format toward one application. Such bias would make it more difficult for others to work with the format, for both technical and social reasons.