Quality of Service and Temporal Fidelity
This page discusses special quality-of-service issues for MIDI and other audio-related media and defines relevant terminology. Conventional network quality-of-service criteria are mentioned briefly: it is assumed that any transport carrying MIDI messages will provide adequate quality-of-service as measured by conventional criteria.
The MIDI 1.0 transport layer (as defined by the "hardware" specification on pages 1 and 2 of the Complete MIDI 1.0 Detailed Specification) implicitly provides certain real-time service guarantees. These guarantees are significant to musical performance and contribute to the current success of MIDI in various application areas. These guarantees characterize the basic temporal fidelity (rhythmic integrity) of MIDI musical performance data. For many (but not all) applications, these guarantees must be maintained (and improved as feasible) in order to ensure that MIDI technology remains useful to its core constituency (musicians).
Standard layered network models often downplay issues related to temporal behavior. Often, the quality of network service is primarily defined using measures such as reliability, capacity, and, sometimes, latency (defined above).
These measures are inadequate for MIDI, audio, and other audio-related media. MIDI was designed for the purpose of conveying musical performance data. As with audio, the ear is extremely sensitive to small variations in the timing of MIDI messages used to trigger or modify audible events such as musical notes. Temporal fidelity (preservation of rhythmic integrity) is affected most strongly by jitter, but latency is also important.
A number of perceptual studies have shown that for streams of individual audio events, timing jitter on the close order of one millisecond can be audible, particularly in the context of rhythmically complex and syncopated ensemble music. [Iyer, Bilmes et al, Lunney, Michon, Schloss, Van Noorden]
[Moore] argues convincingly that time intervals on the close order of 1.5 milliseconds are both audibly significant and controllable by human performers in common musical situations. Consider a sequence of sounds, each consisting of a pair of clicks separated by a short delay (1,2,3,4… msec). Each successive sound has a distinctly different and predictable pitch. The ability to identify musical timbres is strongly linked with their attack transients. If a paired click, as discussed above, were used as the attack transient for sound with much longer duration, the delay between the two clicks would play an important role in determining the timbral identity of the sound.
This phenomenon is particularly significant when grace notes, flams and other musical decorations are played. For example, a pianist plays grace notes by extending one finger slightly before another, and moving the wrist as the hand descends so that the first finger strikes the keyboard slightly sooner. A skilled pianist can reliably control his or her hand geometry so that one finger is about 1 millimeter lower than the other. This corresponds to a time interval on the close order of 1.5 milliseconds under typical playing conditions. While the absolute time position of a particular gesture may vary 10-20 milliseconds (or more) from one performance to another, the relative interval between the grace note and the associated note is far smaller, and quite repeatable. As explained in the preceding paragraph, even small variations in such inter-note delays are quite audible under these circumstances.
In order to preserve the rhythmic integrity of grace notes and other musical decorations, timing accuracy of 1 millisecond or less is needed. Jitter above this threshold can audibly degrade the reproduction of a musical performance.
Within a continuous audio stream, much smaller amounts of jitter are perceptible. High-quality digital audio equipment and digital-to-analog converters provide jitter levels on the order of ten picoseconds or lower.
Jitter may be caused by a number of factors:

- Rate-limit jitter, introduced when the source message stream momentarily exceeds the capacity of the transport, so that later messages must wait for earlier ones to finish transmission.
- Bus contention jitter, introduced when messages must wait for access to a shared bus or network.
- Variations in processing time at the sender or receiver. Such variations can be due to differences in processing for different kinds of events, varying resource availability within a computer system (e.g. from multitasking or virtual memory), or other factors.
It is important to note that rate-limit jitter is deterministic, whereas bus-contention and most other forms of jitter are not. Since rate-limit jitter is caused by characteristics of the source message stream, it is possible to reorder or otherwise modify the source stream to ensure that high-priority events (such as drum notes) are transmitted at predictable times. In many cases, it is also possible to inspect a given source stream, determine the necessary transmission rate, and request a data transport channel with the appropriate characteristics. Since bus contention jitter and other kinds of jitter are non-deterministic (and cannot easily be bounded a priori), it is impossible for the sender to compensate for these forms of jitter.
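Because rate-limit jitter is deterministic, a sender can compute it exactly and schedule around it. The sketch below makes this concrete for a MIDI 1.0 link, which runs at 31250 baud with 10-bit byte framing (1 start + 8 data + 1 stop bits), i.e. 320 microseconds per byte; the message names and priority values are illustrative, not from any specification.

```python
# Deterministic rate-limit jitter on a MIDI 1.0 link:
# 31250 baud, 10 bits per byte -> 320 microseconds per byte.
US_PER_BYTE = 10_000_000 / 31250  # 320.0

def queuing_delays(messages):
    """Given (name, byte_count, priority) tuples queued at the same
    instant, return each message's wait (in microseconds) before its
    transmission begins on the serial link."""
    delays, elapsed = {}, 0.0
    for name, nbytes, _priority in messages:
        delays[name] = elapsed
        elapsed += nbytes * US_PER_BYTE
    return delays

# An illustrative burst: two 3-byte controller messages queued ahead
# of a rhythmically critical 3-byte drum Note On.
burst = [("pitch_bend", 3, 0), ("cc_volume", 3, 0), ("drum_note_on", 3, 1)]

# As queued, the drum note waits behind two earlier messages.
print(queuing_delays(burst)["drum_note_on"])  # 1920.0 (microseconds)

# Since the delay is predictable, the sender can reorder the burst so
# the high-priority event is transmitted first.
reordered = sorted(burst, key=lambda m: -m[2])
print(queuing_delays(reordered)["drum_note_on"])  # 0.0
```

Reordering cannot help with bus-contention jitter, since the delay there is not known in advance; this is exactly the distinction the preceding paragraph draws.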
Unlike jitter, a fixed amount of latency is much less likely to cause problems, as long as two conditions are true: the latency remains constant from event to event, and the total latency is small enough for the application at hand.
Sound travels at approximately one foot per millisecond. The distance between the sound radiating elements of most acoustic instruments and the performer's ears is in the range of one to four feet (about 0.3 to 1.2 meters). This corresponds to a latency of 1 to 4 milliseconds between the moment the performer initiates a note and the time the first acoustic results are heard. In small ensembles such as string quartets, performers are generally located within five to seven feet (about 1.5 to 2 meters) of each other. This corresponds to a maximum inter-performer latency of about 7 milliseconds. Rock groups and other amplified ensembles generally place speakers so that similar inter-performer latencies are produced, even when there are larger physical distances between performers on stage or in a recording studio. Headphones, of course, afford very low-latency acoustic sound reproduction (< 1 millisecond). On the other hand, music as heard by concert audiences is subject to much greater latencies.
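These acoustic latencies follow directly from the one-foot-per-millisecond rule of thumb. A minimal sketch, using the slightly more precise figure of 1.13 feet per millisecond (roughly 343 m/s at room temperature):

```python
# Acoustic latency implied by performer spacing. The article's rule of
# thumb is ~1 foot per millisecond; 1.13 ft/ms corresponds to the
# standard room-temperature speed of sound, about 343 m/s.
SPEED_OF_SOUND_FT_PER_MS = 1.13

def acoustic_latency_ms(distance_ft):
    """Time for sound to travel distance_ft feet, in milliseconds."""
    return distance_ft / SPEED_OF_SOUND_FT_PER_MS

for label, feet in [("instrument to performer's ears", 3),
                    ("string quartet spacing", 7)]:
    print(f"{label}: {acoustic_latency_ms(feet):.1f} ms")
```

The 7-foot case comes out just over 6 ms, consistent with the approximate 7 ms figure in the text.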
The threshold for tolerable latency depends on the specific application. Passive listening applications such as a song player can clearly tolerate significant latencies. Game applications (where musical events are tied to game action) are generally more demanding. Interactive performance and music composition applications (where an end user directly triggers musical events and music is the primary focus) require still lower latency. Professional users, of course, have the most stringent needs.
Good temporal fidelity requires fixed latency with bounded jitter. Perceptible latency varies according to the application, while perceptible jitter levels are largely independent of the application.
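The distinction can be made concrete by measurement: given matched send/receive timestamps for a stream of events, the fixed latency is the mean transit time, and jitter is the deviation around that mean. A minimal sketch, with illustrative timestamp values; "peak jitter" is taken here as the largest absolute deviation from the mean, one of several possible definitions:

```python
# Deriving fixed latency and peak jitter from matched send/receive
# timestamps (all values in milliseconds; data is illustrative).
def latency_and_peak_jitter(sent, received):
    transits = [r - s for s, r in zip(sent, received)]
    latency = sum(transits) / len(transits)            # fixed component
    peak_jitter = max(abs(t - latency) for t in transits)
    return latency, peak_jitter

sent     = [0.0, 10.0, 20.0, 30.0]
received = [5.2, 14.8, 25.4, 34.6]

lat, jit = latency_and_peak_jitter(sent, received)
print(f"latency {lat:.1f} ms, peak jitter +/- {jit:.1f} ms")
# latency ~5.0 ms, peak jitter ~0.4 ms
```

In this example the stream has a fixed latency near 5 ms with roughly +/- 0.4 ms of jitter, which would fall within the preferred bounds recommended below for music performance.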
Temporal fidelity is a system-level property. It is characterized by specific, system-level bounds for latency and jitter. To ensure that those bounds are met, each system component should be allocated a specific, proportionate share of the system-level latency and jitter budget, and should perform within its share. A complete system comprises three distinct types of components:

- Sources that generate MIDI data, such as keyboard controllers and sequencers.
- Media transports that carry MIDI data from sources to sinks.
- Sinks that consume MIDI data, such as sound generators.
Each of these component types should be allocated an appropriate share of the total system-level jitter and latency budget. The MIDI 1.0 hardware specification implicitly defines the performance bounds of the middle (transport) components. Source and sink entities (controllers, sequencers, sound generators) from different manufacturers have varying performance characteristics.
For music performance, recommended system-level bounds are 10 milliseconds total latency and +/- 1.0 milliseconds peak jitter. Preferred system-level bounds are 5 milliseconds total latency and +/- 0.5 milliseconds peak jitter. In order to maintain overall temporal fidelity, the jitter and latency contributions from the media transport components should be significantly less than these system-level bounds.
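A sketch of such a budget allocation follows; the 50/20/30 split across source, transport, and sink is an illustrative assumption, not a figure from the text or from any specification.

```python
# Allocating a system-level latency/jitter budget across the three
# component types. The recommended system bounds (10 ms, +/- 1.0 ms)
# are from the text; the 50/20/30 split is an illustrative assumption.
SYSTEM_LATENCY_MS = 10.0
SYSTEM_PEAK_JITTER_MS = 1.0
SHARES = {"source": 0.5, "transport": 0.2, "sink": 0.3}

# Each component gets (max latency, max peak jitter) in milliseconds.
budget = {
    name: (share * SYSTEM_LATENCY_MS, share * SYSTEM_PEAK_JITTER_MS)
    for name, share in SHARES.items()
}

def within_budget(component, measured_latency_ms, measured_jitter_ms):
    """Check one component's measured performance against its share."""
    max_latency, max_jitter = budget[component]
    return (measured_latency_ms <= max_latency
            and measured_jitter_ms <= max_jitter)

print(budget["transport"])                    # (2.0, 0.2)
print(within_budget("transport", 1.5, 0.1))   # True
print(within_budget("transport", 3.0, 0.1))   # False
```

Whatever the split, the point stands: the transport's share must be well inside the system bounds, since source and sink components consume part of the same budget.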
Bilmes, J. 1993. "Timing is of the Essence: Perceptual and Computational Techniques for Representing, Learning, and Reproducing Expressive Timing in Percussive Rhythm." Master's thesis, MIT Media Lab. http://www.icsi.berkeley.edu/~bilmes/mitthesis/index.html
Brandt, E. and Dannenberg, R. 1998. "Low-latency music software using off-the-shelf operating systems." Proceedings of the International Computer Music Conference. http://www.cs.cmu.edu/~rbd/papers/latency98/latency98.pdf
Freed, A., Chaudhary, A. and Davila, B. 1997. "Operating Systems Latency Measurement and Analysis for Sound Synthesis and Processing Applications." Proceedings of the International Computer Music Conference, Thessaloniki, Greece.
Iyer, Bilmes et al. 1997. "A Novel Representation for Rhythmic Structure." Proceedings of the International Computer Music Conference.
Lunney, H. M. W. 1974. "Time as heard in speech and music." Nature (249):592.
Michon, J. A. 1964. "Studies on subjective duration 1. Differential sensitivity on the perception of repeated temporal intervals." Acta Psychologica (22): 441-450.
Moore, F.R. 1988. "The Dysfunctions of MIDI." Computer Music Journal 12(1):19-28.
Schloss, A. 1985. "On The Automatic Transcription of Percussive Music From Acoustic Signal to High-Level Analysis." Ph.D. thesis, Stanford University, CCRMA.
Van Noorden, L. P. A. S. 1975. "Temporal coherence in the perception of tone sequences." Unpublished doctoral thesis, Technische Hogeschool, Eindhoven, Holland.
Wessel, D. and Wright, M. 2000. "Problems and Prospects for Intimate Musical Control of Computers." ACM SIGCHI, CHI '01 Workshop on New Interfaces for Musical Expression (NIME'01).
Wright, J. and Brandt, E. 2000. "MidiWave Analysis of Windows 98 Second Edition MIDI Performance." Presented at the Windows Audio Professionals Roundtable (Winter NAMM 2000) and at the 2000 Annual General Meeting of the MIDI Manufacturers Association.
Wright, J. and Brandt, E. 2001. "System-Level MIDI Performance Testing." Proceedings of the International Computer Music Conference, Havana.
Copyright 2003 Jim Wright. All Rights Reserved.