SoundWIRE

A research project for evaluating quality of service (QoS) using
"Sound Waves on the Internet from Real-time Echoes"
 
Chris Chafe (P.I.), Gary Scavone, Scott Wilson
     Center for Computer Research in Music and Acoustics (CCRMA)
     Department of Music, Stanford University
     Stanford, CA 94305

Project Description

An experimental project is underway to develop a tool for validating the QoS required by near-real-time interactive flows. Architectures under development for next-generation networking (e.g., QBone [I2, 1998]) will make reserved bandwidth and low-latency transactions commonplace. The method under investigation should provide a means of assessing the reliability of full-duplex interactive service.
 
SoundWIRE is a utility which affords an intuitive way of evaluating transaction delay and delay constancy. Its final form is an enhanced "ping" that uses actual sound reflection. A musical tone, such as a guitar pluck, can be created by repeatedly reflecting a digital acoustic signal between two hosts. Using the network delay between these reflections to substitute for a guitar string creates a tone whose stability represents perfectly regular service and whose pitch represents transmission latency. The ear's ability to discern minute differences makes this an unforgiving test of network reliability.
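To make the link between delay constancy and pitch stability concrete, here is a minimal sketch, not the project's code. It assumes the fundamental of the network "string" is the reciprocal of the round-trip delay, and it expresses delay variation as a pitch deviation in cents. The function names and the sample delays are illustrative assumptions.

```python
import math

def rtt_to_pitch_hz(rtt_seconds):
    """Fundamental of a loop whose round-trip delay is rtt_seconds (assumed f = 1/delay)."""
    return 1.0 / rtt_seconds

def cents_between(f_ref, f):
    """Musical interval from f_ref to f in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f / f_ref)

def pitch_wobble_cents(rtts):
    """Deviation of each round trip's pitch from the pitch of the mean delay."""
    mean_rtt = sum(rtts) / len(rtts)
    f_ref = rtt_to_pitch_hz(mean_rtt)
    return [cents_between(f_ref, rtt_to_pitch_hz(r)) for r in rtts]

if __name__ == "__main__":
    # Hypothetical round-trip delays in seconds, for illustration only.
    samples = [0.0100, 0.0101, 0.0099, 0.0100]
    for r, c in zip(samples, pitch_wobble_cents(samples)):
        print(f"{r * 1000:.2f} ms -> {rtt_to_pitch_hz(r):.1f} Hz ({c:+.1f} cents)")
```

A perfectly regular connection would give zero deviation on every round trip; any jitter shows up as a pitch wobble.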

Commonly available tools ping a connection and print momentary or averaged round-trip delay statistics in a list. The new method drives the connection constantly and displays essential QoS measures not captured by ping. It should be a useful diagnostic for connections supporting interactive, media-rich applications such as high-quality teleconferencing, remote sensing and teleoperation. These applications are just now being enabled via the next generation networks, as U.S. testbed projects and cousins in other countries come online (Canarie, Dante, et al.).
 
The bandwidth requirements of the technique are inherently low (<1 Mbit/s), satisfying the requirement for a non-invasive method which can be innocuously embedded without appreciably increasing load (for example, within a teleconferencing link). It is intended to supplement rather than replace quantitative QoS measurements such as bi-directional metering; cf. the Internet Engineering Task Force's (IETF) Realtime Traffic Flow Measurement (RTFM) Working Group [IETF 1998].
 
Our project's rationale fits the following needs noted in the Internet2 QoS Working Group [I2, 1999]:
 
     "The goal of the QBone is to provide an interdomain testbed for
       differentiated services (DiffServ), where the engineering, behavior, and
       policy consequences of new IP services can be explored.  As described
       in [I2, 1998], the most demanding advanced networked applications
       require absolute, per-flow service assurances and it is the primary goal of
       the QBone to explore this class of new services."
 
In terms of QoS measurement tools for DiffServ, we would be providing the particular kind of "ping" envisioned in the same draft document:
 
     "4.6 DS Ping
 
       A DS ping is suggested as a future development. A DS ping may be
       linked with a Best Effort ping and the replies may directly indicate
       differentiated service."

Experimental Methods and Procedures, Milestones

Software development and deployment consist of the following phases:
 
  1.     Prototype architectures (completed Feb-00)
 
A prototype client / server application has been coded which demonstrates audio reflections streaming between sockets on two hosts and tapped to sound output on one of them. 

There are two parts to the prototype: a physical model of the musical instrument and an audio-streaming, socket-based delay component. Any physical model which incorporates a "lumped circuit" topology is appropriate. Strings, winds and many percussion instruments lend themselves to this type of simulation. The acoustic waveguide (string, bore, mallet block, etc.) is represented by a pure delay line combined with a filter which "lumps" together spectral modifications and any passive non-linear aspects of the waveguide.
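The delay-line-plus-lumped-filter structure can be sketched locally with a Karplus-Strong-style pluck. This is a well-known textbook technique used here for illustration; the source does not state which model the prototype uses. The name `pluck`, the `damping` value and the two-point averaging filter are assumptions.

```python
import random

def pluck(delay_samples, n_samples, damping=0.996, seed=0):
    """Karplus-Strong-style pluck: a circular delay line with a lumped loop filter."""
    rng = random.Random(seed)
    # The delay line stands in for the waveguide; a noise burst acts as the pluck.
    line = [rng.uniform(-1.0, 1.0) for _ in range(delay_samples)]
    out = []
    idx = 0
    for _ in range(n_samples):
        current = line[idx]
        neighbour = line[(idx + 1) % delay_samples]
        out.append(current)
        # Lumped filter: two-point average (low-pass) scaled by a loss factor.
        line[idx] = damping * 0.5 * (current + neighbour)
        idx = (idx + 1) % delay_samples
    return out

if __name__ == "__main__":
    tone = pluck(delay_samples=100, n_samples=44100)
    print("peak at start:", max(abs(s) for s in tone[:1000]))
    print("peak at end:  ", max(abs(s) for s in tone[-1000:]))
```

At a 44.1 kHz sample rate, a 100-sample delay line gives a pitch of roughly 441 Hz. In the streaming prototype the network path, rather than a local buffer, would supply the delay.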
 
A server initiates the tone and uses the client to reflect back the sound as if it were the terminating end of the string or bore. While the simulation method is well known at CCRMA, its division into separate modules within an audio streaming architecture constitutes the new approach.
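The division of roles might look like the following loopback sketch, under stated assumptions. The source says only that audio streams between sockets on two hosts; the use of UDP, 32-bit float samples, the `loss` factor at the reflecting end and all function names are illustrative choices, not the prototype's actual design.

```python
import socket
import struct
import threading

def run_reflector(sock, n_packets, loss=0.99):
    """Client role: receive a block of samples, apply the termination loss, send it back."""
    try:
        for _ in range(n_packets):
            data, addr = sock.recvfrom(4096)
            count = len(data) // 4
            samples = struct.unpack(f"!{count}f", data)
            reflected = [loss * s for s in samples]
            sock.sendto(struct.pack(f"!{count}f", *reflected), addr)
    except OSError:
        return  # timed out or socket closed; stop reflecting

def bounce(block, trips, host="127.0.0.1"):
    """Server role: send a block, wait for its reflection, repeat for `trips` round trips."""
    reflector = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    reflector.settimeout(2.0)
    reflector.bind((host, 0))  # let the OS choose a free port
    port = reflector.getsockname()[1]
    worker = threading.Thread(target=run_reflector, args=(reflector, trips), daemon=True)
    worker.start()
    initiator = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    initiator.settimeout(2.0)
    try:
        for _ in range(trips):
            initiator.sendto(struct.pack(f"!{len(block)}f", *block), (host, port))
            data, _addr = initiator.recvfrom(4096)
            block = list(struct.unpack(f"!{len(block)}f", data))
    finally:
        initiator.close()
        worker.join(timeout=2.0)
        reflector.close()
    return block

if __name__ == "__main__":
    print(bounce([1.0, -0.5], trips=3))
```

Here both ends run on one machine for testing. With the reflector on a remote host, the time each block spends in flight plays the part of the string's delay line.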
 
A version that uses ICMP echo instead of the user-level reflector is also up and running, and it runs faster.
 
To study timing issues, a "ping-like" version measures momentary roundtrip time and uses that to control the pitch of a local pluck synthesis.
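A rough sketch of that control path follows, assuming the measured round-trip time is converted directly into a delay-line length for a local pluck. The timing helper wraps any callable that performs one round trip. `fake_round_trip` is a hypothetical stand-in for a real probe, and the 44.1 kHz rate is taken from the CD-audio figure used elsewhere in this document.

```python
import time

SAMPLE_RATE = 44100  # CD-audio rate, in Hz

def measure_rtt(round_trip):
    """Time one call of `round_trip`, a callable that sends a probe and waits for its echo."""
    start = time.perf_counter()
    round_trip()
    return time.perf_counter() - start

def delay_line_length(rtt_seconds, sample_rate=SAMPLE_RATE):
    """Delay-line length in samples that reproduces the measured round-trip time."""
    return max(2, round(rtt_seconds * sample_rate))

def fake_round_trip():
    """Hypothetical stand-in for a network probe: pretend the echo takes about 5 ms."""
    time.sleep(0.005)

if __name__ == "__main__":
    rtt = measure_rtt(fake_round_trip)
    n = delay_line_length(rtt)
    print(f"rtt {rtt * 1000:.2f} ms -> {n} samples -> about {SAMPLE_RATE / n:.1f} Hz")
```

Re-measuring before each pluck would let the pitch of successive plucks follow the momentary latency.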
 
  2.     LAN demonstration -- client on nearby host
 
The prototypes have been tested within the Stanford network to demonstrate reflections over a well-known testbed.
 
  2.5     Current Effort (Mar-00)
1) We are bringing the streaming and regular pinging prototypes together under one GUI.
 
2) The client / server approach is to be modified so that we can tap a signal off both ends and extend it to other streaming applications. One option is to combine the server and client in one executable, so that the server is always running, a visual indicator shows whether the server is in use, and pinging/streaming is enabled by clicking a button. This also starts to enable configurations more complicated than simple two-point.
 
3) We are further investigating and benchmarking the current prototypes using nice, the rt utility with the low-latency kernel, and possibly the RTLinux kernel.
 
4) The project will participate in development and testing of a digital audio streaming API, perhaps an ALSA-related driver.
 
 
  3.     Internet demonstration -- client unleashed on the world
 
The applications will be run between various locations where Internet2-level connections are available. Considerations affecting latency will be experimented with and logged.
Various reservation schemes and levels of service will be compared.
 
  4.     Multi-channel version
 
The prototype has been designed and coded with expansion to multiple audio channels in mind (for example, looking ahead to playing chords on multiple guitar strings). Stress testing of channel synchrony and number of possible channels will follow. A host computer for an upcoming concert (May 24th, 2000) will be equipped with HDTV and multi-channel sound sources. Using connections verified with SoundWIRE, initial one-way tests of "teleconcerting" sound transmission can be originated.
 
  5.     UI or browser plug-in development
 
An undergraduate student will study the feasibility of a multi-platform plug-in for the popular browsers. Once this is in place and the client has been ported, we will offer a version of the client to the public (with our server remaining as an online site for testing and statistics purposes).
 
  6.     Demo setup residing in the CCRMA studios
 
Using browser-based access, CCRMA studios will be able to evaluate connections between the server and other hosts on the net. We would demonstrate multi-channel, bi-directional sound streaming between interacting applications employed in recording and concert production. For example, remote overdubbing or tracking of performers in different halls across the campus will be enabled by Stanford's 2.5 Gbit/s backbone. Or, in a related extension envisioned as a network-based "effects loop," sound acquired from the stage will be routed for remote processing and returned to the hall for diffusion. We have considerable recent experience in interactive digital signal processing using individual hosts located on stage, but tapping the larger resources of our network of workstations residing at the center remains to be demonstrated.
 
  7.     Net Reverb -- stay TUNED!, TUNED, Tuned, tuned, (tuned), this may be fun --
 
We plan to experiment with a waveguide reverberator with multiple nodes / taps / sources across the net.

Anticipated Results and Significance

The idea was very straightforward to implement, though there have been the inevitable details to surmount. These will be described in an upcoming paper. On the networking side, the AES White Paper [AES, 1998] provides a thorough description of the challenges facing packetized transmission of digital audio over networks and outlines the minimum QoS requirements. Relevant to the project are issues of bit rate, latency, synchronization, and format. In its uncompressed form, our basic channel will be 705.6 kbit/s (16-bit linear sound samples clocked at the standard CD-audio rate of 44.1 kHz). Compression schemes for audio abound but may be problematic in terms of added delay.
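The channel bit rate quoted above follows from multiplying the sample width by the sample rate. The short calculation below reproduces it; the helper name and the stereo case are added for illustration only.

```python
def pcm_bit_rate(bits_per_sample, sample_rate_hz, channels=1):
    """Bit rate of uncompressed linear PCM audio, in bits per second."""
    return bits_per_sample * sample_rate_hz * channels

if __name__ == "__main__":
    mono = pcm_bit_rate(16, 44100)
    stereo = pcm_bit_rate(16, 44100, channels=2)
    print(f"mono:   {mono / 1000:.1f} kbit/s")    # 705.6 kbit/s
    print(f"stereo: {stereo / 1000:.1f} kbit/s")  # 1411.2 kbit/s
```

This counts audio payload only; packet headers would add to the figure.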
 
Delays ranging from hundreds of msec. down to as low as 1 msec. have been experienced initially. We hope to show that at this level SoundWIRE provides a practical QoS indicator over IP connections. If this is the case, we will also have demonstrated audio capabilities of some importance.
 
Delay times in excess of 50 msec. create subsonic fundamental frequencies. For instance, a 100 msec. delay gives a very low tone (a 10 Hz fundamental), which the listener hears as the plucking of very large cables (like those of the Golden Gate Bridge). Surprisingly, pluck-type instruments don't sound bad in this range, since the resulting musical timbre is composed of the tone's audible overtones (dozens of harmonics lying within the range of hearing). The same holds for the other instruments that would be demonstrated. As the delays shorten, the fundamental will rise into more realistic pitch ranges.
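The relationship between delay and fundamental can be checked with a few lines of arithmetic. This sketch assumes the fundamental is the reciprocal of the loop delay and takes 20 Hz as a nominal lower limit of hearing. Both are assumptions made for illustration, and the function names are hypothetical.

```python
import math

def fundamental_hz(delay_seconds):
    """Fundamental of a loop with the given delay (assumed f = 1/delay)."""
    return 1.0 / delay_seconds

def first_audible_harmonic(delay_seconds, floor_hz=20.0):
    """Lowest harmonic number whose frequency is at or above floor_hz."""
    return max(1, math.ceil(floor_hz * delay_seconds))

if __name__ == "__main__":
    for delay_ms in (100, 50, 10, 1):
        d = delay_ms / 1000.0
        k = first_audible_harmonic(d)
        print(f"{delay_ms:>3} msec: fundamental {fundamental_hz(d):.0f} Hz, "
              f"first harmonic at or above 20 Hz is number {k}")
```

With a 100 msec. delay the 10 Hz fundamental falls below the assumed limit, so under this assumption the tone is carried by the second harmonic upward.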
 
Having a working tool will allow us to probe network conditions under which the delays become shorter.

Additional features which could be added to the core technology include a number of extensions based on CCRMA research.
 
Tactile audio feedback is the idea of returning the audio signal to a haptic controller device in the hands of the user. Imagine plucking SoundWIRE and being able to feel its vibration (as well as the kinesthetics of the string being plucked).
 
Other timbres based on waveguides, including percussion, woodwind instruments, simulations of instrument bodies, etc., can be added to the synthesis choices for the application. Singing voice is also a possibility; in this case, the glottis becomes the delay-based element and the particular vocalization could be mapped to other QoS parameters.
 
Reverberation based on interconnections between multiple machines is an intriguing possibility. Imagine a mesh of connections forming an assortment of delay paths between a number of sites. At each machine a person would be able to speak or perform, adding their sound into a "room" consisting of multi-path sound reverberations made of these reflecting connections.
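One way to picture such a mesh is a small feedback delay network simulated on a single machine. This is only a sketch under stated assumptions: each directed connection is modelled as a fixed delay in samples, each site sums what arrives and re-sends it with a `gain` below one so that the echoes die away, and the site names and delay values are invented for illustration.

```python
from collections import deque

def mesh_reverb(delays, impulse_node, n_samples, gain=0.9):
    """Simulate an impulse echoing through a mesh of delay paths.

    delays maps (sender, receiver) to a path delay in samples (at least 1).
    Returns each site's signal as a list of n_samples values.
    """
    nodes = sorted({n for pair in delays for n in pair})
    fan_in = {n: sum(1 for (_a, b) in delays if b == n) for n in nodes}
    # One delay line per directed path, pre-filled with silence.
    lines = {pair: deque([0.0] * d, maxlen=d) for pair, d in delays.items()}
    out = {n: [] for n in nodes}
    for t in range(n_samples):
        # Each site hears the sum of what is arriving on its incoming paths.
        arriving = {n: 0.0 for n in nodes}
        for (_a, b), line in lines.items():
            arriving[b] += line[0]
        for n in nodes:
            dry = 1.0 if (t == 0 and n == impulse_node) else 0.0
            # Averaging over incoming paths keeps the loop gain below one.
            out[n].append(dry + gain * arriving[n] / max(1, fan_in[n]))
        for (a, _b), line in lines.items():
            line.append(out[a][-1])  # a full deque drops its oldest sample
    return out

if __name__ == "__main__":
    # Hypothetical three-site mesh with unequal path delays.
    paths = {("A", "B"): 3, ("B", "A"): 3, ("B", "C"): 5,
             ("C", "B"): 5, ("A", "C"): 7, ("C", "A"): 7}
    echoes = mesh_reverb(paths, impulse_node="A", n_samples=30)
    print([round(x, 3) for x in echoes["C"]])
```

In a networked version each delay line would be replaced by a real connection, so the "room" would take its shape from the measured path delays.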
 
Change the absorption characteristics of such a reverberator and the circuit becomes a 2-D drum surface: topologically the same as the reverberation case, but now a connection mesh that can be impulsively excited to sound like a drum, gong or cymbal stroke. Multiple drummers at remote locations could play together on a "distributed instrument."
 
The technology will be presented in CCRMA's annual Industrial Affiliates meetings to member companies actively tracking computer-related audio developments. Project results / software will be kept current and publicly available via CCRMA's web site.

Acknowledgements

National Science Foundation Grant No. ANI-9977246
 

References