Gigaflow

Gigaflow TeleImmersion Project

Introduction

Networks such as Internet2 are scaling up to astonishing capacities. Whereas demonstrated real-time, interactive, uncompressed flows have been in the "centi-flow" range for audio and have only recently come to support 4K video, the Gigaflow project envisions a near future with several orders of magnitude more interactive channels combining these and other interaction modalities. Collaborative applications explored in our Expedition in Computing will transcend the present state of the art from "almost like being there" to "better than being there." The team is prepared to couple upgrades in raw network power and media fidelity with research in perception, synthesis, and prediction.

Aims

Subjects

Gigaflow proposes to examine and implement three work programs (WPs), which will be interconnected to form the final Gigaflow high-quality, high-definition framework.

WP1 - Emergence

An increasing number of hosts on the network receive high-definition acoustical streams and scatter them to other points; these hosts constitute nodes on an irregularly spaced, non-stationary mesh. The expected proliferation of HDIS nodes leads to the advent of an acoustical network with interesting emergent properties as the number of hosts scales up dramatically. A "jam cell" in which remote musicians hear each other exists as part of current practice; an example application is the grouping of seven peers in a many-to-many, directly interconnected lattice. In the near future, branching between cells will become common: any node can scatter a cell's sound out to a neighboring cell, and all parties become interconnected at one level of remove. The physical and perceptual properties of a multitude of cells propagating sound at various levels of remove are a subject of this expedition in computing.
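
As an illustration of the topology sketched above, the following minimal example (not project code; host names and latencies are hypothetical) models a fully interconnected jam cell and a node that scatters its sound to a neighboring cell one level of remove away:

```python
from itertools import combinations

class Cell:
    """A jam cell: a group of peers that all hear each other directly."""
    def __init__(self, name, peers):
        self.name = name
        self.peers = peers

    def links(self):
        # Every peer streams to every other peer (many-to-many lattice).
        return list(combinations(self.peers, 2))

def scattered_latency_ms(hop_latency_ms, intra_latency_ms=5.0):
    """Latency heard one level of remove away: one intra-cell hop plus the
    inter-cell hop contributed by the scattering node (values illustrative)."""
    return intra_latency_ms + hop_latency_ms

cell_a = Cell("A", [f"a{i}" for i in range(7)])   # the seven-peer example cell
print(len(cell_a.links()), "direct streams inside cell A")            # 21
print(scattered_latency_ms(hop_latency_ms=30.0), "ms at one remove")  # 35.0
```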

Several strategies have been implemented to address the well-known problem of delay (latency) in network performance. These include the use of high-speed networks, fast compression algorithms, and artificially increasing the latency with a "one-phase delay" (Ninjam, among others).
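
As a rough sketch of the "one-phase delay" idea, the example below deliberately delays each remote stream so that it lands exactly one musical interval late; the four-beat interval and the numeric values are illustrative assumptions, not a description of Ninjam's actual implementation:

```python
def one_phase_delay_ms(bpm: float, beats_per_interval: int = 4) -> float:
    """Length of one interval (e.g. one 4/4 measure) in milliseconds."""
    return beats_per_interval * 60_000.0 / bpm

def added_buffer_ms(bpm: float, network_latency_ms: float,
                    beats_per_interval: int = 4) -> float:
    """Extra buffering each receiver adds so the total delay of a remote
    stream equals exactly one interval."""
    interval = one_phase_delay_ms(bpm, beats_per_interval)
    return max(0.0, interval - network_latency_ms)

# At 120 BPM a 4-beat interval is 2000 ms; with 80 ms of network latency,
# each receiver buffers the remote stream by a further 1920 ms, so everyone
# plays against the other participants' previous measure.
print(one_phase_delay_ms(120), added_buffer_ms(120, 80.0))
```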

Differing amounts of audio delay are acceptable depending on the type of music and the number of performers. Experience with free improvisation tells us that delays on the order of 100–200 ms are still acceptable for a good performance, and musicians working in certain genres do not find them hugely inconvenient. On the other hand, delays on the order of 25 ms already cause problems for a professional string quartet playing in a classical style.

Visual conducting, used to synchronize musicians in real spaces, does not serve the same purpose over the network: in real halls audio travels much more slowly than light, which is why visual conducting works. In the network scenario, however, audio and video are at best equally fast (and with present technology audio actually wins the race). This means that conducting strategies have to be rethought.
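
A back-of-the-envelope calculation makes the point; the hall distance below is an arbitrary example, and only the speed of sound (about 343 m/s at room temperature) is a physical constant:

```python
SPEED_OF_SOUND_M_S = 343.0

def acoustic_delay_ms(distance_m: float) -> float:
    """Time for sound to cross a given distance in air, in milliseconds."""
    return 1000.0 * distance_m / SPEED_OF_SOUND_M_S

# In a hall, a musician 17 m from the podium sees the conductor's beat almost
# instantly but hears distant colleagues roughly 50 ms late, so sight leads sound:
print(round(acoustic_delay_ms(17.0), 1), "ms acoustic delay across 17 m")

# Over the network, audio and video packets share the same path, so their
# latencies are comparable, and compressed video often lags the audio.
```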

We envision two technical fronts that will work together to create a better network performance experience: investigating supervisory control and prediction. A supervising conductor will be able to maintain synchronization across a multi-located space. By coupling pattern recognition / prediction with supervisory control techniques, this conductor (which can be the musicians themselves, the machine, or both) will be able to fully explore the musical potential of a given network configuration. In particular, delay configurations will determine the performance outputs that a conductor can influence, dictating, for example, maximum tempo (understood as the speed of musical events), pattern variability, and sound types.
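
As a purely illustrative sketch of how such a conductor might map a delay configuration onto a maximum workable tempo, the rule below (keep the one-way delay under a fixed fraction of the inter-beat interval) is an assumption made for the example, not a project result:

```python
def max_tempo_bpm(one_way_delay_ms: float,
                  max_fraction_of_beat: float = 0.05) -> float:
    """Largest tempo for which the one-way delay stays under the chosen
    fraction of a beat (the 5% fraction is an arbitrary illustrative choice)."""
    min_beat_ms = one_way_delay_ms / max_fraction_of_beat
    return 60_000.0 / min_beat_ms

for delay in (10.0, 25.0, 50.0, 100.0):
    print(f"{delay:5.0f} ms one-way -> about {max_tempo_bpm(delay):5.0f} BPM max")
```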

The increasing number of nodes using video on top of audio calls for a better solution than the current use of existing IP video systems for meaningful interactive network interactions. It is feasible to connect two sites via a high-quality video link; however, when the number of nodes exceeds two, video is confronted with issues similar to the "jam cell" audio emergence issue stated above, especially with regard to the differing video latencies that result from unevenly spaced nodes.

WP2 - Prediction

In the case of music with a large but finite set of patterns, prediction techniques will be able to "pull" audio before the sound actually reaches the destination. The predictor outputs pattern sequences based on probability of occurrence, using its "dictionary" of learned patterns. These techniques apply not only to the patterns of sound events but also to the sound type itself, relying on a sound synthesis engine to generate the predicted events when needed. For example, a constrained musical world consisting of patterned events played on a simple FM synthesis algorithm can be thought of as a musical palette that regenerates sound based on prediction elements. For other, more specific instrumental scenarios, physical models coupled with the actual instrumental sound may serve this purpose.
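
A minimal sketch of such a predictor, here a first-order Markov model over note names (the training sequence and event representation are hypothetical), illustrates the dictionary-and-probability idea:

```python
from collections import Counter, defaultdict
import random

class PatternPredictor:
    def __init__(self):
        self.table = defaultdict(Counter)   # event -> counts of following events

    def learn(self, events):
        # Build the "dictionary": count each observed transition.
        for a, b in zip(events, events[1:]):
            self.table[a][b] += 1

    def predict(self, last_event):
        """Sample the next event by its learned probability of occurrence;
        returns None if the event was never seen, so the caller can fall back
        to the real (late) audio."""
        nxt = self.table.get(last_event)
        if not nxt:
            return None
        events, counts = zip(*nxt.items())
        return random.choices(events, weights=counts)[0]

p = PatternPredictor()
p.learn(["C4", "E4", "G4", "E4", "C4", "E4", "G4", "C5"])
print(p.predict("E4"))   # "G4" or "C4", weighted by learned counts
```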

WP3 - Control

Current systems for conducting network performances allow a limited amount of control over the acoustic features of geographically displaced sites. Performing over the network introduces artifacts that do not occur when only one physical space is involved. A session management protocol will be explored so that one master operator can manipulate and fine-tune distant A/V setups. The availability of such a system is crucial for balancing audio systems interconnected over the network. We envisage the development of a "cockpit" able to control and monitor distant sites. Without this, a three-way network session, for example, requires three remote audio engineers to adjust levels, and there is a ripple effect, both technical and psychological, in which one site changes the balance, the two other sites lose theirs, and so on. In addition, potential closed-loop audio effects generated by the relationship between reproduction and capture devices need to be avoided, since in many cases open monitoring techniques are used to achieve a maximum level of immersion. Echo cancellation techniques are not appropriate with complex audio pathways.
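
The following sketch shows one possible shape for such cockpit messages; the field names, JSON encoding, and values are assumptions made for illustration, not a defined protocol:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ControlMessage:
    session_id: str
    target_site: str      # which remote A/V setup the change applies to
    channel: int
    parameter: str        # e.g. "gain_db" or "monitor_mute"
    value: float
    sequence: int         # monotonically increasing, so sites can drop stale updates

def encode(msg: ControlMessage) -> bytes:
    return json.dumps(asdict(msg)).encode("utf-8")

def apply_message(state: dict, raw: bytes) -> dict:
    """Apply one cockpit update to a site's local parameter state."""
    msg = json.loads(raw.decode("utf-8"))
    state[(msg["target_site"], msg["channel"], msg["parameter"])] = msg["value"]
    return state

# One master operator issues a change that every site applies identically,
# avoiding the ripple effect of three engineers re-balancing against each other.
state = {}
m = ControlMessage("gigaflow-demo", "site-B", channel=3, parameter="gain_db",
                   value=-6.0, sequence=42)
print(apply_message(state, encode(m)))
```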

Turning to video signals: most music network performance research has focused its efforts on reducing audio latency to a minimum. However, even on high-speed research networks, the use of high-definition multi-site video has been limited, mostly because of bandwidth limitations and the resulting use of compression. Even though current video systems are useful for providing a "window" onto remote sites during a network performance, remote video systems are not synchronous with the high-definition audio streams. Several strategies, such as synchronous message-based visualizations, distributed cueing systems, and very-low-definition video, all of which are synchronous with the high-definition audio streams, have been used as compromises. Even though the latter provide a basic level of synchronization and can be used in parallel with video as an additional network-centric interaction medium, they do not offer a fully immersive interactive experience.

Moreover, current video systems provide a limited amount of control when applied to networked environments. A protocol for remotely controlling the resolution, angle, and position of video capture sources is important for the development of high-definition network performance systems. Technologies such as motion capture and tracking should also be implemented in this context, as they have the potential to become a valuable resource for creating meaningful interdependencies between geographically dispersed performers.
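
A minimal sketch of what such a capture-control message could look like follows; the field names, ranges, and units are illustrative assumptions rather than an existing protocol:

```python
from dataclasses import dataclass

@dataclass
class CameraControl:
    site: str
    camera_id: int
    width: int            # requested capture resolution
    height: int
    pan_deg: float        # horizontal angle
    tilt_deg: float       # vertical angle
    x_m: float            # position of the capture source on a site-local grid
    y_m: float

    def validate(self):
        # Illustrative bounds; a real protocol would negotiate device capabilities.
        assert 0 < self.width <= 4096 and 0 < self.height <= 2160
        assert -180.0 <= self.pan_deg <= 180.0 and -90.0 <= self.tilt_deg <= 90.0
        return self

cmd = CameraControl("site-C", camera_id=1, width=1920, height=1080,
                    pan_deg=15.0, tilt_deg=-5.0, x_m=2.5, y_m=0.0).validate()
print(cmd)
```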

Partners

The Center for Computer Research in Music and Acoustics (CCRMA) is leading the project along with several academic and industry partners.

Academic

http://www.media.mit.edu/

Industry