Gigaflow TeleImmersion Project

==Introduction==

Networks such as Internet2 are scaling up to astonishing capacities. While demonstrated real-time, interactive, uncompressed flows have so far been in the "centi-flow" range for audio, and have recently come to support 4K video, the Gigaflow project envisions a near future with several orders of magnitude more interactive channels, combining these and other interaction modalities. Collaborative applications explored in our Expedition in Computing will transcend the present state of the art from "almost like being there" to "better than being there." The team is prepared to couple upgrades in raw network power and media fidelity with research in perception, synthesis, and prediction.

==Aims==

==Subjects==

Gigaflow proposes to examine and implement three work programs (WPs), which will be interconnected to form the final Gigaflow high-quality, high-definition framework.

===WP1 - Emergence===

The increasing number of network hosts where high-definition acoustical streams are received and scattered to other points constitutes an irregularly spaced, non-stationary mesh of nodes. The expected proliferation of HDIS nodes leads to the advent of an acoustical network with interesting emergent properties as the number of hosts scales up dramatically. A "jam cell" in which remote musicians hear each other exists as part of current practice; an example application is the grouping of seven peers in a many-to-many, directly interconnected lattice. In the near future, branching between cells will become common: any node can scatter a cell's sound out to a neighboring cell, so that all parties become interconnected at one level of remove. The physical and perceptual properties of a multitude of cells propagating sound at various levels of remove are a subject of this Expedition in Computing. [synchronization]
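To make the topology concrete, here is a minimal sketch (hypothetical Python, not project code) of a cell and a bridge between cells, using the seven-peer lattice from the example above:

<source lang="python">
from dataclasses import dataclass, field

@dataclass
class JamCell:
    """A fully connected (many-to-many) group of peers."""
    name: str
    peers: list = field(default_factory=list)

    def link_count(self) -> int:
        # A direct many-to-many lattice needs n*(n-1)/2 bidirectional links.
        n = len(self.peers)
        return n * (n - 1) // 2

@dataclass
class Bridge:
    """A node scattering one cell's mix into a neighboring cell,
    placing all parties at one level of remove."""
    source: JamCell
    target: JamCell
    extra_latency_ms: float = 0.0  # whatever the bridge path adds

cell_a = JamCell("A", peers=[f"a{i}" for i in range(7)])  # seven-peer lattice
cell_b = JamCell("B", peers=[f"b{i}" for i in range(4)])
bridge = Bridge(cell_a, cell_b, extra_latency_ms=30.0)

print(cell_a.link_count())  # 21 direct links inside the seven-peer cell
</source>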

Several strategies have been implemented to address the well-known problem of delay (latency) in network performance. These include the use of high-speed networks, fast compression algorithms, and artificially increasing the latency to one full musical phase ("one-phase delay", as in NINJAM, among others).
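The one-phase strategy quantizes latency to a full musical interval: each participant hears the others delayed by exactly one phase, so remote parts land on the measure grid instead of arriving with an arbitrary network delay. A minimal sketch of that buffering idea (block size and silence pre-fill are illustrative assumptions):

<source lang="python">
from collections import deque

class OnePhaseDelay:
    """Delay incoming audio blocks by exactly one musical interval,
    in the style of the NINJAM one-phase strategy."""

    def __init__(self, blocks_per_interval: int, block_size: int = 512):
        # Pre-fill with one interval's worth of silent blocks.
        self.buffer = deque(bytes(block_size) for _ in range(blocks_per_interval))

    def process(self, incoming_block: bytes) -> bytes:
        self.buffer.append(incoming_block)
        return self.buffer.popleft()  # emit audio from one interval ago

# e.g. a 2 s measure carried as 750 opaque audio blocks:
delay = OnePhaseDelay(blocks_per_interval=750)
</source>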

Differing amounts of audio delay are acceptable depending on the type of music and the number of performers. Experience with free improvisation tells us that delays on the order of 100–200 ms still permit a good performance, and musicians working in certain genres do not find them hugely inconvenient. On the other hand, delays on the order of 25 ms already cause problems for a professional string quartet playing in classical style.
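To put these figures in spatial terms, sound travels roughly 343 m/s in air, so each delay corresponds to a physical separation musicians might experience in a hall:

<source lang="python">
SPEED_OF_SOUND_M_S = 343.0  # in air at roughly 20 degrees C

for delay_ms in (25, 100, 200):
    distance_m = SPEED_OF_SOUND_M_S * delay_ms / 1000.0
    print(f"{delay_ms} ms of delay ~ {distance_m:.1f} m of separation")

# 25 ms ~ 8.6 m: about the width of a large stage, which is why a string
# quartet already struggles; 200 ms ~ 68.6 m is far beyond any real hall.
</source>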

Visual conducting, which synchronizes musicians in real spaces, does not serve the same purpose over the network: in real halls audio travels much more slowly than light (which is why visual conducting works), whereas in the network scenario audio and video travel at best at the same speed (with present technology, audio actually wins the race). This means that conducting strategies have to be rethought.

We envision two technical fronts that will work together to create a better network performance experience: supervisory control and prediction. A supervising conductor will be able to maintain synchronization across a multi-site space. By coupling pattern recognition/prediction with supervisory control techniques, this conductor (which can be the musicians themselves, the machine, or both) will be able to fully explore the musical potential of a given network configuration. In particular, the delay configuration will constrain the performance outputs a conductor can influence, dictating for example the maximum tempo (understood as the speed of musical events), pattern variability, and sound types.
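As one hedged illustration of how a delay configuration could dictate a tempo ceiling: if we assume the one-way latency must stay within some fraction of a beat for events to interlock (the tolerated fraction is purely an assumption here; finding the real threshold is part of the research), the bound follows directly:

<source lang="python">
def max_tempo_bpm(one_way_latency_s: float, beat_fraction: float = 0.05) -> float:
    """Tempo ceiling such that the one-way delay stays within an
    assumed fraction of one beat period."""
    beat_period_s = one_way_latency_s / beat_fraction
    return 60.0 / beat_period_s

print(max_tempo_bpm(0.025))  # ~120 BPM at 25 ms
print(max_tempo_bpm(0.200))  # ~15 BPM at 200 ms: very slow music only
</source>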


===WP2 - Prediction===

In the case of music with a large but finite set of patterns, prediction techniques will be able to "pull" audio before the sound actually reaches the destination. The predictor outputs pattern sequences based on their probability of occurrence, using its "dictionary" of learned patterns. These techniques apply not only to the patterns of sound events but also to the sound type itself, relying on a sound synthesis engine that generates the predicted events when needed. For example, a constrained musical world consisting of patterned events played on a simple FM synthesis algorithm can be thought of as a musical palette that regenerates sound from prediction elements. For other, more specific instrumental scenarios, physical models coupled with the actual instrumental sound may serve this purpose.
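One plausible shape for such a predictor, sketched here as a first-order Markov dictionary over symbolic events (the text above does not commit to a specific model; this is illustrative only), lets a synthesis engine render the likeliest continuation before the real audio arrives:

<source lang="python">
from collections import Counter, defaultdict

class PatternPredictor:
    """First-order Markov 'dictionary' of learned event transitions.
    Illustrative sketch, not the project's actual predictor."""

    def __init__(self):
        self.transitions = defaultdict(Counter)

    def learn(self, events):
        # Count how often each event follows each other event.
        for prev, nxt in zip(events, events[1:]):
            self.transitions[prev][nxt] += 1

    def predict(self, current_event):
        options = self.transitions.get(current_event)
        if not options:
            return None  # unseen context: fall back to the live stream
        return options.most_common(1)[0][0]

p = PatternPredictor()
p.learn(["C4", "E4", "G4", "C4", "E4", "G4", "C4"])
print(p.predict("E4"))  # -> "G4", the most probable continuation
</source>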

===WP3 - Control===

Current systems for conducting network performances allow only a limited amount of control over the acoustic features of geographically displaced sites. Performing over the network introduces artifacts that do not occur when only one physical space is involved. A session management protocol will be explored so that one master operator can manipulate and fine-tune distant A/V setups. The availability of such a system is crucial for balancing audio systems interconnected over the network. We envisage the development of a "cockpit" able to control and monitor distant sites. Without this, for example, a three-way network session requires three remote audio engineers to adjust levels, and there is a ripple effect, both technical and psychological, where if one site changes the balance, the two other sites lose theirs, and so on. In addition, potential closed-loop audio effects generated by the relationship between reproduction and capture devices need to be avoided, since in many cases open monitoring techniques are used to achieve a maximum level of immersion. Echo cancellation techniques are not appropriate for such complex audio pathways.
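The session-management messages such a cockpit might exchange could be as simple as the following hypothetical wire format (field names and site names are invented for illustration; designing the real protocol is the point of this work program):

<source lang="python">
import json

def make_control_message(site: str, channel: int, gain_db: float) -> bytes:
    """Encode one cockpit adjustment as a JSON datagram.
    Hypothetical format, for illustration only."""
    return json.dumps({
        "type": "set_gain",
        "site": site,        # which remote A/V setup to adjust
        "channel": channel,
        "gain_db": gain_db,
    }).encode("utf-8")

# One operator rebalances all three sites from a single console,
# instead of three engineers chasing each other's changes.
for site, gain in (("site-a", -3.0), ("site-b", 0.0), ("site-c", 1.5)):
    print(make_control_message(site, channel=1, gain_db=gain))
</source>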

==Partners==

===Academic===

===Industry===