LEONARDO ON-LINE: ARTICLES
Life Music: The Sonification of Proteins
Mary Anne Clark
Artist John Dunn and biologist Mary Anne Clark have collaborated on the sonification of protein data to produce the audio CD "Life Music." The authors describe how this collaboration merges scientific knowledge and artistic expression to produce soundscapes from these basic building blocks of life, soundscapes that may be encountered as esthetic experiences, as scientific inquiry, or both. The rationale for both the artistic use of the science and the scientific use of the art is described from the separate viewpoints of artist and scientist.
Music and Proteins
I (Clark) love to walk into the music building, which on my campus is next door to the science building. Through the doors of the practice rooms, I can hear fragments of 1000 years of written music, played or sung by the current generation of music students, some with finesse, some with hesitation, some with wild improvisation. I think that if somehow I could walk into a living cell, I would hear something similar – the ribosomes ticking away at the synthesis of proteins, playing out their amino acid sequences, note by note, according to a genetic score that is reproduced sometimes with utter fidelity, sometimes with a few unscheduled substitutions, and sometimes with stunningly inventive flourishes. Every generation of cells in every living organism plays the genetic score of its species. But while the history of music as we know it goes back some 1000 years, genetic music has been at least 3.8 billion years in the making.
Over a decade ago, I went to a faculty seminar to hear a colleague talk about composition. As he discussed how he went about selecting, modifying and organizing musical themes, I was struck by the parallels between musical structure and the structure of proteins and the genes that encode them. Proteins also seemed to be composed of phrases organized into themes. For years I was haunted by the image, and tried occasionally to interest musicians in making the transformation for me – converting a protein sequence into a musical sequence.
I was convinced that this would be worth doing – that the amino acid sequences would have the right balance of complexity and patterning to generate musical combinations that are both aesthetically interesting and biologically informative. There are twenty amino acids in proteins (listed in Table 1), enough for about three octaves of a diatonic scale. They are not arranged at random, just as notes are not arranged at random in a piece of music. Both proteins and music are meaningful. The meaning of a protein is its function in the organism, and certain sequences have emerged as the hallmarks of specific functions.
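To make the "three octaves" arithmetic concrete, here is a minimal Python sketch that lays the twenty amino acids out on a C-major scale. The alphabetical ordering, the starting note (MIDI 48, the C an octave below middle C) and the function names are assumptions for illustration only; they are not the pitch tables used for "Life Music."

```python
# A minimal sketch: give each of the 20 standard amino acids (one-letter codes)
# its own degree of a C-major scale, as MIDI note numbers. The alphabetical
# order and the starting note are illustrative assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]   # semitone steps of one major-scale octave


def diatonic_scale(start_midi: int, n_notes: int) -> list[int]:
    """Return n_notes ascending pitches of a major scale beginning at start_midi."""
    notes = [start_midi]
    i = 0
    while len(notes) < n_notes:
        notes.append(notes[-1] + MAJOR_STEPS[i % len(MAJOR_STEPS)])
        i += 1
    return notes


def pitch_table(start_midi: int = 48) -> dict[str, int]:
    """Map each amino acid to one scale degree, lowest to highest."""
    return dict(zip(AMINO_ACIDS, diatonic_scale(start_midi, len(AMINO_ACIDS))))


if __name__ == "__main__":
    table = pitch_table()
    span = max(table.values()) - min(table.values())
    print(f"{len(table)} residues span {span} semitones")
```

Under these assumptions the twenty residues run from C (MIDI 48) up to A (MIDI 81), filling all but one of the 21 notes in three diatonic octaves.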
For example, the protein hemoglobin serves the function of oxygen binding. Some features of the hemoglobin tune can be seen by examining the proteins of different species, which play this tune as variations on a theme. Figure 1 represents the sequence of beta globin, which forms half of the protein hemoglobin. The tuatara, an exotic lizard-like reptile with a "third eye," would seem to have little in common with humans, but the similarities between the human and tuatara beta globin sequences indicate that both proteins are variations on a theme that was in existence before the divergence of the mammalian and reptilian lineages 200 million years ago. Other variations of beta globin can be found in vertebrate species from all over the world, e.g. Australian ghost bats, Brazilian tapirs, Kenyan clawed frogs, Antarctic dragon fish, and Emperor penguins. Although the beta globin sequences are not identical in these species, they are similar enough that, if converted to music, they would be recognizable as variations on a common theme.
While it seemed obvious that proteins had an inherently musical structure, I did not hear a musical translation of a protein until 1996. In the process of preparing for an honors course on structural similarities between proteins and music, I did an Internet search looking for others who might also be interested in these parallels. There were only a few, but on John Dunn’s algorithmic music site, I found both music based on DNA and protein sequences and the software that would make the musical translation. I purchased one of the software programs to use with the class and discovered that proteins were even more musical than I had anticipated.
Nature As a Template for Art Music
An artist working in the medium of sound is liberated from the cultural imperatives imposed by traditional music, but at a high cost. Music in all cultures is rich in tradition and convention. Not only do listeners expect to hear the musical references they have become familiar with, but cultural and musical tradition also gives music its deep structure. This deep structure is not heard on the conscious level by most listeners, but it is an essential component of any musical work: the component that keeps our interest fresh on repeated hearings. Popular music depends to a large degree on extra-musical cultural associations for this, and so in a rapidly evolving culture it must be remade constantly. Classical concert and liturgical music depend far more on multiple layers of abstraction within the music itself, with cultural traditions of harmony and melody that evolve slowly. There are extra-musical associations, to be sure, but the primary deep structure lies within the music itself; thus, classical music stays fresh in our ears even over centuries.
Midway into the 20th century, when electronics in general and the tape recorder in particular opened vast landscapes of tonal colors and compositional layering to musical explorers, it quickly became apparent that no one outside the electronic music community was listening. Most people considered "electronic music" to be an oxymoron. The problem was not that the electronically generated tones were uninteresting. The problem was that the music had no deep structure, either internal or cultural.
As an early experimenter with electronic music, starting in the 1960s with multiple tape decks and razor blades – musique concrète, it was called then – I (Dunn) vividly remember my first hearing of Carlos’ Switched-On Bach [1], the first electronic music to receive popular acclaim. I was driving and nearly ran the car off the road. This was astounding: pure synthesized music that made no attempt whatsoever to mimic conventional instrumentation, that stood on its own as music. Up to then, electronic music, even my own – especially my own – was of interest only because it was electronic and experimental. It had little to do with music as an esthetic experience, and it rarely got a second hearing.
The fly in the ointment, of course, was that the structure that gave Switched-On Bach its meaning was a borrowed one: from Bach, and from our vast tradition of Western harmony with its abstract, slowly changing cultural associations. In the end it was imitative music after all, barely hinting at the new musical landscapes that had opened up to electronic composers. Morton Subotnick [2], arguably the best of the early composers of abstract synthesized electronic music, with several landmark albums to his credit, remarked, when asked what kind of music he listened to, that he preferred Mozart, Bach, and the other traditional Western composers. He pointed out that electronic music has no history, no tradition, and thus, for the present, little that can hold a listener’s interest.
Early on I had determined that my path to composing electronic music would eschew traditional composition, and treat this new medium as a separate art form: sound as an artist’s medium, rather than music as a traditionally trained musician would approach it. The reason for this, to me, was obvious. The great investment of traditional music training has such weight that one cannot help being stuck in that paradigm to some extent. Others have broken out of it – Subotnick comes to mind immediately – but I wasn’t that confident of my own ability. So I went to art school to study sound as art, rather than traditional music, and it was there that I discovered computers.
Digital computers have given electronic musicians new tools for developing deep structure. The computer’s great strength is as a compositional tool for algorithmic music: music developed through computer-processed rules that combine in tonal and structural relationships that would be difficult, if not impossible, to calculate by traditional means. Joseph Schillinger, who ironically died in 1943, the year the first electronic computer was "born," developed much of the groundwork for algorithmic music in a series of lectures posthumously published as The Schillinger System of Musical Composition. His theory that all music, perhaps all art, can be broken down into small whole-number ratios is difficult to align with traditional composition techniques (although that is exactly what he attempts to do); however, it is a perfect fit for computer algorithmic composition.
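Schillinger's small-whole-number idea can be illustrated with one of his well-known devices, the rhythmic "resultant" produced when two regular pulses of different periods are superimposed. The Python sketch below is our own illustration under stated assumptions (periods given as small integers, one cycle taken as their least common multiple); it is not code from Dunn's software or from Schillinger's text.

```python
from math import lcm  # Python 3.9+


def resultant(a: int, b: int) -> list[int]:
    """Durations between attacks when pulse trains of period a and b are combined.

    Both pulses start together at time 0; the combined pattern repeats every
    lcm(a, b) time units, so one full cycle is returned.
    """
    cycle = lcm(a, b)
    # Every time point where either pulse sounds, including the end of the cycle.
    attacks = sorted({t for t in range(cycle + 1) if t % a == 0 or t % b == 0})
    # Differences between consecutive attacks give the rhythm as durations.
    return [later - earlier for earlier, later in zip(attacks, attacks[1:])]


print(resultant(3, 2))  # 3 against 2 -> [2, 1, 1, 2]
print(resultant(4, 3))  # 4 against 3 -> [3, 1, 2, 2, 1, 3]
```

The durations are symmetric and always sum to the cycle length; rules of this kind, applied to pitch and timbre as well as rhythm, are the sort of layering that is tedious by hand but trivial for a computer.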
While algorithmic processes have given electronic art music a means of achieving deep structure, it is a structure largely alien to our 20th-century ears. And since this music is still very much in the pioneer stage, with the frontiers of its paradigms still shifting and ephemeral, the listening audience for this kind of music remains negligible.
Thus, when botanist Dr. K.W. Bridges from the University of Hawaii asked me in 1989 to look into sonification of some of his data on tide tables, it occurred to me that, just as an artist’s approach rather than a musician’s helped loosen the bounds of tradition, perhaps substituting the structure in scientific data for that of cultural tradition would help lend form to electronic music that contemporary ears could appreciate.
While the tide table data failed to resonate with any internal map I could discern, and the data seemed too random to give the resulting music a sense of structure, deep or otherwise, the attempt did lead to discussions about what kind of scientific data might do this. Eventually the discussions with Bridges led to DNA data and its associated protein sequences. It seemed to me that a relatively simple alphabet of four tokens, read in groups of three to specify just twenty letters that in turn combine to form the basis of all life on Earth, had to be rich with structure, and would very likely resonate with the inner maps of us humans, who are built upon this code.
This turned out to be the case. The DNA/protein sequences have proven to possess deep and highly resonant structure that sounds both alien and familiar, like music from another culture: pleasantly unusual but quite listenable. Our first public presentation of this music was in January 1981, at the University of Hawaii, in a concert entitled Inflections: Musical Interpretations of DNA Data, which included music composed by myself and by Dr. Bridges, and related visuals performed by artist Sonia Sheridan.
At the time I thought the DNA/protein music would be a passing thing for me, a stepping stone on the exploratory search for compositional structure and meaning to parallel the remarkable electronic and digital tools technology has given us. But the well has not run dry. How could it? Nature’s music of life is on a far vaster scale than anything any human (merely one of Her sonnets) could compose. But She gives us a raw score so rich and harmonic that it may well become the fountainhead for future sonic artists, just as She has been for visual artists throughout human history.
As a Research Fellow in the Arts at the University of Michigan for the past two years, I have collaborated with Jamy Sheridan, a visual algorithmic artist who has worked closely with me for several years on the algorithmic art and music software I have developed, and with Dr. M.A. Clark, the co-author of this article. The collaboration with Clark began some two years ago, when she emailed me some technical questions regarding the software she had purchased. Further email correspondence revealed we were on similar trajectories regarding the sonification of protein data, but with two separate sets of keys: hers based on science and mine on art.
Sonification of DNA and Proteins
DNA (deoxyribonucleic acid) is a long multi-unit molecule containing Nature’s digital code for life on Earth. There are just four coding elements: T, C, A and G. The letters stand for the four different subunits of DNA (thymine, cytosine, adenine and guanine) that form the "steps" on the helical ladder that is the database for all organisms. These four coding elements are combined into groups of three, which are called codons. There are 64 (4 × 4 × 4) possible codon combinations, of which 61 are used to encode the 20 amino acids, plus three stop codons that indicate the end of a protein sequence, as a period indicates the end of a sentence.
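The codon arithmetic can be checked directly. The Python sketch below builds the standard genetic code from the compact 64-letter string conventionally used for NCBI translation table 1 and counts sense and stop codons; the variable names are our own.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1). Codons are listed in the
# order TTT, TTC, TTA, TTG, TCT, ... (first base varies slowest); '*' = stop.
AA_BY_CODON = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the 64-entry codon table: 4 bases taken 3 at a time.
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AA_BY_CODON)}

stop_codons = sorted(c for c, aa in CODE.items() if aa == "*")
sense_codons = [c for c, aa in CODE.items() if aa != "*"]
amino_acids = set(CODE.values()) - {"*"}

print(len(CODE), len(sense_codons), stop_codons, len(amino_acids))
```

The counts come out as 64 codons in total, 61 sense codons, the three stop codons TAA, TAG and TGA, and 20 distinct amino acids.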
The twenty amino acids of which proteins are composed [Table 1] differ from one another in size, solubility, and electrical charge. Generally, water-insoluble amino acids like leucine, isoleucine and valine cluster together in the interior of a protein, while more soluble amino acids are exposed on the surface. Positively charged amino acids like lysine and arginine and negatively charged amino acids like glutamic and aspartic acid may also attract each other. These interactions encourage the protein to fold, like origami, into its functional form, and the shape it assumes will depend on the position of each amino acid in the sequence.
Just as a musical theme is defined by the intervals from note to note, not by the absolute pitches of the notes, proteins are defined more by their overall patterns than by their absolute sequences. In order to form beta globin, the amino acids must line up in a way that allows the sequence to fold into a molecule capable both of binding and of releasing oxygen with the appropriate physiological parameters.
The amino acid interactions that stabilize a particular folding pattern must be preserved, even if the specific amino acid sequence is not, in order to preserve the function of a protein. The phrase (in amino acid letter names) FSDGL in human beta globin and the phrase FGEAV in tuatara are different, but the amino acids at the last four positions of each cluster have similar charge and solubility characteristics. Such substitutions are said to be conservative, and act a little like a musical key change, because they maintain the shape of the line even though the absolute sequence is changed.
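One way to make a conservative substitution concrete is to compare two phrases position by position, once for identity and once for membership in a coarse property class. In the Python sketch below the class boundaries are a deliberately simplified, hypothetical grouping by size, charge and solubility (real substitution analyses use finer scales or scoring matrices), and the function names are illustrative.

```python
# Hypothetical coarse classes loosely based on size, charge and solubility.
# Every standard amino acid belongs to exactly one class.
CLASSES = {
    "small": set("GAS"),
    "acidic": set("DE"),
    "basic": set("KRH"),
    "hydrophobic": set("VLIMFWC"),
    "polar": set("TNQY"),
    "proline": set("P"),
}


def residue_class(aa: str) -> str:
    """Return the name of the class containing amino acid `aa`."""
    return next(name for name, members in CLASSES.items() if aa in members)


def compare(a: str, b: str) -> tuple[int, int]:
    """Count (identical positions, positions whose residues share a class)."""
    if len(a) != len(b):
        raise ValueError("phrases must have the same length")
    identical = sum(x == y for x, y in zip(a, b))
    same_class = sum(residue_class(x) == residue_class(y) for x, y in zip(a, b))
    return identical, same_class


print(compare("FSDGL", "FGEAV"))  # human vs tuatara phrase -> (1, 5)
```

Under this grouping the human and tuatara phrases share only one identical residue but agree in class at all five positions, which is the "same shape, different notes" behavior the analogy to a key change describes.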
How Proteins are Encoded
Protein sequences and the organisms that contain them have the look of being designed or composed. The design of an organism and its molecular components emerges from the information stored in the DNA of its genes. The relationship between DNA coding sequences and protein structure is something like the relationship between Morse code and plain text. Figure 2 demonstrates Morse code for the message "beta-globin." Some features of the two coding systems are the following:
Morse Code uses combinations of two elements, the dot and the dash, to specify letters of the alphabet and punctuation marks. In genetic code, combinations of the four subunits A, T, C, and G are used to specify the 20 amino acids of the protein alphabet.
Morse code uses coding combinations of various lengths, from a single dot (a short pulse) or dash (a longer pulse) to four dots/dashes for the 26 letters of the English alphabet. Genetic code always uses combinations of the same size – three units. The DNA codons, e.g. AAA, CGA, CAT, specify the 20 amino acids, the alphabet of protein structure. Transmitted Morse code uses a brief period of silence to mark the boundaries between codons (e.g. to distinguish the letter combination "et" from the letter "a" in the message "beta globin"). Genetic code is read continuously, parsing the DNA data string into triplets, and depends on the translating ribosomes to get the reading frame right.
Morse code begins with the first character of the message and uses a stop codon (.-.-.-) to specify the end of the message. Genetic code also begins with the first character of the message and ends with one of three stop codons: TAG, TAA, or TGA. In both codes, the codons are laid out in the same sequence as the letters of the message.
In Morse code, the relationship between codons and the letters of the message is fully unambiguous: either can be predicted from the other. V is only ...- and ...- is only V. However, genetic code is unambiguous only when reading from the DNA to the protein. The reason that 61 DNA codons encode only 20 amino acids is that genetic coding is redundant. Most amino acids are represented by two or more codons (see Table 1 for a codon listing); only two amino acids, methionine and tryptophan, are specified by a unique codon. Coding redundancy for several amino acids of a single protein can be seen in Figure 3, which represents the DNA coding sequence and the corresponding amino acid sequence for human beta globin. For example, the amino acid lysine (K) is represented by both of its two DNA codons, sometimes by AAA and sometimes by AAG, and the amino acid glycine (G) is represented by three of its four possible codons: GGC, GGG and GGT.
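The contrasts drawn above can be tried out in a short Python sketch. Morse letters run together without the pause become ambiguous, a DNA string read continuously in triplets yields different proteins in different reading frames, and counting codons per amino acid exposes the redundancy of the genetic code. The DNA fragment is hypothetical (it is not the beta globin gene), only the Morse letters needed for the example are included, and all names are our own.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, ...;
# '*' marks a stop codon.
AA_BY_CODON = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_BY_CODON)}

# International Morse code for just the letters used below.
MORSE = {"A": ".-", "E": ".", "T": "-", "V": "...-"}


def to_morse(text: str, gap: str = " ") -> str:
    """Encode letters as Morse, separated by `gap` (the pause between letters)."""
    return gap.join(MORSE[letter] for letter in text.upper())


def translate(dna: str, frame: int = 0) -> str:
    """Read dna in consecutive triplets starting at `frame`, stopping at a stop codon."""
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        amino_acid = CODE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def codons_per_amino_acid() -> Counter:
    """How many of the 61 sense codons specify each amino acid."""
    return Counter(aa for aa in CODE.values() if aa != "*")


# Without the pause, "ET" and "A" are indistinguishable in Morse.
print(to_morse("ET", gap=""), to_morse("A"))
# Hypothetical fragment: the reading frame changes the protein completely.
dna = "ATGAAAGGGAAGGGTTAA"
print([translate(dna, f) for f in range(3)])  # ['MKGKG', '', 'EREGL']
counts = codons_per_amino_acid()
print(counts["K"], counts["G"], sorted(aa for aa, n in counts.items() if n == 1))
```

In a cell the ribosome fixes the frame by starting at a start codon; here the caller simply chooses it, which makes it easy to hear how a one-letter shift scrambles the tune.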
These examples show that the sequence of a protein is not a fixed structure, but a tentative one, like a melody in the mind of a composer. The theme played by a protein in one of its guises may turn up again as a variation or counter-theme in another part of the orchestra. In some cases, e.g. sickle-cell hemoglobin, a single amino acid substitution can seriously reduce the functionality of the protein. But sometimes a refolded tertiary structure develops new talents. The sickle-cell mutation has the side effect of increasing resistance to malaria. The normal beta globin is itself a variant of an earlier protein that also gave rise to other globins. Other protein variants have acquired completely new functions, e.g. the derivation of the milk protein lactalbumin from the protective enzyme lysozyme, and the derivation of several eye lens crystallins from respiratory enzymes [4, 5].
The necessity for a working protein always to have some meaning, some function, has made proteins change slowly enough, when they do change, that they have left traces of their previous history in the record of their amino acid sequences. Changes in protein sequences are generated by their "composers," i.e. the DNA sequences that encode them. DNA produces new variations either by changing the identity of a codon or by the wholesale recombination of themes taken from different DNAs. With the development of computer programs that can instruct digital musical instruments to play genetic scores, it has now become possible to hear these protein songs.
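A change in the identity of a codon can be classified mechanically. The Python sketch below (function and label names are our own) swaps one base in a codon and reports whether the encoded amino acid stays the same (synonymous), changes (missense), becomes a stop (nonsense), or turns a stop into an amino acid (stop-loss). The first example is the well-known sickle-cell change, GAG to GTG in codon 6 of the beta globin gene, which replaces glutamic acid (E) with valine (V).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, ...;
# '*' marks a stop codon.
AA_BY_CODON = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_BY_CODON)}


def point_mutation(codon: str, position: int, new_base: str) -> tuple[str, str, str]:
    """Replace one base of `codon` and classify the effect on the encoded amino acid."""
    if new_base not in BASES or not 0 <= position < 3:
        raise ValueError("expected a base from TCAG and a position 0-2")
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODE[codon], CODE[mutant]
    if after == before:
        effect = "synonymous"
    elif after == "*":
        effect = "nonsense"
    elif before == "*":
        effect = "stop-loss"
    else:
        effect = "missense"
    return before, after, effect


print(point_mutation("GAG", 1, "T"))  # sickle-cell change: ('E', 'V', 'missense')
print(point_mutation("GAG", 2, "A"))  # GAA still encodes E: synonymous
print(point_mutation("AAG", 0, "T"))  # TAG is a stop codon: nonsense
```

Because so many substitutions are synonymous, the DNA "composer" can drift considerably while the protein's tune stays the same.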
Collaboration of Art and Science
When we began the protein music project, we wanted to convey both something about primary amino acid sequence and something about the folding patterns of proteins. Our goal was to create an audio CD album that would stand on its own as art music, and at the same time offer empirical proof of the esthetic patterning of nature’s deep structure. One way to approach this was to take advantage of the secondary structure of proteins: simple folding patterns that are combined to produce the overall tertiary structure of a protein. There are three secondary patterns: alpha-helix, beta-strand, and turns.
A protein chain is like a necklace, with the chemical groups that identify each amino acid dangling from the chain like pendants. These "pendants" are known as R-groups. An alpha-helix looks like the binding of a "spiral" notebook, or a strand of string wound at even intervals down a pencil. A helix is also like a spring in that you can stretch it along its long axis, and when released, it will return to its original shape. In an alpha-helix, the R-group "pendants" project outward from the axis of the helix.
Beta-strands fold back and forth at the carbon atom to which the R-groups are attached. In beta-strands, the R-groups project from the folded chain on alternate sides. Beta-strands from different parts of the sequence, or even from different sequences, can line up with their R-groups in register. Adjacent strands form weak bonds that connect them into beta sheets or cylindrical beta barrels.
Turns are just that: regions of the molecule that go off in a different direction from the one they came from. Turns may connect two regions of alpha-helix or beta-strand to form alpha-turn-alpha or beta-turn-beta complexes, or they may connect alpha and beta regions.
The presence of these secondary structures in proteins, added to the theme and variation of the protein sequences themselves, is enough to give the music depth and to make it rich and interesting. But to better understand how these simple patterns might contribute to even deeper structure in the music, we looked to the extra insight offered by the scientific study of more complex protein folding patterns.
Various combinations of secondary structure form local domains in a protein’s tertiary structure, or overall architecture. As cathedrals can be classified as Romanesque, High Gothic, Perpendicular, and so forth, protein architectures are grouped into different categories, some of which are named simply for one of the proteins exhibiting the pattern, like "immunoglobulin folds," while others are named more descriptively, like "trefoil (cloverleaf)," "Greek key," or "beta sandwich."
The proteins we chose to work with in this project were representative of four major pattern categories: fibrous, predominantly alpha, predominantly beta, and mixed alpha-beta. To distinguish between alpha and beta regions of these proteins, and to mark the turns, we decided to use changes in instrumentation and/or pitch. For those proteins that have long regions in which one or more motifs are tandemly reiterated, we chose instead to use different voicings to differentiate between these motifs. What surprised us as we began to hear the sequences was that some of the alpha and beta regions were also marked by motifs whose sequences did not obviously repeat, but whose general phrase shapes did.
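Changing instrumentation at structural boundaries amounts to cutting the sequence wherever its secondary-structure label changes. A minimal Python sketch, assuming the structure is supplied as one letter per residue in the style of DSSP (H for helix, E for strand, T for turn); the instrument names, the fallback voice and the toy input are hypothetical, not the assignments used on the CD.

```python
from itertools import groupby

# Hypothetical instrument choices for DSSP-style labels: H = alpha-helix,
# E = beta-strand, T = turn. Any other label falls back to a default voice.
INSTRUMENTS = {"H": "flute", "E": "cello", "T": "marimba"}
DEFAULT_INSTRUMENT = "piano"


def voice_segments(sequence: str, structure: str) -> list[tuple[str, str]]:
    """Split a sequence into runs of constant secondary structure, one instrument per run."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    segments = []
    start = 0
    for label, run in groupby(structure):
        length = sum(1 for _ in run)  # number of residues in this run
        instrument = INSTRUMENTS.get(label, DEFAULT_INSTRUMENT)
        segments.append((instrument, sequence[start:start + length]))
        start += length
    return segments


print(voice_segments("MKGKGAL", "HHHTTEE"))
# [('flute', 'MKG'), ('marimba', 'KG'), ('cello', 'AL')]
```

Each segment can then be handed to its own instrument track, so a listener hears a change of color exactly where the chain changes fold.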
Discovering the Music in Proteins
In Dunn’s previous music programs using DNA or protein sequences to generate music, pitches were assigned in two ways, either absolutely by giving a fixed pitch to each amino acid or relatively by making a frequency histogram of the amino acids in the protein and assigning more consonant intervals to the more frequent amino acids. Because the properties of amino acids are important in determining folding pattern, we decided to recognize those properties by adding a third method for assigning pitch. We arranged the amino acids roughly according to their water solubility. The most insoluble residues were assigned pitches in the lowest octave, the most soluble, including the charged residues, were in the highest octave, and the moderately insoluble residues were given the middle range. Pitches ranged over three octaves in the diatonic scale, two octaves for a chromatic scale, and about four for pentatonic and whole-tone scales.
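The solubility-based method can be sketched as: sort the residues from most to least hydrophobic, then hand out ascending scale degrees. The text does not say which solubility scale was used, so the Python sketch below substitutes the Kyte-Doolittle hydropathy values as an assumption; the starting note (MIDI 36) and the function names are likewise illustrative. Ties (D, E, N and Q all score -3.5) are broken alphabetically.

```python
# Kyte-Doolittle hydropathy (higher = more hydrophobic); an assumed stand-in
# for whatever solubility ordering was actually used.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}

# Semitone steps for one octave of each scale type.
SCALES = {
    "diatonic": [2, 2, 1, 2, 2, 2, 1],
    "chromatic": [1] * 12,
    "pentatonic": [2, 2, 3, 2, 3],
    "whole-tone": [2] * 6,
}


def scale_notes(steps: list[int], start: int, count: int) -> list[int]:
    """Return `count` ascending MIDI notes following the repeating `steps` pattern."""
    notes = [start]
    for i in range(count - 1):
        notes.append(notes[-1] + steps[i % len(steps)])
    return notes


def solubility_pitch_table(scale: str = "diatonic", start: int = 36) -> dict[str, int]:
    """Most hydrophobic residue gets the lowest pitch, most soluble the highest."""
    order = sorted(KYTE_DOOLITTLE, key=lambda aa: (-KYTE_DOOLITTLE[aa], aa))
    return dict(zip(order, scale_notes(SCALES[scale], start, len(order))))


def pitch_span(scale: str) -> int:
    """Range of the 20-residue table in semitones."""
    table = solubility_pitch_table(scale)
    return max(table.values()) - min(table.values())


for name in SCALES:
    print(f"{name:10s} {pitch_span(name)} semitones")
```

With these assumptions isoleucine sits at the bottom and arginine at the top, and the twenty residues span 33 semitones in the diatonic scale, 19 in the chromatic, 45 in the pentatonic and 38 in the whole-tone.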
Since solubility scales are set according to various criteria, about which there is no real consensus, we also paid some attention to issues of harmony, setting the pitches of amino acids with similar R groups at consonant intervals. Setting the scale according to solubility produced an interesting effect. As the linear sequence winds in and out of the interior of the protein, we hear counter-melodies in the music: one in the lower register representing the interior water-insoluble amino acids, and another in the upper register representing the more soluble ones arranged at the protein-water interface: our linear sequences were playing two and sometimes three parallel and slightly offset tunes.
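The counter-melody effect can be imitated by splitting one sequence into two parts: a low voice for water-insoluble residues and a high voice for the rest, with a rest ("-") wherever a residue belongs to the other part. In this Python sketch the hydrophobic set is simply the residues with positive Kyte-Doolittle hydropathy, an assumption standing in for the authors' own solubility scale; the example string is the first nine residues of human beta globin.

```python
# Residues with positive Kyte-Doolittle hydropathy, treated here as the
# "interior" (low-register) group; everything else goes to the high register.
HYDROPHOBIC = set("IVLFCMA")


def split_voices(sequence: str) -> tuple[str, str]:
    """Return (low_voice, high_voice); '-' marks a rest in that voice."""
    low = "".join(aa if aa in HYDROPHOBIC else "-" for aa in sequence)
    high = "".join("-" if aa in HYDROPHOBIC else aa for aa in sequence)
    return low, high


low, high = split_voices("MVHLTPEEK")
print(low)   # MV-L-----
print(high)  # --H-TPEEK
```

Played together, the two lines interleave rather than overlap, which is why a single linear sequence can be heard as parallel, slightly offset tunes.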
We also discovered another feature of the proteins: they had more than one personality. One of the earliest proteins that we set was lysozyme C, and it was set three times, twice by Dunn and once by Clark. This happened more or less by accident as we each prepared for lectures that were given, along with visual artist Jamy Sheridan, at the Ann Arbor Museum of Art in May 1997. Listening to these parallel compositions, each developed independently in a different location (Clark in Texas, Dunn in Michigan) but with the same protein data and the same sonification software, gave us more insight into the astounding depth of structure Nature has built into Her art. Each piece was so different from the others that probably only someone very familiar with the lysozyme sequence would recognize it as the basis of all three. We asked ourselves how the same sequence could assume these different characters.
One answer was relatively trivial: any piece assumes a different character if its rhythm, tempo and instrumentation are changed, just as the tune of "Amazing Grace" could function either as a march or as a lullaby, depending on such factors. The protein tunes also vary depending on which of the many available pitch tables is used. However, each of these variants is an authentic voice of the protein, because of a critical feature of proteins and nucleic acids as informational molecules. For each, there are so many possible combinations of tunes that it is often possible to specify a protein or DNA sequence uniquely by using fewer than ten amino acids (or DNA codons) as the search pattern. This is not surprising: if all twenty amino acids were equally likely, any given sequence pattern of 10 amino acid residues would occur by chance at a given position with a probability of only 1 in 20^10, or 1 in 1.024 × 10^13. Indeed, there are some combinations of 10 amino acids that do not appear in any protein now recorded in the databases. For a real protein, then, the pattern of pitch relationships produced by a given sequence will belong only to that protein, regardless of the pitch table used. Listening to a given protein’s many voices is a way of inquiring into its nature, asking it to "Use language we can comprehend." And so we interview each sequence many times, hoping to ask it the question that will produce an answer meaningful to us, in terms of our own musical experience.
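The arithmetic behind the uniqueness argument is short enough to check directly. The Python sketch below assumes, as the estimate does, that all twenty amino acids are equally likely and independent at each position (real residue frequencies are uneven, so actual odds differ somewhat); the database size in the example is a hypothetical round number, not a figure from the text.

```python
def motif_probability(length: int, alphabet_size: int = 20) -> float:
    """Chance that one specific motif appears at a given position, assuming
    equally likely, independent residues."""
    return 1 / alphabet_size ** length


def expected_chance_hits(length: int, positions: int, alphabet_size: int = 20) -> float:
    """Expected number of chance matches when scanning `positions` start sites."""
    return positions * motif_probability(length, alphabet_size)


print(20 ** 10)                         # 10240000000000, i.e. 1.024 x 10^13
print(expected_chance_hits(10, 10**8))  # hypothetical 100 million positions
```

Even over a hypothetical hundred million positions, a specified 10-residue motif is expected to turn up by chance far less than once, which is why a short pattern can pick out a single protein.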
Because of the fruitfulness of multiple inquiry, we have continued to set individual proteins independently, as we did lysozyme, with Dunn asking "Where is the art in your science?" and Clark asking, "Where is the science in your art?" Our musical answers, and the software we used to ask these questions, are available on the Internet sites given below. We invite interested persons to add to the harmony with their own interpretations.
John Dunn, Algorithmic Arts. http://algoart.com
John Dunn, DNA Music. http://algoart.com/dnamusic
Dr. M. A. Clark, The Music Room. http://www.startext.net/homes/macclark/Music/musicpag.htm
Dr. Kent W. (Kim) Bridges. http://www.botany.hawaii.edu/faculty/bridges/
1. Wendy Carlos. Switched-On Bach, 1968 CBS MK 63501. http://www.player.org/pub/u/wendy/
2. Morton Subotnick. http://newalbion.com/artists/subotnickm/
5. PROSITE. http://expasy.hcuge.ch/sprot/prosite.html Accession # PDOC00119. Documentation for entry PS00128: Lactalbumin_lysozyme. Accession # PDOC00793. Documentation for entry PS01033: Globin.
7. IMB-Jena. Notations, Properties and Images of the 20 Standard Amino Acids. http://www.imb-jena.de/IMAGE_AA.html
8. NIH. Table of Standard Genetic Code. http://www.nih.gov/dcrt/expo/talks/cybersci/links/gencode.html
9. SWISS-PROT. http://expasy.hcuge.ch/sprot/sprot-top.html Accession # P02023. Hemoglobin beta chain, Homo sapiens. Accession # P10061. Hemoglobin beta-2 chain, Sphenodon punctatus.
10. OMIM (Online Mendelian Inheritance in Man). http://www3.ncbi.nlm.nih.gov/Omim/ Entry # 141900. Hemoglobin--beta locus; HBB.
Tables and Figures
15 December 1997.
copyright 1997 ISAST