Friday, October 19, 2007

The SHARC Timbre Dataset v. 2.0: XML Format

SHARC is a dataset of musical timbre information that I collected by analyzing over 1300 orchestral musical instrument notes. Specifically, the information is amplitude and phase data from a selected steady-state portion of each note. The dataset is now available in XML format.

Some time ago, when I was a grad student, and while holding various fellowships after I got my PhD, I did research in music, human hearing and digital audio (see my publications). One of the projects I undertook was to compile a collection of information on musical instrument tones, which I called SHARC ("Sandell Harmonic Archive").

I've described SHARC in a few places before: in an article from 1991, and in the release notes from the original distribution. Briefly, though, what I did was this. I had a collection of CDs consisting of individually performed notes of all the standard instruments of the orchestra, one recording for each note in the respective instrument's playable range. For each note, I chose a middle portion of the recording, during the note's steady state, and performed a spectrum analysis. I saved the amplitudes and phases of all the harmonics of the pitch's fundamental frequency up to a ceiling of 10 kHz.

In my first version of the distribution (which you can still download in compressed tar format), SHARC consisted of a series of files, one for each note that was analyzed, organized into directories by instrument. That was 1994; since then, XML has come into being and I've now released SHARC in an XML format.

I'm calling this SHARC's "2.0" release, and back-versioning the original distribution to "1.0" (even though I timidly referred to it as version 0.921 at the time). In this blog article, I'll describe the design of this 2.0 version, for the convenience of anyone who would like to work with it.

Let's consider the XML that specifies a single instrument and all of its notes, and their harmonics. The rough outline of the XML is:



<instrument>
  <note> <!-- first note -->
    <a/> <!-- harmonic 1 -->
    <a/> <!-- harmonic 2 -->
    ...etc...
  </note>
  <note> <!-- second note -->
    <a/>
    <a/>
    ...etc...
  </note>
  ...etc...
</instrument>



The <instrument> element has the following attributes:
  • id: the instrument's short name, containing no spaces, suitable for variable names and querystring parameters
  • name: the instrument's longer, more descriptive name
  • source: the CD collection from which the tones originated (e.g. McGill)
  • cd: the volume number of the CD within that collection
  • track: the track on that CD
  • numNotes: the number of notes for this instrument


Here is a sample <instrument> element:


<instrument
id="CB_pizz" name="Contrabass (pizzicato)"
source="McGill" cd="1" track="18"
numNotes="41">


The <note> element has the following attributes:
  • pitch: the note's pitch and octave number, e.g. c4 = middle C. Sharps are specified with the letter s, e.g. 'fs4' rather than 'f#4'.
  • seq: the sequential order number of the note in the series (i.e. starting at 1 with the first note)
  • keyNum: numerical location of the pitch on a piano keyboard, where middle C = 48
  • fundHz: the frequency of the note's fundamental (e.g. a4 = 440)
  • numHarms: the number of harmonics (i.e. the number of <a> elements to follow)



Here is a sample note element:

<note pitch="cs1" seq="2" keyNum="13"
fundHz="34.648" numHarms="287">
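The pitch, keyNum and fundHz attributes are redundant with one another, so one can be derived from another. Here is a minimal Python sketch of that relationship. It assumes 12-tone equal temperament tuned to a4 = 440 Hz, which agrees with the sample values in this article (cs1 has keyNum 13 and fundHz 34.648). The function names are made up for illustration and are not part of the SHARC distribution.

```python
import re

# Pitch classes in semitone order above C; sharps are spelled with "s".
PITCH_CLASSES = ["c", "cs", "d", "ds", "e", "f", "fs", "g", "gs", "a", "as", "b"]


def key_num(pitch: str) -> int:
    """Convert a SHARC pitch name such as 'cs1' to its keyNum (c4 = 48)."""
    match = re.fullmatch(r"([a-g]s?)(\d+)", pitch)
    if match is None or match.group(1) not in PITCH_CLASSES:
        raise ValueError(f"unrecognized pitch name: {pitch!r}")
    name, octave = match.group(1), int(match.group(2))
    return 12 * octave + PITCH_CLASSES.index(name)


def fund_hz(key: int) -> float:
    """Equal-tempered fundamental for a keyNum (assumption: a4, keyNum 57, is 440 Hz)."""
    return 440.0 * 2.0 ** ((key - 57) / 12)


if __name__ == "__main__":
    for name in ("c1", "cs1", "c4"):
        print(name, key_num(name), round(fund_hz(key_num(name)), 3))
```

The fundHz values stored in the XML should be treated as authoritative; a computed value like this is only useful as a sanity check or for labeling plots.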


Finally we have the harmonic data itself, contained in the <a> element. The harmonic amplitude value is the text node of the element, expressed as a linear value (i.e. not in dB). The attributes for the <a> element are:
  • n: the sequential order number of the harmonic in the series (i.e. starting at 1 with the first harmonic)
  • p: phase, expressed in the range between negative and positive pi


Here is a sample sequence of a few <a> elements:

<a n="1" p="-1.686">32.91</a>
<a n="2" p="0.309">2131.69</a>
<a n="3" p="1.764">5878.0</a>


Using the brief names 'n' and 'p' keeps the XML document smaller. For similar reasons, the frequency of each harmonic is not given. To obtain the frequency of a harmonic, you simply multiply the value of the 'n' attribute by the value of the 'fundHz' attribute of the 'note' element.
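As an illustration of that multiplication, here is a small Python sketch using the standard library's xml.etree.ElementTree. The XML string combines the sample note and the three sample harmonics from above (so it holds far fewer <a> elements than numHarms claims). The helper names harmonics and to_db are my own for this example, and the dB conversion assumes the stored values are amplitudes rather than powers.

```python
import math
import xml.etree.ElementTree as ET

# The sample <note> from this article, truncated to its first three harmonics.
NOTE_XML = """<note pitch="cs1" seq="2" keyNum="13" fundHz="34.648" numHarms="287">
  <a n="1" p="-1.686">32.91</a>
  <a n="2" p="0.309">2131.69</a>
  <a n="3" p="1.764">5878.0</a>
</note>"""


def harmonics(note):
    """Return (frequency in Hz, linear amplitude, phase) for each <a> child."""
    fund = float(note.get("fundHz"))
    return [
        (int(a.get("n")) * fund, float(a.text), float(a.get("p")))
        for a in note.findall("a")
    ]


def to_db(amplitude, reference):
    """Express a linear amplitude in dB relative to a reference amplitude."""
    if amplitude <= 0.0:
        return float("-inf")  # amplitudes of 0.0 do occur in the data
    return 20.0 * math.log10(amplitude / reference)


note = ET.fromstring(NOTE_XML)
rows = harmonics(note)
peak = max(amp for _, amp, _ in rows)

if __name__ == "__main__":
    for freq, amp, phase in rows:
        print(f"{freq:9.3f} Hz  {to_db(amp, peak):8.2f} dB  phase {phase:+.3f}")
```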

As I said, that is a rough sketch of the XML; to simplify the explanation I left out some of the detail. In addition to what I have discussed so far, each instrument element, and each note element, has a child element <ranges> which contains useful metadata. Here is a sample <ranges> element for an <instrument> element:


<ranges>
  <lowest>
    <harmonicFreq harmNum="1" keyNum="12"
        pitch="c1">32.7</harmonicFreq>
    <pitch
        fundHz="32.7" keyNum="12">c1</pitch>
    <amplitude freqHz="8449.15" keyNum="22"
        pitch="as1" fundHz="58.27"
        harmNum="145">0.0</amplitude>
  </lowest>
  <highest>
    <pitch fundHz="349.22"
        keyNum="53">f4</pitch>
    <harmonicFreq harmNum="151" keyNum="25"
        pitch="cs2">10463.69</harmonicFreq>
    <amplitude freqHz="261.62" keyNum="48"
        harmNum="1" pitch="c4"
        fundHz="261.626">15389.0</amplitude>
  </highest>
  <pitches>c1 cs1 d1 ds1 e1 f1 fs1 gs1 a1 as1
    b1 c2 cs2 d2 ds2 e2 f2 fs2 g2 gs2 a2 as2
    b2 c3 cs3 d3 ds3 e3 f3 fs3 g3 gs3 a3 as3
    b3 c4 cs4 d4 ds4 e4 f4
  </pitches>
</ranges>


The logic behind the <ranges> element is mostly convenience for applications that will be constructing graphic plots from the data. For example, having the highest and lowest frequency specified here, rather than making it necessary to traverse the data to find it, makes it easier for a program to set up the minimum and maximum for a graphic plot. The <pitches> element is another convenience that keeps the user from having to issue a thorny XPath query just to get a list of all the instrument's pitches.
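To make the plotting use case concrete, here is a hedged Python sketch that reads axis limits and the pitch list straight from <ranges> without touching any <a> element. It assumes <ranges> sits directly inside <instrument> (as the path instrument/ranges suggests). The XML string is an abbreviated copy of the sample above with a shortened pitch list, and the function name axis_limits is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample <ranges> element, wrapped in its <instrument>.
INSTRUMENT_XML = """<instrument id="CB_pizz" name="Contrabass (pizzicato)">
  <ranges>
    <lowest>
      <harmonicFreq harmNum="1" keyNum="12" pitch="c1">32.7</harmonicFreq>
      <amplitude freqHz="8449.15" keyNum="22" pitch="as1"
                 fundHz="58.27" harmNum="145">0.0</amplitude>
    </lowest>
    <highest>
      <harmonicFreq harmNum="151" keyNum="25" pitch="cs2">10463.69</harmonicFreq>
      <amplitude freqHz="261.62" keyNum="48" harmNum="1" pitch="c4"
                 fundHz="261.626">15389.0</amplitude>
    </highest>
    <pitches>c1 cs1 d1
    </pitches>
  </ranges>
</instrument>"""


def axis_limits(instrument):
    """Return frequency and amplitude (min, max) pairs from instrument/ranges."""
    ranges = instrument.find("ranges")

    def value(path):
        # float() tolerates the stray whitespace some text nodes carry.
        return float(ranges.findtext(path))

    return {
        "freq": (value("lowest/harmonicFreq"), value("highest/harmonicFreq")),
        "amp": (value("lowest/amplitude"), value("highest/amplitude")),
    }


instrument = ET.fromstring(INSTRUMENT_XML)
limits = axis_limits(instrument)
# Splitting on whitespace handles the line breaks inside <pitches>.
pitches = instrument.findtext("ranges/pitches").split()

if __name__ == "__main__":
    print(limits, pitches)
```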

Let's drill down into the details of this <ranges> element. The text node of the ranges/lowest/harmonicFreq element is the lowest frequency of any harmonic in the entire instrument's collection. Obviously, this is always harmonic 1 of the instrument's lowest note. The attributes for harmonicFreq convey this, as well as the pitch (c1) and keyNum (12). The element ranges/lowest/pitch contains the same information, but described in terms of the lowest pitch and its fundamental frequency. This redundancy has little impact since it occurs just once for the instrument. Information about the lowest amplitude harmonic to be found in the instrument is given in the ranges/lowest/amplitude element. For the instrument in question, this honor goes to a#1 (keyNum of 22, fundamental frequency of 58.27 Hz), 10 semitones above the instrument's lowest note, the 145th harmonic (frequency of 8449.15 Hz).

The ranges/highest element provides equivalent data for the highest harmonic frequency, highest pitch and highest amplitude.

Here is a sample <ranges> element for a <note> element:

<ranges>
  <lowest>
    <amplitude freqHz="6475.19"
        harmNum="198">0.0</amplitude>
    <harmonicFreq
        harmNum="1">32.7</harmonicFreq>
  </lowest>
  <highest>
    <amplitude freqHz="98.1"
        harmNum="3">2335.0</amplitude>
    <harmonicFreq
        harmNum="303">9909.0</harmonicFreq>
  </highest>
</ranges>


This element provides data similar to instrument/ranges, but in terms of the highest/lowest frequency and amplitude harmonics for the note in question.

The XML was designed so that the entire SHARC dataset could be combined into a single XML file (i.e. as a series of instrument elements), and this file is in fact available for download in zip format. However, this file is quite large (nearly 3 MB), which puts quite a burden on parsers, especially DOM parsers. For more efficient processing, I have placed each instrument into its own dataset file.
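If you do want to work with the combined file, a streaming parser sidesteps the cost of building the whole tree at once. The Python sketch below uses xml.etree.ElementTree.iterparse to handle one <instrument> at a time and discard it afterwards. The wrapper element name "sharc" is a guess on my part for this example, since any root element would work the same way, and the second instrument is a made-up placeholder that reuses sample values from this article.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for the combined file. The root name "sharc" and the
# instrument id "placeholder" are assumptions made for this example.
COMBINED = b"""<sharc>
  <instrument id="CB_pizz" numNotes="1">
    <note pitch="cs1" fundHz="34.648" numHarms="2">
      <a n="1" p="-1.686">32.91</a>
      <a n="2" p="0.309">2131.69</a>
    </note>
  </instrument>
  <instrument id="placeholder" numNotes="1">
    <note pitch="cs1" fundHz="34.648" numHarms="1">
      <a n="1" p="-1.686">32.91</a>
    </note>
  </instrument>
</sharc>"""


def count_harmonics(source):
    """Stream a document of <instrument> elements; return {id: number of <a> elements}."""
    totals = {}
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "instrument":
            totals[elem.get("id")] = sum(1 for _ in elem.iter("a"))
            elem.clear()  # drop this instrument's subtree before reading the next
    return totals


if __name__ == "__main__":
    print(count_harmonics(io.BytesIO(COMBINED)))
```

In real use you would pass the path of the unzipped file instead of the BytesIO object. The per-instrument files remain the simpler option when you only need one instrument.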

For a summary, here is a shorthand showing the overall design of the XML, with attributes shown in parentheses and text nodes in square brackets:



instrument (id, name, source, cd, track, numNotes)
    ranges
        lowest
            harmonicFreq (harmNum, keyNum, pitch) [frequency]
            pitch (fundHz, keyNum) [pitch]
            amplitude (freqHz, keyNum, pitch, fundHz, harmNum) [amplitude]
        highest
            (all same as lowest)
    note (pitch, seq, keyNum, fundHz, numHarms)
        ranges
            lowest
                amplitude (freqHz, harmNum) [amplitude]
                harmonicFreq (harmNum) [frequency]
            highest
                (all same as lowest)
        a (n, p) [amplitude]


I'm not attached to this particular XML design, and I may come out with a 3.0 version some day. One change I expect to make in a future version is to move a lot of information that is in attributes to elements, which means that more queries would return element nodes that could be further processed. Another idea I have is to make a secondary, "bare bones" release, that would have no metadata, for quicker processing.

Enjoy playing with the data!