The intersections between musicology and artificial intelligence (AI) are reviewed, describing the rewards from the interdisciplinary study of music with AI techniques, and the converse benefits to AI research. Arguments for the formalisation of musicological theories using concepts from AI and cognitive science are presented. These bear on research methodology, weighing ethnographic and process models of music against traditionally descriptive methods of music study. This enquiry investigates the degree to which the human activity of music can be studied and modelled computationally; at the same time, it performs the AI task of identifying and constraining a problem domain.
The psychology of rhythm is then surveyed, reviewing findings in the literature on the characterisation of the elements of rhythm. The effects of inter-onset timing, duration, tempo, accentuation, meter, expressive timing (rubato), the inter-relationships between these elements, the degree of separability between the perception of pitch and rhythm, and the construction of timing hierarchy and grouping are reported. Existing computational approaches are reviewed and their degrees of success in modelling rhythm are assessed.
These reviews demonstrate that the perception of rhythm operates across a wide range of timing rates, forming hierarchical levels within a wide-band spectrum of frequencies of perceptible events. Listeners assign hierarchy and structure to a rhythm by arbitrating between bottom-up phenomenal accents and top-down predictions. The predictions are constructed by an interplay between temporal levels, and the listener's construction of those temporal levels arises from quasi-periodic accentuation.
Computational approaches to music have considerable problems in representing musical time, in particular structure over time spans longer than short motives. The new approach investigated here is to represent rhythm in terms of frequencies of events, explicitly representing the multiple time scales as spectral components of a rhythmic signal.
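To make this reframing concrete, the following minimal sketch (Python with NumPy; the function name and 200 Hz sampling rate are illustrative assumptions, not taken from the thesis) converts a list of note onset times into a sampled impulse train, whose inter-onset periodicities then appear as low-frequency spectral components:

    import numpy as np

    def onsets_to_impulse_train(onset_times, duration, sample_rate=200.0):
        # Each onset becomes a unit impulse; the inter-onset periodicities
        # of the rhythm then appear as low-frequency spectral components
        # of the sampled signal (a 0.6 s beat period corresponds to a
        # component near 1.67 Hz).
        n_samples = int(np.ceil(duration * sample_rate))
        signal = np.zeros(n_samples)
        indices = np.round(np.asarray(onset_times) * sample_rate).astype(int)
        signal[indices[indices < n_samples]] = 1.0
        return signal

    # A rhythm of crotchets at 100 beats per minute (0.6 s inter-onset interval).
    onsets = np.arange(0.0, 6.0, 0.6)
    rhythm_signal = onsets_to_impulse_train(onsets, duration=6.0)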
Approaches to multiresolution analysis are then reviewed. In comparison to Fourier theory, the theory behind wavelet transform analysis is described. Wavelet analysis decomposes a time-dependent signal onto basis functions which represent time-frequency components. The wavelets of Morlet and Grossmann produce the best simultaneous localisation in both the time and frequency domains; they have the property of making explicit all characteristic frequency changes over time inherent in the signal.
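For orientation, the continuous wavelet transform of a signal s(t) and the Morlet wavelet take the following standard textbook forms (supplied here for reference, not quoted from the thesis; normalisation conventions vary between authors):

    W_s(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} s(t)\,
                \overline{\psi\!\left(\frac{t - b}{a}\right)}\, dt,
    \qquad
    \psi(t) = e^{i \omega_0 t}\, e^{-t^2 / 2},

where a > 0 is the dilation (scale), b the translation in time, and \omega_0 (conventionally near 6) the centre frequency of the mother wavelet. Because the Morlet wavelet's Gaussian envelope attains the lower bound of the Heisenberg uncertainty relation, \Delta t \, \Delta \omega = 1/2, no analysing function can localise a component more sharply in time and frequency simultaneously; this is the sense in which the localisation is "best".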
An approach of considering and representing a musical rhythm in signal-processing terms is then presented. This casts a musician's performance in relation to an abstract rhythmic signal representing (in some manner) the rhythm intended to be performed; the rhythm actually performed is then a sampling of that complex "intention" signal. Listeners can reconstruct the intention signal using temporal predictive strategies, aided by familiarity with the music or musical style gained through enculturation. The rhythmic signal is viewed in terms of amplitude and frequency modulation, which can characterise the forms of accent used by a musician.
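A minimal sketch of this view follows, assuming a simple sinusoidal timing deviation as a stand-in for rubato (the helper names and parameter values are hypothetical):

    import numpy as np

    def performed_onsets(n_beats, base_ioi=0.6, rubato_depth=0.1):
        # Sinusoidally stretch and compress the nominal inter-onset
        # interval: a simple stand-in for rubato, i.e. frequency
        # modulation of the rhythmic signal.
        beats = np.arange(n_beats)
        iois = base_ioi * (1.0 + rubato_depth * np.sin(2 * np.pi * beats / n_beats))
        return np.concatenate(([0.0], np.cumsum(iois)[:-1]))

    def accented_impulse_train(onset_times, accents, duration, sample_rate=200.0):
        # Impulse heights carry intensity accents: amplitude modulation
        # of the rhythmic signal.
        signal = np.zeros(int(np.ceil(duration * sample_rate)))
        for t, amp in zip(onset_times, accents):
            idx = int(round(t * sample_rate))
            if idx < len(signal):
                signal[idx] = amp
        return signal

    onsets = performed_onsets(16)
    accents = np.tile([1.0, 0.5, 0.7, 0.5], 4)   # accent every fourth beat
    rhythm = accented_impulse_train(onsets, accents, duration=onsets[-1] + 0.6)

Here the impulse heights realise amplitude modulation (intensity accents), while the varying inter-onset intervals realise frequency modulation (expressive timing).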
With the rhythm reconsidered as a signal, the application of wavelets to the analysis of example rhythms is then reported. Example rhythms exhibiting duration, agogic and intensity accents, accelerando and rallentando, rubato and grouping are analysed with Morlet wavelets. Wavelet analysis reveals the short-term periodic components that arise within these rhythms. The use of Morlet wavelets produces a "pure" theoretical decomposition; the degree to which this can be related to a human listener's perception of temporal levels is then considered.
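A sketch of such an analysis, using PyWavelets' complex Morlet wavelet ('cmor') as an off-the-shelf stand-in for the Morlet wavelets described above (the scale range, sampling rate and example signal are assumptions for illustration):

    import numpy as np
    import pywt

    sample_rate = 200.0
    # An isochronous impulse train (one onset every 0.6 s) stands in for
    # the rhythm signal; any of the example rhythms can be substituted.
    signal = np.zeros(int(6.0 * sample_rate))
    signal[::int(0.6 * sample_rate)] = 1.0

    scales = np.geomspace(8, 512, num=64)   # logarithmically spaced analysis scales
    coefs, freqs = pywt.cwt(signal, scales, 'cmor1.5-1.0',
                            sampling_period=1.0 / sample_rate)

    magnitude = np.abs(coefs)    # the scaleogram: energy of each time-frequency component
    phase = np.angle(coefs)      # local phase of each component
    # Periodic components of the rhythm appear as horizontal ridges in the
    # scaleogram at the frequencies of the beat, the bar and other temporal levels.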
The multiresolution analysis results are then applied to the well-known problem of foot-tapping to a performed rhythm. Using a correlation of frequency-modulation ridges extracted by stationary phase, modulus maxima, dilation scale derivatives and local phase congruency, the tactus rate of the performed rhythm is identified, and from that a new foot-tap rhythm is synthesised. This approach accounts for expressive timing and is demonstrated on rhythms exhibiting asymmetrical rubato and grouping. The accuracy of the approach is presented and assessed.
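The following sketch gives the flavour of the tactus-extraction step, substituting a simple per-time modulus-maxima ridge for the thesis's full combination of stationary phase, modulus maxima, scale derivatives and phase congruency; the foot-tapping band limits and function name are hypothetical:

    import numpy as np

    def foot_tap_times(magnitude, freqs, sample_rate, band=(0.5, 4.0)):
        # Follow the modulus-maxima ridge within a plausible foot-tapping
        # band, then integrate its instantaneous frequency, emitting a tap
        # each time the accumulated phase passes a multiple of 2*pi.
        in_band = (freqs >= band[0]) & (freqs <= band[1])
        band_freqs = freqs[in_band]
        ridge = band_freqs[np.argmax(magnitude[in_band, :], axis=0)]  # Hz, per sample
        phase = np.cumsum(2 * np.pi * ridge / sample_rate)            # integrated phase
        taps = np.flatnonzero(np.diff(np.floor(phase / (2 * np.pi))) > 0)
        return taps / sample_rate

    # `magnitude`, `freqs` and `sample_rate` are those of the previous sketch.
    tap_times = foot_tap_times(magnitude, freqs, sample_rate)

Because the ridge frequency is re-read at every sample, the synthesised taps stretch and compress with the performer's expressive timing rather than assuming a fixed beat period.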
From these investigations, I argue the value of decomposing rhythm into time-frequency components. This makes explicit the notion of temporal levels (strata) and enables analytical tools such as wavelets to produce formal measures of performed rhythms that match concepts from musicology and music cognition. The approach then forms the basis for further research into cognitive models of rhythm based on the interpretation of those time-frequency components.