Score tracking is a favorite problem in the computer music community, but most systems rely on F0 (fundamental-frequency) pitch tracking combined with a symbolic score (e.g., MIDI), matching the live performance using either an event counter (Max/MSP) or tempo tracking/warping (e.g., dynamic time warping, resampling, etc.).
Instead, my proposal is a rich-feature system that relies on Mel-Frequency Cepstral Coefficients (MFCCs) via SoundSpotter. To exploit SoundSpotter's accurate matching, we can employ a guide track for a given instrument part, or a previously recorded take that closely approximates the same material. By comparing the live instrument's match against the guide track with the guide track's match against a time-warped version of itself, a temporally accurate distance can be computed.
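As a rough sketch of that comparison (not SoundSpotter's actual API): suppose each matcher simply returns the index of the guide-track frame whose MFCC vector is nearest to its query frame. The function names, Euclidean distance, and frame layout here are all illustrative assumptions.

```python
import numpy as np

def nearest_index(query_mfcc, guide_mfccs):
    """Index of the guide-track frame whose MFCC vector is closest
    (Euclidean) to the query frame -- a stand-in for SoundSpotter's lookup."""
    dists = np.linalg.norm(guide_mfccs - query_mfcc, axis=1)
    return int(np.argmin(dists))

def tempo_distance(live_frame, warped_frame, guide_mfccs):
    """SS2 matches the live input against the guide track; SS1 matches a
    warped copy of the guide track against it. Their index difference is
    a signed temporal offset (positive = live input is ahead)."""
    ss1 = nearest_index(warped_frame, guide_mfccs)
    ss2 = nearest_index(live_frame, guide_mfccs)
    return ss2 - ss1
```

In this toy form the offset is in frames; scaling by the hop size would convert it to seconds.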
So! The short story is that the system more-or-less works, but it needs testing/playing with, and a few questions remain:
0. The distance measure is dead simple: it's just the difference between the index values from SS1 and SS2, where SS1 is the guide-track matcher and SS2 is the live-input matcher. Is there a better measure (related to the last question)?
1. What happens when there are long sections of silence? Is it important to account for this, especially with respect to “event” triggering?
2. Is it useful to learn the habits of individual performers? Perhaps a simple prior could be learned automatically on each iteration of a probabilistic model? This seems especially important for the boundaries around a perfect match between SS1 and SS2.
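On question 1, one minimal option (an assumption, not part of the current system) is to gate matching on frame energy, so silent stretches neither advance the match index nor fire event triggers. The threshold value and function name here are hypothetical:

```python
import numpy as np

SILENCE_RMS = 0.01  # hypothetical threshold; would need tuning per input chain

def is_silent(frame, threshold=SILENCE_RMS):
    """True when the frame's RMS energy falls below the threshold;
    such frames would be skipped rather than matched."""
    return float(np.sqrt(np.mean(np.square(frame)))) < threshold
```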
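On question 2, a toy version of "a simple prior learned each iteration" might be a smoothed histogram over observed SS2 − SS1 offsets, accumulated per performer across rehearsal passes and used to bias the matcher toward that performer's typical deviations. Everything here (class name, Laplace smoothing, fixed offset support) is an illustrative assumption:

```python
from collections import Counter

class OffsetPrior:
    """Toy per-performer prior over temporal offsets (SS2 - SS1),
    updated once per observed frame pair during a rehearsal pass."""

    def __init__(self, smoothing=1.0):
        self.counts = Counter()
        self.total = 0.0
        self.smoothing = smoothing

    def update(self, offset):
        # Tally each observed offset during a pass through the piece.
        self.counts[offset] += 1
        self.total += 1

    def prob(self, offset, support=21):
        # Laplace-smoothed probability over a fixed range of offsets,
        # peaked around the performer's habitual deviation.
        return (self.counts[offset] + self.smoothing) / (
            self.total + self.smoothing * support)
```

At match time, this probability could weight candidate offsets so that a performer who habitually rushes, say, gets matched slightly ahead of the nominal guide-track position.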
~ S. Topel