Lessons from CORM/future projects

Over this past summer I found myself very wrapped up in the compositional process and tech behind CORM. Consequence of Recursive Memory was a piece I composed and presented with funding from the City of Seattle Office of Arts & Culture. I presented p_CORM (what I dubbed an ‘iterative premix’ of the work) in Portland on September 6, and then CORM in Seattle on September 7. The first version was something of a practice set, which informed a more fully fleshed-out version of the work the following evening in Seattle. That said, the acoustic environment of the Portland performance may have been better suited to the work: Leaven Community is an old church with carpeted surfaces, as opposed to the live space of the Chapel Performance Space in Seattle. Prior to the Seattle performance I held a workshop/demo at Patchwerks, an electronic music store in the Eastlake neighborhood, focused on some of my methods for using SuperCollider to control modular synths.

About two weeks have elapsed since I performed the work. In that time I’ve reflected a bit on what worked, what didn’t, and, in general, what I’ve learned since initiating this project. My primary challenge with this work was to engage with machine learning technology and put it to use. It’s a deep subject, I learned a lot about it, and the tech influenced the way I composed the work, but ultimately I couldn’t use it for the final performance. Initially I set out with the idea that the computer would learn from text that I would speak. Implementing that in real time posed challenges, but it also became less interesting once I understood that the system was only going to respond in ways I had programmed it to. So I rethought my approach and focused on building a work that could progress based on what I was saying: the computer would recognize cues in the spoken text and use them to trigger events, fully automating the piece so that I wouldn’t have to interface with the technology at all.

After some time exploring possible approaches, I decided to learn Rebecca Fiebrink‘s Wekinator, taking her Kadenze course Machine Learning for Musicians and Artists. A brief video example pointed to the app’s Dynamic Time Warping feature for use with voice converted to data. The video showed a process of extracting MFCC data (mel-frequency cepstral coefficients) from an incoming signal and using that data to train Wekinator on the cues. MFCCs are basically a timbral signature of a given sound source. The video used a custom standalone app, but I managed to do the same thing within SuperCollider, sending an array of 13 coefficients over OSC into Wekinator’s Dynamic Time Warping mode to train it on my cues. Dynamic Time Warping is essentially a way of matching an incoming sequence against a recorded example even when the two unfold at different speeds, so it can listen over time for something specific. When a cue matched, Wekinator would send an OSC message back to SuperCollider to initiate a sequence or parameter change (a simplified sketch of this routing is below).

This worked, but then I hit a brick wall. I had recorded my cues against a silent background; if anything else was happening sonically in the room, the cues would no longer be recognized. In headphones it worked fine, but over speakers in the same room as the mic it failed, because the mic picked up the music I was generating along with my voice cue. Also, turning my head just slightly changed the timbral qualities of my voice enough to render my training set unusable. And lastly, the more cues I added, the less reliably the application seemed to recognize them – in part because I couldn’t adjust a matching-accuracy threshold for individual cues (a limitation of what you can do in Wekinator’s GUI).
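For anyone curious, the plumbing looked roughly like the sketch below. It is simplified, not the exact performance code: it assumes Wekinator’s default input port (6448) and message ("/wek/inputs"), that Wekinator’s output destination is pointed at sclang’s default port (57120), and the "/output_1" reply path and 30 Hz reporting rate are placeholders to check against Wekinator’s OSC settings.

```supercollider
// Rough sketch of the cue-detection routing (not the exact performance code).
(
~wek = NetAddr("127.0.0.1", 6448);   // Wekinator's default input port

// Analysis synth: mic in -> FFT -> 13 MFCCs, reported to the language ~30x/sec.
SynthDef(\mfccTap, {
    var in = SoundIn.ar(0);
    var chain = FFT(LocalBuf(1024), in);
    var coeffs = MFCC.kr(chain, numcoeff: 13);
    SendReply.kr(Impulse.kr(30), '/mfcc', coeffs);
}).add;

// Forward each frame of coefficients to Wekinator's DTW inputs.
OSCdef(\mfccToWek, { |msg|
    ~wek.sendMsg('/wek/inputs', *msg.drop(3));  // drop cmdName, nodeID, replyID
}, '/mfcc');

// When Wekinator matches a cue, fire the corresponding event in SC.
// The reply path below is a placeholder -- check what Wekinator actually
// sends for each gesture type in its OSC output settings.
OSCdef(\cue1, { |msg|
    "cue 1 matched -> start next section".postln;
    // e.g. launch a Pbind, set a bus, change a synth parameter...
}, '/output_1');
)

Synth(\mfccTap);  // start analyzing the mic, once the SynthDef has been added
```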

I had made a piece with a system in mind that would work this way, and ultimately the system didn’t work. So I mapped these cues to a MIDI controller that I operated manually in performance. It worked, but it wasn’t what I had set out to do. If I were to try machine learning for something like this again, I would take a different approach: maybe not MFCCs but a different feature or algorithm, or dynamics and pitch tracking, or all three combined and processed in some way that might yield a more versatile, accurate method. Or what if I somehow tied in a prebuilt AI voice-recognition system through an API – like Siri or Google’s? For now, I’m going to take a break from it, at least with speech.
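For reference, the manual fallback was essentially this shape (a sketch; the note numbers and the actions are stand-ins for whatever each cue actually did):

```supercollider
// The fallback in rough form: the same cue actions, fired from a MIDI controller
// instead of from Wekinator.
(
MIDIClient.init;
MIDIIn.connectAll;

MIDIdef.noteOn(\cue1, { |vel, note|
    "cue 1 -> start next section".postln;
}, noteNum: 60);

MIDIdef.noteOn(\cue2, { |vel, note|
    "cue 2 -> bring in the next layer".postln;
}, noteNum: 62);
)
```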

As I look forward to this year, I’ll be spending a lot more time with SuperCollider and beginning to design a new hardware instrument built on it, potentially using the Prynth platform, or possibly doing something similar with a combination of a Raspberry Pi 4 and a Teensy 4.0. The goal is a system with an SC brain housed in a box whose ports connect to additional boxes that enable specific functionality with external hardware. The additional boxes might include sensors and controls (IR, proximity, flex, pressure, faders, buttons, encoders, light, temperature), network ports running to lighting equipment, motors designed to strike or resonate acoustic objects, or a DAC sending +/-10V to a modular synth.
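At its simplest, the link between one of those boxes and the SC brain could look something like this (a sketch, assuming a Teensy printing one analog reading per line over USB serial; the device path, baud rate, and the \cutoff mapping are all placeholders):

```supercollider
// Read newline-terminated ASCII sensor values from a Teensy over USB serial
// and map them onto a running synth's parameter.
(
~port = SerialPort("/dev/ttyACM0", baudrate: 115200);  // Teensy usually enumerates as ttyACM*

~reader = Routine({
    var byte, line = "";
    loop {
        byte = ~port.read;                   // blocking read, one byte at a time
        if(byte == 10) {                     // newline = one complete reading
            ~sensor = line.asInteger;        // e.g. 0..1023 from analogRead on the Teensy
            ~synth !? { |syn| syn.set(\cutoff, ~sensor.linexp(0, 1023, 200, 8000)) };
            line = "";
        } {
            line = line ++ byte.asAscii;
        };
    }
}).play;
)
```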

I’ll also begin developing an interactive music system to generate sound and light for a dance piece Corrie Befort is creating. The system will be integrated into large-scale set pieces and will generate sound via sensors triggered by the dancers and the audience.
