Once you open up the possibility of a heavy-duty DSP chip like the SHARC, with hardware-accelerated FFT and biquad filters, and can leverage some high-throughput RAM like DDR2, you really have a powerhouse on your hands. I’ve been having a blast getting some of these wacky algorithms implemented. I’m still working out some of the details, and have a fair amount of hardware design to do on the interfacing side (CV, knobs, etc.), but the framework is in place to allow massive amounts of modulation and some great time- and frequency-based audio mangling.
As with the ‘b and the ‘b2, expansion is key. I’ll be building this to load new algorithms out of flash memory, which should make this the most powerfully flexible hardware DSP not only in the euro world, but in pedals and racks as well… (Hmmm, pedals and racks? Anyone interested?)
The first algorithm I implemented on the ‘dsp was an impulse response reverb. It’s nice – who wouldn’t love an IR reverb in euro? But it’s a bit dry to demo. I’ll be putting together some reverb demos as soon as I get more algorithms implemented.
What’s a little more interesting? Extreme spectral stretching. That was next. Let’s get started with a basic demo. I’m using songs as the demo material mainly because the source material should be familiar and I want to demonstrate the level of manipulation that’s going on. Just remember that the point of the demos is not what it does to the source material, but what comes out of the algorithm as a basis for your own creations.
The audio starts and then the module is triggered. In the mode that’s being demoed, the last five seconds of audio are played through the stretching algorithm at some fixed speed.
The base stretching algorithm works just like PaulStretch or any other extreme stretching algorithm that takes the DFT of a signal, randomizes the phases, and recombines the DFT windows with an overlap-add mechanism.
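For the curious, here’s a minimal sketch of that core loop in pure-stdlib Python. The naive DFT stands in for the SHARC’s hardware FFT, and the window size, hop sizes, and function names are all illustrative, not the module’s actual values:

```python
import cmath, math, random

def dft(x):
    # Naive DFT -- stands in for the hardware-accelerated FFT.
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    # Taking the real part also papers over the broken conjugate
    # symmetry caused by randomizing every bin's phase independently.
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def stretch(samples, window=64, hop_out=16, speed=8.0):
    """Extreme stretch: crawl slowly through the input, randomize each
    window's phases, and overlap-add the results at a fixed output hop."""
    hop_in = hop_out / speed
    win = [0.5 - 0.5 * math.cos(2 * math.pi * i / window) for i in range(window)]  # Hann
    out = [0.0] * int(len(samples) * speed)
    pos, t = 0.0, 0
    while pos + window < len(samples) and t + window < len(out):
        frame = [samples[int(pos) + i] * win[i] for i in range(window)]
        spec = dft(frame)
        # Keep each bin's magnitude, throw away its phase entirely.
        spec = [abs(c) * cmath.exp(1j * random.uniform(0, 2 * math.pi)) for c in spec]
        frame = idft(spec)
        for i in range(window):
            out[t + i] += frame[i] * win[i]   # windowed overlap-add
        pos += hop_in
        t += hop_out
    return out
```

Because `hop_in` is `hop_out / speed`, the algorithm re-reads nearly the same slice of input over and over, which is exactly where the smearing comes from.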
When considered not as an effect applied to an existing piece of music, but as a means to an end, the long, involved textures are quite hypnotic, and for a sound designer or composer they can be extremely useful.
Here’s another demo – Listen as it evolves. It’s a lot like taking a pencil drawing and rubbing your thumb across it. All of the material exists, but it’s completely blurred.
This sample is played back at 128x stretch. That means that the 5 second sample in memory will take 10 minutes to loop!
Okay… What else can we do? Besides the playback speed, what other knobs can we manipulate?
We have two options: time-domain parameters and frequency-domain parameters. Since the stretch algorithm converts everything to the frequency domain, we have the opportunity to do some spectral processing, which is a bit rarer than time-domain processing. Let’s start there.
First of all, let’s try cutting any frequencies that are lower than a given threshold. I can cover the formula later, but the way we would do this is to take any given frequency bin, and if it doesn’t meet a threshold value, just set it to zero. This demo also captures looping through the entire 5 second sample in 10 minutes.
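The per-window version of that step is tiny. A sketch, assuming `spec` is the list of complex DFT bins for one window (applied before the phase randomization):

```python
def cut_quiet_bins(spec, threshold):
    # Zero any frequency bin whose magnitude falls below the threshold;
    # everything else passes through untouched.
    return [c if abs(c) >= threshold else 0j for c in spec]
```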
As you can hear, this completely changes the texture. If you consider what we’re doing, it’s really two things. The first is that we’re removing any sustained tones that are lower than our threshold. The second is that we’re removing any frequencies that may be momentarily loud in the time domain, but are not sustained and therefore have low magnitude in the frequency-power spectrum.
So instead of simply blurring the signal, we’re smoothing it out and enhancing the sustained tonal material.
Let’s hear another… This time, we’ll speed up to 16x playback speed.
So if we are removing the frequencies that are less prominent, can we do the opposite? Let’s zero out the bins for the frequencies that are above a threshold and leave the ones that are below. Here’s what that sounds like:
So there’s a bit more of an ethereal quality? Like if you did edge detection on an image and then blurred it? Possibly. Breathier perhaps? Here’s another sample. This is also at 16x playback speed.
It shouldn’t come as a surprise that the higher frequencies tend to have lower magnitudes and the lower frequencies have higher magnitudes. There are probably a couple of reasons for this… Or maybe those reasons all boil down to some root cause, like the fact that we tend to listen in octaves while the DFT bins are linear, or that we don’t like overly bright music. Anyway, since those last two samples seemed a bit bright, maybe we want a little of both? Kill the dominant tones and the super-low magnitudes, but enhance the mid tones…
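Combining the two thresholds is just the union of the previous ideas. A sketch, again per window, with both cutoff values purely illustrative:

```python
def keep_mid_bins(spec, lo, hi):
    # Keep only bins whose magnitude sits between the two thresholds:
    # kill the dominant peaks (>= hi) and the near-silent bins (< lo).
    return [c if lo <= abs(c) < hi else 0j for c in spec]
```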
Not bad. Here’s another one that happens to also be a piano:
Woah woah woah WOAH WOOOOAH! The purists among us will begin balking. “You can’t just remove frequencies!” “The DFT is not some toy for you to play with as you wish!” “That’s not the way it works!” “That will cause nonlinear distortion in the spectral domain for non-integral frequencies!” Or maybe those are the voices in my head. They aren’t entirely wrong, but that’s fine. It’s an interesting effect, gives us some happy Bob Ross accidents, and can be used in any number of contexts.
But let’s say those voices are right and we want to enhance the dominant frequencies and cut the less harmonic ones – without butchering the frequency spectrum. How can we do that? That would have to be done in the time domain similarly to the way that a vocoder works… with a bank of resonant filters.
While the signal is in the frequency domain, we can iterate over the frequency bins, find the ones with the greatest magnitude, and use those to set the center frequencies of a bank of five biquad bandpass filters.
Here, I’ve done this by creating 5 musical bands, similar to a parametric EQ (low, low-mid, mid, high-mid, high), and limiting each of the 5 filters to one of those bands.
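The peak-picking half of that is easy to sketch. Here I’ve guessed at log-spaced band edges from 20 Hz to Nyquist; the module’s actual band split, and the biquad coefficient math that follows, are omitted:

```python
import math

def band_peaks(mags, sample_rate, n_bands=5):
    """Split a one-sided magnitude spectrum into n_bands log-spaced bands
    and return the peak bin's frequency in each band -- these become the
    center frequencies of the biquad bandpass bank."""
    n = len(mags)
    nyquist = sample_rate / 2
    lo_f = 20.0                       # assumed lowest band edge
    edges = [lo_f * (nyquist / lo_f) ** (i / n_bands) for i in range(n_bands + 1)]
    centers = []
    for b in range(n_bands):
        lo_bin = max(1, int(edges[b] / nyquist * n))       # skip DC
        hi_bin = max(lo_bin + 1, int(edges[b + 1] / nyquist * n))
        peak = max(range(lo_bin, min(hi_bin, n)), key=lambda k: mags[k])
        centers.append(peak * nyquist / n)                 # bin -> Hz
    return centers
```

Limiting each filter to its own band keeps the bank from piling all five resonators onto one dominant note.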
This creates a slightly different texture (we’re back to 128x playback speed now).
Remember, it’s not about what it’s doing to the source material, it’s about what comes out. Here’s another one for good measure.
I may not have mentioned it before, but these samples are straight off the dev board: input from an iPhone into the ADC, through the SHARC, and out the DAC… No further processing at all. No chorus, reverb, phasing, compression, etc. In a more realistic context, we’d be taking this digital powerhouse and processing it with analog filters, distortion, waveshaping, reverb, delay, etc.
You may have noticed that all of the tracks are stereo. This is quite un-eurorack, isn’t it? Yes, I agree, and the module does have a mono mode. All of the recordings so far have been using the mono-to-stereo algorithm. In mono-to-stereo mode, the width that you feel is based on the phase coherence of the stretched sound. You can go from 100% coherence (mono) to 0% coherence (completely random phase). To be honest, when tracking, I prefer to just record fully wide; you can always effectively narrow the signal later using any mid-side plugin you have access to in your DAW.
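One plausible way to sketch that coherence control (this is my illustration, not necessarily how the module blends it internally): both channels share a random phase per bin, and each channel gets an independent random offset scaled by how incoherent you want them.

```python
import cmath, math, random

def stereo_phases(mags, coherence):
    """Mono-to-stereo sketch: at coherence=1.0 both channels get the same
    random phase per bin (mono); at 0.0 each channel's phase is effectively
    independent (maximum width). Values in between blend the two."""
    left, right = [], []
    for m in mags:
        shared = random.uniform(0, 2 * math.pi)
        pl = shared + (1.0 - coherence) * random.uniform(-math.pi, math.pi)
        pr = shared + (1.0 - coherence) * random.uniform(-math.pi, math.pi)
        left.append(m * cmath.exp(1j * pl))
        right.append(m * cmath.exp(1j * pr))
    return left, right
```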
We’ve played a little with the playback speed. Some of the samples have been at 16x and others at 128x. There’s no requirement that the playback be at a power of two, but once you start writing code, you kind of get into some patterns like that.
Can we run our “stretch” algorithm in real time, meaning with no stretch at all? I wonder what that’s like?
What? That sounds like… a reverb? Hmmm. Is there some correlation between phase modulation and reverb? Actually, yes. Most of the traditional reverbs are banks of all-pass filters (really, just phasers) with some feedback and other controls like filters and delays.
This algorithm is basically nothing more than 32,768 phase modulators. Yes, you’re reading that right… The SHARC chip is performing two sets of 16,384-point DFTs and, in real time, modulating the phases of each band. Actually, we’re doing more than modulating them; we’re making a complete disaster of them. Then recombining into two distinct time-domain signals, one each for left and right. That’s basically a total of thirty-two thousand all-pass filters!
So all we’re missing is some feedback mechanism, a way to shape the frequency response of the input, output, and feedback, and maybe a couple of other finer points and we might have some really nice reverb… I’ll write more about that in another post, but for now, let’s see what it sounds like when we crank the stretch back up and add about 30% feedback. Playback speed is 54.613x (to prove that it doesn’t have to be a power of 2).
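Structurally, the feedback part is just mixing each new input block with the previous output block before it hits the stretcher. A minimal sketch (the shaping filters in the loop are omitted, and `process` is any block-in, block-out function):

```python
def with_feedback(process, fb=0.3):
    """Wrap a block-based process with feedback: each input block is mixed
    with fb times the previous output block before being processed."""
    prev = None
    def step(block):
        nonlocal prev
        if prev is not None:
            block = [x + fb * y for x, y in zip(block, prev)]
        prev = process(block)
        return prev
    return step
```

Keeping `fb` well under 1.0 (and filtering inside the loop) is what keeps the tail from blowing up.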
Let’s review the knobs we’ve managed to pull out of this extreme time-stretching process so far. We’re interpolating a few points that were not explicitly mentioned, but are nonetheless parameters that we can configure and modulate.
- Playback Speed
- Process Thresholds?
  - Minimum Frequency Threshold Removal
  - Maximum Frequency Threshold Removal
- Process Filters?
  - Number of Filters
  - Filter Shape
- Feedback Filter Shape
- Stereo Width
There’s probably a few I didn’t mention that need to be implemented:
- DFT Brickwall High (Frequency domain low-pass filter)
- DFT Brickwall Low (Frequency domain high-pass filter)
- Swapping frequency bins? (this is crazy talk, stop me now!)
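The two brickwall parameters are trivial once you’re already holding the spectrum. A sketch, working in bin indices rather than Hz for simplicity:

```python
def brickwall(spec, lo_bin, hi_bin):
    # Frequency-domain brickwall: zero every bin below lo_bin (a DFT
    # high-pass) and every bin above hi_bin (a DFT low-pass).
    return [c if lo_bin <= k <= hi_bin else 0j for k, c in enumerate(spec)]
```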
So, can we modulate these parameters? Absolutely. I need to get further than a prototype board to be able to have some knobs to tweak and CV inputs, but the modulation matrix is in progress to allow routing and configuration of these values…
Here are a couple of samples where the Minimum Frequency Threshold Value is modulated with a sloooow sine wave. (This is, after all, time-stretching… the modulation parameters should be configurable on the scale of seconds to minutes!)
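That kind of very-slow LFO is nothing exotic; the base value, depth, and two-minute period below are just illustrative guesses at the settings used in these samples:

```python
import math

def slow_threshold(t, base=0.5, depth=0.4, period=120.0):
    # Slow sine LFO on the minimum-frequency threshold: one full sweep
    # every `period` seconds, swinging +/- depth around the base value.
    return base + depth * math.sin(2 * math.pi * t / period)
```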
And something with a bit more frequency content:
I’ll leave you with a few more recordings I made of some different source material.
This is one of my favorites… just love that violin tone, tambora, with a little vocals mixed in…
Here’s something that’s heavy vocals:
Thanks for reading and listening. More info to come soon!