Disclaimer: I don’t have strong opinions on LLMs. I don’t think they’re going to kill us. I don’t think they’re going to steal my job. I don’t think they’re useless, nor do I think they have few real-world applications (like, say, a public distributed ledger). I think my biggest criticism is that they’ve attracted way too much capital, but other than that they’re just another possible tool in the toolbox.
With that exceedingly medium take out of the way, having managed to piss off everyone (because everyone has strong opinions on LLMs), we can get to the point:
Generating Music Without LLMs
Why?
This series documents the process of me smashing my head against the keyboard to build a game called LETSGO.
It’s gotten long enough to break into several sections:
So here's what I want to do. I want to build Stockfish, but for music.
Stockfish, of course, is the chess engine that is orders of magnitude better at chess than any human.
Can I make a music engine orders of magnitude better than humans?
No.
The main problem with building a Stockfish for music theory is a simple one:
Chess is a zero-sum game.
There is a winner and loser and each possible move either improves or worsens your chances of winning the game.
The entire basis of chess engines is to search through as many candidate moves as possible and pick the winningest line, with as efficient an algorithm as possible.
Chess Engine programming is a surprisingly deep rabbit hole:
My next adventure in low-level programming might just be replicating some of these fundamentals in Zig.
In music, it’s all subjective. It’s hard to say that choosing a particular note is bad in absolute terms.
It’s true there are cultural rules to music- this entire series started with Designing Sound, the conceit being that culture gives you rules for making reasonable-sounding music.
But the best you can get with perfect adherence to music theory is stunningly average music.
But if we can build a system that creates inoffensive music, in theory it can have infinite playtime. And things get interesting if we can hook gameplay events in to change the music:
- Shift to a Minor scale when an enemy approaches
- Or brighten to a Major scale when you get the big reward.
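Hooking those events in could look something like this- a minimal plain-C++ sketch, where `GameEvent`, `Scale`, and `ScaleForEvent` are my own illustrative names, not the game's actual API:

```cpp
#include <array>
#include <string>

// Gameplay events that should recolor the music.
enum class GameEvent { EnemyApproaching, RewardCollected };

struct Scale {
    std::string Name;
    std::array<int, 7> Intervals;  // semitone offsets above the tonic
};

// Natural minor for danger, major for reward.
inline Scale ScaleForEvent(GameEvent Event) {
    switch (Event) {
        case GameEvent::EnemyApproaching:
            return {"Minor", {0, 2, 3, 5, 7, 8, 10}};
        case GameEvent::RewardCollected:
            return {"Major", {0, 2, 4, 5, 7, 9, 11}};
    }
    return {"Major", {0, 2, 4, 5, 7, 9, 11}};  // unreachable fallback
}
```

The composer would then re-derive its note pool from the returned intervals the next time it schedules a bar.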
Music is not the sound that’s made, music is the feeling the human get when they hear the sound.
Human participation is the most important part of music.
A good music engine is one that responds appropriately to the human’s action in the game.
Possible Approaches
My first instinct for programming this note composition was to use the Strategy pattern to define different methods of constructing notes.
It’s a fairly straightforward approach, using a well-known design pattern for its intended purpose.
It would allow me to define things like `Set Pedal Point` on a bass instrument- just start playing a low tonic note every beat or whatever. Then I can write a contained strategy for `Develop Motif`, a one- or two-bar melodic structure.
The thing I find potentially interesting about this is strategies that extend/consume other strategies: `Establish Chords` could consume the Motif created and build chords using the notes chosen by the motif. But that might also be optional, with the ordering reversed- we establish chords, then generate a motif consuming the chords.
What I like about this approach in general is that it is additive in nature. Repetition legitimizes, so a motif of `ii-V-I` repeated in the chord progression creates a sense of cohesion in the piece.
Music is not a zero-sum game. It’s positive-sum, fractal in nature. It’s about consuming the other musicians’ contributions and iterating.
It might not be Stockfish for music theory, but it is more in line with the fundamental nature of music.
Musical Strategies
So I opened a new PR to encapsulate this new feature:
I created a simple `MusicalStrategy` interface and built out a single concrete strategy of setting a pedal point.
This was all accomplished quickly.
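A minimal sketch of what such an interface could look like in plain C++. The real `MusicalStrategy` lives in the game's Unreal codebase; the signatures and note representation here are my own illustration:

```cpp
#include <string>
#include <utility>
#include <vector>

// Strategy interface: each strategy knows how to compose one bar of notes.
struct MusicalStrategy {
    virtual ~MusicalStrategy() = default;
    // Produce one bar of notes (as note names), one note per beat.
    virtual std::vector<std::string> ComposeBar(int BeatsPerBar) = 0;
};

// Pedal point: repeat the low tonic note on every beat.
struct PedalPointStrategy : MusicalStrategy {
    std::string Tonic;
    explicit PedalPointStrategy(std::string InTonic) : Tonic(std::move(InTonic)) {}

    std::vector<std::string> ComposeBar(int BeatsPerBar) override {
        return std::vector<std::string>(BeatsPerBar, Tonic);
    }
};
```

A `Develop Motif` strategy would be another subclass of the same interface, which is what makes the pattern additive.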
Things immediately slowed down once I started asking hard questions like “How do strategies get chosen?”
To figure that out, I decided to go on a long rubber duck session designing the data for what the Composer actually needs:
The result of which is this pseudo-data object:
```
// ComposerData object

// Data relevant for playing the sounds
FInstrumentData InstrumentData = BassInstrument;
int OctaveMin = 1;
int OctaveMax = 2;

// Music theory data
Scale = [ C, D, E, F, G, A, B ]

// Chosen strategies: what to play, when to play it
ChosenStrategies = [
  {
    InstrumentSchedule = { [ C, C, C, C ] },
    StartAtBar = 4,
    BarsToPlay = 2,
    Strategy = PedalPoint
  },
  {
    InstrumentSchedule = { [ C, D, G, E ] },
    StartAtBar = 6,
    BarsToPlay = 4,
    Strategy = PlayMotif
  },
]

// Weighted strategies to choose from when composing the next section
PossibleStrategies = [
  PlayMotif = {
    AppropriatenessToChooseStrat = 8.0,
    InstrumentInput = {
      {
        ComposerData = LeadGuitar*,
        AppropriatenessForStrat = 1.0,
      },
      {
        ComposerData = ChordSynth*,
        AppropriatenessForStrat = 0.0,
      },
    },
  },
]
```
This object:

- Is representative of a single instrument through `InstrumentData`.
- Has a set of `InstrumentSchedules`, wrapped in a data object with ancillary information like when to play and how many times to play it.
- Has a set of `StrategyData` objects, defining different approaches for creating music, along with data used to choose each strat, and a set of `ComposerData` for other instruments that can be used by the Strategy.
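For concreteness, here's one way the pseudo-data object could be shaped as plain C++ structs. The field names follow the sketch above; the types, containers, and the use of an instrument name in place of `FInstrumentData` are my assumptions:

```cpp
#include <map>
#include <string>
#include <vector>

// One scheduled strategy: notes to play, when, and for how long.
struct FScheduledStrategy {
    std::vector<std::string> InstrumentSchedule;  // notes to play
    int StartAtBar = 0;
    int BarsToPlay = 0;
    std::string Strategy;  // e.g. "PedalPoint", "PlayMotif"
};

// Another instrument's data a strategy may consume, with a weight.
struct FStrategyInput {
    std::string ComposerName;
    float AppropriatenessForStrat = 0.0f;
};

// Data used to decide whether to pick a given strategy next.
struct FStrategyData {
    float AppropriatenessToChooseStrat = 0.0f;
    std::vector<FStrategyInput> InstrumentInput;
};

// The per-instrument composer data object.
struct FComposerData {
    std::string InstrumentName;  // stands in for FInstrumentData
    int OctaveMin = 1;
    int OctaveMax = 2;
    std::vector<std::string> Scale;
    std::vector<FScheduledStrategy> ChosenStrategies;
    std::map<std::string, FStrategyData> PossibleStrategies;
};
```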
I was able to implement this object into the game pretty easily:
But, things slowed down considerably once I started trying to define those `Appropriateness` weights.
The Weighted Sum of All Fears
I had this idea that strategies could be chosen by some weight.
In essence, the `MusicComposer` would be much more likely to choose the `CreateMotif` strategy over the `PedalPoint` strategy if `CreateMotif` had an Appropriateness value of 0.6 vs. 0.2.
And that’s true. Easy logic.
Slightly more complex is trying to figure out how to generate that 0.6 vs. 0.2.
Like, do I start at 1.0 and remove points for conditions not met? Or maybe start at 0.0 and add for conditions that are met? Or start at 0.5 and- why not both?
And then how do I maintain any sort of consistency among strategies?
There are some global rules that can be followed: Repetition Legitimizes, for example. If a motif of D G C exists, it’s better for an instrument to use and extend that motif. An instrument can be chosen to do chords- D-F-A, G-B-D, C-E-G- creating triads off each root note.
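Building those triads is mechanical: stack two diatonic thirds on each root. A small sketch, assuming C major; names are illustrative:

```cpp
#include <array>
#include <string>
#include <vector>

// The C major scale, in degree order.
static const std::vector<std::string> CMajor = {"C", "D", "E", "F", "G", "A", "B"};

// Returns the diatonic triad rooted on the given scale note: root, third, fifth.
inline std::array<std::string, 3> TriadFor(const std::string& Root) {
    for (size_t i = 0; i < CMajor.size(); ++i) {
        if (CMajor[i] == Root) {
            // Skip every other scale step to stack thirds.
            return {CMajor[i], CMajor[(i + 2) % 7], CMajor[(i + 4) % 7]};
        }
    }
    return {"", "", ""};  // root not in scale
}
```

So a motif of D G C yields exactly the D-F-A, G-B-D, C-E-G triads mentioned above.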
Ultimately I decided to go for a fairly straightforward approach- start at an arbitrary value, increase or reduce that value based on input.
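That approach can be sketched as follows. The starting value of 0.5 and the adjustment deltas are placeholders, not the game's actual numbers:

```cpp
#include <string>
#include <vector>

// A strategy's running appropriateness score, started at an arbitrary value
// and nudged up or down as conditions are evaluated.
struct StrategyScore {
    std::string Name;
    float Appropriateness = 0.5f;  // arbitrary starting value

    void Adjust(float Delta) { Appropriateness += Delta; }
};

// Pick the strategy with the highest accumulated score.
inline const StrategyScore* PickStrategy(const std::vector<StrategyScore>& Scores) {
    const StrategyScore* Best = nullptr;
    for (const auto& S : Scores) {
        if (!Best || S.Appropriateness > Best->Appropriateness) Best = &S;
    }
    return Best;
}
```

A weighted-random pick over the scores would work just as well here; taking the max is simply the easiest thing to reason about.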
After fighting with the data structure, specifically flattening and simplifying as much as possible, I ended up with something that kind of works- a pedal point strategy hitting the same note every beat:
The video reveals some bugs… some of which have since been fixed in a major refactor:
… Some of the bugs still remain. Regardless, we forge ever onward.
CreateMotif
- The further away from 1, the stronger the desire to return to 1.
- If we snap back to 1, the tension is gone; it is resolved unto the tonic.
- So there is a tension budget.
- 0.5 does not have the same pull to resolve as 0.9. At 0.9 we almost guarantee the next note will resolve- there’s only 0.1 of room left for resolution.
- For a ii-V-I: 0.5, 0.8, 1.0. Here there is a strengthening desire to resolve.
- Furthermore, the ol’ V-I is a “perfect cadence”- it finishes an idea.
Some common progressions as reported by the internet, converted into resolution weights:

- 1-5-6-4 → 1.0, 0.8, 0.3, 0.6
- 1-4-5-4 → 1.0, 0.6, 0.8, 0.6
- 2-5-1 → 0.5, 0.8, 1.0
- 1-1-1-1 → 1.0, 1.0, 1.0, 1.0; 4-4-1-1 → 0.6, 0.6, 1.0, 1.0; 5-4-1-1 → 0.8, 0.6, 1.0, 1.0
- 1-6-4-5 → 1.0, 0.3, 0.6, 0.8
- 1-5-6-3 → 1.0, 0.8, 0.3, 0.4; 4-1-4-5 → 0.6, 1.0, 0.6, 0.8
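Reading those weights off the progressions gives a rough per-degree lookup. The values below come straight from the conversions above, except the 7, which doesn't appear there- its 0.9 is my own guess based on the leading tone's pull:

```cpp
#include <map>

// Resolution weight per scale degree, read off the converted progressions.
inline double ResolutionWeight(int Degree) {
    static const std::map<int, double> Weights = {
        {1, 1.0},  // tonic: fully resolved
        {2, 0.5},  // supertonic
        {3, 0.4},  // mediant
        {4, 0.6},  // subdominant: the versatile "glue" chord
        {5, 0.8},  // dominant: strong pull to resolve
        {6, 0.3},  // submediant
        {7, 0.9},  // leading tone: assumption, not in the data above
    };
    auto It = Weights.find(Degree);
    return It == Weights.end() ? 0.0 : It->second;
}
```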
Things that jump out at me:

- The 4th is a common, versatile chord. It seems like an excellent “glue” chord.
- It works well with the 6th, but only in terms of 6→4, not 4→6.
- The 7, 2, and 3 chords are surprisingly rare.
I suppose this is because the 3 is mainly reserved for modifying the shape of each chord in those progressions. In the above examples, there’s no distinction between a minor or major 6th; we’re just defining the root note.
I think the 7th works the same way; it would only really work as a lead-in to the root.
The other thing is that these are very much chord progressions. They are common progressions in terms of the chords they would produce. You’ll put a 7 in any chord you want, for flavor, but you rarely have a chord built on the 7th.
Additionally, there are emotional evocations for each key:
Major Keys
- C Major: Pure, joyful, innocent
- G Major: Friendly, happy, pastoral
- D Major: Triumphant, bright, bold
- A Major: Joyful, confident, spirited
- E Major: Powerful, brilliant, resilient
- B Major: Optimistic, bright, intense
- F♯ Major: Majestic, ecstatic
- D♭ Major: Warm, dreamy, elegant
- A♭ Major: Graceful, tender
- E♭ Major: Heroic, strong, noble
- B♭ Major: Harmonious, cheerful
- F Major: Calm, simple, rural
Minor Keys
- A Minor: Sorrowful, melancholic, reflective
- E Minor: Mournful, restless, poignant
- B Minor: Dark, brooding, dramatic
- F♯ Minor: Mysterious, intense
- C♯ Minor: Depressed, reflective
- G♯ Minor: Agitated, intense
- D♯ Minor: Desperate, poignant
- B♭ Minor: Gloomy, lamenting
- F Minor: Serious, pensive
- C Minor: Tragic, heroic, passionate
- G Minor: Discontented, restless
- D Minor: Grave, serious, solemn
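As a data structure, those tables could become a simple mood-to-key lookup a gameplay system can query. Only a few entries are shown (the descriptors come from the lists above), and the first-match tie-breaking is arbitrary:

```cpp
#include <map>
#include <string>
#include <vector>

// A slice of the key/emotion tables above as a lookup.
static const std::map<std::string, std::vector<std::string>> KeyMoods = {
    {"C Major", {"pure", "joyful", "innocent"}},
    {"D Major", {"triumphant", "bright", "bold"}},
    {"F Major", {"calm", "simple", "rural"}},
    {"A Minor", {"sorrowful", "melancholic", "reflective"}},
    {"B Minor", {"dark", "brooding", "dramatic"}},
    {"C Minor", {"tragic", "heroic", "passionate"}},
};

// Return the first key whose descriptors include the requested mood.
inline std::string KeyForMood(const std::string& Mood) {
    for (const auto& [Key, Moods] : KeyMoods) {
        for (const auto& M : Moods) {
            if (M == Mood) return Key;
        }
    }
    return "C Major";  // safe, inoffensive default
}
```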
Why does this matter? We’re creating a motif- a motif is a musical idea, and the strength of that idea lies in the emotional resonance it evokes and how well it integrates with the rest of the composition.