Generative Music Without LLMs

Tags
Design, Engineering, Unreal
Owner
Justin Nearing

Disclaimer: I don’t have strong opinions on LLMs. I don’t think they’re going to kill us. I don’t think they’re going to steal my job. I don’t think they’re useless, nor do I think they have few real-world applications (like, say, a public distributed ledger). I think my biggest criticism is that they’ve attracted way too much capital, but other than that they’re just another possible tool in the toolbox.

With that exceedingly medium take, managing to piss off everyone (because everyone has strong opinions on LLMs), we can get to the point:

Generating Music Without LLMs

Why?

🎶
This is part of an ongoing series called Building A Music Engine

It documents the process of me smashing my head against the keyboard to build a game called LETSGO

It’s gotten long enough to break into several sections:

So here's what I want to do. I want to build Stockfish, but for music.

Stockfish, of course, is the chess engine that is orders of magnitude better at chess than any human.

Can I make a music engine orders of magnitude better than humans?

No.

The main problem with building Stockfish for music theory is a simple one:

Chess is a zero-sum game.

There is a winner and loser and each possible move either improves or worsens your chances of winning the game.

The entire basis of a chess engine is to search as many of the winningest moves as possible with as efficient an algorithm as possible.

In music, it’s all subjective. It’s hard to say choosing a note is bad in absolute terms.

It’s true there are cultural rules to music. This entire series started with 🎵Designing Sound, the conceit being that culture gives you rules to make reasonable-sounding music.

But the best you can get with perfect adherence to music theory is stunningly average music.

But if we can create a system to create inoffensive music, in theory it can have an infinite playtime. And things get interesting if we can hook in gameplay events to change the music:

  • Shift to a Minor scale when an enemy approaches
  • Or brighten to a Major scale when you get the big reward.
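As a rough sketch of what hooking gameplay events into the music could look like, the snippet below swaps between a major and a natural minor scale. The GameEvent enum and the function names are assumptions made up for illustration, not code from LETSGO. Notes are plain MIDI numbers to keep it self-contained.

```cpp
#include <array>
#include <cstddef>

// Hypothetical gameplay events; the names are illustrative only.
enum class GameEvent { EnemyApproaches, BigReward };

// Semitone offsets from the tonic for each scale type.
using ScaleIntervals = std::array<int, 7>;
const ScaleIntervals kMajor        = {0, 2, 4, 5, 7, 9, 11};
const ScaleIntervals kNaturalMinor = {0, 2, 3, 5, 7, 8, 10};

// Pick the scale the music should shift to for a given event.
ScaleIntervals ScaleForEvent(GameEvent event) {
    return event == GameEvent::EnemyApproaches ? kNaturalMinor : kMajor;
}

// Turn a tonic (as a MIDI note number) plus a scale into concrete pitches.
std::array<int, 7> BuildScale(int tonicMidi, const ScaleIntervals& intervals) {
    std::array<int, 7> notes{};
    for (std::size_t i = 0; i < intervals.size(); ++i) {
        notes[i] = tonicMidi + intervals[i];
    }
    return notes;
}
```

With a tonic of 60 (middle C), the only notes that move between the two scales are the 3rd, 6th, and 7th degrees, which is why the shift reads as a change of mood rather than a change of key.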

Music is not the sound that’s made; music is the feeling the human gets when they hear the sound.

Human participation is the most important part of music.

A good music engine is one that responds appropriately to the human’s action in the game.

Possible Approaches

My first instinct for programming this note composition was to use the Strategy pattern to define different methods of constructing notes.

It’s a fairly straightforward approach, utilizing a well-known design pattern in its intended use.

It would allow me to define things like Set Pedal Point on a bass instrument: just start playing a low Tonic note every beat or whatever.

Then I can write a contained strategy for Develop Motif, a one/two bar melodic structure.

The thing I find potentially interesting about this is strategies that extend/consume other strategies:

Establish Chords could consume the Motif created and build chords using the notes chosen by the motif.

But that might also be optional, with the ordering reversed: we establish chords, then generate a motif consuming the chords.

What I like about this approach in general is that it is additive in nature. Repetition legitimizes, so a motif of ii-V-I repeated in the chord progression creates a sense of cohesion in the piece.

Music is not a zero-sum game. It’s positive, fractal in nature. It’s about consuming the other musician’s contributions and iterating.

It might not be Stockfish for music theory, but it is more in line with the fundamental nature of music.

Musical Strategies

So I opened a new PR to encapsulate this new feature:

I created a simple MusicalStrategy interface and built out a single concrete strategy of setting a pedal point.
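For readers who haven’t used the Strategy pattern, here is a minimal sketch in plain C++ of what an interface plus one pedal point strategy might look like. Only the MusicalStrategy name comes from the project. The GenerateBar method, its parameters, and the Note alias are assumptions for illustration, and the real Unreal code will differ.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the game's note type.
using Note = std::string;

// Strategy interface: each strategy knows how to produce one bar of notes.
class MusicalStrategy {
public:
    virtual ~MusicalStrategy() = default;
    virtual std::vector<Note> GenerateBar(const std::vector<Note>& scale,
                                          int beatsPerBar) const = 0;
};

// Pedal point: repeat the tonic (first scale degree) on every beat.
class PedalPointStrategy : public MusicalStrategy {
public:
    std::vector<Note> GenerateBar(const std::vector<Note>& scale,
                                  int beatsPerBar) const override {
        if (scale.empty() || beatsPerBar <= 0) return {};
        return std::vector<Note>(static_cast<std::size_t>(beatsPerBar),
                                 scale.front());
    }
};
```

The caller only ever holds a MusicalStrategy, so adding a new way of making notes means adding a new class rather than touching the composer.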

This was all accomplished quickly.

Things immediately slowed down once I started asking hard questions like “How do strategies get chosen?”

To figure that out, I decided to go on a long rubber duck session designing the data for what the Composer actually needs:

The result of which is this pseudo-data object:

// ComposerData object

// Data relevant for playing the sounds
FInstrumentData = BassInstrument;
int OctaveMin   = 1;
int OctaveMax   = 2;

// Music theory data
Scale = [ C, D, E, F, G, A, B ]

// Chosen strategies: what to play, when to play it
ChosenStrategies = [
	{
		InstrumentSchedule = { [ C, C, C, C ] },
		StartAtBar = 4,
		BarsToPlay = 2,
		Strategy = PedalPoint
	},
	{
		InstrumentSchedule = { [ C, D, G, E ] },
		StartAtBar = 6,
		BarsToPlay = 4,
		Strategy = PlayMotif
	},
]

// Weighted strategies to choose from when composing the next section
PossibleStrategies = [
	PlayMotif = {
		AppropriatenessToChooseStrat = 8.0,
		InstrumentInput = {
			{
				ComposerData = LeadGuitar*,
				AppropriatenessForStrat = 1.0,
			},
			{
				ComposerData = ChordSynth*,
				AppropriatenessForStrat = 0.0
			},
		},
	},
]

This object:

  • Is representative of a single instrument through InstrumentData.
  • Has a set of InstrumentSchedules, wrapped in a data object with ancillary information like when to play, and how many times to play it.
  • Has a set of StrategyData objects, defining different approaches for creating music, along with the data used to choose each strategy, and a set of ComposerData for other instruments that can be used by the Strategy.
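To make the shape of that object a bit more concrete, here is one way it might translate into plain C++ structs. This is a sketch under assumptions: the field names follow the pseudo-data above, but the struct names ChosenStrategy, InstrumentInput, and PossibleStrategy, the string-based Note type, and the use of std::vector instead of Unreal containers are all mine.

```cpp
#include <string>
#include <vector>

using Note = std::string;  // hypothetical stand-in for the game's note type

struct ComposerData;  // forward declaration so inputs can point at other instruments

// One scheduled chunk of music: what to play, when, and for how long.
struct ChosenStrategy {
    std::vector<Note> InstrumentSchedule;
    int StartAtBar = 0;
    int BarsToPlay = 0;
    std::string Strategy;
};

// Another instrument a strategy may consume, plus how well it fits.
struct InstrumentInput {
    const ComposerData* Composer = nullptr;
    float AppropriatenessForStrat = 0.0f;
};

// A candidate strategy and the weight used when choosing it.
struct PossibleStrategy {
    std::string Name;
    float AppropriatenessToChooseStrat = 0.0f;
    std::vector<InstrumentInput> InstrumentInputs;
};

// Everything the composer tracks for a single instrument.
struct ComposerData {
    std::string Instrument;
    int OctaveMin = 1;
    int OctaveMax = 2;
    std::vector<Note> Scale;
    std::vector<ChosenStrategy> ChosenStrategies;
    std::vector<PossibleStrategy> PossibleStrategies;
};
```

The raw pointer in InstrumentInput mirrors the LeadGuitar* notation in the pseudo-data: a strategy looks at another instrument’s data without owning it.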

I was able to implement this object in the game pretty easily:

But things slowed down considerably once I started trying to define those Appropriateness weights.

The Weighted Sum of All Fears

I had this idea that strategies could be chosen by some weight.

In essence, the MusicComposer would be much more likely to choose the CreateMotif strategy over the PedalPoint strategy if CreateMotif had an Appropriateness value of 0.6 vs. 0.2.

And that’s true. Easy logic.
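The easy part can be sketched as a weighted random pick. This is an illustration, not the project’s code: the WeightedStrategy struct and PickStrategy function are hypothetical, and the random number is passed in as a parameter so the function stays deterministic and testable.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct WeightedStrategy {
    std::string Name;
    double Appropriateness;  // non-negative weight
};

// Pick a strategy with probability proportional to its weight.
// `roll` is a uniform random number in [0, 1) supplied by the caller.
const WeightedStrategy& PickStrategy(const std::vector<WeightedStrategy>& options,
                                     double roll) {
    assert(!options.empty());
    double total = 0.0;
    for (const auto& option : options) total += option.Appropriateness;

    // Walk the cumulative weights until the scaled roll falls inside one.
    double threshold = roll * total;
    for (const auto& option : options) {
        if (threshold < option.Appropriateness) return option;
        threshold -= option.Appropriateness;
    }
    return options.back();  // guards against floating-point rounding
}
```

With weights of 0.6 and 0.2, CreateMotif covers 0.6 of the 0.8 total, so it gets picked about 75% of the time. The weights don’t need to sum to 1.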

Slightly more complex is figuring out how to generate 0.6 vs. 0.2.

Like, do I start at 1.0 and remove points for conditions not met?

Or maybe start at 0.0 and add for conditions that are met?

Or 0.5 and why not both?

And then how do I maintain any sort of consistency among strategies?

There are some global rules that can be followed: Repetition Legitimizes for example.

If a motif of D G C exists, it’s better for an instrument to use and extend that motif.

An instrument can be chosen to do chords (D-F-A, G-B-D, C-E-G), creating a triad off each root note.
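Here is a small sketch of how those triads fall out of the motif, assuming the C major scale from the pseudo-data object and stacking thirds within the scale. TriadsFromMotif and the string-based Note type are hypothetical names for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string>
#include <vector>

using Note = std::string;
using Triad = std::array<Note, 3>;

// Build a diatonic triad on each motif note by stacking scale thirds:
// root, two scale steps up, four scale steps up (wrapping around the scale).
std::vector<Triad> TriadsFromMotif(const std::vector<Note>& scale,
                                   const std::vector<Note>& motif) {
    std::vector<Triad> chords;
    for (const Note& root : motif) {
        auto it = std::find(scale.begin(), scale.end(), root);
        if (it == scale.end()) continue;  // skip notes outside the scale
        const std::size_t i = static_cast<std::size_t>(it - scale.begin());
        const std::size_t n = scale.size();
        Triad chord = {scale[i], scale[(i + 2) % n], scale[(i + 4) % n]};
        chords.push_back(chord);
    }
    return chords;
}
```

Feeding it the scale C D E F G A B and the motif D G C gives D-F-A, G-B-D, and C-E-G. Because the thirds come from the scale itself, the minor or major quality of each chord is decided by the scale, not by the strategy.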

Ultimately I decided to go for a fairly straightforward approach: start at an arbitrary value, then increase or reduce that value based on input.

After fighting with the data structure, specifically flattening and simplifying as much as possible, I ended up with something that kind of works: a pedal point strategy hitting the same note every beat:
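A sketch of that start-somewhere-and-adjust idea is below. Everything specific in it is an assumption: the CompositionState fields, the 0.5 baseline, and the step sizes are placeholders I made up to show the shape of the logic, not values from the game.

```cpp
#include <algorithm>

// Hypothetical inputs a strategy might look at; none of these names come
// from the actual project.
struct CompositionState {
    bool MotifExists = false;          // has any instrument established a motif?
    bool InstrumentIsPlaying = false;  // is this instrument already scheduled?
};

// Start from an arbitrary baseline, then nudge it up or down per condition.
// The baseline and step sizes are placeholders to be tuned by ear.
float ScoreCreateMotif(const CompositionState& state) {
    float score = 0.5f;
    if (state.MotifExists) score -= 0.3f;           // prefer reusing an existing motif
    if (!state.InstrumentIsPlaying) score += 0.2f;  // idle instruments should contribute
    return std::max(0.0f, std::min(1.0f, score));   // keep the weight in [0, 1]
}
```

Clamping to a fixed range is one way to keep scores comparable across strategies, since every strategy ends up speaking the same 0-to-1 scale no matter how many conditions it checks.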

The video reveals some bugs… some of which have since been fixed in a major refactor:

… Some of the bugs still remain. Regardless, we forge ever onward.

CreateMotif

The further away from 1, the stronger the desire to return to 1.

If we snap back to 1, the tension is gone; it is resolved unto the tonic.

So there is a tension budget.

0.5 does not have the same pull to resolve as 0.9.

At 0.9 we almost guarantee the next note will resolve.

There’s only 0.1 room for resolution.

For a ii-V-I: 0.5, 0.8, 1.0

Here there is a strengthening desire to resolve.

Furthermore, the ol’ V-I is a “perfect cadence”: it finishes an idea.

Some common progressions as reported by the internet:

1-5-6-4

1-4-5-4

2-5-1

1-1-1-1, 4-4-1-1, 5-4-1-1

1-6-4-5

1-5-6-3, 4-1-4-5

Converted into resolution weights:

1.0, 0.8, 0.3, 0.6

1.0, 0.6, 0.8, 0.6

0.5, 0.8, 1.0

1.0, 1.0, 1.0, 1.0 - 0.6, 0.6, 1.0, 1.0 - 0.8, 0.6, 1.0, 1.0

1.0, 0.3, 0.6, 0.8

1.0, 0.8, 0.3, 0.4 - 0.6, 1.0, 0.6, 0.8

Things that jump out at me:

The 4th is a common, versatile chord. Seems like an excellent “glue” chord.

Works well with the 6th, but only in terms of 6→4, not 4→6.

7, 2, and 3 chords are surprisingly rare.

I suppose this is because the 3 is mainly reserved for modifying the shape of each chord in each of those progressions. In the above example, there’s no distinction between minor or major 6th; we’re just defining the root note.

I think the 7th works the same way: it would only really work as a lead-in to the root.

The other thing is that these are very much chord progressions. They are common progressions in terms of the chords they would produce. You’ll put a 7 in any chord you want, for flavor, but you rarely have a 7th chord.

Additionally, there are emotional evocations for each key:

Major Keys

  • C Major: Pure, joyful, innocent
  • G Major: Friendly, happy, pastoral
  • D Major: Triumphant, bright, bold
  • A Major: Joyful, confident, spirited
  • E Major: Powerful, brilliant, resilient
  • B Major: Optimistic, bright, intense
  • F♯ Major: Majestic, ecstatic
  • D♭ Major: Warm, dreamy, elegant
  • A♭ Major: Graceful, tender
  • E♭ Major: Heroic, strong, noble
  • B♭ Major: Harmonious, cheerful
  • F Major: Calm, simple, rural

Minor Keys

  • A Minor: Sorrowful, melancholic, reflective
  • E Minor: Mournful, restless, poignant
  • B Minor: Dark, brooding, dramatic
  • F♯ Minor: Mysterious, intense
  • C♯ Minor: Depressed, reflective
  • G♯ Minor: Agitated, intense
  • D♯ Minor: Desperate, poignant
  • B♭ Minor: Gloomy, lamenting
  • F Minor: Serious, pensive
  • C Minor: Tragic, heroic, passionate
  • G Minor: Discontented, restless
  • D Minor: Grave, serious, solemn

Why does this matter? We’re creating a motif. A motif is a musical idea, and the strength of that idea is the emotional resonance it evokes and how well it integrates with the rest of the composition.