Generative Music Without LLMs

Owner: Justin Nearing

Disclaimer: I don’t have strong opinions on LLMs. I don’t think they’re going to kill us. I don’t think they’re going to steal my job. I don’t think they’re useless, nor do I think they have few real-world applications (like, say, a public distributed ledger). My biggest criticism is that they’ve attracted way too much capital, but other than that they’re just another possible tool in the toolbox.

With that exceedingly medium take out of the way, having managed to piss off everyone (because everyone has strong opinions on LLMs), we can get to the point:

Generating Music Without LLMs

Why?

🎶
This is part of an ongoing series called Building A Music Engine.

It documents the process of me smashing my head against the keyboard to build a game called LETSGO.

It’s gotten long enough to break into several sections.

So here's what I want to do. I want to build Stockfish, but for music.

Mate in Five: How Chess Engines Work

There is a helpful chess engine wiki as a starting point.

Modern chess engines follow a few phases to choose the next best move:

  1. Board Representation - Have the current state of the game, and all possible rules
  2. Search - Algorithm to search possible moves from current state
  3. Evaluate - Algorithm to evaluate each possible move

There’s some other stuff, like opening/endgame databases, but for our purposes this is a good starting point. Translated to music, those phases become:

  1. Composition Representation - Have the current state of the musical composition, and knowledge of music theory, etc.
  2. Search - Algorithm to search possible Notes from current state
  3. Evaluate - Algorithm to evaluate each searched note value
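As a rough sketch, the core loop of such an engine could look like this. Everything here is hypothetical: FCompositionState, FNote, SearchCandidateNotes, and EvaluateNote are placeholder names, not anything that exists in LETSGO yet.

// Hypothetical "Stockfish for music" loop
FNote ChooseNextNote(const FCompositionState& State)
{
	// Search: enumerate notes that are legal given the current scale/context
	TArray<FNote> Candidates = SearchCandidateNotes(State); // assumes at least one candidate

	// Evaluate: score each candidate against the "cultural rules" of the genre
	FNote Best = Candidates[0];
	float BestScore = TNumericLimits<float>::Lowest();
	for (const FNote& Candidate : Candidates)
	{
		const float Score = EvaluateNote(State, Candidate);
		if (Score > BestScore)
		{
			BestScore = Score;
			Best = Candidate;
		}
	}
	return Best; // the composition representation lives in FCompositionState
}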

Obviously the big difference between chess and music is that chess is a zero-sum game. There is a winner and a loser, and each possible move either improves or worsens your chances of winning the game.

In music, it’s all subjective. It’s hard to say choosing a note is bad in absolute terms.

Thankfully, there are cultural rules to music. If a song is following the musical tradition of samba, for example, it’s absolutely possible to choose the “wrong” note in that context.

If we treat these cultural rules as real rules, we can in theory create reasonable sounding music.

💡

I won’t go so far as to say “good sounding music”, as perfect adherence to music theory will get you at best stunningly average music. But if we can create a system to create inoffensive music, in theory it can have an infinite playtime. Things get more interesting if we can also hook in gameplay events to change the music, say a modal shift to a Minor scale when an enemy approaches, or brightening to a Major scale when you get the big reward.

Possible Approaches

My first instinct for programming this note composition was to use the Strategy pattern to define different methods of constructing notes.

It’s a fairly straightforward approach, using a well-known design pattern for its intended purpose.

It would allow me to define things like Set Pedal Point on a bass instrument- Just start playing a low Tonic note every beat or whatever.

Then I can write a self-contained strategy for Develop Motif, a one- or two-bar melodic structure.

The thing I find potentially interesting about this is strategies that extend/consume other strategies:

Establish Chords could consume the Motif created and build chords using the notes chosen by the motif.

But the ordering could also be reversed: we establish chords first, then generate a motif consuming the chords.

What I like about this approach in general is that it is additive in nature. Repetition legitimizes, so a motif of ii-V-I repeated in the chord progression creates a sense of cohesion in the piece.

Music is not a zero-sum game. It’s positive, fractal in nature. It’s about consuming the other musician’s contributions and iterating.

It might not be Stockfish for music theory, but it is more in line with the fundamental nature of music.

Musical Strategies

So I opened a new PR to encapsulate this new feature:

I created a simple MusicalStrategy interface and built out a single concrete strategy of setting a pedal point.
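For reference, a stripped-down version of that could look something like the sketch below. The actual interface in the PR almost certainly differs; AddSoundAtBeat and the Tonic field are assumptions for illustration.

// Sketch of a composition strategy interface
class IMusicalStrategy
{
public:
	virtual ~IMusicalStrategy() = default;

	// Produce a schedule of sounds for one instrument from the current scale
	virtual FInstrumentSchedule Apply(const FLetsGoGeneratedScale& Scale, int NumBars) = 0;
};

// Concrete strategy: play the tonic on every beat
class FSetPedalPoint : public IMusicalStrategy
{
public:
	virtual FInstrumentSchedule Apply(const FLetsGoGeneratedScale& Scale, int NumBars) override
	{
		FInstrumentSchedule Schedule;
		for (int Bar = 0; Bar < NumBars; ++Bar)
		{
			for (int Beat = 1; Beat <= 4; ++Beat) // assuming 4/4
			{
				Schedule.AddSoundAtBeat(Scale.Tonic, Bar, Beat);
			}
		}
		return Schedule;
	}
};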

This was all accomplished quickly.

Things immediately slowed down once I started asking hard questions like “How do strategies get chosen?”

So I need to figure that out.

Right off the rip, we can rule out any strategy that is not valid. For instance, if a “Create Motif” strategy needs a full scale, don’t use it if you only have a tonic note.

Let’s widen our lens though, and think about the entire lifetime of a musical composition.

We did this as part of the design exercise in Designing The Core Gameplay Loop:

Intro
Verse
Bridge
Chorus
Verse
Chorus
Outro

As part of that we also defined gameplay phases as being assigned to a musical phase:

Action Name | Action State     | Eligible | Repeatable? | Eligible Phase
Set Tonic   | Complete         | False    | False       | Intro
Set Third   | Currently Active | True     | False       | Intro
Set Mode    | Pending          | False    | False       | Intro
Bass Drop   | Pending          | False    | True        | Bridge
BPM Switch  | Pending          | True     | True        | Bridge, Chorus, Outro

I never actually implemented this; gameplay phases are currently ordered by hand. But the entire point of generative music is to have dynamic ordering of phases at runtime, reducing the chances of “predictable” music being generated.

In a similar vein, here’s some concrete strategies and what they would need:

Strategy                 | Description             | Requirements
Pedal Point              | Tonic per beat          | Requires tonic
Create Motif             | Create 1-2 bar melody   | Requires scale*
Motif Variation          | Change notes of melody  | Requires motif
Motif Augmentation       | Extend/contract melody  | Requires motif
Create Chord Progression | Establish ii-V-I in key |
Modulate Chord Prog      | ex. backdoor ii-V-I     | Requires chords

It’s the Data, Stupid, the DATA

Ok, so after free jazz coding a bit, I finally think I have the piece of the puzzle that sticks it all together.

It’s the data.

Consider the following object:

// There's an expectation all values are nullable 
USTRUCT()
struct FComposerData
{
	GENERATED_BODY()

	int NumBarsToCompose;
	FLetsGoGeneratedScale Scale;
	int OctaveMin = 1;
	int OctaveMax = 5;
	FInstrumentSchedule InstrumentSchedule;
	IMusicCompositionStrategy* CompositionStrategy;
	int ComposerDataObjectIndex;
};

Here I have built a struct representing the data a Composer needs to do its job. It’s incomplete, but it’s the glue that will be passed around to every other object that needs to consume what the Composer is creating.

The Composer, broadly, needs to create InstrumentSchedules to send to some Instruments.

Instrument Schedules contain the Sound that will be played, and the beat it will be played on.
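As a minimal sketch (the struct and field names here are guesses, not the actual types), an InstrumentSchedule is essentially bars of beat-to-sound pairs:

USTRUCT()
struct FSoundAtBeat
{
	GENERATED_BODY()

	// Beat within the bar this sound triggers on
	UPROPERTY()
	float Beat = 1.0f;

	// The sampled .wav to play, e.g. Bb_SynthThing_Octave1
	UPROPERTY()
	TObjectPtr<USoundWave> Sound = nullptr;
};

USTRUCT()
struct FBarOfSounds
{
	GENERATED_BODY()

	UPROPERTY()
	TArray<FSoundAtBeat> Sounds;
};

USTRUCT()
struct FInstrumentScheduleSketch
{
	GENERATED_BODY()

	// One entry per bar; the instrument walks the bars in order
	UPROPERTY()
	TArray<FBarOfSounds> Bars;
};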

The Composer will hold a set of ComposerData.

In this set, some of the InstrumentSchedules will be currently playing, others will be pending.

For instance, imagine a Bass instrument playing a pedal point of [ Bb, Bb, Bb, Bb ]

The composer created a ComposerData object that contains the schedule for the Bass.

However, it only wants the Bass to play this for say, 4 bars.

After that, it will want to essentially replace the InstrumentSchedule with a motif:

[ Bb, Db, F, Ab ] - a walk up a Bb minor 7th arpeggio.

So, given that scenario, what do we know we need?

  • FMusicalScale containing the Tonic and Notes of the Bb minor scale
  • An FInstrumentSchedule containing the Sound wave of each note and the beat to play on
    • This schedule has a set of bars to play- so it would have an array of 4 [ Bb, Bb, Bb, Bb ]
    • The schedule also contains the Sound to play
      • It doesn’t actually send Bb, it sends {Note}_{Instrument}_{OctaveNumber}.wav
      • This means ComposerData needs to know the Instruments and the mapping to the actual .wav file (sketched just after this list).
  • We mentioned this is a Bass instrument that will be playing.
    • This implies a type of instrument, and/or a range of Octaves that an instrument will play
  • A NumBars for the number of bars to play this instrument schedule
    • Actually, InstrumentSchedule is structured to have a set of bars to play already
    • What is needed is which bar to start playing this instrument.
    • With these two data points, the bar this will finish on can be derived.
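The note-to-asset mapping mentioned above is basically string assembly. A hedged sketch (MakeSoundAssetName is a hypothetical helper, not something in the codebase):

// Builds an asset name following the {Note}_{Instrument}_{OctaveNumber} convention
FString MakeSoundAssetName(const FString& Note, const FString& Instrument, const int Octave)
{
	return FString::Printf(TEXT("%s_%s_Octave%d"), *Note, *Instrument, Octave);
}

// MakeSoundAssetName(TEXT("Bb"), TEXT("SynthThing"), 1) -> "Bb_SynthThing_Octave1"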

So what do we do in terms of the evolution from pedal point to motif?

Somewhere the composer needs to determine: Bar 1-4 do a pedal point, then 5-8 “evolve” to motif.

Which means the Composer needs to hold a set of instruments and their evolutions:

// Bass 
{
	Instrument: SynthThing
	OctaveRange: [1..2]
	Schedules: [
		{ 
			schedule: [ [ Bb, Bb, Bb, Bb ] ], 
			bars: 4, 
			startAtBar: 1
		},
		{
			// 2 bar schedule
			schedule: [ [ Bb, Db, F, Ab ], [ Ab, F, Db, Bb ] ],
			bars: 2, // would play 4 total bars
			startAtBar: 5
		} 
	]
}

Potentially here, the schedule described above doesn’t necessarily need to be an InstrumentSchedule.

It could be a related data object specifically for the Composer, the intent being a valid InstrumentSchedule can be derived from this object, when it is being sent to an Instrument to play.

OTOH, that might be an unnecessary abstraction. The InstrumentSchedule is structured in a way that works with the rest of the program. So like, get with the program amirite?

The real reason, though, is that if I want to do something like chords, that’s already easy to do in InstrumentSchedule: Ab at beat 1, F at beat 1, Db at beat 1, etc.

There’s also something to be said for the fact that I’m calling this a bass. I have a drum instrument and a single sampled synth instrument configured in the game. Technically the bass could be my “simplesynth” at Octaves 1-2 and lead at octaves 3-5.

Which leads to the question of which instruments are serving what purpose. Do I want to structure the rules of, say, 4-part harmony?

Eesh. This is a lot to think about. But I gotta keep going, if only because I think I’m on the right track.

So.

ComposerData has Instrument and Octave range. Easy enough.

Instrument + Octave + Note gives you SoundData (how the .wav file is represented in my code).

Note is derived from Scale, Scale is updated dynamically based on gameplay events.

Notes are selected via a MusicCompositionStrategy - PedalPoint, create Motif, modify Motif, etc.

A composition strategy takes a ComposerData and returns an InstrumentSchedule?

Yes, because ComposerData has the instrument, the octave range, etc.

ComposerData also has a set of InstrumentSchedules, so Strategies have access to the existing InstrumentSchedules, useful if you want to “evolve” an existing schedule.

So the Composer loops through its ComposerDatas, checking if there are enough schedules for the next, say, 2 bars. If not, it creates some bars for that part.
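In sketch form, that loop could look like this. UComposer, ComposerDatas, GetLastScheduledBar, ChooseStrategy, and SendToInstrument are all hypothetical names for the pieces described above; note that by this point Apply takes the whole ComposerData rather than just a scale.

void UComposer::ComposeAhead(const int CurrentBar)
{
	const int LookaheadBars = 2; // every part should have at least this much scheduled

	for (FComposerData& Part : ComposerDatas)
	{
		// Hypothetical: the last bar this part already has sounds scheduled for
		const int LastScheduledBar = GetLastScheduledBar(Part);

		if (LastScheduledBar < CurrentBar + LookaheadBars)
		{
			// Pick a strategy and let it extend this part's schedule
			if (IMusicCompositionStrategy* Strategy = ChooseStrategy(Part))
			{
				const FInstrumentSchedule NewSchedule = Strategy->Apply(Part);
				SendToInstrument(Part, NewSchedule);
			}
		}
	}
}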

💡

There might be another meta layer on top of this responsible for turning on or off parts, setting up Bridge vs. Chorus vs. Verse, etc.

Actually let’s talk about that meta layer for a minute.

Essentially, there is another process in the Composer that is creating the structure of the song. It’s mapping song sections into the instruction to create/modify ComposerData’s. It will be the thing that says “The Intro will be 8 bars, then a Verse for 8, then a Chorus for 2, then a Bridge for 2, etc.”

Then a looping function creates the intro’s 8 bars, then the verse, etc.

So, I create this structure:

UENUM()
enum ESongSection
{
	None,
	Intro,
	Chorus,
	Bridge,
	VerseSection, // Verse has a conflict with some CoreEngine namespace
};

USTRUCT()
struct FSongSections
{
	GENERATED_BODY()

	TEnumAsByte<ESongSection> SongSection;
	int SectionLengthInBars;

	FSongSections(): SongSection(None), SectionLengthInBars(0) {}
	FSongSections(const ESongSection InSection, const int InLength): SongSection(InSection), SectionLengthInBars(InLength) {}
};

This establishes that song sections exist, and wraps each one in a struct that records how many bars the section lasts.
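With that in place, the song structure could be declared as simply as this (a sketch; the Composer doesn’t actually consume anything like it yet):

TArray<FSongSections> SongStructure = {
	FSongSections(Intro, 8),
	FSongSections(VerseSection, 8),
	FSongSections(Chorus, 2),
	FSongSections(Bridge, 2)
	// ... and so on for the rest of the song
};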

Now I’m thinking of adding a set of valid Music Composition Strategies per section: Intro has “SetPedalPoint” and maybe “CreateMotif”, where Chorus would contain… other strategies…

I dunno. This all seems kinda sus. Does endlessly generative music have set song sections like this? This feels like a random abstraction.

Off the rails and in the Data

Ok, let’s bring it back. The thing we know is important is NumBars. We also know which bar we’re currently on in time.

We know that we need some kind of function that creates FComposerData.

We know we need to pass ComposerData to the Strategy.

We don’t know if the ComposerData is valid for the Strategy.

  • If the Scale doesn’t have a Tonic, PedalPoint fails.

So, what if we added a function to the IMusicCompositionStrategy to assert it is valid?

It makes sense that Composer would own a set of CompositionStrategies.

  • Create and hold references to all

When it’s ready, it will choose a Strategy and call the Apply function to get an instrument schedule from the strategy.

As part of that “readiness, choose strategy” logic, it can query each Strategy it has, with the data it currently has, and get a subset of valid strategies.

Then just choose at random.

Eventually I want some kind of weighting system that will provide a better solution than at random.
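Putting IsValid and the (eventual) weighting together, strategy selection could look roughly like this. AllStrategies and the IsValid signature are assumptions; only the shape of the logic matters here.

IMusicCompositionStrategy* UComposer::ChooseStrategy(const FComposerData& Data)
{
	// Keep only strategies that can actually run with the data we currently have
	TArray<IMusicCompositionStrategy*> ValidStrategies;
	for (IMusicCompositionStrategy* Strategy : AllStrategies)
	{
		if (Strategy->IsValid(Data))
		{
			ValidStrategies.Add(Strategy);
		}
	}

	if (ValidStrategies.Num() == 0)
	{
		return nullptr;
	}

	// For now, choose at random; appropriateness weights can slot in here later
	return ValidStrategies[FMath::RandRange(0, ValidStrategies.Num() - 1)];
}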

But that’s not really an IsValid… that’s more of an IsAppropriate. And I still need some sort of appropriateness algo, because setting a pedal point, and then setting a pedal point, and then setting a pedal point… is easy to end up with if you’re just choosing at random. Not appropriate.

So hol up. There may be a strategy here to Create a Bass. Or a lead.

Imagine we have Validity and Appropriateness all figured out. At some point the Composer chooses “Hey I need a bass.”

That’s a strategy.

Not a complete strategy though.

If this is the content of our ComposerData:

	Scale Scale; 

	FInstrumentData InstrumentData;
	int OctaveMin = 1;
	int OctaveMax = 5;

	TArray<FInstrumentSchedule> Schedules;

Then “Give me a bass” essentially returns InstrumentData + OctaveRange.

What is interesting about this is a pedal point for a bass is a lot more appropriate than for say a lead/soprano type instrument/part.

{
 	FInstrumentData = BassInstrument;
	int OctaveMin = 1;
	int OctaveMax = 2;
	
	StrategyWeights = {
		PedalPoint = .8,
		Chords     = .4,
		RespondToCall = 1.2
	}
}

So here we have a mapping of strategies to some arbitrary appropriateness weight.

If we loop through our ComposerDatas, we have the data to determine the appropriateness of each strategy for that ComposerData.

We still need something to represent NumBars in here though.

Now InstrumentSchedule itself defines this like:

// Two bars long
[ Bb, Bb, Bb, Bb ],
[ Bb, Ab, F,  Bb ]

But when we’re creating the InstrumentSchedule, we need to pass in the number of bars to create.

{	
	BarsPerStrategy = {
		PedalPoint = 4,
		Chords     = 2,
		RespondToCall = 1
	}
}

So if we choose a pedalpoint, the Strategy knows how many bars of InstrumentSchedule to create.

{
	// What part does this play in the composition? 
 	FInstrumentData = BassInstrument;
	int OctaveMin   = 1;
	int OctaveMax   = 2;
	
	// What strategies are appropriate for this part? 
	StrategyWeights = {
		PedalPoint    = .8,
		Chords        = .4,
		RespondToCall = 1.2
	}
	
	// How many bars per part?
	BarsPerStrategy = {
		PedalPoint    = 4,
		Chords        = 2,
		RespondToCall = 1
	}
	
	// On which bar do the instruments start each schedule? 
	WhenPlaySchedules = {
		PedalPoint = 4, // start on bar 4
		Chords1    = 8, // "replaces" pedalpoint
		Chords2    = 10 // "replaces" Chords1
	}
}

If the composer is on bar 9, it can determine that this Data only has bars scheduled up to 12, and decide whether it needs to create more schedules.
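That check is just arithmetic over the pseudo-data above. A hedged helper (FScheduledStrategy and the ChosenStrategies field are stand-ins for the BarsPerStrategy/WhenPlaySchedules maps):

bool NeedsMoreSchedules(const FComposerData& Data, const int CurrentBar, const int LookaheadBars)
{
	int LastCoveredBar = 0;
	for (const FScheduledStrategy& Entry : Data.ChosenStrategies)
	{
		// e.g. Chords2 starts at bar 10 and plays 2 bars -> covered up to bar 12
		LastCoveredBar = FMath::Max(LastCoveredBar, Entry.StartAtBar + Entry.BarsToPlay);
	}
	return LastCoveredBar < CurrentBar + LookaheadBars;
}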

The “create a bass” strategy though would look something like:

 	FInstrumentData = BassInstrument;
	int OctaveMin   = 1;
	int OctaveMax   = 2;
	
	// Default strategy weighting
	StrategyWeights = {
		PedalPoint    = 0.8,
		Chords        = 0.4,
		RespondToCall = 0.0
	}
	
	BarsPerStrategy = {}
	WhenPlaySchedules = {}

The composer can easily determine that this bass needs new schedules, use the weights, and (0.8 out of a total weight of 1.2) have a ~66% chance of selecting PedalPoint.

 	FInstrumentData = BassInstrument;
	int OctaveMin   = 1;
	int OctaveMax   = 2;
	
	// Weights update on pedal point selection
	StrategyWeights = {
		PedalPoint    = 0.2, // Appropriateness reduced
		Chords        = 0.8, // appropriateness increased
		RespondToCall = 0.0  // Special case... needs external schedule
	}
	
	// By selecting PedalPoint, we define NumBars to create
	BarsPerStrategy = {
		PedalPoint = 4 
	}
	
	// On select pedal point, we also define when to start 
	WhenPlaySchedules = {
		PedalPoint = 2 
	}

There is a catch, caught in RespondToCall.

A big part of music is the interaction with the other musicians/parts/instruments within a composition.

In terms of a bass part, I can’t imagine a bass that doesn’t take input from, and provide input to, the drums. As a default case, these two should be locked in, synced up, and grooving in the pocket.

I’m going to take a break here I think, and let this angle cook in my subconscious.

It’s called Drum & Bass for a reason

Ok so specifically for the bass, it generally wants to follow the kick of the drum. It doesn’t have to, it can do whatever it damn pleases, but those raised in the western musical tradition are going to appreciate a drum and bass working together.

Now we can assume a kick drum playing on the 1 and the 3.

And we can assume the kick drum is getting its instructions from the Composer.

Right now it’s not: the StartDrums phase selects a predefined drum pattern at random.

What’s interesting here is the fact that there are predefined drum patterns.

The code has knowledge of a Basic 1-3 rock beat. Or samba, or jazz swing. These things are codified as InstrumentSchedules already.
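For example, that basic rock beat is already sitting in the code as something shaped like this (the sample names are made up; the real drum schedules will differ):

// One bar of a basic rock beat in 4/4: kick on 1 and 3, snare on 2 and 4
[ 1.0, { Sound = Kick_DrumKit,  Volume = 1.0 } ],
[ 2.0, { Sound = Snare_DrumKit, Volume = 1.0 } ],
[ 3.0, { Sound = Kick_DrumKit,  Volume = 1.0 } ],
[ 4.0, { Sound = Snare_DrumKit, Volume = 1.0 } ]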

And I’m saying that the bass needs access to a selected InstrumentSchedule.

 	FInstrumentData = BassInstrument;
	int OctaveMin   = 1;
	int OctaveMax   = 2;
	
	AllOtherInstrumentSchedule = {
		DrumKick*,
		DrumSnare*,
		LeadGuitar*
	}

That would be a naïve implementation.

I think what would be better would be RelevantInstrumentSchedules for… relevant… instrument… schedules.

The reasoning being the bassist doesn’t necessarily care about the Snare.

That being said, it might be better to also assign a weighting of how much it cares:

	RelevantInstrumentSchedules = {
		DrumKick = 0.8,
		DrumSnare = 0.2,
		LeadGuitar* = 0.5
	}

Ahhh but here’s the thing. It’s something I had forgotten to flesh out. There is a difference between which note you play and when you play it.

	PercussionInstrumentSchedules = {
		DrumKick = 0.8,
		DrumSnare = 0.2
	}
	
	MelodicInstrumentSchedules = {
		LeadGuitar = 8.0
	}

Now the lead guitar can give the bassist important percussive information. Remember this:

 	FInstrumentData = BassInstrument;
	int OctaveMin   = 1;
	int OctaveMax   = 2;
	
	Strategies = {
		RespondToCall = {
			InstrumentSchedules = {
				LeadGuitar = 1.0, 
			},
			Weight = 8.0,
			StartAtBar = 4, // In theory the ~beat after the lead guitar makes a "Call" 
			BarsToPlay = 1, 
		},
		PedalPoint = {
			InstrumentSchedules = {}, // Not listening for input from anyone 
			Weight = 2.0,
			StartAtBar = 4,
			BarsToPlay = 4
		}
	}

So here I’ve done a couple of things. I’ve collapsed the WhenPlaySchedules and BarsPerStrategy sections from the pseudo-data above into a single strategy object.

I’ve also added weighted InstrumentSchedules as a member of that object.

Now, Call and Response might actually run a bit differently. It might make more sense for this kind of strategy to take 2 instruments, Caller = guitar, Responder = Bass, and inject the appropriate instrument schedules.

But I think this makes sense in a general format.

Strategies of Pure Music

There is a slight issue.

InstrumentSchedules are essentially

[ 1.0, { Sound = Db_SynthThing_Octave1, Volume = 1.0 } ],
[ 3.0, { Sound = Bb_SynthThing_Octave1, Volume = 1.0 } ]

Where it’s playing Db and Bb on beats 1 and 3.

This is perfect for instruments, providing all the information they need to play the sounds at the correct time without having to derive any nonsense.

Not great if you want to create a Motif, and have the guitar and bass play the motif at the same time.

For something like “Create Motif”, the perfect data object would be something like

[ Db, Bb, F, Db ] // III, I, V, III in Bb minor

So how do I represent this data and feed it into the ComposerData object?

Well what is interesting about this data is the entire suite of logic it infers exists.

Something needs to create this data, something that knows about music theory in pure music theory terms.

And I already have large sections of that figured out - I have arrays of Notes built into scales of many different modes.

So creating this is actually fairly simple; hooking it into ComposerData is what’s needed.

Scale Scale = [ C, D, E, F, G, A, B ]
Scale[] Motifs = [
//  I  I  V  III
	[ C, C, G, E ], 
	[ C, B, A, G ],
	[ C, F, A, G ] 
]
Chords = [
//       I            I            V           III
	[ [ C, E, G ], [ C, E, G ], [ G, B, D ], [ E, G#, B ] ]
]

So things start off from the scale.

From the scale we can have motifs.

From motifs we can have chords.

Or have that reversed; it doesn’t matter what comes first, as long as what comes next considers what came before it.

A chord itself is a collection of notes to be played at the same time.

An InstrumentSchedule will flatten those chords out into three simultaneous sounds on beat 1, another three on beat 2, and so on.
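So a single C major chord on beat 1 flattens into three simultaneous entries (same shape as the schedule above; the instrument and octave names are illustrative):

[ 1.0, { Sound = C_SynthThing_Octave2, Volume = 1.0 } ],
[ 1.0, { Sound = E_SynthThing_Octave2, Volume = 1.0 } ],
[ 1.0, { Sound = G_SynthThing_Octave2, Volume = 1.0 } ]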

InstrumentSchedules have no concept of musicality; they have no knowledge of notes.

Nor should they: drums convert into InstrumentSchedules and they have no notes (unless I did some gangster shit and mapped A = Snare, Ab = Kick, B = HiHatClosed).

Actually, now that I think about it, that’s essentially what Ableton does for MIDI drums. But not really; that’s mapping a MIDI controller piano roll to a drum rack. There’s no reason to assume that internally it treats the kick as some arbitrary note.

 	FInstrumentData = BassInstrument;
	int OctaveMin   = 1;
	int OctaveMax   = 2;
	
	Scale = [ C, D, E, F, G, A, B ]
	
	Strategies = {
		RespondToCall = {
			InstrumentScheduleData = {
				{
					Instrument = LeadGuitar,
					AppropriatenessForStrat = 1.0,
					MusicalStructures = { [ C, C, G, E ] }
				}
			},
			AppropriatenessToChooseStrat = 8.0,
			StartAtBar = 4,
			BarsToPlay = 1,
		},
		DoubleRootOfChord = {
			InstrumentScheduleData = {
				{
					Instrument = ChordPiano,
					AppropriatenessForStrat = 1.0,
					MusicalStructures = {
						[
							[ C, E, G ],
							[ C, E, G ],
							[ G, B, D ],
							[ E, G#, B ]
						]
					}
				}
			},
			AppropriatenessToChooseStrat = 2.0,
			StartAtBar = 4,
			BarsToPlay = 4
		}
	}

So, we have some sort of MusicalStructure associated with an InstrumentSchedule, coupling the sounds an instrument needs with the musical data required by the composer into an InstrumentScheduleData object. A StrategyData object contains a set of ISDs along with the data for how this strategy would fit in the structure of the song.

It strikes me that this just defines the data set for choosing what to play next. The above object is missing what has already been chosen to play.

I’m wondering if this is simply adding the InstrumentSchedules to a set at the root of the data object. More so, I’m wondering if there’s metadata we’d need for this set of data. Like, would I want to record that the InstrumentSchedule [Db, Db, Db, Db] was created by the pedal point strategy?

It strikes me as probably definitely. We don’t necessarily need to store the entirety of a PossibleStrategy:

		ChosenStrategies = [
			{
				InstrumentSchedule = { [ C, C, C, C ] },
				StartAtBar = 4,
				BarsToPlay = 2,
				Strategy = PedalPoint
			},
			{
				InstrumentSchedule = { [ C, D, G, E ] },
				StartAtBar = 6,
				BarsToPlay = 4,
				Strategy = PlayMotif
			},
		]
		
		PossibleStrategies = [
			PlayMotif = {
				AppropriatenessToChooseStrat = 8.0,
				InstrumentInput = {
					{
						Instrument = LeadGuitar,
						AppropriatenessForStrat = 1.0,
						ChosenStrategies = {
							InstrumentSchedule = { [ C, D, G, E ] },
							StartAtBar = 6,
							BarsToPlay = 4,
						}
					}
				},
			},
		]

But we would need a separation of PossibleStrategies and ActualStrategies: