Building QMidiGen: A Procedural MIDI Composer in Rust

TL;DR: QMidiGen is a procedural MIDI music generator I’ve been building, spiritually descended from a 1999 Windows shareware program called Melody Raiser. It runs natively on Linux, macOS, and Windows, generates complete multi-section songs from musical rules rather than samples, and has an optional MCP server so AI agents can ask it to compose things. The stack is Rust talking to Qt/QML through cxx-qt, which is genuinely interesting and occasionally a nightmare.


The 1999 Inspiration

Melody Raiser was a piece of Windows 9x shareware by a Japanese developer named Yoji Ojima. It did one thing: generate MIDI songs procedurally from a handful of parameters. No samples, no loops, no pre-recorded anything — just math turning into music. You’d set a key, a tempo, pick a genre, and it would produce a completely original piece that you could export or keep generating until something clicked.

I found it because I was deep into RPG Maker 95 and RPG Maker 2000 at the time, which meant I needed music. It was delightful — for a 12-year-old.

It was also last updated around the time the first Matrix movie came out. The program runs in a compatibility shim on modern Windows, doesn’t run natively on anything else, and the source code is not available. So the options were: keep running a binary that is more than 25 years old through Wine, or build an evolution of the idea that actually runs in the current decade, with a piano roll editor, multi-section song structures, real-time synthesis, native file dialogs, cross-platform packaging, and an MCP server for good measure.

I went with the second option, which has taken considerably longer than I originally estimated. Shocking.


Why Rust

The natural choice for a Qt app in 2026 is C++. Qt is written in C++. QML interoperates with C++ directly. The ecosystem, the documentation, the examples — all C++. Picking Rust and then bridging it back to C++ is the kind of decision you have to be able to explain.

The honest answer is that I write Rust for fun and C++ for penance. The more defensible answer is that the music generation logic is pure computation — no UI state, no event loops, no Qt objects, just data going in and MIDI tracks coming out. That kind of code is exactly where Rust earns its keep: deterministic ownership means the generation functions are straightforward to test in isolation, the type system catches whole categories of bugs that C++ would happily let through, and cargo test works without standing up a Qt application. The generator has hundreds of rules and edge cases; being able to run them headlessly matters.

The tradeoff is that anything touching Qt still has to cross the C++ boundary, which brings us to cxx-qt and most of the technical headaches in this project.


Why MIDI

MIDI is a 1983 standard. It is 43 years old. It has survived every “this will replace it” moment in digital music production and is more widely supported today than it’s ever been.

The reason is that MIDI is not audio; it’s instructions. A MIDI file doesn’t record sound; it records “at this timestamp, play this note on this instrument at this velocity.” The audio you hear depends entirely on what interprets those instructions. Change the soundfont, change the instrument assignments, change the synthesis engine, and the same MIDI file sounds completely different. You can take a MIDI composition from 1987, load it into a modern synthesizer, and it will sound the way the composer intended it to sound, except better, because the synthesis has improved in the intervening 39 years.

For a procedural generator, this is exactly right. The generator produces structure — chord progressions, melody lines, rhythm patterns, harmonic relationships between parts. It doesn’t care what those things sound like. That concern belongs to FluidSynth and whichever soundfont you’re using, and those are completely swappable at runtime.

The midly crate handles all the MIDI serialization. It’s well-designed and the API maps cleanly onto what the generator produces. The main complication is that MIDI tracks store delta times (ticks since the previous event) rather than absolute timestamps, so export involves sorting all note-on and note-off events by absolute tick and then walking the list to convert. Straightforward, but easy to get wrong if you’re not thinking about it.

pub fn export_midi(song: &Song, path: &Path, meta: &ExportMeta<'_>) -> Result<()> {
    let mut smf_tracks: Vec<Track<'static>> = Vec::new();

    let first_stanza = song.stanzas.first();
    let tempo_bpm = first_stanza.map(|s| s.tempo).unwrap_or(120.0);
    let ts = first_stanza.map(|s| s.time_signature).unwrap_or(TimeSignature::FOUR_FOUR);
    smf_tracks.push(build_tempo_track(tempo_bpm, ts, meta));

    // Flatten stanzas sequentially; one track per instrument slot across all sections.
    let max_tracks = song.stanzas.iter().map(|s| s.tracks.len()).max().unwrap_or(0);
    for track_idx in 0..max_tracks {
        smf_tracks.push(build_instrument_track(song, track_idx));
    }
    write_smf(smf_tracks, path)
}

The TICKS_PER_BEAT constant is 480. Higher values give finer timing resolution; 480 is a common sweet spot that DAWs handle without trouble and that gives enough precision for the generator’s rhythm cells without inflating file sizes.
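
The sort-then-walk conversion from absolute ticks to delta times can be sketched in a few lines. This is a minimal illustration, not QMidiGen’s actual exporter: AbsEvent, Kind, and to_delta_times are hypothetical names, and plain integers stand in for midly’s event types.

```rust
/// Hypothetical event kinds, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    NoteOff,
    NoteOn,
}

/// Hypothetical event with an absolute position in ticks.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AbsEvent {
    pub tick: u32,
    pub kind: Kind,
    pub key: u8,
}

/// Sorts events by absolute tick and returns (delta, event) pairs,
/// where delta is the number of ticks since the previous event.
/// At equal ticks, note-offs sort before note-ons so a repeated note
/// is released before it is retriggered.
pub fn to_delta_times(mut events: Vec<AbsEvent>) -> Vec<(u32, AbsEvent)> {
    events.sort_by_key(|e| (e.tick, e.kind == Kind::NoteOn));
    let mut prev = 0u32;
    events
        .into_iter()
        .map(|e| {
            let delta = e.tick - prev;
            prev = e.tick;
            (delta, e)
        })
        .collect()
}
```

With 480 ticks per beat, note-ons at ticks 0 and 480 and note-offs at ticks 480 and 960 come out with deltas 0, 480, 0, 480. The tie-break at equal ticks is a choice made for this sketch; the easy mistake is forgetting to sort at all and emitting deltas in generation order.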


Procedural Generation

The generator works from the outside in: song structure first, then harmonic skeleton, then individual voices.

Every generation run starts by picking a key, a tempo from a per-style range, and a set of section types from the preset. A “section” here is what the code calls a stanza — a distinct musical segment with its own tempo, time signature, instrumentation, and rules. An Overworld song might have an intro, two main sections, a bridge, and a return. A Boss Fight has different instrumentation, a denser rhythm grid, and a semitone lift going into the final section.

The harmonic skeleton is a chord progression generated from a scale palette appropriate to the style. The scale isn’t just major/minor — the JRPG subtypes draw from things like PhrygianDominant ([0,1,4,5,7,8,10]) for dungeon-appropriate tension, MelodicMinor for emotional sections, and Blues where the vibe calls for it.
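
To make the interval list concrete, here is a small sketch that turns a scale into MIDI pitches and stacks scale thirds into a triad. The function names are hypothetical and the stacking rule is the textbook one; the text above does not say how QMidiGen actually derives its chords.

```rust
/// Semitone offsets quoted in the text for Phrygian dominant.
pub const PHRYGIAN_DOMINANT: [u8; 7] = [0, 1, 4, 5, 7, 8, 10];

/// MIDI pitch of a 0-based scale degree. Degrees past the top of the
/// scale wrap into the next octave. No range checking: a sketch only.
pub fn degree_to_pitch(root: u8, scale: &[u8], degree: usize) -> u8 {
    let octave = (degree / scale.len()) as u8;
    root + 12 * octave + scale[degree % scale.len()]
}

/// Triad built by stacking scale thirds: degrees d, d+2, d+4.
pub fn triad(root: u8, scale: &[u8], degree: usize) -> [u8; 3] {
    [0, 2, 4].map(|step| degree_to_pitch(root, scale, degree + step))
}
```

With E (MIDI 64) as the root, degree 0 gives 64, 68, 71 (E major) and degree 1 gives 65, 69, 72 (F major). That major chord a semitone above the tonic is where much of the scale’s tension comes from.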

Once the chord progression exists, it drives everything else. The melody generates note choices from the chord tones plus scale passing notes, subject to a rhythmic “cell” that defines the accent pattern for that section. The bass line derives from the same root motion. The percussion pattern is templated to the style but has per-bar density variation built in, including occasional drop bars where the drums step aside. Counter-melody voices fill in between the primary melody’s phrase gaps.

The result is that everything in a section shares a structural skeleton — the tracks don’t generate independently and then get stacked, which is how you get music that feels like several programs running at the same time. Whether it always sounds good is a different conversation, but it sounds intentional.

# A slice of what a JRPG Combat preset looks like in YAML
name: JRPG
styles:
  - id: combat
    label: Combat
    tempo_range: [148, 180]
    scale: PhrygianDominant
    time_signature: "4/4"
    sections: [intro, a, b, a, climax]
    instruments:
      melody: TrumpetSection
      harmony: StringEnsemble
      bass: ElectricBassPick
      percussion: StandardKit

One implementation detail worth mentioning: the primary melody instrument is locked once per song and held consistent across sections. This sounds obvious in retrospect, but the first version picked independently per section and the result was disorienting in a way that took a while to diagnose. Music has continuity expectations that aren’t explicitly written in any rule.


cxx-qt: The Bridge That Works Until It Doesn’t

cxx-qt is a KDAB project that generates C++/QML bindings from Rust. The basic idea: you annotate a Rust module with #[cxx_qt::bridge], declare a QObject type inside it along with the fields that get exposed to QML as properties, and the build system generates the C++ glue code that makes that type a proper QObject that QML can data-bind against.

In theory this means the music generation logic stays in pure Rust and QML binds directly to the results. In practice this works well for the generation side — the AppController QObject exposes the song data and generation controls, QML observes property changes and updates the UI, and the generator never touches Qt types.

The friction appears at the edges. The cxx-qt bridge has opinions about what types can cross the boundary — QString, QList<T>, the cxx-qt-lib types. If you need to pass a nested data structure from Rust to QML you have options: JSON-serialize it and parse on the QML side (genuinely what I ended up doing for the piano roll note data), define a QObject wrapper, or fight the type system until one of you gives up. The piano roll has thousands of notes that need to be inspectable and editable from QML. Serializing to JSON and deserializing on access was the path of least resistance and has been fine in practice.
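
The JSON route can be sketched without any dependencies. The Note struct and its field names below are hypothetical, and a real project would more likely reach for serde_json; hand-rolled formatting is used here only to keep the example self-contained.

```rust
/// Hypothetical note shape; the field names are illustrative only.
pub struct Note {
    pub pitch: u8,
    pub start_tick: u32,
    pub length_ticks: u32,
    pub velocity: u8,
}

/// Serializes notes as a JSON array of flat objects. All fields are
/// integers, so no string escaping is needed.
pub fn notes_to_json(notes: &[Note]) -> String {
    let items: Vec<String> = notes
        .iter()
        .map(|n| {
            format!(
                "{{\"pitch\":{},\"start\":{},\"len\":{},\"vel\":{}}}",
                n.pitch, n.start_tick, n.length_ticks, n.velocity
            )
        })
        .collect();
    format!("[{}]", items.join(","))
}
```

The resulting string fits in a single QString property, and the QML side can turn it back into an array of objects with JSON.parse().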

The build script (build.rs) is where things get interesting. cxx-qt-build compiles the Qt C++ and generates the QML module registration, but it invokes cmake and ninja under the hood, which means the Rust build invokes a C++ build system that invokes a Rust build. Getting the dependency tracking right — so that any relevant change triggers a rebuild — required explicit rerun-if-changed declarations for every QML file, every C++ source, and the Rust bridge file:

// build.rs: explicit rerun-if-changed for QML, C++, and the Rust bridge
println!("cargo:rerun-if-changed=src/bridge/app_controller.rs");
println!("cargo:rerun-if-changed=src/file_dialog.cpp");
println!("cargo:rerun-if-changed=qml/components/PianoRoll.qml");
println!("cargo:rerun-if-changed=qml/components/TransportBar.qml");
// ... etc
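
Listing every file by hand gets tedious as the QML tree grows. One alternative is to walk the directories from build.rs and emit the directives programmatically. The helper names below are hypothetical; this is a sketch, not the project’s actual build script.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects files under `dir` whose extension is in `exts`.
pub fn collect_sources(dir: &Path, exts: &[&str]) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            found.extend(collect_sources(&path, exts)?);
        } else if let Some(ext) = path.extension().and_then(|e| e.to_str()) {
            if exts.iter().any(|wanted| *wanted == ext) {
                found.push(path);
            }
        }
    }
    found.sort(); // stable output order between builds
    Ok(found)
}

/// Formats the directive Cargo reads from a build script's stdout.
pub fn rerun_directive(path: &Path) -> String {
    format!("cargo:rerun-if-changed={}", path.display())
}
```

In build.rs this would be a loop that prints rerun_directive(&p) for each path returned by collect_sources(Path::new("qml"), &["qml"]). Cargo also accepts a directory in rerun-if-changed and scans it for modifications, which is coarser but needs no helper at all.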

Running qmllint as part of the build step turned out to be worth the setup cost. QML syntax errors produce cryptic failures from the C++ compilation step unless you catch them first with the linter, and by the time the error surfaces through cxx-qt-build’s output you’ve lost the file and line number. Putting qmllint first in build.rs means you get a clean error immediately.
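
A lint-first step might look roughly like this. It assumes a binary named qmllint is on PATH, which may need a different name or path on some systems, and that it exits non-zero when it finds problems; both function names are hypothetical.

```rust
use std::path::PathBuf;
use std::process::Command;

/// Builds the qmllint invocation for the given QML files.
/// Assumption: the linter binary is called `qmllint` and is on PATH.
pub fn qmllint_command(files: &[PathBuf]) -> Command {
    let mut cmd = Command::new("qmllint");
    cmd.args(files);
    cmd
}

/// Runs the linter and aborts the build script if it fails, so the
/// linter's own file-and-line output is the last thing printed.
pub fn lint_or_panic(files: &[PathBuf]) {
    let status = qmllint_command(files)
        .status()
        .expect("could not run qmllint; is it installed and on PATH?");
    if !status.success() {
        panic!("qmllint reported problems; see its output above");
    }
}
```

Calling lint_or_panic at the top of build.rs, before the cxx-qt build step, is what gets the readable error in front of the cryptic one.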


The C++ That Couldn’t Be Avoided

Three pieces of the project are C++ by necessity.

File dialogs. Nobody wants a Qt-themed file picker on a KDE desktop. The native dialog backends are platform-specific: kdialog on KDE, xdg-desktop-portal elsewhere on Linux, osascript on macOS, GetOpenFileNameW / IFileOpenDialog on Windows. Only the Windows entries are C or C++ APIs in the strict sense (kdialog and osascript are command-line tools, and xdg-desktop-portal is a D-Bus service), but the code that drives all of them is C++. It lives in src/file_dialog.cpp with a C-compatible header, and Rust calls it via extern "C":

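use std::ffi::c_char;

extern "C" {
    // c_char rather than i8: C's char is unsigned on some targets (e.g. aarch64 Linux).
    fn kdialog_pick_open(title: *const c_char, start_dir: *const c_char, filter: *const c_char) -> *mut c_char;
    fn kdialog_free(s: *mut c_char);
}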
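
The usual way to keep raw pointers like these out of the rest of the code is a safe wrapper that copies the returned string and frees it right away. The sketch below assumes the C side returns a heap-allocated string (or null on cancel) that the caller must hand back to the matching free function; that contract is my reading of the signatures, not something stated in the text. The two stub functions are stand-ins written in Rust so the example links on its own.

```rust
use std::ffi::{c_char, CStr, CString};

/// Stand-in for the C++ picker: returns an owned C string, or null
/// when the (pretend) user cancels. An empty title means "cancel".
extern "C" fn stub_pick_open(title: *const c_char) -> *mut c_char {
    let title = unsafe { CStr::from_ptr(title) };
    if title.to_bytes().is_empty() {
        return std::ptr::null_mut();
    }
    CString::new("/tmp/song.mid").unwrap().into_raw()
}

/// Stand-in for the matching free function.
extern "C" fn stub_free(s: *mut c_char) {
    if !s.is_null() {
        unsafe { drop(CString::from_raw(s)) };
    }
}

/// Safe wrapper: copies the result into a String, then releases the
/// original through the same side that allocated it.
pub fn pick_open(title: &str) -> Option<String> {
    let title = CString::new(title).ok()?; // rejects interior NUL bytes
    let raw = stub_pick_open(title.as_ptr());
    if raw.is_null() {
        return None;
    }
    let path = unsafe { CStr::from_ptr(raw) }.to_string_lossy().into_owned();
    stub_free(raw);
    Some(path)
}
```

The point of the pattern is that the pointer never outlives pick_open: callers only ever see an Option<String>, and the allocation is released by whichever side created it.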

App icon. Setting the application icon requires a Qt call that has to happen before QGuiApplication is fully initialized. That’s a qmidigen_set_app_icon() C++ function called from main.rs.

QML engine guard. The QQmlApplicationEngine::load() call prints errors to stderr but does not panic or return an error code if QML fails to load. You get a blank window and no indication of what went wrong. qml_guard.cpp provides a C function to check whether root objects were created, which lets main.rs detect the failure and exit with a useful message rather than silently showing nothing.

The plasma-integration race condition deserves its own paragraph. KDE’s plasma-integration plugin, loaded when Qt detects a Plasma session, calls QQuickStyle::setStyle("org.kde.breeze") during application construction. That call initializes an internal static singleton to "org.kde.breeze" and locks it — any subsequent setStyle() call is silently ignored. If your app needs the Basic style (which QMidiGen does, because its QML is built for Basic and Breeze touches things it shouldn’t), you have to win the race by calling setStyle("Basic") before QGuiApplication::new(). Miss the race and you get white borders, null crashes under Kvantum, and a debugging session that starts with “why does this only happen on KDE.”

// Must be called before QGuiApplication::new() — wins the plasma-integration
// static-init race before it can lock the style to org.kde.breeze.
unsafe { qmidigen_reset_quick_style(); }


MCP: The Optional Part

The qmidigen-mcp crate is a standalone binary with zero Qt dependency. It implements the Model Context Protocol — the open standard for giving AI agents structured access to tools — over stdin/stdout JSON-RPC. Wire it into any MCP-compatible client and the agent can ask the generator to compose music.

The key design decision was keeping it completely separate from the GUI application. The MCP server doesn’t need Qt and shouldn’t need Qt. It directly includes the generation source:

// mcp/src/main.rs — directly path-includes the shared Rust modules
#[path = "../../src/music/mod.rs"] mod music;
#[path = "../../src/midi/mod.rs"]  mod midi;
#[path = "../../src/audio/mod.rs"] mod audio;

This is unorthodox and somewhat frowned upon in polite cargo society, but the alternative — pulling the generation logic into a separate crate that both the GUI and MCP server depend on — would require restructuring the whole project to accommodate a feature that most users will never use. The current approach keeps the GUI crate simple and builds the MCP server in isolation when needed.

The MCP server exposes tools for listing presets, generating full songs, generating individual sections, playing audio via FluidSynth, and exporting to audio formats. An agent can ask for a dungeon theme in F# minor, get back a MIDI file path, and either play it or export it to MP3. The generation is seeded, so specifying a seed gets you the same piece every time — useful for iterating on something you like without having to describe it all over again.
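
Seeded reproducibility comes down to routing every random choice through one generator initialized from the seed. The text does not say which generator QMidiGen uses; the sketch below swaps in SplitMix64 because it fits in a dozen lines, and pick_params is a hypothetical stand-in for the first couple of decisions in a generation run.

```rust
/// SplitMix64, a small deterministic PRNG (not the project's confirmed choice).
pub struct SplitMix64(u64);

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self(seed)
    }

    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform-ish integer in lo..=hi (modulo bias ignored for brevity).
    pub fn range_inclusive(&mut self, lo: u32, hi: u32) -> u32 {
        lo + (self.next_u64() % u64::from(hi - lo + 1)) as u32
    }
}

/// Picks a tempo from the Combat preset's range and a key (0 = C .. 11 = B).
pub fn pick_params(seed: u64) -> (u32, u8) {
    let mut rng = SplitMix64::new(seed);
    let tempo = rng.range_inclusive(148, 180);
    let key = rng.range_inclusive(0, 11) as u8;
    (tempo, key)
}
```

Calling pick_params twice with the same seed returns the same pair. The guarantee only holds as long as the order of draws never changes, so adding a new random decision early in the pipeline shifts everything after it.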

This is entirely optional. The main application doesn’t know the MCP server exists. You build it separately (ansible-playbook playbooks/qmidigen.yml -t mcp), configure it in your MCP client, and it’s there. Skip it entirely and nothing changes.


The Parts That Were Unexpectedly Painful

AppImage packaging. An AppImage needs to bundle everything it uses. Qt’s Wayland stack includes a GPU buffer sharing layer (wayland-graphics-integration-client plugins) that Qt needs to use GPU rendering on Wayland. Without them, Qt prints “Available client buffer integrations: QList()” and crashes with “Failed to create RHI” — a crash message that gives you no indication that the missing thing is a Wayland plugin you’ve never heard of. The fix is bundling those plugins explicitly; finding out that’s the fix required reading Qt source.

Similarly, AppImages built against a newer Qt will pull in QML plugins that link against the bundled Qt version. If you forget to bundle those plugins’ libQt6*.so dependencies and the user has an older system Qt in their path, the plugins find the system library at runtime and crash on the first symbol mismatch. The fix is scanning every QML plugin .so with ldd at package time and bundling anything it links that isn’t already present.
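
The scanning step is mostly text processing on ldd’s output. The sketch below covers only the parsing half, with a hypothetical function name; running ldd on each plugin and copying the files is left out.

```rust
/// Extracts the resolved paths of Qt 6 libraries from `ldd` output.
/// A typical line looks like:
///   libQt6Core.so.6 => /usr/lib/libQt6Core.so.6 (0x00007f...)
pub fn qt_deps_from_ldd(output: &str) -> Vec<String> {
    let mut deps = Vec::new();
    for line in output.lines() {
        // Lines without " => " (e.g. linux-vdso) carry no path to bundle.
        let Some((name, rest)) = line.trim().split_once(" => ") else {
            continue;
        };
        if !name.starts_with("libQt6") {
            continue;
        }
        // Skips "not found" entries, which have no absolute path.
        if let Some(path) = rest.split_whitespace().next() {
            if path.starts_with('/') {
                deps.push(path.to_string());
            }
        }
    }
    deps.sort();
    deps.dedup();
    deps
}
```

Each returned path is a candidate for bundling; anything already inside the AppDir can be filtered out before copying.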

Windows emoji rendering. Qt on Windows under the Basic style cannot reliably render SMP Unicode characters (anything above U+FFFF) in Text elements. The mute icon, the track lock icon — these used emoji from the U+1F5xx range. On Windows they rendered as boxes. The fix was replacing them with BMP-range Unicode symbols (⊗, ♩, ♪, ■, □) that Qt’s text rendering can actually handle. It works, but discovering that Qt silently ignores SMP characters instead of falling back gracefully required more time staring at a mostly-blank Windows VM than I’d like.
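
A cheap guard against this regressing is a check that flags any character above U+FFFF in UI strings. The function name is hypothetical; the test itself is just a code point comparison.

```rust
/// Returns the characters in `text` that lie outside the Basic
/// Multilingual Plane, i.e. with a code point above U+FFFF.
pub fn chars_above_bmp(text: &str) -> Vec<char> {
    text.chars().filter(|c| u32::from(*c) > 0xFFFF).collect()
}
```

The replacement symbols ⊗, ♩, ♪, ■ and □ all pass, while an emoji such as U+1F507 is reported. Run over the QML sources in a unit test, this catches a stray emoji without needing a Windows VM.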

Wayland vs. X11 menu grabs. Qt defaults to the xcb (X11) platform plugin even inside a Wayland session, which runs the app through XWayland. On XWayland, X11 menu grabs are often silently rejected by KWin — popups render but clicks don’t register. The fix is forcing QT_QPA_PLATFORM=wayland when a Wayland session is detected. The detection requires checking three different environment variables because there’s no single authoritative indicator.
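
The text does not name the three variables, so the ones below (XDG_SESSION_TYPE, WAYLAND_DISPLAY, WAYLAND_SOCKET) are an assumption made purely for illustration. The lookup is passed in as a closure so the logic can be tested without touching the real environment.

```rust
/// Returns true if any of the checked variables suggests a Wayland session.
/// Assumption: the variable names are guesses, not the project's actual list.
pub fn wayland_session<F>(env: F) -> bool
where
    F: Fn(&str) -> Option<String>,
{
    let non_empty = |key: &str| env(key).map_or(false, |v| !v.is_empty());
    env("XDG_SESSION_TYPE").as_deref() == Some("wayland")
        || non_empty("WAYLAND_DISPLAY")
        || non_empty("WAYLAND_SOCKET")
}
```

In the application this would be called as wayland_session(|k| std::env::var(k).ok()), and QT_QPA_PLATFORM would be set to wayland before the Qt application object is constructed.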


Current State

The 0.1.1 release is out, with AppImage for Linux, DMGs for both Apple Silicon and Intel macOS, and an NSIS installer for Windows. The JRPG preset is where most of the generation work has gone — 13 subtypes, each with distinct instrumentation, chord palettes, and section structures. The other genres (Pop, Jazz, Blues, Classical, Funk, Ambient) exist and work but are less opinionated.

The piano roll editor does what a piano roll editor should do: add, delete, move, resize notes; rubber-band selection; bulk operations; playhead scrubbing. It’s not competing with a DAW. It’s for making targeted edits after generation, not composing from scratch.

AUR packages are available (qmidigen stable, qmidigen-git VCS), and a FreeBSD port (audio/qmidigen) is in the tree.

It’s been a more interesting project than most. The domain — procedural music with real-time synthesis — turns out to have a lot of surface area. And the stack, whatever its rough edges, is more comfortable to work in than a pure C++ Qt app would have been. Melody Raiser earned its place in the story. This is just where it goes next.