Musical Speech

A Transformer-based Composition Tool

[Requires a Chromium-based browser. Normal processing time is 15-30 seconds; wait times may be longer during heavy usage.]


Welcome! Our system generates a musical outline of an input speech sound. You can either choose one of the sample tracks or record your own 10-second speech sample below. The recorded audio is first processed to extract formant frequencies and amplitudes, which are used to construct piano notes from the speech. These notes are then trimmed based on speech features (e.g. peaks of the formant amplitude envelope). The resulting sparsified note sequence is passed to the transformer, which generates a new musical sequence. You can listen to the audio at each stage of the conversion process and visualize the spectrograms and the MIDI notes. You can also use the Mixing Controls to overlay the original speech sample and the generated music clips.
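The first two stages described above (formants to piano notes, then sparsification) can be illustrated with a minimal sketch. This is not the tool's actual implementation: the function names, the per-frame input format, the clamping to the 88-key piano range, and the strict local-maximum rule for peaks are all assumptions made for illustration. Only the frequency-to-MIDI formula (A4 = 440 Hz = note 69, 12 semitones per octave) is standard.

```python
import math

def freq_to_midi(freq_hz):
    """Nearest MIDI note number for a frequency (A4 = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))

def frames_to_notes(formant_hz, amplitudes):
    """Turn per-frame formant estimates into (frame, pitch, amplitude) notes.

    Assumed input: one formant frequency and one amplitude per analysis frame.
    Frames with a non-positive frequency are skipped; pitches are clamped
    to the piano range (MIDI 21..108).
    """
    notes = []
    for frame, (freq, amp) in enumerate(zip(formant_hz, amplitudes)):
        if freq <= 0:
            continue
        pitch = min(108, max(21, freq_to_midi(freq)))
        notes.append((frame, pitch, amp))
    return notes

def sparsify_by_peaks(notes):
    """Keep only notes whose amplitude is a strict local maximum.

    Hypothetical stand-in for 'peaks of the formant amplitude envelope':
    a note survives if it is louder than both of its neighbours in the list.
    """
    kept = []
    for k in range(1, len(notes) - 1):
        if notes[k - 1][2] < notes[k][2] > notes[k + 1][2]:
            kept.append(notes[k])
    return kept

# Example with made-up values for five frames.
formants = [220.0, 440.0, 880.0, 440.0, 220.0]
envelope = [0.1, 0.8, 0.3, 0.9, 0.2]
dense = frames_to_notes(formants, envelope)
sparse = sparsify_by_peaks(dense)
```

In this example the dense sequence has five notes and the sparsified one keeps only the two at the amplitude peaks. A sequence like `sparse` is the kind of input the transformer stage would then refine.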
Please note that once you click "generate outline", that segment of audio is temporarily stored on a remote server, even if you do not click "save". While we will do our best to remove any inappropriate material saved by users, we are not responsible for that content.

Listen to more samples produced using this tool. Return to project home page.

Mixing controls
Original Speech | Notes from Speech | Sparsified Notes | Transformer-refined Notes
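As a rough illustration of what overlaying these tracks involves, the sketch below mixes several sample streams with one gain per track. The function name, the list-of-floats representation, and the hard clipping to [-1, 1] are assumptions for illustration; the page does not describe how its mixer is implemented.

```python
def mix(tracks, gains):
    """Weighted sum of tracks sampled at the same rate.

    tracks: list of sample lists with values in [-1.0, 1.0]
    gains:  one gain per track (0.0 mutes a track)
    Shorter tracks are treated as silent past their end, and the
    result is hard-clipped to [-1.0, 1.0].
    """
    length = max(len(t) for t in tracks)
    out = [0.0] * length
    for track, gain in zip(tracks, gains):
        for i, sample in enumerate(track):
            out[i] += gain * sample
    return [max(-1.0, min(1.0, s)) for s in out]

# Example: speech at half volume under a full-volume generated clip.
speech = [0.5, -0.5, 0.5]
music = [0.25, 0.25]
mixed = mix([speech, music], [0.5, 1.0])
```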

Original Speech:

Notes obtained from Speech:

Sparsification Type:

Transformer-refined Notes: