Austrian composer Peter Ablinger has transformed a recording of a child speaking so that it can be played as MIDI events on a mechanically controlled piano, making the piano a kind of loudspeaker for speech. Via Matrixsynth, the readers at Hack a Day get fairly involved with how this may be working.

It seems not quite accurate to describe this as vocoding in the strictest sense; it is more a simple transformation to a (much) lower frequency resolution, namely the 88 keys of the piano. Ablinger, for his part, describes the events as “pixels.” It’s pretty extraordinary that without a bandpass filter, you get something approximating the noisy sibilance of the speech, but this seems to be the result of having lots of events (that is, lots of resolution in terms of time). Edit: Listening again, the short answer to how you can hear so much of the voice through the piano seems to be, you can’t; the original is almost certainly mixed in. It’s nonetheless an interesting effect, and I’d like to hear the piano on its own.

In other words, the basic process is: 1) convert the sound spectrum of the recorded voice to a series of MIDI events, and 2) play back the translated MIDI file. You can see that the MIDI playback is accomplished with Pd (Pure Data) running on a Linux/KDE netbook, though it’s not clear what was used to do the original conversion. (The screenshot with side-by-side audio and MIDI appears as though it may be for demonstration purposes only.)
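As a rough illustration of step 1, here is a minimal Python sketch of that kind of "pixelation." It is not Ablinger's Pd patch, whose details aren't given here. It assumes one plausible approach: measure the energy at each piano key's frequency in short frames (using the Goertzel algorithm) and emit a note event for every key that is loud enough. The function names, the frame sizes, and the `floor` threshold are all hypothetical choices made for illustration.

```python
import math

def key_frequency(midi_note):
    """Equal-tempered frequency in Hz (MIDI note 69 = A4 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)

def goertzel_magnitude(frame, freq, sample_rate):
    """Magnitude of `frame` at one frequency, via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s1 = s2 = 0.0
    for x in frame:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    power = s1 * s1 + s2 * s2 - coeff * s1 * s2
    return math.sqrt(max(power, 0.0))

def audio_to_piano_events(samples, sample_rate, frame_size=1024, hop=512, floor=0.1):
    """Return (time_sec, midi_note, velocity) tuples for the 88 piano keys.

    Assumption: one event per key per frame whenever the key's level is at
    least `floor` times the loudest level found anywhere in the recording.
    """
    # MIDI notes 21..108 are the 88 piano keys; skip any above Nyquist.
    notes = [n for n in range(21, 109) if key_frequency(n) < sample_rate / 2]
    # Hann window to reduce leakage between neighbouring keys.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (frame_size - 1))
              for i in range(frame_size)]

    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_size)]
        mags = [goertzel_magnitude(frame, key_frequency(n), sample_rate)
                for n in notes]
        frames.append((start / sample_rate, mags))

    peak = max((m for _, mags in frames for m in mags), default=0.0)
    if peak == 0.0:
        return []  # silence, or input shorter than one frame

    events = []
    for time_sec, mags in frames:
        for note, mag in zip(notes, mags):
            level = mag / peak
            if level >= floor:
                events.append((time_sec, note, max(1, round(127 * level))))
    return events
```

Each tuple is (time in seconds, MIDI note number, velocity); writing the events to a MIDI file or sending them to a player piano is left out. With frames this short, the lowest keys are closer together in frequency than the analysis can resolve, which is one reason to expect only a coarse approximation of the voice.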

Correction: The work is absolutely done in custom software developed by the composer in Pd (Pure Data). It’s an ideal tool for the job, and free and open source. I wouldn’t dare try to replicate the results here, but this is fantastic inspiration for playing with sound in Pd.

One Windows tool that’s capable of the job is TS-AudioToMIDI, as observed by Hack a Day commenter spacecoyote. Whether or not that’s what’s at work here (and it may well be), that utility is itself interesting. Edit: Yeah, far more likely the whole thing was done in Pd. And Pd should be up to the task.

TS-AudioToMIDI

Of course, this is to say nothing of the lovely work done on the mechanical piano. It’s a beautiful piece. Here’s hoping some government bureaucrats got the message of the declaration. Now, we just need a chorus of something really loud – say a thousand trumpets – shouting out the Universal Declaration of Human Rights.

76 responses to “The Speaking Piano, and Transforming Audio to MIDI”

  1. Dub says:

    Gorgeous! Just gorgeous.

  2. I would have never guessed that a piano's frequency specificity would be sufficient for this kind of thing!

    It would be cool if he wrote a piece that transitioned gradually from something more recognizably "music" to "speech".

  3. I also wonder if he deliberately chose a kid with sibilant-heavy speech.

  4. vcd says:

    Interesting piece, but I'm not so sure you would be able to tell it was based on speech if the original audio was not played on top of it, or if the transcription was not being shown to read in time. The work done with the piano is pretty stunning though.

    Leave it to Peter to turn something borderline sensational into something completely sensationalized (re: last paragraph).

  5. shamburglar says:

    anybody know of any decent audio to midi apps for Mac?

  6. Adrian Anders says:

    TS-AudioToMIDI dev should invest the time to make a VST plug-in version of his software. I would be interested in it then.

  7. Dano says:

    @shamburglar

    Similar thing for Mac:
    http://widisoft.com/english/mp3-midi-products.htm

  8. Jay Smith says:

    I use WIDI for mac. I made a "player piano" video a while back for the ohm64 using it and show how it is done here http://www.youtube.com/watch?v=KkKESe_QdKE

  9. nick kent says:

    Well, he's encoding in a process seemingly similar to a vocoder's but decoding in a non-traditional way.

    If you think he's mixing the original with the piano then he's definitely cheating. It doesn't look realtime to me, but perhaps that might be a reason why it would be somewhat excusable to hear the original.

    If it is not cheating I think it is very impressive. A traditional vocoder adjusts each frequency band's volume continuously. A piano just has a velocity, a short attack, and a long decay that can be dampened, and that's besides the rich harmonic pitched sound vs. bandpass decoding.

  10. KULTURTECHNO says:

    Quadraturen…

    As a follow-up to the Palin song, here is a just-released YouTube clip about Peter Ablinger's Quadraturen, in which a mechanically played piano imitates speech.

    (via)

    ……

  11. kobe says:

    one word: Melodyne.

  12. […] The film explains well enough exactly how this works and why Peter Ablinger does it. (Direct link, via Create Digital Music) […]

  13. Ivica Bukvic says:

    Looks to me that the desktop was actually running Linux/KDE with Pd, rather than Windows. Also, the person at the computer looks an awful lot like Winfried Ritsch from IEM, where they do a lot of work with Pd. So, it seems unlikely that the Windows app in question is being used here and more likely that the whole thing is done in Pd…

  14. Peter Kirn says:

    @Ivica: You're completely right. That is indeed very clearly KDE. And there's a big honking "X" in the other window. 😉 So, yes, I agree, and I should get back to learning more Pd signal processing kung fu.

  15. Stij says:

    Wow. I've often wondered if something like this was possible, but I've never had any idea of how to implement it. If this is legit then it's very impressive.

  16. jens-oliver says:

    Here's a piano only video http://vimeo.com/1483630. Not the same text and with additional notes. Very amazing.

  17. Stij says:

    Hmm…yeah, it isn't nearly as intelligible without the original voice mixed in, but you can still hear some of the sibilants.

    It also sounds extremely creepy!

  18. GMM says:

    Wow this is amazing. And it is only a piano. Imagine when you have a whole orchestra scored and conducted to reproduce speech, and then further on, a whole orchestra running in realtime as a vocoder!

  19. Peter Kirn says:

    Here we go – here's the full explanation of how the whole thing works, including a blurry image of the Pd patch.

    http://ablinger.mur.at/docu11.html

    I must say, I love the idea of pixelation – this is something that, as a general approach, could be attached to a wide variety of work.

    Oh, and I actually prefer the more abstract rendition minus the overlaid speech. Who needs intelligibility? It's gorgeous.

  20. Fishboy says:

    Why are so many commenters focusing on the sibilants? What makes them more interesting than other phonemes/classes of phonemes?

  21. Peter Kirn says:

    I'm not a linguist, but sibilants are essential to understandability, and they're the thing that would theoretically be hardest to hear on a piano which is least able to produce broad-band noise (versus formants/vowels). If you listen to the piano without the voice, in fact, it's what seems to be largely missing.

  22. Fishboy says:

    So are you saying you hear vowels in the video without the actual voice layered in? http://vimeo.com/1483630 I couldn't hear a voice in that one, myself, at least not well enough to make out any words or phonemes – vowels, sibilants, or otherwise. I guess to my ear it sounded vaguely vocal. But anyway, I thought the most interesting would be vowels, especially diphthongs, since the language used is English.

  23. John says:

    I'm not clear, how is this concept of "pixels" really any different than that of wavelets?

    As an aside, I'm not wholly convinced that they *are* mixing in the original audio on the feature video. Upon hearing the kid's plain voice, his formant seems different than what is coming from the piano audio. Is there anything other than subjective listening which would indicate that they are mixing in the original audio? The Vimeo clip Fishboy links to is difficult to compare, simply because of the vastly different acoustics, different piano AFAIK, and it doesn't seem to have the dampening that the one above does.

    Interesting work regardless of this point.

  24. Dub says:

    Also covered by MeFi

  25. […] by Martin Poulter on 8 October 2009 An amazing hardware hacking project: a mechanical piano, computer-controlled, becomes a speech […]

  26. […] kraftfuttermischwerk & createdigitalmusic] […]

  27. […] # Create Digital Music » The Speaking Piano, and Transforming Audio to MIDI […]

  30. piker says:

    so what. he got a computer. good for him.

  32. […] appeared on Engadget on Fri, 09 Oct 2009 10:07:00 EST. Please see our terms for use of feeds.Read | Permalink | Email […]

  45. […] which I don't want to keep from you here. After stumbling across a speaking piano yesterday, this staircase piano is really the […]

  46. Jhhl says:

    Speaking orchestra? http://www.heraldscotland.com/speakings-a-new-mus
    Harvey has done excellent work for decades.

    To blow my own horn: my Amiga program RGS is a real-time spectrogram paint program (from 1987 originally), which could send out spectra as MIDI information, and so could make my (microtonally tuned) DX7 emit intelligible and unintelligible speech. http://www.echo.net/~jhhl/Mp3/RGS/

  47. Pianoman says:

    Intelliscore is another program that converts audio to MIDI. It works with the latest versions of Windows, including Vista and Windows7. The website is: http://www.intelliscore.net/

  48. […] was really excited by Peter Ablinger’s Speaking Piano–a system that takes human speech and translates it to a sequence of notes to be played on a piano […]

  49. Sylvaiw says:

    Why do you say the original voice is mixed with the piano? Where did you get this information? I can't find it.

    In my opinion only the piano is heard, and that's the whole interest of this thing.

  50. telfer cronos says:

    i'm sure you are right, sylvia.

  52. richardmullins says:

    “Listening again, the short answer to how you can hear so much of the voice through the piano seems to be, you can’t; the original is almost certainly mixed in”.
    I disagree. Surely the point of the exercise is that the original is not mixed in.

  54. Firekraag says:

    Any sound is a sum of pure frequencies in the 20-20k Hz range. Computers have done the same sampling job since sound cards were created, but I have to admit using a live piano is kinda sexier.

  57. Jim Bumgardner says:

    I’ve been trying to reproduce this effect in software using FFT and piano samples. Having a very hard time getting the voice to be nearly as intelligible as it is in these clips. At this point, I’m inclined to agree with Peter that the composer may be mixing in the original audio a bit (I’ve seen similar cheats in many photomosaics). Unfortunately, most of the writings on the composer’s site are about *why* he did it, rather than *how* he did it.

    • dvf says:

      Did you treat the piano samples as though they were sine tones, or did you apply the FFT to both the voice signal and the piano samples and then solve the system of triangular linear equations?

      • krazydad says:

        I did the former. You make a good point, the latter should do a better job of approximating the desired result. Will have to try it. Have you?

        • dvf says:

          I haven’t tried it. Sounds fairly computationally intensive, and I do all my work with interpreted rather than compiled languages. I do know how to do all the steps and I have coded an FFT (in PostScript, lol, but it works). Piano sample sets vary lots from one to another, so I suspect that makes a big difference. I’d go with the fairly percussive samples,

          • Lorenzo Peyrani says:

            It’s absolutely not mixed in. I got extraordinary results without any effort (I was lucky with the midi converter, go to this site: http://www.ofoct.com/audio…/convert-wav-or-mp3-ogg-aac-wma-to-midi.html); if you use a flute sound instead of the piano it gets even clearer. The quality of the result also probably depends on the timbre of the original voice. I used the TS Eliot recording of The Waste Land and you can even recognize Eliot’s particular accent (with just a shit midi playing!).
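The second approach dvf describes, fitting the voice spectrum with the actual spectra of the piano samples rather than treating each key as a sine tone, can be sketched roughly as follows. This is only an illustration under stated assumptions: it swaps in a nonnegative least-squares fit with multiplicative updates (so no note gets a negative loudness), which is not necessarily what anyone in the thread had in mind, and the function name and the toy spectra are made up for the example.

```python
def fit_note_weights(target, note_spectra, iterations=200, eps=1e-12):
    """Find weights w >= 0 so that sum_k w[k] * note_spectra[k] ~= target.

    `target` is one frame of the voice's magnitude spectrum and each entry of
    `note_spectra` is the magnitude spectrum of one piano sample, all of the
    same length and nonnegative. Uses the multiplicative update
    w[k] <- w[k] * (A^T b)[k] / (A^T A w)[k], which keeps weights nonnegative.
    """
    n_notes = len(note_spectra)
    n_bins = len(target)
    weights = [1.0] * n_notes
    # (A^T b)[k]: how strongly each note's spectrum matches the target.
    numer = [sum(spec[j] * target[j] for j in range(n_bins))
             for spec in note_spectra]
    for _ in range(iterations):
        # Current approximation of the target from the weighted notes.
        approx = [sum(weights[k] * note_spectra[k][j] for k in range(n_notes))
                  for j in range(n_bins)]
        for k in range(n_notes):
            denom = sum(note_spectra[k][j] * approx[j]
                        for j in range(n_bins)) + eps
            weights[k] *= numer[k] / denom
    return weights

# Toy example: two "notes" whose harmonics overlap in the second bin.
note_a = [1.0, 0.5, 0.0, 0.25]
note_b = [0.0, 1.0, 0.5, 0.0]
voice_frame = [2.0 * a + 0.5 * b for a, b in zip(note_a, note_b)]
print(fit_note_weights(voice_frame, [note_a, note_b]))
```

The resulting weights could then be scaled to MIDI velocities frame by frame. Because real piano notes share harmonics, a fit like this can account for that overlap instead of counting the same energy for several keys, which is the advantage dvf points to over the sine-tone assumption.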
