Suno AI Revolutionizes Music Generation

Before we begin, readers might want to put on headphones and listen to this piece:

Do you still remember the shock of using ChatGPT for the first time? Today, ChatGPT has woven itself into our lives in countless ways and become a “normal” part of them. Yet if we stop to think, it has been less than two years since OpenAI launched ChatGPT (on November 30, 2022), a shorter span than the time since the COVID-19 pandemic began. Just as that initial shock has faded, I recently discovered an AI product that has impacted me just as much as ChatGPT did. If all goes well, it will replace many jobs in the music industry that produce functional music. So, how long will it take for us to grow accustomed to this new AI?

On March 21, 2024, Suno AI officially released their V3 model as a stable release (the first official version following its initial release at the end of 2023). This AI can generate a piece of audio lasting one and a half to two minutes from just a few lines of user description. In advanced mode, users can also supply lyrics and choose whether to generate pure instrumental music. The piano piece you are listening to is the result of entering “Classical baroque orchestra” in pure-music mode. The orderly alternation of piano and strings, especially the start of a second movement near the end, demonstrates a high level of polish. If I chose to extend it, I could expand this piece into a complete composition of ten, twenty, or even thirty minutes. That it can achieve such results with Baroque, one of the most intricate branches of classical music, makes me eager to try other styles. Does this AI perform so well because of the complex yet structured nature of classical music, or does it have this capability across all styles?

In contrast to the orderly format of classical music, the first style that comes to mind is jazz. Perhaps I can ask the AI to rewrite “Autumn in New York”? I typed on the keyboard: “Jazz song about Autumn in New York, female vocalist, with free improvisation,” and clicked generate:

[Verse]
Swaying leaves, a crisp cool breeze
Autumn comes to New York with ease
A golden hue, the city’s in view
Romantic moments, we’re about to pursue

[Verse 2]
The city streets, colored autumn shades
Love’s symphony, the jazz parade
We’ll wander hand in hand, exploring the sights
Lost in the magic of these autumn nights

[Chorus]
Autumn serenade, in New York we’ll sway
With every note we play, love’s melody will stay
Embraced in each other’s arms, we’ll float away
In this jazz-infused autumn serenade

As light piano and double bass play gently in the background, a soft, lazy female voice emerges. Although the instrumental parts are not very complex, the atmosphere is somewhat comparable to Ella Fitzgerald’s recording of “Autumn in New York.” However, the lead vocal has a slightly synthetic quality that feels a bit out of place in this song. After trying a few more versions, I found that unless the output is purely instrumental, the vocal parts always carry that synthetic edge.

Returning to the main question: can Suno AI perform well in jazz, too? To be honest, the song it just generated, in both chord progression and arrangement, leans toward pop, making it hard to showcase the distinctive “irregular complexity” of jazz. I then tried various keyword combinations but never achieved anything approaching John Coltrane’s “Giant Steps.” Next, I attempted to generate a modern representative of jazz: Fusion. However, the soulful electric guitar solo not only lacked the scalar complexity that masters like Guthrie Govan employ, but also fell into a monotonous barrage of rapid-fire sixteenth notes. Honestly, I was quite disappointed in this respect, but also somewhat relieved.

A Fusion piece generated by Suno AI, which I think is rather average

(You could also argue that this isn’t Fusion…)

Perhaps Suno AI is not as proficient at “improvisation” as it is at writing structured classical music (an odd statement, since the entire generation process is itself one large improvisation). But on second thought, improvisation is not the main purpose of Suno AI. AI music is not a new phenomenon. Two years ago, I wrote an article titled “AI, Music, and Mass Culture” on WeChat, which discussed an AI that continued Schubert’s unfinished Eighth Symphony and a music-creation AI named AIVA. Some of the music featured in that article was no worse than what I’m sharing today. The biggest difference is this: two years ago, each generated piece was the hard-won result of large research teams working for years before it could reach the market; today, I can obtain comparable output simply by typing a few words. The technology is no longer an expensive curiosity. When a product moves from the lab to the market, it has found its positioning. For AI music, that positioning is certainly not replacing top jazz musicians or violinists, but drastically reducing the cost of functional music such as film scores and game OSTs. So, away from the difficulties of classical and jazz music, how does Suno AI perform in these more practical scenarios?

For independent game developers, being able to obtain high-quality background music at a low cost is incredibly beneficial. Thus, I input the following description: “RPG game, Boss battle OST, cyberpunk background, scene set in Chinatown,” and received the piece titled Neon Thunder:


Just a few seconds into the piece, I was already swinging through the night city as Spider-Man, wielding a titanium katana to slash at riot robots. Could any music be more fitting than this?!! It’s simply exhilarating! I also noticed a very interesting characteristic: this piece’s mixing is much better than that of the earlier jazz and classical pieces, perhaps because electronic music is built from synthesized sounds, and the musicians who make it are generally more adept at mixing, so the training data is likely of higher quality. By this reasoning, genres that are inherently wild, or unafraid of synthetic timbres, should offer Suno AI great opportunities.

So I tried metalcore and study-session lofi (two styles you’d never expect to see side by side, yet here they are; applause 👏👏👏!). First, the drums in Brutal Regression gave me an exhilarating experience, and since metal vocals are often indecipherable anyway, the synthetic voice hardly mattered. Meanwhile, this Lofi Study Session needs only a cover image of cherry blossoms and a pixel-art anime girl to rack up +999 comments on NetEase Cloud Music.

The consequences of pressing this button are on you 👆

Lofi study music

In summary, Suno AI has its areas of strength, and those areas can already handle the majority of work that treats music as a production factor. I can’t help but wonder: in five or six years, will listening to music become a matter of selecting a few favorite tags on Spotify and waiting for the site to generate a brand-new three-hour playlist? Of course, some may object that what we love is music infused with human emotion. I certainly agree; I hope each album is inspired by a touching story behind it, not just set.seed=42. I also hope that the musicians we admire remain mostly human. However, performing musicians are only one part of the whole pyramid. As the demand for functional music gradually approaches zero, can that great pyramid still stand? Truly, for whom does the bell toll?

About half a year ago, I watched an anime called “Vivy: Fluorite Eye’s Song,” which tells the story of an AI diva who, to save humanity from destruction by artificial intelligence, sends her consciousness back a hundred years to correct history. The entire story revolves around the protagonist Vivy’s exploration of “how to infuse love into her songs” (a goal set by her creator), and her questioning of herself and her surroundings drives the plot. In my view, this is humanity delegating one of its own dilemmas (the definition of love) to AI and seeking the answer through numerical approximation (Vivy’s century of repeated attempts). Today’s Suno AI is much like Vivy, creating music by learning from songs that have already been infused with love. When we use Suno AI, we are, like Vivy’s creators, shirking our own responsibilities and abilities.


Vivy: Fluorite Eye’s Song

Therefore, in any field touched by AI, what we should do is not lament how much human work will be replaced, but think about the right ways to collaborate: neither 100% AI-created nor purely human effort. The ChatGPT API, licensed out to countless platforms and spawning a wealth of new businesses, products, and remarkable tools, is an excellent example. For that reason, we shouldn’t only mourn; we should also celebrate, shouldn’t we?

(Image source: Internet)

Related Articles:

AI, Music, and Mass Culture

Observations and Research 01—A Brief History of Jazz Development

