Today, the most commonly used AI-powered music source-separation techniques work by analyzing spectrograms, which are heat map-like visualizations of a song’s different audio frequencies. “They are made by humans for other humans, so they are technically easy to create and visually easy to understand,” says Defossez.
Spectrograms, which can only represent sound waves as a montage of time and frequency, cannot capture such nuances. Consequently, they process a drumbeat or a slapped bass note as several noncontiguous vertical lines rather than as one neat, seamless sound. That is why drum and bass tracks that have been separated via spectrogram often sound muddy and indistinct.
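The "vertical lines" effect can be reproduced with a toy example. The sketch below is an illustration only, not the pipeline described in the article: it uses a naive short-time Fourier transform written with the Python standard library, and the function names (`stft_magnitudes`, `active_cells`), the frame size, the hop size, and the 10% threshold are all assumptions chosen for the demonstration. It compares a single click, standing in for a drum hit, with a steady tone.

```python
import cmath
import math


def stft_magnitudes(signal, frame_size=64, hop=32):
    """Naive STFT: one list of magnitudes (bins 0..frame_size//2) per frame."""
    # Periodic Hann window, applied to every frame before the DFT.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_size)]
        mags = []
        for k in range(frame_size // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def active_cells(frames, fraction=0.1):
    """Return (frame, bin) pairs whose magnitude exceeds a fraction of the peak."""
    peak = max(max(mags) for mags in frames)
    return {(t, k)
            for t, mags in enumerate(frames)
            for k, m in enumerate(mags)
            if m > fraction * peak}


LENGTH = 256

# A single click: silence everywhere except one sample (a crude "drum hit").
click = [0.0] * LENGTH
click[112] = 1.0

# A steady tone that completes exactly 8 cycles per 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(LENGTH)]

click_cells = active_cells(stft_magnitudes(click))
tone_cells = active_cells(stft_magnitudes(tone))

click_frames = sorted({t for t, _ in click_cells})
click_bins = sorted({k for _, k in click_cells})
tone_frames = sorted({t for t, _ in tone_cells})
tone_bins = sorted({k for _, k in tone_cells})

print("click: frames", click_frames, "| number of bins", len(click_bins))
print("tone:  frames", tone_frames, "| bins", tone_bins)
```

In this toy setup the click occupies only the frames that overlap it but lights up every frequency bin in those frames, which is the vertical line the article describes, while the tone stays in a few neighboring bins across all frames. A real drum hit is far more complex than a one-sample click, so this should be read as intuition rather than as a model of actual recordings.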
AI-based waveform models avoid these problems because they do not attempt to push a song into a rigid structure of time and frequency. Defossez explains that waveform models work in a similar way to computer vision, the AI research field that aims to enable computers to learn to identify patterns from digital images so they can gain a high-level understanding of the visual world.
Defossez says his system can also be likened to the seismographic tools that detect and record earthquakes. During an earthquake, the base of the seismograph moves but the weight hanging above it does not, which allows a pen attached to that weight to draw a waveform that records the ground’s motion. An AI model can detect several different earthquakes happening at the same time and then infer detail about each one’s seismic magnitude and intensity. Likewise, Defossez’s system analyzes and separates a song as it actually is, rather than chopping it up according to the preconceived structure of a spectrogram.
Listen to examples in the full article:
https://tech.fb.com/one-track-minds-usi ... eparation/