Data-Centric AI for Signal Processing Applications
For some applications like autonomous driving, speech recognition, or machine translation, the adoption of AI can count on large datasets and abundant research. In those domains, investments most often focus on improving system performance through the design of ever more complex machine learning and deep learning models. On the other hand, in most industrial signal processing applications, data tend to be scarce and noisy, tailored models very rare, and traditional AI expertise hard to find.
This talk focuses on how data-centric workflows driven by domain-specific expertise can be used to significantly improve model performance and enable the adoption of AI in real-world applications. Learn more about signal data and specific recipes related to improving data and label quality, reducing variance and dimensionality, and selecting optimized feature-space representations and signal transformations. Explore popular simulation-based methods for data synthesis and augmentation, and review the latest options for selecting suitable AI models to use as a starting point.
Published: 25 May 2022
[MUSIC PLAYING]
Hello, and welcome to this MATLAB Expo talk on the use of MATLAB for data-centric artificial intelligence, specifically for signal processing applications. My name is Gabriele Bunkheila. I work in product management at MathWorks. And for this session, I'm joined by my colleague Frantz Bouchereau.
Hi. I am Frantz Bouchereau. I'm a development manager. And I lead a team that builds some of our signal processing toolboxes.
Before we get started, a couple of reminders. First, you'll find these slides as a downloadable PDF in this BigMarker session. Second, if you feel like posting about this session on social media, it would be great if you could use the #MATLABEXPO tag. As for the two of us, you can find us on LinkedIn using the links here at the bottom.
But back to our topic-- Gabriele, when was the first time you heard about data-centric AI?
Well, the first time I thought about something along the lines of data-centric AI was around four or five years ago, when I came across this slide from a talk by Andrej Karpathy, the head of AI at Tesla. He was basically pointing out that while a lot of research publications focused almost exclusively on deep learning models and the latest network architectures, in industry, the vast majority of engineering efforts actually focused on the data, the data necessary to train those models.
What about you, Frantz? Was it around the same time that you, too, heard about data-centric AI?
Yeah. Actually, this is an idea we have been working on since we started exploring the use of AI for signal processing in my team. Now it has become clearer as Andrew Ng has been trying to formalize it. Many of you in the audience will know his work. But if not, I recommend that you go to this interview in the IEEE Spectrum as a starter.
Referring specifically to deep learning, he envisions an engineering movement where the key driver to using and applying AI is to focus on the data used to train the AI model. This is an alternative to the more traditional model-centric AI, which focuses on development of application-specific models, sometimes quite complex models, usually developed by deep learning experts.
Very interestingly, one of his key conclusions is that the focus on data allows you to work on a much smaller AI problem from many points of view. You'll be able to see the reasons for this over the next 20 minutes or so. So let's get started.
If you look at the evolution of AI, and more specifically the evolution of deep learning, you'll see that a few applications have accumulated a large amount of research based on very large and complicated deep learning models. Those large deep learning models need large amounts of data to be trained. So it shouldn't come as a surprise that the majority of this research came from very large web-centric and customer-facing companies that were able to collect those large amounts of data. This created a virtuous circle for a fast evolution of deep learning, with competency and workforce converging towards those big players.
Right. But if you look at the rest of the industries, there are a myriad of engineering applications out there that would benefit from using deep learning and AI. Traditional model-centric AI simply cannot be the answer for all of them, simply because it doesn't scale. How many companies have the capacity to hire entire new groups from scratch, collect huge data sets, and manage a whole new level of computational complexity?
The good news is that this isn't necessary. For example, one of my favorite takeaways from data-centric AI is that you don't need much deep learning or machine learning expertise. The engineering expertise in the driver's seat is in fact the one from the specific application, what you could call domain expertise.
Before seeing how, we're curious to know which of these three best describes your challenges in adopting AI today. Is it the complexity of the models, the deep learning networks-- understanding them, designing them, running them? Or is it the complexity or the size of the data needed to train them-- collecting it, annotating it? Or else, is it about hiring new experts or developing new expertise? To let us know, please use the interactive survey that should appear on the screen about now.
And while you give it a go, I'll just say that we'll look at the results together at the end of the session before starting Q&A. For now, those of you who are done responding could perhaps also take a few seconds to drop a quick line this time in the chat and describe your applications to us presenters and to the rest of your fellow attendees. For what application or task are you using or hoping to use AI these days?
Now, really, the three topics on the screen will also be the structure of the rest of this talk, correct?
Yes, kind of. The caveat is that for signal data, most of the time, we also preprocess it before feeding it to the deep learning networks. It's common to call this step feature extraction. And we'll talk about that, too. But, mind you, we won't be exhaustive. We'll only discuss three concrete approaches to put data-centric AI into practice for signal processing applications.
For models specifically, we'll focus on transfer learning. This is about repurposing, for your own application, deep networks developed by others. It means not having to design or train any network from scratch.
We'll then discuss the use of feature extraction for achieving complex tasks while using relatively small deep learning models. And, finally, we'll provide a few pointers on improving the actual training data, always with the objective of producing better AI models.
Let's start with transfer learning, using a concrete working example. Assume you're helping a firm that uses or produces air compressors. They want to identify faulty units based on the sound they produce. And they have some audio recordings that they labeled themselves, associated with a known set of different fault conditions. The data is divided into eight classes in total, one class for sounds from healthy machines and seven more classes for sounds produced by machines with several different types of faults.
We're going to see how to apply transfer learning to this problem, and you can review all the steps and the code offline at your pace using the link at the bottom of the slide.
The first step with transfer learning is always to choose a pretrained model to reuse. A great place to get started with that these days is the new MATLAB Model Hub on GitHub. But keep in mind that models developed in other tools like TensorFlow and PyTorch can also be brought into MATLAB in multiple ways.
For this example, I will directly use the Deep Network Designer app because on opening, it shows the list of pretrained deep networks that are ready to use. Here I choose the sound classification network called YAMNet. This was pretrained by Google using tons of audio clips from YouTube videos.
The idea of transfer learning is to reuse all that these layers have learned about patterns in sound recordings, but for a different task: here, to recognize faults in air compressors rather than to classify sounds across about 520 different types, as per the original design. To do that, I remove only the final layers, I replace them with fresh ones, and I configure those to only classify across eight new classes.
This new network now doesn't need to learn how to make sense of sound signals but only to optimize the layers that we just replaced. So the size of our new labeled data set is sufficient. And on a related note, retraining the network only takes a few minutes.
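For those who prefer code to the app, the same workflow might look roughly like the sketch below. It assumes the Audio Toolbox YAMNet support package is installed; the layer names passed to replaceLayer are placeholders to inspect and adjust, and XTrain and YTrain are hypothetical names for the extracted features and fault labels.

```matlab
% Rough sketch of transfer learning in code. Layer names, training options,
% and variable names (XTrain, YTrain) are illustrative, not prescriptive.
net = yamnet;                        % pretrained audio network (Audio Toolbox)
lgraph = layerGraph(net);

numClasses = 8;                      % 1 healthy class + 7 fault classes
% Placeholder layer names below; inspect net.Layers for the actual ones.
lgraph = replaceLayer(lgraph, "FinalFullyConnected", ...
    fullyConnectedLayer(numClasses, "Name", "fc_faults"));
lgraph = replaceLayer(lgraph, "FinalClassification", ...
    classificationLayer("Name", "class_faults"));

options = trainingOptions("adam", "MaxEpochs", 10, "MiniBatchSize", 64, ...
    "Shuffle", "every-epoch", "Verbose", false);
newNet = trainNetwork(XTrain, YTrain, lgraph, options);   % retrains in minutes
```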
Once retrained, we can test the network on sound samples that it hadn't seen before and review how each new recording is classified against its true fault type. We call this a confusion matrix. And when a model is trained successfully, you want to see most test samples on the diagonal, as in this case.
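In code, that evaluation step would look roughly like this, where XTest and YTest are hypothetical names for the held-out features and their true fault labels.

```matlab
% Evaluate the retrained network on recordings it has not seen before.
YPred = classify(newNet, XTest);     % predicted fault classes
accuracy = mean(YPred == YTest)      % overall test accuracy
confusionchart(YTest, YPred);        % most samples should sit on the diagonal
```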
I hope that my key points came across OK. But remember that for this example, you can review all the code and the step-by-step instructions at your own pace by following the link in the previous slide.
Now, if you wanted to go a bit deeper, you may have noticed already that this BigMarker session includes a few downloadable PDF handouts. Two of those are on transfer learning. The first is about an independent paper published recently in the Journal of Sensor and Actuator Networks. It's open access, fully downloadable, and it features a few concrete transfer learning experiments, including the very combination of network and data set that I've just walked you through.
One of the conclusions of the paper is that, when possible, it is best to choose a model pretrained with the same type of data as your own problem. But you can also do transfer learning with models pretrained on different types of data. And to help you navigate those options, the second handout here includes links to a couple of MATLAB examples that repurpose networks pretrained on images for classifying signals like communication waveforms and ECG recordings.
Gabriele, in your examples and handouts, are models being fed the raw signal data? A lot of pretrained networks out there work on images. So can you say a little bit more about this?
Great question, Frantz, and right on cue for our second big topic here, which is around feature extraction. We've said a couple of times in passing already that deep networks most often don't learn directly from the raw signals. When we retrained YAMNet a few minutes ago, for example, although we didn't show it, we didn't feed the network the audio waveforms from the air compressor data set but a transformation of those, looking more like this.
This one here is a specific type of feature extraction, and it belongs to the wider family of so-called time-frequency transforms. There are plenty of variations of these out there, depending on what characteristics of the signals you want to bring out and what type of signal you work with.
Having experience with the signal will most likely inform the right choice. With transfer learning, you often use a time-frequency transform when repurposing networks that were originally trained with images, since these representations of signals are two-dimensional, just like images.
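As a concrete, hedged illustration, here is one way to turn an audio recording into such a two-dimensional representation, using a mel spectrogram from Audio Toolbox; the file name is hypothetical, and this is not claimed to be the exact preprocessing used by YAMNet.

```matlab
% Turn a recording into a time-frequency image (a mel spectrogram here).
[x, fs] = audioread("compressor_recording.wav");   % hypothetical file name
S = melSpectrogram(x, fs);                         % mel bands x time frames
imagesc(10*log10(S + eps)); axis xy;
xlabel("Time frame"); ylabel("Mel band");
```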
Actually, time-frequency transformations are prevalent in many AI applications for signal processing. So I think it's worth stopping here for a minute to explain what time-frequency analysis is all about.
To transform a signal into the time-frequency domain, you follow these steps. You slice the signal using sliding, overlapping windows. And for each window, you compute a frequency transformation, for example, using the Fast Fourier transform.
You then concatenate all the spectral estimates and end up with a time-frequency map that shows how the signal's frequency content varies over time.
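Those steps map directly to a few lines of MATLAB; the window length and overlap below are just example values, and x and fs stand for your signal and its sample rate.

```matlab
% Slide a window over the signal, FFT each segment, and stack the spectra
% into a time-frequency map (window length and overlap are example values).
win = hann(256, "periodic");
noverlap = 128;
[S, f, t] = spectrogram(x, win, noverlap, 256, fs);
imagesc(t, f, 10*log10(abs(S).^2 + eps)); axis xy;
xlabel("Time (s)"); ylabel("Frequency (Hz)");
% One-call alternative: [S, f, t] = stft(x, fs);
```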
Frantz, is this really just about reformatting signals so we can use networks that were pretrained with images?
Actually, there is more to it than just size formatting. Time-frequency transforms convert signals into a domain where key signal characteristics are more evident. Look, for example, at these three types of signals buried in noise. If I gave you hundreds of realizations of these signals and asked you to classify them by eye, you would have a very hard time. However, if we transform the signals into the time-frequency domain, the differences are evident. You would have no issue classifying these signals this time around.
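You can reproduce that effect with a minimal sketch along these lines: a chirp buried in noise is hard to spot in the time domain but stands out clearly in a spectrogram (all parameter values are illustrative).

```matlab
% A chirp buried in noise: hard to see in time, obvious in time-frequency.
fs = 1e3; t = (0:1/fs:2)';
x = chirp(t, 10, 2, 200) + 2*randn(size(t));       % noisy linear chirp
subplot(2,1,1); plot(t, x); title("Time domain");
subplot(2,1,2); spectrogram(x, hann(128), 96, 128, fs, "yaxis");
title("Time-frequency domain");
```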
So that is the key. If you can spot differences more easily than in the raw waveforms, so can a machine learning model. Let me briefly discuss another concrete example. This one was taken from our documentation and explores the same time-frequency idea.
Say that you're developing a medical device for heart monitoring that analyzes ECG signals. You need to design a system that recognizes the three most significant segments of an ECG signal, called the P, QRS, and T waves, which are shown in this slide. You have a data set of signals, annotated by a cardiologist, that looks like this.
But notice that you only have a few signals. 210 of these signals is not a large data set. You are looking for a deep learning model able to recognize the regions of the signals automatically in the same way that a cardiologist would.
You have read that recurrent neural networks are a type of network that works well with signal inputs. So you choose one such network to solve your problem. You build a fairly simple model with only a few layers and a low number of weights, for three reasons. First, you're not a deep learning expert, so it is hard for you to conceive more complex models.
Second, you do not have enough data. So you can't expect to be able to train a complex model successfully. And third, you want to target a battery-powered device that has limited computational bandwidth.
Since recurrent neural networks work with signal inputs, you feed the raw data to the network. And you notice that you get very poor results, hitting low-percentage accuracy for each one of the ECG waveforms. Then you add a time-frequency transform before the network because, as we showed before, they tend to make the key signal features more evident, and that helps the AI model.
Time-frequency maps can also be seen as time sequences, by the way. In this case, these are Fourier transforms over time. Training the same network with this time-frequency transform produces significantly higher accuracy without having to change the model design.
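For reference, a small recurrent model of the kind described here might be set up roughly as follows; the layer sizes, number of classes, and variable names are assumptions for illustration, not the values used in the documentation example.

```matlab
% Minimal sketch of a small recurrent network trained on time-frequency
% feature sequences (all sizes below are illustrative).
numFeatures = 40;        % spectral features per time step
numHiddenUnits = 200;
numClasses = 4;          % P, QRS, T, plus an "other" region
layers = [
    sequenceInputLayer(numFeatures)
    bilstmLayer(numHiddenUnits, "OutputMode", "sequence")
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];
options = trainingOptions("adam", "MaxEpochs", 10, "MiniBatchSize", 50);
% XTrain: cell array of [numFeatures x numTimeSteps] feature sequences
% YTrain: cell array of categorical label sequences (one label per time step)
net = trainNetwork(XTrain, YTrain, layers, options);
```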
Right. So are you saying that we use features to improve the performance when we have to work with simple models?
Yes. But you can think about it the other way around. The same desired accuracy could be achieved with a much more complicated deep learning model. But you would require not just more data but more labeled data, which may be difficult to obtain.
You would also need to figure out a more complex deep learning architecture, which is not easy if you are not a deep learning expert. These two points I just made summarize the foundational role of feature extraction and are key to the data-centric AI idea. A data-centric approach reduces the need for data and for model complexity while allowing you to achieve the desired accuracy for your problem at hand.
Yeah, this makes sense to me. But many would ask, how do you choose the right features to extract?
That is definitely an important question, one that often haunts engineers who try to train an AI model. Given the virtually infinite choice of features, which ones should I choose? It's true that it depends on many factors. But it doesn't need to look more difficult than it actually is.
For example, if you borrowed a model that was trained with images of a certain size, then you may want to use a time-frequency transform that generates images of that size. Other times, the choice is driven by the nature of the signal and the application. For example, audio engineers know that there are certain time-frequency transformations that are great for representing variability in speech and acoustic signals. So they use these algorithms as feature extractors.
In the absence of more specific clues, there are approaches like wavelet scattering that generate features automatically for you, and actually, with the extra advantage of possible dimensionality reduction. We have tried wavelet scattering in many different applications with outstanding results.
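To make that more concrete, here is a minimal, hedged wavelet scattering sketch using Wavelet Toolbox; the signal length and sample rate are arbitrary, and the random signal is just a stand-in for your data.

```matlab
% Automatic feature generation with wavelet scattering (Wavelet Toolbox).
fs = 1e3; N = 2048;
sn = waveletScattering("SignalLength", N, "SamplingFrequency", fs);
x = randn(N, 1);                   % stand-in for one of your signals
features = featureMatrix(sn, x);   % scattering paths x time windows
size(features)                     % far fewer time samples than the raw signal
```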
Other engineers choose to try many features at once by using so-called bags of features. And then, they use experiments to test which combinations of features turn out to be the most effective. Note that these days, MATLAB provides a few of those bags of features for signal data, as well as apps like Experiment Manager, to systematically experiment with different combinations of deep learning models, features, and parameters.
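One hedged example of such a bag-of-features extractor, for audio signals, is audioFeatureExtractor from Audio Toolbox; the particular features enabled below are only an illustration, and x and fs again stand for your signal and its sample rate.

```matlab
% Extract several candidate features at once and test combinations later.
afe = audioFeatureExtractor("SampleRate", fs, ...
    "mfcc", true, "spectralCentroid", true, "spectralFlux", true);
feats = extract(afe, x);     % one row of stacked features per analysis window
```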
The bottom line is that domain experts know the nature of their signals and are very well placed to select the right feature extraction algorithms that make the most important signal features more evident to the AI models.
OK. Yeah, I really like your summary. So looking again at the handouts, you will find a couple more that are about feature extraction. Again, this is to help you go a bit deeper offline if you're interested.
The first is a post from the deep learning blog on mathworks.com, run by our own Johanna Pingel. It talks about a recent AI hackathon on geoscience data where a team of MathWorks engineers took home the first prize, basically thanks to feature extraction and signal processing. Their submission stood out as unique: they outperformed the runners-up on some of the most important metrics, and they only used a relatively simple deep network.
The second handout on feature extraction is a MathWorks user story from Daihatsu in Japan. Here, a team of automotive engineers was able to use deep learning while also getting up to speed with it. The task was the detection of engine knocking sounds. And feature extraction enabled them to achieve the required accuracy while also keeping model complexity very much under control.
Now, if I review what we've seen so far: with transfer learning, we said that despite potentially using large models, we only needed to learn the few weights of the layers that we replaced. We've just seen how using feature extraction effectively helps lower the complexity of the learning models. So both methods clearly point to requiring much less training data, which is a true game changer given how instinctively, I'd say, we've all learned to associate deep learning with big data.
It's always true that an AI model is only as good as the data used to train it. But while it's practically impossible to improve the quality of extremely large data sets, with reasonably sized data sets, domain experts can really make a difference. And by improving the quality of the training data, they can effectively drive up the quality of the AI models.
And so to conclude, I'd like to quickly review a few techniques for domain experts to have an impact on the quality of AI models through improving the quality of the data. When the raw data is available, the first thought should be to produce high-quality ground truth labels, for example, through MATLAB's new labeling apps, notably signalLabeler here.
When you're setting out to acquire new data, it may be that your best bet is to connect MATLAB to some data acquisition hardware, build your own app with App Designer, and enable engineers to label the data while they acquire it. In situations where getting real-world data is too complicated, it's very common to use simulations to synthesize prelabeled signals. And the MATLAB documentation itself is rich with examples involving even things like radar and wireless communication scenarios.
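As a toy sketch of that idea (not tied to any specific shipping example), you could generate prelabeled signals from a simple model and use them for training; everything below, from the class names to the signal models, is hypothetical.

```matlab
% Synthesize prelabeled "healthy" and "faulty" signals from a toy model.
fs = 8e3; t = (0:1/fs:1)'; numPerClass = 100;
healthy = cell(numPerClass, 1); faulty = cell(numPerClass, 1);
for k = 1:numPerClass
    healthy{k} = sin(2*pi*50*t) + 0.1*randn(size(t));
    faulty{k}  = sin(2*pi*50*t) + 0.5*sin(2*pi*173*t) + 0.1*randn(size(t));
end
X = [healthy; faulty];
Y = categorical([repmat("healthy", numPerClass, 1); repmat("faulty", numPerClass, 1)]);
```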
And, finally, in some situations, people do have real-world data. But they know that it doesn't represent the whole gamut of variability that their AI models need to learn. And in these cases, they use data augmentation to add that variability with a degree of randomization. In MATLAB, for example, audioDataAugmenter does just that with algorithms specific to speech and audio signals.
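A minimal sketch with audioDataAugmenter might look like this; the file name is hypothetical, and the default augmentation settings are kept for simplicity.

```matlab
% Create randomized variations of a recording (Audio Toolbox).
augmenter = audioDataAugmenter("NumAugmentations", 5);
[x, fs] = audioread("compressor_recording.wav");   % hypothetical file
out = augment(augmenter, x, fs);    % table with an Audio column
xAug = out.Audio{1};                % first augmented version of the signal
```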
If you're interested in the topic of signal labeling in particular, and you're watching Expo from the US West Coast or the APAC region, I encourage you to watch the upcoming talk by Honeywell Technology Solutions on the use of signalLabeler to annotate noisy speech signals from aircraft pilots and air traffic control. Just keep in mind that this is part of a different Expo track, AI and Engineering.
And with that, we don't have much left to say, except to quickly come back to our three most recurring challenges. I think that we've managed to see how the "unbiggen" effect of data-centric AI removes or significantly reduces all of these. We started to address model complexity with transfer learning, by simply repurposing pretrained models that already exist. And, in addition, we saw how feature extraction lets you avoid complex models even when learning from scratch. Either way, we saw that having large amounts of data available is not always required to adopt AI.
And with model engineering losing importance, domain and data expertise start to play the key role as they maximize the impact of both feature extraction and data quality. For signal data, most of this is achieved through signal processing methods and tools, which also explains why MATLAB is a natural home for these methodologies.
I agree with everything you said here, Gabriele. And I just want to end with a key idea. Over the last few years, as AI has become more prevalent, people have wondered if AI will replace signal processing algorithms. The truth is that the majority of successful AI stories in industry have been made possible by clever combinations of AI models and signal processing algorithms acting together.
I like the conclusion of yours. Thank you, Frantz. And also, thank you to all of you at the other end of the screen for watching. Please take a few minutes to post your comments and questions, as we do have some time left for Q&A. Thank you.