How can I compress an audio file with lossless compression and the DCT?

Can you help me with my project (audio compression with lossless compression)? I am confused; I have read a lot, but I do not know where to start. I was able to read the audio file in MATLAB, but I do not know what the next step is.

Accepted Answer

Walter Roberson on 19 Dec 2022
Suppose you had a situation where, when you examined the source data to be compressed, at any point the next bit to be read had a uniform 50% chance of being 0 or 1. In such a case, the content of the next bit adds one bit to the "information" content, and there would be no way to represent that choice in less than 1 bit (even on average over the long term).
Imagine, for example, that you had a series of flips of a "fair" coin, 0 = tails, 1 = heads, and you had recorded 0100001 so far. Is there any way to predict the next bit that is better than chance? No, not unless the coin is not a "fair" coin. If the coin had, for example, a 6/8 chance of coming up tails instead of a 4/8 chance, then over sufficiently long inputs you would be able to represent the "information" about each flip using an average of less than 1 bit per flip, but only because the two outcomes are not equally likely.
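The gain available from such a bias can be quantified with the Shannon entropy of the source. A minimal sketch (in Python rather than MATLAB, purely for illustration):

```python
import math

def entropy_bits(p):
    """Shannon entropy, in bits per flip, of a coin with P(tails) = p."""
    if p in (0.0, 1.0):
        return 0.0  # outcome is certain, so it carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

print(entropy_bits(0.5))   # fair coin: 1.0 bit/flip, incompressible
print(entropy_bits(6/8))   # biased coin: about 0.81 bits/flip, compressible
```

The biased coin's flips can, in the long run, be stored in about 0.81 bits each; no code can beat the entropy on average.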
If you were compressing a bunch of English text, you can do better than 1 output bit per input bit because in practice English is biased; for example, the letter 'e' makes up about 12.7% of English text, rather than the roughly 3.8% (1/26) it would if all letters were equally likely.
We can therefore say that whenever the inputs are not all equally likely, that is, whenever the probability of a given input is not simply 1 divided by the number of different possible inputs, we have the potential to compress the data.
Generally speaking, at each step you start with a model, use the model to predict what comes next, read what actually comes next, compute the difference between the prediction and the actual value, record that difference in your internal record of the input, and then (optionally) update the model.
During this process it is common to temporarily use more internal space than the absolute minimum necessary, for the sake of efficiency.
At some point, either during the above phase or after you have read the entire input, you output the internal representation in the most compact form you can devise, often writing it to a file. The file should contain everything needed to recreate the input, everything except the algorithm itself and the initial state of the model. If, for example, you used Huffman encoding, then the file should contain a copy of the Huffman dictionary. You should be able to take the file to a different, unconnected system and, using only knowledge of the initial model and the content of the file, reproduce the original data. That as-packed-as-you-can-manage file is the compressed version of the input.
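To make the idea of the Huffman dictionary concrete, here is a minimal sketch of building one from symbol counts (Python for illustration; a real coder would also serialize this table into the output file so the decoder can rebuild it):

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a Huffman code table {symbol: bitstring} from symbol frequencies."""
    counts = Counter(data)
    if len(counts) == 1:  # degenerate input: a single distinct symbol
        return {next(iter(counts)): "0"}
    # heap entries: (total count, tiebreaker, {symbol: code-so-far})
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)   # two least-frequent subtrees
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (n1 + n2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

print(huffman_codes("aaaabbc"))  # frequent symbols get shorter codes
```

Frequent symbols receive short bitstrings and rare ones long bitstrings, which is exactly how the bias in the input is converted into saved bits.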
What model should you use?
For audio, the data samples typically cluster around the mean. So if you subtract out the mean, the differences from the mean can be Huffman encoded for a net gain.
For audio, over the long run, the difference between consecutive samples is typically not very large. So you can Huffman encode the differences between samples for a net gain.
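The effect of differencing on a smooth signal can be seen by comparing the empirical entropy of the samples themselves with that of the sample-to-sample differences. A small sketch (Python for illustration; the quantized sine wave is a stand-in for locally smooth audio):

```python
import math
from collections import Counter

def empirical_entropy(samples):
    """Empirical entropy, in bits per sample, of a sequence of integers."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# A smooth, quantized sine-like signal, as audio often is locally.
raw = [round(100 * math.sin(2 * math.pi * k / 64)) for k in range(1024)]
# Keep the first sample, then store only successive differences.
deltas = [raw[0]] + [b - a for a, b in zip(raw, raw[1:])]

print(empirical_entropy(raw), empirical_entropy(deltas))
```

The differences take fewer, more concentrated values than the raw samples, so their entropy is lower and a Huffman code for them is shorter on average.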
For audio, if you take "windows" onto the data, there may be a small number of dominant frequencies. So if you break the signal up into groups, analyze the peak frequencies in each window, and use those peaks to predict the samples in the window, then the record of differences between prediction and actual may be compressible (but it has to save more space than is needed to represent the frequencies and phases).
There are a number of different models you can use, and Huffman is only one encoding method; LZ methods and arithmetic coding can gain a fair bit as well.
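Even with an off-the-shelf LZ coder, the choice of model matters. A quick sketch (Python's zlib stands in for "LZ methods"; the random walk is a toy stand-in for a wandering audio signal):

```python
import random
import zlib

random.seed(0)
# The samples wander over the full byte range, but each step is small.
steps = [random.choice([-2, -1, 0, 1, 2]) for _ in range(20000)]
samples, s = [], 0
for d in steps:
    s = (s + d) % 256
    samples.append(s)
raw = bytes(samples)                       # model: store samples directly
delta = bytes(d % 256 for d in steps)      # model: store differences

print(len(zlib.compress(raw)), len(zlib.compress(delta)))
```

The raw bytes look nearly random to a byte-oriented coder, while the differenced stream uses only five distinct byte values, so the same zlib call compresses it far better. The compressor did not change; the model did.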
  1 Comment
Walter Roberson on 20 Dec 2022
In the case of the DCT, you can do something like:
  1. break the input up into segments
  2. do a dct on each segment
  3. quantize the coefficients (for example perhaps you can represent them as 16 bits)
  4. use the quantized coefficients to predict the values of the samples
  5. take the difference between predicted and actual. If all has gone well, these differences tend to be small-ish and centered around 0
  6. encode the quantized coefficients and the differences using something like Huffman, LZ, or arithmetic encoding
  7. hope that the amount of storage required for the quantized coefficients and differences is less than what is required for the raw data
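The steps above can be sketched end to end. This is Python with a hand-rolled orthonormal DCT, purely for illustration (in MATLAB you would use dct/idct from the Signal Processing Toolbox instead); the segment length and quantization step are illustrative choices, not tuned values:

```python
import math

def dct2(x):
    """Orthonormal DCT-II (same convention as MATLAB's dct)."""
    N = len(x)
    return [(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)) *
            sum(v * math.cos(math.pi / N * (n + 0.5) * k)
                for n, v in enumerate(x))
            for k in range(N)]

def idct2(c):
    """Orthonormal inverse (DCT-III): rebuild samples from coefficients."""
    N = len(c)
    return [sum((math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)) *
                ck * math.cos(math.pi / N * (n + 0.5) * k)
                for k, ck in enumerate(c))
            for n in range(N)]

def dct_residual_code(x, seg=64, qstep=0.5):
    """Steps 1-5 of the list above; seg and qstep are untuned examples."""
    coeffs, residuals = [], []
    for start in range(0, len(x) - len(x) % seg, seg):
        block = x[start:start + seg]                        # 1. segment
        c = dct2(block)                                     # 2. DCT
        q = [round(ck / qstep) for ck in c]                 # 3. quantize
        pred = idct2([qk * qstep for qk in q])              # 4. predict
        res = [round(b - p) for b, p in zip(block, pred)]   # 5. residual
        coeffs.append(q)
        residuals.append(res)
    # 6./7. would entropy-code both streams (Huffman/LZ/arithmetic) and
    # compare the total against the raw-sample storage cost.
    return coeffs, residuals
```

For a smooth input the residuals come out small and centered around 0, which is exactly the condition step 5 hopes for; note the scheme is only lossless because the residuals are kept alongside the quantized coefficients.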


