How can I compress an audio file with lossless compression and the DCT?

Can you help me with my project (audio compression with lossless compression)? I am confused; I have read a lot, but I do not know where to start. I was able to read the audio file in MATLAB, but I do not know what the next step is.

Accepted Answer

Walter Roberson on 19 Dec 2022
Suppose you had a situation where, when you examined the source data to be compressed, at any point the next bit to be read had a uniform 50% chance of being 0 or 1. In such a case, the content of the next bit adds one bit to the "information" content, and there would be no way to represent that choice in less than 1 bit (even on average over the long term).
Imagine, for example, that you had a series of flips of a "fair" coin, 0 = tails, 1 = heads, and you had recorded 0100001 so far. Is there any way to predict the next bit that is better than chance? No, not unless the coin is not a "fair" coin. If the coin had, for example, a 6/8 chance of coming up tails instead of a 4/8 chance, then over sufficiently long inputs you would be able to represent the "information" about each flip using an average of less than 1 bit per flip, but only because the two outcomes are not equally likely.
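The gain available from such a bias can be quantified with the Shannon entropy of the source. A minimal sketch (in Python rather than MATLAB, purely for illustration):

```python
import math

def entropy_bits(p):
    """Shannon entropy, in bits per flip, of a coin with P(tails) = p."""
    if p in (0.0, 1.0):
        return 0.0  # outcome is certain, so it carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

print(entropy_bits(0.5))   # fair coin: 1.0 bit/flip, incompressible
print(entropy_bits(6/8))   # biased coin: about 0.81 bits/flip, compressible
```

The biased coin's flips can, in the long run, be stored in about 0.81 bits each; no code can beat the entropy on average.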
If you were compressing a bunch of English text, you can do better than 1 output bit per input bit because in practice English is biased; for example, the letter 'e' makes up about 12.7% of English text, rather than the roughly 3.8% (1/26) it would if all letters were equally likely.
We can therefore say that whenever the inputs are not all equally likely, that is, whenever the probability of a given input is not simply 1 divided by the number of different possible inputs, we have the potential to compress the data.
Generally speaking, at each step you start with a model, use the model to predict what comes next, read what actually comes next, compute the difference between the prediction and the actual value, record that difference in your internal record of the input, and then (optionally) update the model.
During this process it is common to temporarily use more internal space than the absolute minimum necessary, for the sake of efficiency.
At some point, either during the above phase or after you have read the entire input, you output the internal representation in the most compact form you can devise, often writing it to a file. The file should contain everything needed to recreate the input, everything except the algorithm itself and the initial state of the model. If, for example, you used Huffman encoding, then the file should contain a copy of the Huffman dictionary. You should be able to take the file to a different, unconnected system and, using only knowledge of the initial model and the content of the file, reproduce the original data. That as-packed-as-you-can-manage file is the compressed version of the input.
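To make the idea of the Huffman dictionary concrete, here is a minimal sketch of building one from symbol counts (Python for illustration; a real coder would also serialize this table into the output file so the decoder can rebuild it):

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a Huffman code table {symbol: bitstring} from symbol frequencies."""
    counts = Counter(data)
    if len(counts) == 1:  # degenerate input: a single distinct symbol
        return {next(iter(counts)): "0"}
    # heap entries: (total count, tiebreaker, {symbol: code-so-far})
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)   # two least-frequent subtrees
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (n1 + n2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

print(huffman_codes("aaaabbc"))  # frequent symbols get shorter codes
```

Frequent symbols receive short bitstrings and rare ones long bitstrings, which is exactly how the bias in the input is converted into saved bits.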
What model should you use?
For audio, the data samples typically cluster around the mean. So if you subtract out the mean, the differences from the mean can be Huffman encoded for a net gain.
For audio, over the long run, the difference between consecutive samples is typically not very large. So you can Huffman encode the differences between samples for a net gain.
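The effect of differencing on a smooth signal can be seen by comparing the empirical entropy of the samples themselves with that of the sample-to-sample differences. A small sketch (Python for illustration; the quantized sine wave is a stand-in for locally smooth audio):

```python
import math
from collections import Counter

def empirical_entropy(samples):
    """Empirical entropy, in bits per sample, of a sequence of integers."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# A smooth, quantized sine-like signal, as audio often is locally.
raw = [round(100 * math.sin(2 * math.pi * k / 64)) for k in range(1024)]
# Keep the first sample, then store only successive differences.
deltas = [raw[0]] + [b - a for a, b in zip(raw, raw[1:])]

print(empirical_entropy(raw), empirical_entropy(deltas))
```

The differences take fewer, more concentrated values than the raw samples, so their entropy is lower and a Huffman code for them is shorter on average.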
For audio, if you take "windows" onto the data, there may be a small number of dominant frequencies. So if you break the signal up into groups, analyze the peak frequencies in each window, and use those peaks to predict the samples in the window, then the record of differences between prediction and actual may be compressible (but it has to save more space than is needed to represent the frequencies and phases).
There are a number of different models you can use, and Huffman is only one encoding method; LZ methods and arithmetic coding can gain a fair bit as well.
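Even with an off-the-shelf LZ coder, the choice of model matters. A quick sketch (Python's zlib stands in for "LZ methods"; the random walk is a toy stand-in for a wandering audio signal):

```python
import random
import zlib

random.seed(0)
# The samples wander over the full byte range, but each step is small.
steps = [random.choice([-2, -1, 0, 1, 2]) for _ in range(20000)]
samples, s = [], 0
for d in steps:
    s = (s + d) % 256
    samples.append(s)
raw = bytes(samples)                       # model: store samples directly
delta = bytes(d % 256 for d in steps)      # model: store differences

print(len(zlib.compress(raw)), len(zlib.compress(delta)))
```

The raw bytes look nearly random to a byte-oriented coder, while the differenced stream uses only five distinct byte values, so the same zlib call compresses it far better. The compressor did not change; the model did.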
  1 Comment
Walter Roberson on 20 Dec 2022
In the case of the DCT, you can do something like:
  1. break the input up into segments
  2. do a dct on each segment
  3. quantize the coefficients (for example perhaps you can represent them as 16 bits)
  4. use the quantized coefficients to predict the values of the samples
  5. take the difference between predicted and actual. If all has gone well, these differences tend to be small-ish and centered around 0
  6. encode the quantized coefficients and the differences using something like Huffman, LZ, or arithmetic encoding
  7. hope that the amount of storage required for the quantized coefficients and differences is less than what is required for the raw data
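The steps above can be sketched end to end. This is Python with a hand-rolled orthonormal DCT, purely for illustration (in MATLAB you would use dct/idct from the Signal Processing Toolbox instead); the segment length and quantization step are illustrative choices, not tuned values:

```python
import math

def dct2(x):
    """Orthonormal DCT-II (same convention as MATLAB's dct)."""
    N = len(x)
    return [(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)) *
            sum(v * math.cos(math.pi / N * (n + 0.5) * k)
                for n, v in enumerate(x))
            for k in range(N)]

def idct2(c):
    """Orthonormal inverse (DCT-III): rebuild samples from coefficients."""
    N = len(c)
    return [sum((math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)) *
                ck * math.cos(math.pi / N * (n + 0.5) * k)
                for k, ck in enumerate(c))
            for n in range(N)]

def dct_residual_code(x, seg=64, qstep=0.5):
    """Steps 1-5 of the list above; seg and qstep are untuned examples."""
    coeffs, residuals = [], []
    for start in range(0, len(x) - len(x) % seg, seg):
        block = x[start:start + seg]                        # 1. segment
        c = dct2(block)                                     # 2. DCT
        q = [round(ck / qstep) for ck in c]                 # 3. quantize
        pred = idct2([qk * qstep for qk in q])              # 4. predict
        res = [round(b - p) for b, p in zip(block, pred)]   # 5. residual
        coeffs.append(q)
        residuals.append(res)
    # 6./7. would entropy-code both streams (Huffman/LZ/arithmetic) and
    # compare the total against the raw-sample storage cost.
    return coeffs, residuals
```

For a smooth input the residuals come out small and centered around 0, which is exactly the condition step 5 hopes for; note the scheme is only lossless because the residuals are kept alongside the quantized coefficients.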


