How can I convert all files to txt?
I'm working on a project which consists of developing a MATLAB version of the LZ77 algorithm and its extension LZSS. The algorithms are already implemented and fully working for text files imported as char arrays. In the project I should also zip and unzip all other types of data (like images or sounds), but I haven't found anything like that on the internet, in this forum, etc. Is there a way to convert all files to text and then back to their original extension in MATLAB? I've attempted to open such files in a text editor, but the algorithm does not seem to work because of the huge number of symbols. Thank you for all the help.
4 Comments
Guillaume
on 12 May 2017
Edited: Guillaume
on 12 May 2017
To me the question shows a lack of understanding of what a data type is or how data is encoded on a computer.
Fundamentally, there is no difference between text and binary data such as an image or audio file. They are both encoded as numbers. The only difference comes from how the reader interprets these numbers.
The sequence of numbers 65:72 in a file can be interpreted as the char sequence 'ABCDEFGH'. It could of course be interpreted as the sequence of 8-bit integers 65 to 72. It could also be interpreted as the sequence of 16-bit integers [16961, 17475, 17989, 18503], or as the double 1.5839e40. All of these are encoded exactly the same way on a (little-endian) computer.
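As a small illustration of the point above, the same eight bytes can be reinterpreted with typecast. This is only a sketch: the variable names are made up for the example, and the 16-bit and double results assume a little-endian machine.

```matlab
% The same 8 bytes viewed through different data types.
bytes = uint8(65:72);

asChar   = char(bytes);                 % the characters 'ABCDEFGH'
asUint16 = typecast(bytes, 'uint16');   % 16961 17475 17989 18503 (little-endian)
asDouble = typecast(bytes, 'double');   % roughly 1.5839e40 (little-endian)

% Going back the other way recovers the original bytes unchanged.
backToBytes = typecast(asDouble, 'uint8');
sameBytes   = isequal(backToBytes, bytes);
```

Nothing in memory changes between these views; only the interpretation does.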
Therefore, saying that an algorithm only works with text is meaningless. Text is just a sequence of numbers like any other data type. The only difference may be that the sequence of numbers used for text is limited to a certain range of values. For example, text using US-ASCII encoding is limited to the range of values 0 to 127. However, since that wasn't specified in the question, it must be assumed that text means the char type of MATLAB, which has no such restriction on the range.
Note that in matlab, text is encoded as a sequence of 16-bit integers.
So, really, to convert a file to text, you just have to read the sequence of numbers and tell matlab they are characters, as per dpb's answer.
Answers (2)
dpb
on 12 May 2017
Edited: dpb
on 12 May 2017
To consider any file as simply a stream of bytes (char), write
dat=fread(fid,inf,'*char');
What this ignores is that special file formats such as audio, image, etc. contain header information in the file as well as the data itself.
So it depends upon whether you're wanting to compress the file itself or the content of the file.
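For the "compress the file itself" case, a full round trip might look like the sketch below. It is a minimal example under some assumptions: the file names are temporary placeholders, the compression step is left out, and it uses the 'uint8=>char' precision instead of '*char' because, as far as I know, the 'char' precision decodes according to the encoding associated with the file, which may not map every byte to a code in 0..255.

```matlab
% Sketch: lossless round trip of an arbitrary file through a char array.
% File names are temporary placeholders created just for this demo.
srcFile = [tempname '.bin'];
dstFile = [tempname '.bin'];

% Create a small "binary" file containing every possible byte value.
original = uint8(0:255);
fid = fopen(srcFile, 'w');
fwrite(fid, original, 'uint8');
fclose(fid);

% Read raw bytes and store them as char codes 0..255 (no text decoding).
fid = fopen(srcFile, 'r');
dat = fread(fid, inf, 'uint8=>char')';   % row vector of char
fclose(fid);

% ... compress and decompress dat here (LZ77/LZSS step omitted) ...
restored = dat;

% Write the chars back out as bytes.
fid = fopen(dstFile, 'w');
fwrite(fid, uint8(restored), 'uint8');
fclose(fid);

% Verify that the copy is byte-for-byte identical to the source.
fid = fopen(dstFile, 'r');
check = fread(fid, inf, '*uint8')';
fclose(fid);
roundTripOk = isequal(check, original);

delete(srcFile);
delete(dstFile);
```

Because headers are just bytes too, this treats an image or audio file exactly like a text file; the original extension only needs to be reused when naming the output file.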
Walter Roberson
on 13 May 2017
"is there a Matlab routine which manipulates the bits of any data type and then group them 8 by 8 so I can interpret them as chars and then all the way back to reconsider them as their original data type?"
No. All of the MATLAB data types except char, logical, and the numeric ones, use the internal equivalent of pointers into memory. You might be able to get access to the pointer value, but when you reconstruct it afterwards, it is not likely to be pointing to a useful area of memory.
Because of that, some people have written serialization and deserialization routines, such as https://www.mathworks.com/matlabcentral/fileexchange/34564-fast-serialize-deserialize . Those are able to encode data as streams of bytes, and when they are deserialized, the objects are reconstructed as best feasible. But aspects such as internal serialization numbers might not get reconstructed, and the memory pointers are not going to be the same, so "==" might not work between the original version and the reconstructed version.
"As an extension of that algorithm I need to make the algorithm work with also images and sounds"
If those images or sounds are stored in files, the same way that your text is stored in files, then you do not need to do any of the serialization that I discuss above. Instead, just read the files as a sequence of chars like Duane showed, dat=fread(fid,inf,'*char'); or using dat = fileread('NameOfFile');
If you have data that has already been read into memory and is stored in numeric arrays (e.g., you are not trying to encode a drawing of an image, just the data of the image), then you can use
Class_of_variable = class(YourNumericVariable);
Variable_as_char = char( typecast(YourNumericVariable, 'uint8') );
and to reverse that,
Restored_numeric_variable = typecast( uint8(0 + Variable_as_char), Class_of_variable );
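For image-like data, which is usually a matrix rather than a vector, the same idea can be sketched as follows. The assumptions here: typecast, as far as I recall, expects a scalar or vector input, so the matrix is flattened with (:) and its size is saved separately; the variable names and the magic(4) stand-in data are made up for the example.

```matlab
% Sketch: numeric matrix -> char row vector -> numeric matrix.
img = uint16(magic(4) * 1000);      % stand-in for image data already in memory

origClass = class(img);             % needed to undo the typecast
origSize  = size(img);              % needed to undo the flattening

% Flatten column-wise, view as bytes, then label the bytes as chars.
asChar = char(typecast(img(:), 'uint8'))';

% ... compress and decompress asChar here ...

% Reverse: chars -> bytes -> original class -> original shape.
restoredVec = typecast(uint8(asChar), origClass);
restored    = reshape(restoredVec, origSize);

sameData = isequal(restored, img) && strcmp(class(restored), origClass);
```

The class name and size have to be stored alongside the compressed stream, since the char array alone does not record them.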