How can I convert all files to txt?

Simone De Vecchi on 12 May 2017
Answered: Walter Roberson on 13 May 2017
I'm working on a project that consists of developing a MATLAB version of the LZ77 algorithm and its extension, LZSS. The algorithms are already implemented and fully working for text files imported as char arrays. In the project I should also zip and unzip all other types of data (like images or sounds), but I haven't found anything like that on the internet, in this forum, etc. Is there a way to convert all files to text and then back to their original extension in MATLAB? I've attempted to open files in a text editor, but the algorithm seems not to work because of the huge number of symbols. Thank you for all the help.
  4 Comments
Guillaume on 12 May 2017
Edited: Guillaume on 12 May 2017
To me the question shows a lack of understanding of what a data type is or how data is encoded on a computer.
Fundamentally, there is no difference between text and binary data such as an image or audio file. Both are encoded as numbers. The only difference comes from how the reader interprets these numbers.
The sequence of numbers 65:72 in a file can be interpreted as char, giving the sequence 'ABCDEFGH'. It could of course be interpreted as the sequence of 8-bit integers 65 to 72. It could also be interpreted as the sequence of 16-bit integers [16961, 17475, 17989, 18503], or it could be interpreted as the double 1.5839e40. All of these are encoded exactly the same way on a (little-endian) computer.
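The point above can be checked directly in MATLAB. This is only a minimal sketch: the variable names are made up for illustration, and the multi-byte results assume a little-endian machine.

```matlab
% The same eight bytes, viewed under four different interpretations.
bytes = uint8(65:72);

asText   = char(bytes);                 % 'ABCDEFGH'
asUint8  = bytes;                       % 65 66 67 68 69 70 71 72
asUint16 = typecast(bytes, 'uint16');   % 16961 17475 17989 18503 (little-endian)
asDouble = typecast(bytes, 'double');   % about 1.5839e40 (little-endian)
```

typecast does not change the underlying bytes; it only changes the type used to interpret them, which is why every view can be converted back to the original `bytes`.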
Therefore, saying that an algorithm only works with text is meaningless. Text is just a sequence of numbers like any other data type. The only difference may be that the sequence of numbers used for text is limited to a certain range of values. For example, text using US-ASCII encoding is limited to the values 0 to 127 out of the 8-bit range. However, since that wasn't specified in the question, it must be assumed that text means the char type of MATLAB, which has no such restriction on the range.
Note that in MATLAB, text is encoded as a sequence of 16-bit integers.
So, really, to convert a file to text, you just have to read the sequence of numbers and tell MATLAB they are characters, as per dpb's answer.
Simone De Vecchi on 13 May 2017
Edited: Simone De Vecchi on 13 May 2017
Thank you Guillaume for your enthusiasm. I know how data is stored and all the things you wrote. Nevertheless, I developed an algorithm that works with chars because the requirement of the project was to compress text files. As an extension of that algorithm, I need to make it work with images and sounds as well. My question was: is there a routine in MATLAB which automatically converts images and sounds into chars, and then back into their original data type?
To be more precise and to follow your comment: is there a MATLAB routine which manipulates the bits of any data type and then groups them 8 by 8 so I can interpret them as chars, and then goes all the way back to reconsider them as their original data type?
I hope everything is clear. I'm Italian and I know my English is not perfect, sorry for that.


Answers (2)

dpb on 12 May 2017
Edited: dpb on 12 May 2017
To consider any file as simply a stream of bytes (char), write
dat=fread(fid,inf,'*char');
What this ignores is that special file formats such as audio, image, etc. contain header information in the file as well as the data itself.
So it depends upon whether you want to compress the file itself or the content of the file.
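For the "compress the file itself" case, a full round trip might look like the sketch below. It is not taken from the answer: the temporary file names and variable names are hypothetical, and the compression step is left as a placeholder. It reads with the '*uint8' precision and converts with char afterwards, on the assumption that this avoids any dependence on the text encoding associated with the file when it is opened.

```matlab
% Create a small binary test file so the sketch is self-contained.
% The file names are hypothetical.
inName   = [tempname '.bin'];
outName  = [tempname '.bin'];
original = uint8(0:255);                 % every possible byte value
fid = fopen(inName, 'w');
fwrite(fid, original, 'uint8');
fclose(fid);

% Read every byte of the file, whatever its format, and view it as char.
fid = fopen(inName, 'r');
assert(fid ~= -1, 'Could not open %s', inName);
bytes = fread(fid, inf, '*uint8');       % column vector of uint8
fclose(fid);
txt = char(bytes).';                     % 1-by-N char array, codes 0..255

% ... compress txt, store it, later decompress it back to txt ...

% Write the characters back out as bytes to recreate the file.
fid = fopen(outName, 'w');
fwrite(fid, uint8(txt), 'uint8');
fclose(fid);
```

Because the header bytes are carried along with everything else, the recreated file keeps its original format; only the extension chosen for the output name has to match.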
  1 Comment
Simone De Vecchi on 13 May 2017
Thank you dpb for your answer, but I'm looking for something different. If you want, check my last comment. Thank you anyway.



Walter Roberson on 13 May 2017
"is there a Matlab routine which manipulates the bits of any data type and then group them 8 by 8 so I can interpret them as chars and then all the way back to reconsider them as their original data type?"
No. All of the MATLAB data types except char, logical, and the numeric ones, use the internal equivalent of pointers into memory. You might be able to get access to the pointer value, but when you reconstruct it afterwards, it is not likely to be pointing to a useful area of memory.
Because of that, some people have written serialization and deserialization routines, such as https://www.mathworks.com/matlabcentral/fileexchange/34564-fast-serialize-deserialize . Those are able to encode data as streams of bytes, and when they are deserialized, the objects are reconstructed as best feasible. But aspects such as internal serialization numbers might not get reconstructed, and the memory pointers are not going to be the same, so "==" might not work between the original version and the reconstructed version.
"As an extension of that algorithm I need to make the algorithm work with also images and sounds"
If those images or sounds are stored in files, the same way that your text is stored in files, then you do not need to do any of the serialization that I discuss above. Instead, just read the files as a sequence of chars like Duane showed, dat=fread(fid,inf,'*char'); or using dat = fileread('NameOfFile');
If you have data that has already been read into memory and is stored in numeric arrays (e.g., you are not trying to encode a drawing of an image, just the data of the image), then you can use
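Class_of_variable = class(YourNumericVariable);
Size_of_variable = size(YourNumericVariable);   % typecast expects a vector, so remember the shape
Variable_as_char = char( typecast(YourNumericVariable(:), 'uint8') );
and to reverse that,
Restored_numeric_variable = typecast( uint8(0 + Variable_as_char), Class_of_variable );
Restored_numeric_variable = reshape( Restored_numeric_variable, size(YourNumericVariable) );   % or use the saved Size_of_variable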
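As a quick check of this approach, here is a small worked round trip on a matrix of doubles standing in for sound samples. The data and variable names are invented for illustration. The matrix is flattened with (:) before typecast, on the assumption that typecast expects a scalar or vector, so its size is saved and restored with reshape.

```matlab
% Hypothetical sample data: a 2-by-2 matrix of doubles.
samples = [0.25 -0.5; 1.5 -2.75];

% Remember what is needed to rebuild the variable later.
cls = class(samples);
sz  = size(samples);

% Forward: reinterpret the raw bytes as characters (8 chars per double).
asChar = char(typecast(samples(:), 'uint8'));

% Reverse: characters back to bytes, bytes back to the original class and shape.
restored = reshape(typecast(uint8(asChar), cls), sz);

assert(isequal(restored, samples));
```

The class name and size are not contained in the char data, so a compressor working this way has to store them alongside the compressed stream.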
