Converting a .txt into a series of matrices

Hello, I am a new MATLAB user and I need to convert mol2 files (text files that store the positions of the atoms in a molecule) into multiple matrices so that I can manipulate the data. A sample mol2 is shown below. I've also attached the full mol2 file.
@<TRIPOS>MOLECULE
*****
83 89 0 0 0
SMALL
GASTEIGER
@<TRIPOS>ATOM
1 C -2.1071 -0.8238 0.0543 C.ar 1 LIG1 -0.0157
2 C -0.8284 -1.4433 0.0053 C.ar 1 LIG1 -0.0265
3 C 0.3551 -0.6761 -0.0339 C.ar 1 LIG1 0.0903
4 C 0.2486 0.7084 -0.0225 C.ar 1 LIG1 0.0691
5 C -0.9965 1.3355 0.0209 C.ar 1 LIG1 -0.0355
@<TRIPOS>BOND
1 1 2 ar
2 2 3 ar
3 3 4 ar
4 4 5 ar
5 5 6 ar
I want to convert the mol2 file into 3 arrays holding the information in @<TRIPOS>MOLECULE, @<TRIPOS>ATOM, and @<TRIPOS>BOND. For example, I want the "Molecule" array to look like {[83 89 0 0 0], SMALL, GASTEIGER}, and the "Atom" array to look like {[1, C, -2.1071, -0.8238, 0.0543, C.ar, 1, LIG1, -0.0157]...}.
Any help would be greatly appreciated.

Accepted Answer

Cedric on 6 Nov 2013
Edited: Cedric on 6 Nov 2013
Here is one way to achieve this:
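fileLocator = 'Bip_LAMMPS_Pract.txt' ;
content = fileread( fileLocator ) ;                      % whole file as one string
% MOLECULE: token 1 = the line of integer counts, token 2 = the text up to the next '@'.
% The pattern expects the name line to consist of asterisks, as in this file.
tokens = regexp( content, 'MOLECULE\s*\*+([\s\d]+)([^@]+)', 'tokens' ) ;
text = strtrim( regexprep( tokens{1}{2}, '\s+', ' ' )) ; % collapse line breaks into single spaces
molecule = { sscanf( tokens{1}{1}, '%d' ), text } ;
% ATOM: everything up to the next '@', parsed into one cell per column.
tokens = regexp( content, 'ATOM([^@]+)', 'tokens' ) ;
atom = textscan( tokens{1}{1}, '%f %s %f %f %f %s %f %s %f' ) ;
% BOND: everything after the BOND tag, parsed the same way.
tokens = regexp( content, 'BOND(.+)', 'tokens' ) ;
bond = textscan( tokens{1}{1}, '%f %f %f %s' ) ;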
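To try the patterns without the attached file, the sample from the question can be pasted into a string. The following is only a minimal sketch: the variable names sample and counts are my own, and the sample is cut down to two atoms and two bonds, so the counts on the third line no longer match the number of records.

```matlab
% Sample text copied from the question, truncated to two atoms and two bonds.
% "sample" is a hypothetical stand-in for the string returned by fileread.
sample = sprintf( '%s\n', ...
    '@<TRIPOS>MOLECULE', '*****', '83 89 0 0 0', 'SMALL', 'GASTEIGER', ...
    '@<TRIPOS>ATOM', ...
    '1 C -2.1071 -0.8238 0.0543 C.ar 1 LIG1 -0.0157', ...
    '2 C -0.8284 -1.4433 0.0053 C.ar 1 LIG1 -0.0265', ...
    '@<TRIPOS>BOND', ...
    '1 1 2 ar', ...
    '2 2 3 ar' ) ;

% Same patterns as above, applied to the in-memory sample.
tokens = regexp( sample, 'MOLECULE\s*\*+([\s\d]+)([^@]+)', 'tokens' ) ;
counts = sscanf( tokens{1}{1}, '%d' ) ;        % column vector of the five integers

tokens = regexp( sample, 'ATOM([^@]+)', 'tokens' ) ;
atom   = textscan( tokens{1}{1}, '%f %s %f %f %f %s %f %s %f' ) ;

tokens = regexp( sample, 'BOND(.+)', 'tokens' ) ;
bond   = textscan( tokens{1}{1}, '%f %f %f %s' ) ;

% textscan returns one cell per format specifier: 9 for atoms, 4 for bonds.
fprintf( 'atom has %d columns, bond has %d columns\n', numel( atom ), numel( bond )) ;
```

Each pattern stops at the next '@' (or at the end of the text for BOND), which is why the three sections can be pulled apart with one regexp call each.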
Have a look at the output and let me know if you have any questions. As you are new to MATLAB, note that regular numeric arrays cannot store mixed data (numbers and strings). For this kind of data, we use cell arrays. In short, they have to be indexed with curly braces to access the content of a cell. In the present case, molecule is a cell array that contains two cells:
>> molecule
molecule =
[5x1 double] 'SMALL GASTEIGER'
For accessing the content of the first cell (which is a numeric array):
>> molecule{1}
ans =
83
89
0
0
0
For accessing element 2 of this numeric array:
>> molecule{1}(2)
ans =
89
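The same indexing applies to atom and bond, which textscan returns as cell arrays with one cell per column. As a rough sketch, assuming the standard mol2 column order (columns 3 to 5 of an atom record are x, y, z, and columns 2 and 3 of a bond record are the two bonded atom numbers), the numeric columns can be collected into ordinary matrices. The variable names atomText, bondText, xyz, names, secondType, and pairs are made up for this illustration.

```matlab
% Two atom records and two bond records copied from the question.
atomText = sprintf( '%s\n', ...
    '1 C -2.1071 -0.8238 0.0543 C.ar 1 LIG1 -0.0157', ...
    '2 C -0.8284 -1.4433 0.0053 C.ar 1 LIG1 -0.0265' ) ;
bondText = sprintf( '%s\n', '1 1 2 ar', '2 2 3 ar' ) ;

atom = textscan( atomText, '%f %s %f %f %f %s %f %s %f' ) ;
bond = textscan( bondText, '%f %f %f %s' ) ;

% Numeric columns are plain column vectors, so they concatenate into matrices.
xyz   = [ atom{3}, atom{4}, atom{5} ] ;   % N-by-3 matrix of coordinates (assumed x, y, z)
pairs = [ bond{2}, bond{3} ] ;            % M-by-2 matrix of bonded atom numbers

% String columns are themselves cell arrays, so curly braces are used twice:
% first to pick the column, then to pick the row.
names      = atom{2} ;                    % N-by-1 cell array of atom names
secondType = atom{6}{2} ;                 % atom type of the second atom

disp( xyz(1,:) ) ;                        % coordinates of the first atom
```

Keeping the text columns in cells and the numbers in matrices is one way to make the coordinates easy to manipulate while still having the labels available by row.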
  1 Comment
Eric on 6 Nov 2013
Thanks for the help. The output files were what I was looking for.
