How to properly use textscan and regexp to sift through a poorly formatted text file?

I am trying to read a text file into MATLAB whose 1000+ lines each have different formatting, so that I can pull out very specific information. I can read the file with textscan (skipping the first 23 lines, since I do not need them), but I have not been able to use regexp to find the strings I am looking for. I think this is because the characters in each line (class char) are not being recognized by regexp as consecutive characters. See how strangely my "proc" variable is displayed below. I attached my text file.
fileID = fopen('FIST_fMRI_Task_Run1_adultpilot-8395-1 (1).txt');
inputtxt = textscan(fileID, '%s', 'delimiter', '\n', 'HeaderLines', 23);
proc = inputtxt{1}{3}
y = regexp(proc, '\w*[Proc]', 'match')
x = regexp(proc, 'FlexibilityProc', 'match')
proc =
P r o c e d u r e : F l e x i b i l i t y P r o c
y =
'P' 'r' 'o' 'c' 'r' 'P' 'r' 'o' 'c'
x =
{}
Guillaume on 6 Jun 2016
Edited: Guillaume on 6 Jun 2016
I strongly suspect that
y = regexp(proc, '\w*[Proc]', 'match')
does not do what you want. It matches any number (including none) of letters, digits or underscores, followed by a single character that is either 'P', 'r', 'o' or 'c'.
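To illustrate the difference, here is a minimal sketch. It assumes a clean copy of the line (the string below is typed in by hand, not read from the attached file) and uses `\w+Proc` as one possible replacement pattern; the pattern the asker actually needs may differ.

```matlab
% Assumed clean input: the line as it would look once the file is decoded correctly.
proc = 'Procedure: FlexibilityProc';

% [Proc] is a character class: exactly one character that is P, r, o or c.
% The first match therefore stops at the last 'r' of "Procedure".
y = regexp(proc, '\w*[Proc]', 'match');   % {'Procedur', 'FlexibilityProc'}

% Without the brackets, Proc is the literal four-character sequence.
% \w+ additionally requires at least one word character in front of it.
x = regexp(proc, '\w+Proc', 'match');     % {'FlexibilityProc'}

disp(y)
disp(x)
```

Matching is case sensitive by default, so the lowercase letters inside "Procedure" cannot start a literal `Proc` match.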

Accepted Answer

Walter Roberson on 3 Jun 2016
Change
fileID = fopen('FIST_fMRI_Task_Run1_adultpilot-8395-1 (1).txt');
to
fileID = fopen('FIST_fMRI_Task_Run1_adultpilot-8395-1 (1).txt', 'n', 'UTF16-LE');
and ignore the warning that says UTF16 is not supported (it is only writing such files that is not supported).
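As a rough illustration of why the encoding matters, the sketch below simulates what probably happens when UTF-16LE text is read one byte per character. It assumes the "spaces" shown in proc are really NUL characters (`char(0)`), which is what that encoding would produce for plain ASCII text; the string is built by hand rather than read from the attached file.

```matlab
clean = 'Procedure: FlexibilityProc';

% Simulate UTF-16LE bytes read one byte per character:
% every ASCII code is followed by a zero byte.
codes = [double(clean); zeros(1, numel(clean))];
proc  = char(codes(:).');                 % interleaved: 'P', NUL, 'r', NUL, ...

% The letters are no longer adjacent, so the literal pattern finds nothing.
x = regexp(proc, 'FlexibilityProc', 'match');

% Removing the NULs restores the text (only valid for ASCII-range content).
fixed = proc(proc ~= char(0));
z = regexp(fixed, 'FlexibilityProc', 'match');

disp(isempty(x))
disp(z)
```

Stripping the NULs by hand is only a diagnostic workaround here, not the method suggested in this answer. Opening the file with the right encoding avoids the problem at the source.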
Guillaume on 6 Jun 2016
Walter missed the permission argument in the fopen call. It should have read:
fopen(xxx, 'r', 'n', 'UTF16-LE')
Note that your file appears to start with a BOM (byte order mark), which MATLAB reads as the first character of the first line. In your case it does not matter, as you skip the first 23 lines anyway.
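Putting the pieces together, a sketch of the corrected read might look like the following. The temporary file and its two lines are hypothetical stand-ins for the real attachment, and whether the byte order mark survives as the first character may depend on the MATLAB release, so the code checks before removing it.

```matlab
% Hypothetical stand-in for the real file: a tiny UTF-16LE file with a BOM.
tmpName = [tempname '.txt'];
txt   = sprintf('Header line\nProcedure: FlexibilityProc\n');
body  = unicode2native(txt, 'UTF-16LE');
bytes = [uint8([255 254]), body(:).'];    % FF FE is the UTF-16LE byte order mark
fid = fopen(tmpName, 'w');
fwrite(fid, bytes, 'uint8');
fclose(fid);

% Read it back with the four-argument fopen form given above.
fid = fopen(tmpName, 'r', 'n', 'UTF16-LE');   % may warn that UTF-16 is not supported
inputtxt = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
delete(tmpName);
lines = inputtxt{1};

% Drop a leading byte order mark (U+FEFF) if it was kept as a character.
if ~isempty(lines) && ~isempty(lines{1}) && lines{1}(1) == char(65279)
    lines{1}(1) = [];
end

x = regexp(lines{2}, 'FlexibilityProc', 'match');
disp(x)
```

With 'HeaderLines', 23 as in the original call, the BOM check would be unnecessary, because the first line is skipped anyway.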
