How to properly use textscan and regexp to sift through a poorly formatted text file?
Dina Dajani on 3 Jun 2016
Commented: Dina Dajani on 9 Jun 2016
I am trying to read a text file whose 1000+ lines each have different formatting into MATLAB so that I can pull very specific information out of it. I can read the file with textscan (skipping the first 23 lines, since I do not need that information), but I have not been able to use regexp to find the strings I am looking for. I think this is because each line (a char array) is not being recognized by regexp as a sequence of consecutive characters: see how strangely my "proc" variable is formatted below. I attached my text file.
fileID = fopen('FIST_fMRI_Task_Run1_adultpilot-8395-1 (1).txt');
inputtxt = textscan(fileID, '%s', 'delimiter', '\n', 'HeaderLines', 23);
proc = inputtxt{1}{3}
y = regexp(proc, '\w*[Proc]', 'match')
x = regexp(proc, 'FlexibilityProc', 'match')
proc =
P r o c e d u r e : F l e x i b i l i t y P r o c
y =
'P' 'r' 'o' 'c' 'r' 'P' 'r' 'o' 'c'
x =
{}
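Independently of the encoding problem, the pattern '\w*[Proc]' probably does not do what was intended: square brackets define a character class, so [Proc] matches one single character that is P, r, o or c, not the word "Proc". The sketch below uses Python's re module purely for illustration (the constructs involved, \w, *, + and [...], behave the same way in MATLAB's regexp) and runs on an assumed, correctly decoded version of the line:

```python
import re

# Assumed correctly decoded line (what proc should contain once the encoding is right).
proc = "Procedure: FlexibilityProc"

# [Proc] is a character class: it matches ONE character that is P, r, o, or c,
# so each match is a run of word characters that happens to end in one of them.
class_matches = re.findall(r"\w*[Proc]", proc)
print(class_matches)  # ['Procedur', 'FlexibilityProc']

# Without brackets, Proc is a literal four-character sequence; \w+ requires
# at least one word character before it, which excludes "Procedure".
word_matches = re.findall(r"\w+Proc", proc)
print(word_matches)  # ['FlexibilityProc']
```

The plain literal search used for x ('FlexibilityProc') is already fine once the text is decoded correctly.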
Accepted Answer
Walter Roberson on 3 Jun 2016
Change
fileID = fopen('FIST_fMRI_Task_Run1_adultpilot-8395-1 (1).txt');
to
fileID = fopen('FIST_fMRI_Task_Run1_adultpilot-8395-1 (1).txt', 'n', 'UTF16-LE');
and ignore the warning that says UTF16 is not supported (it is only writing such files that is not supported).
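Why this works: in a UTF-16LE file every ASCII character takes two bytes, the character code followed by a zero byte. Read with a one-byte-per-character encoding, each letter is followed by a NUL character that displays as a blank, which is exactly the spaced-out "proc" in the question. A minimal sketch in Python (chosen only because it makes the bytes easy to inspect; the sample line is an assumption, not taken from the attached file) reproduces both the symptom and the fix:

```python
import re

# Assumed sample line standing in for one line of the attached file.
line = "Procedure: FlexibilityProc"

# UTF-16LE stores each ASCII character as two bytes: its code, then 0x00.
raw = line.encode("utf-16-le")

# Misreading: decode one byte per character, as an 8-bit encoding would.
misread = raw.decode("latin-1")
print(repr(misread[:6]))  # 'P\x00r\x00o\x00'

# The NULs are not word characters, so '\w*[Proc]' can only match single letters,
# which reproduces the y output shown in the question.
spaced_matches = re.findall(r"\w*[Proc]", misread)
print(spaced_matches)  # ['P', 'r', 'o', 'c', 'r', 'P', 'r', 'o', 'c']

# A literal search fails because the letters are no longer adjacent.
print(re.search("FlexibilityProc", misread))  # None

# Decoding with the file's actual encoding restores consecutive characters.
decoded = raw.decode("utf-16-le")
print(re.search("FlexibilityProc", decoded) is not None)  # True
```

Passing the encoding to fopen plays the same role in MATLAB as decoding with "utf-16-le" here.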
Guillaume on 6 Jun 2016
Walter missed the permission argument in the fopen call. It should have read:
fopen(xxx, 'r', 'n', 'UTF16-LE')
Note that your file appears to start with a byte order mark (BOM), which MATLAB reads as the first character of the first line. In your case it does not matter, as you skip the first 23 lines anyway.
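Python's codecs show analogous byte order mark (BOM) behavior, sketched here for illustration (the sample text "Line 1" is an assumption). Decoding with an explicit little-endian encoding keeps the BOM as an ordinary first character, U+FEFF, while the generic UTF-16 codec uses it to detect the byte order and then discards it:

```python
# Assumed start of a UTF-16LE file that begins with a BOM (bytes FF FE).
raw = "\ufeffLine 1".encode("utf-16-le")

# An explicit little-endian decode keeps the BOM as a normal character.
with_bom = raw.decode("utf-16-le")
print(hex(ord(with_bom[0])))  # 0xfeff

# The generic "utf-16" codec reads the BOM to choose the byte order, then drops it.
without_bom = raw.decode("utf-16")
print(without_bom)  # Line 1
```

If the first line were needed, stripping a leading U+FEFF character after reading would be one simple option.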