How to properly use textscan and regexp to sift through a poorly formatted text file?

Question

Dina Dajani on 3 Jun 2016

https://www.mathworks.com/matlabcentral/answers/287520-how-to-properly-use-textscan-and-regexp-to-sift-through-a-poorly-formatted-text-file

Commented: Dina Dajani on 9 Jun 2016

I am trying to read a text file into MATLAB whose 1000+ lines are each formatted differently, so that I can pull very specific information out of it. I am able to read the text file using textscan (skipping the first 23 lines, since I do not need that information), but I have not been able to use regexp to find the strings I am looking for. I think this is because regexp does not recognize the characters on each line (char) as consecutive characters. See how strangely my "proc" variable is formatted. I attached my text file.

fileID = fopen('FIST_fMRI_Task_Run1_adultpilot-8395-1 (1).txt');
inputtxt = textscan(fileID, '%s', 'delimiter', '\n', 'HeaderLines', 23);
proc = inputtxt{1}{3}
y = regexp(proc, '\w*[Proc]', 'match')
x = regexp(proc, 'FlexibilityProc', 'match')
proc =
         P r o c e d u r e :   F l e x i b i l i t y P r o c 
y = 
      'P'    'r'    'o'    'c'    'r'    'P'    'r'    'o'    'c'
x = 
       {}
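A quick way to test the suspicion above is to look at the character codes rather than the displayed text. The sketch below is an illustration, not the attached file: it assumes the file is UTF-16LE text that was read as 8-bit characters, so each ASCII character is followed by a NUL byte (char(0)), which the command window shows as a space. The string `original` is a hypothetical stand-in for one line of the file.

```matlab
% Hypothetical stand-in for one line of the file.
original = 'Procedure: FlexibilityProc';

% Simulate reading UTF-16LE text as 8-bit characters: each ASCII
% character is followed by a NUL byte, char(0).
proc = reshape([original; char(zeros(1, numel(original)))], 1, []);

% The numeric codes reveal the hidden NULs that display as spaces.
disp(double(proc(1:6)))   % 80 0 114 0 111 0

% A literal search fails because the NULs split the word apart.
before = regexp(proc, 'FlexibilityProc', 'match');   % no match

% Dropping the NULs makes the same search succeed.
cleaned = proc(proc ~= char(0));
after = regexp(cleaned, 'FlexibilityProc', 'match'); % {'FlexibilityProc'}
```

Deleting the NULs is only a diagnostic shortcut: it would garble any non-ASCII characters, so reading the file with the correct encoding is the real fix.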

1 Comment

Guillaume on 6 Jun 2016

Edited: Guillaume on 6 Jun 2016

I strongly suspect that

y = regexp(proc, '\w*[Proc]', 'match')

does not do what you want. It matches any number (including none) of letters, digits or underscores, followed by a single 'P', 'r', 'o' or 'c'. Square brackets define a set of alternative characters, not a literal word.
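For comparison, here is a minimal sketch of patterns that do what the question seems to want, assuming the line has already been read as ordinary text. The sample line is a hypothetical copy of the one shown in the question, with a leading tab.

```matlab
% Hypothetical sample line, as it looks once decoded correctly.
txtLine = sprintf('\tProcedure: FlexibilityProc');

% Capture the word that follows the literal label 'Procedure:'.
tok = regexp(txtLine, 'Procedure:\s*(\w+)', 'tokens', 'once');
procName = tok{1};                  % 'FlexibilityProc'

% Test for one specific procedure by matching the literal text.
isFlex = ~isempty(regexp(txtLine, 'FlexibilityProc', 'once'));   % true
```

With `'tokens', 'once'`, regexp returns the captured groups of the first match as a cell array of char vectors, so `tok{1}` is the procedure name itself.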

Accepted Answer

Walter Roberson on 3 Jun 2016

https://www.mathworks.com/matlabcentral/answers/287520-how-to-properly-use-textscan-and-regexp-to-sift-through-a-poorly-formatted-text-file#answer_224398

Change

fileID = fopen('FIST_fMRI_Task_Run1_adultpilot-8395-1 (1).txt');

to

fileID = fopen('FIST_fMRI_Task_Run1_adultpilot-8395-1 (1).txt', 'n', 'UTF16-LE');

and ignore the warning that says UTF16 is not supported (it is only writing such files that is not supported).

3 Comments

Guillaume on 6 Jun 2016

Walter missed the permission argument in the fopen call. It should have read:

fopen(xxx, 'r', 'n', 'UTF16-LE')

Note that your file appears to start with a byte order mark (BOM), which MATLAB reads as the first character of the first line. In your case it does not matter, as you skip the first 23 lines anyway.
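Putting the thread together, here is a minimal end-to-end sketch. Because the original file is not available here, it first writes a small hypothetical UTF-16LE file (a BOM followed by two lines) byte by byte, then reads it with the four-argument `fopen` call, strips the BOM if it arrives as `char(65279)` (U+FEFF, an assumption about how it is decoded), and extracts the procedure name. Depending on the MATLAB release, the `fopen` call may print the unsupported-encoding warning mentioned in the answer.

```matlab
% Build a hypothetical UTF-16LE test file: BOM (FF FE), then ASCII text
% where each character is stored as its code followed by a zero byte.
fname = [tempname '.txt'];
txt = sprintf('Header line\nProcedure: FlexibilityProc\n');
codes = double(txt);
bytes = [255 254, reshape([codes; zeros(size(codes))], 1, [])];
fid = fopen(fname, 'w');
fwrite(fid, bytes, 'uint8');
fclose(fid);

% Read it back with permission, machine format and encoding arguments.
fid = fopen(fname, 'r', 'n', 'UTF16-LE');
C = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
fileLines = C{1};

% Remove a leading BOM if it was decoded as the first character.
if ~isempty(fileLines) && ~isempty(fileLines{1}) && fileLines{1}(1) == char(65279)
    fileLines{1} = fileLines{1}(2:end);
end

% The decoded lines are ordinary text, so a normal pattern works.
tok = regexp(fileLines{2}, 'Procedure:\s*(\w+)', 'tokens', 'once');
disp(tok{1})
delete(fname);
```

For the real file you would add `'HeaderLines', 23` to the textscan call as in the question, in which case the BOM on line 1 is skipped along with the header.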

Dina Dajani on 9 Jun 2016

This worked, thank you.
