How do I find one string within another?

Hi guys,
I am trying to sequentially look for a string in a document and was wondering how I would go about doing that.
Essentially I have a large file called A.csv with a bunch of columns: [date], [Open], [Low], [Close], [Volume], [Adj.Close], [Ret].
I want to write a script that will find a date such as 4/5/2000 and pull the corresponding return for that date.
This is the trick: the day is a variable. Each date has a different month and year, so the dates look something like 1/3/2000, 2/4/2000, 3/1/2000. How do I find a match using only the year and month? For example, I want to pull 1/*/2006 and 7/*/2007, but I don't know what the * is (it could be 1, 2, 3, 4, 5, etc.).
The first row (skipping the header), for example, looks like: 1/3/2000,78.75,78.94,58.13,66.19,1642300,62.37,0.569183903
Thank you for all of your help, guys!
-Larry G.
  1 Comment
Laurentiu Galan on 10 Jan 2012
I want to be able to pull the return (0.569183903) and place it in a vector.


Accepted Answer

Andrew Newell on 11 Jan 2012
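The trick is to use regular expressions. The first line below defines a pattern that matches any line starting with '1/', followed by one or more digits and then '/2000'. The file is examined one line at a time, and the return is extracted whenever there is a match.
match_str = '^1/[0-9]+/2000';  % ^ anchors to the start of the line, so 11/... is not matched
match_vector = zeros(32000,1); % Use whatever size you're sure is large enough
fid = fopen('A.csv');          % the data file named in the question
count = 0;
tline = fgetl(fid);
while ischar(tline)
    if ~isempty(regexp(tline, match_str, 'once'))
        % Skip the first seven fields and read only the last one (the return)
        A = textscan(tline,'%*s %*f %*f %*f %*f %*d %*f %f','delimiter',',');
        count = count+1;
        match_vector(count) = A{1};
    end
    tline = fgetl(fid);        % read the next line; fgetl returns -1 at end of file
end
fclose(fid);
match_vector = match_vector(1:count); % drop the unused preallocated entries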
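As an alternative to the line-by-line loop, the whole file can be read in one textscan call and the date column filtered afterwards. The following is only a sketch of that variation: the temporary file name, the header text, and the sample rows are made up for illustration, and it assumes eight comma-separated fields per row as in the example row from the question.

```matlab
% Sketch only: the sample file name and its contents are invented for illustration.
sample_file = fullfile(tempdir, 'sample_returns.csv');   % hypothetical file
fid = fopen(sample_file, 'w');
fprintf(fid, 'Date,Open,High,Low,Close,Volume,Adj Close,Ret\n');
fprintf(fid, '1/3/2000,78.75,78.94,58.13,66.19,1642300,62.37,0.569183903\n');
fprintf(fid, '2/4/2000,70.00,71.00,69.00,70.50,1500000,66.40,0.064614\n');
fprintf(fid, '11/1/2000,40.00,41.00,39.00,40.50,1200000,38.10,-0.012\n');
fclose(fid);

% Read the date column as strings and the last column as numbers,
% skipping the six numeric fields in between.
fid = fopen(sample_file, 'r');
C = textscan(fid, '%s %*f %*f %*f %*f %*f %*f %f', ...
    'Delimiter', ',', 'HeaderLines', 1);
fclose(fid);
dates = C{1};
rets  = C{2};

% regexp on a cell array returns one result per cell; empty means no match.
hits = ~cellfun('isempty', regexp(dates, '^1/[0-9]+/2000$', 'once'));
match_vector = rets(hits);
```

Here match_vector needs no preallocation or trimming, at the cost of holding the whole file in memory at once.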
  5 Comments
Andrew Newell on 11 Jan 2012
Ah - I found the error: I should be updating match_vector inside the IF-END block.


More Answers (2)

Walter Roberson on 11 Jan 2012
regexp(STRING, '^1/\d+/2006,.*,([^,]+)$', 'tokens', 'dotexceptnewline', 'lineanchors') % the token is the last field of each matching line
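A single regexp call of this kind can be applied to the entire text of a file (for example, as returned by fileread) instead of looping over lines. The sketch below uses a token to capture the last field of each matching line; the sample text is made up, and the variable names txt, tok, and returns are only illustrative.

```matlab
% Sketch only: txt stands in for the contents of a CSV file; the rows are invented.
lines = {'Date,Open,High,Low,Close,Volume,Adj Close,Ret', ...
         '1/3/2006,10,11,9,10.5,1000,10.2,0.0125', ...
         '11/2/2006,10,11,9,10.5,1000,10.2,0.5', ...
         '1/17/2006,10,11,9,10.5,1000,10.2,-0.03'};
txt = strjoin(lines, char(10));   % join the rows with newline characters

% With 'lineanchors', ^ and $ match at the start and end of every line.
% With 'dotexceptnewline', .* cannot run past the end of a line.
% The parenthesized token captures the last comma-separated field.
tok = regexp(txt, '^1/\d+/2006,.*,([^,]+)$', 'tokens', ...
    'dotexceptnewline', 'lineanchors');

% tok is a cell array of 1x1 cells; flatten it and convert to numbers.
returns = str2double([tok{:}]);
```

The row dated 11/2/2006 is skipped because the pattern requires the line to begin with '1/'.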

Laurentiu Galan on 11 Jan 2012
Thanks Andrew!! This is great. Now comes the really tough part. I want to loop the whole process.
Basically I want to open a file 'A' which contains a series of ticker symbols (A.csv, B.csv, and so on; about 8000 of them) along with month and date information.
Then I want to open each individual file in a directory; the files are named after the ticker symbols in file 'A'. Finally, I want to pull returns from each file using the month and date information also located in file 'A'.
I don't expect you to be able to help me with this task as it is really extensive, but I was wondering if you wouldn't mind sharing some insight as to how I can improve the whole process?
Your answer was more than sufficient, as it has helped organize some of my thinking. Thanks a bunch!! Any additional insight is greatly appreciated.
I have attached my existing code to try to better explain what I mean:
%Code to Get Matrix
fid = fopen('C:\Users\Laurentiu Galan\Desktop\pca1.csv');
C = textscan(fid, '%s %s %s %*s %*s %*s %*s', 'delimiter', ',', ...
    'HeaderLines', 1); % HeaderLines takes a number, not a string
fclose(fid);
%Strcat Identifier
tickername = C{1};
year = C{2};
month = C{3};
%Get Size of Loop for filepath
D = size(tickername);
numval = D(1,1);
%Create Loop for filepaths
for i = 1:numval
    filepath(i,1) = strcat('C:\Users\Laurentiu Galan\Desktop\', tickername(i,1), '.csv');
end
%Create Matching value (^ anchors the month to the start of the line)
for i = 1:numval
    ssearch(i,1) = strcat('^', month(i,1), '/[0-9]+/', year(i,1));
end
%Open file where search will take place (path name will be looped)
match_str = '^1/[0-9]+/2000'; % (this will also be looped based on ssearch)
match_vector = zeros(32000,1);
fid = fopen('-------'); %<- Loop for each file goes here
count = 0;
tline = fgetl(fid);
while ischar(tline)
    if ~isempty(regexp(tline, match_str, 'once'))
        A = textscan(tline,'%*s %*f %*f %*f %*f %*d %*f %f','delimiter',',');
        count = count+1;
        match_vector(count) = A{1};
    end
    tline = fgetl(fid); % read the next line, otherwise the loop never ends
end
fclose(fid);
%Then I want to output a file with all the returns and ticker symbols on
%the desktop
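For reference, the steps outlined in the code above could be combined roughly as follows. This is only a sketch under several assumptions: the folder, ticker names, and file contents are invented stand-ins, each price file has eight comma-separated fields, and only the first matching row per ticker is kept.

```matlab
% Sketch only: folder, tickers, and file contents are invented for illustration.
data_dir = fullfile(tempdir, 'ticker_demo');          % hypothetical folder
if ~exist(data_dir, 'dir'), mkdir(data_dir); end

% Two tiny stand-in price files, one per ticker.
rows = {'1/3/2000,78.75,78.94,58.13,66.19,1642300,62.37,0.5', ...
        '2/4/2000,70,71,69,70.5,1500000,66.4,0.25'};
tickers = {'AAA'; 'BBB'};                              % hypothetical tickers
for k = 1:numel(tickers)
    fid = fopen(fullfile(data_dir, [tickers{k} '.csv']), 'w');
    fprintf(fid, 'Date,Open,High,Low,Close,Volume,Adj Close,Ret\n');
    fprintf(fid, '%s\n', rows{:});
    fclose(fid);
end

% Stand-ins for the month and year columns read from the master file.
months = {'1'; '2'};
years  = {'2000'; '2000'};

% Build one file path and one anchored search pattern per ticker.
filepaths = strcat(data_dir, filesep, tickers, '.csv');
patterns  = strcat('^', months, '/[0-9]+/', years, ',');

results = nan(numel(tickers), 1);   % NaN marks "no match found"
for k = 1:numel(tickers)
    fid = fopen(filepaths{k}, 'r');
    if fid == -1, continue; end     % skip tickers that have no file
    tline = fgetl(fid);
    while ischar(tline)
        if ~isempty(regexp(tline, patterns{k}, 'once'))
            A = textscan(tline, '%*s %*f %*f %*f %*f %*f %*f %f', 'Delimiter', ',');
            results(k) = A{1};
            break                   % keep the first match only
        end
        tline = fgetl(fid);
    end
    fclose(fid);
end

% Write ticker/return pairs to an output file.
out_file = fullfile(data_dir, 'returns_out.csv');
fid = fopen(out_file, 'w');
for k = 1:numel(tickers)
    fprintf(fid, '%s,%g\n', tickers{k}, results(k));
end
fclose(fid);
```

Storing NaN for tickers without a match keeps the results aligned with the ticker list, so the output file always has one line per ticker.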
  1 Comment
Andrew Newell on 11 Jan 2012
I don't see anything obvious. For any code I suggest the following sequence: (1) test it thoroughly to make sure it works; (2) run it with the MATLAB Profiler and see where the code is spending most of its time; and (3) look for ways to speed up that part of the code.
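Step (2) might look something like the sketch below. The loop being profiled is only a placeholder workload, not the code from this thread; in practice the file-reading loop would go between profile on and profile off.

```matlab
% Sketch only: the loop below is a placeholder workload for illustration.
profile on
total = 0;
for k = 1:1000
    total = total + sum(sqrt(1:k));   % stand-in for the file-reading loop
end
profile off

stats = profile('info');   % structure holding the collected timing data
% profile viewer           % uncomment to open the interactive report
```

The report lists time per function and per line, which shows whether fgetl, regexp, or textscan dominates before any effort is spent on optimizing.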
