Append lines to interrupted CSV file by calculating new values while ignoring lines that have text information
I have a file that's almost a CSV, except that every 100 to 200 lines (the interval is random) there is a line containing some arbitrary text. Each data line has 40 floating-point values in scientific notation, and I need to append 2 columns to each line, calculated from the 40 values on that same line.
I have attempted to re-format the CSV using sed on Linux via
sed -i '1,30d' sample.dat
sed -i '/[ZIJF]/d' sample.dat
(which deletes the header as well as any line containing 'Z', 'I', 'J', or 'F'); however, doing so breaks compatibility with anything reading the data.
Currently my code to read the files is of the form
disp("Loading (what should be) sample data")
fin = fopen('sample.dat');
data = textscan(fin, '%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f', 'HeaderLines',0,'Delimiter', '\n', 'CollectOutput',1);
modeldata = cell2mat(data);
which ONLY works if I strip out the text lines with sed and will stop short of parsing the whole file otherwise. The rest of my code takes the modeldata matrix and adds the requisite 2 columns (average and standard deviation) before printing a new CSV file.
I would like MATLAB to load the CSV file, read it line by line, add the requisite 2 columns to any line that does not contain text, and print the final file with the header and random text entries kept in place. I know this is possible in MATLAB (I'm imagining a loop that calls fgetl, checks the line for text, runs textscan if there is none, and extends the result as required before writing a new line to a new file); however, I'm not sure which approach will be easiest.
1 Comment
dpb
on 25 Feb 2020
Think you'll need to attach a sample text file that illustrates the problem and then explain with it what you think result should be instead.
If I understand correctly without seeing the actual data, sounds like just reading the file as cellstr() array and some searching should be relatively simple. But, we need data!!!
Accepted Answer
More Answers (1)
Functional, but somewhat more in the MATLAB spirit...using vectorized operations in contains function and input/output...
chrs=["y","\","Z","I","J","F","i"]; % list of strings to find in input
fio=fopen('Output.dat','w');
fin=fopen('sample.dat','r');
while ~feof(fin)
ln = fgets(fin); % read line including newline
if contains(ln,chrs) % magic characters exist
fwrite(fio,ln) % echo back out including newline
continue % skip trying to convert nonnumeric data
end
data=sscanf(ln,'%f'); % sscanf will read until runs out of data into array
mnvar=mean(data);
sdvar=std(data);
fmt=[repmat('%f ',1,numel(data)+1,1) '%f\n']; % build format string w/ count--include other two
fprintf(fio,fmt,data,mnvar,sdvar); % and write
end
fclose(fin); fclose(fio); % close both files when done
I hadn't realized before that you wanted to keep the strings in the middle of the file...if you do, processing line by line is probably as good an approach as any. My solution otherwise was going to be something like
chrs=["y","\","Z","I","J","F","i"]; % list of strings to find in input
fio=fopen('Output.dat','w');
data=textread('sample.dat','%s','delimiter','\n','whitespace',''); % import cellstr() array of lines in file
data=data(~contains(data,chrs)); % throw away text lines
data=str2num(char(data)); % convert to data
mnvar=mean(data,2); % means by row
sdvar=std(data,[],2); % ditto std dev (default weighting)
fmt=[repmat('%f ',1,size(data,2)+1) '%f\n']; % build format string from column count--include other two
fprintf(fio,fmt,[data mnvar sdvar].'); % and write; transpose because fprintf consumes values column by column
fclose(fio);
As noted, the above would eliminate the text rows; to save them would have to save the logical vector from contains and process the numeric as above but not eliminate the other lines. To write most expeditiously would depend on whether the text rows are contiguous or scattered about in the file.
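A minimal sketch of that idea, assuming every numeric line holds the same number of whitespace-separated values and that text lines can be scattered anywhere. The variable names (lines, isText, nums, aug, out), the temporary demo files, and the '%e' output format are illustrative assumptions, not part of the original code; point inFile at sample.dat and outFile at Output.dat for real use.

```matlab
% Hypothetical demo input so the sketch runs on its own.
inFile  = [tempname '.dat'];
outFile = [tempname '.dat'];
fid = fopen(inFile,'w');
fprintf(fid,'%s\n','FILE HEADER','1.0e+00 2.0e+00 3.0e+00', ...
            'Zone text','4.0e+00 6.0e+00 8.0e+00');
fclose(fid);

chrs  = ["y","\","Z","I","J","F","i"];               % markers that flag a text line
lines = regexp(fileread(inFile),'\r?\n','split');    % one cell per line
if isempty(lines{end}), lines(end) = []; end         % drop entry after final newline

% Keep the logical vector instead of discarding the text lines.
isText = contains(lines,chrs) | cellfun(@isempty,strtrim(lines));

% Convert only the numeric lines; assumes equal value counts per line.
rows = cellfun(@(s) sscanf(s,'%f').',lines(~isText),'UniformOutput',false);
nums = vertcat(rows{:});
aug  = [nums, mean(nums,2), std(nums,0,2)];          % append mean and std by row

% Format the augmented rows back into text, one line per row.
fmt     = [repmat('%e ',1,size(aug,2)-1) '%e\n'];
txt     = sprintf(fmt,aug.');                        % transpose: sprintf reads column-major
newRows = regexp(txt(1:end-1),'\n','split');         % strip trailing newline, split

out = lines;                                         % text lines stay where they were
out(~isText) = newRows;                              % numeric lines get the two new columns

fid = fopen(outFile,'w');
fprintf(fid,'%s\n',out{:});
fclose(fid);
```

Because the mask is applied to the full list of lines, it does not matter whether the text rows are contiguous or scattered; the whole file is held in memory, though, so the line-by-line loop may be preferable for very large files.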