"Interpolation requires at least two sample points" Genetic Algorithm to solve ODE

14 views (last 30 days)
I am attempting to solve a 13 parameter ODE using a genetic algorithm. Please bear with me, I know it's a lot of code, but I've been working for several months trying to fix this on my own. It would probably be easier to skip down to the full heirarchy, or to my section called *to the crux of the matter* and then use the above code for reference.
Most of this code was obtained from a tutorial I am working with: http://matlabgeeks.com/tips-tutorials/ode-tips-tutorials/modeling-with-odes-in-matlab-part-4b/
Here is the code for the GA:
function [P,best] = ga(pop_size,chrom_len,pm,pc,max_gen)
% /---------------------------------------------------------------------\
% | Copyright (C) 2009 George Bezerra |
% | |
% | This program is free software: you can redistribute it and/or modify |
% | it under the terms of the GNU General Public License as published by |
% | the Free Software Foundation, either version 3 of the License, or |
% | (at your option) any later version. |
% | |
% | This program is distributed in the hope that it will be useful, |
% | but WITHOUT ANY WARRANTY; without even the implied warranty of |
% | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
% | GNU General Public License for more details. |
% | |
% | You should have received a copy of the GNU General Public License |
% | along with this program. If not, see <http://www.gnu.org/licenses/>.|
% \ ---------------------------------------------------------------------/
%
% Inputs:
% pop_size => population size
% chrom_len => chromosome length
% pm => probability of mutation
% pc => probability of crossover
% max_gen => maximum number of generations
%
% Outputs:
% P => population
% best => best individual of the population
%
% suggested run: [P,best] = ga(100,100,0.01,0.5,200);
% INITIALIZE POPULATION
P = initialize(pop_size,chrom_len);
% EVALUATION
fit = maxones_fitness(P);
gen = 1;
while gen<=max_gen & max(fit)<chrom_len
% SELECTION
P = tournament_selection(P,fit,2);
% CROSSOVER
P = two_point_crossover(P,pc);
% MUTATION
P = point_mutation(P,pm);
% EVALUATION
fit = fitness(P);
% record data
max_fit(gen) = max(fit);
mean_fit(gen) = mean(fit);
% display information
disp_info(gen,max_fit(gen));
gen = gen+1;
end
disp(sprintf('Generation: %d',gen));
disp(sprintf('Best fitness: %d\n',max_fit(end)));
% plot evolution curve
plot(1:length(max_fit), max_fit,'b');
hold on;
plot(1:length(mean_fit), mean_fit,'g');
hold off;
xlabel('Generations');
ylabel('Fitness');
legend('Best fitness','Average fitness','Location','SouthEast');
% output best individual
[m,ind] = max(fit);
best = P(ind,:);
function [P] = initialize(pop_size,chrom_length)
P = round(rand(pop_size,chrom_length));
function [fit] = maxones_fitness(P)
for i=1:size(P,1)
fit(i) = length(find(P(i,:)));
end
function [P_new] = tournament_selection(P,fit,tourn_size)
for i=1:size(P,1)
t = ceil(rand(1,tourn_size)*size(P,1));
[max_fit,winner] = max(fit(t));
P_new(i,:) = P(t(winner),:);
end
function [P_new] = two_point_crossover(P,pc)
mating_list = randperm(size(P,1));
P_new = [];
while ~isempty(mating_list)
pair = mating_list(1:2);
mating_list(1:2) = [];
if rand<pc
crossover_points = ceil(rand(1,2)*(size(P,2)));
point1 = min(crossover_points);
point2 = max(crossover_points);
individual1 = P(pair(1),:);
individual2 = P(pair(2),:);
individual1(point1:point2) = P(pair(2),point1:point2);
individual2(point1:point2) = P(pair(1),point1:point2);
P_new = [P_new;individual1;individual2];
else
P_new = [P_new;P(pair,:)];
end
end
function [P_new] = point_mutation(P,pm)
r = rand(size(P));
mutation_list = find(r<pm);
P_new = P;
P_new(mutation_list(find(P(mutation_list)==1))) = 0;
P_new(mutation_list(find(P(mutation_list)==0))) = 1;
function [] = disp_info(gen,fit)
if mod(gen,10)==0
disp(sprintf('Generation: %d',gen));
disp(sprintf('Best fitness: %d\n',fit));
end
function [P_new] = roulette_selection(P,fit)
fit = (fit - min(fit)).^2;
fit = cumsum(fit);
fit = fit/max(fit);
P_new = [];
for i=1:size(P,1)
f = find(fit>rand);
P_new = [P_new;P(f(1),:)];
end
I am confident this code itself is ok, as I have used it for a much smaller (two equations and four parameters)
In the "hierarchy" of errors the first error indicates that the problem is in line 52 where it calls the fitness function. Here is the fitness function:
% fitness(P)
% Returns a vector with the fitness scores for each member of the
% population, P
%
% Inputs:
% P - A 2D array of the population genomes
% gene_len - number of binary digits per number (1/2 chrom_len)
% Output:
% fit = A 1D array of fitness scores
function fit = fitness(P)
% Use globals so we can send the values into our tnf.m funciton
global alpha beta gamma zeta delta eta_h eta_c theta epsilon omega psi phi Ko;
% I like to initialize any arrays I use
fit = zeros(size(P,1), 1);
% Loop through each individual in the population
for i=1:size(P,1)
% Convert the current chromosome to real values. I chose to interperet
% the chromosome as two binary representations of numbers. You may
% also want to consider a Grey Code representation (or others).
[alpha, beta, gamma, zeta, delta, eta_h, eta_c, theta, epsilon, omega, psi, phi, Ko] = gene_to_values(P(i,:));
% Get the model output for the given values of beta and delta
% Changed 50000 to 132
[t, y] = ode15s('TNF', [0 20], [132, 0, 0, 400]);
% The fitness is related to the squared error between the model and
% the data, so we use our squared_error.m function. Remember we are
% interested in fitting the predicted infected populatoin, y(:,2), to
% the predicted empricial data, sick*100.
%
%NOTE removed the 100* before second list of data points (actually
%changed to 1)
error = squared_error([0 1 2 3 6 8 20] , [132 183 358 448 563 526 0], t,...
y(:,2));
% The GA maximizes values, so we return the inverse of the error to
% trick the GA into minimzing the error.
fit(i) = error^-1;
end
Line 39 is calling the next bit of code. I will rewrite here for clarity:
error = squared_error([0 1 2 3 6 8 20] , [132 183 358 448 563 526 0], t,...
y(:,2));
[0 1 2 3 6 8 20] are my time points and [132 183 358 448 563 526 0] are my data points that I am trying to solve for.
The ODE being solved with ODE15s is TNF.m and is given here:
% TNF.m
%
function dx = TNF(t,x)
global alpha beta gamma zeta delta eta_h eta_c theta epsilon omega psi phi Ko;
dx = zeros(4,1);
dx(1) = alpha*x(1)*(1-power(x(1)/x(4),1))-beta*x(1)-gamma*x(3)*x(1);
dx(2) = beta*x(1)+zeta*gamma*x(3)*x(1)-delta*x(2);
dx(3) = theta*x(2)+epsilon-omega*x(3) - phi*(power(x(4)/Ko,1));
dx(4) = eta_c*x(3)*x(1)+eta_h*x(3)-psi*x(4)*power(x(4)/Ko,4);
And the last bit of code not native to Matlab is the squared error function:
% squared_error(data_time, data_points, fn_time, fn_points)
% Calculates the squared error between a set of data points and a
% series of points representing a function output.
%
% Inputs:
% data_time - A 1D array containing the time points of the data
% data_points - A 1D array of the data values
% fit_time - A 1D array of the time points for the function output
% fit_points - A 1D array of the function values
% Output:
% error = the sum of the squared residuals of the data vs. the function
function error = squared_error(data_time, data_points, fn_time, fn_points)
% Intepolate the fit to match the time points of the data
fn_values = interp1(fn_time, fn_points, data_time);
% Square the error values and sum them up
error = sum((fn_values - data_points).^2);
Here are the errors:
*Warning: Matrix is singular, close to singular or badly scaled. Results may be inaccurate. RCOND = NaN. > In ode15s at 590 In fitness at 29 In ga at 52 Warning: Failure at t=0.000000e+00. Unable to meet integration tolerances without reducing the step size below the smallest value allowed (7.905050e-323) at time t. > In ode15s at 669 In fitness at 29 In ga at 52 Error using griddedInterpolant Interpolation requires at least two sample points in each dimension.
Error in interp1>Interp1D (line 346) F = griddedInterpolant(Xext,V,method);
Error in interp1 (line 241) Vq = Interp1D(X,V,Xq,method);
Error in squared_error (line 16) fn_values = interp1(fn_time, fn_points, data_time);
Error in fitness (line 39) error = squared_error([0 1 2 3 6 8 20] , [132 183 358 448 563 526 0], t,...
Error in ga (line 52) fit = fitness(P);*
The warnings may not be important. My question (finally!) is on the errors. I can trace the problem as follows:
ga.m calls fitness.m, which calls squared_error.m as follows:
error = squared_error([0 1 2 3 6 8 20] , [132 183 358 448 563 526 0], t,y(:,2));
The first line of squared_error.m provides the syntax:
% squared_error(data_time, data_points, fn_time, fn_points)
Note that fn_time in this place is just t. This is an autonomous ODE that does not depend on t.
So let's look at squared error (the troublesome line 16):
fn_values = interp1(fn_time, fn_points, data_time);
squared_error is calling interp1 with these parameters, basically like this:
interp1(t, [0 1 2 3 6 8 20] , [132 183 358 448 563 526 0])
interp1.m has a syntax as follows (from the comments)
Vq = INTERP1(X,V,Xq) interpolates to find Vq, the values of the
% underlying function V=F(X) at the query points Xq. X must
% be a vector of length N.
so X=t V=[0 1 2 3 6 8 20] Xq = [132 183 358 448 563 526 0]
Now interp1.m is calling interp1d on line 241. (here's where I start to get confused)
Vq = Interp1D(X,V,Xq,method);
Now, I know: X=t V=[0 1 2 3 6 8 20] Xq = [132 183 358 448 563 526 0]
I don't know what the method is. Interp1D doesn't seem to have an m file. Though we're not at the error yet. interp1.m also calls griddedInterpolant.m on like 346. This is where we are getting an error.
F = griddedInterpolant(Xext,V,method);
Now, Xext = X which is still t V is still [0 1 2 3 6 8 20]
*TO THE CRUX OF THE MATTER**
Matlab is complaining that griddedinterpolant.m "requires two sample points in each dimension". I think it is complaining about t. But my ODEs are autonomous and do not depend upon t. I'm clearly not understanding something, and I'm over my head in much of this code.
I think the problem is most likely in the fitness function, and in my lack of understanding in how all of this fits together. The tutorial above will give you an idea what I'm trying to do. I took a basic example of an ODE with two equations and two parameters and tried to adopt it to one with 4 equations and 13 parameters. (gulp!). The code works fine for the simpler case.
I will be eternally grateful for help on this.
-Dave K
  5 Comments
David
David on 12 Aug 2014
Sorry about the attachment. It should be there this time.
132 is the starting "population" (actually viable tumor cells in my current work, but very much like a [complicated] population model).
Where I might be misunderstanding something is this [just typing it out helps me, so bear with me]:
The original example was about a population that did not change, but some became susceptible, infectious, and then recovered. The data points were the numbers of the infected population.
In my case I start out with a population(volume) of 132, which changes, i.e. [132,183,358,448,563,526,0] with respect to time timepts=[0,1,2,3,6,8,20].
But is an autonomous equation. I can relate other biological facts that might be relevant, but I don't want to obfuscate too much. However it may be the way I'm thinking about the problem. My "data point" is the same as my "initial condition." It's the first measurement taken on the tumor volume. Thanks again.
Geoff Hayes
Geoff Hayes on 12 Aug 2014
You realize that the population size will not change over time? It will remain fixed at 132 members (unless the code has been modified to , where each member eventually evolves to the optimal (or sub-optimal) solution.

Sign in to comment.

Accepted Answer

Geoff Hayes
Geoff Hayes on 12 Aug 2014
Hi David,
I looked at the gene_to_values function and noticed a couple of things. The first is that you have the right idea in parsing out the 13 variables/genes from the binary array chromosome, but the code is not quite correct. In your case, the binary chromosome is a 1x130 array of doubles, with each gene/variable allotted 10 bits. The code parses these values as
alpha_decimal = bi2de(gene(1:gene_len));
beta_decimal = bi2de(gene((gene_len+1):end));
gamma_decimal = bi2de(gene((gene_len+2):end));
% etc.
In the above, the alpha_decimal variable is assigned the decimal equivalent of the first 10 bits of gene (could be renamed to chromosome) as expected. But beta_decimal is assigned the decimal equivalent of bits 11 through 130 (!) because of the indexing into gene with gene_len+1:end. And gamma_decimal is assigned bits 12 through 130, etc.
What the code needs to do, is extract the bits in blocks of 10. This can be done in a couple of ways (i.e. using a for loop and an array to store the data) but in keeping with how the code is already designed, I will suggest that you define the starting indices (into gene) for each variable as
% constants for indices into the array for each variable
IDX_ALPH = 1;
IDX_BETA = IDX_ALPH + gene_len;
IDX_GAMM = IDX_BETA + gene_len;
IDX_ZETA = IDX_GAMM + gene_len;
IDX_DELT = IDX_ZETA + gene_len;
IDX_ETAH = IDX_DELT + gene_len;
IDX_ETAC = IDX_ETAH + gene_len;
IDX_THET = IDX_ETAC + gene_len;
IDX_EPSI = IDX_THET + gene_len;
IDX_OMEG = IDX_EPSI + gene_len;
IDX_PSII = IDX_OMEG + gene_len;
IDX_PHII = IDX_PSII + gene_len;
IDX_KOOO = IDX_PHII + gene_len;
The above naming may look a little weird, but I just wanted each variable name to be 8 characters long. Note how each index makes use of the one before, and just adds the gene length (10) to it. The code to extract these bits can now be modified to
alpha_decimal = bi2de(gene(IDX_ALPH:IDX_BETA-1));
beta_decimal = bi2de(gene(IDX_BETA:IDX_GAMM-1));
gamma_decimal = bi2de(gene(IDX_GAMM:IDX_ZETA-1));
zeta_decimal = bi2de(gene(IDX_ZETA:IDX_DELT-1));
delta_decimal = bi2de(gene(IDX_DELT:IDX_ETAH-1));
eta_h_decimal = bi2de(gene(IDX_ETAH:IDX_ETAC-1));
eta_c_decimal = bi2de(gene(IDX_ETAC:IDX_THET-1));
theta_decimal = bi2de(gene(IDX_THET:IDX_EPSI-1));
epsilon_decimal = bi2de(gene(IDX_EPSI:IDX_OMEG-1));
omega_decimal = bi2de(gene(IDX_OMEG:IDX_PSII-1));
psi_decimal = bi2de(gene(IDX_PSII:IDX_PHII-1));
phi_decimal = bi2de(gene(IDX_PHII:IDX_KOOO-1));
Ko_decimal = bi2de(gene(IDX_KOOO:end)) ;
I may have made a typo in the above so please review - for every variable we start at the index defined for that variable and proceed to one less than the starting index for the next variable (and so we should get the ten bits).
Unfortunately, I don't have the Communications System Toolbox with bi2de and so can't fully test out the above. But I did make use of bin2dec and other functions to get the right bits and was able to (sloooowly) process the algorithm and get past the point where you were experiencing errors with the interpolation. So hopefully, you will be able to too!
The next thing I noticed was the minimum and maximum limits for the variables, again defined in gene_to_values.m
alpha_min_exponent = -10;
alpha_max_exponent = 10;
beta_min_exponent = -10;
beta_max_exponent = 10;
% etc.
There were two bugs with for the eta_h_exponent and the ko_exponent with the ordering of the max and min values (for the first) and the min exponent from another variable being used (for the second). Please change both to
eta_h_exponent = eta_h_min_exponent + ...
((eta_h_max_exponent-eta_h_min_exponent) * ...
eta_h_decimal / 2^gene_len);
and
Ko_exponent = Ko_min_exponent + ...
((Ko_max_exponent-Ko_min_exponent) * ...
Ko_decimal / 2^gene_len);
Given the above equations, what does that tell us about the interval (range of data) for each variable?
Each variable is 10 bits long, so the minimum "bit" value is zero (all bits are zeros) and the maximum "bit" value is 1023 (all bits are ones). According to the equations (like the two above) this means that the minimum and maximum values prior to the base 10 conversion for any variable is
min_value = -10 + ((10-(-10)) * 0/2^10) = -10
max_value = -10 + ((10-(-10)) * 1023/2^10) = 9.980468750000000
Then we do the base 10 conversion, and our min and maximum values become
min_value = 10^min_value = 1e-10
max_value = 10^max_value = 9.5602e+09
What that seems to imply is that we are spreading our 1024 possible variable values within the (huge!) range from 1e-10 to 9.5602e+09.
In the original code, the min and max exponents for the beta and delta variables were -7,1 and -4,2 respectively. These corresponded to intervals of [1e-07,9.8217] and [0.0001,98.66] (assuming I've done the math correctly!) which seem much more reasonable to spread the 1024 possible variable values across.
So we can do one of two things - increase the number of bits in the gene length from 10 to 15 (or 20) OR modify the minimum and maximum exponents to something closer to what you expect for each variable. I prefer the latter solution - if we increase the number of bits, the CPU processing will more than likely increase and it will take longer to run the simulation. If we restrict the intervals for each variable, then we can avoid nonsense values that do nothing to help solve the problem. I don't know what each variable represents so can't say which min and max exponents to use, but given the examples above, some reasonable values (or maybe those of the original code) should be found easily enough.
And one final thing, David - since you are running the code with bi2de, that must mean that you have the Communications System Toolbox. Do you happen to have the Global Optimization Toolbox as well? If so, it has its own implementation of the genetic algorithm which should be easier to use. You really only have to worry about defining the fitness function (and selecting the methods for crossover, mutation, etc.) and leave the rest to the toolbox. Its method of doing crossover allows you to ignore all this binary representation, and you just define the intervals for each variable...without considering whether it is base 10 or otherwise.
Hope the above helps!
  5 Comments
David
David on 15 Aug 2014
Wow Geoff. I feel a bit dense. I was really misunderstanding of "population" in the GA had to do with the GA itself. Probably a signal that I am a little over my head in this project. But I am also learning very quickly.
I also realized today that one of my parameters (Ko) is not a parameter to be solved for, but a given value. So I "only" have 12 parameters to solve for.
Anyway, we've gone beyond solving my initial problem and I'm well on my way. Can't thank you enough!

Sign in to comment.

More Answers (0)

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!