if-else statement to check the claim identity of URL

How can I check whether a single URL string contains more than one URL (two, three, or more)? The purpose of this feature is to detect URLs hidden inside another URL: return 1 if there are any, otherwise return 0. Examples: www.abc.com/www.koko.my, http://www.abc.com=https://www.koko.my, www.abc.com.www.koko.my, etc. Here is my code; I am having trouble getting the URL check condition right. I have more than 100 URLs saved in a file named 'URL', and I want to run each of them through the 'is_double_url' function to check the results.
| *is_double_url.m* |
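function out = is_double_url(url_path1)
% Intended to return 1 if url_path1 contains more than one URL, else 0.
f1 = strfind(url_path1,'www.');          % start index of every 'www.'
if isempty(f1)
    out = 0;
    return;
end
f2 = strfind(url_path1,'/');             % index of every '/'
f3 = bsxfun(@minus,f2,f1');              % f3(k,j) = f2(j) - f1(k)
count_dots = zeros(size(f3,1),1);
for k = 1:size(f3,1)
    [~,y] = find(f3(k,:)>0,1);           % first '/' after the k-th 'www.'
    str2 = url_path1(f1(k):f2(y));       % text from 'www.' up to that '/'
    if ~isempty(strfind(str2,'..'))
        continue                         % skip segments with consecutive dots
    end
    count_dots(k) = nnz(strfind(str2,'.'));   % number of dots in the segment
end
out = ~any(count_dots(2:end)<2);
if any(strfind(url_path1,'://')>f2(1))   % a '://' after the first '/'
    out = true;
end
return;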
| *f10.m* |
data = importdata('url');
sizeData = size(data,1);
feature10 = zeros(1,sizeData);   % preallocate the result vector
for i = 1:sizeData
    feature10(i) = is_double_url(data{i});
end

Answers (1)

Walter Roberson on 21 Mar 2014
This turns out to be quite tough to get right.
You need to consider percent-encoding, UTF-8 encoding, and Unicode strings. Then you have to worry about Internationalized Domain Name encoding.
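As a rough illustration of the percent-encoding point, a second URL can be hidden behind %XX escapes, so a check on the raw string would miss it. The sketch below decodes such escapes first. The name pct_decode and the sample string are hypothetical, and the sketch assumes only single-byte (ASCII) escapes matter; multi-byte UTF-8 sequences and Internationalized Domain Name labels are not handled.

```matlab
% Hypothetical helper (not from the original post): replace each %XX
% escape with the character whose code is the hex value XX.
% Assumption: single-byte escapes only; no UTF-8 or IDN handling.
pct_decode = @(s) regexprep(s, '%([0-9A-Fa-f]{2})', '${char(hex2dec($1))}');

% Hypothetical sample: the second URL is hidden behind %3A and %2F.
encoded = 'www.abc.com/redirect?to=http%3A%2F%2Fwww.koko.my';
decoded = pct_decode(encoded);
disp(decoded)
```

Any test for a second 'www.' or '://' would then be applied to decoded rather than to the raw string.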
Note: your example,
http://www.abc.com=https://www.koko.my
is not a valid URL. The "com=https:" would be considered to be all one component, but neither "=" nor ":" is permitted as a character in host name components.
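If strict URL validation is not required, one crude heuristic is to count the places where a new URL appears to start (a scheme such as 'http://' or a 'www.' label) and to flag the string when there is more than one. The following is only a sketch under that assumption, not a complete parser: the names urlStart, count_starts, and has_embedded_url are hypothetical, a 'www.' directly after '://' is treated as part of the same URL, and none of the encoding issues mentioned above are handled.

```matlab
% Hypothetical heuristic (not from the original post).
% A "URL start" is either 'www.' with an optional 'http://' or 'https://'
% in front of it, or a bare 'http://' / 'https://'.
urlStart = '(https?://)?www\.|https?://';

% Count the non-overlapping URL starts in a string (case-insensitive).
count_starts = @(s) numel(regexpi(s, urlStart, 'match'));

% Return 1 when more than one URL start is found, else 0.
has_embedded_url = @(s) double(count_starts(s) > 1);

samples = {'www.abc.com/www.koko.my', ...
           'http://www.abc.com=https://www.koko.my', ...
           'www.abc.com.www.koko.my', ...
           'http://www.abc.com/index.html'};
flags = cellfun(has_embedded_url, samples);
disp(flags)   % 1 1 1 0
```

Because the pattern does not check what precedes 'www.', a path such as 'awww.html' would also be counted, so the result should be treated as a hint rather than a verdict.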
