if-else statement to check the claim identity of URL
How can I check whether more than one URL (two, three, ...) exists inside a single URL? The purpose of this feature is to detect whether two or more URLs are hidden within one URL: if so, return 1, otherwise return 0. Examples: www.abc.com/www.koko.my, http://www.abc.com=https://www.koko.my, www.abc.com.www.koko.my, etc.
Here is my code. I have trouble checking the URL condition. I have about 100+ URLs saved in a file named 'url', and I want to run the is_double_url function on that data to get the results.
is_double_url.m:
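function out = is_double_url(url_path1)
% IS_DOUBLE_URL  Return 1 if url_path1 seems to hide a second URL, else 0.
out = 0;
f2 = strfind(url_path1, '/');            % positions of all slashes

% Rule 1: a "://" after the first slash means another URL starts
% inside the path, e.g. http://www.abc.com=https://www.koko.my
if ~isempty(f2) && any(strfind(url_path1, '://') > f2(1))
    out = 1;
    return;
end

% Rule 2: a second "www." prefix followed by a host with at least two
% dots, e.g. www.abc.com/www.koko.my or www.abc.com.www.koko.my
f1 = strfind(url_path1, 'www.');         % start of every "www." prefix
if numel(f1) < 2
    return;                              % zero or one prefix: not double
end
count_dots = zeros(numel(f1), 1);
for k = 2:numel(f1)
    % The segment runs from this "www." to the next slash (or string end).
    next_slash = f2(find(f2 > f1(k), 1));
    if isempty(next_slash)
        next_slash = numel(url_path1) + 1;
    end
    segment = url_path1(f1(k):next_slash-1);
    if isempty(strfind(segment, '..'))   % leave malformed "a..b" at 0 dots
        count_dots(k) = numel(strfind(segment, '.'));
    end
end
out = double(all(count_dots(2:end) >= 2));
end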
f10.m:
data = importdata('url');                 % cell array, one URL per line
nData = numel(data);
feature10 = zeros(nData, 1);              % preallocate the result vector
for i = 1:nData
    feature10(i) = is_double_url(data{i});
end
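For comparison, a much simpler heuristic can be sketched with regular expressions. It is not the poster's algorithm: it assumes a URL "starts" wherever http:// or https:// (optionally followed by www.) or a bare www. occurs, ignoring case, and flags any string with two or more such starts. The names countUrlStarts and isDoubleUrl are hypothetical.

```matlab
% Hypothetical heuristic (not the poster's is_double_url): count places
% where a URL appears to start, i.e. "http://" or "https://" (optionally
% followed by "www.") or a bare "www.", ignoring case.
countUrlStarts = @(s) numel(regexpi(s, 'https?://(www\.)?|www\.', 'start'));
isDoubleUrl    = @(s) double(countUrlStarts(s) >= 2);   % 1 or 0

urls = {'www.abc.com/www.koko.my', ...
        'http://www.abc.com=https://www.koko.my', ...
        'www.abc.com.www.koko.my', ...
        'http://www.abc.com/index.html'};
flags = cellfun(isDoubleUrl, urls);
disp(flags)   % 1 1 1 0
```

With the cell array from importdata, the loop in f10.m would reduce to feature10 = cellfun(isDoubleUrl, data). Being purely pattern-based, it can report false positives (for example a page named about-www.html) and it misses URLs hidden by percent-encoding.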
Answers (1)
Walter Roberson
on 21 Mar 2014
This turns out to be quite tough to get right.
You need to consider percent-encoding, UTF-8 encoding, and Unicode strings. Then you have to worry about Internationalized Domain Name (IDN) encoding.
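As an illustration of why percent-encoding matters, "%77%77%77." decodes to "www.", so a second URL can be invisible to a plain strfind search. The sketch below decodes %XX escapes before counting URL starts. It assumes each %XX is a single-byte character; multi-byte UTF-8 sequences, "+" as space, and IDN (xn--) labels are not handled, and the input string is hypothetical.

```matlab
% Minimal sketch: decode %XX escapes before looking for embedded URLs.
% Assumption: each %XX is treated as one single-byte character; multi-byte
% UTF-8 sequences and '+' for spaces are NOT handled here.
raw = 'www.abc.com/%77%77%77.koko.my';   % hypothetical input; %77 is 'w'

decoded = '';
k = 1;
while k <= numel(raw)
    isEscape = raw(k) == '%' && k + 2 <= numel(raw) && ...
               all(isstrprop(raw(k+1:k+2), 'xdigit'));
    if isEscape
        decoded(end+1) = char(hex2dec(raw(k+1:k+2)));  % %XX -> one char
        k = k + 3;
    else
        decoded(end+1) = raw(k);                        % copy as-is
        k = k + 1;
    end
end

% Count URL starts (scheme or "www.") before and after decoding.
pattern = 'https?://(www\.)?|www\.';
nBefore = numel(regexpi(raw, pattern, 'start'));       % 1
nAfter  = numel(regexpi(decoded, pattern, 'start'));   % 2
fprintf('%s -> %s (%d vs %d URL starts)\n', raw, decoded, nBefore, nAfter);
```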
Note: your example,
http://www.abc.com=https://www.koko.my
is not a valid URL. The "com=https:" part would be treated as a single host-name component, but neither "=" nor ":" is permitted in host-name components.
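The host-name restriction can be checked with the letter-digit-hyphen (LDH) rule for DNS names: each dot-separated label is 1 to 63 characters from A-Z, a-z, 0-9 and "-", and does not start or end with a hyphen. This is a minimal sketch, not a full URL parser: it assumes no "user@" part, no IPv6 literal, and ASCII-only labels, and isValidLabel is a hypothetical helper name.

```matlab
% Minimal sketch of the letter-digit-hyphen (LDH) host-name rule: each
% dot-separated label is 1-63 characters from [A-Za-z0-9-] and does not
% start or end with a hyphen. Assumptions: no userinfo ("user@"), no IPv6
% literal, and no internationalized (non-ASCII) labels in the input.
isValidLabel = @(lbl) ~isempty(regexp(lbl, ...
    '^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$', 'once'));

url = 'http://www.abc.com=https://www.koko.my';

% Authority = text after "scheme://" up to the next '/', '?' or '#'.
tok = regexp(url, '^[A-Za-z][A-Za-z0-9+.-]*://([^/?#]*)', 'tokens', 'once');
authority = tok{1};                           % 'www.abc.com=https:'
host = regexprep(authority, ':[0-9]*$', '');  % drop an (empty) port
labels = strsplit(host, '.');                 % {'www','abc','com=https'}
ok = cellfun(isValidLabel, labels);           % [1 1 0]
fprintf('host "%s" valid: %d\n', host, all(ok));
```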