Is there a way to determine illegal characters for file names based on the computer operating system?

I am writing code where the user can pass in strings, and a file name is returned (similar to fullfile, though the pieces make up a file name). I want to make sure the file name is valid and does not contain any illegal characters. I know about regexp, but that depends on hard-coding the characters to look for. The illegal characters could differ across operating systems (e.g., Windows vs. Mac). Does anyone know of a way to determine which characters are illegal on whichever operating system is running?

Accepted Answer

Jan on 27 Oct 2017
Edited: Jan on 27 Oct 2017
You have to write the check separately for each operating system:
function [Valid, Msg] = CheckFileName(S)
Msg = '';
if ispc
    BadChar = '<>:"/\|?*';
    BadName = {'CON', 'PRN', 'AUX', 'NUL', 'CLOCK$', ...
               'COM1', 'COM2', 'COM3', 'COM4', 'COM5', 'COM6', ...
               'COM7', 'COM8', 'COM9', ...
               'LPT1', 'LPT2', 'LPT3', 'LPT4', 'LPT5', 'LPT6', ...
               'LPT7', 'LPT8', 'LPT9'};
    bad = ismember(BadChar, S);
    if any(bad)
        Msg = ['Name contains bad characters: ', BadChar(bad)];
    elseif any(S < 32)
        Msg = ['Name contains non printable characters, ASCII:', sprintf(' %d', S(S < 32))];
    elseif ~isempty(S) && (S(end) == ' ' || S(end) == '.')
        Msg = 'A trailing space or dot is forbidden';
    else
        % "AUX.txt" fails also, so extract the file name only:
        [~, name] = fileparts(S);
        if any(strcmpi(name, BadName))
            Msg = ['Name not allowed: ', name];
        end
    end
else % Mac and Linux:
    if any(S == '/')
        Msg = '/ is forbidden in a file name';
    elseif any(S == 0)
        Msg = '\0 is forbidden in a file name';
    end
end
Valid = isempty(Msg);
end
[EDITED] 'CLOCK$' added, although modern Windows systems might accept it. Using the following names might also crash your Windows system: $Mft, $MftMirr, $LogFile, $Volume, $AttrDef, $Bitmap, $Boot, $BadClus, $Secure, $Upcase, $Extend, $Quota, $ObjId and $Reparse. Nevertheless, they are valid names.
There are also good reasons to avoid # $ % ! & ' { } @ in file names, e.g. when the files are saved for web access. Leading dots and hyphens have a special meaning. Spaces at the beginning and the end are evil, because they can drive users crazy, but they are not forbidden in general. The length of the file name also matters, and you have to decide whether Unicode characters are valid or not.
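A minimal sketch of how such advisory checks could look. Since these names are legal, the sketch returns warnings instead of a verdict. The function name ListNameWarnings, the warning texts and the default limit of 255 characters are assumptions made for illustration, not part of the function above; pick the limit that matches your file system.

```matlab
function Warn = ListNameWarnings(S, MaxLen)
% Hypothetical helper: collect advisory warnings for a legal but awkward name.
% Returns a cell array of char vectors; it is empty if nothing was found.
if nargin < 2
    MaxLen = 255;            % assumed limit, adjust to your file system
end
S = char(S);                 % accept string scalars as well
Warn = {};
Risky = '#$%!&''{}@';        % the characters listed above
if any(ismember(S, Risky))
    Warn{end+1} = 'contains characters that are awkward for web access';
end
if ~isempty(S) && any(S(1) == '.-')
    Warn{end+1} = 'starts with a dot or hyphen';
end
if ~isempty(S) && (S(1) == ' ' || S(end) == ' ')
    Warn{end+1} = 'has a leading or trailing space';
end
if numel(S) > MaxLen
    Warn{end+1} = sprintf('is longer than %d characters', MaxLen);
end
if any(S > 127)
    Warn{end+1} = 'contains non-ASCII characters';
end
end
```

For example, ListNameWarnings('-a#b ') collects three warnings (risky character, leading hyphen, trailing space), while ListNameWarnings('report_2017.txt') returns an empty cell.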
The function above has an inconsistency: for Windows it checks whether the file name is one of the reserved device names (DOS! It is 2017 and we still suffer from bad design ideas from the 80s.), so an optional file path is removed from the name first. But in the Linux part, the file separator '/' is rejected. Well, I'm currently too tired to decide whether names should be accepted with or without a path.
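If you decide that paths should be accepted, one possible approach is to split off the folder first and validate only the last component. This is just a sketch of that design choice; SplitPathAndName is a hypothetical helper name.

```matlab
function [Folder, Name] = SplitPathAndName(S)
% Hypothetical helper: separate an optional folder from the file name,
% so that only the last path component has to be validated.
[Folder, base, ext] = fileparts(char(S));
Name = [base, ext];          % file name including its extension
end
```

The returned Name can then be passed to CheckFileName, while Folder is handled separately, for example by checking that it exists.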
It might be a good idea to avoid the characters forbidden under Windows even if you are working under Linux, because they impede the interchange of files.
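A sketch of such a platform-independent check, which applies the Windows rules everywhere instead of branching on ispc. The name CheckPortableName and the messages are made up for this example, and the reserved names are matched with a regular expression, which is my choice here rather than anything required.

```matlab
function [Valid, Msg] = CheckPortableName(S)
% Hypothetical helper: apply the Windows rules on every platform, so that
% an accepted name can be exchanged between Windows, Linux and Mac.
S = char(S);                 % accept string scalars as well
Msg = '';
BadChar = '<>:"/\|?*';
Reserved = '^(CON|PRN|AUX|NUL|CLOCK\$|COM[1-9]|LPT[1-9])$';
[~, base] = fileparts(S);    % drop the extension, "AUX.txt" is reserved too
if isempty(S)
    Msg = 'Name is empty';
elseif any(ismember(S, BadChar)) || any(S < 32)
    Msg = 'Name contains a character that is forbidden under Windows';
elseif S(end) == ' ' || S(end) == '.'
    Msg = 'A trailing space or dot is forbidden under Windows';
elseif ~isempty(regexp(base, Reserved, 'once', 'ignorecase'))
    Msg = ['Reserved device name: ', base];
end
Valid = isempty(Msg);
end
```

With this, CheckPortableName('report.txt') is accepted on every platform, while 'a:b.txt', 'aux.TXT' and 'name.' are rejected even under Linux.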

More Answers (1)

Clément Vasseur on 26 Feb 2024
A more general solution is to test the name by actually trying to create a file with it:
function is_valid = verify_filename(my_string)
% Verify whether a string can be used as a file name.
% On Windows (for example), some characters such as ? or / are forbidden.
%
% Inputs:
%   - my_string (1,1) string    % the string to verify
%
% Outputs:
%   - is_valid (1,1) logical    % true if it can be a file name
arguments
    my_string (1,1) string
end
temp = tempname;
mkdir(temp);
tp_filename = fullfile(temp, my_string);
try
    fileID = fopen(tp_filename, 'w');
    fclose(fileID);    % errors if fopen failed and returned -1
    % isfile(tp_filename)
    is_valid = true;
    delete(tp_filename)
    rmdir(temp)
catch ME
    if (strcmp(ME.identifier,'MATLAB:FileIO:InvalidFid'))
        is_valid = false;
        rmdir(temp)
    else
        error('opt:verify_filename', 'Bad identification of error.')
    end
end
end
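A possible variant of the same probing idea, as a sketch only: it tests the value returned by fopen directly instead of catching the error from fclose, and it uses onCleanup so the temporary folder is removed on every exit path. The function name can_create_file is hypothetical.

```matlab
function is_valid = can_create_file(name)
% Hypothetical variant of the probing approach: try to create the file in
% a fresh temporary folder and report whether fopen succeeded.
name = char(name);                              % accept string scalars as well
probeDir = tempname;                            % unique folder name below tempdir
mkdir(probeDir);
cleaner = onCleanup(@() rmdir(probeDir, 's'));  % remove folder and contents on exit
fileID = fopen(fullfile(probeDir, name), 'w');
is_valid = fileID ~= -1;                        % fopen returns -1 on failure
if is_valid
    fclose(fileID);
end
end
```

Note that any probe like this only reflects the file system that holds the temporary folder, which need not be the one the file is finally written to. A name that points into a non-existing subfolder, such as 'missing_dir/report.txt', is reported as invalid as well.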
