(To be removed) Create local BLAST database
blastformat will be removed in a future release.
blastformat('Inputdb', InputdbValue)
blastformat(..., 'FormatPath', FormatPathValue, ...)
blastformat(..., 'Title', TitleValue, ...)
blastformat(..., 'Log', LogValue, ...)
blastformat(..., 'Protein', ProteinValue, ...)
blastformat(..., 'FormatArgs', FormatArgsValue, ...)
InputdbValue | Character vector or string specifying a file name or path and file name of a
FASTA file containing a set of sequences to be formatted as a blastable database. If
you specify only a file name, that file must be on the MATLAB® search path or in the current folder. (This corresponds to the
formatdb option -i.) |
FormatPathValue | Character vector or string specifying the full path to the
formatdb executable file, including the name and extension of the
executable file. Default is the system path. |
TitleValue | Character vector or string specifying the title for the local database. Default
is the input FASTA file name. (This corresponds to the formatdb
option -t.) |
LogValue | Character vector or string specifying the file name or path and file name for the
log file associated with the local database. Default is
formatdb.log. (This corresponds to the
formatdb option -l.) |
ProteinValue | Specifies whether the sequences formatted as a local BLAST database are protein
or not. Choices are true (default) or false.
(This corresponds to the formatdb option
-p.) |
FormatArgsValue | NCBI formatdb command, that is, a character vector or string
containing one or more instances of -x
and the option associated with it, used to specify input arguments. |
Warning
blastformat is not supported for macOS version 10.15 (Catalina) or later.
Note
To use the blastformat function, you must have a local copy of the
NCBI formatdb executable file available from your system. You can
download the formatdb executable file by accessing BLAST executables. Run the downloaded executable and configure it for your system.
For convenience, consider placing the NCBI formatdb executable file on
your system path.
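One way to confirm that the executable is reachable before calling blastformat is to ask the operating system to locate it. The following is a minimal sketch, not part of blastformat itself. It assumes the executable is named formatdb (formatdb.exe on Windows) and relies on the standard where and which shell commands. The path that MATLAB passes to the shell can differ from the one in your terminal, so treat a negative result as a hint rather than proof.

```matlab
% Sketch: ask the operating system whether a formatdb executable is on the system path.
if ispc
    cmd = 'where formatdb';   % Windows lookup command
else
    cmd = 'which formatdb';   % Linux lookup command
end
[status, location] = system(cmd);
if status == 0
    fprintf('formatdb found at: %s\n', strtrim(location));
else
    disp('formatdb was not found on the system path. Consider using the ''FormatPath'' property.');
end
```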
blastformat('Inputdb', InputdbValue) calls
a local version of the NCBI formatdb executable file with
InputdbValue, a file name or path and file name of a FASTA file
containing a set of sequences. If you specify only a file name, that file must be on the
MATLAB search path or in the current folder. (This corresponds to the
formatdb option -i.)
It then formats the sequences as a local, blastable database, by creating multiple files,
each with the same name as the InputdbValue FASTA file, but with
different extensions. The database files are placed in the same location as the FASTA
file.
Note
If you rename the database files, make sure they all have the same name.
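As a sketch of the behavior described above, the following creates a database from a protein FASTA file and then lists the files that share its name. It assumes that NC_004431.faa is on the MATLAB search path or in the current folder and that formatdb is on the system path. The call is guarded, so the snippet only issues a warning when the file is missing.

```matlab
fastaFile = 'NC_004431.faa';   % assumed protein FASTA file; 'Protein' defaults to true
if exist(fastaFile, 'file') == 2
    blastformat('Inputdb', fastaFile);
    % The database files share the FASTA file name and differ only in extension.
    dbFiles = dir([fastaFile '.*']);
    disp({dbFiles.name}');
else
    dbFiles = [];
    warning('FASTA file %s not found.', fastaFile);
end
```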
blastformat(..., 'PropertyName', PropertyValue, ...) calls
blastformat with optional properties that use property name/property
value pairs. You can specify one or more properties in any order. Each
PropertyName must be enclosed in single quotation marks and is
case insensitive. These property name/property value pairs are as follows.
blastformat(..., 'FormatPath', FormatPathValue, ...)
specifies the full path to the formatdb executable file, including the name
and extension of the executable file. Default is the system path.
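When formatdb is not on the system path, the full path can be assembled with fullfile and passed through 'FormatPath'. This is a minimal sketch. The blast/bin folder under the current folder is a hypothetical install location, and the .exe extension on Windows is an assumption about how the executable is named on your system.

```matlab
% Hypothetical install folder; replace with the folder that holds formatdb on your system.
blastFolder = fullfile(pwd, 'blast', 'bin');
if ispc
    exeName = 'formatdb.exe';   % assumed executable name on Windows
else
    exeName = 'formatdb';
end
formatdbPath = fullfile(blastFolder, exeName);   % full path, including name and extension
if exist(formatdbPath, 'file') == 2
    blastformat('Inputdb', 'NC_004431.faa', 'FormatPath', formatdbPath);
else
    warning('No formatdb executable found at %s.', formatdbPath);
end
```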
blastformat(..., 'Title', TitleValue, ...)
specifies the title for the local database. Default is the input FASTA file name. (This
corresponds to the formatdb option -t.)
Note
The 'Title' property does not change the file name of the database
files. This title is used internally only, and appears in the report structure returned by
the blastlocal function.
blastformat(..., 'Log', LogValue, ...)
specifies the file name or path and file name for the log file associated with the local
database. Default is formatdb.log. The log file captures the progress of
the database creation and formatting. (This corresponds to the formatdb
option -l.)
blastformat(..., 'Protein', ProteinValue, ...)
specifies whether the sequences formatted as a local BLAST database are protein or not.
Choices are true (default) or false. (This corresponds
to the formatdb option -p.)
blastformat(..., 'FormatArgs', FormatArgsValue, ...)
specifies options using the input arguments for the NCBI formatdb function.
FormatArgsValue is a character vector or string containing one or
more instances of -x and the option associated
with it. For example, to specify that the input is a database in ASN.1 format, instead of a
FASTA file, you would use the following syntax:
blastformat('Inputdb', 'ecoli.asn', 'FormatArgs', '-a T')
Tip
Use the 'FormatArgs' property to specify formatdb
options for which there are no corresponding property name/property value pairs.
Note
For a complete list of valid input arguments for the NCBI formatdb
function, make sure that the formatdb executable file is located on your
system path or current folder, then type the following at your system's command prompt.
formatdb -
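A 'FormatArgs' value is an ordinary character vector, so it can be assembled from option and value pairs before the call. The sketch below uses only the -t and -l options named on this page. The title and log file name are illustrative, and the call is skipped if the assumed FASTA file NC_004431.faa is not found.

```matlab
% Option/value pairs in formatdb notation: -t sets the title, -l sets the log file.
opts = {'-t', 'myecoli_aa'; ...
        '-l', 'ecoli_aa.log'};
% Join each option with its value, then join the pairs with single spaces.
pairs = strcat(opts(:, 1), {' '}, opts(:, 2));
formatArgs = strjoin(pairs', ' ');
disp(formatArgs);
if exist('NC_004431.faa', 'file') == 2
    blastformat('Inputdb', 'NC_004431.faa', 'FormatArgs', formatArgs);
end
```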
You can also use the syntax and input arguments accepted by the NCBI
formatdb function, instead of the property name/property value pairs
listed previously. To do so, supply a character vector or string containing multiple options
using the -x option syntax. For example, you can specify the
ecoli.nt FASTA file, a title of myecoli, and that
the sequences are not protein by using
blastformat('-i ecoli.nt -t myecoli -p F')
Note
For a complete list of valid input arguments for the NCBI formatdb
function, make sure that the formatdb executable file is located on
your system path or current folder, then type the following at your system's command
prompt.
formatdb -
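The single-argument form can also be built from MATLAB variables. The following sketch composes the same option string as the ecoli.nt example with sprintf. The variable names are illustrative, and the mapping of true and false to T and F follows the '-a T' and '-p F' usage shown on this page. The call is skipped if ecoli.nt is not found.

```matlab
inputFile = 'ecoli.nt';   % FASTA file name used in the example on this page
dbTitle   = 'myecoli';
isProtein = false;        % nucleotide sequences
if isProtein
    pFlag = 'T';
else
    pFlag = 'F';
end
% Compose the option string in formatdb syntax: -i input, -t title, -p protein flag.
cmdArgs = sprintf('-i %s -t %s -p %s', inputFile, dbTitle, pFlag);
disp(cmdArgs);
if exist(inputFile, 'file') == 2
    blastformat(cmdArgs);
end
```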
The following example assumes you have a FASTA nucleotide file, such as the E.
coli file NC_004431.fna. For FASTA files from NCBI, visit
ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/.
Create a local blastable database from the NC_004431.fna FASTA file
and give it a title using the 'title' property.
blastformat('inputdb', 'NC_004431.fna', 'protein', 'false',...
'title', 'myecoli_nt');
Create a local blastable database from the NC_004431.faa FASTA file
and rename the title and log file using formatdb syntax and input
arguments.
blastformat('inputdb', 'NC_004431.faa',...
'formatargs', '-t myecoli_aa -l ecoli_aa.log');
[1] Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
[2] Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.