```
ords -T mytrain -t mytest
```

This assumes one nearest neighbor and that all variables are categorical. The command
```
ords -T mytrain -t mytest -N -k 1 3 5 7 9 -v
```

tells the program that all the variables are numeric and that five different numbers of nearest neighbors (1, 3, 5, 7, and 9) should be tried. The -v flag produces "verbose" output.
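The idea behind trying several values of k (the -k vector) and choosing among them by cross-validation (the -x option) can be sketched in a few lines. This is an illustration only, not the ords implementation; the data, the leave-one-out scheme, and the helper names are hypothetical.

```python
# Sketch: pick the best k for k-nearest-neighbor classification by
# leave-one-out cross-validation. Not the ords source; toy data only.
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Classify x by majority vote among its k nearest training points."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def loo_error(X, y, k):
    """Leave-one-out error rate for a given k."""
    wrong = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k) != y[i]:
            wrong += 1
    return wrong / len(X)

# Two well-separated classes; every candidate k classifies them perfectly,
# and ties are broken in favor of the first k in the list.
X = [(0.0,), (0.5,), (1.0,), (5.0,), (5.5,), (6.0,)]
y = [0, 0, 0, 1, 1, 1]
best_k = min([1, 3, 5], key=lambda k: loo_error(X, y, k))
```

ords cross-validates over both k and the variable subset at once; the sketch above covers only the choice of k.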
The command

```
ords -T mytrain -t mytest -d
```

uses a different technique, in which both classes and categories are mapped into real numbers in a space of dimension (number of classes - 1), and then a discriminant analysis is performed to do the classification. Computations are similar. Details can be found in Buttrey (2001).
| Flag | Description |
|------|-------------|
| -C | Name of cost file. Default: none; all misclassifications cost 1. |
| -c | Use classification (rather than discrimination). The default. |
| -d | Use discrimination (rather than classification). By default, nearest-neighbor classification is used. |
| -F | Name of status file, to which "verbose"-type messages are sent. Default: messages are sent to stderr. Even when verbosity is 0, there is a small set of messages (such as mappings and error rates) you can expect. Messages arising from problems encountered while reading the options are always sent to stderr (since these might arise before the -F option has been parsed). |
| -f | Name of fascinating file. The fascinating file gives information about each of the variables in the case where some variables are categorical (or ordered) and others numeric. Default: no file; all variables are unordered categorical (see -N and -U). |
| -i | Improvement. This is the minimum amount by which the first eigenvalue must go up in order for the stepwise addition to continue. That is, if the increase in the first eigenvalue is smaller than "improvement," the model is complete. Default: 1e-9. This is ignored if the permutation test is used (see -P). |
| -K | Number of knots per continuous variable. Default: 5. |
| -k | Vector of k's for k-nearest-neighbor classification. Default: k = 1 only. |
| -m | Missing-value threshold. Any item smaller than this number is presumed to represent a missing value. Default: -99. |
| -N | All variables are numeric. Default: not true (see -U). |
| -o | Once out, always out. If present, any variable that is deleted from the model because it failed the permutation test is excluded from the model permanently, to save time. Default: not true. |
| -p | Prior probabilities, one for each class. Default: the prior probability for class i is estimated by the proportion of training-set observations in class i. |
| -P | Permutation test (instead of an improvement threshold; see -i). This should be followed by two numbers: the first is the number of permutations to perform, and the second is the "permute_tail," which tells how many of the permutations need to be better than the unpermuted fit before a variable is excluded. Default: 0 (that is, use "improvement"; see -i); if present, permute_tail defaults to 1. |
| -r | Ridge value, used to make some matrices invertible. Default: .003. A value of 0 is not currently permitted, though arguably it should be. |
| -s | Slots, that is, the number of nearest neighbors to actually look for. (This is a little bigger than the largest value in k_vec because of possible ties.) Default: the largest k + 15. Note: this argument currently has no effect! |
| -S | If present, the training and test sets are identical. In that case, row i of the training set is not considered as a possible neighbor to row i of the test set. Default: not present. This is not implemented for discrimination, so using this and -d together will give artificially small error rates. |
| -T | Name of the training set. |
| -t | Name of the test set. |
| -U | All variables are unordered. Default: true. |
| -v | Verbosity level. Default: 0; each -v adds 1. |
| -x | Number of cross-validations to do when deciding which k and which variables to use. Default: 10. |
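The permutation test controlled by -P can be sketched as follows. This is an illustration of the general idea only, not the ords implementation (whose actual criterion is based on the first eigenvalue); the `score` function here is a hypothetical stand-in for whatever fit statistic is being permuted.

```python
# Sketch of a permutation test for variable selection: permute one
# candidate variable n_perm times; if at least permute_tail permuted
# fits beat the unpermuted fit, the variable carries no real signal
# and is excluded. Illustrative only; not the ords source.
import random

def permutation_keep(score, col, y, n_perm=100, permute_tail=1, seed=0):
    """Return True if the variable should stay in the model."""
    rng = random.Random(seed)
    real = score(col, y)
    better = 0
    for _ in range(n_perm):
        shuffled = col[:]
        rng.shuffle(shuffled)
        if score(shuffled, y) > real:
            better += 1
    return better < permute_tail

def score(col, y):
    """Toy fit statistic: squared difference of the two class means."""
    m0 = sum(v for v, c in zip(col, y) if c == 0) / y.count(0)
    m1 = sum(v for v, c in zip(col, y) if c == 1) / y.count(1)
    return (m0 - m1) ** 2

labels = [0, 0, 0, 1, 1, 1]
informative = [1, 2, 3, 11, 12, 13]   # separates the classes
noise = [1, 2, 3, 1, 2, 3]            # identical in both classes
```

With these toy data, `permutation_keep` retains the informative variable (no permutation can score higher) and drops the noise variable (many permutations do better than its score of zero).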
```
0 5
1 0
```

In this example, classifying a class 0 observation wrongly costs five times as much as classifying a class 1 observation wrongly. As the paper notes, the program handles the 2x2 case, or the case when the cost matrix has constant rows (excepting the diagonal), but not the general case.
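How such a cost matrix enters the error accounting can be shown with a few lines of arithmetic. The row-by-column layout (entry [i][j] is the cost of classifying a class-i observation as class j) is an assumption for illustration; the names below are hypothetical.

```python
# Toy illustration of the 2x2 cost matrix shown above: entry cost[i][j]
# is assumed to be the cost of classifying a class-i observation as
# class j. Not the ords source.
cost = [[0, 5],
        [1, 0]]

def total_cost(truth, predicted):
    """Sum the misclassification costs over a set of predictions."""
    return sum(cost[t][p] for t, p in zip(truth, predicted))
```

For example, misclassifying two class 0 observations costs 10, five times the cost of misclassifying two class 1 observations.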
```
# Fascinating file example
# Lines starting with "#" are ignored
0 numeric
1 unordered
2 ordered
3 numeric
```

"Ordered" will be changed to "unordered" in a discrimination problem, since ordering is only supported in the classification problem. Actually, it is currently never supported (see ordered variables above).
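The fascinating-file format above is simple enough that a reader of the file can be sketched directly: one "index type" pair per line, with "#" lines ignored. This parser is an illustration of the format, not code from ords.

```python
# Illustrative parser for the fascinating-file format shown above:
# each non-comment line holds a variable index and its type.
def parse_fascinating(text):
    """Map variable index -> type ("numeric", "unordered", "ordered")."""
    types = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and "#" comments
        idx, kind = line.split()
        types[int(idx)] = kind
    return types

example = """\
# Fascinating file example
0 numeric
1 unordered
2 ordered
3 numeric
"""
```

Any variable not listed would presumably fall back to the default (unordered categorical, per the -f entry above), though the source does not spell that out.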
Buttrey (2001): Buttrey, S.E., "Discrimination with Categorical Variables," in submission.