InputSelection/X 5.0
Input Variable Selection ActiveX Control and COM Object
Do you need to determine the relevant inputs to a dynamic modeling problem?
The Input Selection/X ActiveX DLL lets you quickly add the capability to
dynamically select relevant inputs (features) to your Windows application.
Full source code samples make it easy to implement input variable selection
in your programs. Download the Input Selection/X Trial Version now to try it
out; you can even compile programs with it.
Input Variable Selection
The problem of input variable selection is well known in the modeling of
real-world data. In many real-world modeling problems, for example in
biomedical, industrial, or environmental systems, difficulties arise when
developing multivariate models for which the best set of inputs is not
known. This is particularly true when using neural networks, where
unnecessary inputs can significantly increase learning complexity.
Input variable selection (IVS) aims to determine which input variables are
required for a model.
Screenshot of an application built in Visual Basic using Input Selection/X.
The task is to determine a set of inputs that leads to a model which is
optimal in some sense. Problems that can arise from a poor selection of
inputs include the following:
- As the input dimensionality increases, the
computational complexity and memory requirements of the model increase.
- Learning is more difficult with unnecessary inputs.
- Misconvergence and poor model accuracy may result
from additional unnecessary inputs.
- Complex models are more difficult to understand than
simple models that give comparable results.
This component implements an input variable selection algorithm using
higher order cross statistics computed for each input variable individually.
The relevant inputs are determined directly, without the independent
component analysis step documented in [1]. It is also possible to combine
this component with ICA/X to select a potentially smaller number of relevant
inputs for modeling.
The input variable selection used in this component is based on performing a
statistical test between each input variable and the desired output from the
model. In some situations there may be dependence between input variables,
which leads to an overestimation of the number of inputs required. One method
to overcome this is to use independent component analysis (ICA) as a
preprocessing step. Input Selection/X is based on the method described in
[1], but it does not use ICA and differs in several other respects. Note that
it is possible to use ICA/X as a preprocessor.
To assess the dependence between the inputs and the desired system output,
we use a method based on higher order cross moments, up to a specified order
among the individual terms, normalized so that they can be compared directly.
This statistical measure can be used to establish the independence or
otherwise of non-Gaussian signals. The cross moments are defined between each
of the inputs x1, x2, ..., xn, taken individually at time t, and the target
output y, with powers up to p=3. Only a selection of the cross terms is used,
not all of them. The model implements only instantaneous moments, without
time delays; however, lagged regression vectors can be supplied as inputs to
capture delayed dependence. The resulting output is a score vector indicating
the dependence between each input and the output. This vector is then
classified into two classes using the k-means algorithm to give a binary
classification vector. In many instances it is worth examining the score
vector itself, since the k-means algorithm will not always indicate the most
appropriate inputs. For example, it may leave out some inputs whose scores
suggest that they could also reasonably be included in the model. Thus, human
judgement can be used in classifying inputs as relevant.
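The scoring and classification steps described above can be sketched in Python as follows. This is only an illustration and not the component's actual algorithm: the text does not say which cross terms are selected or how they are normalized, so the sketch assumes the score is the mean absolute Pearson correlation between x^a and y^b over all powers a, b = 1..3. The function names (`cross_moment_score`, `two_means`, `lagged_inputs`, `make_demo_data`) and the synthetic data are hypothetical.

```python
import math
import random


def standardize(values):
    """Shift to zero mean and scale to unit variance (zeros if constant)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0.0:
        return [0.0] * n
    return [(v - mean) / sd for v in values]


def correlation(u, v):
    """Pearson correlation of two equal-length sequences."""
    us, vs = standardize(u), standardize(v)
    return sum(a * b for a, b in zip(us, vs)) / len(us)


def cross_moment_score(x, y, max_power=3):
    """ASSUMED score: mean |corr(x^a, y^b)| over a, b = 1..max_power."""
    xs, ys = standardize(x), standardize(y)
    terms = [
        abs(correlation([a ** i for a in xs], [b ** j for b in ys]))
        for i in range(1, max_power + 1)
        for j in range(1, max_power + 1)
    ]
    return sum(terms) / len(terms)


def two_means(scores, max_iterations=100):
    """1-D k-means with k=2; True marks scores in the higher cluster."""
    low, high = min(scores), max(scores)  # initial cluster centres
    labels = [False] * len(scores)
    for _ in range(max_iterations):
        labels = [abs(s - high) < abs(s - low) for s in scores]
        upper = [s for s, flag in zip(scores, labels) if flag]
        lower = [s for s, flag in zip(scores, labels) if not flag]
        new_low = sum(lower) / len(lower) if lower else low
        new_high = sum(upper) / len(upper) if upper else high
        if (new_low, new_high) == (low, high):
            break  # centres stopped moving
        low, high = new_low, new_high
    return labels


def lagged_inputs(x, y, max_lag):
    """Return ([x(t), x(t-1), ..., x(t-max_lag)], y(t)) for t >= max_lag."""
    n = len(x)
    columns = [x[max_lag - k: n - k] for k in range(max_lag + 1)]
    return columns, y[max_lag:]


def make_demo_data(n=3000, seed=0):
    """Synthetic data (hypothetical): y depends on inputs 0 and 1 only."""
    rng = random.Random(seed)
    inputs = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(4)]
    y = [inputs[0][t] ** 2 + inputs[1][t] + 0.1 * rng.gauss(0, 1)
         for t in range(n)]
    return inputs, y


inputs, y = make_demo_data()
scores = [cross_moment_score(x, y) for x in inputs]
relevant = two_means(scores)
print([round(s, 3) for s in scores], relevant)
```

In this synthetic example the output depends on input 0 only through its square, so an ordinary linear correlation would miss it, while the higher power terms should give it a clearly larger score than the two unrelated inputs. Printing the score vector alongside the binary labels reflects the advice above to inspect the scores rather than rely on the k-means split alone.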
Because the algorithm uses higher order statistics, it
is capable of finding inputs in non-Gaussian and nonlinear processes. At
present, however, we do not implement higher order cross moment terms
between inputs. This keeps the computational requirements relatively low,
while providing a reasonable chance of determining the inputs required. Note,
however, that depending on the statistical nature of the system being
considered and the statistical relationships observed in the data, inputs
that act through complex functions of several inputs may not necessarily be
detected. Thus, caution needs to be exercised.
It is recommended that periodic testing be carried out to determine the
effectiveness of the inputs being used. Also, spurious correlations or
dependencies may exist between unrelated variables, for example those
generated by coupled systems, and could lead to falsely included inputs.
Input Selection/X ActiveX and COM Control
Input Selection/X is an ActiveX DLL that can be used in a wide range of
Windows applications. It requires no user interface and can be accessed from
any ActiveX-compatible development environment, including VB, Excel,
VBA and VC++.
Input Selection/X supports threaded blocking and non-blocking
modes. This means that for lengthy computations, a program can pass data to
the control for processing and then run other tasks and respond to user input
while the computations take place. When processing is complete, an event is
fired and the program continues from the data processing step. The choice of
blocking or non-blocking mode is under program control. Error codes returned
from the event indicate the success or otherwise of the data processing. The
computations can also be interrupted under program control; for example, it
is straightforward to implement a "Stop" button that directs the computations
to stop.
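The blocking and non-blocking pattern described above can be illustrated with a generic Python sketch. It does not use the control itself, and the text does not give the control's property, method, or event names, so the class `SelectionJob`, its methods, and the status codes below are all hypothetical stand-ins chosen for illustration.

```python
import threading
import time


class SelectionJob:
    """Hypothetical stand-in for a lengthy, interruptible computation."""

    OK, STOPPED = 0, 1  # illustrative status codes, not the control's own

    def __init__(self, on_complete):
        self._on_complete = on_complete   # plays the role of the event
        self._stop = threading.Event()
        self._thread = None

    def _work(self, data):
        total = 0.0
        for value in data:
            if self._stop.is_set():       # "Stop" was requested
                self._on_complete(self.STOPPED, None)
                return
            total += value * value        # placeholder for real processing
            time.sleep(0.001)
        self._on_complete(self.OK, total)

    def run(self, data, blocking=True):
        """Process data now (blocking) or on a worker thread (non-blocking)."""
        self._stop.clear()
        if blocking:
            self._work(data)
        else:
            self._thread = threading.Thread(target=self._work, args=(data,))
            self._thread.start()

    def stop(self):
        """Ask a running computation to finish early."""
        self._stop.set()

    def wait(self):
        """Block until a non-blocking run has finished."""
        if self._thread is not None:
            self._thread.join()


results = []
job = SelectionJob(lambda code, value: results.append((code, value)))

job.run([1.0, 2.0, 3.0], blocking=False)  # returns immediately
# ... the caller could respond to user input here ...
job.wait()

job.run([1.0] * 1000, blocking=False)
job.stop()                                # e.g. wired to a "Stop" button
job.wait()
print(results)
```

In this sketch the completion callback runs on the worker thread. A GUI program would normally hand the result back to its user interface thread before updating any controls.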
Matrix data passed to Input Selection/X and returned from the control can
use different index starting values, so you can choose to index your data
from 0 or 1. Input Selection/X will pass the data back in an array indexed
from the value you specify in a property of the control. All data passed to
and returned from Input Selection/X is in double (double-precision
floating-point) format, which makes it suitable for use with Visual Basic and
Visual C++. The data is also in a format compatible with further numeric
processing, so it can be passed directly to other controls that accept
double format arrays.
References
[1] A.D. Back and T.P. Trappenberg, "Selecting inputs for modelling using
normalized higher order statistics and independent component analysis", IEEE
Trans. on Neural Networks, Vol. 12, No. 3, pp. 612-617, May 2001.
https://andrewback.com