ActiveX Software for Visual Basic 6/.NET, C++ 6/.NET, Delphi, Borland C++ Builder: Matrix Maths, Time Series
 

   Input Variable - Feature Selection Software
 

    InputSelection/X 5.0
Input Variable Selection ActiveX Control and COM Object


Do you need to determine the relevant inputs for a dynamic modeling problem? The Input Selection/X ActiveX DLL lets you quickly add the capability to dynamically select relevant inputs (features) to your Windows application. Full source code samples make it easy to implement input variable selection in your own programs. Download the Input Selection/X trial version now to try it out; you can even compile programs with it.

Input Variable Selection

The problem of input variable selection is well known in the modeling of real-world data. In many real-world modeling problems, for example in biomedical, industrial, or environmental systems, difficulties arise when developing multivariate models because the best set of inputs to use is not known.

This is particularly true when using neural networks, where unnecessary inputs can significantly increase learning complexity. Input variable selection (IVS) aims to determine which input variables a model requires.

Screen shot of an application built in Visual Basic using Input Selection/X.

The task is to determine a set of inputs that leads to a model that is optimal in some sense. Poor selection of inputs can cause the following problems:

    • As the input dimensionality increases, the computational complexity and memory requirements of the model increase.
    • Learning is more difficult with unnecessary inputs.
    • Misconvergence and poor model accuracy may result from additional unnecessary inputs.
    • Complex models are harder to understand than simple models that give comparable results.

This component implements an input variable selection algorithm using higher order cross statistics for each of the individual variables. The relevant inputs are determined directly, without the independent component analysis step documented in [1]. It is also possible to combine this component with ICA/X to select a potentially smaller number of relevant inputs for modeling.

The input variable selection used in this component is based on performing a statistical test between each input variable and the desired output from the model. In some situations there may be dependence between input variables, which leads to an overestimation of the number of inputs required. One way to overcome this is to use independent component analysis (ICA) as a preprocessing method. Input Selection/X is based on the method described in [1], but it does not use ICA and differs in several other respects. Note that it is possible to use ICA/X as a preprocessor.

To assess the dependence between the inputs and the desired system output, we use a method based on higher order cross moments, up to a specified order among the individual terms, normalized in such a manner as to allow their direct comparison. This statistical measure can be used to establish the independence or otherwise of non-Gaussian signals. The cross moments are defined between each of the inputs x1,x2,...,xn individually at time t and the target output y, with powers up to p=3. Not all cross terms are used, only a selection. The model implements only instantaneous moments, without time delays; however, time delays can be handled by supplying lagged regression vectors as inputs.

The resulting output is a score vector indicating the dependence between each input and the output. This vector is then classified into two classes using the k-means algorithm to give a binary classification vector. In many instances it is worth examining the score vector itself, since the k-means algorithm will not always indicate the most appropriate inputs to consider. For example, it may leave out some inputs whose scores, on inspection, suggest they could also reasonably be included in the model. Thus, it is possible to use human judgement when classifying inputs as relevant.
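The scoring and classification steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm implemented in Input Selection/X or in [1]: the normalization is assumed to be the correlation coefficient between powers of the centered input and powers of the centered output, all power pairs up to p=3 are used rather than a selection, the score is taken as the largest absolute value, and the function names `dependence_scores` and `two_means` are hypothetical.

```python
from math import sqrt
from statistics import fmean


def correlation(u, v):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mu, mv = fmean(u), fmean(v)
    cov = fmean([(a - mu) * (b - mv) for a, b in zip(u, v)])
    var_u = fmean([(a - mu) ** 2 for a in u])
    var_v = fmean([(b - mv) ** 2 for b in v])
    if var_u == 0.0 or var_v == 0.0:
        return 0.0
    return cov / sqrt(var_u * var_v)


def dependence_scores(inputs, target, max_power=3):
    """Score each input column by its strongest normalized cross moment with the target.

    Assumption: the normalized cross moment for powers (a, b) is |corr(x**a, y**b)|
    computed on centered data; the real component may normalize differently.
    """
    y_mean = fmean(target)
    y = [v - y_mean for v in target]
    scores = []
    for column in inputs:
        x_mean = fmean(column)
        x = [v - x_mean for v in column]
        best = 0.0
        for a in range(1, max_power + 1):
            xa = [v ** a for v in x]
            for b in range(1, max_power + 1):
                yb = [v ** b for v in y]
                best = max(best, abs(correlation(xa, yb)))
        scores.append(best)
    return scores


def two_means(scores, iterations=100):
    """One-dimensional k-means with k=2; returns 1 for the high-score class, else 0."""
    low_center, high_center = min(scores), max(scores)
    labels = [0] * len(scores)
    for _ in range(iterations):
        labels = [1 if abs(s - high_center) < abs(s - low_center) else 0 for s in scores]
        high = [s for s, label in zip(scores, labels) if label == 1]
        low = [s for s, label in zip(scores, labels) if label == 0]
        new_low = fmean(low) if low else low_center
        new_high = fmean(high) if high else high_center
        if (new_low, new_high) == (low_center, high_center):
            break
        low_center, high_center = new_low, new_high
    return labels


# x1 drives y quadratically, so its linear correlation with y is zero;
# x2 is an unrelated sequence.
x1 = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
x2 = [1.0, -1.0, 1.0, 0.0, -1.0, 1.0, -1.0]
y = [v * v for v in x1]

scores = dependence_scores([x1, x2], y)
labels = two_means(scores)  # labels -> [1, 0]
```

With only seven samples the unrelated input still receives a noticeable score, which illustrates why inspecting the score vector, and not just the binary classification, is advisable.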

Because the algorithm uses higher order statistics, it is capable of finding inputs in non-Gaussian and nonlinear processes. At present, however, we do not implement higher order cross moment terms between inputs. This keeps the computational requirements relatively low while still providing a reasonable chance of determining the inputs required. Note, however, that depending on the statistical nature of the system being considered and the statistical relationships observed in the data, relationships involving complex functions of several inputs may not be detected. Caution therefore needs to be exercised, and we recommend periodic testing to determine the effectiveness of the inputs being used. In addition, spurious correlations or dependencies may exist between unrelated variables, for example those generated by coupled systems, and these could lead to falsely included inputs.
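The caveat about functions of several inputs can be made concrete with a small constructed example. In the sketch below, which is an illustration and not output from Input Selection/X, the target is the product of two balanced ±1 inputs. Every cross moment between a single input and the target, taken here in covariance form for powers up to 3, is exactly zero, while a cross term between the inputs reveals the dependence. The helper name `cross_cov` is hypothetical.

```python
from itertools import product
from statistics import fmean


def cross_cov(x, y, a, b):
    """E[x**a * y**b] - E[x**a] * E[y**b]; zero when x**a and y**b are uncorrelated."""
    xa = [v ** a for v in x]
    yb = [v ** b for v in y]
    return fmean([p * q for p, q in zip(xa, yb)]) - fmean(xa) * fmean(yb)


# All four combinations of two inputs taking the values -1 and +1.
pairs = list(product([-1.0, 1.0], repeat=2))
x1 = [p[0] for p in pairs]
x2 = [p[1] for p in pairs]
y = [a * b for a, b in pairs]  # the output depends on both inputs jointly

# Largest per-input cross moment over both inputs and all powers up to 3.
per_input = max(
    abs(cross_cov(x, y, a, b))
    for x in (x1, x2)
    for a in (1, 2, 3)
    for b in (1, 2, 3)
)

# A cross term between the inputs (x1 * x2) exposes the relationship.
joint = cross_cov([a * b for a, b in zip(x1, x2)], y, 1, 1)

print(per_input, joint)  # 0.0 1.0
```

Both inputs are essential here, yet neither would be selected by a test that examines one input at a time.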

Input Selection/X ActiveX and COM Control

Input Selection/X is an ActiveX DLL that can be used in a wide range of Windows applications. It requires no user interface and can be accessed from any ActiveX-compatible development environment, including VB, Excel, VBA and VC++.

Input Selection/X supports threaded blocking and non-blocking modes. For lengthy computations, this means your program can pass data to the control for processing and then run other tasks and respond to user input while the computations take place. When processing is complete, an event is fired and the program continues from the data processing step. The choice of blocking or non-blocking mode is under program control. The event returns an error code indicating whether the data processing succeeded. The computations can also be interrupted under program control; for example, it is straightforward to implement a "Stop" button that directs the computations to stop.
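The blocking and non-blocking pattern described above can be illustrated with a generic Python sketch. It does not use or reproduce the Input Selection/X API: the class `BackgroundJob`, its methods, and the error code values are all hypothetical, and the "computation" is a placeholder sum of squares. The sketch shows a worker that either runs in the caller's thread or in a background thread, reports an error code through a completion callback, and checks a stop flag between work items.

```python
import threading


class BackgroundJob:
    """Hypothetical worker with blocking and non-blocking modes and a stop request."""

    OK = 0        # assumed code: processing finished normally
    STOPPED = 1   # assumed code: processing was interrupted by stop()

    def __init__(self, items, on_complete):
        self._items = list(items)
        self._on_complete = on_complete      # called as on_complete(error_code, result)
        self._stop_requested = threading.Event()
        self._thread = None

    def _run(self):
        total = 0.0
        for item in self._items:
            if self._stop_requested.is_set():
                self._on_complete(self.STOPPED, None)
                return
            total += item * item             # placeholder for the real computation
        self._on_complete(self.OK, total)

    def start(self, blocking=True):
        if blocking:
            self._run()                      # returns only when processing is done
        else:
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()             # returns immediately

    def stop(self):
        self._stop_requested.set()           # e.g. wired to a "Stop" button

    def wait(self, timeout=None):
        if self._thread is not None:
            self._thread.join(timeout)


events = []
job = BackgroundJob([1, 2, 3], lambda code, result: events.append((code, result)))
job.start(blocking=False)   # the caller is free to do other work here
job.wait()                  # events is now [(0, 14.0)]

stopped = BackgroundJob([1, 2, 3], lambda code, result: events.append((code, result)))
stopped.stop()              # request a stop before any item is processed
stopped.start(blocking=True)
```

In a GUI program the completion callback would typically post a message back to the user interface thread instead of appending to a list.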

Matrix data passed to Input Selection/X and returned from the control can use different index starting values, so you can choose to index your data from 0 or 1. Input Selection/X passes the data back in an array indexed from the value you specify in a property of the control. All data used and returned by Input Selection/X is in double format, which makes it suitable for use with Visual Basic and Visual C++ and compatible with further numeric processing. If you wish to use the data with other controls that accept double format arrays, this presents no problems.
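To show what a selectable index base means for calling code, here is a small Python sketch. It is an illustration only: Python lists are always 0-based, so the hypothetical `OffsetVector` class below emulates an array of doubles whose first valid index is chosen by the caller, much as the control's property is described as doing. None of these names come from the Input Selection/X API.

```python
class OffsetVector:
    """Hypothetical vector of doubles whose first valid index is 0 or 1."""

    def __init__(self, values, index_base=0):
        if index_base not in (0, 1):
            raise ValueError("index_base must be 0 or 1")
        self._values = [float(v) for v in values]   # stored as doubles
        self.index_base = index_base

    def __len__(self):
        return len(self._values)

    def __getitem__(self, index):
        position = index - self.index_base          # translate to 0-based storage
        if not 0 <= position < len(self._values):
            raise IndexError(f"index {index} out of range")
        return self._values[position]


scores = OffsetVector([0.2, 0.9], index_base=1)
first = scores[1]    # first element when indexing from 1
last = scores[2]     # last element when indexing from 1
```

With `index_base=1`, index 0 is out of range, which mirrors how code written for 1-based arrays would expect the data to behave.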

References

  1. A.D. Back and T.P. Trappenberg, "Selecting inputs for modelling using normalized higher order statistics and independent component analysis", IEEE Trans. on Neural Networks, Vol. 12, No. 3, pp. 612-617, May 2001. http://andrewback.com