Input Variable Selection Using Independent Component Analysis and Higher Order Statistics
Andrew D. Back, Andrzej Cichocki and Thomas P. Trappenberg
RIKEN Brain Science Institute
| (1) |
More recently, model-based approaches using nonlinear models such as neural networks have been proposed. The Automatic Relevance Determination (ARD) approach proposed by Neal [2] is based on Bayesian statistics. Intuitively, the aim of the ARD algorithm is to reduce all unnecessary weights in the network to zero. Another neural network approach using backward elimination was proposed by Moody and Utans. In this case, the input variables to which the network output(s) are least sensitive are removed.
A model-free IVS method based on mutual information was proposed by Bonnlander [1]. The idea is that the relevant inputs are found by estimating the mutual information between the inputs and the desired outputs. This approach performs significant computation in order to make a simple binary decision about the relevance of each input to the desired output. Moreover, since some of the inputs may be dependent, for n input variables it is necessary to test 2^n - 1 subsets of inputs.
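To make the 2^n - 1 figure concrete, the following minimal sketch enumerates every non-empty subset of a small set of inputs. The input names and the helper `nonempty_subsets` are hypothetical and serve only as an illustration of how the count grows.

```python
from itertools import combinations

def nonempty_subsets(inputs):
    """Yield every non-empty subset of the given input names."""
    for size in range(1, len(inputs) + 1):
        yield from combinations(inputs, size)

# Hypothetical input names, for illustration only.
inputs = ["x1", "x2", "x3", "x4"]
subsets = list(nonempty_subsets(inputs))

# An exhaustive search has to evaluate each of these subsets.
print(len(subsets), 2 ** len(inputs) - 1)
```

With only 20 inputs the same enumeration already exceeds one million subsets, which is why exhaustive subset testing quickly becomes impractical.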
While many methods have been proposed, typically they all rely on some form of statistical test between the inputs and the desired output(s). Implicit in these methods is the assumption that the inputs observed are 'true' signals that can be used in the model. Assume we have a generating system given by
| (2) |
| (3) |
In this paper, we consider a different problem, but one which may arise readily in practice. Here we consider the situation where we do not have access to the true outputs from a system at all, but only to some mixture of those signals. Such mixtures may arise in various ways, for example from coupled nonlinear differential equations or coupled feedback models. This will be discussed further in Section 3.2. In such cases, we would like to model the observed signals using the available measured data; however, conventional statistical tests will not necessarily reveal which inputs are required. Instead, the dependence between measured signals will simply indicate that all input variables may be required.
We propose that a method of input variable transformation (IVT) is required to ensure that the inputs applied to a model are made as independent as possible for the purpose of modelling multivariate time series. Such a method can be obtained using the recently developed procedure known as independent component analysis (ICA). While ICA has been applied to transforming data in the past [4], our application of ICA together with IVS appears to be new. In this paper, we aim to provide an understanding of the issues involved in selecting input variables and moreover, to indicate how ICA can be effectively applied to the problem.
The paper is organized as follows. A brief overview of ICA is given in Section 2. In Section 3 we describe the proposed methodology of input variable transformation using ICA. An example using the method is given in Section 4. Conclusions are given in Section 5.
Independent component analysis extends the method of principal component analysis (PCA) from decorrelating mixed signals (using second order statistics) to unmixing non-Gaussian signals such that they become statistically independent. Independent component analysis of instantaneously mixed signals is defined as follows. A multivariate time series x(t) = [x1(t), x2(t), ..., xn(t)]′ is obtained from a mixture of signals defined by
| (4) |
| (5) |
If W = A^-1, then y(t) = s(t) (here we ignore the problem of permutations and scaling factors) and the source signals are recovered without error [5]. To find such a matrix W, the assumptions made are that the sources {sj(t)} are statistically independent, that at most one source has a Gaussian pdf, and that the signals are stationary.
Various algorithms for ICA have been proposed in the literature, see for example [6,7,8,9,10]. For the purposes of this paper, we assume the user selects an algorithm which is suited to the type of signals being separated.
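The relation W = A^-1 can be illustrated numerically. The sketch below assumes the linear instantaneous mixing model x(t) = A s(t) with unmixing y(t) = W x(t). The two source distributions, the mixing matrix and the sample size are arbitrary choices made for this illustration. A real ICA algorithm has to estimate W from x alone; here the true A is used only to show the ideal case.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000

# Two independent, non-Gaussian sources (assumed for illustration).
s = np.vstack([
    rng.uniform(-1.0, 1.0, n_samples),
    rng.laplace(0.0, 1.0, n_samples),
])

# Hypothetical mixing matrix A; the observations are x(t) = A s(t).
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = A @ s

# Ideal unmixing matrix W = A^-1, giving y(t) = W x(t) = s(t).
W = np.linalg.inv(A)
y = W @ x

print("correlation between sources:     ", np.corrcoef(s)[0, 1])
print("correlation between observations:", np.corrcoef(x)[0, 1])
print("sources recovered:", np.allclose(y, s))
```

The observed signals are strongly correlated even though the sources are independent, which is the situation that motivates transforming the inputs before testing them.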
The problem we consider in this case is as follows. Suppose there exists a system described by
| (6) |
| (7) |
| (8) |
For input variable selection, it is normally assumed that there exists some subset of inputs xa required by the model which can be used to accurately approximate the unknown system. Many input variable selection methods have been proposed in the past to determine which inputs {xi} to a model are required. However, for equivalent model and system structures, since the true inputs are sa and not xa, there may not exist such a subset at all. If the true inputs sa are not observable, then regardless of the variable selection tests applied, there does not necessarily exist any set of inputs xa which is the correct set, apart from the whole set of inputs.
Hence we require a method of transforming x to an estimate of sa, and hence to a form in which we can apply an input variable selection test. Whereas in the past only input variable selection methods have been considered, in this paper we propose that a method of input variable transformation is also required. This ensures that the inputs applied to a model are independent (or as close to independent as possible), and hence the input variable tests can proceed on each input independently. We propose that the required estimate ŝ = Wx can be found using independent component analysis. This may seem to be a small advance in the use of ICA; however, the issue of input variable selection is by no means trivial. Moreover, to our knowledge, this is the first time that ICA has been applied to IVS.
Consider a system as described in (7). The true inputs sa are unobservable, but instead we have available a vector of measurements x. Since sa ⊆ s, instead of the model given in (9), we seek a model
| (9) |
In contrast to the system identification case above, where there is a desired output available as in (7), the prediction case can be considered. Here the desired output signal can be generated using the one step ahead prediction of an input. In this manner we may consider prediction models.
We assume that there exists a general nonlinear dynamic system which generates a time series {y(t)} given by
| (10) |
Note that implicit in the model Fo, we assume there exists the possibility of tapped delay lines occurring in exactly the same manner as a linear system. That is, if the input is y(t), then the model may obtain as inputs y(t-1), y(t-2), ..., y(t-m).
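A tapped delay line of this kind can be sketched as follows. The helper name `tapped_delay_inputs` and the toy series are hypothetical. Each row holds the lagged values y(t-1), ..., y(t-m) and is paired with the target y(t), which is the usual layout for one step ahead prediction.

```python
import numpy as np

def tapped_delay_inputs(y, m):
    """Build rows [y(t-1), ..., y(t-m)] and targets y(t) for t = m, ..., len(y)-1."""
    y = np.asarray(y, dtype=float)
    # Column k-1 holds the series delayed by k samples.
    rows = np.column_stack([y[m - k: len(y) - k] for k in range(1, m + 1)])
    targets = y[m:]
    return rows, targets

# Toy series 0, 1, ..., 5 with m = 2 delays.
rows, targets = tapped_delay_inputs(np.arange(6), m=2)
print(rows)
print(targets)
```

The first m samples are dropped because their delayed values are not available.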
When Fo is some highly nonlinear dynamic system and we cannot observe u(t), the problem of estimating or predicting y(t) is difficult. There are a number of common simplifications and assumptions that are typically, and often implicitly, applied to prediction models. A common assumption is that u(t) either does not exist, or that it plays a minor part in determining y(t). Hence we may have a prediction model given by
| (11) |
| (12) |
| (13) |
Consideration of the generating models in this framework allows us to intelligently prescribe models for the prediction task. We consider below two situations where a method of input variable selection is required for prediction models which may have dependencies between the inputs.
Consider a feedback system of the form
| (14) |
We may begin by asking how strongly these systems are coupled. If the submodels are fully interconnected, then all outputs are fed back as inputs to all other submodels. If the submodels are not fully interconnected, then the system can be written as
| (15) |
| (16) |
In this situation, we need to apply a statistical test to each yi(t-1) to estimate the dependencies and hence the coupling between modules in the generating system. For non-Gaussian signals y, an obvious approach is to use a test based on higher order statistics.
The problem of input variable selection can be clearly compounded by dependencies between the input variables. This may arise due to coupling between inputs, for example, due to coupled feedback systems or it may arise due to the observed data being mixed. In the past, this problem has been recognized (see for example [1]). The general approach has evidently been to use an exhaustive subset selection method to choose the best subset of input variables.
In the case where the observed data is simply a mixed version of the 'true' signals, we may observe z, where
| (17) |
| (18) |
| (19) |
In this section we propose a method of overcoming the input variable selection problems identified previously. As shown above, input variable selection may suffer from the problem of dependent inputs which could be due to various reasons. In the past, it has been assumed that because of the dependencies between the inputs, the only feasible solution to determine the best input subset is to use exhaustive tests. In addition, there is a strong school of thought which dictates that model-based testing methods give better indications of the suitability of the input subset than model-free methods which further adds to the computational burden and difficulty of IVS.
Our approach is to apply ICA to preprocess the observed data. This in itself is not new. However, instead of simply filtering out the noisy data or removing artifacts on some arbitrary basis, it is suggested that ICA be used as a preprocessing method to obtain new input signals that are as independent as possible. If all the ICs are completely independent, then this goal is met in full. However, even if we can only obtain groups of signals which are mutually independent, with some signals within each group remaining dependent in some sense, we can still reduce the computational burden by considering only subsets within those groups. Thus, ICA is used as an integral part of IVS.
The particular IVS test we propose is based on the fourth order cross cumulants. This statistical measure can be used to establish the independence or otherwise of two non-Gaussian signals. The fourth order cross cumulant is defined as [14]
| (20) |
In practice, we find that taking a slice of the fourth order cumulants is sufficient to perform the test. Hence, for an input signal x, obtained as the output from the ICA algorithm, we compute the fourth order cross cumulant with the desired output y. For example, we have
| (21) |
| (22) |
To the best of our knowledge, this test has not been proposed in the literature previously. The reason for this is likely to be that it depends on the inputs being independent; without a mechanism to ensure this independence, such a test would have failed. The test provides a method of avoiding the expensive model-based tests for IVS that have been very commonly used in the past, presumably as a means of overcoming the problem of dependent inputs. Thus, ICA provides a means of introducing a very simple, yet powerful, IVS test which could not be used otherwise.
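As an illustration of a higher order test of this kind, the sketch below estimates one slice of the fourth order cross cumulant, cum(x, x, y, y), for zero-mean signals using the standard relation cum(x, x, y, y) = E[x^2 y^2] - E[x^2] E[y^2] - 2 E[xy]^2. The particular slice, the function name and the test signals are assumptions made for this example and are not necessarily the exact slice used in the paper.

```python
import numpy as np

def cross_cumulant_slice(x, y):
    """Sample estimate of cum(x, x, y, y) after removing the sample means."""
    x = x - x.mean()
    y = y - y.mean()
    return (np.mean(x**2 * y**2)
            - np.mean(x**2) * np.mean(y**2)
            - 2.0 * np.mean(x * y) ** 2)

rng = np.random.default_rng(0)
n_samples = 50_000

# Unit-variance uniform signals (non-Gaussian), assumed for illustration.
x = rng.uniform(-np.sqrt(3), np.sqrt(3), n_samples)
w = rng.uniform(-np.sqrt(3), np.sqrt(3), n_samples)

y_independent = w            # independent of x
y_dependent = x**2 - 1.0     # uncorrelated with x, yet fully dependent on it

print("correlation (dependent case):", np.corrcoef(x, y_dependent)[0, 1])
print("cumulant, independent pair:  ", cross_cumulant_slice(x, y_independent))
print("cumulant, dependent pair:    ", cross_cumulant_slice(x, y_dependent))
```

The dependent pair is chosen so that its correlation is close to zero: a second order test would miss the relationship, while the fourth order slice is clearly non-zero.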
The proposed algorithm is summarized below.
ICA-HOS IVT Algorithm
| (23) |
Here we give an example which demonstrates the ICA-HOS input variable selection method proposed in the paper. The example we choose is an artificial problem which nevertheless demonstrates the method.
Consider a set of four independent signals y = [y1, y2, y3, y4]′ which are assumed to be generated by four independent systems, where
| (24) |
Now let us suppose that there is a nonlinear model Fo, in which
| (25) |
Out of the four signals, only two are used to generate the output of interest. We observe z which is an n×1 vector of inputs to our model. The aim of the proposed method is to determine a transformation of z which will lead to a more accurate prediction.
Using the method proposed in this paper, we apply ICA to the measured signals z to obtain ŷ, an estimate of y. We use the fourth order cumulant test on input variables as described in Section 4. The cross-cumulants are taken between the input signals and the desired output. High values imply dependence and that the signal is required. In this case, we select K = 2 as the threshold value.
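The selection step can be sketched on synthetic data as follows. Everything specific here is an assumption made for illustration and is not taken from the paper: the four unit-variance uniform sources, the target s1 * s2, the mixing matrix A, the cumulant slice cum(x, x, y, y), and the threshold K = 0.3, which is scaled to these signals rather than the K = 2 used above. The exact inverse of A stands in for the unmixing matrix that an ICA algorithm would estimate.

```python
import numpy as np

def cross_cumulant_slice(x, y):
    """Sample estimate of cum(x, x, y, y) after removing the sample means."""
    x = x - x.mean()
    y = y - y.mean()
    return (np.mean(x**2 * y**2)
            - np.mean(x**2) * np.mean(y**2)
            - 2.0 * np.mean(x * y) ** 2)

def select_inputs(signals, target, threshold):
    """Score each row of `signals` against `target`; keep rows with |score| > threshold."""
    scores = np.array([abs(cross_cumulant_slice(sig, target)) for sig in signals])
    return scores, np.flatnonzero(scores > threshold)

rng = np.random.default_rng(1)
n_samples = 50_000

# Four independent sources; only the first two drive the (hypothetical) target.
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(4, n_samples))
target = s[0] * s[1]

# Hypothetical mixing matrix producing the observed inputs z.
A = np.array([[1.0, 0.5, 0.3, 0.2],
              [0.6, 1.0, 0.4, 0.3],
              [0.7, 0.6, 1.0, 0.5],
              [0.5, 0.8, 0.6, 1.0]])
z = A @ s

# Stand-in for ICA: the exact inverse of A (an ICA algorithm would estimate this).
s_hat = np.linalg.inv(A) @ z

K = 0.3  # illustrative threshold for these unit-variance signals
raw_scores, raw_selected = select_inputs(z, target, K)
ica_scores, ica_selected = select_inputs(s_hat, target, K)

print("raw inputs selected:        ", raw_selected, raw_scores.round(2))
print("unmixed components selected:", ica_selected, ica_scores.round(2))
```

Under these assumptions every mixed input scores above the threshold, because each one contains some of both relevant sources, whereas after unmixing only the two relevant components do. Only n tests are needed, one per component.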
Applying the IVS test to the raw inputs z, we find that the fourth order cumulant values are:
| (26) |
The cross-cumulant values (see footnote 2) obtained are
| (27) |
To effectively model and predict multivariate time series data such as that generated by coupled dynamical systems, it is important to determine the necessary inputs to the model and remove those inputs not required. Traditional methods of input variable selection may not deal with signals that become mixed and hence dependent. Hence in this paper, we show that the recently introduced ICA method, in conjunction with a proposed new method of input variable selection, is well suited to reducing the dimensionality of the input space and hence to improving performance in prediction problems.
To our knowledge, this test has not been proposed before, most likely because it requires independent inputs, whereas observed inputs are often dependent. Used after ICA, it avoids computationally expensive tests such as model-based methods and, in addition, reduces the overall number of tests required from 2^n - 1 to n for an n input model. ICA thus provides a powerful approach to the problem of input variable selection in complex model building.
The first author would like to thank Thomas Trappenberg and Sethu Vijayakumar for valuable discussions.
1 It may appear that at that point all we achieve is a model of the input, however there are at least two major approaches that we may use for k-step prediction: (i) (classical) time delayed input training [12] and (ii) iterative prediction [13].
2 Note that the cross-cumulants may be negative. We distinguish dependent from independent signals by means of the absolute magnitude of the cumulants.