Ridge and LASSO given a covariance structure?

11

Is it possible to fit ridge regression and the LASSO by minimizing

$$(y - X\beta)^T V^{-1} (y - X\beta) + \lambda f(\beta), \qquad (1)$$

instead of the usual

$$(y - X\beta)^T (y - X\beta) + \lambda f(\beta), \qquad (2)$$

where $V$ is the covariance matrix of $y$ and $f(\beta)$ is the penalty ($\beta^T\beta$ for ridge, $\|\beta\|_1$ for the LASSO)?
This was mainly motivated by the fact that in my particular application we have different variances for the $y$'s (and sometimes even a covariance structure that can be estimated), and I would love to include them in the regression. I did it for ridge regression: at least with my implementation of it in Python/C, I see important differences in the paths that the coefficients trace, which is also noticeable when comparing the cross-validation curves in both cases.
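For what it's worth, with $f(\beta) = \beta^T \beta$ the minimizer of (1) has the closed form $\hat{\beta} = (X^T V^{-1} X + \lambda I)^{-1} X^T V^{-1} y$, so a minimal NumPy sketch of what I mean looks like the following (illustrative only, not my actual Python/C code; no intercept, and every coefficient is penalized):

```python
import numpy as np

def generalized_ridge(X, y, V, lam):
    """Minimize (y - Xb)' V^{-1} (y - Xb) + lam * b'b in closed form.

    V is the (known or estimated) covariance matrix of y, lam >= 0.
    For simplicity there is no intercept and all coefficients are penalized.
    """
    Vinv = np.linalg.inv(V)                   # V^{-1}
    p = X.shape[1]
    A = X.T @ Vinv @ X + lam * np.eye(p)      # X' V^{-1} X + lam * I
    b = X.T @ Vinv @ y                        # X' V^{-1} y
    return np.linalg.solve(A, b)              # generalized ridge coefficients
```

With $\lambda = 0$ this is just generalized least squares, and with $V = I$ it reduces to ordinary ridge regression.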

I was now preparing to implement the LASSO via Least Angle Regression (LARS), but to do so I first have to verify that all of its nice properties still hold when minimizing (1) instead of (2). So far I haven't seen any work that actually does all this, but some time ago I also read a quote that went something like "those who don't know statistics are doomed to rediscover it" (by Brad Efron, perhaps?), so I'm asking here first, being a relative newcomer to the statistics literature: has this already been done somewhere for these models? Is it implemented in R in some way (including the solution and implementation of ridge regression by minimizing (1) instead of (2), which is what the lm.ridge code in R implements)?

Thanks in advance for your answers!

Néstor
The previous answer is also covered in more detail at en.wikipedia.org/wiki/Generalized_least_squares. The solution can be implemented using a feasible generalized least squares (FGLS) approach.
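To make the FGLS idea concrete, a toy sketch (assuming a purely diagonal covariance structure and using raw squared OLS residuals as variance estimates, which in practice one would model or smooth):

```python
import numpy as np

def fgls_diagonal(X, y):
    """Toy FGLS: fit OLS, estimate per-observation variances from the
    squared residuals, then refit by weighted (generalized) least squares."""
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_ols
    w = 1.0 / np.maximum(resid**2, 1e-8)   # weights ~ 1 / estimated variance
    Xw = X * w[:, None]                    # rows of X scaled by the weights
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)
```

The estimated covariance can then be plugged into (1) in place of the true $V$.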
Nicola Jean

Answers:

13

If we know the Cholesky decomposition $V^{-1} = L^T L$, say, then

$$(y - X\beta)^T V^{-1} (y - X\beta) = (Ly - LX\beta)^T (Ly - LX\beta),$$

and we can use standard algorithms (with whatever penalization function one prefers) by replacing the response with the vector $Ly$ and the predictors with the matrix $LX$.
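A sketch of this whitening step in Python (assuming $V$ itself is known and working with its Cholesky factor $V = CC^T$, so that $L = C^{-1}$; scikit-learn's Lasso is used purely as an example of a standard solver, and its alpha is a rescaled version of the $\lambda$ in (1)):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular
from sklearn.linear_model import Lasso

def whiten(X, y, V):
    """Return (LX, Ly) with ||Ly - LX b||^2 = (y - Xb)' V^{-1} (y - Xb).

    Factor V = C C' with C lower triangular, so L = C^{-1}; applying L is
    a triangular solve rather than an explicit matrix inversion.
    """
    C = cholesky(V, lower=True)
    Ly = solve_triangular(C, y, lower=True)
    LX = solve_triangular(C, X, lower=True)
    return LX, Ly

# usage: whiten once, then hand the problem to any standard penalized solver
rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.normal(size=(n, p))
V = np.diag(rng.uniform(0.5, 3.0, size=n))   # heteroskedastic toy example
y = X @ np.array([2.0, 0.0, -1.5, 0.0, 0.5]) + rng.multivariate_normal(np.zeros(n), V)

LX, Ly = whiten(X, y, V)
fit = Lasso(alpha=0.1, fit_intercept=False).fit(LX, Ly)  # intercept handling is left to the user
print(fit.coef_)
```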
NRH