Comparison with Knockoff Filters in the Low-dimensional Case

The false discovery rate (FDR), formally defined as the expected fraction of falsely chosen features among all selected variables, is of central importance when carrying out a model selection procedure. Controlling this criterion at a low level guarantees that most of the selected variables are true and reproducible. In this chapter we introduce a method, called the Knockoff Filter, that achieves this goal in the low-dimensional case, i.e. when there are more observations than candidate variables ($n > p$). The method can also be generalized to high-dimensional logistic regression, but that extension is not covered in this paper. (For details, see )

Knockoff Filter (KF)

As in the previous chapters, we are going to build the method on lasso regression. Our goal is to construct sensible test statistics that can be used to test the null hypothesis $\beta_j = 0$ for each candidate variable. An important observation is that this method requires no knowledge of the noise level $\sigma$, neither in the construction of the knockoff variables nor in the FDR control theory. The steps of the method are as follows:

Construct knockoff variables

For each candidate variable $X_j$ (the $j$th column of the $n \times p$ design matrix $X$), we normalize it so that the Gram matrix $\Sigma = X^\top X$ satisfies $\Sigma_{jj} = \|X_j\|_2^2 = 1$. Then we construct a knockoff copy $\tilde X_j$ obeying the following properties: $\tilde X^\top \tilde X = \Sigma$ and $X^\top \tilde X = \Sigma - \mathrm{diag}\{s\}$, where $s$ is a pre-determined $p$-dimensional non-negative vector. By construction, $\tilde X$ has the same correlation structure as the original matrix, since $X_j^\top \tilde X_k = X_j^\top X_k$ for all $j \neq k$. To ensure that the Knockoff Filter is powerful enough to differentiate the true variables from the noise ones, the entries of $s$ should be as large as possible so that $\tilde X_j$ is not too similar to $X_j$.
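As a concrete illustration, the two Gram conditions above can be checked numerically. The sketch below uses the equicorrelated Fixed-X construction of Barber and Candès in numpy; the choice $s_j = \min(1, 2\lambda_{\min}(\Sigma))$ (shrunk slightly for numerical stability) and the requirement $n \ge 2p$ are assumptions of this particular construction, not part of the definition above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5                      # low-dimensional case; this construction needs n >= 2p

# Design matrix with unit-norm columns, so Sigma = X^T X has Sigma_jj = 1.
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)
Sigma = X.T @ X

# Equicorrelated choice s_j = min(1, 2*lambda_min(Sigma)), shrunk slightly so that
# 2*diag(s) - diag(s) Sigma^{-1} diag(s) stays positive definite.
lam_min = np.linalg.eigvalsh(Sigma).min()
s = 0.99 * min(1.0, 2.0 * lam_min)
S = s * np.eye(p)
Sigma_inv = np.linalg.inv(Sigma)

# Orthonormal basis U (n x p) orthogonal to the column span of X.
Q, _ = np.linalg.qr(X, mode='complete')
U = Q[:, p:2 * p]

# Upper-triangular C with C^T C = 2*diag(s) - diag(s) Sigma^{-1} diag(s).
C = np.linalg.cholesky(2.0 * S - S @ Sigma_inv @ S).T

# Knockoff matrix: X_tilde = X (I - Sigma^{-1} diag(s)) + U C.
X_tilde = X @ (np.eye(p) - Sigma_inv @ S) + U @ C
```

Expanding the products shows why this works: the cross terms vanish because $X^\top U = 0$, leaving exactly $\tilde X^\top \tilde X = \Sigma$ and $X^\top \tilde X = \Sigma - \mathrm{diag}\{s\}$.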

There are various strategies for constructing such knockoff variables: Fixed-X knockoffs, Model-X Gaussian knockoffs, etc. A tradeoff exists between these two methods: the former does not require knowledge of the data-generating process, at the expense that the accompanying statistics must satisfy the “sufficiency” and “antisymmetry” properties for FDR control to hold.

Model-X Gaussian knockoffs $\tilde X$ are constructed to obey the following two properties:

1. For any subset $S \subseteq \{1, 2, \dots, p\}$, $(X, \tilde X)_{\mathrm{swap}(S)} \overset{d}{=} (X, \tilde X)$. This property is called pairwise exchangeability: swapping the columns of any subset of variables with their knockoffs leaves the joint distribution invariant.

2. $\tilde X \perp Y \mid X$. Note that this is guaranteed whenever $Y$ is not used in the construction.

Here $(X, \tilde X)_{\mathrm{swap}(S)}$ is obtained from $(X, \tilde X)$ by swapping the columns $X_j$ and $\tilde X_j$ for every $j \in S$. For example, $(X_1, X_2, X_3, \tilde X_1, \tilde X_2, \tilde X_3)_{\mathrm{swap}(\{1,2\})} = (\tilde X_1, \tilde X_2, X_3, X_1, X_2, \tilde X_3)$.
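In the Gaussian case these properties can be achieved in closed form: if each row of $X$ is $N(0, \Sigma)$, then sampling $\tilde X \mid X \sim N\big((\Sigma - D)\Sigma^{-1}X,\; 2D - D\Sigma^{-1}D\big)$ with $D = \mathrm{diag}\{s\}$ yields valid Model-X knockoffs. A sketch checking the swap-invariant joint covariance empirically (the AR(1) covariance and equicorrelated $s$ are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 3
rho = 0.3
Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # AR(1) covariance

s = 0.99 * min(1.0, 2.0 * np.linalg.eigvalsh(Sigma).min())
D = s * np.eye(p)
Sigma_inv = np.linalg.inv(Sigma)

# Conditional law of a knockoff row given x:
#   X_tilde | X = x  ~  N((Sigma - D) Sigma^{-1} x,  2D - D Sigma^{-1} D)
A = (Sigma - D) @ Sigma_inv
L = np.linalg.cholesky(2.0 * D - D @ Sigma_inv @ D)

N = 200_000
X = rng.multivariate_normal(np.zeros(p), Sigma, size=N)
X_tilde = X @ A.T + rng.standard_normal((N, p)) @ L.T

# The joint covariance of (X, X_tilde) should be [[Sigma, Sigma-D], [Sigma-D, Sigma]],
# which is invariant under swapping any subset of columns with their knockoffs.
G_hat = np.cov(np.hstack([X, X_tilde]), rowvar=False)
```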

The Sequential Conditional Independent Pairs algorithm gives an explicit construction:

for (j in 1:p) { sample $\tilde X_j$ from $\mathcal{L}(X_j \mid X_{-j}, \tilde X_{1:j-1})$ }

To see why this algorithm produces knockoff

variables satisfying the pairwise exchangeability condition, refer to

Appendix B.
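A minimal sketch of the sequential loop, in the degenerate special case of mutually independent features: each conditional law $\mathcal{L}(X_j \mid X_{-j}, \tilde X_{1:j-1})$ then collapses to the marginal of $X_j$, so every knockoff column is simply a fresh independent draw.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 4
X = rng.standard_normal((n, p))   # mutually independent standard normal features

# Sequential Conditional Independent Pairs: sample X_tilde_j from
# L(X_j | X_{-j}, X_tilde_{1:j-1}).  With independent columns this
# conditional law is just the marginal N(0, 1) at every step.
X_tilde = np.empty_like(X)
for j in range(p):
    X_tilde[:, j] = rng.standard_normal(n)
```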

Determine an appropriate statistic

In , it is defined that a statistic $W$ has 1) the sufficiency property if $W$ depends only on the Gram matrix and on the feature–response inner products, and 2) the antisymmetry property if swapping $X_j$ and $\tilde X_j$ results in a change of sign of $W_j$.
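A quick numerical check of the antisymmetry property, using the marginal-correlation difference $W_j = |X_j^\top y| - |\tilde X_j^\top y|$ (which depends only on feature–response inner products, hence also satisfies sufficiency). The random “knockoffs” here are placeholders, since only the swapping behavior is being tested:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 4
X = rng.standard_normal((n, p))
X_tilde = rng.standard_normal((n, p))   # stand-in knockoffs for this check
y = rng.standard_normal(n)

def stat(X, X_tilde, y):
    # Marginal-correlation difference: uses only feature-response inner
    # products, hence satisfies the sufficiency property.
    return np.abs(X.T @ y) - np.abs(X_tilde.T @ y)

W = stat(X, X_tilde, y)

# Antisymmetry: swapping column j with its knockoff flips the sign of W_j.
j = 2
Xs, Xts = X.copy(), X_tilde.copy()
Xs[:, j], Xts[:, j] = X_tilde[:, j], X[:, j]
W_swapped = stat(Xs, Xts, y)
```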

We are going to introduce the two test statistics most relevant to this paper.

Importance statistics based on the lasso with cross-validation

Fit a linear regression model via penalized maximum likelihood with cross-validation. Then compute the difference statistic $W_j = |Z_j| - |\tilde Z_j|$, where $Z_j$ and $\tilde Z_j$ are the coefficient estimates for the $j$th variable and its knockoff, respectively. However, this statistic does not satisfy the “sufficiency” condition, which can break the FDR-controlling mechanism, in particular when paired with Fixed-X knockoffs.
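A sketch of this statistic using scikit-learn's LassoCV (assuming scikit-learn is available; the independent placeholder columns stand in for properly constructed knockoffs, and all parameter values are illustrative):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(4)
n, p = 120, 6
X = rng.standard_normal((n, p))
X_tilde = rng.standard_normal((n, p))            # placeholder knockoffs
beta = np.array([2.0, -2.0, 1.5, 0.0, 0.0, 0.0])
y = X @ beta + rng.standard_normal(n)

# Fit the lasso with cross-validated penalty on the augmented design [X, X_tilde],
# then take the coefficient-magnitude difference for each (variable, knockoff) pair.
fit = LassoCV(cv=5).fit(np.hstack([X, X_tilde]), y)
W = np.abs(fit.coef_[:p]) - np.abs(fit.coef_[p:])
```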

Penalized linear regression statistics for knockoffs

Compute the signed maximum statistic $W_j = \max(Z_j, \tilde Z_j) \cdot \mathrm{sign}(Z_j - \tilde Z_j)$, where $Z_j$ and $\tilde Z_j$ are the largest values of $\lambda$ at which the $j$th variable and its knockoff, respectively, enter the penalized linear regression model. We expect $Z_j$ and $\tilde Z_j$ to be large for most of the true variables and small for null features, because a large value indicates that the feature enters the lasso model early. On the other hand, a positive value of $W_j$ suggests that $X_j$ is selected before its knockoff $\tilde X_j$. As a result, to reject the null hypothesis that a candidate variable is noise, $\beta_j = 0$, we need a large positive value of $W_j$.
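A sketch of the signed-max statistic, approximating each entry value $Z_j$ by the largest $\lambda$ on a discrete lasso path at which the coefficient is nonzero (scikit-learn's lasso_path is assumed available; placeholder knockoff columns again):

```python
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(5)
n, p = 120, 6
X = rng.standard_normal((n, p))
X_tilde = rng.standard_normal((n, p))            # placeholder knockoffs
beta = np.array([2.0, -2.0, 1.5, 0.0, 0.0, 0.0])
y = X @ beta + rng.standard_normal(n)

# Lasso path on the augmented design; alphas are returned in decreasing order.
alphas, coefs, _ = lasso_path(np.hstack([X, X_tilde]), y, n_alphas=200)

def entry_lambda(path_coefs, alphas):
    # Largest penalty at which each feature is active on the path (0 if never).
    active = np.abs(path_coefs) > 1e-12
    return np.array([alphas[row].max() if row.any() else 0.0 for row in active])

Z_all = entry_lambda(coefs, alphas)
Z, Z_tilde = Z_all[:p], Z_all[p:]

# Signed max: a large positive W_j means X_j entered well before its knockoff.
W = np.maximum(Z, Z_tilde) * np.sign(Z - Z_tilde)
```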

Calculate the data-dependent threshold

In this section, we focus on the second statistic from the previous section and briefly explain why the model selection procedure controls the FDR. Let us first recall how the FDR is defined: $\mathrm{FDP} = \frac{\#\{j : \beta_j = 0 \text{ and } j \in \hat S\}}{\#\{j : j \in \hat S\}}$, $\mathrm{FDR} = \mathbb{E}(\mathrm{FDP})$.

Let $\mathcal{W} = \{|W_j| : j = 1, \dots, p\}$ and suppose $q$ is the target FDR. We can define a data-dependent threshold $T$:

(Knockoff) $T = \min\left\{t \in \mathcal{W} : \frac{\#\{j : W_j \le -t\}}{\#\{j : W_j \ge t\}} \le q\right\}$, with selected model $\hat S = \{j : W_j \ge T\}$.
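The threshold search is a short computation over the candidate values $t \in \{|W_j|\}$; a sketch (the offset parameter anticipates the conservative Knockoff+ variant introduced below, which adds 1 to the numerator):

```python
import numpy as np

def knockoff_threshold(W, q, offset=0):
    """Data-dependent threshold; offset=0 gives Knockoff, offset=1 gives Knockoff+."""
    ts = np.sort(np.unique(np.abs(W[W != 0])))   # candidate thresholds t
    for t in ts:
        neg = np.sum(W <= -t)
        pos = np.sum(W >= t)
        if pos > 0 and (offset + neg) / pos <= q:
            return t
    return np.inf                                # no t qualifies: select nothing

W = np.array([3.0, 2.0, -1.0, 5.0, -0.5, 4.0])
T = knockoff_threshold(W, q=0.2)                 # -> 2.0 for this W
selected = np.where(W >= T)[0]                   # -> indices 0, 1, 3, 5
```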

For a noise feature, by our construction it is equally likely that the original variable $X_j$ or its knockoff $\tilde X_j$ is selected first into the model: $\#\{\text{null } j : W_j \le -t\}$ is equal in distribution to $\#\{\text{null } j : W_j \ge t\}$. Hence

$\widehat{\mathrm{FDP}}(t) \equiv \frac{\#\{j : W_j \le -t\}}{\#\{j : W_j \ge t\}} \ge \frac{\#\{\text{null } j : W_j \le -t\}}{\#\{j : W_j \ge t\}} \approx \frac{\#\{\text{null } j : W_j \ge t\}}{\#\{j : W_j \ge t\}} =: \mathrm{FDP}$

Note that the inequality is usually tight, since most impactful signals are selected earlier than their knockoffs, i.e. $\#\{j : \beta_j \neq 0 \text{ and } W_j \le -t\}$ is small (only one red square in the example figure). Hence $\widehat{\mathrm{FDP}}(t)$ can be used as an estimate of the FDP under the knockoff filter, and its magnitude is upper-bounded by the definition of the threshold $t$. This result therefore inspires us to control a quantity that is very close to the FDR:

Theorem: For any $q \in [0, 1]$, the knockoff method satisfies $\mathbb{E}\left[\frac{\#\{j : \beta_j = 0 \text{ and } j \in \hat S\}}{\#\{j : j \in \hat S\} + q^{-1}}\right] \le q$, where the expectation is taken over the Gaussian noise while treating $X$ and $\tilde X$ as fixed.

This quantity converges to the true FDR asymptotically, since the $q^{-1}$ in the denominator has little impact as the model size increases. However, if we would like to control the exact FDR, this can be achieved by setting the threshold in a slightly more conservative way, as follows (conservative meaning that $T_+ \ge T$):

(Knockoff+) $T_+ = \min\left\{t \in \mathcal{W} : \frac{1 + \#\{j : W_j \le -t\}}{\#\{j : W_j \ge t\}} \le q\right\}$, with selected model $\hat S = \{j : W_j \ge T_+\}$.

The additional “1” in the numerator is essential for deriving the FDR control theory when there are extremely few discoveries.

Theorem: For any $q \in [0, 1]$, the knockoff+ method satisfies $\mathrm{FDR} = \mathbb{E}\left[\frac{\#\{j : \beta_j = 0 \text{ and } j \in \hat S\}}{\#\{j : j \in \hat S\}}\right] \le q$, where the expectation is taken over the Gaussian noise while treating $X$ and $\tilde X$ as fixed.
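Putting the pieces together, a small simulation can check the theorem empirically: Fixed-X knockoffs, the (sufficient and antisymmetric) marginal-correlation difference statistic, and the Knockoff+ threshold; the empirical FDR over repetitions should fall below the target $q$. All specific parameter values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, k, q = 300, 10, 5, 0.25    # k true signals, target FDR q; needs n >= 2p

def fixed_x_knockoffs(X):
    # Equicorrelated Fixed-X construction (unit-norm columns assumed).
    m = X.shape[1]
    Sigma = X.T @ X
    s = 0.99 * min(1.0, 2.0 * np.linalg.eigvalsh(Sigma).min())
    S = s * np.eye(m)
    Si = np.linalg.inv(Sigma)
    Q, _ = np.linalg.qr(X, mode='complete')
    U = Q[:, m:2 * m]
    C = np.linalg.cholesky(2.0 * S - S @ Si @ S).T
    return X @ (np.eye(m) - Si @ S) + U @ C

def threshold_plus(W, q):
    # Knockoff+ threshold: (1 + #negatives) / #positives <= q.
    for t in np.sort(np.unique(np.abs(W[W != 0]))):
        if np.sum(W >= t) > 0 and (1 + np.sum(W <= -t)) / np.sum(W >= t) <= q:
            return t
    return np.inf

fdps = []
for _ in range(100):
    X = rng.standard_normal((n, p))
    X /= np.linalg.norm(X, axis=0)
    X_tilde = fixed_x_knockoffs(X)
    beta = np.zeros(p)
    beta[:k] = 3.5
    y = X @ beta + rng.standard_normal(n)
    W = np.abs(X.T @ y) - np.abs(X_tilde.T @ y)   # sufficient + antisymmetric
    sel = np.where(W >= threshold_plus(W, q))[0]
    fdps.append(np.sum(sel >= k) / max(len(sel), 1))

fdr_hat = np.mean(fdps)   # empirical FDR; the theorem guarantees E[FDP] <= q
```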