## Comparison with Knockoff Filters in Low-dimensional Case

False discovery rate (FDR), formally defined as the expected fraction of falsely selected features among all selected variables, is of central importance when we carry out a model selection procedure. Controlling this criterion at a low level guarantees that most of the selected variables are true and reproducible. In this chapter, we introduce a method, called the Knockoff Filter, to achieve this goal in the low-dimensional case, i.e. when there are more observations than candidate variables ($n > p$). The method can also be generalized to high-dimensional logistic regression, but that extension is not covered in this paper. (For details, see )


### Knockoff Filter (KF)

As in the previous chapters, we build the method on lasso regression. Our goal is to construct sensible test statistics that can be used to test the null hypothesis $\beta_j = 0$ for each candidate variable. An important observation is that this method requires no knowledge of the noise level $\sigma$, neither in the dummy-variable construction nor in the FDR-control theory. The steps of the method are as follows:

#### Construct knockoff variables

For each candidate variable $X_j$ (the $j$th column of the $n \times p$ design matrix $X$), we normalize so that the Gram matrix $\Sigma = X^T X$ satisfies $\Sigma_{jj} = \|X_j\|_2^2 = 1$. We then construct a knockoff copy $\tilde{X}_j$ obeying the following properties: $\tilde{X}^T \tilde{X} = \Sigma$ and $X^T \tilde{X} = \Sigma - \mathrm{diag}\{s\}$, where $s$ is a pre-determined $p$-dimensional non-negative vector. By definition, $\tilde{X}$ has the same correlation structure as the original matrix, since $X_j^T \tilde{X}_k = X_j^T X_k$ for all $j \neq k$. To ensure that the Knockoff Filter is powerful enough to distinguish the true variables from the noise ones, the entries of $s$ should be as large as possible, so that $\tilde{X}_j$ is not too similar to $X_j$.
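As a concrete (numerical) sketch, the equi-correlated choice $s_j = \min(2\lambda_{\min}(\Sigma), 1)$ is one standard way to pick $s$ in the fixed-X setting; the construction below follows the closed form $\tilde{X} = X(I - \Sigma^{-1}\mathrm{diag}\{s\}) + UC$ and assumes $n \ge 2p$ so an orthogonal complement of $\mathrm{span}(X)$ exists. The dimensions and the slight shrinkage factor are illustrative choices, not part of the method's definition.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5

X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)          # normalize columns: Sigma_jj = ||X_j||_2^2 = 1

Sigma = X.T @ X
Sigma_inv = np.linalg.inv(Sigma)

# Equi-correlated choice s_j = min(2*lambda_min(Sigma), 1), shrunk slightly
# so that 2*diag(s) - diag(s) Sigma^{-1} diag(s) stays positive definite.
lam_min = np.linalg.eigvalsh(Sigma).min()
S = np.diag(np.full(p, 0.99 * min(2.0 * lam_min, 1.0)))

C = np.linalg.cholesky(2.0 * S - S @ Sigma_inv @ S).T   # C^T C = 2S - S Sigma^{-1} S

# U: n x p orthonormal matrix orthogonal to the column span of X (needs n >= 2p).
Q, _ = np.linalg.qr(np.hstack([X, rng.standard_normal((n, p))]))
U = Q[:, p:]

X_knock = X @ (np.eye(p) - Sigma_inv @ S) + U @ C

print(np.allclose(X_knock.T @ X_knock, Sigma))   # Gram matrix is preserved
print(np.allclose(X.T @ X_knock, Sigma - S))     # cross products shrunk by s
```

Both checks confirm the two defining Gram conditions above.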

There are various strategies for constructing such knockoff variables: Fixed-X knockoffs, Model-X Gaussian knockoffs, etc. A tradeoff exists between these two approaches: the former does not require knowledge of the data-generating process, at the expense that the accompanying statistics must satisfy the "sufficiency" and "antisymmetry" properties for FDR control to hold.

Model-X Gaussian knockoffs $\tilde{X}$ are constructed to obey the following two properties:

For any subset $S \subseteq \{1, 2, \ldots, p\}$, $(X, \tilde{X})_{\mathrm{swap}(S)} \overset{d}{=} (X, \tilde{X})$. This property is called pairwise exchangeability: swapping the columns of any subset of variables with their knockoffs leaves the joint distribution invariant.

$\tilde{X} \perp\!\!\!\perp Y \mid X$. Note this is guaranteed if $Y$ is not used in the construction.

Note that $(X, \tilde{X})_{\mathrm{swap}(S)}$ is obtained from $(X, \tilde{X})$ by swapping the columns $X_j$ and $\tilde{X}_j$ for every $j \in S$. For example, $(X_1, X_2, X_3, \tilde{X}_1, \tilde{X}_2, \tilde{X}_3)_{\mathrm{swap}(\{1,2\})} = (\tilde{X}_1, \tilde{X}_2, X_3, X_1, X_2, \tilde{X}_3)$.
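Pairwise exchangeability can be seen at the level of second moments: for jointly Gaussian $(X, \tilde{X})$, the joint covariance is $G = \begin{pmatrix} \Sigma & \Sigma - \mathrm{diag}\{s\} \\ \Sigma - \mathrm{diag}\{s\} & \Sigma \end{pmatrix}$, and permuting any subset of original/knockoff coordinates maps $G$ back onto itself. A small check (the $\Sigma$ and $s$ below are hypothetical, chosen only for illustration):

```python
import numpy as np

p = 3
# Hypothetical AR(1) covariance and knockoff gap s (any valid choice works here).
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
S = np.diag([0.6, 0.6, 0.6])

# Joint covariance of (X, X~) implied by the knockoff construction.
G = np.block([[Sigma, Sigma - S],
              [Sigma - S, Sigma]])

# swap({1, 2}) in the text's 1-based notation: exchange coordinate pairs (0, 3) and (1, 4).
idx = [3, 4, 2, 0, 1, 5]
G_swapped = G[np.ix_(idx, idx)]

print(np.array_equal(G_swapped, G))   # covariance is invariant under the swap
```

Since a zero-mean Gaussian is determined by its covariance, this invariance is exactly the distributional statement $(X, \tilde{X})_{\mathrm{swap}(S)} \overset{d}{=} (X, \tilde{X})$ in the Gaussian case.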

Sequential Conditional Independent Pairs gives an explicit construction:

for $j = 1, \ldots, p$: sample $\tilde{X}_j$ from $\mathcal{L}(X_j \mid X_{-j}, \tilde{X}_{1:j-1})$

To see why this algorithm produces knockoff
variables satisfying the pairwise exchangeability condition, refer to
Appendix B.
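For jointly Gaussian covariates the conditional laws in this loop have a closed form, so the whole construction collapses to a single Gaussian draw: if $X \sim N(0, \Sigma)$ with known $\Sigma$, then $\tilde{X} \mid X = x \sim N\big((I - \mathrm{diag}\{s\}\Sigma^{-1})x,\; 2\,\mathrm{diag}\{s\} - \mathrm{diag}\{s\}\Sigma^{-1}\mathrm{diag}\{s\}\big)$. A sketch under that assumption ($\Sigma$, $s$, and the sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4

# Assume X ~ N(0, Sigma) with a known AR(1) covariance (illustrative choice).
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
Sigma_inv = np.linalg.inv(Sigma)

lam_min = np.linalg.eigvalsh(Sigma).min()
S = np.diag(np.full(p, 0.99 * min(2.0 * lam_min, 1.0)))

# Conditional law of X~ given X = x:  N((I - S Sigma^{-1}) x,  2S - S Sigma^{-1} S)
A = np.eye(p) - S @ Sigma_inv
V = 2.0 * S - S @ Sigma_inv @ S
L = np.linalg.cholesky(V)

n = 50_000
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Xk = X @ A.T + rng.standard_normal((n, p)) @ L.T

# Empirically the knockoffs match the required second moments:
print(np.allclose(np.cov(Xk.T), Sigma, atol=0.05))       # Cov(X~) ~ Sigma
print(np.allclose(X.T @ Xk / n, Sigma - S, atol=0.05))   # Cov(X, X~) ~ Sigma - diag(s)
```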

#### Determine an appropriate statistic

In , a statistic $W$ is defined to have 1) the sufficiency property if $W$ depends only on the Gram matrix and on the feature–response inner products, and 2) the antisymmetry property if swapping $X_j$ and $\tilde{X}_j$ changes the sign of $W_j$.

We now introduce the two test statistics most relevant to this paper:

##### Importance statistics based on the lasso with cross-validation

Fit a linear regression model via penalized maximum likelihood with cross-validation. Then compute the difference statistic $W_j = |Z_j| - |\tilde{Z}_j|$, where $Z_j$ and $\tilde{Z}_j$ are the coefficient estimates for the $j$th variable and its knockoff, respectively. However, this statistic does not satisfy the "sufficiency" condition, which can break the FDR-control mechanism, in particular when paired with Fixed-X knockoffs.

##### Penalized linear regression statistics for knockoffs

Compute the signed maximum statistic $W_j = \max(Z_j, \tilde{Z}_j) \times \mathrm{sign}(Z_j - \tilde{Z}_j)$, where $Z_j$ and $\tilde{Z}_j$ are the largest values of $\lambda$ at which the $j$th variable and its knockoff, respectively, enter the penalized linear regression model.

We would expect $Z_j$ and $\tilde{Z}_j$ to be large for most of the true variables and small for null features, because a large value indicates that the feature enters the lasso model early. On the other hand, a positive value of $W_j$ suggests that $X_j$ is selected before its knockoff $\tilde{X}_j$. As a result, to reject the null hypothesis that a candidate variable is noise, $\beta_j = 0$, we need a large positive value of $W_j$.
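As a toy illustration of the signed maximum statistic (the entry values below are made up, standing in for the $\lambda$'s at which each variable enters the lasso path):

```python
import numpy as np

# Hypothetical entry values of lambda for four originals and their knockoffs.
Z  = np.array([3.0, 2.5, 0.4, 0.3])   # true signals (first two) enter early
Zk = np.array([0.5, 0.6, 0.5, 0.2])   # knockoffs of true signals enter late

W = np.maximum(Z, Zk) * np.sign(Z - Zk)
print(W)   # [ 3.   2.5 -0.5  0.3]
```

The two true signals get large positive $W_j$, while the two nulls get small $W_j$ of essentially random sign.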

#### Calculate the data-dependent threshold

In this section, we focus on the second statistic from the previous section and briefly explain why the model selection procedure controls FDR. Let's remind ourselves how FDR is defined:

$$\mathrm{FDP} = \frac{\#\{j : \beta_j = 0 \text{ and } j \in \hat{S}\}}{\#\{j : j \in \hat{S}\}}, \qquad \mathrm{FDR} = \mathbb{E}(\mathrm{FDP}).$$

Let $\mathcal{W} = \{|W_j| : j = 1, \ldots, p\}$ and suppose $q$ is the target FDR level. We can define a data-dependent threshold $T$:

(Knockoff) $T = \min\left\{t \in \mathcal{W} : \dfrac{\#\{j : W_j \le -t\}}{\#\{j : W_j \ge t\}} \le q\right\}$, with model $\hat{S} = \{j : W_j \ge T\}$.

For a noise feature, by our construction it is equally likely that the original variable $X_j$ or its knockoff $\tilde{X}_j$ is selected first into the model: $\#\{\text{null } j : W_j \le -t\}$ is equal in distribution to $\#\{\text{null } j : W_j \ge t\}$.

$$\widehat{\mathrm{FDP}}(t) \equiv \frac{\#\{j : W_j \le -t\}}{\#\{j : W_j \ge t\}} \ge \frac{\#\{\text{null } j : W_j \le -t\}}{\#\{j : W_j \ge t\}} \approx \frac{\#\{\text{null } j : W_j \ge t\}}{\#\{j : W_j \ge t\}} =: \mathrm{FDP}(t)$$

Note that the inequality is usually tight, since most impactful signals are selected earlier than their knockoffs, i.e. $\#\{j : \beta_j \neq 0 \text{ and } W_j \le -t\}$ is small (only one red square in the example of figure ?). Hence $\widehat{\mathrm{FDP}}(t)$ can be used as an estimate of the FDP under the knockoff filter, and its magnitude is upper-bounded by $q$ according to the definition of the threshold $T$. This result therefore inspires us to control a quantity that is very close to the FDR:

Theorem: For $q \in [0, 1]$, the knockoff method satisfies $\mathbb{E}\left[\dfrac{\#\{j : \beta_j = 0 \text{ and } j \in \hat{S}\}}{\#\{j : j \in \hat{S}\} + q^{-1}}\right] \le q$,

where the expectation is taken over the Gaussian noise while treating $X$ and $\tilde{X}$ as fixed.

This quantity converges to the true FDR asymptotically, since the $q^{-1}$ term in the denominator has little impact as the model size increases. However, we may still wish to control the exact FDR, and this can be achieved by setting the threshold in a slightly more conservative way, as follows (conservative meaning that $T_+ \ge T$):

(Knockoff+) $T_+ = \min\left\{t \in \mathcal{W} : \dfrac{1 + \#\{j : W_j \le -t\}}{\#\{j : W_j \ge t\}} \le q\right\}$, with model $\hat{S} = \{j : W_j \ge T_+\}$.

The additional "1" in the numerator is essential for deriving the FDR-control theory when there are extremely few discoveries.
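Both thresholds can be computed with one small helper. A sketch (the $W$ vector is made up; the `max(·, 1)` in the denominator is a common safeguard against dividing by zero, not part of the definitions above):

```python
import numpy as np

def knockoff_threshold(W, q, plus=False):
    """Smallest t in {|W_j| > 0} with (offset + #{W_j <= -t}) / #{W_j >= t} <= q."""
    offset = 1 if plus else 0          # the extra "1" gives knockoff+
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (offset + np.sum(W <= -t)) / max(np.sum(W >= t), 1)
        if fdp_hat <= q:
            return t
    return np.inf                      # no feasible threshold: select nothing

W = np.array([3.0, 2.5, -0.5, 0.3, 1.8, -0.2, 2.2, 0.1])

T = knockoff_threshold(W, q=0.2)
print(T, np.where(W >= T)[0])          # knockoff selects five variables here

T_plus = knockoff_threshold(W, q=0.2, plus=True)
print(T_plus)                          # with so few discoveries, knockoff+ selects nothing
```

On this toy vector, knockoff+ returns an infinite threshold and selects nothing, which illustrates exactly the small-discovery regime the extra "1" guards against.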

Theorem: For $q \in [0, 1]$, the knockoff+ method satisfies $\mathrm{FDR} = \mathbb{E}\left[\dfrac{\#\{j : \beta_j = 0 \text{ and } j \in \hat{S}\}}{\#\{j : j \in \hat{S}\}}\right] \le q$,

where the expectation is taken over the Gaussian noise while treating $X$ and $\tilde{X}$ as fixed.