Convergence of the Iterative Hard Thresholding Algorithm

We present a greedy approximation algorithm for reconstructing a sparse vector $x$ from compressed measurements. It is applicable to matrices $A$ that satisfy $\delta_{3s}\leq\frac{1}{12}$, and we include a proof that the algorithm converges.

Context

Let $A\in\mathbb{R}^{N\times p}$ be a "compressing matrix," where $N\ll p$, and suppose we observe $y=Ax$. It is generally not possible to recover $x$ from $y$ without any further constraint. The field of compressed sensing adds the constraint that $x$ is $s$-sparse and asks the following two questions.

(1) What matrix $A$ should we use to ensure the perfect recovery of the sparse vector $x$?

(2) How do we go about designing an algorithm that performs this recovery?

For what follows, an excellent reference is the book by Foucart and Rauhut (see References).

It can be shown that the problem

$$\min_{z} \lVert z \rVert_1 \quad \text{subject to} \quad Az = Ax \text{ for an } s\text{-sparse vector } x$$

has $x$ as its unique solution if and only if $A$ satisfies the restricted nullspace property. (Note that the minimization runs over all of $\mathbb{R}^p$; the $\ell_1$ norm acts as a convex surrogate for sparsity.)
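
Although the rest of this post pursues a greedy alternative, it is worth noting that the $\ell_1$ problem above is a linear program and can be solved directly. Below is a minimal sketch using SciPy's `linprog`; the split $z = u - v$ with $u, v \geq 0$ is a standard reformulation, and the name `basis_pursuit` is my own.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||z||_1 subject to Az = y as a linear program.

    Split z = u - v with u, v >= 0, so that ||z||_1 = sum(u) + sum(v)
    and the equality constraint becomes A(u - v) = y.
    """
    N, p = A.shape
    c = np.ones(2 * p)               # objective: sum(u) + sum(v)
    A_eq = np.hstack([A, -A])        # encodes A(u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    u, v = res.x[:p], res.x[p:]
    return u - v
```

There are two sufficient conditions on $A$ that guarantee the restricted nullspace property.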

(1) Pairwise incoherence: $\lVert A_S^TA_S-I\rVert_\infty\leq \dfrac{1}{2s}$ for all $\lvert S\rvert\leq s$.

(2) Restricted isometry property (RIP): $\delta=\max_{S:\lvert S\rvert=s}\lVert A^T_SA_S-I\rVert_2\leq \dfrac{1}{3}$.

One way to obtain a matrix $A$ satisfying (1) is to use a random matrix via a Johnson-Lindenstrauss type inequality. However, the resulting dimension $N$ is generally not small enough: we want $N$ of order $O(s)$, but this approach yields an order of $O(s^2)$.

For condition (2), while the required dimension $N$ is more favorable at $O(s\cdot\text{const})$, the difficulty lies in the fact that the only matrices known to satisfy the condition are random ones, e.g., with i.i.d. Gaussian entries.
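
As a rough illustration of both points, the sketch below draws a Gaussian matrix with i.i.d. $\mathcal{N}(0, 1/N)$ entries and estimates its RIP constant by sampling random supports. This is my own illustration: computing $\delta_s$ exactly requires checking all supports, so the sampled value is only a lower estimate.

```python
import numpy as np

def estimate_rip_constant(A, s, trials=2000, seed=0):
    """Monte Carlo lower estimate of the RIP constant delta_s.

    Samples random supports S with |S| = s and returns the largest
    ||A_S^T A_S - I||_2 observed; the true delta_s maximizes over
    all supports, which is combinatorially expensive.
    """
    rng = np.random.default_rng(seed)
    N, p = A.shape
    delta = 0.0
    for _ in range(trials):
        S = rng.choice(p, size=s, replace=False)
        G = A[:, S].T @ A[:, S] - np.eye(s)
        delta = max(delta, np.linalg.norm(G, 2))  # spectral norm
    return delta

rng = np.random.default_rng(0)
N, p, s = 200, 1000, 5
A = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, p))  # i.i.d. N(0, 1/N) entries
print(estimate_rip_constant(A, 3 * s))               # rough check on delta_{3s}
```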

Given these theoretical challenges, I want to introduce an elegant greedy approximation algorithm that recovers $x$, applies to any matrix $A$ with $\delta_{3s}\leq\frac{1}{12}$, and comes with a proof of convergence.

Iterative Hard Thresholding Algorithm

Algorithm: Iterative Hard Thresholding (IHT)

Objective: Minimize $\lVert Az - y\rVert_2^2$ subject to $z$ being $s$-sparse.

Initialization: Choose an initial guess $z_0$ and set $t=0$.

Repeat until convergence:

  1. Gradient Step:

    Update $a$ by:

    $$a_{t+1} = z_t - A^T(Az_t - y)$$

  2. Sparsity Constraint:

    Update $z$ by finding the closest $s$-sparse vector to $a_{t+1}$, i.e., keeping its $s$ largest-magnitude entries and zeroing out the rest:

    $$z_{t+1} = \arg\min_{z \in \{s\text{-sparse}\}} \lVert a_{t+1} - z \rVert_2$$

  3. Update Iteration:

    Set $t = t + 1$.

Output: Return $z_t$ as the solution.
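
The three steps translate directly into NumPy. A minimal sketch follows; the stopping tolerance, the iteration cap, and the zero initialization $z_0=0$ are my own defaults rather than part of the algorithm as stated.

```python
import numpy as np

def hard_threshold(a, s):
    """Closest s-sparse vector to a: keep the s largest-magnitude
    entries and zero out the rest."""
    z = np.zeros_like(a)
    keep = np.argsort(np.abs(a))[-s:]
    z[keep] = a[keep]
    return z

def iht(A, y, s, max_iter=100, tol=1e-10):
    """Iterative hard thresholding: a gradient step on ||Az - y||_2^2
    followed by projection onto the set of s-sparse vectors."""
    z = np.zeros(A.shape[1])                  # z_0 = 0
    for _ in range(max_iter):
        a = z - A.T @ (A @ z - y)             # 1. gradient step
        z_next = hard_threshold(a, s)         # 2. sparsity constraint
        if np.linalg.norm(z_next - z) < tol:  # stop once iterates stall
            return z_next
        z = z_next                            # 3. update iteration
    return z
```

Note that the projection in step 2 has the closed form used in `hard_threshold`: among all $s$-sparse vectors, the one closest to $a_{t+1}$ in $\ell_2$ keeps its $s$ largest-magnitude coordinates.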

Theorem. Let the RIP constant satisfy $\delta_{3s}\leq\frac{1}{12}$, and let $z^*$ denote the $s$-sparse vector $x$ we wish to recover. Then the above algorithm converges linearly. That is,

$$\lVert z_{t+1}-z^*\rVert_2\leq \beta\lVert z_t-z^*\rVert_2$$

for some $\beta<1$, and consequently $\lVert z_t-z^*\rVert_2\leq \beta^t\lVert z_0-z^*\rVert_2$.

Proof. Let $S=\mathrm{supp}(z^*)\cup \mathrm{supp}(z_{t+1})$. We have

$$\begin{aligned} \lVert z_{t+1} - z^*\rVert_2 &= \lVert z_{t+1, S} - z^*_S\rVert_2 \\ &\leq \lVert z_{t+1, S}-a_{t+1,S}\rVert_2+\lVert a_{t+1,S}-z^*_S\rVert_2 \\ &\leq \lVert z_S^*-a_{t+1,S}\rVert_2+\lVert a_{t+1,S}-z_S^*\rVert_2 \\ &= 2\lVert z_S^*-a_{t+1,S}\rVert_2, \end{aligned}$$

where the second inequality holds because $z_{t+1}$ is the closest $s$-sparse vector to $a_{t+1}$ while $z^*$ is also $s$-sparse, so $\lVert a_{t+1}-z_{t+1}\rVert_2\leq \lVert a_{t+1}-z^*\rVert_2$; both differences agree with $a_{t+1}$ outside of $S$, so the inequality carries over to the coordinates in $S$.

Now, we have

$$\begin{aligned} \lVert z^*_S-a_{t+1,S}\rVert_2 &= \lVert z_S^*-z_{t,S}+A_S^T A(z_t-z^*)\rVert_2\\ &= \lVert -r_{t,S}+A_S^TAr_t\rVert_2 \\ &= \lVert -r_{t,S}+A_S^T Ar_{t,S}+A_S^T Ar_{t,S^C}\rVert_2 \\ &\leq \lVert (I-A_S^T A_S)\,r_{t,S}\rVert_2 + \lVert A_S^T Ar_{t,S^C}\rVert_2 \\ &\leq \delta_{3s}\lVert r_t\rVert_2 + \lVert A_S^T A_{S^-\setminus S}\,r_t\rVert_2, \end{aligned}$$

where $r_t := z_t-z^*$ is supported on $S^-:=\mathrm{supp}(z_t)\cup\mathrm{supp}(z^*)$, and $A_S$ denotes the matrix that agrees with $A$ on the columns indexed by $S$ and is zero on the remaining columns. The first term is bounded using $\lvert S\rvert\leq 2s$, which gives $\lVert I-A_S^TA_S\rVert_2\leq \delta_{2s}\leq \delta_{3s}$; the second uses $r_{t,S^C}=r_{t,S^-\setminus S}$.

Now, it suffices to show that $\lVert A_S^T A_{S^-\setminus S}\rVert_2\leq 2\delta_{3s}$: combining this with the two chains of inequalities above yields $\lVert z_{t+1}-z^*\rVert_2\leq 2(\delta_{3s}+2\delta_{3s})\lVert r_t\rVert_2=6\delta_{3s}\lVert r_t\rVert_2\leq \frac{1}{2}\lVert z_t-z^*\rVert_2$, i.e., the desired inequality with $\beta=\frac{1}{2}$.

To this end, let $\lVert x\rVert_2=\lVert y\rVert_2=1$. We have

$$\begin{aligned} x^T(A_S^T A_{S^-\setminus S})\, y &= x_S^T (A^T A)\, y_{S^-\setminus S} \\ &=\tfrac{1}{4}\left(\lVert Ax_S+Ay_{S^-\setminus S}\rVert_2^2-\lVert Ax_S-Ay_{S^-\setminus S}\rVert_2^2\right) \\ &=\tfrac{1}{4}\big(\lVert x_S+y_{S^-\setminus S}\rVert_2^2+(x_S+y_{S^-\setminus S})^T(A^TA-I)(x_S+y_{S^-\setminus S}) \\ &\qquad -\lVert x_S-y_{S^-\setminus S}\rVert_2^2-(x_S-y_{S^-\setminus S})^T(A^TA-I)(x_S-y_{S^-\setminus S})\big) \\ &=\tfrac{1}{4}\big((x_S+y_{S^-\setminus S})^T(A^TA-I)(x_S+y_{S^-\setminus S}) \\ &\qquad -(x_S-y_{S^-\setminus S})^T(A^TA-I)(x_S-y_{S^-\setminus S})\big) \\ &\leq 2 \delta_{3s}, \end{aligned}$$

where the squared-norm terms cancel because $x_S$ and $y_{S^-\setminus S}$ have disjoint supports, and the last inequality holds because $x_S\pm y_{S^-\setminus S}$ is $3s$-sparse (recall that $\mathrm{supp}(r_t)\subseteq S^-$ and $S\cup S^-\subseteq \mathrm{supp}(z^*)\cup\mathrm{supp}(z_t)\cup\mathrm{supp}(z_{t+1})$, a set of size at most $3s$), so each quadratic form is at most $\delta_{3s}\lVert x_S\pm y_{S^-\setminus S}\rVert_2^2\leq 4\delta_{3s}$. Taking the supremum over unit $x, y$ gives $\lVert A_S^T A_{S^-\setminus S}\rVert_2\leq 2\delta_{3s}$. Therefore $\lVert z_{t+1}-z^*\rVert_2\leq \frac{1}{2}\lVert z_t-z^*\rVert_2$, and iterating establishes the claim with $\beta=\frac{1}{2}$. $\blacksquare$
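
As an empirical sanity check of the $2^{-t}$ rate, one can plant a sparse vector and track the recovery error per iteration. This is a synthetic experiment of my own construction, with dimensions chosen arbitrarily; the thresholding line inlines the projection from the sketch above.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, s = 200, 1000, 5
A = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, p))

# Plant an s-sparse ground truth z* and observe y = A z*.
z_star = np.zeros(p)
z_star[rng.choice(p, size=s, replace=False)] = rng.normal(size=s)
y = A @ z_star

z = np.zeros(p)
for t in range(10):
    a = z - A.T @ (A @ z - y)             # gradient step
    a[np.argsort(np.abs(a))[:-s]] = 0.0   # keep the s largest-magnitude entries
    z = a
    print(t, np.linalg.norm(z - z_star))  # recovery error per iteration
```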

Conclusion

We have established the convergence of the iterative hard thresholding algorithm for reconstructing the sparse vector $x$. In today's landscape, where data storage is relatively inexpensive, compressed sensing and reconstruction problems may not seem immediately relevant to everyday data scientists. Nevertheless, the underlying mathematics of the field is quite fascinating, particularly in the context of exploring different trade-offs.

References

Foucart, S., & Rauhut, H. (2013). A Mathematical Introduction to Compressive Sensing. Springer.