Convergence of the Iterative Hard Thresholding Algorithm

We present a greedy approximation algorithm for reconstructing a sparse vector $x$ from compressed measurements. It is applicable to matrices $A$ that satisfy $\delta_{3s}\leq\frac{1}{12}$, and we include a proof that the algorithm converges.

Context

Let $A\in\mathbb{R}^{N\times p}$ be a "compressing matrix," where $N\ll p$, and suppose we observe $y=Ax$. It is generally not possible to recover $x$ from $y$ without any further constraint. The field of compressed sensing adds the constraint that $x$ is $s$-sparse and asks the following two questions.

(1) What matrix $A$ should we use to ensure the perfect recovery of the sparse vector $x$?

(2) How do we go about designing an algorithm that performs this recovery?

For what follows, an excellent reference is the book by Foucart and Rauhut (see References).

It can be shown that the problem

$$\min_{z} \lVert z \rVert_1 \quad \text{subject to} \quad Az = Ax \text{ for an } s\text{-sparse vector } x$$

has $x$ as its unique solution if and only if $A$ satisfies the restricted nullspace property. (Note that the minimization runs over all of $\mathbb{R}^p$; the $\ell_1$ norm acts as a convex surrogate for sparsity.)
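
Although the rest of this post pursues a greedy alternative, it is worth noting that the $\ell_1$ problem above is a linear program and can be solved directly. Below is a minimal sketch using SciPy's `linprog`; the split $z = u - v$ with $u, v \geq 0$ is a standard reformulation, and the name `basis_pursuit` is my own.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||z||_1 subject to Az = y as a linear program.

    Split z = u - v with u, v >= 0, so that ||z||_1 = sum(u) + sum(v)
    and the equality constraint becomes A(u - v) = y.
    """
    N, p = A.shape
    c = np.ones(2 * p)               # objective: sum(u) + sum(v)
    A_eq = np.hstack([A, -A])        # encodes A(u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    u, v = res.x[:p], res.x[p:]
    return u - v
```

There are two sufficient conditions on $A$ that guarantee the restricted nullspace property.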

(1) Pairwise incoherence: $\lVert A_S^TA_S-I\rVert_\infty\leq \dfrac{1}{2s}$ for all $\lvert S\rvert\leq s$.

(2) Restricted isometry property (RIP): $\delta=\max_{S:\lvert S\rvert=s}\lVert A^T_SA_S-I\rVert_2\leq \dfrac{1}{3}$.

One way to obtain a matrix $A$ satisfying (1) is to use a random matrix via a Johnson-Lindenstrauss type inequality. However, the resulting dimension $N$ is generally not small enough: we want $N$ of order $O(s)$, but this approach yields an order of $O(s^2)$.

For condition (2), while the required dimension $N$ is more favorable at $O(s\cdot\text{const})$, the difficulty lies in the fact that the only matrices known to satisfy the condition are random ones, e.g., with i.i.d. Gaussian entries.
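
As a rough illustration of both points, the sketch below draws a Gaussian matrix with i.i.d. $\mathcal{N}(0, 1/N)$ entries and estimates its RIP constant by sampling random supports. This is my own illustration: computing $\delta_s$ exactly requires checking all supports, so the sampled value is only a lower estimate.

```python
import numpy as np

def estimate_rip_constant(A, s, trials=2000, seed=0):
    """Monte Carlo lower estimate of the RIP constant delta_s.

    Samples random supports S with |S| = s and returns the largest
    ||A_S^T A_S - I||_2 observed; the true delta_s maximizes over
    all supports, which is combinatorially expensive.
    """
    rng = np.random.default_rng(seed)
    N, p = A.shape
    delta = 0.0
    for _ in range(trials):
        S = rng.choice(p, size=s, replace=False)
        G = A[:, S].T @ A[:, S] - np.eye(s)
        delta = max(delta, np.linalg.norm(G, 2))  # spectral norm
    return delta

rng = np.random.default_rng(0)
N, p, s = 200, 1000, 5
A = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, p))  # i.i.d. N(0, 1/N) entries
print(estimate_rip_constant(A, 3 * s))               # rough check on delta_{3s}
```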

Given these theoretical challenges, I want to introduce an elegant greedy approximation algorithm that recovers $x$, applies to any matrix $A$ with $\delta_{3s}\leq\frac{1}{12}$, and comes with a proof of convergence.

Iterative Hard Thresholding Algorithm

Algorithm: Iterative Hard Thresholding (IHT)

Objective: Minimize $\lVert Az - y\rVert_2^2$ subject to $z$ being $s$-sparse.

Initialization: Choose an initial guess $z_0$ and set $t=0$.

Repeat until convergence:

  1. Gradient Step:

    Update $a$ by:

    $$a_{t+1} = z_t - A^T(Az_t - y)$$

  2. Sparsity Constraint:

    Update $z$ by finding the closest $s$-sparse vector to $a_{t+1}$, i.e., keeping its $s$ largest-magnitude entries and zeroing out the rest:

    $$z_{t+1} = \arg\min_{z \in \{s\text{-sparse}\}} \lVert a_{t+1} - z \rVert_2$$

  3. Update Iteration:

    Set $t = t + 1$.

Output: Return $z_t$ as the solution.
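
The three steps translate directly into NumPy. A minimal sketch follows; the stopping tolerance, the iteration cap, and the zero initialization $z_0=0$ are my own defaults rather than part of the algorithm as stated.

```python
import numpy as np

def hard_threshold(a, s):
    """Closest s-sparse vector to a: keep the s largest-magnitude
    entries and zero out the rest."""
    z = np.zeros_like(a)
    keep = np.argsort(np.abs(a))[-s:]
    z[keep] = a[keep]
    return z

def iht(A, y, s, max_iter=100, tol=1e-10):
    """Iterative hard thresholding: a gradient step on ||Az - y||_2^2
    followed by projection onto the set of s-sparse vectors."""
    z = np.zeros(A.shape[1])                  # z_0 = 0
    for _ in range(max_iter):
        a = z - A.T @ (A @ z - y)             # 1. gradient step
        z_next = hard_threshold(a, s)         # 2. sparsity constraint
        if np.linalg.norm(z_next - z) < tol:  # stop once iterates stall
            return z_next
        z = z_next                            # 3. update iteration
    return z
```

Note that the projection in step 2 has the closed form used in `hard_threshold`: among all $s$-sparse vectors, the one closest to $a_{t+1}$ in $\ell_2$ keeps its $s$ largest-magnitude coordinates.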

Theorem. Let the RIP constant satisfy $\delta_{3s}\leq\frac{1}{12}$, and let $z^*$ denote the $s$-sparse vector $x$ we wish to recover. Then the above algorithm converges linearly. That is,

$$\lVert z_{t+1}-z^*\rVert_2\leq \beta\lVert z_t-z^*\rVert_2$$

for some $\beta<1$, and consequently $\lVert z_t-z^*\rVert_2\leq \beta^t\lVert z_0-z^*\rVert_2$.

Proof. Let $S=\mathrm{supp}(z^*)\cup \mathrm{supp}(z_{t+1})$. We have

$$\begin{aligned} \lVert z_{t+1} - z^*\rVert_2 &= \lVert z_{t+1, S} - z^*_S\rVert_2 \\ &\leq \lVert z_{t+1, S}-a_{t+1,S}\rVert_2+\lVert a_{t+1,S}-z^*_S\rVert_2 \\ &\leq \lVert z_S^*-a_{t+1,S}\rVert_2+\lVert a_{t+1,S}-z_S^*\rVert_2 \\ &= 2\lVert z_S^*-a_{t+1,S}\rVert_2, \end{aligned}$$

where the second inequality holds because $z_{t+1}$ is the closest $s$-sparse vector to $a_{t+1}$ while $z^*$ is also $s$-sparse, so $\lVert a_{t+1}-z_{t+1}\rVert_2\leq \lVert a_{t+1}-z^*\rVert_2$; both differences agree with $a_{t+1}$ outside of $S$, so the inequality carries over to the coordinates in $S$.

Now, we have

$$\begin{aligned} \lVert z^*_S-a_{t+1,S}\rVert_2 &= \lVert z_S^*-z_{t,S}+A_S^T A(z_t-z^*)\rVert_2\\ &= \lVert -r_{t,S}+A_S^TAr_t\rVert_2 \\ &= \lVert -r_{t,S}+A_S^T Ar_{t,S}+A_S^T Ar_{t,S^C}\rVert_2 \\ &\leq \lVert (I-A_S^T A_S)\,r_{t,S}\rVert_2 + \lVert A_S^T Ar_{t,S^C}\rVert_2 \\ &\leq \delta_{3s}\lVert r_t\rVert_2 + \lVert A_S^T A_{S^-\setminus S}\,r_t\rVert_2, \end{aligned}$$

where $r_t := z_t-z^*$ is supported on $S^-:=\mathrm{supp}(z_t)\cup\mathrm{supp}(z^*)$, and $A_S$ denotes the matrix that agrees with $A$ on the columns indexed by $S$ and is zero on the remaining columns. The first term is bounded using $\lvert S\rvert\leq 2s$, which gives $\lVert I-A_S^TA_S\rVert_2\leq \delta_{2s}\leq \delta_{3s}$; the second uses $r_{t,S^C}=r_{t,S^-\setminus S}$.

Now, it suffices to show that $\lVert A_S^T A_{S^-\setminus S}\rVert_2\leq 2\delta_{3s}$: combining this with the two chains of inequalities above yields $\lVert z_{t+1}-z^*\rVert_2\leq 2(\delta_{3s}+2\delta_{3s})\lVert r_t\rVert_2=6\delta_{3s}\lVert r_t\rVert_2\leq \frac{1}{2}\lVert z_t-z^*\rVert_2$, i.e., the desired inequality with $\beta=\frac{1}{2}$.

To this end, let $\lVert x\rVert_2=\lVert y\rVert_2=1$. We have

$$\begin{aligned} x^T(A_S^T A_{S^-\setminus S})\, y &= x_S^T (A^T A)\, y_{S^-\setminus S} \\ &=\tfrac{1}{4}\left(\lVert Ax_S+Ay_{S^-\setminus S}\rVert_2^2-\lVert Ax_S-Ay_{S^-\setminus S}\rVert_2^2\right) \\ &=\tfrac{1}{4}\big(\lVert x_S+y_{S^-\setminus S}\rVert_2^2+(x_S+y_{S^-\setminus S})^T(A^TA-I)(x_S+y_{S^-\setminus S}) \\ &\qquad -\lVert x_S-y_{S^-\setminus S}\rVert_2^2-(x_S-y_{S^-\setminus S})^T(A^TA-I)(x_S-y_{S^-\setminus S})\big) \\ &=\tfrac{1}{4}\big((x_S+y_{S^-\setminus S})^T(A^TA-I)(x_S+y_{S^-\setminus S}) \\ &\qquad -(x_S-y_{S^-\setminus S})^T(A^TA-I)(x_S-y_{S^-\setminus S})\big) \\ &\leq 2 \delta_{3s}, \end{aligned}$$

where the squared-norm terms cancel because $x_S$ and $y_{S^-\setminus S}$ have disjoint supports, and the last inequality holds because $x_S\pm y_{S^-\setminus S}$ is $3s$-sparse (recall that $\mathrm{supp}(r_t)\subseteq S^-$ and $S\cup S^-\subseteq \mathrm{supp}(z^*)\cup\mathrm{supp}(z_t)\cup\mathrm{supp}(z_{t+1})$, a set of size at most $3s$), so each quadratic form is at most $\delta_{3s}\lVert x_S\pm y_{S^-\setminus S}\rVert_2^2\leq 4\delta_{3s}$. Taking the supremum over unit $x, y$ gives $\lVert A_S^T A_{S^-\setminus S}\rVert_2\leq 2\delta_{3s}$. Therefore $\lVert z_{t+1}-z^*\rVert_2\leq \frac{1}{2}\lVert z_t-z^*\rVert_2$, and iterating establishes the claim with $\beta=\frac{1}{2}$. $\blacksquare$
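
As an empirical sanity check of the $2^{-t}$ rate, one can plant a sparse vector and track the recovery error per iteration. This is a synthetic experiment of my own construction, with dimensions chosen arbitrarily; the thresholding line inlines the projection from the sketch above.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, s = 200, 1000, 5
A = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, p))

# Plant an s-sparse ground truth z* and observe y = A z*.
z_star = np.zeros(p)
z_star[rng.choice(p, size=s, replace=False)] = rng.normal(size=s)
y = A @ z_star

z = np.zeros(p)
for t in range(10):
    a = z - A.T @ (A @ z - y)             # gradient step
    a[np.argsort(np.abs(a))[:-s]] = 0.0   # keep the s largest-magnitude entries
    z = a
    print(t, np.linalg.norm(z - z_star))  # recovery error per iteration
```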

Conclusion

We have established the convergence of the iterative hard thresholding algorithm for reconstructing the sparse vector $x$. In today's landscape, where data storage is relatively inexpensive, compressed sensing and reconstruction problems may not seem immediately relevant to everyday data scientists. Nevertheless, the underlying mathematics of the field is quite fascinating, particularly in the context of exploring different trade-offs.

References

Foucart, S., & Rauhut, H. (2013). A Mathematical Introduction to Compressive Sensing. Springer.