## Variance development

Hello,

When rereading the course, I noticed that the following development (p. 93) does not look straightforward to me:

\(Var[g_t] = \mathbb{E}[\| \tilde{g_t} - \nabla f(x_t) \|^2] = \mathbb{E}[\|\frac{1}{m}\sum^m_{j=1} g_t^j - \nabla f(x_t) \|^2] = \frac{1}{m}\mathbb{E}[\|g_t^1 -\nabla f(x_t) \|^2] = \frac{1}{m}\mathbb{E}[\|g_t^1\|^2]- \frac{1}{m} \|\nabla f(x_t) \|^2 \)

I managed to pull the factor \( \frac{1}{m^2} \) out of the expectation, but I am stuck at \( \frac{1}{m^2}\mathbb{E}[\|\sum^m_{j=1} (g_t^j - \nabla f(x_t)) \|^2] \)

Furthermore, in the last equality, how can the squared norm of the difference be written as the difference of the squared norms?

Thanks very much for any help!

Cheers,

Yann

## 1

Hi,

For the first part, it follows from the i.i.d. assumption: when expanding the square, all the cross terms have zero expectation and vanish.
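
A short sketch of that step, writing \( \delta_j = g_t^j - \nabla f(x_t) \) (a shorthand introduced here, not taken from the notes) and assuming that, conditioned on \( x_t \), the \( g_t^j \) are i.i.d. with \( \mathbb{E}[g_t^j] = \nabla f(x_t) \), so that \( \mathbb{E}[\delta_j] = 0 \):

\( \mathbb{E}[\|\frac{1}{m}\sum^m_{j=1} g_t^j - \nabla f(x_t)\|^2] = \frac{1}{m^2}\mathbb{E}[\|\sum^m_{j=1} \delta_j\|^2] = \frac{1}{m^2}\sum^m_{j=1}\mathbb{E}[\|\delta_j\|^2] + \frac{1}{m^2}\sum_{i \neq j}\mathbb{E}[\delta_i^T\delta_j] \)

By independence, \( \mathbb{E}[\delta_i^T\delta_j] = \mathbb{E}[\delta_i]^T\mathbb{E}[\delta_j] = 0 \) for \( i \neq j \), and since the samples are identically distributed, every remaining term equals \( \mathbb{E}[\|\delta_1\|^2] \):

\( \frac{1}{m^2}\sum^m_{j=1}\mathbb{E}[\|\delta_j\|^2] = \frac{1}{m^2} \cdot m \, \mathbb{E}[\|g_t^1 - \nabla f(x_t)\|^2] = \frac{1}{m}\mathbb{E}[\|g_t^1 - \nabla f(x_t)\|^2] \)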

For the second part, this should answer your question: https://www.oknoname.com/EPFL/CS-439/topic/1833/lecture-5-slide-13/#c4
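
Independently of the linked thread, one way to see it (a sketch, assuming the stochastic gradient is unbiased, \( \mathbb{E}[g_t^1] = \nabla f(x_t) \), with \( x_t \) held fixed) is to expand the square:

\( \mathbb{E}[\|g_t^1 - \nabla f(x_t)\|^2] = \mathbb{E}[\|g_t^1\|^2] - 2\,\mathbb{E}[g_t^1]^T\nabla f(x_t) + \|\nabla f(x_t)\|^2 = \mathbb{E}[\|g_t^1\|^2] - \|\nabla f(x_t)\|^2 \)

So the squared norm of a difference is not the difference of the squared norms in general. The identity holds only in expectation, because the cross term equals \( -2\|\nabla f(x_t)\|^2 \).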

Best.

Scott

## 1

I see, thank you!

I got confused by the fact that we use the formula \( Var[g_t] = \mathbb{E}[ \| g_t - \mathbb{E}[g_t] \|^2] = \mathbb{E}[ (g_t - \mathbb{E}[g_t])^T(g_t - \mathbb{E}[g_t])] \), which is a scalar, in many different contexts, but the only formula and properties I knew for multivariate variance so far were those of the classic variance matrix, i.e., \( Var[g_t] = \mathbb{E}[ (g_t - \mathbb{E}[g_t])(g_t - \mathbb{E}[g_t])^T]\). Do you know where we could find resources concerning the scalar definition? I tried googling "scalar variance multivariate random variable" and related terms without much success. Thank you for the help!

Hi,

You are correct: the variance of a multidimensional random vector (more often called the covariance matrix) is a matrix. Writing "Var" for the scalar quantity here is a slight abuse of notation.
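
The two definitions are linked by a standard identity: the scalar \( \mathbb{E}[\|g - \mathbb{E}[g]\|^2] \) is the trace of the covariance matrix \( \mathbb{E}[(g - \mathbb{E}[g])(g - \mathbb{E}[g])^T] \), since \( v^Tv = \mathrm{tr}(vv^T) \) for any vector \( v \). Below is a minimal numerical check with NumPy. The dimensions, seed, and mixing matrix `A` are arbitrary choices made for illustration, not anything from the course.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice

# Hypothetical data: n samples of a d-dimensional random vector.
n, d = 10_000, 3
A = rng.normal(size=(d, d))              # arbitrary mixing matrix
samples = rng.normal(size=(n, d)) @ A.T  # each row is one sample

centered = samples - samples.mean(axis=0)

# Scalar "variance": empirical version of E[ ||g - E[g]||^2 ]
scalar_variance = np.mean(np.sum(centered**2, axis=1))

# Covariance matrix: empirical version of E[ (g - E[g]) (g - E[g])^T ],
# using the 1/n normalization (bias=True) so both estimates match exactly.
cov_matrix = np.cov(samples, rowvar=False, bias=True)
trace_of_cov = np.trace(cov_matrix)

print(scalar_variance, trace_of_cov)  # the two numbers agree up to rounding
```

With the default `bias=False`, `np.cov` divides by `n - 1` instead of `n`, so the two numbers would differ by a factor of `n / (n - 1)`.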
