Let $x\in\mathbb R^d$ and $s=\operatorname{softmax}(x)$. Let $y$ be a fixed one-hot vector, and define
$$u = s-y, \qquad v =(\operatorname{diag}(s) - ss^\top)x.$$
I am interested in the inequality $u^\top (u + v)\geq 0$, and specifically in the conditions on $x$ under which it holds (or fails).
One can rearrange the inequality to
$$u^\top Ax \geq -\Vert u\Vert^2,$$
where $A=\operatorname{diag}(s) - ss^\top$. Note that $A$ is the covariance matrix of the categorical distribution with probabilities $s$, so it is symmetric positive semidefinite. However, $x$ is the only independent variable, and both $A$ and $u$ depend on it.
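For concreteness, here is a minimal NumPy sketch (my own code, not part of the derivation) that builds $s$, $u$, $v$, and $A$ for a random $x$, checks that the rearranged form agrees with $u^\top(u+v)$, and confirms numerically that $A$ is positive semidefinite. Placing the one-hot entry of $y$ at index 0 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 5
x = rng.normal(size=d)

# softmax of x (shifted for numerical stability)
s = np.exp(x - x.max())
s /= s.sum()

# fixed one-hot target; index 0 is arbitrary
y = np.zeros(d)
y[0] = 1.0

A = np.diag(s) - np.outer(s, s)   # covariance matrix of the categorical distribution given by s
u = s - y
v = A @ x

lhs = u @ (u + v)                 # original quantity u^T (u + v)
rearranged = u @ A @ x + u @ u    # u^T A x + ||u||^2, should equal lhs

print(lhs, rearranged)
print("A PSD:", np.all(np.linalg.eigvalsh(A) >= -1e-12))
```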
Update:
I ran a simulation varying both the dimensionality $d$ of $x$ and its norm $k=\Vert x\Vert$. Shown below is a plot of $\operatorname{E}\left[u^\top (u + Ax)\right]$. For each $d$, I uniformly sampled $1000$ vectors from the unit hypersphere and scaled them to each norm $k$ on the y-axis.
As can be seen, for a given $d$ there exists a threshold on $k$ such that the inequality holds on average.
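For reproducibility, here is a sketch of the simulation (my own NumPy reconstruction of the setup described above; the one-hot index and the particular $(d,k)$ grid are arbitrary choices). Unit vectors are drawn uniformly from the sphere by normalizing Gaussian samples, scaled to norm $k$, and the quantity is averaged over $1000$ draws.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantity(x, target=0):
    """u^T (u + A x) for a single x, with the one-hot y at index `target`."""
    s = np.exp(x - x.max())
    s /= s.sum()
    y = np.zeros_like(x)
    y[target] = 1.0
    A = np.diag(s) - np.outer(s, s)
    u = s - y
    return u @ (u + A @ x)

def mean_on_sphere(d, k, n=1000):
    """Monte Carlo estimate of E[u^T(u + Ax)] for x uniform on the radius-k sphere."""
    g = rng.normal(size=(n, d))
    xs = k * g / np.linalg.norm(g, axis=1, keepdims=True)
    return np.mean([quantity(x) for x in xs])

for d in (2, 8, 32):
    for k in (0.5, 2.0, 8.0):
        print(d, k, mean_on_sphere(d, k))
```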
My question, then, is as follows: given that $x$ is normally distributed, what is $\operatorname{E}\left[u^\top (u + Ax)\right]$, and how does $d$ enter the expression?
What if $x$ is instead distributed uniformly on the hypersphere of radius $k$?
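For reference, the Gaussian version of the expectation can be estimated the same way; the sketch below assumes isotropic $x\sim\mathcal N(0,\sigma^2 I_d)$, which is only one possible reading of "normally distributed".

```python
import numpy as np

rng = np.random.default_rng(0)

def quantity(x, target=0):
    """u^T (u + A x) for a single x, with the one-hot y at index `target`."""
    s = np.exp(x - x.max())
    s /= s.sum()
    y = np.zeros_like(x)
    y[target] = 1.0
    A = np.diag(s) - np.outer(s, s)
    u = s - y
    return u @ (u + A @ x)

def mean_gaussian(d, sigma, n=10_000):
    """Monte Carlo estimate of E[u^T(u + Ax)] for x ~ N(0, sigma^2 I_d)."""
    xs = sigma * rng.normal(size=(n, d))
    return np.mean([quantity(x) for x in xs])

for d in (2, 8, 32):
    for sigma in (0.5, 2.0, 8.0):
        print(d, sigma, mean_gaussian(d, sigma))
```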