I have a k*n matrix X and a k*k matrix A. For each column of X, I'd like to calculate the scalar

    X[:, i].T.dot(A).dot(X[:, i])

(or, mathematically, Xi' * A * Xi, where Xi is the i-th column of X).
Currently, I have a for loop:

    out = np.empty((n,))
    for i in range(n):
        out[i] = X[:, i].T.dot(A).dot(X[:, i])

but since n is large, I'd like to do this faster if possible (i.e. using some NumPy functions instead of a loop).
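
To make the target concrete: the n scalars above are the diagonal of X.T.dot(A).dot(X), so one possible loop-free form is to compute A.dot(X) once and then sum the elementwise product with X over the rows. The following is only a sketch under that reading; the helper name `quadratic_forms`, the random test data, and the sizes k = 4, n = 1000 are assumptions made for illustration, not part of the original problem.

```python
import numpy as np

def quadratic_forms(X, A):
    """Return out[i] = X[:, i].T.dot(A).dot(X[:, i]) for every column i of X.

    Hypothetical helper name, chosen for this illustration.
    """
    # A.dot(X) has shape (k, n). The einsum multiplies it elementwise with X
    # and sums over the row index, which yields the diagonal of
    # X.T.dot(A).dot(X) without ever forming the full n*n matrix.
    return np.einsum('ij,ij->j', X, A.dot(X))

# Small made-up example to compare against the explicit loop.
rng = np.random.default_rng(0)
k, n = 4, 1000
X = rng.standard_normal((k, n))
A = rng.standard_normal((k, k))

out_loop = np.empty((n,))
for i in range(n):
    out_loop[i] = X[:, i].T.dot(A).dot(X[:, i])

out_vec = quadratic_forms(X, A)
```

The same result can also be written as `(X * A.dot(X)).sum(axis=0)`, which avoids `einsum` at the cost of one temporary k*n array. Either way the cost is one k*k by k*n matrix product plus O(k*n) extra work.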
