Contra vs covariant

The concept of covariant and contravariant quantities is very important in tensor calculus, but as far as I can tell, the concept usually isn’t explained very well in most physics classes.  I’ll try to remedy that here.  I will provide a vague conceptual motivation, then I’ll illustrate the principle with a demonstration, define the two terms mathematically, and give explicit examples of covariant and contravariant vectors.

First, I should address a possible source of confusion caused by sloppy vocabulary. The word “covariant” has two meanings: (1) the mathematical meaning explained here, and (2) the physical idea of invariance under some transformation. When we say that we are formulating a covariant theory, we typically mean that the theory is Lorentz invariant. Similarly, the “covariant derivative” is an invariant derivative. This sloppy vocabulary can be rather confusing because it enables nonsensical-sounding phrases like “contravariant vectors and covariant vectors are both covariant.” One way of clearing this up is to refer to quantities which transform covariantly as differential forms or “n-forms.” For instance, a covariant vector is a rank-one covariant tensor and thus it is also referred to as a one-form. A covariant tensor of rank two is loosely called a two-form, and so on (strictly speaking, the term n-form is reserved for totally antisymmetric covariant tensors, a distinction that makes no difference for one-forms). So, rather than saying “contravariant vectors and covariant vectors are both covariant,” we can say “vectors and one-forms are both Lorentz invariant.”

The motivating concept

I’ll start with an imprecise (and somewhat misleading) illustration to motivate the difference between covariance and contravariance.  Suppose you wish to compute the components of a vector, $$P$$, in a coordinate system that is rotated with respect to your preferred coordinate system.  Conceptually, you have two options:

Option 1: Maintain a single coordinate system, $$S$$, and physically rotate the vector so that it has new components in $$S$$. In this case, the vector changes because its orientation in space has changed; it becomes $$P'$$.

Option 2: Keep the vector fixed in space and transform the coordinate system from $$S$$ to $$S'$$. The components of $$P'$$ in the $$S$$ basis are the same as the components of $$P$$ in the $$S'$$ basis. The two situations are illustrated below.

[Figure: rotating the vector versus rotating the coordinate system]

The point is that the rotation occurs in opposite directions depending on whether the vector or the coordinate system (the basis) has been rotated. In a sense, the two cases are inverses of one another. The above discussion is misleading, since one of the operations actually changed the position vector by physically rotating it.  It was only meant to illustrate a situation in which the components of a vector and the basis vectors of a coordinate system change in opposite ways.
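The opposite-direction relationship can be checked numerically. The following is a minimal sketch in plain Python; the function names, the sample vector, and the 30 degree angle are illustrative choices of mine, not part of the discussion above. It rotates a vector by an angle in a fixed frame, then compares the result with the components of the unrotated vector in a frame whose axes are turned by the opposite angle.

```python
import math

def rotate_vector(p, theta):
    """Active rotation: turn the vector p counterclockwise by theta; frame S stays fixed."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def components_in_rotated_frame(p, theta):
    """Passive rotation: components of the fixed vector p in a frame S'
    whose axes are turned counterclockwise by theta relative to S."""
    c, s = math.cos(theta), math.sin(theta)
    e1 = (c, s)     # first basis vector of S', written in S
    e2 = (-s, c)    # second basis vector of S', written in S
    # The basis is orthonormal, so the components are plain dot products.
    return (p[0] * e1[0] + p[1] * e1[1], p[0] * e2[0] + p[1] * e2[1])

P = (2.0, 1.0)
theta = math.radians(30)

active = rotate_vector(P, theta)                  # components of P' in S
passive = components_in_rotated_frame(P, -theta)  # components of P in S', axes turned the other way

print(active)
print(passive)
```

The two printed pairs should agree up to rounding, while calling `components_in_rotated_frame(P, theta)` with the same sign of the angle gives different numbers.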

Now we address the full concept.  We use a non-orthogonal coordinate system because the difference between covariance and contravariance becomes much more obvious in this case.  We show that a vector P can be represented using two different conventions.  In the covariant convention, we use the coordinates $$(P_1,P_2)$$.  We draw lines from the tip of the vector to the axes such that the lines intersect the axes X1 and X2 perpendicularly. The values of X1 and X2 at the points of intersection give us $$P_1$$ and $$P_2$$.  In the contravariant convention, we use the coordinates $$(P^1,P^2)$$.  We draw lines from the tip of the vector parallel to the coordinate axes.  The points at which these parallel lines intersect X1 and X2 give us $$P^1$$ and $$P^2$$.

The axes X1 and X2 are NOT perpendicular. This is not meant to be drawn in perspective or projection.

Thus, we have two representations for the same vector in which the components of the vector are different because they were found using different (parallel and perpendicular) conventions.  In a Cartesian coordinate system, the two conventions are of course identical.  If we now consider how these two different types of coordinates transform under a rotation, it should be obvious that the coordinates transform differently.  The difference is analogous to the rotations discussed above. This discussion is still somewhat misleading because we have focussed on the coordinate systems. A more fundamental discussion can be found in section 2.5 of the book, Gravitation by Misner, Thorne, and Wheeler.
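The two conventions can be tried out with a short calculation. This is only a sketch under assumptions of my own: the axes X1 and X2 are taken to meet at 60 degrees, the basis vectors along them have unit length (so that a perpendicular projection is just a dot product), and the sample vector and helper names are arbitrary.

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def covariant_components(p, e1, e2):
    """Perpendicular projections onto unit axes: P_i = P . e_i."""
    return (dot(p, e1), dot(p, e2))

def contravariant_components(p, e1, e2):
    """Parallel projections: solve P = P^1 e1 + P^2 e2 (Cramer's rule)."""
    det = e1[0] * e2[1] - e1[1] * e2[0]
    return ((p[0] * e2[1] - p[1] * e2[0]) / det,
            (e1[0] * p[1] - e1[1] * p[0]) / det)

P = (2.0, 1.0)  # sample vector, given in an ordinary Cartesian frame

# Oblique axes X1 and X2 meeting at 60 degrees (unit basis vectors).
angle = math.radians(60)
e1, e2 = (1.0, 0.0), (math.cos(angle), math.sin(angle))
lower = covariant_components(P, e1, e2)       # (P_1, P_2)
upper = contravariant_components(P, e1, e2)   # (P^1, P^2)
print(lower, upper)                           # the two pairs differ

# Perpendicular axes: both conventions give the same numbers.
c1, c2 = (1.0, 0.0), (0.0, 1.0)
print(covariant_components(P, c1, c2), contravariant_components(P, c1, c2))
```

For the oblique axes the two pairs of numbers are different descriptions of the same arrow; for the perpendicular axes they coincide, as stated above.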

A comment on notation is in order here.  By convention, contravariant coordinates are written with upper indices, $$v^i$$, while covariant coordinates are written with lower indices, $$v_i$$.

A Demonstration

In order to demonstrate the concept, suppose there are two coordinate systems, $$S$$ and $$S'$$, with basis vectors given by $$\{\mathbf{e}_1,\mathbf{e}_2,\mathbf{e}_3,\cdots,\mathbf{e}_n\}$$ and $$\{\mathbf{e}'_1,\mathbf{e}'_2,\mathbf{e}'_3,\cdots,\mathbf{e}'_n\}$$ respectively. Note that the basis vectors have been written with the covariant (lower index) notation. Let the transformation between these two sets of basis vectors be given by $$!\displaystyle{\mathbf{e}'_j=A^i_j\mathbf{e}_i}$$ where a sum over the repeated index $$i$$ is implied (the Einstein summation convention, used throughout). A vector $$\mathbf{v}$$ in the unprimed basis is given by $$!\displaystyle{\mathbf{v}=v^i\mathbf{e}_i}$$ where the $$v^i$$ are the scalar components of the vector with respect to the unprimed basis. Notice that the components have been written with the contravariant (upper index) notation. The vector can also be written in the primed basis as $$!\displaystyle{\mathbf{v}=v'^j\mathbf{e}'_j=v'^jA^i_j\mathbf{e}_i}$$ The two expressions for $$\mathbf{v}$$ are equivalent, so $$!\displaystyle{v^i\mathbf{e}_i=v'^jA^i_j\mathbf{e}_i}$$ This can be written as $$!\displaystyle{\left(v^i-v'^jA^i_j\right)\mathbf{e}_i=0}$$ Since the basis vectors are linearly independent, the term in parentheses vanishes separately for each value of the index $$i$$, which implies $$!\displaystyle{v^i=v'^jA^i_j}$$ or $$!\displaystyle{v'^j=v^i\mathcal{A}^j_i}$$ where $$\mathcal{A}^j_i$$ is the inverse of $$A^i_j$$, that is, $$A^i_j\mathcal{A}^j_k=\delta^i_k$$. Therefore, the transformation rule for the components is in a sense the opposite of the transformation rule for the basis vectors: $$!\displaystyle{v'^j=v^i\mathcal{A}^j_i}$$ versus $$!\displaystyle{\mathbf{e}'_j=A^i_j\mathbf{e}_i}.$$ We say that the components transform contravariantly while the basis vectors transform covariantly.  Next we will see how to find the transformation coefficients $$A^i_j$$ and $$\mathcal{A}^j_i$$.
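The argument can be verified with numbers. The sketch below works in two dimensions with an arbitrary invertible matrix standing in for $$A^i_j$$; the matrix entries, the sample components, and the helper names are assumptions made for illustration only. It transforms the basis with $$A$$, transforms the components with the inverse of $$A$$, and checks that both descriptions assemble into the same geometric vector.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix stored as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[ m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det,  m[0][0] / det]]

def assemble(components, basis):
    """Return the geometric vector sum_i components[i] * basis[i]."""
    return tuple(sum(components[i] * basis[i][k] for i in range(2)) for k in range(2))

# Unprimed basis vectors e_1, e_2, written in some fixed Cartesian frame.
e = [(1.0, 0.0), (0.0, 1.0)]

# A[i][j] plays the role of A^i_j in e'_j = A^i_j e_i (arbitrary invertible choice).
A = [[2.0, 1.0],
     [0.0, 3.0]]
A_inv = inverse_2x2(A)   # A_inv[j][i] plays the role of the inverse coefficients

# Basis vectors transform with A (covariantly).
e_prime = [tuple(sum(A[i][j] * e[i][k] for i in range(2)) for k in range(2))
           for j in range(2)]

# Components transform with the inverse of A (contravariantly).
v = [4.0, 6.0]                                                           # v^i
v_prime = [sum(v[i] * A_inv[j][i] for i in range(2)) for j in range(2)]  # v'^j

print(assemble(v, e))              # the vector built from unprimed data
print(assemble(v_prime, e_prime))  # the same vector built from primed data
```

Both printed vectors should agree up to rounding, which is the statement that the changes in the components and in the basis cancel each other.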

Definitions

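Suppose that we know the explicit form of the change of coordinates from system $$S'$$ to $$S$$, $$!\displaystyle x^i = x^i(x'^1,x'^2,x'^3,\cdots,x'^n)$$ and the inverse $$!\displaystyle x'^i = x'^i(x^1,x^2,x^3,\cdots,x^n)$$ A contravariant vector is defined by the transformation property: $$!\displaystyle V'^j = V^i\frac{\partial x'^j}{\partial x^i}$$ A covariant vector is defined by $$!\displaystyle V'_j = V_i\frac{\partial x^i}{\partial x'^j}$$ Comparing these rules with the demonstration in the previous section identifies the transformation coefficients as $$!\displaystyle \mathcal{A}^j_i = \frac{\partial x'^j}{\partial x^i},\qquad A^i_j = \frac{\partial x^i}{\partial x'^j}.$$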
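As a concrete illustration of these definitions (the choice of coordinates is mine, made only for the sake of an example), take $$S$$ to be Cartesian coordinates $$(x^1,x^2)=(x,y)$$ and $$S'$$ to be polar coordinates $$(x'^1,x'^2)=(r,\theta)$$, so that $$!\displaystyle x=r\cos\theta,\qquad y=r\sin\theta,\qquad r=\sqrt{x^2+y^2},\qquad \tan\theta=\frac{y}{x}.$$ The partial derivatives needed for the contravariant rule are $$!\displaystyle \frac{\partial r}{\partial x}=\cos\theta,\qquad \frac{\partial r}{\partial y}=\sin\theta,\qquad \frac{\partial\theta}{\partial x}=-\frac{\sin\theta}{r},\qquad \frac{\partial\theta}{\partial y}=\frac{\cos\theta}{r},$$ which gives $$!\displaystyle V'^r=\cos\theta\,V^x+\sin\theta\,V^y,\qquad V'^\theta=\frac{1}{r}\left(-\sin\theta\,V^x+\cos\theta\,V^y\right).$$ The covariant rule uses the derivatives of the inverse map instead, $$!\displaystyle \frac{\partial x}{\partial r}=\cos\theta,\qquad \frac{\partial y}{\partial r}=\sin\theta,\qquad \frac{\partial x}{\partial\theta}=-r\sin\theta,\qquad \frac{\partial y}{\partial\theta}=r\cos\theta,$$ which gives $$!\displaystyle V'_r=\cos\theta\,V_x+\sin\theta\,V_y,\qquad V'_\theta=r\left(-\sin\theta\,V_x+\cos\theta\,V_y\right).$$ The radial components follow the same pattern in both cases, but the angular components carry a factor of $$1/r$$ in the contravariant case and a factor of $$r$$ in the covariant case, so the two rules really are different.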

Examples

Example 1: Contravariance

The tangent vector to a curve is a contravariant vector.  Let the curve be given by the parameterization $$!\displaystyle x^i = x^i(t)$$ The tangent vector to the curve is $$!\displaystyle T^i = \frac{dx^i}{dt}$$ Under a change of coordinates, the curve is $$!\displaystyle x'^i=x'^i (t) = x'^i(x^1(t),x^2(t),x^3(t),\cdots,x^n(t))$$ and the tangent vector is $$!\displaystyle T'^i = \frac{dx'^i}{dt}$$ By the chain rule, $$!\displaystyle\frac{dx'^i}{dt}=\frac{\partial x'^i}{\partial x^j}\frac{dx^j}{dt}$$ Therefore, $$!\displaystyle T'^i = T^j\frac{\partial x'^i}{\partial x^j}$$ which shows that the tangent vector transforms contravariantly and thus it is a contravariant vector.
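A numerical spot check of this result, as a sketch under assumptions of my own: the curve is the parabola $$y=x^2$$, the unprimed coordinates are Cartesian, the primed coordinates are polar, and derivatives with respect to $$t$$ are approximated by central finite differences. None of these choices come from the derivation itself.

```python
import math

def curve(t):
    """A sample curve in Cartesian coordinates: the parabola y = x^2."""
    return (t, t * t)

def to_polar(x, y):
    return (math.hypot(x, y), math.atan2(y, x))

def jacobian_polar_wrt_cartesian(x, y):
    """Rows i = (r, theta), columns j = (x, y): entries dx'^i / dx^j."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]

def derivative(f, t, h=1e-6):
    """Central finite difference of a tuple-valued function of t."""
    a, b = f(t - h), f(t + h)
    return tuple((bi - ai) / (2 * h) for ai, bi in zip(a, b))

t0 = 1.5
x0, y0 = curve(t0)

# T^j = dx^j/dt in Cartesian coordinates.
T = derivative(curve, t0)

# T'^i = dx'^i/dt, computed directly from the curve written in polar coordinates.
T_prime_direct = derivative(lambda t: to_polar(*curve(t)), t0)

# T'^i = T^j dx'^i/dx^j, computed from the contravariant transformation rule.
J = jacobian_polar_wrt_cartesian(x0, y0)
T_prime_rule = tuple(sum(J[i][j] * T[j] for j in range(2)) for i in range(2))

print(T_prime_direct)
print(T_prime_rule)
```

The two results should match to within the accuracy of the finite differences.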

Example 2: Covariance

The gradient of a scalar field is a covariant vector field. Let $$\Phi(\mathbf{x})$$ be a scalar field.  Let the gradient of $$\Phi(\mathbf{x})$$ be the vector field $$\mathbf{G}(\mathbf{x})$$ $$!\displaystyle \mathbf{G} = \nabla\Phi = \left(\frac{\partial\Phi}{\partial x^1}, \frac{\partial\Phi}{\partial x^2},\frac{\partial\Phi}{\partial x^3},\cdots,\frac{\partial\Phi}{\partial x^n}\right)$$ thus $$!\displaystyle G_i = \frac{\partial \Phi}{\partial x^i}$$ In the primed coordinate system, the gradient is $$!\displaystyle G'_i = \frac{\partial \Phi'}{\partial x'^i}$$ where $$!\displaystyle \Phi'=\Phi'(\mathbf{x}')=\Phi(\mathbf{x}(\mathbf{x}'))$$ By the chain rule, $$!\displaystyle \frac{\partial\Phi'}{\partial x'^i}=\frac{\partial\Phi}{\partial x^j}\frac{\partial x^j}{\partial x'^i}$$ Thus $$!\displaystyle G'_i=G_j\frac{\partial x^j}{\partial x'^i}$$ which shows that the gradient is a covariant vector.
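The same kind of spot check works for the gradient. In this sketch the scalar field $$\Phi = x^2 y$$, the use of Cartesian (unprimed) and polar (primed) coordinates, the evaluation point, and the finite-difference step are all illustrative assumptions.

```python
import math

def phi(x, y):
    """Sample scalar field in Cartesian coordinates."""
    return x * x * y

def grad_phi(x, y):
    """G_j = dPhi/dx^j in Cartesian coordinates (worked out by hand)."""
    return (2 * x * y, x * x)

def to_cartesian(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))

def phi_polar(r, theta):
    """Phi'(x') = Phi(x(x')): the same field expressed in polar coordinates."""
    return phi(*to_cartesian(r, theta))

def jacobian_cartesian_wrt_polar(r, theta):
    """Rows j = (x, y), columns i = (r, theta): entries dx^j / dx'^i."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -r * s],
            [s,  r * c]]

r0, theta0 = 2.0, 0.7
h = 1e-6

# G'_i = dPhi'/dx'^i, computed directly by central finite differences.
G_prime_direct = (
    (phi_polar(r0 + h, theta0) - phi_polar(r0 - h, theta0)) / (2 * h),
    (phi_polar(r0, theta0 + h) - phi_polar(r0, theta0 - h)) / (2 * h),
)

# G'_i = G_j dx^j/dx'^i, computed from the covariant transformation rule.
G = grad_phi(*to_cartesian(r0, theta0))
J = jacobian_cartesian_wrt_polar(r0, theta0)
G_prime_rule = tuple(sum(G[j] * J[j][i] for j in range(2)) for i in range(2))

print(G_prime_direct)
print(G_prime_rule)
```

Note that this check uses the derivatives of the unprimed coordinates with respect to the primed ones, whereas the tangent vector transforms with the derivatives of the primed coordinates with respect to the unprimed ones.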

Concluding remarks and generalizations

In general, most things that we think of as vectors, such as a position or displacement, are contravariant vectors. These are represented as column vectors in linear algebra. Covariant vectors, or one-forms, are represented by row vectors in linear algebra. It may also be helpful to realize that contravariant vectors are associated with kets $$v^i\sim| v\rangle$$ in the language of quantum mechanics while covariant vectors are associated with bras $$v_i\sim \langle v |.$$

The inner product of a vector with itself forms a scalar: $$!\displaystyle \mathbf{v}\cdot\mathbf{v}= v^i v_i= \langle v | v \rangle = |v|^2.$$ In order to change a contravariant vector into a one-form or vice versa (so that we can compute the inner product), we use the metric tensor as a lowering or raising operator: $$!\displaystyle v^i = g^{ij} v_j,\qquad v_i = g_{ij} v^j.$$ The definition of a contravariant tensor of rank $$n$$ is a simple generalization.  For instance, a rank-2 contravariant tensor has the transformation property $$!\displaystyle T'^{ij} = T^{k\ell}\frac{\partial x'^i}{\partial x^k}\frac{\partial x'^j}{\partial x^\ell}$$ and a two-form transforms as $$!\displaystyle T'_{ij} = T_{k\ell}\frac{\partial x^k}{\partial x'^i}\frac{\partial x^\ell }{\partial x'^j}.$$ Finally, a mixed tensor transforms as $$!\displaystyle T'^i_j = T^k_\ell\frac{\partial x'^i}{\partial x^k}\frac{\partial x^\ell }{\partial x'^j}.$$
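The raising and lowering operations can be illustrated with a small calculation. This sketch assumes a two-dimensional oblique basis with unit vectors meeting at 60 degrees and takes the metric to be $$g_{ij}=\mathbf{e}_i\cdot\mathbf{e}_j$$; the basis, the sample components, and the variable names are my own choices.

```python
import math

# Oblique unit basis vectors meeting at 60 degrees (illustrative choice).
e = [(1.0, 0.0), (math.cos(math.radians(60)), math.sin(math.radians(60)))]

# Metric g_ij = e_i . e_j and its inverse g^ij.
g = [[sum(e[i][k] * e[j][k] for k in range(2)) for j in range(2)] for i in range(2)]
det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
g_inv = [[ g[1][1] / det, -g[0][1] / det],
         [-g[1][0] / det,  g[0][0] / det]]

# Contravariant components v^i.
v_up = [1.0, 2.0]

# Lower the index: v_i = g_ij v^j.
v_down = [sum(g[i][j] * v_up[j] for j in range(2)) for i in range(2)]

# Raise it again: v^i = g^ij v_j, which should recover the starting components.
v_up_again = [sum(g_inv[i][j] * v_down[j] for j in range(2)) for i in range(2)]

# Scalar v^i v_i, compared with the squared length of the geometric vector.
norm_sq = sum(v_up[i] * v_down[i] for i in range(2))
vec = [sum(v_up[i] * e[i][k] for i in range(2)) for k in range(2)]
length_sq = vec[0] ** 2 + vec[1] ** 2

print(v_down, v_up_again)
print(norm_sq, length_sq)
```

In an orthonormal basis the metric is the identity matrix, so raising or lowering an index changes nothing; the oblique basis is used here precisely so that the upper and lower components differ.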