Linear connections

Also called affine connections, covariant derivatives, or covariant derivative operators.

General framework

An affine connection on a manifold M is a vector bundle connection on the tangent bundle of M. Equivalently, it is a principal connection on the frame bundle (which is a principal bundle) of M. This equivalence is due to this.

Let $P$ denote the frame bundle and $\omega$ the connection 1-form of this connection. What is a vector $v \in T_pP$? The point $p \in P$ represents a point $x$ in the manifold together with a choice of a basis for $T_xM$, and $v$ represents the beginning of a curve $\alpha$ leaving $x$ together with a choice of a basis for every $T_{\alpha(t)}M$. The value of $\omega_p(v)$ tells us how the basis is changing (if at all) when we move along $\alpha$. This is "not natural" to $M$, and must be introduced by hand. This "change" is infinitesimal, since it corresponds to an infinitesimal step along $\alpha$, so it is measured by an element of $\mathfrak{gl}(n)$. See Cartan geometry#Generalization of manifolds with affine connections for more info about "the big picture".

Motivational introduction

Consider first the situation in $\mathbb{R}^n$. Let $X, Y: \mathbb{R}^n \to \mathbb{R}^n$ be vector fields. To define the directional derivative of the vector field $X$ in the direction of the vector field $Y$ at a point $p \in \mathbb{R}^n$, we can mimic the usual definition of the directional derivative:

$$(\nabla_Y X)(p) := \lim_{t \to 0} \frac{X(p + tY(p)) - X(p)}{t}.$$

The result $\nabla_Y X$ is a vector field on $\mathbb{R}^n$. You can check that the operation defined as above satisfies the following two properties:

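  1. $\nabla_{fY}(X) = f\,\nabla_Y X$.
  2. $\nabla_Y(fX) = (Yf)\,X + f\,\nabla_Y X$.

Here, $X, Y: \mathbb{R}^n \to \mathbb{R}^n$ are vector fields and $f: \mathbb{R}^n \to \mathbb{R}$ is a scalar function. The function $Yf$ (at a point $p$) is the directional derivative of $f$ at $p$ in the direction $Y(p)$.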
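
The two properties can be checked numerically. The following is a minimal sketch, assuming $n = 2$ and using central finite differences in place of the limit; the fields `X`, `Y`, the function `f` and the point `p` are arbitrary illustrative choices, not taken from the text.

```python
# Finite-difference check of properties 1 and 2 in R^2.
# X, Y, f and p below are hypothetical examples chosen for illustration.

def directional_derivative(X, Y, p, h=1e-6):
    """Approximate (nabla_Y X)(p) = lim [X(p + t Y(p)) - X(p)] / t by a central difference."""
    y = Y(p)
    forward = X([pi + h * yi for pi, yi in zip(p, y)])
    backward = X([pi - h * yi for pi, yi in zip(p, y)])
    return [(a - b) / (2 * h) for a, b in zip(forward, backward)]

def scalar_derivative(f, Y, p, h=1e-6):
    """Approximate (Y f)(p), the directional derivative of f at p along Y(p)."""
    y = Y(p)
    forward = f([pi + h * yi for pi, yi in zip(p, y)])
    backward = f([pi - h * yi for pi, yi in zip(p, y)])
    return (forward - backward) / (2 * h)

X = lambda q: [q[0] * q[1], q[0] ** 2]   # vector field X(x, y) = (xy, x^2)
Y = lambda q: [q[1], 1.0]                # vector field Y(x, y) = (y, 1)
f = lambda q: q[0] + 2.0 * q[1]          # scalar function f(x, y) = x + 2y
p = [1.0, 2.0]

# Property 1: nabla_{fY} X = f nabla_Y X
fY = lambda q: [f(q) * c for c in Y(q)]
lhs1 = directional_derivative(X, fY, p)
rhs1 = [f(p) * c for c in directional_derivative(X, Y, p)]

# Property 2: nabla_Y (fX) = (Yf) X + f nabla_Y X
fX = lambda q: [f(q) * c for c in X(q)]
lhs2 = directional_derivative(fX, Y, p)
Yf = scalar_derivative(f, Y, p)
rhs2 = [Yf * xc + f(p) * dc
        for xc, dc in zip(X(p), directional_derivative(X, Y, p))]

print(lhs1, rhs1)
print(lhs2, rhs2)
```

Both printed pairs should agree up to rounding error, which is all the sketch is meant to show.
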

Now let us try and mimic the above construction on a general manifold. Given vector fields $X, Y \in \mathfrak{X}(M)$, we try to use the same formula and define

$$(\nabla_Y X)(p) := \lim_{t \to 0} \frac{X(p + tY(p)) - X(p)}{t}.$$

However, we see that there are two problems. First, the expression $X(p + tY(p))$ is not defined because we don't have a way of adding a point $p \in M$ to a tangent vector $tY(p) \in T_pM$. This is not so bad because we can actually replace the expression $p + tY(p)$ with any curve "which goes in the direction $Y(p)$" such as the flow $\varphi_t^Y(p)$. The more serious problem is that we need to subtract the tangent vector $X(p) \in T_pM$ from the tangent vector $X(\varphi_t^Y(p)) \in T_{\varphi_t^Y(p)}M$, and those are two tangent vectors that belong to different vector spaces.

In general, without any extra data, we have no way of identifying tangent spaces at different points of M.

To summarize, we can differentiate vector fields along vector fields without any problem on $\mathbb{R}^n$, but we encounter problems when we try to do it on a general manifold. But $\mathbb{R}^n$ is also a manifold, so what makes it special? We need extra data.

The definition of an affine connection is meant to supply the manifold $M$ "externally" with an operation $\nabla: \mathfrak{X}(M) \times \mathfrak{X}(M) \to \mathfrak{X}(M)$ which satisfies properties (1) and (2) and so allows us to differentiate vector fields along vector fields. That is, instead of defining the directional derivative of a vector field along a vector field, we require that somebody hands us a mechanism which satisfies the properties that the familiar derivative satisfied on $\mathbb{R}^n$, and then we think of it as a directional derivative.

How to get one?

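This extra data could be provided, for example, by an embedding of $M$ as a surface in Euclidean space. In that case the ambient directional derivative $\nabla_v w$ is projected back onto the surface:

$$D_v w = P[\nabla_v w] = \nabla_v w - (n \cdot \nabla_v w)\, n$$

where $P$ is the projection onto the surface along its unit normal $n$. For more see Gauss' Equation, Theorem 1.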
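
As a rough illustration of the projection formula, the sketch below assumes the surface is the unit sphere in $\mathbb{R}^3$ (so $n = p$ at each point $p$) and approximates the ambient derivative by a central finite difference. The field `w` (rotation about the $z$-axis) and the point `p` are hypothetical choices made only for this example.

```python
# Projected derivative D_v w = nabla_v w - (n . nabla_v w) n on the unit sphere.
# The sphere, the field w and the point p are assumptions for illustration.

def ambient_derivative(w, p, v, h=1e-6):
    """Central-difference approximation of the Euclidean derivative of w at p along v."""
    forward = w([pi + h * vi for pi, vi in zip(p, v)])
    backward = w([pi - h * vi for pi, vi in zip(p, v)])
    return [(a - b) / (2 * h) for a, b in zip(forward, backward)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def projected_derivative(w, p, v, n):
    """Remove the normal component of the ambient derivative."""
    dw = ambient_derivative(w, p, v)
    normal_part = dot(n, dw)
    return [d - normal_part * ni for d, ni in zip(dw, n)]

w = lambda q: [-q[1], q[0], 0.0]   # rotation field about the z-axis, tangent to the sphere
p = [1.0, 0.0, 0.0]                # a point on the equator
v = w(p)                           # tangent vector (0, 1, 0) at p
n = p                              # unit normal of the unit sphere at p

ambient = ambient_derivative(w, p, v)
Dvw = projected_derivative(w, p, v, n)
print(ambient)  # points along -n: purely normal
print(Dvw)      # approximately the zero vector
```

Here the ambient derivative is entirely normal to the sphere, so its projection vanishes; this reflects the fact that the equator, traversed at unit speed, has no tangential acceleration.
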

Definition as operator

(See relationship parallel transport, covariant derivatives and metrics).

For vector fields

A covariant derivative operator, affine connection or linear connection is an operator $\nabla: \mathfrak{X}(M) \times \mathfrak{X}(M) \to \mathfrak{X}(M)$, $(X, Y) \mapsto \nabla_X Y$, where $\mathfrak{X}(M)$ is the set of all vector fields on $M$, satisfying:

  1. It commutes with addition.
  2. It satisfies the Leibniz rule.
  3. It is tensorial with respect to the first $\mathfrak{X}(M)$ argument.
  4. It commutes with index contraction.
  5. Applied to scalar fields, it coincides with the directional derivative with respect to the vector.
  6. On scalar fields, derivatives commute: $\nabla_a \nabla_b \alpha = \nabla_b \nabla_a \alpha$.

Obviously this raises quite a lot of questions:

  1. Does such a mechanism always exist? (Yes).
  2. Is it unique? (No).
  3. Is there a natural choice of such a differentiation mechanism? (Yes, under certain circumstances).
  4. Can we use this mechanism to recover the ability to identify tangent vectors at different points that was necessary to define the regular directional derivative in $\mathbb{R}^n$? (Yes, at least along curves. This leads to the notion of parallel transport).

In this context, the covariant derivative is the result

$$(\nabla_X Y)(p)$$

This result depends on the values of the vector field $Y$ in a neighbourhood of $p$, but only on the specific vector $X(p)$, since it is $C^\infty(M)$-linear in this argument (property 3 above). The Lie derivative of vector fields $\mathcal{L}_X Y$, on the contrary, depends on the values of $X$ in a neighbourhood of $p$. See the relation of Lie derivative, covariant derivative and torsion.

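Proof
The value of $\nabla_X Z$ at $p$ depends only on the values of $X$ and $Z$ in a neighbourhood $U$ of $p$, so we can write $X = X^i E_i$ in a local frame $\{E_i\}$ on $U$. Then, by property 3:

$$\nabla_X Z = X^i\, \nabla_{E_i} Z$$

So evaluating at $p$:

$$\nabla_X Z|_p = X^i(p)\, \nabla_{E_i} Z|_p$$

which depends only on the value of $X$ at $p$. In particular, if $X^i(p) = 0$ for all $i$, then $\nabla_X Z|_p = 0$.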

If we perform this construction not on $TM$ but on any vector bundle $E \to M$, we obtain the notion of vector bundle connection, which is a particular case of a connection on a fiber bundle.

To specify a covariant derivative operator we only need to fix a coordinate chart $\{x^i\}$ of $M$ and provide the functions $\Gamma_{ij}^k$ defined by

$$\nabla_{\partial_{x^j}} \partial_{x^i} = \Gamma_{ij}^k\, \partial_{x^k}$$

which are called the Christoffel symbols when the connection comes from a (pseudo)-Riemannian metric.

Extension to tensor fields

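Once we have a covariant derivative on the tangent vector fields of a manifold, it can be extended to any tensor field with the same Christoffel symbols. This video explains how to compute the covariant derivative of 1-forms, and this part of the same video applies it to any tensor.
For example, given the vector $v = v^i e_i$, the 1-form $\alpha = \alpha_i \epsilon^i$ and the (0,2)-tensor $g = g_{ij}\, \epsilon^i \otimes \epsilon^j$:

$$\nabla_i(v) = \left(\frac{\partial v^k}{\partial u^i} + v^j \Gamma_{ij}^k\right) e_k$$

$$\nabla_i(\alpha) = \left(\frac{\partial \alpha_k}{\partial u^i} - \alpha_j \Gamma_{ik}^j\right) \epsilon^k$$

$$\nabla_i(g) = \left[\frac{\partial g_{rs}}{\partial u^i} - g_{ks} \Gamma_{ir}^k - g_{rk} \Gamma_{is}^k\right] (\epsilon^r \otimes \epsilon^s)$$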

In components, and with the notation of F. Schuller in GR:

$$(\nabla_X Y)^i = X(Y^i) + \Gamma^{i}_{(x)\,jk}\, X^k Y^j, \qquad (\nabla_X \omega)_i = X(\omega_i) - \Gamma^{j}_{(x)\,ik}\, X^k \omega_j.$$

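and for a (1,2)-tensor

$$(\nabla_X T)^i{}_{jk} = X(T^i{}_{jk}) + \Gamma^{i}_{(x)\,ms}\, X^s\, T^m{}_{jk} - \Gamma^{m}_{(x)\,js}\, X^s\, T^i{}_{mk} - \Gamma^{m}_{(x)\,ks}\, X^s\, T^i{}_{jm}.$$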
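
The component formula for $(\nabla_X Y)^i$ can be turned into a small numerical routine. This is only a sketch: the function name `covariant_derivative`, the storage convention `gamma(p)[i][j][k]` for $\Gamma^i_{(x)\,jk}$ (with $k$ the direction index), and the use of central finite differences are all assumptions of this example. The sample connection has a single non-zero symbol equal to 2, with upper index 2 and both lower indices 1, in a two-dimensional chart.

```python
# Sketch of (nabla_X Y)^i = X(Y^i) + Gamma^i_{jk} X^k Y^j at a point p.
# Names and the index layout gamma(p)[i][j][k] are illustrative assumptions.

def covariant_derivative(gamma, X, Y, p, h=1e-6):
    n = len(p)
    Xp, Yp, G = X(p), Y(p), gamma(p)
    result = []
    for i in range(n):
        # X(Y^i) = X^k dY^i/dx^k, with partial derivatives by central differences
        directional = 0.0
        for k in range(n):
            plus = list(p)
            minus = list(p)
            plus[k] += h
            minus[k] -= h
            directional += Xp[k] * (Y(plus)[i] - Y(minus)[i]) / (2 * h)
        # Correction term Gamma^i_{jk} X^k Y^j
        correction = sum(G[i][j][k] * Xp[k] * Yp[j]
                         for j in range(n) for k in range(n))
        result.append(directional + correction)
    return result

def gamma(p):
    """Sample connection: only the symbol with upper index 2, lower indices 1,1 is non-zero."""
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[1][0][0] = 2.0
    return G

first_basis_field = lambda q: [1.0, 0.0]   # constant components (1, 0)
position_field = lambda q: [q[0], q[1]]    # components equal to the coordinates

p = [3.0, 5.0]
r1 = covariant_derivative(gamma, first_basis_field, first_basis_field, p)
r2 = covariant_derivative(gamma, first_basis_field, position_field, p)
print(r1)  # close to [0, 2]: constant components, yet a non-zero derivative
print(r2)  # close to [1, 6]
```

The first result shows that a field with constant components need not have vanishing covariant derivative once the Christoffel symbols are non-zero.
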

Anyway, we could have defined directly a covariant derivative X of a tensor field T by the following properties:

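  1. For any smooth function $f \in C^\infty(M)$, $\nabla_X f = X(f)$.
  2. For any tensor fields $T$ and $S$, $\nabla_X(T + S) = \nabla_X T + \nabla_X S$.
  3. For any tensor field $T$, 1-form $\omega$, and vector field $Y$, $\nabla_X(T(\omega, Y)) = (\nabla_X T)(\omega, Y) + T(\nabla_X \omega, Y) + T(\omega, \nabla_X Y)$.
  4. For any smooth function $f \in C^\infty(M)$ and vector fields $X$ and $Z$, $\nabla_{fX + Z} T = f\,\nabla_X T + \nabla_Z T$.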
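
As a short worked derivation, properties 1 and 3 already fix the action on 1-forms once the action on vector fields is known. The sketch below applies property 3 to a 1-form $\omega$ evaluated on a vector field $Y$ (the analogue of the stated rule with a single slot), and assumes the convention $\nabla_{\partial_k} \partial_i = \Gamma^{j}_{ik}\, \partial_j$, in which the last lower index is the direction of differentiation.

```latex
\begin{align*}
X(\omega(Y)) &= \nabla_X(\omega(Y))
              = (\nabla_X \omega)(Y) + \omega(\nabla_X Y)
  && \text{(properties 1 and 3)} \\
(\nabla_X \omega)(Y) &= X(\omega(Y)) - \omega(\nabla_X Y) \\
(\nabla_X \omega)_i &= (\nabla_X \omega)(\partial_i)
              = X(\omega_i) - \omega\!\left(X^k\, \nabla_{\partial_k} \partial_i\right)
  && \text{(take } Y = \partial_i \text{, property 4)} \\
             &= X(\omega_i) - \Gamma^{j}_{ik}\, X^k\, \omega_j
\end{align*}
```

The minus sign in front of the Christoffel term for covariant indices is therefore a consequence of the Leibniz rule, not an independent choice.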

Worked example

Another approach to understanding the problem is as follows:

In $\mathbb{R}^2$ with Cartesian coordinates $(x, y)$, we have a basis $\{\partial_x, \partial_y\}$ for the different tangent spaces. Moreover, the tangent vector $\partial_x$ at point $P = (1, 2)$ and the vector $\partial_x$ at point $Q = (0, 1)$ are "the same," in the sense that we can translate it from $P$ to $Q$. This is because we are assuming the notion of traditional parallelism in $\mathbb{R}^2$.

Now, let's consider other coordinates, for example, (t,a) given by the transformation:

$$\phi: (x, y) \mapsto (t, a) = (x,\; y - x^2)$$

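The vector $\partial_t$ at point $P = (1, 1)$ (note that this $P$ is the same as in the previous paragraph but expressed in coordinates $(t, a)$), and the vector $\partial_t$ at point $Q = (0, 1)$ are now not the same, from the perspective of traditional parallelism in $\mathbb{R}^2$. Let's see this. The inverse map is $\phi^{-1}(t, a) = (t,\; a + t^2)$, so:

$$d\phi^{-1}(\partial_t)\big|_P = \begin{pmatrix} 1 & 0 \\ 2t & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix}\bigg|_P = \begin{pmatrix} 1 \\ 2t \end{pmatrix}\bigg|_P = \partial_x + 2\,\partial_y$$

$$d\phi^{-1}(\partial_t)\big|_Q = \begin{pmatrix} 1 & 0 \\ 2t & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix}\bigg|_Q = \begin{pmatrix} 1 \\ 2t \end{pmatrix}\bigg|_Q = \partial_x$$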

Therefore, $\partial_t$ is not constant, even though its components are. This implies that if we want to differentiate a vector field in this new coordinate system, we cannot simply differentiate each component. For example, the component-wise derivative of $\partial_t$ would be 0, but we have just seen that it is not constant.

The way to fix this is to add correction terms to the traditional component-wise derivative that reflect the deformation of the axes themselves. Let's see this in the specific case of coordinates $(t, a)$ (note that for coordinates $(x, y)$, it would be sufficient to differentiate each component because we assume that the basis vectors are constant).

Consider the vector field $X = F_X\, \partial_t + G_X\, \partial_a$ and the field $Y = F_Y\, \partial_t + G_Y\, \partial_a$. A consistent way to differentiate $Y$ with respect to $X$ would be an operation that should satisfy:

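$$\begin{aligned}
\nabla_X Y &= F_X \nabla_{\partial_t}(Y) + G_X \nabla_{\partial_a}(Y) \\
&= F_X \nabla_{\partial_t}(F_Y\, \partial_t + G_Y\, \partial_a) + G_X \nabla_{\partial_a}(F_Y\, \partial_t + G_Y\, \partial_a)
\end{aligned}$$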

Since the Leibniz rule should also hold for vectors, the above expression becomes:

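$$\begin{aligned}
& F_X\, \partial_t(F_Y)\, \partial_t + F_X F_Y\, \nabla_{\partial_t}(\partial_t) \\
{}+{} & F_X\, \partial_t(G_Y)\, \partial_a + F_X G_Y\, \nabla_{\partial_t}(\partial_a) \\
{}+{} & G_X\, \partial_a(F_Y)\, \partial_t + G_X F_Y\, \nabla_{\partial_a}(\partial_t) \\
{}+{} & G_X\, \partial_a(G_Y)\, \partial_a + G_X G_Y\, \nabla_{\partial_a}(\partial_a) \\
={} & X(F_Y)\, \partial_t + X(G_Y)\, \partial_a + \text{correction terms}
\end{aligned}$$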

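These "correction terms" will be fully determined when we calculate $\nabla_{\partial_t}(\partial_t)$, $\nabla_{\partial_t}(\partial_a)$, $\nabla_{\partial_a}(\partial_t)$, and $\nabla_{\partial_a}(\partial_a)$. Let's translate everything into Cartesian coordinates, where we can differentiate because parallelism exists.

$$d\phi^{-1}(\partial_t) = \begin{pmatrix} 1 & 0 \\ 2t & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \partial_x + 2x\, \partial_y$$

$$d\phi^{-1}(\partial_a) = \begin{pmatrix} 1 & 0 \\ 2t & 1 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \partial_y$$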

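Therefore, differentiating the Cartesian components of these fields along $\partial_t = \partial_x + 2x\, \partial_y$ and along $\partial_a = \partial_y$:

$$\nabla_{\partial_t}(\partial_t) = 2\, \partial_y = 2\, \partial_a, \qquad \nabla_{\partial_t}(\partial_a) = \nabla_{\partial_a}(\partial_t) = \nabla_{\partial_a}(\partial_a) = 0$$

Thus, the "differentiation" becomes:

$$\nabla_X Y = X(F_Y)\, \partial_t + X(G_Y)\, \partial_a + 2 F_X F_Y\, \partial_a$$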

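The components of $\nabla_{\partial_t}(\partial_t)$, $\nabla_{\partial_t}(\partial_a)$, $\nabla_{\partial_a}(\partial_t)$, and $\nabla_{\partial_a}(\partial_a)$ in the basis $\{\partial_t, \partial_a\}$ are called Christoffel symbols and depend on the notion of parallelism and the chosen coordinates. They are symbolized by $\Gamma_{ij}^k$, and in our case, $\Gamma_{11}^2 = 2$, and all others are 0.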
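
The Christoffel symbols of this example can also be recovered numerically. The sketch below assumes the flat connection of $\mathbb{R}^2$ (Cartesian basis vectors are parallel), for which the symbols in coordinates $u = (t, a)$ are $\Gamma_{ij}^k = \frac{\partial u^k}{\partial x^m} \frac{\partial^2 x^m}{\partial u^i\, \partial u^j}$, and approximates the derivatives with central differences. The function names and the storage layout `G[k][i][j]` are choices made for this illustration.

```python
# Christoffel symbols of the flat plane in coordinates (t, a), via finite differences.
# Layout assumption: G[k][i][j] stores Gamma_{ij}^k, with indices 0 = t and 1 = a.

def to_cartesian(u):
    """Inverse of phi: (t, a) -> (x, y) = (t, a + t^2)."""
    t, a = u
    return [t, a + t ** 2]

def jacobian(u, h=1e-4):
    """J[m][j] = d x^m / d u^j by central differences."""
    J = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        plus, minus = list(u), list(u)
        plus[j] += h
        minus[j] -= h
        xp, xm = to_cartesian(plus), to_cartesian(minus)
        for m in range(2):
            J[m][j] = (xp[m] - xm[m]) / (2 * h)
    return J

def inverse_2x2(J):
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return [[J[1][1] / det, -J[0][1] / det],
            [-J[1][0] / det, J[0][0] / det]]

def christoffel(u, h=1e-4):
    Jinv = inverse_2x2(jacobian(u, h))   # Jinv[k][m] = d u^k / d x^m
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    for i in range(2):
        plus, minus = list(u), list(u)
        plus[i] += h
        minus[i] -= h
        Jp, Jm = jacobian(plus, h), jacobian(minus, h)
        for j in range(2):
            for m in range(2):
                second = (Jp[m][j] - Jm[m][j]) / (2 * h)   # d^2 x^m / du^i du^j
                for k in range(2):
                    G[k][i][j] += Jinv[k][m] * second
    return G

G = christoffel([1.0, 1.0])   # evaluated at the point with (t, a) = (1, 1)
print(G[1][0][0])             # close to 2
```

All other entries of `G` come out numerically zero, and the value does not depend on the evaluation point because $x$ and $y$ are at most quadratic in $t$.
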

Abstract Index Notation

This whole approach can be given using Penrose abstract index notation. In this notation, we would denote:

$$\xi^c \nabla_c \Psi^a = \nabla_\xi \Psi$$

In Penrose notation, it becomes:
[Figure: derivadacovariante.png]
And, for example, the Leibniz rule (and more):
[Figure: notacionpenrose3 1.png]
Ultimately, a connection will determine a way to identify tangent vectors at one point $p \in M$ with those at another point $q \in M$, although it will depend on the curve connecting them (parallel transport).