Linear connections

Also called affine connections, covariant derivatives, or covariant derivative operators.

General framework

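An affine connection on a manifold $M$ is a vector bundle connection on the tangent bundle of $M$ . Equivalently, it is a principal connection on the frame bundle of $M$ (which is a principal bundle). The equivalence is an instance of the general correspondence between connections on a vector bundle and principal connections on its frame bundle.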

Let $P$ denote the frame bundle and $ω$ the connection 1-form of this connection. What is a vector $v \in T_{p} P$ ? The point $p \in P$ represents a point $x$ of the manifold together with a choice of basis for $T_{x} M$ , and $v$ represents the beginning of a curve $α$ leaving $x$ together with a choice of basis for every $T_{α (t)} M$ . The value $ω_{p} (v)$ tells us how the basis is changing (if at all) as we move along $α$ . This is "not natural" to $M$ and must be introduced by hand. The "change" is infinitesimal, since it corresponds to an infinitesimal step along $α$ , so it is measured by an element of the Lie algebra $gl (n)$ . See Cartan geometry#Generalization of manifolds with affine connections for more about "the big picture".

Motivational introduction

Consider first the situation in $R^{n}$ . Let $X, Y : R^{n} \to R^{n}$ be vector fields. To define the directional derivative of the vector field $X$ in the direction of the vector field $Y$ at a point $p \in R^{n}$ , we can mimic the usual definition of the directional derivative:

(\nabla_{Y} X) (p) := \lim_{t \to 0} \frac{X (p + t Y (p)) - X (p)}{t} .

The result $(\nabla_{Y} X)$ is a vector field on $R^{n}$ . You can check that the operation $\nabla$ defined as above satisfies the following two properties:

(1) $\nabla_{f Y} X = f \nabla_{Y} X$ .
(2) $\nabla_{Y} (f X) = (Y f) X + f \nabla_{Y} X$ .

Here, $X, Y : R^{n} \to R^{n}$ are vector fields and $f : R^{n} \to R$ is a scalar function. The function $Y f$ (at a point $p$ ) is the directional derivative of $f$ at $p$ in the direction $Y (p)$ .
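
The two properties of the directional derivative on $R^{n}$ can be checked numerically. The following is a minimal sketch, assuming central finite differences in place of the limit; the helper names (`shift`, `directional_derivative`, `scalar_derivative`) and the sample fields `X`, `Y`, `f` are illustrative choices, not part of the notes.

```python
import math

def shift(p, v, h):
    """Return the point p + h*v in R^n (both given as lists)."""
    return [pi + h * vi for pi, vi in zip(p, v)]

def directional_derivative(X, Y, p, h=1e-6):
    """Central-difference estimate of (nabla_Y X)(p) for fields on R^n."""
    y = Y(p)
    ahead, behind = X(shift(p, y, h)), X(shift(p, y, -h))
    return [(a - b) / (2 * h) for a, b in zip(ahead, behind)]

def scalar_derivative(f, Y, p, h=1e-6):
    """Central-difference estimate of (Y f)(p)."""
    y = Y(p)
    return (f(shift(p, y, h)) - f(shift(p, y, -h))) / (2 * h)

# Illustrative sample data (hypothetical choices).
X = lambda p: [p[0] * p[1], math.sin(p[0])]
Y = lambda p: [p[1], -p[0]]
f = lambda p: p[0] ** 2 + p[1]
p = [0.7, -1.3]

fY = lambda q: [f(q) * c for c in Y(q)]   # the field f*Y
fX = lambda q: [f(q) * c for c in X(q)]   # the field f*X
dX = directional_derivative(X, Y, p)

# Tensoriality in the direction: nabla_{fY} X = f nabla_Y X
lhs1 = directional_derivative(X, fY, p)
rhs1 = [f(p) * c for c in dX]

# Leibniz rule: nabla_Y (fX) = (Yf) X + f nabla_Y X
lhs2 = directional_derivative(fX, Y, p)
Yf = scalar_derivative(f, Y, p)
rhs2 = [Yf * x + f(p) * d for x, d in zip(X(p), dX)]

# Largest discrepancy between the two sides; it should be tiny.
print(max(abs(a - b) for a, b in zip(lhs1 + lhs2, rhs1 + rhs2)))
```

Both sides agree up to finite-difference error, which is all the sketch is meant to show; it is not a proof.
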
Now let us try and mimic the above construction on a general manifold. Given vector fields $X, Y \in X (M)$ , we try to use the same formula and define

(\nabla_{Y} X) (p) := \lim_{t \to 0} \frac{X (p + t Y (p)) - X (p)}{t} .

However, there are two problems. First, the expression $X (p + t Y (p))$ is not defined, because we have no way of adding a point $p \in M$ to a tangent vector $t Y (p) \in T_{p} M$ . This is not so bad, because we can replace the expression $p + t Y (p)$ with any curve "which goes in the direction $Y (p)$ ", such as the flow $φ_{t}^{Y} (p)$ . The more serious problem is that we need to subtract the tangent vector $X (p) \in T_{p} M$ from the tangent vector $X (φ_{t}^{Y} (p)) \in T_{φ_{t}^{Y} (p)} M$ , and these two tangent vectors belong to different vector spaces.

In general, without any extra data, we have no way of identifying tangent spaces at different points of $M$ .

To summarize, we see that we can differentiate vector fields along vector fields without any problem on $R^{n}$ but we encounter problems when we try and do it on a general manifold. But $R^{n}$ is also a manifold so what makes it special? We need extra data.

The definition of an affine connection is meant to supply the manifold $M$ "externally" with an operation $\nabla : X (M) \times X (M) \to X (M)$ which satisfies properties $(1) - (2)$ and so allows us to differentiate vector fields along vector fields. That is, instead of defining the directional derivative of a vector field along a vector field, we require that somebody hands us a mechanism $\nabla$ which satisfies the properties that the familiar derivative satisfies on $R^{n}$ , and then we think of it as a directional derivative.

How to get one?

This extra data can be provided:

externally,
by a parallel transport,
or it can be inherited from an ambient space (for example, if the manifold is immersed in $R^{N}$ ).
The covariant derivative operator on a surface can be seen as inherited from the absolute parallelism and the metric of $R^{3}$ :

D_{v} w = P [\nabla_{v} w] = \nabla_{v} w - (n \cdot \nabla_{v} w) n

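where $P$ is the orthogonal projection onto the tangent plane of the surface, that is, along its unit normal $n$ , and $\nabla$ is the usual directional derivative of $R^{3}$ . For more see Gauss' Equation, Theorem 1.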
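
The projection formula can be tried on the unit sphere. This is a minimal sketch under stated assumptions: the ambient derivative is approximated by central differences, and the field `w` (rotation about the $z$ axis), the helper names, and the sample points are all illustrative choices, not part of the notes.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ambient_derivative(w, p, v, h=1e-6):
    """Central-difference estimate of the R^3 directional derivative nabla_v w at p."""
    ahead = w([a + h * b for a, b in zip(p, v)])
    behind = w([a - h * b for a, b in zip(p, v)])
    return [(x - y) / (2 * h) for x, y in zip(ahead, behind)]

def surface_covariant_derivative(w, p, v, normal):
    """D_v w = nabla_v w - (n . nabla_v w) n, with n the unit normal at p."""
    n = normal(p)
    dw = ambient_derivative(w, p, v)
    k = dot(n, dw)
    return [d - k * ni for d, ni in zip(dw, n)]

# Unit sphere: the unit normal at p is p / |p|.
sphere_normal = lambda p: [c / math.sqrt(dot(p, p)) for c in p]

# Hypothetical tangent field: infinitesimal rotation about the z axis.
w = lambda p: [-p[1], p[0], 0.0]

# On the equator the field is the velocity of a great circle, so D_w w vanishes.
equator = [1.0, 0.0, 0.0]
D_eq = surface_covariant_derivative(w, equator, w(equator), sphere_normal)

# Away from the equator D_w w is nonzero but still tangent to the sphere.
theta = math.pi / 3
q = [math.sin(theta), 0.0, math.cos(theta)]
D_q = surface_covariant_derivative(w, q, w(q), sphere_normal)

print(D_eq, D_q, dot(D_q, q))
```

The design point is that the normal component of the ambient derivative is simply discarded; what remains is tangent to the surface by construction.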

Definition as operator

(See relationship parallel transport, covariant derivatives and metrics).

For vector fields

An operator $\nabla : X (M) \times X (M) \to X (M)$ , $(X, Y) \mapsto \nabla_{X} Y$ , where $X (M)$ is the set of all vector fields on $M$ , is called a covariant derivative operator, affine connection, or linear connection if it satisfies the following:

It commutes with addition.
It satisfies the Leibniz rule.
It is tensorial (that is, $C^{\infty} (M)$ -linear) with respect to the first $X (M)$ .
It commutes with index contraction.
Applied to scalar fields, it coincides with the directional derivative with respect to the vector.
For scalar fields we have commutation: $\nabla_{a} \nabla_{b} α = \nabla_{b} \nabla_{a} α$ (this is the torsion-free condition, which not every author includes in the definition).

Obviously this raises quite a lot of questions:

Does such a mechanism always exist? (Yes).
Is it unique? (No).
Is there a natural choice of such differentiation mechanism? (Yes, under certain circumstances).
Can we use this mechanism to recover the ability to identify tangent vectors at different points that was necessary to define the regular directional derivative in $R^{n}$ ? (Yes, at least along curves. This leads to the notion of parallel transport).

In this context, the covariant derivative of $Y$ along $X$ at $p$ is the value

(\nabla_{X} Y) (p)

This result depends on the values of the vector field $Y$ in a neighbourhood of $p$ , but on $X$ only through the specific vector $X (p)$ , since it is tensorial in this argument (the third property above). The Lie derivative $L_{X} Y$ of vector fields, on the contrary, also depends on the values of $X$ in a neighbourhood of $p$ . See the relation of Lie derivative, covariant derivative and torsion.

Proof
The value of $\nabla_{X} Y$ at $p$ depends only on the values of $X$ and $Y$ in a neighbourhood $U$ of $p$ , so we can work in a local frame $E_{i}$ on $U$ and write $X = X^{i} E_{i}$ . Since $\nabla$ is tensorial in its first argument,

\nabla_{X} Y = X^{i} \nabla_{E_{i}} Y

So evaluating at $p$ :

\nabla_{X} Y |_{p} = X^{i} (p) \nabla_{E_{i}} Y |_{p}

which depends on $X$ only through the components $X^{i} (p)$ , that is, through the vector $X (p)$ . In particular, if $X^{i} (p) = 0$ for all $i$ , then $(\nabla_{X} Y) (p) = 0$ .
$◼$

If we perform this construction not on $T M$ but in any vector bundle $E \to M$ we obtain the notion of vector bundle connection, which is a particular case of a connection on a fiber bundle.

To specify a covariant derivative operator we only need to fix a coordinate chart $\{ x_{i} \}$ of $M$ and provide the functions $Γ_{i j}^{k}$ defined by

\nabla_{\partial_{x_{j}}} \partial_{x_{i}} = Γ_{i j}^{k} \partial_{x_{k}}

which are called the Christoffel symbols when the connection comes from a (pseudo)-Riemannian metric.
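
As a short worked step, using only tensoriality in the lower argument, the Leibniz rule, and the index convention of the formula above, the functions $Γ_{i j}^{k}$ determine $\nabla_{X} Y$ for arbitrary fields. Writing $X = X^{j} \partial_{x_{j}}$ and $Y = Y^{i} \partial_{x_{i}}$ (these component names are introduced here only for illustration):

\begin{aligned} \nabla_{X} Y & = X^{j} \nabla_{\partial_{x_{j}}} (Y^{i} \partial_{x_{i}}) \\ & = X^{j} \frac{\partial Y^{i}}{\partial x_{j}} \partial_{x_{i}} + X^{j} Y^{i} Γ_{i j}^{k} \partial_{x_{k}} \\ & = (X (Y^{k}) + Γ_{i j}^{k} Y^{i} X^{j}) \partial_{x_{k}} . \end{aligned}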

Extension to tensor fields

Once we have a covariant derivative on the tangent vector fields of a manifold, it can be extended to any tensor field using the same Christoffel symbols. This video explains how the covariant derivative of 1-forms can be computed, and this part of the same video applies it to any tensor.
For example, given the vector $v = v^{i} e_{i}$ , the 1-form $α = α_{i} ϵ^{i}$ and the (0,2)-tensor $g = g_{i j} ϵ^{i} \otimes ϵ^{j}$ , where $e_{i} = \partial_{i}$ is the coordinate basis of the coordinates $u^{i}$ and $ϵ^{i}$ is its dual basis (the last lower index of $Γ$ is the direction of differentiation, as in the defining formula above):

\begin{array}{l} \nabla_{\partial_{i}} (\vec{v}) = (\frac{\partial v^{k}}{\partial u^{i}} + v^{j} Γ_{j i}^{k}) \vec{e_{k}} \\ \nabla_{\partial_{i}} (α) = (\frac{\partial α_{k}}{\partial u^{i}} - α_{j} Γ_{k i}^{j}) ϵ^{k} \\ \nabla_{\partial_{i}} (g) = [\frac{\partial g_{r s}}{\partial u^{i}} - g_{k s} Γ_{r i}^{k} - g_{r k} Γ_{s i}^{k}] (ϵ^{r} \otimes ϵ^{s}) \end{array}
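
The formula for a (0,2)-tensor can be exercised on a familiar case. The sketch below assumes the flat metric of $R^{2}$ in polar coordinates, $g = dr^{2} + r^{2} dθ^{2}$ , together with its standard Levi-Civita Christoffel symbols ( $Γ_{θ θ}^{r} = - r$ , $Γ_{r θ}^{θ} = Γ_{θ r}^{θ} = 1 / r$ ); these are symmetric in the lower indices, so the index-order convention plays no role here. Function names are hypothetical, and partial derivatives are approximated by central differences.

```python
def metric(point):
    """Flat metric of R^2 in polar coordinates (r, theta): g = diag(1, r^2)."""
    r, _theta = point
    return [[1.0, 0.0], [0.0, r * r]]

def christoffel(k, i, j, point):
    """Gamma^k_{ij} of the Levi-Civita connection of `metric` (0 = r, 1 = theta)."""
    r, _theta = point
    if k == 0 and i == 1 and j == 1:
        return -r
    if k == 1 and {i, j} == {0, 1}:
        return 1.0 / r
    return 0.0

def partial_metric(i, point, h=1e-6):
    """Central-difference estimate of d g_{rs} / d u^i, returned as a 2x2 list."""
    ahead, behind = list(point), list(point)
    ahead[i] += h
    behind[i] -= h
    ga, gb = metric(ahead), metric(behind)
    return [[(ga[r][s] - gb[r][s]) / (2 * h) for s in range(2)] for r in range(2)]

def nabla_g(i, point):
    """(nabla_{d_i} g)_{rs} = d_i g_rs - g_ks Gamma^k_{r i} - g_rk Gamma^k_{s i}."""
    g, dg = metric(point), partial_metric(i, point)
    return [[dg[r][s]
             - sum(g[k][s] * christoffel(k, r, i, point) for k in range(2))
             - sum(g[r][k] * christoffel(k, s, i, point) for k in range(2))
             for s in range(2)] for r in range(2)]

point = [2.0, 0.3]  # an arbitrary sample point with r > 0
residual = max(abs(c) for i in range(2) for row in nabla_g(i, point) for c in row)
print(residual)  # should be numerically zero: the metric is covariantly constant
```

All components of $\nabla g$ vanish up to rounding, which is the expected behaviour for a connection that comes from the metric itself.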

In components, and with the notation of F. Schuller in GR:

\begin{aligned} (\nabla_{X} Y)^{i} & = X ⟨ Y^{i} ⟩ + Γ_{(x) j k}^{i} X^{k} Y^{j}, \\ (\nabla_{X} ω)_{i} & = X ⟨ ω_{i} ⟩ - Γ_{(x) i k}^{j} X^{k} ω_{j} . \end{aligned}

and for a (1,2)-tensor

{(\nabla_{X} T)^{i}}_{j k} = X ⟨ {T^{i}}_{j k} ⟩ + Γ_{(x) m ℓ}^{i} X^{ℓ} {T^{m}}_{j k} - Γ_{(x) j ℓ}^{m} X^{ℓ} {T^{i}}_{m k} - Γ_{(x) k ℓ}^{m} X^{ℓ} {T^{i}}_{j m} .

Alternatively, we could have directly defined the covariant derivative $\nabla_{X}$ of a tensor field $T$ by the following properties:

For any smooth function $f \in C^{\infty} (M)$ , $\nabla_{X} f = X (f) .$
For any tensor fields $T$ and $S$ of the same type, $\nabla_{X} (T + S) = \nabla_{X} T + \nabla_{X} S .$
For any tensor field $T$ (written here for type $(1,1)$ ; the general case has one term per argument), 1-form $ω$ , and vector field $Y$ , $\nabla_{X} (T (ω, Y)) = (\nabla_{X} T) (ω, Y) + T (\nabla_{X} ω, Y) + T (ω, \nabla_{X} Y) .$
For any smooth function $f \in C^{\infty} (M)$ and vector fields $X$ and $Z$ , $\nabla_{f X + Z} T = f \nabla_{X} T + \nabla_{Z} T .$
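
As an illustration of how these properties fix the component formulas, the Leibniz-type property applied to a 1-form $ω$ evaluated on a coordinate vector field $\partial_{i}$ recovers the expression for $(\nabla_{X} ω)_{i}$ given earlier. This is a sketch assuming the convention $\nabla_{\partial_{k}} \partial_{i} = Γ_{i k}^{j} \partial_{j}$ and writing $ω_{i} = ω (\partial_{i})$ :

\begin{aligned} X (ω_{i}) & = \nabla_{X} (ω (\partial_{i})) = (\nabla_{X} ω) (\partial_{i}) + ω (\nabla_{X} \partial_{i}) \\ & = (\nabla_{X} ω)_{i} + X^{k} Γ_{i k}^{j} ω_{j} , \end{aligned}

so that $(\nabla_{X} ω)_{i} = X (ω_{i}) - Γ_{i k}^{j} X^{k} ω_{j}$ .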

Worked example

Another approach to understanding the problem is as follows:

In $R^{2}$ with Cartesian coordinates $(x, y)$ , we have a basis $\{ \frac{\partial}{\partial x}, \frac{\partial}{\partial y} \}$ for each tangent space. Moreover, the tangent vector $\frac{\partial}{\partial x}$ at the point $P = (1, 2)$ and the vector $\frac{\partial}{\partial x}$ at the point $Q = (0, 1)$ are "the same," in the sense that translating one from $P$ to $Q$ gives the other. This is because we are assuming the traditional notion of parallelism in $R^{2}$ .

Now, let's consider other coordinates, for example, $(t, a)$ given by the transformation:

ϕ : (x, y) \to (t, a) = (x, y - x^{2})

The vector $\frac{\partial}{\partial t}$ at point $P = (1, 1)$ (note that this $P$ is the same as in the previous paragraph but expressed in coordinates $(t, a)$ ), and the vector $\frac{\partial}{\partial t}$ at point $Q = (0, 1)$ are now not the same, from the perspective of traditional parallelism in $R^{2}$ . Let's see this:

$\frac{\partial}{\partial t}$ at $P$ can be expressed in Cartesian coordinates as

d ϕ^{- 1} (\frac{\partial}{\partial t}) = {(\begin{array}{cc} 1 & 0 \\ 2 t & 1 \end{array}) \cdot (\begin{matrix} 1 \\ 0 \end{matrix}) |}_{P} = {(\begin{matrix} 1 \\ 2 t \end{matrix}) |}_{P} = \frac{\partial}{\partial x} + 2 \frac{\partial}{\partial y}

$\frac{\partial}{\partial t}$ at $Q$ can be expressed in Cartesian coordinates as

d ϕ^{- 1} (\frac{\partial}{\partial t}) = {(\begin{array}{cc} 1 & 0 \\ 2 t & 1 \end{array}) \cdot (\begin{matrix} 1 \\ 0 \end{matrix}) |}_{Q} = {(\begin{matrix} 1 \\ 2 t \end{matrix}) |}_{Q} = \frac{\partial}{\partial x}

Therefore, $\frac{\partial}{\partial t}$ is not constant, even though its components in the $(t, a)$ basis are. This implies that if we want to differentiate a vector field in this new coordinate system, we cannot simply differentiate each component. For example, the component-wise derivative of $\frac{\partial}{\partial t}$ would be 0, but we have just seen that the field is not constant.
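
The comparison of $\frac{\partial}{\partial t}$ at $P$ and $Q$ can be reproduced in a few lines. This is a minimal sketch, assuming the Jacobian of $ϕ^{- 1} (t, a) = (t, a + t^{2})$ is entered by hand; the function names are hypothetical.

```python
def jacobian_phi_inverse(t, a):
    """Jacobian of phi^{-1}(t, a) = (t, a + t^2); rows are (x, y), columns are (t, a)."""
    return [[1.0, 0.0], [2.0 * t, 1.0]]

def to_cartesian(components, t, a):
    """Push a vector with (t, a)-components to Cartesian components at the point (t, a)."""
    J = jacobian_phi_inverse(t, a)
    return [sum(J[row][col] * components[col] for col in range(2)) for row in range(2)]

d_dt = [1.0, 0.0]                      # components of d/dt in the (t, a) basis
at_P = to_cartesian(d_dt, 1.0, 1.0)    # P = (1, 1) in (t, a) coordinates
at_Q = to_cartesian(d_dt, 0.0, 1.0)    # Q = (0, 1) in (t, a) coordinates
print(at_P, at_Q)  # [1.0, 2.0] [1.0, 0.0]
```

The same components `[1.0, 0.0]` in the $(t, a)$ basis give different Cartesian vectors at the two points, which is exactly the failure of component-wise differentiation described above.
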
The way to fix this is to add correction terms to the traditional component-wise derivative that reflect the deformation of the axes themselves. Let's see this in the specific case of coordinates $(t, a)$ (note that for coordinates $(x, y)$ , it would be sufficient to differentiate each component because we assume that the basis vectors are constant):
Consider the vector field $X = F_{X} \frac{\partial}{\partial t} + G_{X} \frac{\partial}{\partial a}$ and the field $Y = F_{Y} \frac{\partial}{\partial t} + G_{Y} \frac{\partial}{\partial a}$ . A consistent way to differentiate $Y$ with respect to $X$ would be an operation $\nabla$ that should satisfy:

\begin{aligned} \nabla_{X} Y & = F_{X} \frac{\partial}{\partial t} (Y) + G_{X} \frac{\partial}{\partial a} (Y) \\ & = F_{X} \frac{\partial}{\partial t} (F_{Y} \frac{\partial}{\partial t} + G_{Y} \frac{\partial}{\partial a}) + G_{X} \frac{\partial}{\partial a} (F_{Y} \frac{\partial}{\partial t} + G_{Y} \frac{\partial}{\partial a}) \end{aligned}

Since the Leibniz rule should also hold for vectors, the above expression becomes:

\begin{aligned} & F_{X} \frac{\partial}{\partial t} (F_{Y}) \frac{\partial}{\partial t} + F_{X} F_{Y} \frac{\partial}{\partial t} (\frac{\partial}{\partial t}) \\ & + F_{X} \frac{\partial}{\partial t} (G_{Y}) \frac{\partial}{\partial a} + F_{X} G_{Y} \frac{\partial}{\partial t} (\frac{\partial}{\partial a}) \\ & + G_{X} \frac{\partial}{\partial a} (F_{Y}) \frac{\partial}{\partial t} + G_{X} F_{Y} \frac{\partial}{\partial a} (\frac{\partial}{\partial t}) \\ & + G_{X} \frac{\partial}{\partial a} (G_{Y}) \frac{\partial}{\partial a} + G_{X} G_{Y} \frac{\partial}{\partial a} (\frac{\partial}{\partial a}) \\ & = X (F_{Y}) \frac{\partial}{\partial t} + X (G_{Y}) \frac{\partial}{\partial a} + \text{correction terms} \end{aligned}

These "correction terms" will be fully determined when we calculate $\frac{\partial}{\partial t} (\frac{\partial}{\partial t})$ , $\frac{\partial}{\partial t} (\frac{\partial}{\partial a})$ , $\frac{\partial}{\partial a} (\frac{\partial}{\partial t})$ , and $\frac{\partial}{\partial a} (\frac{\partial}{\partial a})$ . Let's translate everything into Cartesian coordinates, where we can differentiate because parallelism exists.

d ϕ^{- 1} (\frac{\partial}{\partial t}) = (\begin{array}{cc} 1 & 0 \\ 2 t & 1 \end{array}) \cdot (\begin{matrix} 1 \\ 0 \end{matrix}) = \frac{\partial}{\partial x} + 2 x \frac{\partial}{\partial y}

d ϕ^{- 1} (\frac{\partial}{\partial a}) = (\begin{array}{cc} 1 & 0 \\ 2 t & 1 \end{array}) \cdot (\begin{matrix} 0 \\ 1 \end{matrix}) = \frac{\partial}{\partial y}

Therefore:

$\frac{\partial}{\partial t} (\frac{\partial}{\partial t}) = (\frac{\partial}{\partial x} + 2 x \frac{\partial}{\partial y}) (\frac{\partial}{\partial x} + 2 x \frac{\partial}{\partial y}) = 2 \frac{\partial}{\partial y} = 2 \frac{\partial}{\partial a}$
$\frac{\partial}{\partial t} (\frac{\partial}{\partial a}) = (\frac{\partial}{\partial x} + 2 x \frac{\partial}{\partial y}) (\frac{\partial}{\partial y}) = 0$
$\frac{\partial}{\partial a} (\frac{\partial}{\partial t}) = \frac{\partial}{\partial y} (\frac{\partial}{\partial x} + 2 x \frac{\partial}{\partial y}) = 0$
$\frac{\partial}{\partial a} (\frac{\partial}{\partial a}) = \frac{\partial}{\partial y} (\frac{\partial}{\partial y}) = 0$

Thus, the "differentiation" becomes:

\nabla_{X} Y = X (F_{Y}) \frac{\partial}{\partial t} + X (G_{Y}) \frac{\partial}{\partial a} + 2 F_{X} F_{Y} \frac{\partial}{\partial a}

The components of $\frac{\partial}{\partial t} (\frac{\partial}{\partial t})$ , $\frac{\partial}{\partial t} (\frac{\partial}{\partial a})$ , $\frac{\partial}{\partial a} (\frac{\partial}{\partial t})$ , and $\frac{\partial}{\partial a} (\frac{\partial}{\partial a})$ in the basis $\{ \frac{\partial}{\partial t}, \frac{\partial}{\partial a} \}$ are called Christoffel symbols and depend on the notion of parallelism and the chosen coordinates. They are symbolized by $Γ_{i j}^{k}$ , and in our case (with index 1 for $t$ and index 2 for $a$ ), $Γ_{11}^{2} = 2$ , and all others are 0.
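
The Christoffel symbols of this example can also be obtained numerically. The sketch below assumes the standard expression for a flat connection written in curvilinear coordinates $u = (t, a)$ , namely $Γ_{i j}^{k} = \frac{\partial u^{k}}{\partial x^{m}} \frac{\partial^{2} x^{m}}{\partial u^{i} \partial u^{j}}$ , which is not stated in the notes, and approximates the derivatives by central differences. Function names are hypothetical, and indices are 0-based in the code (0 for $t$ , 1 for $a$ ).

```python
def phi(x):
    """Cartesian (x, y) -> (t, a) = (x, y - x^2)."""
    return [x[0], x[1] - x[0] ** 2]

def phi_inverse(u):
    """(t, a) -> Cartesian (x, y) = (t, a + t^2)."""
    return [u[0], u[1] + u[0] ** 2]

def partial(func, point, i, h=1e-4):
    """Central-difference partial derivative of a vector-valued function."""
    ahead, behind = list(point), list(point)
    ahead[i] += h
    behind[i] -= h
    return [(a - b) / (2 * h) for a, b in zip(func(ahead), func(behind))]

def second_partial(func, point, i, j, h=1e-4):
    """Central-difference mixed second partial d^2 func / d u^i d u^j."""
    ahead, behind = list(point), list(point)
    ahead[j] += h
    behind[j] -= h
    return [(a - b) / (2 * h)
            for a, b in zip(partial(func, ahead, i, h), partial(func, behind, i, h))]

def christoffel(u):
    """Return gamma[k][i][j] = Gamma^k_{ij} at the point with coordinates u = (t, a)."""
    x = phi_inverse(u)
    du = [partial(phi, x, m) for m in range(2)]   # du[m][k] = d u^k / d x^m
    gamma = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    for i in range(2):
        for j in range(2):
            d2x = second_partial(phi_inverse, u, i, j)  # d2x[m] = d^2 x^m / du^i du^j
            for k in range(2):
                gamma[k][i][j] = sum(du[m][k] * d2x[m] for m in range(2))
    return gamma

gamma = christoffel([1.0, 1.0])
print(gamma[1][0][0])  # approximately 2: this is Gamma^2_{11} in the 1-based notation
```

Only `gamma[1][0][0]` is nonzero, in agreement with the hand computation above.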

Abstract Index Notation

This whole approach can be given using Penrose abstract index notation. In this notation, we would denote:

ξ^{c} \nabla_{c} Ψ^{a} = \nabla_{ξ} Ψ

[Figure: the same expression and, for example, the Leibniz rule (and more) in Penrose notation.]
Ultimately, a connection will determine a way to identify tangent vectors at one point $p \in M$ with those at another point $q \in M$ , although it will depend on the curve connecting them (parallel transport).
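
To make the identification of tangent vectors along a curve concrete, here is a minimal sketch that transports $\frac{\partial}{\partial t}$ from $Q$ to $P$ in the $(t, a)$ coordinates of the worked example. It assumes the standard parallel transport equation $\frac{d V^{k}}{d s} + Γ_{i j}^{k} V^{i} \frac{d c^{j}}{d s} = 0$ along a curve $c (s)$ , which is not stated in the notes, and integrates it with the midpoint rule; the two curves and all function names are hypothetical.

```python
def christoffel(k, i, j, point):
    """Gamma^k_{ij} in the (t, a) coordinates of the worked example (0 = t, 1 = a)."""
    return 2.0 if (k, i, j) == (1, 0, 0) else 0.0

def transport_rhs(curve, velocity, s, V):
    """Right-hand side of dV^k/ds = -Gamma^k_{ij} V^i (dc^j/ds)."""
    point, dc = curve(s), velocity(s)
    return [-sum(christoffel(k, i, j, point) * V[i] * dc[j]
                 for i in range(2) for j in range(2)) for k in range(2)]

def parallel_transport(curve, velocity, V0, steps=1000):
    """Integrate the transport equation for s in [0, 1] with the midpoint rule."""
    V, ds = list(V0), 1.0 / steps
    for n in range(steps):
        s = n * ds
        k1 = transport_rhs(curve, velocity, s, V)
        mid = [v + 0.5 * ds * k for v, k in zip(V, k1)]
        k2 = transport_rhs(curve, velocity, s + 0.5 * ds, mid)
        V = [v + ds * k for v, k in zip(V, k2)]
    return V

# Two curves from Q = (0, 1) to P = (1, 1), each given as (position, velocity).
straight = (lambda s: [s, 1.0], lambda s: [1.0, 0.0])
detour = (lambda s: [s * s, 1.0 + s * (1.0 - s)], lambda s: [2.0 * s, 1.0 - 2.0 * s])

V_straight = parallel_transport(*straight, [1.0, 0.0])
V_detour = parallel_transport(*detour, [1.0, 0.0])
print(V_straight, V_detour)  # both close to [1, -2]
```

The transported vector $\frac{\partial}{\partial t} - 2 \frac{\partial}{\partial a}$ at $P$ equals $\frac{\partial}{\partial x}$ in Cartesian terms, which is $\frac{\partial}{\partial t}$ at $Q$ moved by ordinary translation. Both curves give the same answer only because this connection is the flat one of $R^{2}$ ; for a general connection the result depends on the curve.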