Differential entropy

The differential entropy of a continuous random variable X with probability density function (pdf) p(x) is

$$h(X) = -\int p(x)\,\log p(x)\,dx.$$

It is the continuous analogue of Shannon entropy for discrete distributions,

$$H(X) = -\sum_i p_i \log p_i.$$

Characteristics:

  1. Under a rescaling $X \to aX$: $h(aX) = h(X) + \log|a|$.
  2. Discretizing $X$ into bins of width $\Delta$: $H_\Delta(X) \approx h(X) + \log\frac{1}{\Delta}$.

Thus, differential entropy is not absolute: it only gains physical meaning when a fundamental resolution scale is specified.
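
As a sanity check of $H_\Delta(X) \approx h(X) + \log\frac{1}{\Delta}$, here is a minimal numerical sketch (an illustration, not from the source): a standard Gaussian is binned at width $\Delta$ and the resulting Shannon entropy is compared with the closed-form differential entropy $h = \tfrac{1}{2}\ln(2\pi e\sigma^2)$. The truncation range and bin widths are arbitrary choices.

```python
import numpy as np

sigma = 1.0
# Exact differential entropy of N(0, sigma^2), in nats
h_exact = 0.5 * np.log(2 * np.pi * np.e * sigma**2)

for delta in (0.5, 0.1, 0.01):
    # Bin the real line (truncated at +/- 10 sigma) into cells of width delta
    edges = np.arange(-10 * sigma, 10 * sigma + delta, delta)
    centers = 0.5 * (edges[:-1] + edges[1:])
    pdf = np.exp(-centers**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    p = pdf * delta                      # bin probabilities p_i ~= p(x_i) * delta
    p = p[p > 0]
    H_delta = -np.sum(p * np.log(p))     # discrete Shannon entropy of the binned variable
    print(delta, H_delta, h_exact + np.log(1 / delta))  # these two values agree closely
```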

Planck Length as a Natural Cutoff

In physics, the Planck length

$$\ell_P = \sqrt{\frac{\hbar G}{c^3}} \approx 1.6 \times 10^{-35}\,\mathrm{m}$$

is often regarded as the minimal meaningful length scale. When interpreting entropy in continuous systems (fields, spacetime degrees of freedom, black holes), the bin size $\Delta$ can be taken to be on the order of $\ell_P$.
This gives a bridge between differential entropy and a physically grounded discrete Shannon entropy:

$$H(X) \approx h(X) + \log\frac{1}{\ell_P}.$$
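
For a rough sense of scale, a hedged back-of-the-envelope sketch (assumptions mine: a Gaussian position variable with $\sigma = 1\,\mathrm{m}$, entropy measured in nats):

```python
import numpy as np

l_P = 1.6e-35                                  # Planck length in metres
sigma = 1.0                                    # assumed 1-metre Gaussian spread
h = 0.5 * np.log(2 * np.pi * np.e * sigma**2)  # differential entropy, in nats
H = h + np.log(1 / l_P)                        # Shannon entropy at Planck-scale resolution
print(H)                                       # roughly 81.5 nats
```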

Gibbs entropy

Consider a system in Classical Statistical Mechanics with ensemble given by a phase-space density $\rho(q,p)$.
The statistical definition of entropy, known as the Gibbs entropy, is

$$S = -k\,\langle\ln\rho\rangle = -k\int \rho(q,p)\,\ln[\rho(q,p)]\,dq\,dp$$

Important caution:

In some sense, $S$ counts the number of microstates compatible with the macrostate. For a macrostate whose ensemble is a Dirac delta, there is only one compatible microstate, yet the entropy diverges to $-\infty$ (a consequence of the resolution-dependence of continuous entropies noted above).
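
The divergence is easy to see in a toy example. The sketch below (an illustration, not from the source; $k = 1$) uses the closed-form Gibbs entropy of an isotropic Gaussian phase-space density of width $\sigma$ in $(q,p)$ and lets $\sigma \to 0$, i.e. lets the ensemble approach a Dirac delta.

```python
import numpy as np

def gibbs_entropy_gaussian(sigma):
    """Gibbs entropy (k = 1) of an isotropic 2D Gaussian rho(q, p) of width sigma.

    Closed form: S = 1 + ln(2*pi*sigma^2)."""
    return 1.0 + np.log(2 * np.pi * sigma**2)

for sigma in (1.0, 1e-3, 1e-6, 1e-9):
    print(sigma, gibbs_entropy_gaussian(sigma))   # S -> -infinity as sigma -> 0
```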

Important relation

Consider a system in thermal equilibrium with a bath at temperature $T$, described by the canonical ensemble, where the probability density is $\rho_S(q,p) = \frac{1}{Z}\,e^{-\beta H_S(q,p)}$ with $\beta = 1/(kT)$.

  1. Express Entropy using the partition function (Z)
    Substituting $\rho_S$ into the entropy definition:
    $$\ln\rho_S = \ln\!\left(\frac{e^{-\beta H_S}}{Z}\right) = -\beta H_S - \ln Z$$
    Taking the average:
    $$\langle\ln\rho_S\rangle = -\beta\langle H_S\rangle - \ln Z$$
    Plugging into $S = -k\langle\ln\rho_S\rangle$, with $E \equiv \langle H_S\rangle$:
    $$S = k\beta E + k\ln Z$$
    With $\beta = 1/(kT)$:
    $$S = \frac{E}{T} + k\ln Z$$
  2. Take the Differential
    For fixed Hamiltonian parameters, differentiate with respect to $\beta$ and use $\frac{d\ln Z}{d\beta} = -E$:
    $$\frac{dS}{d\beta} = kE + k\beta\frac{dE}{d\beta} - kE = k\beta\frac{dE}{d\beta}$$
    Hence:
    $$dS = k\beta\,dE$$
    Substituting $\beta = 1/(kT)$:
    $$dE = T\,dS$$
    A numerical check of the relation $S = E/T + k\ln Z$ appears below.
Related: first law of thermodynamics.
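
A minimal numerical sketch of the relation $S = E/T + k\ln Z$ (an illustration, not from the source): a two-level system with energies $\{0, \varepsilon\}$ in the canonical ensemble, with $k = 1$ so that $\beta = 1/T$; the level spacing and temperature below are arbitrary.

```python
import numpy as np

eps, T = 1.0, 0.7                     # illustrative level spacing and temperature (k = 1)
beta = 1.0 / T

energies = np.array([0.0, eps])
weights = np.exp(-beta * energies)
Z = weights.sum()                     # partition function
p = weights / Z                       # canonical probabilities e^{-beta E_i} / Z

E = np.sum(p * energies)              # average energy <H_S>
S_gibbs = -np.sum(p * np.log(p))      # Gibbs entropy of the ensemble
S_formula = E / T + np.log(Z)         # S = E/T + k ln Z with k = 1

print(S_gibbs, S_formula)             # the two values coincide
```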

Boltzmann entropy

See harmonic oscillator in CSM#5) Entropy


Maximum Entropy Distributions

Two standard examples. On $x \ge 0$ with fixed mean $\mu$, the maximizer is the exponential distribution,
$$p(x) = \frac{1}{\mu}\,e^{-x/\mu}.$$
On the whole real line with fixed mean $\mu$ and variance $\sigma^2$, it is the Gaussian,
$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$
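
A quick numerical comparison (illustrative, not from the source) that the exponential beats another candidate on $[0,\infty)$ with the same mean, here a uniform distribution on $[0, 2\mu]$:

```python
import numpy as np

mu = 2.0                               # assumed mean
h_exponential = 1.0 + np.log(mu)       # differential entropy of Exp(mean mu)
h_uniform = np.log(2 * mu)             # differential entropy of U(0, 2*mu), same mean
print(h_exponential, h_uniform)        # the exponential entropy is larger
```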

Derivation sketch (Lagrange multipliers)

Maximize Shannon entropy

$$H[p] = -\int p(x)\,\ln p(x)\,dx$$

subject to normalization and the moment constraints. The variational problem

$$\mathcal{L}[p] = -\int p\ln p\,dx + \lambda_0\!\left(\int p\,dx - 1\right) + \lambda_1\!\left(\int x\,p\,dx - \mu\right) + \lambda_2\!\left(\int (x-\mu)^2 p\,dx - \sigma^2\right)$$

yields (after variation)

$$p(x) \propto \exp\!\left(\lambda_1 x + \lambda_2 (x-\mu)^2\right).$$

In general, maximum entropy solutions belong to exponential families, with form determined by the active constraints.
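
In the same spirit, a small sanity check (not from the source) that the Gaussian maximizes differential entropy among distributions with matched mean and variance, comparing against uniform and Laplace distributions with the same $\sigma^2$:

```python
import numpy as np

sigma = 1.0
h_gauss   = 0.5 * np.log(2 * np.pi * np.e * sigma**2)   # N(mu, sigma^2)
h_uniform = np.log(np.sqrt(12) * sigma)                  # U(a, b) with variance sigma^2
h_laplace = 1.0 + np.log(2 * sigma / np.sqrt(2))         # Laplace(b) with 2*b^2 = sigma^2
print(h_gauss, h_uniform, h_laplace)                     # the Gaussian entropy is largest
```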